Postgres FM - Index maintenance
Episode Date: September 23, 2022

Important correction from this episode: amcheck promises no false positives, not no false negatives, sorry!

Here are links to a few things we mentioned: pgstattuple, pg_repack, postgres-checkup, Workload Analysis for GitLab.com, Bloat Analysis (also from GitLab), Bloat, pg_repack, and deferred constraints (blog post by Miro), amcheck, Peter Geoghegan interview (on Postgres TV)

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides or by commenting on our topic ideas Google doc. If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by Nikolay Samokhvalov, founder of Postgres.ai, and Michael Christofides, founder of pgMustard. With special thanks to Jessie Draws for the amazing artwork.
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we going to be talking about today?
Yeah, hello Michael. As we decided, let's talk about index maintenance.
First of all, bloat removal, but maybe not only, right?
Yeah, we've alluded to this in a previous episode around vacuum, I believe.
Oh, about bloat specifically, yeah.
So yeah, excited to dive into this with you.
So should we start with how this occurs, perhaps?
Or quick recap on, is this always a problem?
If I have a Postgres database, is it very likely I'm suffering from this at the moment?
Or is there a chance that it's fine?
Yeah, by the way, you're right.
Our episode was called Vacuum, not Bloat, but the two are very close to each other, right? Because usually we talk about bloat and a lack of vacuuming, or some inefficient vacuuming, and so on. This is a great question you asked: if we have tuned our autovacuum to be quite aggressive and everything looks fine, the question is still, should we have
index maintenance from time to time? Is it inevitable? And in my opinion, from my practice,
the answer is yes, due to many reasons. Autovacuum won't solve everything, and some bloat will still accumulate even if you have very aggressive vacuuming. And looking at other database systems, for example Microsoft SQL Server, they also have index maintenance as a routine task for DBAs. In my opinion, in very loaded, growing, large systems, you should still recreate indexes from time to time. Aggressive autovacuuming will only reduce the frequency of this need; it will come up less often, but you still need to recreate
them.
Yeah, I think that's a really good point on heavily loaded systems. Probably the only caveat I would add is: if you've got a relatively light load on your Postgres database, this might be something you don't come across, even if you haven't tuned autovacuum, because it will be tidying things up as it goes along, freeing up index pages. Especially on later versions of Postgres there's some additional logic to make bloat even less likely. But I think it's even worse for indexes than for tables, right? Vacuum is able to free up space in tables much more easily. Refer to a previous episode for more details, but in tables, freed space is much more easily reused. Whereas in a B-tree index, if you get page splits, vacuum can free up space in those pages again, but it can't unsplit them.
But it doesn't rebalance B-tree, right?
Exactly.
It doesn't rebalance that.
And I agree with you, some systems might not need automatic index recreation, but I'm sure
everyone needs monitoring and analysis of bloat on a regular basis.
So this is a must for everyone, in my opinion.
And the question is how to analyze bloat, because it's not a trivial task.
All the scripts we have for fast bloat analysis,
lightweight analysis, they are all wrong. I mean, they can have some errors. They are not precise.
For example, create a table with three columns: smallint, timestamptz (don't use timestamp without time zone), and smallint again. Fill it with a few million rows.
Create an index.
And use your script to estimate bloat in table and in index.
Actually, it's two different scripts, right?
But still.
At least for table, I'm sure you will see terrible bloat.
But we know there is no bloat there yet.
We just inserted rows.
We didn't delete, didn't update.
So no bloat.
You will see something like 30% of bloat estimated. So we need to keep in mind that our scripts have errors, sometimes quite significant ones, because they don't take alignment padding into account. The small experiment I just described has, on purpose, gaps of padding (zero bytes) between column values inside each page. I'm not sure, by the way, about the index; it should also show some estimated bloat. Maybe not, maybe I'm wrong here, maybe it's only about the heap. Maybe multi-column indexes, but I haven't checked. Right. Anyway, estimation scripts are great because they are light,
but I always correct everyone who says "bloat is this much."
I say "estimated bloat is this much," because it's not exact.
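For anyone who wants to reproduce the experiment described above, here is a minimal sketch; the table and index names and the row count are made up for illustration:

```sql
-- smallint (2 bytes) followed by timestamptz (8 bytes, 8-byte aligned)
-- forces 6 bytes of padding per row; estimation scripts that ignore
-- alignment tend to report "bloat" on this freshly filled table.
CREATE TABLE padding_demo (
  id1 smallint,
  ts  timestamptz,
  id2 smallint
);

INSERT INTO padding_demo
SELECT 1, now(), 2
FROM generate_series(1, 3000000);

CREATE INDEX padding_demo_ts_idx ON padding_demo (ts);

-- Now run your favourite bloat estimation query against padding_demo
-- and padding_demo_ts_idx and compare with reality (no bloat yet).
```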
The real number can be obtained by using the pgstattuple extension.
By the way, I had no good luck using it.
It had problems as well in the past, so I don't use it myself.
In my approach, since we work a lot with clones of production environments,
I always say, let's just run vacuum full on the clone, because why not?
And compare the numbers before and after; this is a reliable number for bloat.
It's a real, exact number, because VACUUM FULL showed it to us.
So the clones are cool here as well.
But you need to wait a little bit, of course.
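A sketch of both measurement approaches, reusing the hypothetical table from above; pgstattuple scans whole relations, so this belongs on a clone, not a busy primary:

```sql
-- Exact statistics via the pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('padding_demo');          -- table-level dead/free space
SELECT * FROM pgstatindex('padding_demo_ts_idx');   -- B-tree leaf density, fragmentation

-- The "brute force" measurement on a disposable clone:
SELECT pg_size_pretty(pg_total_relation_size('padding_demo')) AS size_before;
VACUUM FULL padding_demo;   -- rewrites the table and rebuilds its indexes
SELECT pg_size_pretty(pg_total_relation_size('padding_demo')) AS size_after;
-- The difference between the two sizes is the real bloat.
```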
I guess this leads us to quite an interesting part of the topic,
which is when should you worry about this?
And I might even argue that 30% bloat is probably not that bad.
Not that bad.
But if it shows 60%, and 30 of those 60 percentage points are an error, your decision is probably off.
The fact that it's an estimate affects our decisions anyway.
For sure.
But just to give people peace of mind, you know, when we're talking about badly bloated indexes, they could easily be triple the size of a freshly rebuilt index.
Oh, this is great.
By the way, this is an exercise I do usually.
People see 90% bloat,
but they say, is it bad? I say 90% bloat means that your index size is 10 times bigger than it
would be without bloat. 99% means 100 times bigger. 100 times bigger, it's already quite
noticeable. And by the way, I wanted to mention that those so-called lightweight estimation scripts sometimes are not light at all. We have many cases where they fail to finish within a statement timeout of, like, 15 or 30 seconds, because there are too many indexes and the analysis takes time as well.
So this is not something you should put into monitoring to run every minute, probably.
You don't need it every minute.
You need it once per day, maybe, because it doesn't change very fast.
But back to the question: we have 99% bloat, meaning our index is 100 times bigger.
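The arithmetic behind this, assuming the reported percentage is the fraction of the current size that is wasted:

```latex
\text{current size} = \frac{\text{ideal size}}{1 - \text{bloat}}
\qquad
\text{bloat} = 0.90 \Rightarrow 10\times \text{ideal},
\qquad
\text{bloat} = 0.99 \Rightarrow 100\times \text{ideal}
```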
The question is, is it bad?
Or why is it?
We have disk space, for example.
Yeah.
Yes, it's bad, but why?
Yeah.
How is it bad?
Yeah, this is a fun, very specific thing that we came across working with query plans. And it's funny, because this feels like a macro analysis problem, which we discussed last week, you know, system level, what's going wrong. But you can sometimes spot it from a single query plan. If you notice your queries are slower, or degrading over time, the same query is maybe doing an index scan that is getting slower over time, and you look at buffers (again, a previous episode), you can sometimes see that those buffer numbers are way higher than they need to be for the amount of data involved, or gradually increasing each time you run it.
So it's not guaranteed that that's a sign of bloat,
but there's a really good chance that it is.
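A sketch of what that looks like in a plan; the table, index, and numbers here are made up for illustration:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- Healthy index, 100 rows returned:
--   Index Scan using orders_customer_id_idx on orders
--     Buffers: shared hit=7
-- Same query on a badly bloated index: still 100 rows, but something like
--     Buffers: shared hit=480 read=120
-- and the numbers keep creeping up run after run.
```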
This is great, by the way.
You apply this classification of macro and micro,
and I even didn't think about it, but it aligns exactly with my thoughts. So we have macro effects and micro effects. Starting from micro effects: sometimes particular queries with particular parameters might behave much worse in the case of a bloated index, because, for example, instead of dealing with a few buffers, we need to deal with entries that are sparsely stored, and we need to involve many more buffers.
So we can see degradation of sometimes several orders of magnitude, in extreme cases. But it's tricky to find. For example, you checked your query for a few parameter sets and you see it's not bad, bloated versus unbloated, right? But you didn't look at other parameters, and for other parameters it may be worse. So it's tricky to automatically check how bad it is, because B-tree height actually grows quite slowly; it's logarithmic with a very high base, so it grows very slowly. If we go from, I don't know, 1,000 buffers to 100,000 buffers for the overall index size, or a million buffers already, we don't see a huge increase in lookup time because of the growth in height. It's just a couple more hops to reach the leaf. Who cares? A couple more IOs, it doesn't matter.
So B-tree is excellent here. It grows very slowly. So searching, let's find one row among a million rows or one row among a billion rows: well, the difference is not huge, right? It won't be noticeable. It won't be a 1,000x difference. It will be a small difference, a few more IO hops.
But if you need to deal with many, many entries, and bloat means that they are stored sparsely in the case of a bloated index, of course, the difference will be amazing.
I guess that's covering micro a little bit,
but on the macro side, we've got things like cache.
Macro, in my opinion, is much more interesting. I feel it like this: if we have 99% bloat, it means we have so many more pages to store the same data in our index, and it means not only more disk space occupied; I worry about that less, although disk space is interesting, and I will explain my thoughts in a second. The most noticeable negative performance effect from high bloat of indexes, in my opinion, is that we need to keep more pages in cache, both in the buffer pool and in the page cache. So our cache effectiveness reduces. I sometimes have cases where a database, or a few databases, grows so quickly, and the company using these databases may already be a multi-billion-dollar unicorn, but nobody was ever fighting bloat. So, for example, up to half of the database is bloat, both table and index bloat. And it
means that our shared buffers work much worse.
This is a great point. Actually, again, a caveat is that this applies, of course, when your database, including indexes, including bloated indexes, exceeds the size of shared buffers; before then it's probably not going to cause you any issues. But most of us are probably running databases where we exceed the cache size, the buffer pool size.
But if you eliminate bloat, it fits again, right?
Yeah, but imagine the difference you would see then. It would be stark, the difference.
Yeah, this macro effect is quite noticeable.
Also, we can talk about budgets here, like spending on hardware.
Or in the case of Aurora, where they charge for I/O:
if we need to do much less I/O, we can save here.
Also, this macro effect is very interesting.
I think it's maybe the most important one, in my opinion, in terms of performance. But the third one, disk space occupied, is usually the first thing that comes to mind when we think about bloat. Bloat means that we occupy much more space, so we pay for disk. But not only do we pay for disk: if we also recall our previous episodes, we need to write more to WAL, right? Full-page writes, for example. Index writes also go to WAL, and more WAL is generated. Data files are also bigger, WAL is bigger, it gives more work to the backup system, backups are longer, but also replication, physical at least, is slower; more bytes need to be transferred to standby nodes. Negative effects everywhere, right?
It's a really good point, yeah.
The checkpointer also needs to take care of more dirty pages.
Yeah, so the ideal is not to grow your indexes 10x and then re-index them to shrink them back down. In an ideal world we'll stay on top of it, so it stays in a much more manageable range, firstly I guess through autovacuum. But also, as we've discussed, autovacuum can't shrink an index down once it's started to bloat, so we do need to do these occasional re-indexes, ideally REINDEX CONCURRENTLY I'm guessing, or, as you were going to say, pg_repack, right?
So we have a tool
called postgres-checkup, which explains a lot of details about bloat, provides some recommendations, and so on. I mean, we, Postgres.AI, we have this tool and it tries to explain what to do. But in general, the plan we recommend, especially for cases where, for example, the company was super successful and the database has grown but there are no proper processes in place, is this. We usually recommend, of course, to review autovacuum settings, 100%. But this won't always help; we just discussed it, index health may still degrade. Also if you have long transactions, also if you have large tables (let's touch on that again in a minute), autovacuum won't help you a lot, but it's still needed to be more aggressive, to eliminate dead tuples faster.
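A hedged example of what "more aggressive" can mean in practice; the values below are illustrative starting points, not universal recommendations:

```sql
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;        -- default 0.2: trigger vacuum sooner
ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.05; -- PG 13+: also react to inserts
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';         -- throttle less (default 2ms)
ALTER SYSTEM SET autovacuum_max_workers = 6;                   -- default 3; requires a restart
SELECT pg_reload_conf();  -- applies the reloadable settings
```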
Then run index maintenance once
and then prepare to run it
maybe in fully automated fashion
during weekends
because index maintenance
means index recreation.
It's definitely stress for disk and for WAL as well, and for replicas also. It's definitely some stress.
So prepare to run it automatically, for example every weekend. For example, at GitLab we did it; GitLab is, disclaimer, our client, and they have a lot of interesting information, automation, and articles about how they automated it, and they run fully automated index recreation every weekend.
I should read those.
I haven't.
Right.
The question is how to recreate indexes, and the answer depends on your Postgres version. From Postgres 12, it's possible to run REINDEX CONCURRENTLY. Good. Earlier, well, the idea is: you can create a new index, drop the old one, and rename.
But if this index is participating in some constraints like primary key, it will be quite a task.
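A sketch of both options; the table, index, and constraint names here are hypothetical:

```sql
-- Postgres 12+: rebuild in place without blocking reads/writes
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

-- Older versions: the create/drop/rename dance
CREATE INDEX CONCURRENTLY orders_customer_id_idx_new ON orders (customer_id);
DROP INDEX CONCURRENTLY orders_customer_id_idx;
ALTER INDEX orders_customer_id_idx_new RENAME TO orders_customer_id_idx;

-- If the index backs a primary key, the swap goes through the constraint instead, e.g.:
--   ALTER TABLE orders DROP CONSTRAINT orders_pkey,
--     ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX orders_pkey_new;
```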
But there is also pg_repack. pg_repack can repack indexes, basically recreate them, without touching tables, because removing bloat from a table is a bigger task than just recreating indexes. And pg_repack can work with any Postgres major version, I mean not the really old ones, but, for example, 9.6.
It has some interesting caveats.
For example, if you have deferred constraints,
do I pronounce it right?
Deferred constraints, right?
So if you have deferred constraints,
you might have issues running pg_repack,
but for tables actually, not for indexes.
For index, it's fine.
For tables, it can have issues. Miro had problems and they wrote an excellent article (we can provide a link); if you need to fight bloat in a table and you have deferred constraints, it's a very good read. But for indexes there won't be any
problem, right? But the modern approach is just REINDEX CONCURRENTLY. Unfortunately, this feature has had so many bugs fixed. All of them are fixed, right? But history shows that many people, including me, think more might be found in this area. So I would recommend, if you run REINDEX CONCURRENTLY, it's worth also having some process of index verification.
For example, using amcheck, you can check for corruption periodically, weekly, for example.
Also, after index recreation, maybe it's a good idea to double check if there is corruption or not.
Because of this recent bug in Postgres 14, which was discovered in May; we briefly discussed it.
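A sketch of periodic verification with amcheck; the index name is hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Lightweight check: takes only an AccessShareLock, so writes can continue
SELECT bt_index_check('orders_customer_id_idx');

-- Stricter check: also verifies every heap tuple is present in the index;
-- useful right after CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY
SELECT bt_index_check('orders_customer_id_idx', heapallindexed => true);
```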
So the interesting thing is that if you have huge tables, terabyte size, multi-terabyte size, and they are not partitioned, and we create or recreate an index, it can take hours. During all this time autovacuum cannot delete dead tuples in any table, any index; it's a database-wide problem. In Postgres 14 there was an attempt to improve this: the xmin horizon was no longer held back during index creation or recreation. But unfortunately a bug was discovered in May, and in June, Postgres 14.4 had this optimization reverted. So the rule of thumb is: don't allow your tables to grow beyond 100 gigabytes, otherwise index maintenance becomes much more painful.
Yeah.
And actually, just while we're on that topic,
so I think if you're on a minor version of 14 lower than 14.4,
and you use REINDEX CONCURRENTLY, upgrade.
With the exception, I think, of RDS and Aurora,
who backpatched it to 14.3.
I didn't know about it.
And not only REINDEX CONCURRENTLY; CREATE INDEX CONCURRENTLY was also a problem. And everyone uses it, because how else can you create an index on a running system? So if you're running 14.0 through 14.3, you have an urgent task. I think everyone knows it, but it's just worth repeating.
Anyway, I met somebody the other day who didn't, unfortunately, so I think it's worth repeating. But yeah, awesome. You have
mentioned a couple of things that we probably should touch on. So index corruption is another kind of maintenance that you might need to do, and I know of one thing that's really famous for causing corrupt indexes, which is operating system upgrades.
Well, any operating system upgrade is dangerous, because a glibc version upgrade may cause index corruption silently,
and it's a problem.
So it's a big problem, unfortunately.
And there's no easy solution,
and it's quite easy to get into trouble
if you don't think about it yourself, unfortunately.
Just wanted to say it, because just in case anybody else wasn't aware of that one,
I certainly wasn't a year or two ago.
So, yeah, you need to recreate indexes and you need to do it inside a maintenance window.
Yeah.
Also, unfortunately, GIN indexes can be bloated as well, and there is no good way to estimate it. And they can be corrupted as well, and there is no good official way to check that either. There are patches for amcheck that still are not applied, but they are already quite advanced. And I used those patches to verify GIN indexes in a couple of places.
And it worked.
I mean, I didn't find anything.
So there's also always a question.
We had false positives there.
It was fixed, but yeah.
I think their promise is no false negatives, right?
In amcheck?
Right.
Right.
So if we talk about the bigger picture, imagine you have this kind of case: a successful company, a very big database, but they have a lot of bloat, both in tables and indexes.
Of course, removing table bloat is also interesting,
but indexes, we discussed various problems,
including this micro level,
when particular queries are slow.
By the way, sometimes if you have a queue-like pattern of working with a table, you insert, probably update a few times, and delete because the task was processed. This is a good way to get high bloat. And I saw many times that high bloat is fine for a while, but at some point queries degrade very fast because of this micro-level problem.
I mean, we have memory, we have big shared buffers.
Table is quite small, like maybe a few gigabytes only.
But we see some queries degrade because of these micro-level issues we discussed.
So the question is, where to start?
If you have many, many, many hundreds or thousands of indexes,
where would you start for this initial run?
Would you start from small indexes first?
But only those which have higher bloat, like you put a threshold at 90%, and take care of smaller indexes first and go down? Or would you start with the indexes which have more bloated bytes first?
Where would you start?
It's a good question.
I saw your tweet as well, asking people about unused and redundant indexes, and I know that maybe I'm cheating by having seen that, but that felt like a really nice angle, especially redundant. I'm not so sure about unused, because that depends on statistics a little bit, I guess. But by a redundant index, I guess the easiest example is that you've got the exact same index, so maybe let's take a simple case of a single-column B-tree index, but we've got two of them.
Well, redundant can be, for example: you have a single-column index and a two-column index. Of course, the single-column index is redundant to the two-column index if the two-column index has the same column in the first place, not in the second place. This is the automatic, classic redundant case. But the problem would be: what if we try to eliminate the first index because it's redundant, but the second index is also unused, and according to a different report we also decide to drop it? We drop both, and that's not a good idea anymore.
Or it's so bloated
that Postgres actually avoids using it
and goes to the...
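To make the redundancy example concrete (hypothetical names):

```sql
CREATE INDEX orders_customer_idx         ON orders (customer_id);             -- redundant
CREATE INDEX orders_customer_created_idx ON orders (customer_id, created_at); -- covers it
-- Note: an index on (created_at, customer_id) would NOT make the first one
-- redundant, because the leading column differs.
```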
This is an interesting question.
So we discussed these problems,
micro and macro.
Depending on which you consider
is the biggest problem for you,
for your case,
I see two options.
If you think about particular queries
that have very degraded
performance because of bloat, you probably should say, let's re-index all indexes with bloat level
more than 90% first. Even if they are smaller ones and don't contribute a lot to this macro
level problem, like spamming our buffer pool. But if you think macro level problem is bigger,
you probably should start with the biggest,
like the indexes which have more bloated bytes estimated, right?
Order by that, and go from the top down,
even though some of them are unused.
I might, can I make a potentially wrong argument
for always starting with the smaller ones?
I'd be interested in your thoughts. If you've got a macro-level problem, if your database is on fire and you're trying to reduce load, the smaller indexes might still be heavily used; just because they're smaller doesn't mean they aren't being used more. So I wonder if you could also look at access patterns. And if you started with the smaller ones, the other angle would be: if I re-index a smaller index, it finishes faster, and my system reduces its load slightly sooner than if I re-index a large one that takes hours, during which I've got hours more at the same high level of load, a disaster I guess. But if I start re-indexing smaller ones and they finish faster, maybe I can reduce the load a little bit quicker.
Well, in my opinion, if the database is on fire as a whole, I will start from top to bottom
fighting with indexes. I tried to think about whether we should look at usage stats for indexes, and I didn't see any big reason for that. Anyway, we didn't discuss the threshold, but usually the practical threshold for bloat is 30 or 40%, somewhere there. If it's below 30%, or 20%, for example, we usually don't care.
Indexes, by the way, are bloated by default 10% on purpose
because they have fill factor 90.
Most bloat estimation queries factor in fill factor?
Yes, they take this into account.
But anyway, 90% fill factor means we bloat on purpose
because we want some room for updates inside the pages.
If it actually becomes 80, okay, it's not that bad; I mean, 20% bloat. But if bloat is already half of it, it's maybe already time to take care of it.
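For reference, a sketch of that default and how to inspect the actual leaf density; names are hypothetical:

```sql
-- B-tree indexes default to fillfactor 90, i.e. ~10% free space at build time:
CREATE INDEX orders_customer_idx ON orders (customer_id) WITH (fillfactor = 90);

-- Compare with reality using pgstatindex (from the pgstattuple extension):
SELECT avg_leaf_density FROM pgstatindex('orders_customer_idx');
```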
And so I would go from top to bottom
if we take care about whole database.
But in some cases, database is fine,
but some particular queries, for example, this queue-like pattern we use, and we see these
particular queries dealing with this table, they have very bad performance. In this case,
I would start from the most bloated indexes regardless of their size. Maybe I would go
from top to bottom as well, but I would skip indexes
which have a bloat estimate of 60 or 70% in the first round, to help as soon as possible those queries which we know are affected...
Maybe I would take particular tables, actually,
in this case, because, right, why not?
But I would actually say,
I think I understand now
why that actually makes sense as a strategy,
because bloat is not independent of usage.
Chances are, if it's a heavily bloated table, it's one that's being used heavily.
Good, good.
But imagine some indexes.
By the way, if an index is unused, we should apply the extreme solution for bloat: we just drop the index, if you know it's unused everywhere, on standbys, everywhere, and we've observed it for quite a long time. I usually recommend the statistics age to be more than one month, because we had cases where indexes were unused for several weeks, we dropped them, and then at the beginning of the next month analysts were waiting for some report and didn't see it, because the index was dropped. So some indexes are used only once per month and they are very important. As for usage numbers, if they are not zero, I don't know how to use them; maybe I'm wrong. I've tried to think about how to correlate bloat and usage, more writes, more reads. But imagine the case where you have an unused index, you don't use it for index scans; the question is whether it contributes to this macro problem of spamming our caches.
Probably not. It's probably been a long time since it was...
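A hedged sketch of finding candidates; run it on the primary and on every standby, check pg_stat_database.stats_reset to make sure statistics have been accumulating long enough, and treat the output as candidates, not a drop list:

```sql
SELECT s.schemaname,
       s.relname,
       s.indexrelname,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique           -- unique indexes enforce constraints even if never scanned
ORDER BY pg_relation_size(s.indexrelid) DESC;
```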
Any update, unless it's a HOT update, will need to update this index. To update the index, we need to load its pages into memory. And there's write overhead as well. We don't use the index, but it still occupies some space in our caches. So, yeah.
Interesting.
Maybe it also has some effects at micro level.
I don't know.
Maybe not.
So there are many, many, many interesting things here.
But I hope this discussion will help someone to understand it better.
And anyway, just fight bloat in indexes, and prepare not only to do it once manually but to automate it, using pg_repack or REINDEX CONCURRENTLY, carefully, with some understanding of what REINDEX CONCURRENTLY might lead to. Well, right now a lot of bugs are fixed; people use REINDEX CONCURRENTLY, many people use it, many projects, large projects, so it works. I don't want to be blamed for "Nikolay told us not to use REINDEX CONCURRENTLY." Use it, but just keep in mind that many bugs were fixed and maybe there are some more bugs in the future. I would just automate it, but also automate corruption analysis using amcheck, at least weekly.
Quick question: is amcheck available on managed services?
Yeah, it's in standard contrib. It's a contrib module. It's available everywhere.
Not all contrib modules are available everywhere,
so I'm glad I asked.
But yes, awesome.
So you mentioned GIN indexes briefly,
and I don't know enough about maintenance of them, but I read a really good blog post by...
Inside GINs, there are B-trees: for the posting lists and for the keys, as I understand it, like two types. Maybe I'm wrong; it's already been a few years since I touched the GIN internals, so maybe I'm wrong here. But definitely there are B-trees. I know from the developers that at first some of those B-trees didn't exist and the performance was not good at larger scale. But there are B-trees inside GIN, and that's why they can also be corrupted when you switch to a new glibc version and the rules of character ordering, the collation, change. So GIN can be, and GiST of course can be
also corrupted, because GiST is a tree as well. Yeah, but amcheck for those doesn't exist, and how to estimate bloat we also don't know. My rule: use clones, use VACUUM FULL from time to time, and see the actual bloat, that reliable number.
Like a brute-force approach. Yeah, I like it. It's the first good use case for VACUUM FULL on an active system I've seen.
Of course, it will take many hours if you have many terabytes in size, but maybe you should have partitioning; as a reminder, right, don't allow your tables to grow over 100 gigs.
And then if you have partitioning,
you can run amcheck in parallel.
By the way, it's not a trivial task.
We have some scripts, automation scripts,
developed for GitLab, I guess.
And also, we can run amcheck in parallel, and we can run VACUUM FULL in parallel, on a temporary clone which has a lot of power. And this automation is good to have.
Nice. Is that in recent versions, or has that been around for a while, VACUUM FULL in parallel?
No, no, no. VACUUM FULL in parallel doesn't exist, and parallel amcheck doesn't exist. You need to script it.
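One way to script it, as a sketch: generate one check per B-tree index and run the generated statements from several parallel sessions (the parallelism itself lives outside SQL, for example in a shell loop):

```sql
SELECT format('SELECT bt_index_check(%L, true);', c.oid::regclass)
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i'
  AND am.amname = 'btree';
```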
Oh, okay, I understand now, because of the partitions.
By the way, it's interesting: people run amcheck in a single thread, and I also did it, until someone from GitLab, thanks for this question, asked why we run it like that. Like, we have so many cores, why?
And it was an excellent question.
Of course, it should be in parallel
because it will produce results much faster.
And if you're in the cloud
and you use temporary clone, for example,
of course you want to make your job faster,
even if it occupies all CPUs, 100% of them,
because you pay for minutes or seconds of usage in AWS.
Not for 100% or 50%, it doesn't matter, but for time.
And you want to make your job faster.
You run full speed.
Yeah.
Same in maintenance windows, I guess.
You're against the clock.
Maintenance windows, yes.
But it's not the same for regular index maintenance on production, because there you probably want a single thread, or a few threads, of REINDEX CONCURRENTLY running, maybe just a single one, because it's already some stress and you don't want to run it at full speed.
By the way, again, I encourage our listeners to read the articles on the GitLab blog; they have good materials, I recommend them. I remember Peter Geoghegan also came and learned something interesting from their experience while working on B-tree deduplication in Postgres 13 and 14. So it's recommended material.
Awesome.
I suspect that's all we've got time for today.
Thank you, everybody.
Thanks to everyone who keeps giving us feedback,
keeps sending us suggestions and shares it online.
We really appreciate it.
Looking forward to seeing you next week.
Cheers, Nikolay.
Thank you, Michael.
See you next week.
Bye.
Take care.
Bye.