Postgres FM - Index maintenance
Episode Date: September 23, 2022

Important correction from this episode: amcheck promises no false positives, not no false negatives, sorry!

Here are links to a few things we mentioned: pgstattuple, pg_repack, postgres-checkup, Workload Analysis for GitLab.com, Bloat Analysis (also from GitLab), Bloat, pg_repack, and deferred constraints (blog post by Miro), amcheck, Peter Geoghegan interview (on Postgres TV)

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides or by commenting on our topic ideas Google doc. If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by Nikolay Samokhvalov, founder of Postgres.ai, and Michael Christofides, founder of pgMustard. With special thanks to Jessie Draws for the amazing artwork.
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we going to be talking about today?
Yeah, hello Michael. As we decided, let's talk about index maintenance.
First of all, bloat removal, but maybe not only, right?
Yeah, we've alluded to this in a previous episode around vacuum, I believe.
Oh, about bloat specifically, yeah.
So yeah, excited to dive into this with you.
So should we start with how this occurs, perhaps?
Or quick recap on, is this always a problem?
If I have a Postgres database, is it very likely I'm suffering from this at the moment?
Or is there a chance that it's fine?
Yeah, by the way, you're right.
Our episode was called Vacuum, not Bloat, but the two are very close to each other, right? Because usually we talk about bloat and a lack of vacuuming, or some inefficient vacuuming, and so on. This is a great question you asked: if we have tuned our autovacuum to be quite aggressive and everything looks fine, the question is still, should we have
index maintenance from time to time? Is it inevitable? And in my opinion, from my practice,
the answer is yes, due to many reasons. Autovacuum won't solve everything, and some bloat will still accumulate even if you have very aggressive vacuuming. And looking at other database systems, for example Microsoft SQL Server, they also have index maintenance as a routine task for DBAs. In my opinion, in very loaded, growing, large systems, you should still recreate indexes from time to time. Aggressive autovacuuming will only reduce the frequency of this need; it will come up less often, but you still need to recreate
them.
Yeah, I think that's a really good point on heavily loaded systems. Probably the only caveat I would add is: if you've got a relatively light load on your Postgres database, this might be something you don't come across, even if you haven't tuned autovacuum, because it will be tidying things up as it goes along, freeing up index pages. Especially on later versions of Postgres there's some additional logic to make bloat even less likely. But I think it's even worse for indexes than for tables, right? Vacuum is able to free up space in tables much more easily. Refer to a previous episode for more details, but in tables, freed space is much more easily reused. Whereas in a B-tree index, if you get page splits, vacuum can free up space in those pages again, but it can't unsplit them.
But it doesn't rebalance B-tree, right?
Exactly.
It doesn't rebalance that.
And I agree with you, some systems might not need automatic index recreation, but I'm sure
everyone needs monitoring and analysis of bloat on a regular basis.
So this is a must for everyone, in my opinion.
And the question is how to analyze bloat, because it's not a trivial task.
All the scripts we have for fast bloat analysis,
lightweight analysis, they are all wrong. I mean, they can have some errors. They are not precise.
For example, create a table with three columns: smallint, timestamptz (don't use timestamp without time zone), and smallint again. Fill it with a few million rows.
Create an index.
And use your script to estimate bloat in table and in index.
Actually, it's two different scripts, right?
But still.
At least for table, I'm sure you will see terrible bloat.
But we know there is no bloat there yet.
We just inserted rows.
We didn't delete, didn't update.
So no bloat.
You will see something like 30% of bloat estimated. So we need to keep in mind that our scripts have errors, sometimes quite significant ones, because they don't take alignment padding into account. The small experiment I just described has, on purpose, gaps of padding (zero bytes) between column values inside each page. I'm not sure, by the way, about the index; it should also show some estimated bloat. Maybe not, maybe I'm wrong here, maybe it's only about the heap. Maybe multi-column indexes, but I haven't checked. Right. Anyway, estimation scripts are great because they are light,
but I always correct everyone who says "bloat is this much."
I say "estimated bloat is this much," because it's not exact.
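For anyone who wants to reproduce the experiment described above, here is a minimal sketch; the table and index names and the row count are made up for illustration:

```sql
-- smallint (2 bytes) followed by timestamptz (8 bytes, 8-byte aligned)
-- forces 6 bytes of padding per row; estimation scripts that ignore
-- alignment tend to report "bloat" on this freshly filled table.
CREATE TABLE padding_demo (
  id1 smallint,
  ts  timestamptz,
  id2 smallint
);

INSERT INTO padding_demo
SELECT 1, now(), 2
FROM generate_series(1, 3000000);

CREATE INDEX padding_demo_ts_idx ON padding_demo (ts);

-- Now run your favourite bloat estimation query against padding_demo
-- and padding_demo_ts_idx and compare with reality (no bloat yet).
```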
The real number can be obtained by using the pgstattuple extension.
By the way, I had no good luck using it.
It had problems as well in the past, so I don't use it myself.
In my approach, since we work a lot with clones of production environments,
I always say, let's just run vacuum full on the clone, because why not?
And compare the numbers before and after; this is a reliable number for bloat.
It's a real, exact number, because VACUUM FULL showed it to us.
So the clones are cool here as well.
But you need to wait a little bit, of course.
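A sketch of both measurement approaches, reusing the hypothetical table from above; pgstattuple scans whole relations, so this belongs on a clone, not a busy primary:

```sql
-- Exact statistics via the pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('padding_demo');          -- table-level dead/free space
SELECT * FROM pgstatindex('padding_demo_ts_idx');   -- B-tree leaf density, fragmentation

-- The "brute force" measurement on a disposable clone:
SELECT pg_size_pretty(pg_total_relation_size('padding_demo')) AS size_before;
VACUUM FULL padding_demo;   -- rewrites the table and rebuilds its indexes
SELECT pg_size_pretty(pg_total_relation_size('padding_demo')) AS size_after;
-- The difference between the two sizes is the real bloat.
```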
I guess this leads us to quite an interesting part of the topic,
which is when should you worry about this?
And I might even argue that 30% bloat is probably not that bad.
Not that bad.
But if it shows 60%, and 30 of those 60 percentage points are an error, your decision is probably off.
The fact that it's an estimate affects our decisions anyway.
For sure.
But just to give people peace of mind, you know, when we're talking about badly bloated indexes, they could easily be triple the size of a freshly rebuilt index.
Oh, this is great.
By the way, this is an exercise I do usually.
People see 90% bloat,
but they say, is it bad? I say 90% bloat means that your index size is 10 times bigger than it
would be without bloat. 99% means 100 times bigger. 100 times bigger, it's already quite
noticeable. And by the way, I wanted to mention that those so-called lightweight estimation scripts sometimes are not light at all. We have many cases where they fail to finish within a statement timeout of, like, 15 or 30 seconds, because there are too many indexes and the analysis takes time as well.
So this is not something you should put into monitoring to run every minute, probably.
You don't need it every minute.
You need it once per day, maybe, because it doesn't change very fast.
But back to the question: we have 99% bloat, meaning our index is 100 times bigger.
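The arithmetic behind this, assuming the reported percentage is the fraction of the current size that is wasted:

```latex
\text{current size} = \frac{\text{ideal size}}{1 - \text{bloat}}
\qquad
\text{bloat} = 0.90 \Rightarrow 10\times \text{ideal},
\qquad
\text{bloat} = 0.99 \Rightarrow 100\times \text{ideal}
```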
The question is, is it bad?
Or why is it?
We have disk space, for example.
Yeah.
Yes, it's bad, but why?
Yeah.
How is it bad?
Yeah, this is a fun, very specific thing that we came across working with query plans. And it's funny, because this feels like a macro analysis problem, which we discussed last week, you know, system level, what's going wrong. But you can sometimes spot it from a single query plan. If you notice your queries are slower, or degrading over time, the same query is maybe doing an index scan that is getting slower over time, and you look at buffers (again, a previous episode), you can sometimes see that those buffer numbers are way higher than they need to be for the amount of data involved, or gradually increasing each time you run it.
So it's not guaranteed that that's a sign of bloat,
but there's a really good chance that it is.
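A sketch of what that looks like in a plan; the table, index, and numbers here are made up for illustration:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- Healthy index, 100 rows returned:
--   Index Scan using orders_customer_id_idx on orders
--     Buffers: shared hit=7
-- Same query on a badly bloated index: still 100 rows, but something like
--     Buffers: shared hit=480 read=120
-- and the numbers keep creeping up run after run.
```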
This is great, by the way.
You apply this classification of macro and micro,
and I even didn't think about it, but it aligns exactly with my thoughts. So we have macro effects and micro effects. Starting from micro effects: sometimes particular queries with particular parameters might behave much worse in the case of a bloated index, because, for example, instead of dealing with a few buffers, we need to deal with entries that are sparsely stored, and we need to involve many more buffers.
So we can see degradation of sometimes several orders of magnitude, in extreme cases. But it's tricky to find. For example, you checked your query for a few parameter sets and you see it's not bad, bloated versus unbloated, right? But you didn't look at other parameters, and for other parameters it may be worse. So it's tricky to automatically check how bad it is, because B-tree height actually grows quite slowly; it's logarithmic with a very high base, so it grows very slowly. If we go from, I don't know, 1,000 buffers to 100,000 buffers for the overall index size, or a million buffers already, we don't see a huge increase in lookup time because of the growth in height. It's just a couple more hops to reach the leaf. Who cares? A couple more IOs, it doesn't matter.
So B-tree is excellent here. It grows very slowly. So searching, let's find one row among a million rows or one row among a billion rows: well, the difference is not huge, right? It won't be noticeable. It won't be a 1,000x difference. It will be a small difference, a few more IO hops.
But if you need to deal with many, many entries, and bloat means that they are stored sparsely in the case of a bloated index, of course, the difference will be amazing.
I guess that's covering micro a little bit,
but on the macro side, we've got things like cache.
Macro, in my opinion, is much more interesting. I feel it like this: if we have 99% bloat, it means we have so many more pages to store the same data in our index, and it means not only more disk space occupied; I worry about that less, although disk space is interesting, and I will explain my thoughts in a second. The most noticeable negative performance effect from high bloat of indexes, in my opinion, is that we need to keep more pages in cache, both in the buffer pool and in the page cache. So our cache effectiveness reduces. I sometimes have cases where a database, or a few databases, grows so quickly, and the company using these databases may already be a multi-billion-dollar unicorn, but nobody was ever fighting bloat. So, for example, up to half of the database is bloat, both table and index bloat. And it
means that our shared buffers work much worse.
This is a great point. Actually, again, a caveat is that this applies, of course, when your database, including indexes, including bloated indexes, exceeds the size of shared buffers; before then it's probably not going to cause you any issues. But most of us are probably running databases where we exceed the cache size, the buffer pool size.
But if you eliminate bloat, it fits again, right?
Yeah, but imagine the difference you would see then. It would be stark, the difference.
Yeah, this macro effect is quite noticeable.
Also, we can talk about budgets here, like spending on hardware.
Or in the case of Aurora, where they charge for I/O:
if we need to do much less I/O, we can save here.
Also, this macro effect is very interesting.
I think it's maybe the most important one, in my opinion, in terms of performance. But the third one, disk space occupied, is usually the first thing that comes to mind when we think about bloat. Bloat means that we occupy much more space, so we pay for disk. But not only do we pay for disk: if we also recall our previous episodes, we need to write more to WAL, right? Full-page writes, for example. Index writes also go to WAL, and more WAL is generated. Data files are also bigger, WAL is bigger, it gives more work to the backup system, backups are longer, but also replication, physical at least, is slower; more bytes need to be transferred to standby nodes. Negative effects everywhere, right?
It's a really good point, yeah.
The checkpointer also needs to take care of more dirty pages.
Yeah, so the ideal is not to grow your indexes 10x and then re-index them to shrink them back down. In an ideal world we'll stay on top of it, so it stays in a much more manageable range, firstly I guess through autovacuum. But also, as we've discussed, autovacuum can't shrink an index down once it's started to bloat, so we do need to do these occasional re-indexes, ideally REINDEX CONCURRENTLY I'm guessing, or, as you were going to say, pg_repack, right?
So we have a tool
called postgres-checkup, which explains a lot of details about bloat, provides some recommendations, and so on. I mean, we, Postgres.AI, we have this tool and it tries to explain what to do. But in general, the plan we recommend, especially for cases where, for example, the company was super successful and the database has grown but there are no proper processes in place, is this. We usually recommend, of course, to review autovacuum settings, 100%. But this won't always help; we just discussed it, index health may still degrade. Also if you have long transactions, also if you have large tables (let's touch on that again in a minute), autovacuum won't help you a lot, but it's still needed to be more aggressive, to eliminate dead tuples faster.
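A hedged example of what "more aggressive" can mean in practice; the values below are illustrative starting points, not universal recommendations:

```sql
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;        -- default 0.2: trigger vacuum sooner
ALTER SYSTEM SET autovacuum_vacuum_insert_scale_factor = 0.05; -- PG 13+: also react to inserts
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';         -- throttle less (default 2ms)
ALTER SYSTEM SET autovacuum_max_workers = 6;                   -- default 3; requires a restart
SELECT pg_reload_conf();  -- applies the reloadable settings
```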
Then run index maintenance once
and then prepare to run it
maybe in fully automated fashion
during weekends
because index maintenance
means index recreation.
It's definitely stress for disk and for WAL as well, and for replicas also. It's definitely some stress.
So prepare to run it automatically, for example every weekend. For example, at GitLab we did it; GitLab is, disclaimer, our client, and they have a lot of interesting information, automation, and articles about how they automated it, and they run fully automated index recreation every weekend.
I should read those.
I haven't.
Right.
The question is how to recreate indexes, and the answer depends on your Postgres version. From Postgres 12, it's possible to run REINDEX CONCURRENTLY. Good. Earlier, well, the idea is: you can create a new index, drop the old one, and rename.
But if this index is participating in some constraints like primary key, it will be quite a task.
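A sketch of both options; the table, index, and constraint names here are hypothetical:

```sql
-- Postgres 12+: rebuild in place without blocking reads/writes
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

-- Older versions: the create/drop/rename dance
CREATE INDEX CONCURRENTLY orders_customer_id_idx_new ON orders (customer_id);
DROP INDEX CONCURRENTLY orders_customer_id_idx;
ALTER INDEX orders_customer_id_idx_new RENAME TO orders_customer_id_idx;

-- If the index backs a primary key, the swap goes through the constraint instead, e.g.:
--   ALTER TABLE orders DROP CONSTRAINT orders_pkey,
--     ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX orders_pkey_new;
```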
But there is also pg_repack. pg_repack can repack indexes, basically recreate them, without touching tables, because removing bloat from a table is a bigger task than just recreating indexes. And pg_repack can work with any Postgres major version, I mean not the really old ones, but, for example, 9.6.
It has some interesting caveats.
For example, if you have deferred constraints,
do I pronounce it right?
Deferred constraints, right?
So if you have deferred constraints,
you might have issues running pg_repack,
but for tables actually, not for indexes.
For index, it's fine.
For tables, it can have issues. Miro had problems and they wrote an excellent article (we can provide a link); if you need to fight bloat in a table and you have deferred constraints, it's a very good read. But for indexes there won't be any
problem, right? But the modern approach is just REINDEX CONCURRENTLY. Unfortunately, this feature has had so many bugs fixed. All of them are fixed, right? But history shows that many people, including me, think more might be found in this area. So I would recommend, if you run REINDEX CONCURRENTLY, it's worth also having some process of index verification.
For example, using amcheck, you can check for corruption periodically, weekly, for example.
Also, after index recreation, maybe it's a good idea to double check if there is corruption or not.
Because of this recent bug in Postgres 14, which was discovered in May; we briefly discussed it.
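A sketch of periodic verification with amcheck; the index name is hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Lightweight check: takes only an AccessShareLock, so writes can continue
SELECT bt_index_check('orders_customer_id_idx');

-- Stricter check: also verifies every heap tuple is present in the index;
-- useful right after CREATE INDEX CONCURRENTLY / REINDEX CONCURRENTLY
SELECT bt_index_check('orders_customer_id_idx', heapallindexed => true);
```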
So the interesting thing is that if you have huge tables, terabyte size, multi-terabyte size, and they are not partitioned, and we create or recreate an index, it can take hours. During all this time autovacuum cannot delete dead tuples in any table, any index; it's a database-wide problem. In Postgres 14 there was an attempt to improve this: the xmin horizon was no longer held back during index creation or recreation. But unfortunately a bug was discovered in May, and in June, Postgres 14.4 had this optimization reverted. So the rule of thumb is: don't allow your tables to grow beyond 100 gigabytes, otherwise index maintenance becomes much more painful.
Yeah.
And actually, just while we're on that topic,
so I think if you're on a minor version of 14 lower than 14.4,
and you use REINDEX CONCURRENTLY, upgrade.
With the exception, I think, of RDS and Aurora,
who backpatched it to 14.3.
I didn't know about it.
And not only REINDEX CONCURRENTLY; CREATE INDEX CONCURRENTLY was also a problem. And everyone uses it, because how else can you create an index on a running system? So if you're running 14.0 through 14.3, you have an urgent task. I think everyone knows it, but it's just worth repeating.
Anyway, I met somebody the other day who didn't, unfortunately, so I think it's worth repeating. But yeah, awesome. You have
mentioned a couple of things that we probably should touch on. So index corruption is another kind of maintenance that you might need to do, and I know of one thing that's really famous for causing corrupt indexes, which is operating system upgrades.
Well, any operating system upgrade is dangerous, because a glibc version upgrade may cause index corruption silently,
and it's a problem.
So it's a big problem, unfortunately.
And there's no easy solution,
and it's quite easy to get into trouble
if you don't think about it yourself, unfortunately.
Just wanted to say it, because just in case anybody else wasn't aware of that one,
I certainly wasn't a year or two ago.
So, yeah, you need to recreate indexes and you need to do it inside a maintenance window.
Yeah.
Also, unfortunately, GIN indexes can be bloated as well, and there is no good way to estimate it. And they can be corrupted as well, and there is no good official way to check that either. There are patches for amcheck that still are not applied, but they are already quite advanced. And I used those patches to verify GIN indexes in a couple of places.
And it worked.
I mean, I didn't find anything.
So there's also always a question.
We had false positives there.
It was fixed, but yeah.
I think their promise is no false negatives, right?
In amcheck?
Right.
Right.
So if we talk about the bigger picture, imagine you have this kind of case: a successful company, a very big database, but they have a lot of bloat, both in tables and indexes.
Of course, removing table bloat is also interesting,
but indexes, we discussed various problems,
including this micro level,
when particular queries are slow.
By the way, sometimes if you have a queue-like pattern of working with a table, you insert, probably update a few times, and delete because the task was processed. This is a good way to get high bloat. And I saw many times that high bloat is fine for a while, but at some point queries degrade very fast because of this micro-level problem.
I mean, we have memory, we have big shared buffers.
Table is quite small, like maybe a few gigabytes only.
But we see some queries degrade because of these micro-level issues we discussed.
So the question is, where to start?
If you have many, many, many hundreds or thousands of indexes,
where would you start for this initial run?
Would you start from small indexes first?
But only those which have higher bloat, like you put a threshold at 90%, and take care of smaller indexes first and go down? Or would you start with the indexes which have more bloated bytes first?
Where would you start?
It's a good question.
I saw your tweet as well, asking people about unused and redundant indexes, and I know that maybe I'm cheating by having seen that, but that felt like a really nice angle, especially redundant. I'm not so sure about unused, because that depends on statistics a little bit, I guess. But by a redundant index, I guess the easiest example is that you've got the exact same index, so maybe let's take a simple case of a single-column B-tree index, but we've got two of them.
Well, redundant can be, for example: you have a single-column index and a two-column index. Of course, the single-column index is redundant to the two-column index if the two-column index has the same column in the first place, not in the second place. This is the automatic, classic redundant case. But the problem would be: what if we try to eliminate the first index because it's redundant, but the second index is also unused, and according to a different report we also decide to drop it? We drop both, and that's not a good idea anymore.
Or it's so bloated
that Postgres actually avoids using it
and goes to the...
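To make the redundancy example concrete (hypothetical names):

```sql
CREATE INDEX orders_customer_idx         ON orders (customer_id);             -- redundant
CREATE INDEX orders_customer_created_idx ON orders (customer_id, created_at); -- covers it
-- Note: an index on (created_at, customer_id) would NOT make the first one
-- redundant, because the leading column differs.
```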
This is an interesting question.
So we discussed these problems,
micro and macro.
Depending on which you consider
is the biggest problem for you,
for your case,
I see two options.
If you think about particular queries
that have very degraded
performance because of bloat, you probably should say, let's re-index all indexes with bloat level
more than 90% first. Even if they are smaller ones and don't contribute a lot to this macro
level problem, like spamming our buffer pool. But if you think macro level problem is bigger,
you probably should start with the biggest,
like the indexes which have more bloated bytes estimated, right?
Order by that, and go from the top down,
even though some of them are unused.
I might, can I make a potentially wrong argument
for always starting with the smaller ones?
I'd be interested in your thoughts. If you've got a macro-level problem, if your database is on fire and you're trying to reduce load, the smaller indexes might still be heavily used; just because they're smaller doesn't mean they aren't being used more. So I wonder if you could also look at access patterns. And if you started with the smaller ones, the other angle would be: if I re-index a smaller index, it finishes faster, and my system reduces its load slightly sooner than if I re-index a large one that takes hours, during which I've got hours more at the same high level of load, a disaster I guess. But if I start re-indexing smaller ones and they finish faster, maybe I can reduce the load a little bit quicker.
Well, in my opinion, if the database is on fire as a whole, I will start from top to bottom
fighting with indexes. I tried to think about whether we should look at usage stats for indexes, and I didn't see any big reason for that. Anyway, we didn't discuss the threshold, but usually the practical threshold for bloat is 30 or 40%, somewhere there. If it's below 30%, or 20%, for example, we usually don't care.
Indexes, by the way, are bloated by default 10% on purpose
because they have fill factor 90.
Most bloat estimation queries factor in fill factor?
Yes, they take this into account.
But anyway, 90% fill factor means we bloat on purpose
because we want some room for updates inside the pages.
If it actually becomes 80, okay, it's not that bad; I mean, 20% bloat. But if bloat is already half of it, it's maybe already time to take care of it.
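For reference, a sketch of that default and how to inspect the actual leaf density; names are hypothetical:

```sql
-- B-tree indexes default to fillfactor 90, i.e. ~10% free space at build time:
CREATE INDEX orders_customer_idx ON orders (customer_id) WITH (fillfactor = 90);

-- Compare with reality using pgstatindex (from the pgstattuple extension):
SELECT avg_leaf_density FROM pgstatindex('orders_customer_idx');
```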
And so I would go from top to bottom
if we take care about whole database.
But in some cases, database is fine,
but some particular queries, for example, this queue-like pattern we use, and we see these
particular queries dealing with this table, they have very bad performance. In this case,
I would start from the most bloated indexes regardless of their size. Maybe I would go
from top to bottom as well, but I would skip indexes
which have a bloat estimate of 60 or 70% in the first round, to help as soon as possible those queries which we know are affected...
Maybe I would take particular tables, actually,
in this case, because, right, why not?
But I would actually say,
I think I understand now
why that actually makes sense as a strategy,
because bloat is not independent of usage.
Chances are, if it's a heavily bloated table, it's one that's being used heavily.
Good, good.
But imagine some indexes.
By the way, if an index is unused, we should apply the extreme solution for bloat: we just drop the index, if you know it's unused everywhere, on standbys, everywhere, and we've observed it for quite a long time. I usually recommend the statistics age to be more than one month, because we had cases where indexes were unused for several weeks, we dropped them, and then at the beginning of the next month analysts were waiting for some report and didn't see it, because the index was dropped. So some indexes are used only once per month and they are very important. As for usage numbers, if they are not zero, I don't know how to use them; maybe I'm wrong. I've tried to think about how to correlate bloat and usage, more writes, more reads. But imagine the case where you have an unused index, you don't use it for index scans; the question is whether it contributes to this macro problem of spamming our caches.
Probably not. It's probably been a long time since it was...
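A hedged sketch of finding candidates; run it on the primary and on every standby, check pg_stat_database.stats_reset to make sure statistics have been accumulating long enough, and treat the output as candidates, not a drop list:

```sql
SELECT s.schemaname,
       s.relname,
       s.indexrelname,
       s.idx_scan,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique           -- unique indexes enforce constraints even if never scanned
ORDER BY pg_relation_size(s.indexrelid) DESC;
```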
Any update, unless it's a HOT update, will need to update this index. To update the index, we need to load its pages into memory. And there's write overhead as well. We don't use the index, but it still occupies some space in our caches. So, yeah.
Interesting.
Maybe it also has some effects at micro level.
I don't know.
Maybe not.
So there are many, many, many interesting things here.
But I hope this discussion will help someone to understand it better.
And anyway, just fight bloat in indexes, and prepare not only to do it once manually but to automate it, using pg_repack or REINDEX CONCURRENTLY, carefully, with some understanding of what REINDEX CONCURRENTLY might lead to. Well, right now a lot of bugs are fixed; people use REINDEX CONCURRENTLY, many people use it, many projects, large projects, so it works. I don't want to be blamed for "Nikolay told us not to use REINDEX CONCURRENTLY." Use it, but just keep in mind that many bugs were fixed and maybe there are some more bugs in the future. I would just automate it, but also automate corruption analysis using amcheck, at least weekly.
Quick question: is amcheck available on managed services?
Yeah, it's in standard contrib. It's a contrib module. It's available everywhere.
Not all contrib modules are available everywhere,
so I'm glad I asked.
But yes, awesome.
So you mentioned GIN indexes briefly,
and I don't know enough about maintenance of them, but I read a really good blog post by...
Inside GINs, there are B-trees: for the posting lists and for the keys, as I understand it, like two types. Maybe I'm wrong; it's already been a few years since I touched the GIN internals, so maybe I'm wrong here. But definitely there are B-trees. I know from the developers that at first some of those B-trees didn't exist and the performance was not good at larger scale. But there are B-trees inside GIN, and that's why they can also be corrupted when you switch to a new glibc version and the rules of character ordering, the collation, change. So GIN can be, and GiST of course can be
also corrupted, because GiST is a tree as well. Yeah, but amcheck for those doesn't exist, and how to estimate bloat we also don't know. My rule: use clones, use VACUUM FULL from time to time, and see the actual bloat, that reliable number.
Like a brute-force approach. Yeah, I like it. It's the first good use case for VACUUM FULL on an active system I've seen.
Of course, it will take many hours if you have many terabytes in size, but maybe you should have partitioning; as a reminder, right, don't allow your tables to grow over 100 gigs.
And then if you have partitioning,
you can run amcheck in parallel.
By the way, it's not a trivial task.
We have some scripts, automation scripts,
developed for GitLab, I guess.
And also, we can run amcheck in parallel, and we can run VACUUM FULL in parallel, on a temporary clone which has a lot of power. And this automation is good to have.
Nice. Is that in recent versions, or has that been around for a while, VACUUM FULL in parallel?
No, no, no. VACUUM FULL in parallel doesn't exist, and parallel amcheck doesn't exist. You need to script it.
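One way to script it, as a sketch: generate one check per B-tree index and run the generated statements from several parallel sessions (the parallelism itself lives outside SQL, for example in a shell loop):

```sql
SELECT format('SELECT bt_index_check(%L, true);', c.oid::regclass)
FROM pg_class c
JOIN pg_am am ON am.oid = c.relam
WHERE c.relkind = 'i'
  AND am.amname = 'btree';
```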
Oh, okay, I understand now, because of the partitions.
By the way, it's interesting: people run amcheck in a single thread, and I also did it, until someone from GitLab, thanks for this question, asked why we run it like that. Like, we have so many cores, why?
And it was an excellent question.
Of course, it should be in parallel
because it will produce results much faster.
And if you're in the cloud
and you use temporary clone, for example,
of course you want to make your job faster,
even if it occupies all CPUs, 100% of them,
because you pay for minutes or seconds of usage in AWS.
Not for 100% or 50%, it doesn't matter, but for time.
And you want to make your job faster.
You run full speed.
Yeah.
Same in maintenance windows, I guess.
You're against the clock.
Maintenance windows, yes.
But it's not the same for regular index maintenance on production, because there you probably want a single thread, or a few threads, of REINDEX CONCURRENTLY running, maybe just a single one, because it's already some stress and you don't want to run it at full speed.
By the way, again, I encourage our listeners to read the articles on the GitLab blog; they have good materials, I recommend them. I remember Peter Geoghegan also came and learned something interesting from their experience while working on B-tree deduplication in Postgres 13 and 14. So it's recommended material.
Awesome.
I suspect that's all we've got time for today.
Thank you, everybody.
Thanks to everyone who keeps giving us feedback,
keeps sending us suggestions and shares it online.
We really appreciate it.
Looking forward to seeing you next week.
Cheers, Nikolay.
Thank you, Michael.
See you next week.
Bye.
Take care.
Bye.