Postgres FM - Vacuum
Episode Date: August 12, 2022

Here are links to a few things we mentioned:
- Do you vacuum every day? (talk by Hannu Krosing)
- Autovacuum tuning (EDB guide)
- When autovacuum does not vacuum (2ndQuadrant blog post by Tomas Vondra)
- Autovacuum tuning basics (old 2ndQuadrant blog post)
- Discussion with Anastasia Lubennikova (on RuPostgres, in Russian)
- B-tree indexes (talk by Anastasia Lubennikova, in English)
- Discussion with Peter Geoghegan (on Postgres TV)
- pg_repack
- pg_squeeze

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides. If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we discussing today?
Hi Michael, we are reacting to requests, and we are discussing vacuum.
Maybe we're only starting to discuss vacuum.
Yeah, absolutely. Deep, deep topic.
But yes, as you say, I think this was one of the first topics somebody tweeted us once we said we were open to talking about things people wanted to hear. Vacuum is always a topic that people want to talk about. It seems like it's something that people don't necessarily know about when they first encounter Postgres, but they quickly learn about it one way or the other. And as you get more and more advanced, I think there still seem to be things
that people can learn about vacuum.
Even people with many years of experience.
Right.
Right.
We have a couple of requests and I wanted to thank everyone
who provides feedback to us.
We see it almost every day already, especially on Twitter.
And it's so good to read it and to see what people think about our show.
And also, as usual, I ask everyone to leave a like in the place where they listen or watch. We have a video version on YouTube, which is uncut, by the way — a little bit longer. Postgres.tv is a short web address. But of course, it's also available on iTunes and so on. And I encourage everyone to like this show or this episode right now.
And also, please share it in the social networks you use, or in working groups where you discuss Postgres, or engineering, or anything.
Thank you so much, everyone.
So, speaking of the Postgres TV channel: we also have great guests coming almost every week. It's called Open Talks. The idea is that we invite people who presented an interesting talk at some conference where no recording was made.
And one such talk recently was by Hannu Krosing, ex-Skype. He created a lot of Postgres-related stuff at Skype, and he's now at Google Cloud.
And the talk was titled,
Do You Vacuum Every Day?
It was great.
It's very deep and, at the same time, very simple material — everyone can watch it and understand a lot of stuff. So again, I 100% encourage everyone to watch it. And it's directly related to our topic today.
I watched that.
I thought it was fantastic.
He managed to keep it simple, as you say, but I definitely still learned a few things during it. I think it was about an hour, maybe a little bit longer, so it is definitely a deep dive, but well worth the watch.
Yeah. We don't limit our speakers on timing, unlike at conferences, where we have very strict constraints. So we can spend a couple of hours, definitely, if we want to and have the material and the desire. But our show is roughly 30 minutes because, as usual, I want to say a big hello to those who are running, or riding a bicycle, or especially walking their dogs.
I know some people do it.
From feedback, I've learned it.
And I think dogs must love our show.
Yeah, it's their excuse to get out.
I think dogs and elephants would be friends too.
Right.
Keep it in the spirit.
Yeah.
Right.
So, vacuum, where to start?
What do you think?
I think probably right at the beginning, right?
So, why do we have vacuum?
Like, what is vacuum?
Why do we have it?
Some people kind of see it as a negative part of Postgres.
I see it as something we need because of some of the design decisions
right at the beginning of Postgres.
I personally really like it.
I think we've gained so much
from some of those design decisions.
But this one is to do with
multiversion concurrency control, right?
So MVCC.
The fact that we keep old versions of — I was about to say "rows" then, but I guess "tuples" — around.
Row versions — or do you say "tooples" or "tuples"?
Oh, which is correct?
Did I say the correct one?
I don't know.
I've heard both, so we need to choose.
So I'm going to say tuple.
I don't know why.
I will choose tuple.
Okay, tuples. Let's use tuples. So right, a tuple is a physical version of a row,
and each row can have multiple tuples at the same time. And one transaction sees only one of them.
It may happen that some tuple — a physical version of a row — is not visible to any transaction, so basically it's dead. And you can check when a tuple was created, because you can select xmin, xmax, ctid. We discussed ctid some time ago. It's very convenient sometimes to know that there are hidden system columns in each table. They are created automatically, and you cannot, for example, use ctid or xmin as your column names — they're like reserved words.
But you can select and see the value of xmin, xmax, and ctid.
ctid is a physical address, like page and offset.
xmin is the ID of the transaction which created the tuple. But xmax is slightly more complex. It's usually zero, meaning the row is live. But it can also be non-zero — for example, set by some transaction which was rolled back. So xmax is present, you can see it in your transaction, but obviously the tuple is still live. So when you select xmin and xmax, you get a feel for the tuple, right? And the next time you select, you can see different values: xmin, no, but xmax and ctid — these hidden columns can change.
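For example, with any table — say a hypothetical t1 — you can look at these hidden columns alongside the normal ones:

    select xmin, xmax, ctid, * from t1;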
So, right — if a tuple becomes dead, it's a problem, because it still occupies some space. This is the key problem that leads us to the need for vacuuming, right?
And also, there's a very simple exercise I recommend to everyone who starts working with Postgres: create a table with just a single column — id, int8, for example — put a row where id equals 1, and then check ctid. Then just update this row without changing any values, say SET id = id. And you will see the ctid change. This is how you can feel that a new tuple is created. Every time you do an update, even if you then roll back, a tuple will be created — but the new tuple will be marked dead instead of the old one, right?
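A minimal sketch of that exercise (the table name t1 is just an example):

    create table t1 (id int8);
    insert into t1 values (1);
    select ctid, xmin, xmax, id from t1;
    update t1 set id = id;               -- no value change, but a new tuple is written
    select ctid, xmin, xmax, id from t1; -- ctid has changed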
I didn't realize that.
That's cool.
Yeah.
Or you can insert inside a transaction and then roll back your transaction — so, cancel your insert. This insert will again produce some dead tuples. This shows us that testing on production is not good. Some people say, oh, we will insert some data and then roll back our transaction so as not to disturb production. But you disturb it anyway, because you produce fresh dead tuples, and you make autovacuum do more work than it would without your actions. Testing on production is not a good idea.
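You can watch this happen via the statistics views — a sketch, again with the hypothetical t1:

    begin;
    insert into t1 select generate_series(1, 1000);
    rollback;
    -- the rolled-back insert still left dead tuples behind (this counter
    -- is an estimate and may lag slightly behind reality):
    select n_dead_tup from pg_stat_user_tables where relname = 't1';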
And the reason we're discussing this part of it is that one of the core tasks that vacuum has — and presumably where it got its name from — is going and looking for these dead tuples and vacuuming them up, or removing them, reclaiming the space, I guess.
That's right. By the way, to finish this idea about testing: the only type of workload you can run without disturbing the physical layout and without making autovacuum work more is a cancelled delete. So you delete a lot, then roll back. You will produce a lot of IO — a lot of records will go to WAL — so it will put some stress on replication. But the physical layout won't change, because your transaction just puts a new xmax value. Then, when it gets cancelled, this transaction ID is marked as cancelled, so it is logically equal to zero, meaning that these tuples are still alive. This is a very interesting kind of workload I sometimes use on regular clones, when we need many iterations of a workload but don't want our physical layout to change, so we can start from the same point every time. This is the only kind of workload that won't disturb the physical layout and vacuum. Sorry if this is maybe a little off topic, but I want everyone to start thinking about physical row versions — which are called tuples — when they think about performance. Because it's very, very related. You cannot optimize performance without understanding MVCC and tuples, right?
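A sketch of that repeatable-workload trick, with the same hypothetical t1:

    begin;
    delete from t1;  -- sets xmax on every tuple and generates WAL
    rollback;        -- the deleting transaction is marked aborted, so all
                     -- tuples stay live and the physical layout is unchanged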
Yeah, absolutely.
So where do you want to go next?
Do you want to talk about performance a little bit?
Or do you want to talk about the different...
So one of Vacuum's tasks is to free up space.
Yeah — which tasks does vacuum have? First of all, cleaning up dead tuples. Second is preventing transaction ID wraparound, which is done by freezing. And the third, additional one is recalculating table statistics — the statistics about the data that help the planner understand which plan to choose, right? And this is the analyze part.
Yeah. So that happens if you additionally do VACUUM ANALYZE, doesn't it? But autovacuum also does this. I wasn't sure you were going to bring this up.
Yeah, well, I switched us to autovacuum implicitly, actually. And this is, of course, what autovacuum can do. By the way, it's interesting that sometimes it does just ANALYZE for a table, when the time comes, but sometimes it chooses to do VACUUM ANALYZE — both actions at once. It's interesting that both are possible for autovacuum.
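For reference, a manual run combining both actions looks like this (t1 again being a stand-in table name):

    vacuum (analyze, verbose) t1;  -- clean up dead tuples and refresh planner statistics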
And autovacuum is great.
It allows you to forget about vacuuming.
I remember times when autovacuum wasn't present in Postgres.
So by default, you were in trouble if you didn't vacuum. What happens if you don't?
Yeah, that's a really good point, actually.
So there might be people listening
who maybe they're relatively new to Postgres
or they've started a project
and they've not had to worry about vacuum.
And that's probably because, so far at least, autovacuum has been handling it. You know, it's got certain default settings that are probably a bit conservative in general, but when you're first getting started, when the tables are small, they're OK. They're fine — they'll keep this from being a problem for longer than, as you say, if autovacuum weren't there. And as far as I can tell, unless you're extremely advanced and know exactly what you're doing, you really shouldn't be turning autovacuum off. So if anybody suggests doing that, it should set off big, big sirens and warning signals at your place of work, I think.
Right. Well, my opinion, based on my 17-plus years of experience working with Postgres in quite large setups — hundreds of thousands of transactions per second, many dozens of terabytes of data, and so on, in an OLTP context, where a lot of users are present and need very good performance — my opinion is that the defaults are not enough for almost everyone. They should be tuned at the very beginning.
For example, only three autovacuum workers is not enough for modern servers. OK, if it's a laptop or some very, very small system, three workers are probably enough. But since dead tuples are produced all the time — and as we just discussed, many transactions produce them — you need cleanup. It means that cleanup is almost constant work that Postgres needs to do. So my recommendation: check how many cores you have and allocate at least 30% of those cores to autovacuum. So if you have 12 cores — four workers. If you have, I don't know, 96 cores, it means you need at least 30 workers on a very big server.
And I'm excited to see that some default settings were recently changed — the autovacuum cost limit. It's very complex: one setting depends on another, and so on. But roughly, the cost limit and cost delay associated with autovacuum had very, very old defaults. Roughly speaking, you had only maybe 8 megabytes per second of reads for all the workers you have, not more — it's like a quota. And it was changed in Postgres 12, plus or minus one version, and became 10 times more: now it's roughly 80 megabytes per second. But that's not enough for big setups with modern disks — you probably want a half-terabyte or even bigger quota, so you need to tune it further.
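As a rough sketch only — these numbers are illustrative assumptions, not recommendations, and must be tested per workload — the postgresql.conf side of such tuning might look like:

    autovacuum_max_workers = 10           # default 3; changing it requires a restart
    autovacuum_vacuum_cost_limit = 10000  # default -1, i.e. falls back to vacuum_cost_limit (200)
    autovacuum_vacuum_cost_delay = 1ms    # default 2ms since Postgres 12 (was 20ms before)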
Yeah, I think there are quite a few good tuning guides, aren't there? I can link to them — I know at least one really good one in terms of giving people advice.
Go on.
I like a couple of articles — I don't remember who wrote them, but they were on the 2ndQuadrant blog: "When autovacuum does not vacuum" and "Autovacuum tuning basics".
This is like very great basic material.
And I even open it very often when I need to explain something to others.
I always mention these couple of posts.
So let's link to them and encourage everyone to read them. But I also wanted to say that if you don't tune it at the very beginning, bloat will come, right?
Eventually, yes. The two risks, right: transaction ID wraparound and bloat.
Two risks, yeah. And that manifests itself in gradually slower performance — various things will start to degrade at that point; there are a few different ways it can manifest. And I actually wanted to say, before we move on from autovacuum, the best advice I ever heard was that it's a bit like exercise: if it hurts, you're not doing it enough. So if autovacuum looks like it's getting in the way of things — sometimes people see it running and blocking other things, or taking ages, things like that — if you're in that situation, don't turn it off. I think some people get tempted to turn it off so that the problem goes away. The solution is to make it more aggressive.
Opposite.
Make it happen more frequently.
Exactly.
So that seems to be what trips people up around this.
Yeah, funny word. When I describe what to do with autovacuum, I also use "aggressive". And some companies have an interesting bot in Slack that says, well, you know, you shouldn't use the word "aggressive", choose another one — and there are options. But I agree: autovacuum tuning is two things. One is to make it aggressive — move faster, right? So "aggressive" — what does it mean? Move faster. Give it a bigger quota, don't limit disk reads, and so on — the cost limit and cost delay, that set of settings; there are several of them. Also, the autovacuum settings depend on the vacuum settings: if a setting is minus one, it means take the value from there. That's why I say it's quite complex — you need to spend some time understanding this hierarchy of settings. But this is one thing. The other thing is frequency. So: speed quotas and aggressiveness, and frequency. Frequency is very important, because you can allow it to move fast, but if you keep the defaults — like 20% dead tuples — it's a lot. 1% or below. For OLTP, we need to go down. We need to remove dead tuples more often.
Yeah, so just for anybody wondering: by default, basically any table has to grow by 20% in order to trigger another autovacuum, which, when the table is 100 or 1,000...
Sorry — it needs to accumulate dead tuples amounting to at least 20% of the table.
Great point, yes.
Right. Though the table can also just grow — an append-only situation, for example. That was fixed, by the way, in Postgres 13, when autovacuum didn't trigger for such tables because there are no dead tuples, and a special setting with additional logic for append-only situations was added. But at least 20% of dead tuples by default — in my opinion, it should be 1% in OLTP. Or maybe not even a percentage.
Should it definitely be a percentage? I've seen people switch completely to a raw number of tuples instead, so they don't have this degradation over time as those relations grow.
Right. So yeah, it's also not simple. There are scale factors — a set of settings for the analyze part and for the vacuum part — and there is also a threshold, which is an absolute number: 50 dead tuples by default, as I remember. And there is a formula based on these two settings that gives you the real threshold defining when autovacuum triggers.
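The formula is roughly: autovacuum vacuums a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, i.e. 50 + 0.2 * reltuples with the current defaults. Per-table overrides are possible; a sketch with example values:

    alter table t1 set (
        autovacuum_vacuum_scale_factor = 0.01,  -- 1% instead of the default 20%
        autovacuum_vacuum_threshold = 500
    );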
Well, the static number approach — this is one of many topics people have different opinions about. For example, I see several groups of people, and I still see them, who say the default autovacuum is a very silly algorithm: it doesn't understand that on weekends we have much more room for autovacuum to work.
Or at nights.
It doesn't respect the time of day or the day of week.
And we should change it.
So they switch off autovacuum and implement their own algorithm and run vacuum.
Or they do both.
Doing both seems really smart to me, right? So if you know when your business is quiet, and you can afford most of the time to do a manual VACUUM ANALYZE, it's not that you're doubling the work by keeping autovacuum on — because when autovacuum then does trigger, it has less to do. There's less work. So I do see the logic to that myself.
Yes. Also, if you do it with your script, it means you run so-called manual vacuum — of course it's fully automated in this case, but it's so-called manual vacuum — and in recent versions of Postgres it has a benefit: unlike autovacuum, it can process the indexes of a table in parallel, with multiple processes. Parallelization is a great way to improve vacuuming on larger setups — an inevitable way, I would say. And it leads us to the topic of partitioning, but maybe let's postpone that.
Right.
Right. But I agree — I don't like the idea of turning autovacuum off completely and relying on your custom in-house tool. It may be dangerous. But the combination sounds good.
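A sketch of such a scripted run — PARALLEL is available for manual VACUUM since Postgres 13 (the table name and worker count are examples):

    vacuum (parallel 4, verbose) t1;  -- up to 4 workers process this table's indexes concurrently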
But if you connect to Postgres using psql and run VACUUM, it's unthrottled.
Oh, "throttling" — that's a good way to name the quotas differently, right? So autovacuum is throttled too much by default, but manual vacuum is not throttled at all, because vacuum_cost_delay is zero by default, meaning throttling is not applied. And my question is: can you put your system down, in terms of disk I/O saturation, just by doing some vacuums — maybe in multiple processes, on multiple tables at once? I tried on modern hardware, like NVMe disks, and I could not do it. Maybe on older hardware it's possible — I saw performance problems when vacuum was too aggressive, moving too fast, using too much of our disk I/O capacity. But on modern disks, we recently tried it: we tried to prove that unleashing autovacuum is not a good idea, and we failed. We always had headroom, even when almost all CPUs were doing some work. It's interesting. Maybe I'm wrong here. It's an interesting exercise to see where the limit is.
That doesn't sound like a failure at all.
That sounds like a successful experiment where you learned something really important.
Well, there is always a bottleneck somewhere, right? If we cannot saturate our disk I/O, it means we probably spend some time in code, loading the CPU more than we could. I don't know — it's interesting. I'm just trying to say that on modern disks, you probably should almost fully unleash it, to move faster.
Yeah.
Especially if you have a lot of cores.
There were a couple more things.
So I think partitioning actually might be worth discussing briefly.
Probably not the depths of it,
but I think a lot of people think partitioning is going to help them with performance.
And my experience is different. My experience is that the main argument for it is all around maintenance — all around being able to delete. I'm getting a big thumbs up here. So yeah, go on.
Yes and no. So I agree. B-trees are great — the height of a B-tree grows very slowly. So whether we have 100 gigabytes or 10 terabytes, the difference is not that big; we just need a few additional buffers to be checked to perform an index scan, right? Or an index-only scan, if you like. But the problem is definitely related to maintenance, because if we have a 10-terabyte table, a vacuum may take a day, depending on the speed and power of your disks and CPU and so on. But that's a problem in itself.
For a non-partitioned table, right?
Non-partitioned, of course.
I'm trying to understand how partitioning can be connected to vacuuming and how partitioning can help vacuum.
Of course. The problem is that autovacuum cannot process even the indexes of a single table in parallel yet. Maybe it will be implemented in the future, because regular vacuum can do it. But even if it processed the indexes in parallel — for example, you have 10 indexes and you process them using 10 CPUs, 10 processes, right? — the heap itself is hard to parallelize. I think it's possible, but it's a complex task; right now it's not possible. So if you have a huge table, the process of vacuuming it with all its indexes will be single-threaded under autovacuum. But if you partition it, you can benefit from having many autovacuum workers — of course, if you tuned it up from the default three to a bigger number.
Yeah, you should on larger systems.
So this greatly improves the speed of vacuuming. But not only that. Of course, if your table is partitioned, you can reindex and create or recreate indexes much faster. But the state of the cache also improves, because data is localized much better: new data sits in pages holding mostly new data, while other pages hold only old data. And if the tuples in some page are not changing, it becomes all-frozen, all-visible — autovacuum can skip it, right? It's so good. And maybe that page is not even present in the cache, so you have more space in your buffer pool and page cache for newer data. So cache efficiency increases as well, right? And this directly improves performance.
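A minimal declarative-partitioning sketch (all names are illustrative):

    create table events (
        id         int8 not null,
        created_at timestamptz not null
    ) partition by range (created_at);

    create table events_2022_08 partition of events
        for values from ('2022-08-01') to ('2022-09-01');
    -- each partition is a separate table to autovacuum, so several
    -- workers can process one logical table at the same time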
But there's also one thing I realized recently: indexing and reindexing affect vacuuming. If the creation or recreation of some index takes hours, during this period vacuum cannot delete freshly dead tuples — tuples which became dead.
Interesting.
Tuples which became dead only recently, right? And this is a problem, because indexing and reindexing hold the xmin horizon, even if it's concurrent...
Yes, concurrently as well.
There was an attempt to optimize this in Postgres 14 for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, and it was so great: they no longer held the xmin horizon, so vacuum could clean all the tuples which became dead recently. Great. But recently, in June, this triggered the Postgres 14.4 release, as I understand it — an immediate bug fix was needed, and this functionality was reverted in Postgres 14, unfortunately. So the rule of thumb: you don't want index creation to take very long.
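One common diagnostic to see which backends are holding the xmin horizon back — long-running index builds will show up here too (a sketch):

    select pid, state, xact_start, backend_xmin, query
    from pg_stat_activity
    where backend_xmin is not null
    order by age(backend_xmin) desc
    limit 5;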
So you need partitioning for faster vacuuming, to run it in parallel and to avoid bloat. Because we didn't discuss what bloat is, right? Bloat happens when you accumulate too many dead tuples, and then vacuum deletes them — a lot of them at once — and you're left with a lot of free space inside pages.
Yes.
And since we're on the topic of indexes, I think we should probably talk about index bloat — we've talked mostly about table bloat so far. So, row versions, dead row versions. But there's also index bloat, which is subtly different, in my opinion, because you've got the same entries, but freeing them up doesn't do as much good as in the table — in the heap.
In the heap they can be reused very easily: another row version can be inserted there, it doesn't matter which one. But in an index — in a B-tree index, for example — adding more rows can split pages, and vacuum won't unsplit the pages.
So even if you vacuum very aggressively — vacuum doesn't rebalance the tree.
Yeah, exactly. You won't get it back to the same size it was at first, and therefore you might need to reindex from time to time if you want to remove index bloat as well. Or you can stay on top of it as best you can. And there are some good optimizations for this in Postgres 13, I believe. Maybe 14. So yeah, it's just something that people sometimes don't realize.
Yeah, we talked about both B-tree optimizations — I think they started in Postgres 12 even, or 13 — and we discussed them on the Postgres TV channel with Anastasia Lubennikova and then with Peter Geoghegan. It's so great that these optimizations happened. They're very fundamental, affecting every Postgres setup. So good. Deduplication and so on.
I agree, but I think bloat is sometimes useful, both in the heap and in indexes. In the heap it can help to have more heap-only tuple (HOT) updates — more optimized updates, without the need to touch indexes. And in indexes I think it's also useful; otherwise, why is the default fillfactor for B-tree indexes 90, not 100? I'm not an expert in indexes — we have other experts, like Peter Geoghegan and Andrei Borodin, for example. By the way, indexes are a great set of topics; we should talk about them at some point. But related to vacuum, I think you're right: over time, index health degrades anyway. Even in modern versions — the latest versions of Postgres — the situation is much better, it improved, bloat growth decreased, but we still need to perform so-called index maintenance and rebuild indexes from time to time. Even if our autovacuum is well-tuned and we have the latest version, we still need to rebuild indexes. And when we rebuild indexes, we want it to be very fast, because of the xmin horizon — because we affect all tables, by the way. So if we rebuild one index and it takes many hours, all tables and indexes in our database are affected: vacuum cannot clean freshly dead tuples anywhere, nor the index entries pointing to them. So it's a huge global problem. So you want indexing to be fast.
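The usual tool for that maintenance, available since Postgres 12 (index and table names are examples):

    reindex index concurrently t1_id_idx;
    reindex table concurrently t1;  -- rebuilds all of the table's indexes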
Yeah, super interesting.
I had one last thing that I thought we couldn't finish this episode without talking about briefly.
And even if it's just a public service announcement: there is an option for vacuum — I think it should really be called something different — it's called VACUUM FULL, and you probably never want it. I could imagine people very happily using Postgres for decades and never using it. But I just wanted to mention it because I think it can trip people up: the name suggests it's going to do a very comprehensive version of vacuum, but because of the locks it takes...
I use it all the time.
Really? Are you joking? Go on — what for?
Well, I use it all the time because it's related to how we estimate — how we see — how much bloat we have.
You know that bloat estimation is not a simple task. Scripts — everyone has various versions of the estimation scripts. They are lightweight, but they can be very wrong. For example, I can show you a case where the script says we have 40% bloat, but bloat is zero because the table was just created. It's easy to do — it's related to alignment padding. So if you want to know real bloat numbers, there is the pgstattuple extension, but I had issues using it some time ago. And since we are big fans of real experimenting and using clones — thin clones, thick clones, it doesn't matter in this case — what do we do? We just create a clone and promote it to primary, so it's a real clone, writable. And we VACUUM FULL the whole database, or specific tables, and compare sizes before and after. And then we can say that we know 100%: the current bloat is this much.
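A sketch of that brute-force measurement — on a disposable clone only, since VACUUM FULL takes an exclusive lock (the table name is an example):

    select pg_size_pretty(pg_total_relation_size('t1'));  -- before
    vacuum full t1;                                       -- never on production!
    select pg_size_pretty(pg_total_relation_size('t1'));  -- after; the difference is the bloat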
Gotcha.
And also, presumably, we're talking about test environments here.
We're not talking about...
Of course — a detached instance. It's not exactly a test environment, because it's production data, but that makes it closer to production, to get a real measurement of bloat. This is a very simple, straightforward, brute-force approach.
Yes.
I agree.
I was thinking purely of the production use case,
but you're 100% right, as usual.
Is there anything else you wanted to make sure we talked about today?
No, let's make a summary — maybe some recommendations. Understand MVCC, of course, right? Read about it, understand it — there are many good materials around. Then tune autovacuum. Then there is the separate topic of monitoring — maybe next time. Also, recreate indexes from time to time, probably in an automated fashion; it's not a simple task, and also worth a separate discussion. And maybe use tools like pg_repack or pg_squeeze to deal with bloat in the tables themselves, right?
What else?
Ah, partitioning.
Yeah.
There's a rule of thumb.
It's an empirical rule — not based on strict logic, but on the experience of many people: if a table grows, or has a chance to grow, over 100 gigabytes, it should be partitioned.
Yeah, I like this.
And I think you had one more rule, didn't you?
So if you're thinking about 100 gigabytes, think about partitioning.
If you're thinking about a terabyte, think about sharding.
Was that your rule?
Sharding.
Yeah.
Yeah, it was a discussion on Twitter.
It's not my rule.
It started in different places.
And then...
Sounds good.
It's easy to remember.
Good.
Wonderful.
Well, thank you.
Thanks again to everybody who's been sharing this on Twitter and wherever else.
We really appreciate it.
And yeah, looking forward to talking to you again next week.
Thank you.
Bye.
Cheers.
Bye.