Postgres FM - Vacuum
Episode Date: August 12, 2022

Here are links to a few things we mentioned:
- Do you vacuum every day? (talk by Hannu Krosing)
- Autovacuum tuning (EDB guide)
- When autovacuum does not vacuum (2ndQuadrant blog post by Tomas Vondra)
- Autovacuum tuning basics (old 2ndQuadrant blog post)
- Discussion with Anastasia Lubennikova (on RuPostgres, in Russian)
- B-tree indexes (talk by Anastasia Lubennikova, in English)
- Discussion with Peter Geoghegan (on Postgres TV)
- pg_repack
- pg_squeeze

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov and @michristofides. If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we discussing today?
Hi Michael, we are reacting to requests, and we are discussing vacuum.
Maybe we're only starting to discuss vacuum.
Yeah, absolutely. Deep, deep topic.
But yes, as you say, I think this was one of the first topics somebody tweeted us once we said we were open to talking about things people wanted to hear. Vacuum is always a topic that people want to talk about. It seems like it's something that people don't necessarily know about when they first encounter Postgres, but they quickly learn about it one way or the other. And as you get more and more advanced, I think there still seem to be things
that people can learn about vacuum.
Even people with many years of experience.
Right.
Right.
We have a couple of requests and I wanted to thank everyone
who provides feedback to us.
We see it almost every day already, especially on Twitter.
And it's so good to read it and to see what people think about our show.
And also, as usual, I ask everyone to leave a like in the place where they listen or watch. We have a video version on YouTube, which is uncut, by the way — a little bit longer. Postgres.tv is a short web address. But of course, it's also available on iTunes and so on. And I encourage everyone to like this show or this episode right now.
And also, please share it in the social networks you use, or in working groups where you discuss Postgres, or engineering, or anything.
Thank you so much, everyone.
So, speaking of the Postgres TV channel: we also have great guests coming almost every week. It's called Open Talks. The idea is that we invite people who presented an interesting talk at some conference where no recording was made.
And one such talk recently was by Hannu Krosing, ex-Skype. He created a lot of Postgres-related stuff at Skype, and he's now at Google Cloud.
And the talk was titled,
Do You Vacuum Every Day?
It was great.
It's very deep and, at the same time, very simple material — everyone can watch it and understand a lot of stuff. So again, I 100% encourage everyone to watch it. And it's directly related to our topic today.
I watched that.
I thought it was fantastic.
He managed to keep it simple, as you say, but I definitely still learned a few things during it. I think it was about an hour, maybe a little bit longer, so it is definitely a deep dive, but well worth the watch.
Yeah. We don't limit our speakers on timing, unlike at conferences, where we have very strict constraints. So we can spend a couple of hours, definitely, if we want to and have the material and the desire. But our show is roughly 30 minutes because, as usual, I want to say a big hello to those who are running, or riding a bicycle, or especially walking their dogs.
I know some people do it.
From feedback, I've learned it.
And I think dogs must love our show.
Yeah, it's their excuse to get out.
I think dogs and elephants would be friends too.
Right.
Keep it in the spirit.
Yeah.
Right.
So, vacuum, where to start?
What do you think?
I think probably right at the beginning, right?
So, why do we have vacuum?
Like, what is vacuum?
Why do we have it?
Some people kind of see it as a negative part of Postgres.
I see it as something we need because of some of the design decisions
right at the beginning of Postgres.
I personally really like it.
I think we've gained so much
from some of those design decisions.
But this one is to do with
multiversion concurrency control, right?
So MVCC.
The fact that we keep old versions of — I was about to say "rows" then, but I guess "tuples" — around.
Row versions — or do you say "tooples" or "tuples"?
Oh, which is correct?
Did I say the correct one?
I don't know.
I've heard both, so we need to choose.
So I'm going to say tuple.
I don't know why.
I will choose tuple.
Okay, tuples. Let's use tuples. So right, a tuple is a physical version of a row,
and each row can have multiple tuples at the same time. And one transaction sees only one of them.
It may happen that some tuple — a physical version of a row — is not visible to any transaction, so basically it's dead. And you can check when a tuple was created, because you can select xmin, xmax, ctid. We discussed ctid some time ago. It's very convenient sometimes to know that there are hidden system columns in each table. They are created automatically, and you cannot, for example, use ctid or xmin as your column names — they're like reserved words.
But you can select and see the value of xmin, xmax, and ctid.
ctid is a physical address, like page and offset.
xmin is the ID of the transaction which created the tuple. But xmax is slightly more complex. It's usually zero, meaning the row is live. But it can also be non-zero — for example, set by some transaction which was rolled back. So xmax is present, you can see it in your transaction, but obviously the tuple is still live. So when you select xmin and xmax, you get a feel for the tuple, right? And the next time you select, you can see different values: xmin, no, but xmax and ctid — these hidden columns can change.
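For example, with any table — say a hypothetical t1 — you can look at these hidden columns alongside the normal ones:

    select xmin, xmax, ctid, * from t1;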
So, right — if a tuple becomes dead, it's a problem, because it still occupies some space. This is the key problem that leads us to the need for vacuuming, right?
And also, there's a very simple exercise I recommend to everyone who starts working with Postgres: create a table with just a single column — id, int8, for example — put a row where id equals 1, and then check ctid. Then just update this row without changing any values, say SET id = id. And you will see the ctid change. This is how you can feel that a new tuple is created. Every time you do an update, even if you then roll back, a tuple will be created — but the new tuple will be marked dead instead of the old one, right?
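A minimal sketch of that exercise (the table name t1 is just an example):

    create table t1 (id int8);
    insert into t1 values (1);
    select ctid, xmin, xmax, id from t1;
    update t1 set id = id;               -- no value change, but a new tuple is written
    select ctid, xmin, xmax, id from t1; -- ctid has changed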
I didn't realize that.
That's cool.
Yeah.
Or you can insert inside a transaction and then roll back your transaction — so, cancel your insert. This insert will again produce some dead tuples. This shows us that testing on production is not good. Some people say, oh, we will insert some data and then roll back our transaction so as not to disturb production. But you disturb it anyway, because you produce fresh dead tuples, and you make autovacuum do more work than it would without your actions. Testing on production is not a good idea.
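You can watch this happen via the statistics views — a sketch, again with the hypothetical t1:

    begin;
    insert into t1 select generate_series(1, 1000);
    rollback;
    -- the rolled-back insert still left dead tuples behind (this counter
    -- is an estimate and may lag slightly behind reality):
    select n_dead_tup from pg_stat_user_tables where relname = 't1';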
And the reason we're discussing this part of it is that one of the core tasks that vacuum has — and presumably where it got its name from — is going and looking for these dead tuples and vacuuming them up, or removing them, reclaiming the space, I guess.
That's right. By the way, to finish this idea about testing: the only type of workload you can run without disturbing the physical layout and without making autovacuum work more is a cancelled delete. So you delete a lot, then roll back. You will produce a lot of IO — a lot of records will go to WAL — so it will put some stress on replication. But the physical layout won't change, because your transaction just puts a new xmax value. Then, when it gets cancelled, this transaction ID is marked as cancelled, so it is logically equal to zero, meaning that these tuples are still alive. This is a very interesting kind of workload I sometimes use on regular clones, when we need many iterations of a workload but don't want our physical layout to change, so we can start from the same point every time. This is the only kind of workload that won't disturb the physical layout and vacuum. Sorry if this is maybe a little off topic, but I want everyone to start thinking about physical row versions — which are called tuples — when they think about performance. Because it's very, very related. You cannot optimize performance without understanding MVCC and tuples, right?
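A sketch of that repeatable-workload trick, with the same hypothetical t1:

    begin;
    delete from t1;  -- sets xmax on every tuple and generates WAL
    rollback;        -- the deleting transaction is marked aborted, so all
                     -- tuples stay live and the physical layout is unchanged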
Yeah, absolutely.
So where do you want to go next?
Do you want to talk about performance a little bit?
Or do you want to talk about the different...
So one of Vacuum's tasks is to free up space.
Yeah — which tasks does vacuum have? First of all, cleaning up dead tuples. Second is preventing transaction ID wraparound, which is done by freezing. And the third, additional one is recalculating table statistics — the statistics about the data that help the planner understand which plan to choose, right? And this is the analyze part.
Yeah. So that happens if you additionally do VACUUM ANALYZE, doesn't it? But autovacuum also does this. I wasn't sure you were going to bring this up.
Yeah, well, I switched us to autovacuum implicitly, actually. And this is, of course, what autovacuum can do. By the way, it's interesting that sometimes it does just ANALYZE for a table, when the time comes, but sometimes it chooses to do VACUUM ANALYZE — both actions at once. It's interesting that both are possible for autovacuum.
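For reference, a manual run combining both actions looks like this (t1 again being a stand-in table name):

    vacuum (analyze, verbose) t1;  -- clean up dead tuples and refresh planner statistics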
And autovacuum is great.
It allows you to forget about vacuuming.
I remember times when autovacuum wasn't present in Postgres.
So by default, you were in trouble if you didn't vacuum. What happens if you don't?
Yeah, that's a really good point, actually.
So there might be people listening
who maybe they're relatively new to Postgres
or they've started a project
and they've not had to worry about vacuum.
And that's probably because, so far at least, autovacuum has been handling it. You know, it's got certain default settings that are probably a bit conservative in general, but when you're first getting started, when the tables are small, they're OK. They're fine — they'll keep this from being a problem for longer than, as you say, if autovacuum weren't there. And as far as I can tell, unless you're extremely advanced and know exactly what you're doing, you really shouldn't be turning autovacuum off. So if anybody suggests doing that, it should set off big, big sirens and warning signals at your place of work, I think.
Right. Well, my opinion, based on my 17-plus years of experience working with Postgres in quite large setups — hundreds of thousands of transactions per second, many dozens of terabytes of data, and so on, in an OLTP context, where a lot of users are present and need very good performance — my opinion is that the defaults are not enough for almost everyone. They should be tuned at the very beginning.
For example, only three autovacuum workers is not enough for modern servers. OK, if it's a laptop or some very, very small system, three workers are probably enough. But since dead tuples are produced all the time — and as we just discussed, many transactions produce them — you need cleanup. It means that cleanup is almost constant work that Postgres needs to do. So my recommendation: check how many cores you have and allocate at least 30% of those cores to autovacuum. So if you have 12 cores — four workers. If you have, I don't know, 96 cores, it means you need at least 30 workers on a very big server.
And I'm excited to see that some default settings were recently changed — the autovacuum cost limit. It's very complex: one setting depends on another, and so on. But roughly, the cost limit and cost delay associated with autovacuum had very, very old defaults. Roughly speaking, you had only maybe 8 megabytes per second of reads for all the workers you have, not more — it's like a quota. And it was changed in Postgres 12, plus or minus one version, and became 10 times more: now it's roughly 80 megabytes per second. But that's not enough for big setups with modern disks — you probably want a half-terabyte or even bigger quota, so you need to tune it further.
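As a rough sketch only — these numbers are illustrative assumptions, not recommendations, and must be tested per workload — the postgresql.conf side of such tuning might look like:

    autovacuum_max_workers = 10           # default 3; changing it requires a restart
    autovacuum_vacuum_cost_limit = 10000  # default -1, i.e. falls back to vacuum_cost_limit (200)
    autovacuum_vacuum_cost_delay = 1ms    # default 2ms since Postgres 12 (was 20ms before)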
Yeah, I think there are quite a few good tuning guides, aren't there? I can link to them — I know at least one really good one in terms of giving people advice.
Go on.
I like a couple of articles — I don't remember who wrote them, but they were on the 2ndQuadrant blog: "When autovacuum does not vacuum" and "Autovacuum tuning basics".
This is like very great basic material.
And I even open it very often when I need to explain something to others.
I always mention these couple of posts.
So let's link to them and encourage everyone to read them. But I also wanted to say that if you don't tune it at the very beginning, bloat will come, right?
Eventually, yes. The two risks, right: transaction ID wraparound and bloat.
Two risks, yeah. And that manifests itself in gradually slower performance — various things will start to degrade at that point; there are a few different ways it can manifest. And I actually wanted to say, before we move on from autovacuum, the best advice I ever heard was that it's a bit like exercise: if it hurts, you're not doing it enough. So if autovacuum looks like it's getting in the way of things — sometimes people see it running and blocking other things, or taking ages, things like that — if you're in that situation, don't turn it off. I think some people get tempted to turn it off so that the problem goes away. The solution is to make it more aggressive.
Opposite.
Make it happen more frequently.
Exactly.
So that seems to be what trips people up around this.
Yeah, funny word. When I describe what to do with autovacuum, I also use "aggressive". And some companies have an interesting bot in Slack that says, well, you know, you shouldn't use the word "aggressive", choose another one — and there are options. But I agree: autovacuum tuning is two things. One is to make it aggressive — move faster, right? So "aggressive" — what does it mean? Move faster. Give it a bigger quota, don't limit disk reads, and so on — the cost limit and cost delay, that set of settings; there are several of them. Also, the autovacuum settings depend on the vacuum settings: if a setting is minus one, it means take the value from there. That's why I say it's quite complex — you need to spend some time understanding this hierarchy of settings. But this is one thing. The other thing is frequency. So: speed quotas and aggressiveness, and frequency. Frequency is very important, because you can allow it to move fast, but if you keep the defaults — like 20% dead tuples — it's a lot. 1% or below. For OLTP, we need to go down. We need to remove dead tuples more often.
Yeah, so just for anybody wondering: by default, basically any table has to grow by 20% in order to trigger another autovacuum, which, when the table is 100 or 1,000...
Sorry — it needs to accumulate dead tuples amounting to at least 20% of the table.
Great point, yes.
Right. Though the table can also just grow — an append-only situation, for example. That was fixed, by the way, in Postgres 13, when autovacuum didn't trigger for such tables because there are no dead tuples, and a special setting with additional logic for append-only situations was added. But at least 20% of dead tuples by default — in my opinion, it should be 1% in OLTP. Or maybe not even a percentage.
Should it definitely be a percentage? I've seen people switch completely to a raw number of tuples instead, so they don't have this degradation over time as those relations grow.
Right. So yeah, it's also not simple. There are scale factors — a set of settings for the analyze part and for the vacuum part — and there is also a threshold, which is an absolute number: 50 dead tuples by default, as I remember. And there is a formula based on these two settings that gives you the real threshold defining when autovacuum triggers.
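The formula is roughly: autovacuum vacuums a table once dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, i.e. 50 + 0.2 * reltuples with the current defaults. Per-table overrides are possible; a sketch with example values:

    alter table t1 set (
        autovacuum_vacuum_scale_factor = 0.01,  -- 1% instead of the default 20%
        autovacuum_vacuum_threshold = 500
    );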
Well, the static number approach — this is one of many topics people have different opinions about. For example, I see several groups of people, and I still see them, who say the default autovacuum is a very silly algorithm: it doesn't understand that on weekends we have much more room for autovacuum to work.
Or at nights.
It doesn't respect the time of day or the day of week.
And we should change it.
So they switch off autovacuum and implement their own algorithm and run vacuum.
Or they do both.
Doing both seems really smart to me, right? So if you know when your business is quiet, and you can afford most of the time to do a manual VACUUM ANALYZE, it's not that you're doubling the work by keeping autovacuum on — because when autovacuum then does trigger, it has less to do. There's less work. So I do see the logic to that myself.
Yes. Also, if you do it with your script, it means you run so-called manual vacuum — of course it's fully automated in this case, but it's so-called manual vacuum — and in recent versions of Postgres it has a benefit: unlike autovacuum, it can process the indexes of a table in parallel, with multiple processes. Parallelization is a great way to improve vacuuming on larger setups — an inevitable way, I would say. And it leads us to the topic of partitioning, but maybe let's postpone that.
Right.
Right. But I agree — I don't like the idea of turning autovacuum off completely and relying on your custom in-house tool. It may be dangerous. But the combination sounds good.
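A sketch of such a scripted run — PARALLEL is available for manual VACUUM since Postgres 13 (the table name and worker count are examples):

    vacuum (parallel 4, verbose) t1;  -- up to 4 workers process this table's indexes concurrently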
But if you connect to Postgres using psql and run VACUUM, it's unthrottled.
Oh, "throttling" — that's a good way to name the quotas differently, right? So autovacuum is throttled too much by default, but manual vacuum is not throttled at all, because vacuum_cost_delay is zero by default, meaning throttling is not applied. And my question is: can you put your system down, in terms of disk I/O saturation, just by doing some vacuums — maybe in multiple processes, on multiple tables at once? I tried on modern hardware, like NVMe disks, and I could not do it. Maybe on older hardware it's possible — I saw performance problems when vacuum was too aggressive, moving too fast, using too much of our disk I/O capacity. But on modern disks, we recently tried it: we tried to prove that unleashing autovacuum is not a good idea, and we failed. We always had headroom, even when almost all CPUs were doing some work. It's interesting. Maybe I'm wrong here. It's an interesting exercise to see where the limit is.
That doesn't sound like a failure at all.
That sounds like a successful experiment where you learned something really important.
Well, there is always a bottleneck somewhere, right? If we cannot saturate our disk I/O, it means we probably spend some time in code, loading the CPU more than we could. I don't know — it's interesting. I'm just trying to say that on modern disks, you probably should almost fully unleash it, to move faster.
Yeah.
Especially if you have a lot of cores.
There were a couple more things.
So I think partitioning actually might be worth discussing briefly.
Probably not the depths of it,
but I think a lot of people think partitioning is going to help them with performance.
And my experience is different. My experience is that the main argument for it is all around maintenance — all around being able to delete. I'm getting a big thumbs up here. So yeah, go on.
Yes and no. So I agree. B-trees are great — the height of a B-tree grows very slowly. So whether we have 100 gigabytes or 10 terabytes, the difference is not that big; we just need a few additional buffers to be checked to perform an index scan, right? Or an index-only scan, if you like. But the problem is definitely related to maintenance, because if we have a 10-terabyte table, a vacuum may take a day, depending on the speed and power of your disks and CPU and so on. But that's a problem in itself.
For a non-partitioned table, right?
Non-partitioned, of course.
I'm trying to understand how partitioning can be connected to vacuuming and how partitioning can help vacuum.
Of course. The problem is that autovacuum cannot process even the indexes of a single table in parallel yet. Maybe it will be implemented in the future, because regular vacuum can do it. But even if it processed the indexes in parallel — for example, you have 10 indexes and you process them using 10 CPUs, 10 processes, right? — the heap itself is hard to parallelize. I think it's possible, but it's a complex task; right now it's not possible. So if you have a huge table, the process of vacuuming it with all its indexes will be single-threaded under autovacuum. But if you partition it, you can benefit from having many autovacuum workers — of course, if you tuned it up from the default three to a bigger number.
Yeah, you should on larger systems.
So this greatly improves the speed of vacuuming. But not only that. Of course, if your table is partitioned, you can reindex and create or recreate indexes much faster. But the state of the cache also improves, because data is localized much better: new data sits in pages holding mostly new data, while other pages hold only old data. And if the tuples in some page are not changing, it becomes all-frozen, all-visible — autovacuum can skip it, right? It's so good. And maybe that page is not even present in the cache, so you have more space in your buffer pool and page cache for newer data. So cache efficiency increases as well, right? And this directly improves performance.
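A minimal declarative-partitioning sketch (all names are illustrative):

    create table events (
        id         int8 not null,
        created_at timestamptz not null
    ) partition by range (created_at);

    create table events_2022_08 partition of events
        for values from ('2022-08-01') to ('2022-09-01');
    -- each partition is a separate table to autovacuum, so several
    -- workers can process one logical table at the same time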
But there's also one thing I realized recently: indexing and reindexing affect vacuuming. If the creation or recreation of some index takes hours, during this period vacuum cannot delete freshly dead tuples — tuples which became dead.
Interesting.
Tuples which became dead only recently, right? And this is a problem, because indexing and reindexing hold the xmin horizon, even if it's concurrent...
Yes, concurrently as well.
There was an attempt to optimize this in Postgres 14 for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY, and it was so great: they no longer held the xmin horizon, so vacuum could clean all the tuples which became dead recently. Great. But recently, in June, this triggered the Postgres 14.4 release, as I understand it — an immediate bug fix was needed, and this functionality was reverted in Postgres 14, unfortunately. So the rule of thumb: you don't want index creation to take very long.
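One common diagnostic to see which backends are holding the xmin horizon back — long-running index builds will show up here too (a sketch):

    select pid, state, xact_start, backend_xmin, query
    from pg_stat_activity
    where backend_xmin is not null
    order by age(backend_xmin) desc
    limit 5;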
So you need partitioning for faster vacuuming, to run it in parallel and to avoid bloat. Because we didn't discuss what bloat is, right? Bloat happens when you accumulate too many dead tuples, and then vacuum deletes them — a lot of them at once — and you're left with a lot of free space inside pages.
Yes.
And since we're on the topic of indexes, I think we should probably talk about index bloat — we've talked mostly about table bloat so far. So, row versions, dead row versions. But there's also index bloat, which is subtly different, in my opinion, because you've got the same entries, but freeing them up doesn't do as much good as in the table — in the heap.
In the heap they can be reused very easily: another row version can be inserted there, it doesn't matter which one. But in an index — in a B-tree index, for example — adding more rows can split pages, and vacuum won't unsplit the pages.
So even if you vacuum very aggressively — vacuum doesn't rebalance the tree.
Yeah, exactly. You won't get it back to the same size it was at first, and therefore you might need to reindex from time to time if you want to remove index bloat as well. Or you can stay on top of it as best you can. And there are some good optimizations for this in Postgres 13, I believe. Maybe 14. So yeah, it's just something that people sometimes don't realize.
Yeah, we talked about both B-tree optimizations — I think they started in Postgres 12 even, or 13 — and we discussed them on the Postgres TV channel with Anastasia Lubennikova and then with Peter Geoghegan. It's so great that these optimizations happened. They're very fundamental, affecting every Postgres setup. So good. Deduplication and so on.
I agree, but I think bloat is sometimes useful, both in the heap and in indexes. In the heap it can help to have more heap-only tuple (HOT) updates — more optimized updates, without the need to touch indexes. And in indexes I think it's also useful; otherwise, why is the default fillfactor for B-tree indexes 90, not 100? I'm not an expert in indexes — we have other experts, like Peter Geoghegan and Andrei Borodin, for example. By the way, indexes are a great set of topics; we should talk about them at some point. But related to vacuum, I think you're right: over time, index health degrades anyway. Even in modern versions — the latest versions of Postgres — the situation is much better, it improved, bloat growth decreased, but we still need to perform so-called index maintenance and rebuild indexes from time to time. Even if our autovacuum is well-tuned and we have the latest version, we still need to rebuild indexes. And when we rebuild indexes, we want it to be very fast, because of the xmin horizon — because we affect all tables, by the way. So if we rebuild one index and it takes many hours, all tables and indexes in our database are affected: vacuum cannot clean freshly dead tuples anywhere, nor the index entries pointing to them. So it's a huge global problem. So you want indexing to be fast.
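The usual tool for that maintenance, available since Postgres 12 (index and table names are examples):

    reindex index concurrently t1_id_idx;
    reindex table concurrently t1;  -- rebuilds all of the table's indexes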
Yeah, super interesting.
I had one last thing that I thought we couldn't finish this episode without talking about briefly.
And even if it's just a public service announcement: there is an option for vacuum — I think it should really be called something different — it's called VACUUM FULL, and you probably never want it. I could imagine people very happily using Postgres for decades and never using it. But I just wanted to mention it because I think it can trip people up: the name suggests it's going to do a very comprehensive version of vacuum, but because of the locks it takes...
I use it all the time.
Really? Are you joking? Go on — what for?
Well, I use it all the time because it's related to how we estimate — how we see — how much bloat we have.
You know that bloat estimation is not a simple task. Scripts — everyone has various versions of the estimation scripts. They are lightweight, but they can be very wrong. For example, I can show you a case where the script says we have 40% bloat, but bloat is zero because the table was just created. It's easy to do — it's related to alignment padding. So if you want to know real bloat numbers, there is the pgstattuple extension, but I had issues using it some time ago. And since we are big fans of real experimenting and using clones — thin clones, thick clones, it doesn't matter in this case — what do we do? We just create a clone and promote it to primary, so it's a real clone, writable. And we VACUUM FULL the whole database, or specific tables, and compare sizes before and after. And then we can say that we know 100%: the current bloat is this much.
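A sketch of that brute-force measurement — on a disposable clone only, since VACUUM FULL takes an exclusive lock (the table name is an example):

    select pg_size_pretty(pg_total_relation_size('t1'));  -- before
    vacuum full t1;                                       -- never on production!
    select pg_size_pretty(pg_total_relation_size('t1'));  -- after; the difference is the bloat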
Gotcha.
And also, presumably, we're talking about test environments here.
We're not talking about...
Of course — a detached instance. It's not exactly a test environment, because it's production data, but that makes it closer to production, to get a real measurement of bloat. This is a very simple, straightforward, brute-force approach.
Yes.
I agree.
I was thinking purely of the production use case,
but you're 100% right, as usual.
Is there anything else you wanted to make sure we talked about today?
No, let's make a summary — maybe some recommendations. Understand MVCC, of course, right? Read about it, understand it — there are many good materials around. Then tune autovacuum. Then there is the separate topic of monitoring — maybe next time. Also, recreate indexes from time to time, probably in an automated fashion; it's not a simple task, and also worth a separate discussion. And maybe use tools like pg_repack or pg_squeeze to deal with bloat in the tables themselves, right?
What else?
Ah, partitioning.
Yeah.
There's a rule of thumb.
It's an empirical rule — not based on strict logic, but on the experience of many people: if a table grows, or has a chance to grow, over 100 gigabytes, it should be partitioned.
Yeah, I like this.
And I think you had one more rule, didn't you?
So if you're thinking about 100 gigabytes, think about partitioning.
If you're thinking about a terabyte, think about sharding.
Was that your rule?
Sharding.
Yeah.
Yeah, it was a discussion on Twitter.
It's not my rule.
It started in different places.
And then...
Sounds good.
It's easy to remember.
Good.
Wonderful.
Well, thank you.
Thanks again to everybody who's been sharing this on Twitter and wherever else.
We really appreciate it.
And yeah, looking forward to talking to you again next week.
Thank you.
Bye.
Cheers.
Bye.