Postgres FM - autovacuum

Starting point is 00:00:04 Everyone who is running Posgars has auto vacuum, and most of people heard about auto vacuum and vacuuming and blow out and this garbage collection. But a lot of databases grow with problems being unnoticed. And they start tuning auto vacuum too late. So this topic today is brought by you, Michael, right? Hi, Michael. Hi, Nick. and we definitely had this topic discussed in the past in terms of like vacuum and blood but we never discussed auto vacuum itself and how to tune it properly and

Starting point is 00:00:43 what's our best practices and yeah I'm glad that you brought it let's talk about this sounds good yeah I'm looking forward to hearing your views on this kind of from a seeing lots and lots of different clients perspective because I get the impression dealing with people less frequently, that often it's left really late before people start tuning water vacuum. And when I say late, a startup will grow to relatively significant size before touching those parameters. And I often think they could have avoided quite a few headaches and got into issues slower, if that makes sense, by just changing a few settings much earlier. So yeah, I was keen to get your thoughts on if you see the same and also like when would you advise people like start up life's busy.

Starting point is 00:01:35 When would you advise people looking at this kind of thing? Yeah. First of all, every single client which comes to us, we talk about auto vacuum tuning on any platform. Doesn't matter. Self-managed, any managed service, we always touch this topic and always like in 100% of cases there is something that should be done there. which means your point is absolutely right. It should be done earlier because when people come to us for help,

Starting point is 00:02:03 it's already, they experience already some problems, some troubles, right? And this is a reactive approach. And just to clarify, when you say 100%, do you mean some percentage have done loads of tuning

Starting point is 00:02:17 and still have issues and some percentage have done nothing at all? What's the kind of distribution like? It's hard to say. I actually checked how many clusters we observed lately, it was like 140 or something where we used thorough, like comprehensive health analysis and tuning and recommendations and worked closely and so on, right? So I'd say zero percent have ideal picture in terms of how auto vacuum is tuned. Most customers running on

Starting point is 00:02:46 managed service, especially RDS, have, let me say it, half S tuned auto vacuum. Okay. This is a new term. can use here because this is like I just amazes me why it's so we can dive into details soon and only few like literally that's a several I saw they have very well tuned auto vacuum and we just shift focus to why it's still not enough and what we should do with specific pieces of workload sure makes sense local tuning already or what to do about it and in most cases there we also involve we focus on specific workload, but also we go, of course, beyond just auto vacuum tuning, because usually this is like some pathological workload or some, there are some problems which are, like,

Starting point is 00:03:37 not conflicting, how to say, like, they are blocking auto vacuum work, right? Yes. Before we dive into details, maybe let's just explain to a wider audience what auto vacuum is, because I think many still misunderstand it. recently I talk to, you know, like our focus is fast-growing startups, and I talk to technical founders quite often. These guys are super smart. Technically, they understand, they definitely understand throughput latency, all the numbers and so on. But I see, they just didn't have time to dive into auto-vacuum what it is. The third blowout issue, MVCC, Postgres is very widely

Starting point is 00:04:20 criticized, right? But what auto-vacium is and what to do that? do about it, what they need to have. It's like it's always a good topic to discuss and dive into. So my simple explanation, Auto Vacuum is just garbage collection, first of all, right? It has a few more jobs, but the main job, original job is garbage collection. When you have updates, deletes, or failed inserts, by the way, the third part is usually forgotten, but failed inserts all of them produce successful updates,

Starting point is 00:04:54 successful updates and failed inserts rolled back inserts they all produce that raw versions called tuples or tuples right

Starting point is 00:05:05 and probably also rolled back updates oh yes you are right yeah successful deletes

Starting point is 00:05:14 rolled back inserts or any updates or any updates yeah besides hot updates, nuances, right? Even hot updates, yeah. If we want to be thorough, like we are destroying my intention to provide simple explanation.

Starting point is 00:05:31 Let's keep it short. So it's first of all garbage collections. So if, especially for guys who run postgios on machines with a lot of cores, like typical like Intel machine, 96 cores or 48 or like like AMD hundreds of course, right? We have very, we have very different definition of what typical is for a service. Well, typical big, large database. Sure, sure. So it's, it amazes me like I saw, I told you, I saw post-gust clusters achieving 10, 20

Starting point is 00:06:07 terabytes with default auto-vacom settings, somehow surviving, already having five replicas. It's insane. This is what I meant in terms of like, should people doing it be doing it early? or is it okay? There's a certain argument. No. I think I agree, but as you say,

Starting point is 00:06:25 they've somehow got by even into that large extent. But I've also seen smaller systems really struggling because auto vacuum hasn't been tuned. So far fewer, like maybe not even a terabyte, but struggling because of a certain workload.

Starting point is 00:06:40 So yeah. Yeah, it's not just size, right? It can be a table two gigabytes in size, but it's mostly bloated like 99.9%. I saw it many times, especially with Q-like workloads. And then it can suffer small database, but it suffers from the fact that auto vacuum is not tuned and maybe something is blocking it. Garbage collection job is blocking it. So we need to combine both things here. But my point is,

Starting point is 00:07:04 auto vacuum is just garbage collection. If you tune post-gous, or maybe your provider tuned, for example, to have bigger shared buffers, bigger cache, right? And workman for sort and join operations and so on. if it's already tuned and max connections has increased maybe puller is in front of it like you tuned it to handle workload why the heck garbage collection remains untuned it should be tuned together with everything else right so it should be but we see it's often lagging in tuning right and it can be a local workload on small database already suffering or it can be big cluster and garbage collection mostly default auto-wark

Starting point is 00:07:49 is mostly default. But that being said, we should mention that there are a few more jobs AutoVacom has. Yeah. Maintain statistics? Yeah. Do you want to stick to the vacuum ones first? Because I think the name auto vacuum is quite a good name in that it's all of the jobs that vacuum does but automated. But with one addition, which is the, it's also an auto-analyze feature, which is what you're just about to talk about in terms of statistics. But I feel like the other vacuum jobs are worth. Do you think of it as garbage collection, statistics, and then the other vacuum jobs? It's just easier to memorize. Statistics also should be maintained up to date. If it's lagging, it may affect performance, might affect performance. But you're right. There are also a couple of

Starting point is 00:08:35 more jobs. Transaction ID wrap around. If you have a lot of transactions per day, we know there are sometimes like billion per day. It's a lot. In this case, Churn is. high and integer four is not enough half of integer four is not enough transaction idea up around prevention should be very frequent in each tapple in each row version right and also there is another very important job especially for performance it's maintaining visibility maps it's a bitmap with two bits for each page pages eight kibbytes right and two bits is it whole page fully visible all means all tuples are visible to all transactions. And second is all frozen. Yeah. Yeah, just previous topic. Transaction idea wrap around prevention. So not to visit this page.

Starting point is 00:09:30 Again, if we know that it's all the tuples are in the past, they're already marked frozen. So they're definitely in the past, even if they, they're X mean X max look like they are in the future. This is freezing. It marks like all frozen bit means like it's a whole page in the past. The next vacuum will just skip it. That's optimization as well, right? So for more frequent vacuuming, not to do the same job once again, right? I want to come back to that, but before we do, the all visible is really important for index only scan performance, right? So if a, if an index only scan can know that every single tuple on that page is visible to all transactions, then it can serve the result it finds in the index because it knows it can't

Starting point is 00:10:22 have changed. So that's a really neat optimization without having to do what it calls a heap fetch. So it like fetching the data from the table instead of from the index. So if you ever, if you see an index only scan that ends up with a lot of heap fetches, that's because. Degris couldn't. Yeah, it might be that information was in the index. But because it post-cras didn't know for sure.

Starting point is 00:10:45 that it hadn't changed, it still went and got it from the heap. So it might be exactly the same information that it already had from the index, but there's this inefficiency of vacuum hasn't run recently enough. And it only takes, because it's that bit of zero or one, it only takes one tuple on the page to have changed for any of the tuples on that page to lose out on index only scans. I think if information is not in the index, you mean if we select columns which are not part of index definition,

Starting point is 00:11:12 it will be indexed scan immediately, not index only scan. Of course. And the index on the scan already means that information is present. Information means like values. But what is never is present in indexes, it's visibility information, the 6 Minix Max. So for that we need heap. But if visibility map, bit says for this page it's all visible, this can be skipped. So if the purchase number goes lower, ideally it should be zero, then it's perfect situation.

Starting point is 00:11:43 It's true index only scan. If it's high, it approaches the performance of index scan, which is like always consulting HIP. Naming should be better here, but my position on naming. Index scan could be renamed to something which would be explicitly mentioning that HIPP is always consulted. Anyway, this is a different topic. I think we covered quite well what AutoVacom does. And again, the main job is just garbage collection. Also, I wanted to mention autovacom is improving a lot from release to release.

Starting point is 00:12:21 And it's enabled by default, of course. It's enabled. It should be disabled only by experts. For example, PGQ, Skype engineers decided to disable AutoVacium for Q partitions. And it's also disabled in PGQ tool which is built on top of PGQ. Name is awful, right? So anyway, there auto-vacum is disabled by it. Normally, it shouldn't be, it should be never disabled and it's enabled by default and it's done, drop and so on. So it's just a garbage collection

Starting point is 00:12:54 plus additional features and it's improving from release to release. And funny fact, I recently purchased some Chinese modern auto-vacom for home with washing feature. It's very similar. like advancement like I had a few years ago I had a couple of them and they're now like so many features so it became much smarter and they have multiple functions so I enjoy how it works really but it requires some tuning as well yeah I was going to say Bruce Bruce brought this up in our episode with him didn't he said people complain about vacuum and bloat with Postgres and NBCC but he's saying which version are you complaining about because it is continually getting better but I would say you that some of the defaults are still very conservative or maybe make sense at smaller table sizes. And they do have scale factors built in, but they make less and less sense, I think, as tables get bigger some of them. So my position on defaults, they are all Bias defaults.

Starting point is 00:14:01 They are like for, I call it post-podgrass, podgrass, kitchen kettle, Postgres, which should have, I don't know, like, one gigabyte memory and some Raspberry Pi or something, SmartPot or something, this is it. For them, it's okay. But for any, even like small cluster with 8 cores, 16 gig of memory, it's not enough at all. And settings can be split to three categories for simplicity. Number of workers, which requires restart. The only thing which requires restarted number is number of workers. which is three by default. If you have eight cores, three maybe is okay. It's okay.

Starting point is 00:14:42 If you have 32, 64, 48, 96 course, three is not enough. But we see it all the time. People come to us, 96 course, three workers. This is one thing. You say it's not enough. I think you're right in the sense it's not a sensible default. Like why not give it more in case it needs it? Right.

Starting point is 00:15:09 But as you said, sometimes people come to you and they've only had three workers the whole time and somehow it's not fallen over. It's grinding through challenges. Like super bloated tables, super bloated indexes and it's a lot of dirt to clean. Which is great for us. We have opportunity to help them. Instead of framing how bad it is, I actually have it. It's actually our product strategy.

Starting point is 00:15:37 We frame it, oh, great, we can help you really quickly, really quickly. Right away, we have long-hanging fruits. Let's go. With our tooling and methodologies, it's super simple. This low-hanging fruit, let's go, and your database will feel and breathe quite soon, much better. You can even consider downgrading or postponing upgrade for a couple of years after this. Yeah. And this is great.

Starting point is 00:16:02 Yeah, this is one thing. And this is super important thing. We just discussed. This is how garbage collection works and updates the leads and sometimes inserts. They do just part of the job. Right? And these poor auto vacuum workers need to do the rest of the job. The tidying up after it.

Starting point is 00:16:23 Yeah. You have 1,000 max connections, meaning you have up to 1,000 backend workers which produce all those datapels. and only three that should catch up with them. It's nonsense, right? It feels as nonsense. So it should be tuned. And simple rule, like we say, like, actually, let's consider like 25% of VCPU account. But we should do it very properly thinking about memory management and so on.

Starting point is 00:16:54 I know there is a position in this view, but this is, it means my position. It's a simple case. Just let's give more workers, especially if you have partitioning. because you think funny fun fact a worker processing a table if it's a huge table it will be processing it like sequentially or table and its indexes and that's it if it's a partition table it can be distributed each partition can be you can parallelize this right so it makes sense to have more workers if you have more partitions and it's great to have more partitions and more workers together this is the best situation and two more categories one different defines frequency of revisiting frequency, how frequent workers come to do stuff. And another is it defines how fast they can, like throughput, basically, or throttling. I think throttling is the best way of thinking about it. Throttling, yeah, limiting.

Starting point is 00:17:51 It's cost limit and cost delay. And how often they visit defines by scale factors and threshold, but I'm okay to think about only scale factors, actually. at the first glance it's enough. So three, we have three scale factors today, right? Or maybe there is also in Postgis 18, there is auto vacuum vacuum max threshold. This is a new stuff in postgis 18, which changes the picture. So usually we talked about only scale factor and threshold.

Starting point is 00:18:23 Scale factor by default, three scale factors. But the basic one is auto vacuum, vacuum scale factor, right? And it is either 10 or 20%. So I think 20%, right? Yeah. So I suppose because the threshold by default is 0.2, right? So 20%. Let's see you've got a billion rows in the table. It doesn't wait to get to 200 million. It starts at 100 million. Yeah. So we don't allow more than 100 million, which is huge now. Maybe huge, maybe no. 100 million debt doubles in the table. It's quite a lot, though. Yeah. Yeah. So the basic is auto vacuum vacuum scale factor.

Starting point is 00:19:05 Yeah. This is the first thing. And it's 20% by default, which means you need to accumulate 20% of dead apples in the table before it starts garbage collection. When I just say this, I had recent experiences. When I said this to some startup CTO, which is growing and so on, he's like, oh, we don't have garbage collection before 20% of dirt accumulated. this is a very big of of course when things are small it's not a big deal

Starting point is 00:19:35 but when things get big this is just a huge number and it's not just let's say you have like almost terabyte table in terms of size billion rows and partitioned which is common

Starting point is 00:19:50 I see this 20% means 200 million rows that doubles right which is by the way, above that threshold introduced in post-guist 18. Yeah, but it's not just storage, right?

Starting point is 00:20:07 It's also polluting caches and indexed. It's everything. Oh, yes. Oh, yes. This is a big point. And actually, let me add one important notice here. Usually, it's not a big deal for smaller databases, for smaller tables. But in some cases, we see local.

Starting point is 00:20:26 So if, for example, one row updated many times, thousands of times, for one logical row, you have many versions. And globally, like, table is big. But for this, like, not lucky row, you have so many versions and they can be, might be, scattered among many pages. So they are sitting there as datables. And for this specific query, select where ID equals something. This means that all of them need to be fetched to be checked in terms of visibility for this transaction. And it's, it degrades for particular ID, like local degradation.

Starting point is 00:21:10 Most of queries are fine, but this, like, unlucky because it had a lot of updates or something, right? It became really degraded. And it's like, what's happening here? This is where P99s or something are good to have, because. average will hide it, average latency and so on. And this means you should be vacuuming more frequently, even on smaller tables, right? So this local degradation due to debt double for particular logical rows, debt double count is high.

Starting point is 00:21:44 So it's happening. And it's happening silently. That's a problem. And tooling, which is like lacking P99s and so on, is not helping us to understand. That's why still logging of slow queries helps and auto-explan helps to catch these like edge cases and start analyzing why it's happening. The next day you select the same row. Maybe it's already vacuumed and degradation has gone. But in that moment, if you like go back in time and select it, you see so many buffers this particular query.

Starting point is 00:22:21 it's very like its primary kill woke up, but too many buffers somehow. That's why. So it's important to feel it because it's really like subtle problem. It's hard to catch it if you don't know where to look at. Yeah. And you're right. So scale factors. And since PostGar 13, there is also auto-vacuum, vacuum insert scale factor.

Starting point is 00:22:40 And there is also scale factor for analyzing. For insert, it's really great. I remember Darafay, the guy who from Belarus, I think, he introduced it. It was so obvious. Why didn't we have it? We insert a lot. We double table in size, but vacuum doesn't come. And we need vacuum, not to clean up that apples after successful insert,

Starting point is 00:23:00 but to rebuild visibility maps, first of all. And it's great because if you have select count or something, you want index only scan to be with low hip fetches, as you mentioned. So anyway, there are these scale factors, and it's enough at first glance to look only at them. Yes, like you can go deeper and think about fresh, and how like additional tuning. But scale factors is already enough.

Starting point is 00:23:24 You want to, you want them to be really low in all TP workloads. Just one percent. Sometimes two percent, sometimes half percent. I see people choose. But one percent is simple rule. Set it to one percent for all scale factors and benefit. And to give people an idea of how drastic a change that is, these scale factors are 20 percent, the vacuum threshold,

Starting point is 00:23:48 10 percent for the analyzed threshold. And I think 20% also for the insert scale factor. Yes. So dropping from 20 to 1 might feel like a huge change. It is quite a huge change. But you're not repeating the same work, right? Like it's not, there is a tiny bit of repetition. Yeah.

Starting point is 00:24:06 It's this is a slight difference to analyze, isn't it? In that if you run vacuum twice in quick succession on the same table, it's not doing all of the same work again. It's done a lot of the cleanup. It's like tidings. Some work can be done again. For example, it's very nuanced here. It's super interesting.

Starting point is 00:24:26 Like you mentioned, it's not only about data storage. It's polluting memory, wall. Wall means replication and backups. It's polluting everything. And memory is expensive these days, right? So we should optimize, not about data storage, we should think about how many pages we need to keep in this working set. But the ideal situation, when you raise workers, make a vacuum to visit us frequently,

Starting point is 00:24:57 with lowering scale factors, all of them to say 1% is a very rough rule here, but also partition quite well. Because in this case, some partitions, especially if it's historic data, they will be processed with marked all visible, all frozen, and very rarely those pages will be touched. So vacuuming will be like a. like skipping, right? Unlike if it's a huge, messy heap with terabyte in size or so, and any page can be touched at any given time. It means it will be need to process again. Of course, it means that if you have infrequent vacuuming of a huge table, of course,

Starting point is 00:25:39 you're doing work less because maybe this page was visited multiple times before it was vacuumed. And if you start vacuum frequently and between two processing, this page is revisited by some rates, then it needs to be vacuumed again. So it's multiple processing of the same page versus a single processing. So we cannot say it will be, but it's highly optimized. Vacuum is highly optimized. But if you have a huge table, not partition, maybe it will be like high frequency vacuuming will cost you some extra, but it's worth it usually. So there are nuances here and trade-offs, of course.

Starting point is 00:26:21 But usually in all TP, we very strongly recommend to lower down scale factors so vacuum happens more frequently. And the fact is that if you own managed service, I see it's quite common, this is a big problem, RDS included, for example. They don't care about this somehow and they keep defaults. Yeah. I would like to hear their reasoning. why they don't proactively tune it for the customers. Because these scale factors makes, because they're a scale factor, right, they can make sense at any scale, right?

Starting point is 00:26:57 Like, it's fine when you're small. These are inexpensive things to run on small tables. And it works when it's big. Like, there's no reason not to tune it from the start for all customers, right? You don't have to be clever about which instant size. And I don't complain because, as I said, I enjoy when I see the problem. I'm excited to help and have long-hanging fruits and see how health next day is much better.

Starting point is 00:27:22 But of course, we need also automated re-indexing. We need PG-Repack to be used and so on. By the way, we mentioned that PostGGG19 will have. Yeah. Repack and repack concurrently. That's awesome. And we should somehow revisit this topic, maybe inviting the guy who made it, right? Yeah, I think it was

Starting point is 00:27:47 Yeah, Antonin was, we had him on, didn't we, to talk about PG squeeze Which inspired the work Remember, in the very end of episode, I learned about these plans to make vacuum full concurrently, basically, which is like became repack concurrently. I couldn't believe, I had like doubts, is it like, but here we go, like this is happening a year later, it's planned to be released.

Starting point is 00:28:10 I hope it won't be reverted. It's a super important feature to have. It's interesting. I have to admit when I, yeah, I saw that repack, they introduced effectively a rename of vacuum fall to repack and that was going to make 19, but it didn't have a concurrently option. And I was like, oh, that's such a shame, but it makes sense. Another year of waiting. Yeah.

Starting point is 00:28:31 And then about a week later, repack concurrently gets committed as well. So, yeah, that was great. Yeah, it was like, it was a mood swing for me as well. It was like, oh, no. Oh. Yeah. So anyway, back to scale factor. This is probably one of the most important to lower scale factors for all TPP workloads

Starting point is 00:28:52 and benefit from more frequent vacuuming and analyzing as well, right? Just a question, is there any stage that people are at where you wouldn't recommend this? I was thinking when should people be thinking about these things, but I think those two are such simple, or those three, let's say, are such simple settings to change. You could change them from the start and never think about it again. like clearly i was thinking this is complex and and this is complex thing in terms of we cannot talk about auto vacuum tuning completely without analyzing what's blocking work if anything yeah yes and that's why i'm like we recently developed a dashboard for xmin horizon analysis because

Starting point is 00:29:35 we yeah we probably need to revisit it once again in terms of long transactions versus pure xmine horizon blocker analysis and auto vacuum throughput analysis. And again, like this, I cannot recommend enough Lawrence Albers recent blockpost about how he monitors auto vacuum. Combining with what how we monitor auto vacuum, I think it's much like it's we leveled up monitoring of auto vacuum blockers, basically, right? So X-Men Horizon, there are five reasons why it can be blocked. And if you keep those problems unresolved for our, sometimes dozens of hours, tuning auto vacuum to run more frequently, especially if you reduce also nap time, how long it sleeps between runs and raised workers.

Starting point is 00:30:25 And also, like some problems might happen. For example, it might start consuming too much resources, just trying to do something. It cannot do something. And also logging can be, you can have observer effect. If there is a, like, it tries to do something, it cannot, but it reports. and many workers report. And if you, at extreme, you have log to vacuum in duration set to zero, meaning all attempts are logged.

Starting point is 00:30:51 In this case, if you have 25 workers, for example, and they cannot do, and they nap time one second, it's a storm of logs. This is one problem I can definitely imagine. You mean one millisecond? No, by default nap time is 30 seconds. What am I thinking of that's two milliseconds by default? Oh, actually, I was wrong. I'm looking at it right.

Starting point is 00:31:12 now it's 60, one minute. Nap time is by default one minute, which is quite long in my opinion. So we usually say, let's choose a few seconds. But I admit aggressive tuning without resolution of X-Men Horizon blockers might lead to storm of logging and resource utilization as well. So this is what I can think of from top of my head. Yeah, so sorry, I was thinking of cost, there's a cost delay. Right. This is a, the last thing we haven't touched. So as I said, there are like throughput in terms of number of workers. There is frequency, how often vacuum and analyzing happens. And also the remaining piece, big piece, is throttling. You said we should say throttling, right? So two parameters. And in post-guess,

Starting point is 00:31:59 who is responsible for naming? So there is an attempt to save on some defaults or something, there is some parameters, they have relationships. You know, inheritance, right? So there are two parameters, cost delay and cost limit. There are vacuum cost delay, vacuum cost limit, this pair. And by default, I think cost delay is zero means it's unthrottled. If you run vacuum manually, it's unthrottled. It tries to go full speed, usually bounded by I.O.

Starting point is 00:32:31 Disc. Right. And there is also auto vacuum, vacuum, vacuum cost delay, auto vacuum, vacuum, cost limit, two parameters, one of which I think auto vacuum, vacuum, vacuum, vacuum, cost limit is set to minus 1, which means it inherits from vacuum cost limit, which is set to two milliseconds, right? It was 200 milliseconds and then it was or 20 milliseconds in the past before post-guess. Yeah, the delay is, I've just looked it up there, auto-vacuum vacuum cost delay is two milliseconds. Yeah, I can confuse everything. I see. So like auto-vacom, vacuum cost delay is two

Starting point is 00:33:08 with the seconds. And it was before Postgrease 12, it was 30. So became two in Postgreaves 12, which is great. And auto vacuum, vacuum cost limit is minus one. Means that it inherits from vacuum cost limit, which is by default, 200. Which is like an arbitrary unit of like work, right? And there's some cost. Yeah. Yeah. Are your CPU cycles? There is some math. what I don't like about acquisition of EDB of second quadrant by EDB, they just killed the block. Block was brilliant.

Starting point is 00:33:47 And they had a great block post about some math. If you think only about IO part, all defaults before PostGGOS 12 yielded to 8 maybe bytes per second of reeds. So with 200 and 20 milliseconds, it was roughly 8 maybe bytes of reeds. So bumping 20 milliseconds to 2 milliseconds to 2 milliseconds.

Starting point is 00:34:08 I remember it was first considered to raise cost limit, but then decided to reduce frequency of analyzing of this frotting, right, because it makes things smoother, right? So going down from 20 milliseconds to 2 milliseconds and maintaining cost limit 200 inherited from vacuum cost limit. I don't like this. Just put straightforward defaults. It doesn't hurt. Minus 1 makes life of humans much harder, much harder, right?

Starting point is 00:34:39 Like it's, okay, anyway, maybe I should just propose it. Maybe it wasn't proposed. Maybe it was proposed. I don't know. My battle over dozens of, over decades already. This is happening decades, this suffering from memorizing what minus 1 means. We forgot actually memory management, fourth piece, right? There is minus 1 there as well.

Starting point is 00:35:01 Autovacian work mem is minus 1 by default. which means you need to go and consult and there is a big, interesting trick happened in PostGuS17, in the separate story. That's maintenance work member, right? Yeah, maintenance work memory. If you have time, we can talk about that as well. Dive to it a little bit.

Starting point is 00:35:18 So minus one for auto-vacom vacuum cost limit, inherited 200 this is like arbitrary cost, not arbitrary actually, but some costs like two milliseconds. Since PostGovies 12 meaning in all current versions, it means by default that can just take reads of 80 maybe bytes per second roughly

Starting point is 00:35:38 which is not enough in modern systems and it's shared among all workers unless you start tuning at table level which makes things much more complex because then quotas with throttling becomes local for this table for this worker which is processing

Starting point is 00:35:54 specific table I prefer not to go there we can have a separate episode for table level analysis tuning table level tuning. So big picture is like by default it has only 80 maybe bytes per second of reads and if you have powerful even EBS volume which I don't know it's a nitro old not nitro architecture so it definitely in VME like it has good throughput 80 is tiny among all workers

Starting point is 00:36:27 again this is garbage collection for everything so it should have several hundred megabes per second at least but good thing is for example ardeus fixes this especially if on aurora we see enormous quota throttling there is quite like good in terms of like it has a lot of power but in many other cases we see it needs attention and we need to give more like capacity because it otherwise it's too slow and that's not good So it never catches up, if that makes sense. It never, yeah. Yeah, it's a kind of saturation.

Starting point is 00:37:09 You see all the workers you allocated. They're always busy. This is number one, how you can understand that something is not okay. And second, we have a special report. We have a queries. I have a blog post about that. And we also have it in our monitoring, like queue or line of tables, which sit there and should be already processed, but not processed yet.

Starting point is 00:37:30 and this like basically weight line for tables and I think Lawrence also has in his blog post a slightly different analysis but the idea is the same like how bad is it the picture like it should be done already but not yet done this is how you also feel this situation happening from the other side of things

Starting point is 00:37:52 I saw a good post recently by Jeremy Schneider about what the risk of completely a moving, throttling. So he wrote a good post recently about an issue they encountered once. I think they reduced the... Yeah. I haven't seen the...

Starting point is 00:38:10 Yeah, sometimes people put to Kron. We haven't touched this specific cases related to multi-exact member situation when people have this approaching and they need to vacuum separately. Switching off indexes because index vacuuming, because for that problem you don't need... And this is actually another job to clean up those things. you don't need indexes to be involved

Starting point is 00:38:32 and this slows down the whole thing if you have many indexes so anyway I wonder I need to look at that but and when you do

Starting point is 00:38:41 manual vacuuming it's unthrotled as I said because vacuum cost delay is zero so it's not checking at all let's go but I think that's

Starting point is 00:38:50 isn't that because every now and again you want to run a manual vacuum as like an emergency procedure like yeah makes sense I agree but if it's throttled

Starting point is 00:39:00 and it's unthrotled actually this is super interesting for me in the past I had problems with unthrotled vacuum but it was when we had some enterprise disks and self-hosted postgust and our capacity was like 400 megabytes per second I also I'm very curious right now in postgis 18 with aIO and so on like how a single worker might be have remember I told you I was super impressed to see the PG base backup, which is single-threaded, and I expected 100 megabets per second of reads from it. Suddenly, it amazed me approaching like gigabyte per second of reads. In post-gast 18. So maybe vacuum also, I'm not sure. Like this I haven't visited. Yeah, I think it was me that suggested it might be AIO-related. Yeah, but I was, let me correct

Starting point is 00:39:51 myself. So there is fourth piece, memory is super important. And if we don't think about it, it might bite badly, especially if we raise workers. So all our clients, we say, okay, we raised workers, but jumping from, say, post-gust 15 to 17, you must limit memory for auto-vacumen advanced, because it's by default minus one means it takes the value from maintenance workmen. And maintenance workmen, people usually raise quite quickly thinking they will have much faster index creation. So they put two gigabytes or so. Now I would think like 25 workers, two gigableness. it's already a lot, right? And the problem is, like, before PostGar 17,

Starting point is 00:40:31 you might not notice it because auto-vacom workers suddenly limit themselves with one gigabyte, and then suddenly jump to no this limit anymore. No one-gigabyte limit anymore for auto-vacom workers. And if you have four gigabytes, I saw eight gigabytes like these things, right? And many workers, it's huge memory consumption. Or risk, like, not necessarily, but there's the risk that they could take. take up that much, right? It's not pre-allocated or anything. It's just that's the budget they have. It can influence, and it can increase risks off out of memory. Yeah, yeah, yeah. A killer.

Starting point is 00:41:08 So, it's not, but then you can set, yeah, great. But then you can set the auto vacuum at work mem. I've forgotten the exact name of it. Yeah, like half gigabyte. There's some math should be done there. And we usually like, we have it like some formulas. So usually half a gigabody. by it even maybe lower and it will be okay, right? It's interesting a topic. But yeah, so usually for maintenance work where, again, I wish like this minus one is killing me, like decayed already. Like why we don't put explicit value right away?

Starting point is 00:41:45 Why this like I need to go in some maze and remember it took me many years to settle in terms of this like relationship between vacuum. and auto-vacom settings. And for non-expert, who is not working with post-gess settings every day, it's impossible to memorize. It's super tricky. So you would have hard-coded

Starting point is 00:42:08 values for each, even if they start with the same value? Doesn't hurt. Yeah, so if we have maintenance workmen by default, I don't know, what is, like, 128 megabytes or so, like maybe more. Just copy-paste. This is not the place where we should save on copy-pasting. We shouldn't save, like,

Starting point is 00:42:25 few bytes. Minus one, it's two bytes, right? 120. It's like, okay, a few bytes more. This is not a place where we should save it, right? This is, this is complexity, extra complexity, which is like, okay, I need to redirect this frustration to proper replace. Yeah. By the way, it's 64 megabytes default maintenance work, man. Podgrass, yeah, podgrass. Yeah. All right. Awesome. And actually, the last thing I wanted to say, so as a little

Starting point is 00:42:55 summary, if people have less than eight VCPUs, maybe leave, you can probably leave the workers in place, but still change the thresholds. And it's just those three thresholds will get you like 90% of the way there, probably. I think most of reduce, you said down to 1% I think that makes a lot of sense. I've generally advised

Starting point is 00:43:17 people to 5% or 2% as like a starting point, but 1% makes a lot of sense. It's a bit more, but I think it's... It won't save you from, in large tables, it won't save you from this problem of local degradation for particular rows. It's just like maybe you need to think about it additionally. But overall, it's like good, very rough tuning for all the people.

Starting point is 00:43:40 Yeah, absolutely. I think that will get a lot of people a lot further. Most providers let you set that globally. I did come across one recently that only lets you change these things at a table level, at which point my advice would be focus on you and, important tables and just change a few of those. But that was only one provider. Or just migrate.

Starting point is 00:44:01 Or migrate, yeah. That small task of migrating database. But yeah, so that sounds wise. And then yeah, memory, tuning, make sure it's not using maintenance work. Memory if you've up to that. Yeah, there are other parameters,

Starting point is 00:44:17 but it's like already deeper things. Yeah, so. Sounds good. My simple rough tuning list raise a number of workers. This requires restart, unfortunately. Then make visits more frequent, decreasing scale factors to 1%. For example, roughly.

Starting point is 00:44:37 And give more power unless it's already done by provider, raising, for example, raising cost limit. Oh, yeah, cost limit, right? And default settings, as I said, PostGar's default settings because not provider's default settings. It's roughly 80, maybe bytes per second, for discreet's think how much you want like maybe just raise it maybe sometimes four or five times depending on this capacity it's very important to understand that and finally work ma'am set it explicitly

Starting point is 00:45:08 minus one sucks in all senses it's just very hard to deal with i like that i like saying it explicitly even if you plan to keep it the same as your maintenance workmen yes like free people before upgrading to 17 for example or before touching number of auto vacuum workers. Yeah. Just make it explicit. So to simplify math analysis, including with AI,

Starting point is 00:45:30 so it won't jump additional, like mind hope. And just avoiding that future change where somebody wants to increase maintenance work, then doesn't think at that moment about auto vacuum. Because these are two separate things.

Starting point is 00:45:45 Operational, they are very separate. Indexing and vacuuming, auto vacuuming. Yeah. Yeah. And that's it. These four areas for rough tuning, it's enough. It should

Starting point is 00:45:56 become a blood situation much better, but there are also blockers. If you can tune a lot, tune everything perfectly, but if you have some X-Men Horizon blocker, and we have an episode about X-Men Horizon,

Starting point is 00:46:11 then it won't help. Blood will accumulate anyway. Great. All right. Good. Yeah. Nice one, Nick. Thanks so much. Yeah, I hope it was helpful see you next time see you next time bye

Postgres FM - autovacuum

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.