Postgres FM - Out of disk

Episode Date: July 19, 2024

Nikolay and Michael discuss Postgres running out of disk space — including what happens, what can cause it, how to recover, and most importantly, how to prevent it from happening in the first place. Here are some links to things they mentioned:

Disk Full (docs) https://www.postgresql.org/docs/current/disk-full.html
pgcompacttable https://github.com/dataegret/pgcompacttable
Our episode on massive deletes https://postgres.fm/episodes/massive-deletes
Getting Rid of Data (slides from VLDB 2019 keynote by Tova Milo)
pg_tier https://github.com/tembo-io/pg_tier
Data tiering in Timescale Cloud https://docs.timescale.com/use-timescale/latest/data-tiering/
Postgres is Out of Disk and How to Recover (blog post by Elizabeth Christensen) https://www.crunchydata.com/blog/postgres-is-out-of-disk-and-how-to-recover-the-dos-and-donts
max_slot_wal_keep_size https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
Our episode on checkpoint tuning https://postgres.fm/episodes/wal-and-checkpoint-tuning
Aiven docs on full disk issues https://aiven.io/docs/products/postgresql/howto/prevent-full-disk

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.AI. Hello, Nikolay. What are we talking about this week? Hi, Michael. We are out of disk space. What to do?
Starting point is 00:00:15 Panic. This is what Postgres does, right? Yeah, in some cases, right? Yeah. Well, I saw it a lot. Recently saw it again, actually. And yeah, it's not fun. Why panic? Don't panic, right? Keep calm and carry on, right? Yeah. Just use Postgres, you know. Yeah. So, as we discussed before this call, before recording, let's just talk about possible reasons, mitigation and avoidance, right?
Starting point is 00:00:51 Yeah. Prevention. Prevention is better. Avoidance. Avoidance. Denial. It's like the stages of grief. It's not my problem, actually. Yeah, I'm a DBA. It's a SRE problem, right? They should just give us more space.
Starting point is 00:01:07 To start about problems, I mentioned it before the call, but let's repeat. The only reason, the only true reason of this is an insufficient disk space allocation. It's always so. And it's a joke, but we will be returning to this joke, I'm quite sure. Because sometimes you do need more space. Of course, I would
Starting point is 00:01:34 deprioritize this path and consider everything else. But sometimes we end up with this reason, and the mitigation action is just to add disk space. But let's consider this as a last, uh, resort, right? Well, yeah, I like your structure: causes, then recovery, and then mitigation, or prevention. Mitigation is recovery, yeah. Okay, great. Prevention being the
Starting point is 00:02:03 last one. So, causes: what are the most common reasons you see this happening to people in the wild? Good question. There are many reasons. I don't have statistics. I can tell you the one I see most blog posts about, because I don't see this that often, but the one that comes up time and time again in blog posts is an open replication slot. Oh, that's, like, too narrow. Let's just say a lot of WAL written. Yes. Yeah.
Starting point is 00:02:36 And it remains, a lot of WAL files remain in the pg_wal directory, or pg_xlog if we talk about ancient versions of Postgres. Yes. But that's the root cause of why it's the most common one I've seen. I think it's a super popular reason, but I would structure it like this: if we need to provide a comprehensive list of reasons, I would put it in second place, and in first place I would put, and not only because of slots, slots is just one subsection here, the whole section is just the pg_wal directory taking a lot of disk space. This is a big class of reasons, right? But as the very first class of reasons I would say anything related to the data directory itself. Like a lot of disk space consumed by tables and indexes, right?
Starting point is 00:03:29 It's also like a whole class, so we have specific reasons inside it. Yeah, so we actually have more data than the size of the disk. Than we expected. We try to delete, but the space is not free, and so on. Lots of bloat, for example. Data-related, wall-related, and the third class probably or category is probably anything else.
Starting point is 00:03:54 Anything else is tricky and interesting. We will talk about it soon, right? So how about this classification? Yeah, just to give people an idea of the anything else, like you including logs in that, like things that can grow quickly that aren't data, like main data directory stuff or wall. Base subdirectory is where data is stored.
Starting point is 00:04:18 So if you think about the PGDATA directory structure, we can think about base, so tables and indexes, right? Materialized views as well. And then the pg_wal subdirectory, and it is huge, for example. And everything else. Everything else is interesting as well. Maybe also outside of PGDATA, but let's not reveal secrets before it's time. So what about tables and indexes? What do you think are possible specific reasons? Like, I deleted 90% of my table but it didn't help, disk consumption is still at 99%, what to do, time to panic, right? So, like, obviously DELETE doesn't delete data, right? It just marks it deleted, it's a two-stage process. Well, and this is
Starting point is 00:05:16 a tricky one, because a lot of the times we've discussed how to deal with issues like this, we actually require quite a lot of disk space in order to repack, or, um, to actually mitigate the problem. The easiest way, the way to do that in an online fashion, requires double the size of the relation, or double the table size, normally. So it is a tricky one in this case. Exactly. Just a few days ago, I was discussing this exact problem with one of our customers.
Starting point is 00:05:53 And they mentioned that they have a huge table which makes up like 90% of their database. And they know there is bloat. And to fight bloat, especially in the heap, obviously you need to use pg_repack. And to use it, you need the same size again. Temporarily, you need to have two large tables of the same size
Starting point is 00:06:19 because it rebuilds the whole table in background and then switches to it. So yeah, this is a problem. And if you don't have enough disk space, oops. But yeah, so I would say this is edge case. Normally, there is no single large table, which is like 90% of whole database, right? So normally, it's not a big problem. And if we keep free disk space 30-40%, which should be so, like at least 25% maybe, we have this space.
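Since the online approach being described needs roughly the size of the table (plus its indexes) in temporary free space, a rough pre-check is to compare your largest relations against the headroom on the volume. A minimal sketch (the LIMIT and schema filter are arbitrary choices):

```sql
-- Largest tables, including their indexes and TOAST:
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```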
Starting point is 00:06:51 And we know it's just temporary, just while we're repacking the heap, the table, right? Not a problem usually. But if you're in this situation, well, not good. Maybe you should do something else. There are alternatives to pg_repack. I think pg_squeeze, the alternative from Cybertec, also has this issue. It requires additional disk space because it rebuilds the table. But for older solutions, in this particular edge case, I think I would return to solutions which have existed for quite some time, several implementations of not "let's rebuild the table" but another idea. It's called pgcompacttable. Again, there are probably three attempts from different people to implement this idea. And all of them, I think, are in Perl.
Starting point is 00:07:51 But the idea is, let's issue some updates which don't change data. When you issue an update, even saying UPDATE ... SET id = id WHERE id = some number, you know a new tuple is always created, right? So this tool issues these updates in a smart way: it checks which tuples are at the end of the table, and it knows there is bloat, so there is empty space in the first pages, for example. So if we update tuples which sit at the end of the table, they will move to the beginning of the table. And once there are no
Starting point is 00:08:30 tuples left in the final page, final block, it will be truncated. By default, this is what vacuum does unless it's turned off. And then we do the same with next from the end,
Starting point is 00:08:47 the next page, and so on and so on. Like basically, this is similar to like, I remember in Windows, like defragmentation, right? So it's moving data to different places. Do you remember that defrag screen on Windows? Yeah, yeah, yeah. People love that. You could visualize it.
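The trick described here can be sketched in plain SQL. This is only an illustration of the idea, with a hypothetical table named bloated and a made-up ctid; the real tools pick target tuples automatically and handle edge cases such as HOT updates keeping the new row version on the same page.

```sql
-- A no-op update: the value doesn't change, but a new row version is written,
-- ideally into free space earlier in the heap, so the last pages empty out over time.
UPDATE bloated
SET id = id
WHERE ctid = '(123456,1)';   -- a tuple sitting in one of the table's last pages

-- Once the trailing pages are completely empty, vacuum can truncate the file:
VACUUM bloated;
```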
Starting point is 00:09:02 Yeah, yeah, yeah. Maybe like red and blue, right? So yeah, and this is a much slower process than using pg_repack for a table. But also, in our case, if we notice we've got, you know, 99% disk full, maybe this kind of thing helps, but if we're actually out of disk space, is even this an option? Yeah, yeah, yeah. I'm talking about mitigation here,
Starting point is 00:09:31 maybe too early. But of course, if we already achieved 100%, we cannot move. It's not what I would do. But you just raised a very good point about extra disk space needed. And I just, like,
Starting point is 00:09:46 I just remember there are alternative solutions and I think they don't require any extensions. So they can, could move, could be used in anywhere. Sure. Like RDS, cloud SQL,
Starting point is 00:09:57 anything like you, it's just simple updates, super slow, but not requiring extra resources, which is great. Also, probably you will need to make sure the tool is working well with current version of Postgres because they are quite old. And again, there are
Starting point is 00:10:14 three of them. I think depesz also tried to implement one of them. That makes sense that it's in Perl then. This is a small zoo of ancient tools. And I like the idea, actually. But it's not an emergency tool. It's more of a right-before-emergency tool, trying to avoid the emergency. But as for indexes, probably yes, when we rebuild an index concurrently, Postgres
Starting point is 00:10:45 builds another index, but I think this should not be super noticeable, because, yes, it requires extra disk space, but we should have that space if we do it, right? So, anyway, if we delete, we should think about,
Starting point is 00:11:01 okay, dead tuples and bloat, keep them in mind. Try to avoid massive deletes, or design them properly with MVCC and vacuum behavior in mind. We did a whole episode on that, I think. Right, exactly.
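What "designing massive deletes properly" can look like is sketched below; the table, column, batch size and retention interval are all hypothetical.

```sql
-- Delete in small batches so autovacuum can mark the freed space reusable
-- between iterations, instead of producing one huge wave of dead tuples.
DELETE FROM events
WHERE id IN (
  SELECT id
  FROM events
  WHERE created_at < now() - interval '90 days'
  ORDER BY id
  LIMIT 10000
);
-- Repeat until zero rows are affected, pausing between iterations.
```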
Starting point is 00:11:18 So, yeah. I saw crazy cases when some tables and indexes were bloated like 99%. Yeah. And you saw these cause out-of-disk issues. Well, they contributed a lot. And if we keep it as is, we will soon be out of disk space.
Starting point is 00:11:38 Right? So it's definitely worth keeping an eye on the situation in this area, bloat and vacuum behavior. And yeah, if you don't do it, or do it very infrequently, then once in, like, five years you take care of it, and then you realize that out of the 10 terabytes of disk space you had, now only one terabyte is actually used, and what do you do with the other nine terabytes? You keep paying the cloud provider, right? Because it's easy to add disk space, but it's not easy to reduce it, because they don't have such a function, and it's possible only if you create a new standby cluster with a smaller disk and then just switch over to it, right? Yep.
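To keep an eye on bloat before it gets to that point, the pgstattuple extension (shipped in contrib) gives measured numbers rather than estimates; the table name below is just an example.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Exact but heavy: reads the whole table.
SELECT * FROM pgstattuple('public.orders');

-- Cheaper approximation, usually good enough for monitoring:
SELECT * FROM pgstattuple_approx('public.orders');
```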
Starting point is 00:12:29 But yeah, it's quite funny seeing that message. You know, this is a one-way ticket. One-way ticket, yeah. So I think those make sense, right? Normal data exceeding the size of the disk, whether that's actual data or whether that's bloat, makes perfect
Starting point is 00:12:45 sense. Do you want to talk any more about Of course. Actually, I still believe since VLDB in Los Angeles I think in 2018, I still believe, in the keynote there, I like this keynote, about
Starting point is 00:13:02 data deletion is a huge, big problem we engineers need to solve and find better solutions. So sometimes it's not about bloat. It's about a lot of garbage. Duplicated data also. And I saw many times when we start paying attention to bloat and pushing customers to fight with it, suddenly they say, you know what? I had it like maybe a month ago.
Starting point is 00:13:31 You know what? We don't need this table. Let's just not spend time on it and just drop it. Drop this table, or truncate it. We don't need the data. And it's like, wow, some terabytes are free. Yeah. I see this with, like, analytics data, for example: people tracking everything, and they never
Starting point is 00:13:52 look at their analytics, they never look at who's using which feature, but they track everything because "we might need it someday", or, you know, there's all these data use cases. And even the solution to this, like, in terms of... I've heard the phrase retention policy. I know it's not quite the same topic, but talking about how long we should retain data for, even there we don't mention deletion. It's not a deletion policy, it's a retention policy, which is quite funny. Also, maybe one of the last things in this section I would mention: it's becoming more popular, instead of deleting data, to have some better approach, tiering, tiers of data storage. And I know from Tembo there is an extension called pg_tier. First of all, Timescale has this, yeah, right, but in cloud only, I think it's not open source.
Starting point is 00:14:47 If I'm not wrong. So the idea is, let's consider some data archived, and it still looks like it's present in Postgres, but it's not stored on an expensive disk, which has also capacity limits. We offload it to object storage, like S3 or AWS. And this is a great idea, actually. But I think some latency issues will appear and also maybe some errors sometimes, inconsistencies or so.
Starting point is 00:15:19 It's interesting. But it's a great idea. And again, like I said, Tembo has the pg_tier extension, which I hope to test someday when I or my team has the opportunity and time for it. It's super interesting to check how it works. If you have some huge tables, maybe already partitioned, you can go either with sharding, or, if some historical data is not needed all the time, you can just offload it to object storage, which is virtually infinite. And then, okay, users experience somewhat worse latency when reading this old data.
Starting point is 00:16:02 Imagine e-commerce, for example. Just quickly, I feel like the old, kind of old-fashioned solution to this was tablespaces. Yeah, but tablespaces became not popular at all in the cloud reality. The last time I used tablespaces in a serious project was more than 10 years ago, I think. Yeah, but that's what I mean, we had this before, it just had a different name, maybe. Yeah, yeah, tablespaces. Yeah, you can have a cheaper disk attached to your machine, but I still don't like this particular approach, because I like offloading to object storage more. Because imagine you have a primary and multiple standby
Starting point is 00:16:45 nodes. If you use tablespaces and cheaper disks, you are forced to use the same structure of disks on each node, and it's still expensive, and slower, for example, right? While object storage, it's worse in terms of uptime compared to EBS volumes on AWS, for example, but it's much better in terms of reliability, or vice versa. Durability. Yeah, availability and durability. If you check SLAs, S3 is... I forget. So if you check the characteristics, you think, okay, this is what I actually
Starting point is 00:17:28 would like to have for archived data. Maybe latency will be slower, worse, right? But if you think about e-commerce and order history, for example, if you launch an e-commerce website, if you have
Starting point is 00:17:43 ability to offload all the activities and users touch them very rarely, but it's still good to have the ability to see the old order history. But it's not needed often, so we can keep it outside of Postgres and evict from caches and so on. And don't pay for our main cluster. We just pay for S3. And there we also can have tiering, right? Usually. Tiering.
Starting point is 00:18:12 And even automated. Usually providers have these features. If it's older than one month, it goes to some colder and cheaper space. So I think this direction will receive some more popularity and will be better developed and so on. Tiering for storage. Once again, we've been sucked into kind of like prevention. I think, yes, it's hard to talk about reasons. And then, yeah, like this is a big flaw of the structure I proposed.
Starting point is 00:18:50 Well, no, I like it still, but it feels like we haven't really touched on... The most popular, let's talk about it. Yeah. WAL, right? So pg_wal is huge. We identified it, which is actually already a good skill. Not everyone can do that, but if you manage to identify that your pg_wal directory is huge, there are a few certain reasons. And there is a good article from Cybertec about this,
Starting point is 00:19:20 why Postgres doesn't delete WAL files. This is exactly when we need to apply this article as the list of possible reasons and just exclude them one by one until we find our case, our real
Starting point is 00:19:38 reason. So one of them you mentioned: some replication slot, logical or physical, not progressing and accumulating a lot of changes. Well, it's not really accumulating, Postgres doesn't write to slots, it writes to pg_wal, a lot of WAL files, right? But a slot has a position, and if it's inactive, or the position is frozen, consumers don't consume and don't shift this position. That means Postgres must keep those WAL files in the pg_wal directory until they are consumed.
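A quick way to see how much WAL each slot is holding back on the primary, as a sketch (requires a role allowed to read pg_replication_slots):

```sql
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC NULLS LAST;
```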
Starting point is 00:20:14 And since recently, a few years ago, Postgres, I think maybe 13 or 14, received a setting to limit this, to have a threshold where we give up and say: better to kill our slot, to destroy our slot, but stop this situation, so the WAL files can be deleted. And I remember a big fear in a good engineer I worked with. He was not a Postgres person, he was more SRE, but with a lot of Postgres experience. And it was long ago, when replication slots for physical replication were initially introduced.
Starting point is 00:20:58 We had basically a small fight. I said, let's use it, it's a great feature. He said, no, no, no, I'm not going to use it. He said, this is super dangerous. And he understood that the danger is: if some replica is somehow stuck, the slot is not progressing, the primary runs out of disk space, and this is the last thing he wanted. After lost backups, of course. Right?
Starting point is 00:21:29 So, but over time, we started using slots. But then this setting, I think it's a great setting. You can understand your disk layout and how much free disk space you can afford. Like, if you approach, like, 90%, it's time to kill slots. So you can do some math and understand that maximum number of gigabytes you can allow for a lag is this. Of course, over time, database grows and still this, like it still can be a problem, right? Because if data grows, tables and indexes grow, and probably your setting will not save you from emergency, right?
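The setting being discussed is max_slot_wal_keep_size (linked in the show notes), available since Postgres 13. A minimal sketch, with an arbitrary example value:

```sql
-- Cap how much WAL any slot may retain; -1 (the default) means no limit.
ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();

-- Slots that fall behind the limit show up with wal_status 'unreserved' or 'lost';
-- safe_wal_size shows how much budget is left before that happens.
SELECT slot_name, wal_status, safe_wal_size
FROM pg_replication_slots;
```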
Starting point is 00:22:07 Maybe. But this is definitely one of the reasons. What else? What reasons? A failing archive_command, right? Because for Postgres, if archive_command is configured, it's important to archive. And if it's failing, for whatever reason, Postgres cannot archive, and if it cannot archive,
Starting point is 00:22:28 it cannot remove these WAL files. And that's a problem. So you should monitor archiving lag separately. If it's growing, it's already very bad for backups, but it's also bad because it can hit you in terms of disk space. So, yeah, these processes, replication and
Starting point is 00:22:51 archiving of WAL, are two processes that, if something is wrong with them, mean Postgres cannot remove WAL files. Another reason, actually... By the way, this can grow quickly. I think people don't always realize how quickly this can grow. Even with some database, yeah, it can be small, but a very active database can generate terabytes of WAL per day. It's actually good to know
Starting point is 00:23:16 how much you generate per day, or per hour, or per second. So I think even with low activity, as long as something's happening, every time there's a checkpoint... I've seen people with toy databases, like, you know, free-tier RDS, be surprised that it's generating a couple of gigabytes a day, like 64 megabytes per five minutes or so. It adds up quite quickly even with small databases. A funny reason I also can mention: you have a not huge but quite active database, and you think it's time for checkpoint tuning. We had an episode about it, checkpoint tuning. Let's raise max_wal_size and checkpoint_timeout. You raise them,
Starting point is 00:23:56 but you didn't do proper math in terms of how much disk space you have, and you end up with a pg_wal that is normally too big, because the distance between checkpoints increases and Postgres needs to keep more WAL files. And this can be just a configuration mistake.
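Two related checks mentioned above, as a sketch: whether archiving is failing, and how fast WAL is being generated (sample the second query periodically and subtract; '0/0' is just the zero LSN used as a fixed reference point).

```sql
-- Archiver health: a growing failed_count means WAL will pile up in pg_wal.
SELECT archived_count, last_archived_wal, last_archived_time,
       failed_count, last_failed_wal, last_failed_time
FROM pg_stat_archiver;

-- Current WAL position; diff two samples to get bytes per hour/day.
SELECT pg_current_wal_lsn() AS current_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')) AS wal_since_lsn_zero;
```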
Starting point is 00:24:18 Yeah. So, yeah. Like after this, I think we can move on and let's talk about other reasons. Yeah, what else do you see? I would put on the first place in this category log directory. Yeah. Especially if it's inside PgData, inside this main Postgres directory. Not especially, like no.
Starting point is 00:24:41 Especially if the log directory where Postgres writes logs is on the same drive as the data directory. For example, inside PGDATA, or just next to it, but on the same drive. If logs suddenly grow, like when we have some performance degradation and log_min_duration_statement is configured, or auto_explain. Or, for example, we decided, oh, we need more auditing, and pgAudit is a great tool.
Starting point is 00:25:08 All right, let's bring it in. It starts writing a lot to the logs. And if you didn't separate your data from logs, you should separate them. I'm not talking about pg_wal, I'm not a big fan of having a separate disk for pg_wal, actually. Oh, interesting. Yeah, it was a thing in the past, but recently I don't see benefits from doing this often.
Starting point is 00:25:34 Benchmarks don't prove it, and so on. So it depends, of course. If it's your own data center, maybe it's worth doing. But if you don't offload regular logs and keep them on the same disk, and then you bring in pgAudit and it suddenly starts writing gigabytes per, I don't know, hour, or maybe there is some problem with rotation of logs. Yeah. Like rotation or retention. So they are not deleted properly, for example.
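A sketch of the kind of logging setup being discussed here; the path and values are examples rather than recommendations, and logging_collector needs a restart to take effect.

```sql
ALTER SYSTEM SET logging_collector = on;                  -- requires a restart
ALTER SYSTEM SET log_directory = '/var/log/postgresql';   -- ideally a separate volume
ALTER SYSTEM SET log_rotation_age = '1d';
ALTER SYSTEM SET log_rotation_size = '100MB';
ALTER SYSTEM SET log_truncate_on_rotation = on;

-- Instead of log_min_duration_statement = 0, sample a fraction of statements
-- (these two settings exist since Postgres 13):
ALTER SYSTEM SET log_min_duration_sample = '100ms';
ALTER SYSTEM SET log_statement_sample_rate = 0.01;
SELECT pg_reload_conf();
```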
Starting point is 00:26:06 And then this can hit you badly, and you have a panic, actually, for your database, and selects are not working. But if you've offloaded logs to a different disk, at least with the logging collector enabled, the database actually will keep working fully. Then that's great. I actually recently, maybe a few months ago, had this experience. I got
Starting point is 00:26:28 used to it. Postgres is fully down if we are out of disk space, and I expected the same behavior when our log drive is full. If we are flooded, if it's full, I expect big problems. Postgres didn't notice almost. So, okay,
Starting point is 00:26:43 we cannot log, but we keep working. That's great, actually. So this is a super big benefit of having a different drive for regular Postgres logs. And even if it's slow, it can affect performance, right? But if it's 100% full, it's not that bad. Yeah, we've talked several times about excessive logging having an impact on performance, the observer effect. Yeah, and I've definitely done benchmarks. You know, this is the
Starting point is 00:27:13 age-old log_min_duration_statement = 0 thing. Why we recommend not, or I recommend never, setting it to zero: because you can generate gigabytes in, like, a 30-minute benchmark with pgbench or something. It surprised me how much it can grow. Yeah, it's actually a good idea, maybe, to ask our bot to benchmark with regular pgbench, and then to have sampling for logging of queries in one way or another, and just to see how the observer effect grows and kills your performance. Only people with small databases can recommend log_min_duration_statement = 0. Or quiet databases. Yeah, quiet, not active, small databases and small workloads, that's it. Or, I guess, super low, like, turn it on only very, very briefly, like a minute or two
Starting point is 00:28:19 or something yeah yeah so for for serious workloads you cannot do that. I already shared my experience putting big critical databases productions down. And yeah, don't repeat my mistakes. As well as logging, what else did you have in this category of things? Oh, that's a good question. Well, let's not maybe spend too much time discussing some unknowns or external things like we have, I don't know, something else installed on the same machine and using the same disks and suddenly we are out of disk space. Also, it can happen sometimes. Or maybe, for example, you do some troubleshooting, you have self-managed Postgres, you can SSH to the box, you started sampling every second some good logs, but do it to, for example, to home directory,
Starting point is 00:29:10 and then it's full or something like this. So you always need to be careful with that. But interesting case when, for example, you try to upgrade, you have Patroni, or I don't remember exact details, but in some cases, Patroni decides to retry provisioning of a standby node, for example, and it fetches PgData in the way you configured
Starting point is 00:29:34 with pg_basebackup or from archives, but it renames the old directory to keep it as a backup, a local backup, right? It just renames it, and you suddenly end up having two full-fledged data directories, right? And you're like, wow,
Starting point is 00:29:50 it can quickly be a problem in terms of disk space. Does that mean like if you had, let's say your data director is about 40% of your disk size, so you thought you had loads of headroom, suddenly you're at 80% plus whatever else is on there.
Starting point is 00:30:06 Yeah. Yeah. Okay. Maybe large temporary files, but it's quite exotic. I like, it can be, they can be huge,
Starting point is 00:30:12 right? But it's like, it's exotic. I saw some Stack Overflow posts mentioned that, you know, people that were out of disk got the error, but when they checked, they did have free disk space.
Starting point is 00:30:22 And one of the suggested answers mentioned check for temp file. Like your queries could be doing a lot of temp file stuff. But yeah, I've never seen that myself. What about, well, backups is an interesting one. Like backups, if you, because some people recommend keeping some, obviously keeping all your backups on the same disk as your database is suboptimal for recovery purposes. But keeping some on it makes sense, especially for huge databases, right?
Starting point is 00:30:49 I don't think so. I would not recommend doing this. But what about for recovery, like no network? I'm just thinking about like... Regular backups should be separate, of course. But if you do some operations manually, for example, you decided to back up some tables and just you know like dump them to some compressed form and keep on the same drive well it might happen but
Starting point is 00:31:12 it's like there are many mistakes you can make manually. Fine, fine. Or you just run CREATE TABLE ... AS SELECT, for example, you basically just clone the table and then forget about it. Well, many such mistakes can be made. I think there are many exotic situations we don't discuss here. And I'm curious whether our listeners have had some interesting situation we didn't discuss yet. It would be interesting. I think many exotic things can happen. Let's talk about what to do in an emergency, right? Yeah.
Starting point is 00:31:46 First thing, understand the reasons, right? Or maybe not. No, I don't think so, yeah. I mean, bear in mind, if we're actually in an emergency, our database is not only not accepting updates and writes and things, but it's also, in a lot of cases, not accepting selects, which... Which is ridiculous, right?
Starting point is 00:32:07 You shared with me before our call something from Aiven, which says that if the managed Postgres service's automation understands that we are approaching full disk space, it converts our database to a read-only state, just by setting a parameter, which actually any client can override. So it's weak protection. default_transaction_read_only turned on, this parameter, default_transaction_read_only. I like the idea, actually, because it's weak protection, but it's quite reliable if you don't have this set to off all the time in your code, and usually there are low
Starting point is 00:32:50 chances you have it in your code, so it means that it will protect you, and then you have alerts, you need to have proper monitoring and so on. But this is more of a prevention measure, maybe. Yeah, but if you're in an emergency, this... I think we should write some good how-to, a general how-to for this case. I did check your how-tos and I was surprised not to find one for this, actually. It's a miss, I think. Also, one thing worth mentioning: nobody's going to be listening to a podcast, half an hour in or whatever, in that moment, because they're panicking. Yeah.
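The parameter in question is default_transaction_read_only; a sketch of the kind of soft guard described above:

```sql
-- Flip new transactions to read-only when disk is nearly full:
ALTER SYSTEM SET default_transaction_read_only = on;
SELECT pg_reload_conf();

-- It is only a soft guard: any session can still override it, e.g.
--   SET default_transaction_read_only = off;
-- so it buys time and triggers alerts rather than enforcing anything.
```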
Starting point is 00:33:26 Yeah. But one thing to remember that does seem to catch people out, and hopefully this won't happen to anybody listening here: people see that WAL stands for write-ahead log, assume that it's logs and therefore can be deleted. And then, because it's a
Starting point is 00:33:46 because it's a large amount of what they think are logs, they delete those to try and recover. And I think the main advice from me in terms of recovery, from every guide you'll read on this, is: don't do that. Yeah. I will say the main, like, classic approach is to understand the reasons and fix them, mitigate, right? But you might have time pressure. In this case, it's good if you can quickly identify something you can delete safely without big consequences. For example, regular Postgres logs. If you have a lot of logs, gigabytes, it's good to delete just one file or so, and start the database if it's down.
Starting point is 00:34:29 And then you do need to do full analysis and mitigate the real reason and so on. Or maybe you can delete something outside Postgres data directory, something. You can check quickly if there are different directories around which have some gigabytes of data, or at least megabytes, so you can quickly delete and let Postgres behave normally for quite some time. It buys you some room for analysis and proper mitigation. But mitigation is inevitable.
Starting point is 00:35:02 You need to understand what happened. Is it WAL, or data, or something else? And then you need to mitigate. Yeah, I was reading some guides on this, there's a good one on the Crunchy Data blog, and they made a really good point that we don't all think of doing when you're in that kind of panic mode, where getting things back online is the primary objective. They recommend taking a file-system-level copy of everything in its current state, like taking a snapshot of the current directory as it is, before you start deleting things. Yeah, it's additional... Yeah, because it takes time and we're down. In the cloud, actually, if you don't see what you can delete quickly,
Starting point is 00:35:48 like if it's managed in the cloud, for example, you cannot go and run du and so on. But in this case, it's probably easier just to add a few percent of disk space quickly, to buy some room again, and then to investigate. Obvious approach, right? I also think, because of some reasons we're going to discuss in a bit, this is just so much less likely to happen in cloud environments.
Starting point is 00:36:11 It just feels like this is one of those features. There aren't that many. This is their job, right? Yeah, this is one of those two or three features that they really do manage and can manage for you automatically. Notify properly and so on right or just auto scale yeah you just don't hit this i don't like auto scaling this area because it's auto scaling of budgets well but it's also not solving the root cause of the problem if you're
Starting point is 00:36:40 if for several of those problems we talked about it's not your actual data growing it's not your like uh it's not your true tables and indexes it's right ahead logging going out of control because of an open replication or like some reason that it shouldn't be that much data so auto scaling those is it's just not solving the problem exactly so yeah like hiding dirt under the rug as long as you've been told when it has scaled up then at least you can go and investigate yeah so yeah i think we covered quite well mitigation in general and uh what do you think about yeah um, I think increasing makes sense. I saw some warnings not to increase in place necessarily.
Starting point is 00:37:30 Like maybe you should look at setting up a replica and failing over, but I thought that would be slower. Like I wasn't sure myself. I don't know. Like I think if you can add space or delete something safe, it's a good first step. Then you have a big step of analysis, proper analysis and mitigation of true reason, main reason why you approached it and not noticing. So a big part of mitigation, if you perform proper root cause analysis, I'm quite sure will be we didn't notice this and it's a problem itself. So our observability is bad.
Starting point is 00:38:08 Either monitoring or, probably, alerting. In fact, we have to go to prevention. Yeah, prevention, I think, is philosophically quite simple. The better way to prevent it is to understand, and to keep a good level of understanding of, how you spend your bytes and gigabytes of disk, so you understand
Starting point is 00:38:33 the layout, you understand WAL, real database data, whether it's actual data or a lot of bloat, and then you understand logs and everything else, and you understand: okay, we are good, and we understand how we spend our disk bytes. And if this understanding is good, everything will be fine and you will notice problems early. But it's hard to do if you have hundreds or thousands of clusters, right?
Starting point is 00:39:03 Yeah. In this case, you do need some rules to be alerted on: if bloat is high, if pg_wal grows without good control, if logs are not rotated properly or consume too much. You just need to have good observability, right, to have mitigation, and then, with an understanding of possible reasons, you navigate those reasons and find the mitigation. But the key to prevention is understanding, and understanding is observability, right? Yeah, two other things, I think. Whilst I'm kind of against auto-scaling in principle, I think, if you've got the option, going up in budget and having to migrate later back down to a smaller size
Starting point is 00:39:54 is so much better than being down, that I personally would have that on myself if I had that option. And I think also oversizing: disk is so cheap compared to everything else we're doing. Unless you really are in the, like, extreme, huh? Disk is not cheap if you have a lot of terabytes and many standby nodes, oh, it can be not cheap. But most of us aren't in that case. Like, most of us aren't. So yeah, I agree that for people with serious,
Starting point is 00:40:26 some people pay like millions for disk space, it can be a headache, and that's why I mentioned pg_tier as a potentially very good thing, and tiering is good, and with some design, understanding the structure of your disk space consumption, you design some rules, offload data properly, and keep everything under control, and that's good, this is good. But yeah, I think, imagine advising some startup and they had, like, some 10 gigabyte disk or something, and their database was only a few hundred megabytes. The cost of being on a 100 gigabyte disk instead is just so tiny. It's just such a small...
Starting point is 00:41:13 But yeah, I take your point. But again, I think observability can always be improved. Especially if we know that this is pg_wal, this is data, this is logs, and everything else, and we have some proper analysis and alerts for it. I think that's the bit that trips people up.
Starting point is 00:41:36 I think a lot of people actually have quite decent observability, they can see what's going on, but alerting: people aren't getting alerted early enough, or not at all. The one I see quite often is no alerts at all, for, you know, disk at 50% sends an email, 70% raises something more urgent, 90% sets off a page, these kinds of serious alarms. Yeah. You know, I know we need to go, we're already out of time, but let's mention some small practical advice at the end. If you cannot SSH to the box, for example, it's RDS or another managed
Starting point is 00:42:13 Postgres service, and Postgres is still alive, but you wonder why the disk usage goes up. At the SQL level, you can check what's happening in directories using pg_ls_dir, pg_ls_waldir. There is a set of administrative functions which allow you to inspect the contents of directories. And this is very helpful when you troubleshoot and try to understand what's happening. And maybe it can also be helpful in mitigation activities and in observability. For example, if you have a report triggered on "almost out of disk" or "usage exceeded 90%", then you can include some automated analysis to take a snapshot of what's happening in key directories, right? And understand what the consumers of disk space are. Just some advice I thought we could miss.
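A sketch of what such an automated snapshot could look like using those administrative functions (by default they are restricted to superusers and members of pg_monitor):

```sql
-- How much is sitting in pg_wal?
SELECT count(*) AS wal_files,
       pg_size_pretty(sum(size)) AS wal_total
FROM pg_ls_waldir();

-- Largest server log files:
SELECT name, pg_size_pretty(size) AS size, modification
FROM pg_ls_logdir()
ORDER BY size DESC
LIMIT 10;

-- And the databases themselves:
SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```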
Starting point is 00:43:11 Let's not miss it. That's it, I think, right? I think so. I hope none of you ever actually have to face this and you can all mitigate or, sorry, prevent it from happening. And yeah, thanks so much, Nikolai. Catch you next week. Thank you. Bye-bye.
