Postgres FM - Backups

Episode Date: September 29, 2023

Nikolay and Michael discuss Postgres backups — why we need them, what the options are, whether a dump is a backup or not, and some considerations for lowering RPO and RTO at scale.

Here are some links to some extra things they mentioned:

pg_dump https://www.postgresql.org/docs/current/app-pgdump.html
pg_basebackup https://www.postgresql.org/docs/current/app-pgbasebackup.html
pgBackRest https://github.com/pgbackrest/pgbackrest
WAL-G https://github.com/wal-g/wal-g
Barman https://github.com/EnterpriseDB/barman
Data loss at GitLab (2017) https://about.gitlab.com/blog/2017/02/01/gitlab-dot-com-database-incident/
Dev Deletes Entire Production Database, Chaos Ensues (YouTube video) https://www.youtube.com/watch?v=tLdRBsuvVKc
Our episode on corruption https://postgres.fm/episodes/corruption
DBLab Engine https://github.com/postgres-ai/database-lab-engine

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, episode 65. I'm Michael, this is Nikolay. Hey Nikolay, what are we talking about today? Hi Michael, let's discuss backups, finally. Yeah, this has been on our list for a while and I feel like this is a small win for me, because finally you picked the most boring topic that we've ever had on the list, so yeah, I'm happy. Boring, and also one of the most important ones, because if you ask some people why they migrate, for example, to RDS from self-managed Postgres, a very popular answer is: I'm afraid, I don't want to be responsible for backups anymore. Yes. Obvious reason, right? And not just backups, right? Like just the whole process around it.
Starting point is 00:00:50 Yeah. But backups are at the top of that list. Of course, the list can be long. Yeah. But backups, we are afraid to lose data. Of course, some projects are not afraid to lose data, and they value availability more than data loss risks. So sometimes one is more important than the other.
Starting point is 00:01:19 Sometimes both. Actually, in most cases both characteristics are important: to be 100% available, or like five nines, four nines, three nines available, and also not to tolerate any data loss. But I see there is a tendency to say, okay, we can be down for some time, but we cannot lose any data at all. Yeah, let's come back to that. I think that is definitely a really interesting discussion. I think we should quickly talk about the basics here first, I think that's a good trade-off discussion. But what are the basics? Why are backups something we need to care so much about? What are the main things we're protecting ourselves from, or against? Right, so a database system should not lose data. If we inserted something, it should be present there, and the main internal mechanism not to lose data is the write-ahead log, first of all. So if we perform some write, it can be an insert, update, delete, it first goes to this special system, the WAL, the write-ahead log, and it's written there first. And not only written in
Starting point is 00:02:36 memory, it's also written to disk. And if you have normal settings, if you committed your transaction, it means that the data is already on disk in the WAL. As for the normal data files, tables, indexes, they haven't necessarily received this change on disk. They received it in memory, but it's not synchronized to disk yet. This synchronization happens during checkpoints. But the WAL, it's definitely there. But unfortunately, sometimes we have... So, if we have a failure, the database system knows how to... Postgres knows how to reach the point of consistency just by replaying
Starting point is 00:03:23 the proper WAL records since the latest checkpoint. And the bigger the distance between checkpoints, the longer it takes to replay these changes, to redo these changes. So it means that if a crash happens, we are good, the database can recover itself. But unfortunately, sometimes we lose the whole server. Yeah, that's one reason, right? Yeah. Right. Or something is broken with the disk, for example, and we cannot achieve consistency at all and
Starting point is 00:03:50 we cannot start or something. So I think you're listing the thing people think about first, but that tends to not be the time I see backups actually used most often. It tends to be protecting people from themselves, or like customers doing things that they shouldn't have done or didn't mean to do. That's a great point. And this explains why replicas cannot serve for backup, for disaster recovery purposes. There is such a term, DR, disaster recovery. It's like a set of measures we take to avoid data loss and so on. So, right, so I explained only problems from hardware.
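For reference on the crash-recovery mechanics described above, here is a minimal, hedged way to inspect the standard settings involved (values shown are whatever your cluster has; nothing here is specific to the episode):

    psql -c "SELECT name, setting, unit
             FROM pg_settings
             WHERE name IN ('synchronous_commit', 'fsync',
                            'checkpoint_timeout', 'max_wal_size');"

With synchronous_commit and fsync at their defaults, a committed transaction is durable in the WAL; how far apart checkpoints land (checkpoint_timeout, max_wal_size) determines roughly how much WAL has to be replayed after a crash.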
Starting point is 00:04:32 You're explaining problems from humans, and they're also quite common. For example, I deleted something and want to restore it, but if you rely only on replicas, this change is already propagated to all those replicas, and it's hard to restore from there, right? Well, and I only mentioned humans that were doing things accidentally, and then we've also got to worry about people doing things maliciously, like ransomware attacks, that kind
Starting point is 00:04:56 of thing. There can be other reasons why backing up your data can be really important, and yeah, having a copy of it somewhere. Right, and in this case, not only do you need to back up your data, you also need to store it in some second location, because with this special type of attack, if a hacker steals access to a whole cloud account, all backups are in danger as well, not only the Postgres servers themselves.
Starting point is 00:05:24 And in this case, the risk of losing everything, including backups, is also not zero. And from this kind of problem, we can only be protected if we store data in some additional account and only a few people can access that account. This is hard to achieve, by the way. Especially, for example, if you are an RDS user, you cannot store backups on, for example, Google Cloud or Azure, because RDS doesn't allow you to access those backups directly. Right. You can store only dumps. Maybe it's time to discuss the difference between dumps and backups.
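For self-managed Postgres, one hedged sketch of the "second location" idea is to point the backup tool at object storage owned by a separate account, with credentials the main cloud account cannot use to delete anything. With WAL-G this is just environment configuration (bucket, region and credentials below are placeholders):

    # credentials belong to a dedicated backup account, not the main one
    export AWS_ACCESS_KEY_ID='...'             # placeholder
    export AWS_SECRET_ACCESS_KEY='...'         # placeholder
    export AWS_REGION='eu-central-1'           # placeholder
    export WALG_S3_PREFIX='s3://offsite-backup-bucket/prod-cluster'   # bucket in the second account
    wal-g backup-push "$PGDATA"                # push a full (base) backup off-site; $PGDATA is the data directory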
Starting point is 00:06:02 Yeah. Awesome. Well, yeah, I was just going to say, I wanted to make sure we didn't move on without doing that. I think there is one more as well, actually, before we do move on, and that's corruption. I think it gets mentioned a lot, but I tend to not see people,
Starting point is 00:06:19 I don't tend to see it being the most used time for backups. So people mention that your data could be corrupted, but people tend to move forwards from that rather than go backwards, in my experience at least. Sometimes, at least in the cases where the corruption is quite silent and hasn't been noticed for a while, for example, you don't know exactly when it happened. There are mixed cases, for example, you cannot recover some table or some part of some table, so maybe you fixed it somehow with resetting the WAL or something, which is quite risky. But if you have backups and you can perform point-in-time recovery, you can restore data
Starting point is 00:06:56 that you lost or you cannot read and so on. So having backups is useful in this case as well. If you want an additional reason why backups are useful: you can provision new nodes. It can be replicas or it can be dependent clones. You can provision these nodes from the backup system, from archives, not touching the live nodes, not making them busy with your cloning or provisioning activities. So this is
Starting point is 00:07:27 also a popular reason to have backups as a source for node provisioning. Yeah, good one. Right. So logical versus physical. And this is similar to previous discussions we've had in terms of definitions, right? So logical being taking it without bloat, maybe, is the simplest way of putting it. But like recreating the objects from scratch, inserting the data from scratch. So recreating the entire database. Whereas physical would be at the file level, at the disk level, right? The block level, right. So, right, at the file level, actually, also, right.
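To make the distinction concrete, a minimal hedged sketch (database name and paths are placeholders, not anything from the episode):

    # logical: SQL-level dump and restore (a clone of the data, not the database itself)
    pg_dump -Fc -d mydb -f mydb.dump
    createdb mydb_restored
    pg_restore -d mydb_restored mydb.dump

    # physical: file/block-level copy of the whole data directory, WAL included
    pg_basebackup -D /backups/base_copy -X stream -P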
Starting point is 00:08:11 So if you consider pg_dump and pg_restore as a set of tools for backups, you're actually following the official documentation, because it says pg_dump is a backup tool. But a lot of database people, including myself, are against this phrase, because backups should have special characteristics, and one of them is flexibility. So we can talk about RPO and RTO right here.
Starting point is 00:08:41 RPO and RTO are two important characteristics for any backup system. RPO is recovery point objective, RTO is recovery time objective. It's maybe confusing, but one of them is about how much data you lose if you need to recover from zero, like after some disaster. For example, all nodes are lost, the whole cluster, primary and all its replicas, all of them are lost. And RTO is how much time is needed to
Starting point is 00:09:12 recover. So one is measured, for example, in bytes, the other is measured, for example, in seconds. Good characteristics can be close to zero data loss and, okay, like roughly one hour per terabyte to recover, maybe it's too much, it depends, right. I was going to say, I think the genius of whoever came up with these
Starting point is 00:09:39 is that it forces the discussion of a trade-off here. I think if you ask business leaders what they want, they want zero data loss and super fast recovery. It depends. Well, yes, they want both, of course, right? Yeah, they always want both, but this forces the discussion, saying, well, which are you willing to let us have some leeway on here? Because zero data loss in zero milliseconds is not possible, like, generally, you know, I mean, so it forces that discussion. Or it is extremely expensive to
Starting point is 00:10:12 achieve. Right, right. For example, you can have quite frequent cloud snapshots of your disk, and in this case restoration, well, it still will be quite slow, a lot of minutes, not hours at least, right? Usually. But in this case, you lose some deltas between these points of snapshots. So if we go back to pg_dump: pg_dump creates a set of SQL statements. This is what it creates. You can speed it up by going with multiple workers, and then you can speed up recovery as well.
Starting point is 00:10:52 For example, if you do it during some maintenance window or in case of disaster recovery, you can go at full speed, for example, taking a number of CPUs, or knowing what your disk capabilities look like and saturating these capabilities, but not significantly, right? So in this case, we have both characteristics in a very bad situation. First of all, pg_dump, you cannot do it very often, especially if you have a large database. Again, like roughly 1 terabyte per hour. Some people get more terabytes per hour, some people get less. It depends on your infrastructure. First of all disk, then CPU, also network, network is important. And the second thing is, so you can do it only, for example,
Starting point is 00:11:42 once per day, which is actually quite often for dumping. But it means we might lose up to a full day of data if we're relying on that as our backup tool. Exactly, exactly. Why you cannot do it often, let's also mention that right away. So if you do it often, you put a lot of stress on the system. Each pg_dump is stressful. We discussed it a few times. There is such a setting, hot_standby_feedback. And in clusters where replicas are receiving read-only traffic, usually hot_standby_feedback is on.
Starting point is 00:12:13 Meaning that replicas report to the primary: I need this version of data, this snapshot, these tuples. So when you run pg_dump, it runs a transaction, or multiple sessions, at the repeatable read isolation level. It's higher than the default. It's not serializable, at least, but it's already higher than the default read committed. And why is it needed? Because we need to get a consistent snapshot of all tables, all data. And even if you do it in a single session, you still need it. If you do it in multiple sessions, behind the scenes it will tell all sessions to deal with the same snapshot, to get the data in consistent form.
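As a hedged sketch of that parallel mode: pg_dump with -j requires the directory format, and all workers share one synchronized repeatable read snapshot (database name, path and worker counts below are placeholders):

    pg_dump -Fd -j 8 -d mydb -f /backups/mydb_dir       # 8 parallel dump workers, directory format
    createdb mydb_restored                               # target database for the restore
    pg_restore -j 8 -d mydb_restored /backups/mydb_dir   # parallel restore; rebuilds indexes and re-checks constraints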
Starting point is 00:13:05 Right? So consistent data in terms of tuple versions, referential consistency, everything. So during this you hold this snapshot. And this means that autovacuum cannot remove dead tuples which became dead after you started. So you become a problem for autovacuum. And this means that even if we started on a replica, since it reports to the primary, it's bad. So you can hold the snapshot for multiple hours, and during this we accumulate a lot of dead tuples, and later, when we've finished, autovacuum will delete these tuples, converting them to bloat. This is how you can get a lot of bloat. Just perform frequent pg_dump
Starting point is 00:13:55 activities. That's it. So we can't perform frequent pg_dumps, so therefore we have bad RPO, but we also have bad RTO. Exactly. Yeah. Because not only do we need to load all the data into tables, we also need to rebuild everything. First of all, indexes. Yeah. And also verify constraints.
Starting point is 00:14:19 Like we're starting from scratch. It's like we're not taking our database. We're taking a fresh database with the same data. This is what the dump-restore process does. It creates a new database, completely new data, not the cluster, a database, but with the same data. It re-verifies constraints and it rebuilds all indexes. And it's a very CPU-consuming operation. Of course, IO also, but CPU as well. And if you have a lot of indexes,
Starting point is 00:14:46 an interesting thing is that, of course, indexes are not dumped in terms of data. Each index, it's just one line, like create index. Easy, right? But if tables are large, if you have a lot of indexes, it's a lot of work when restoring. And it means that the time to recover will be significant. It can be even bigger in terms of time than the data load itself, if you have a lot of indexes. So building indexes can take longer than loading the data. And overall, it can exceed a simple copy at the file level. That's why a base backup with WAL replay might be faster. So I think you've nailed something, but I want to add something quickly. That's why 'best backup tool', I would be happy adding
Starting point is 00:15:45 that word in, but other people can't disagree anymore. It's just not a good one. It's definitely not the best one available to us. Well, yes, but you know, like Newton's third law, if you apply a force, there is an opposite force. So since the documentation is quite a strong force, this statement is so bold and written very clearly, we need to somehow mitigate this, right? So that's why we're trying to say... I agree with you, it's kind of a backup tool, not ideal, yes. In some cases it's fine, for tiny databases, but if you have one terabyte already, it's definitely time to switch away, and I see so many projects, so many teams, start with pg_dump and then suffer quite a long time and then switch. I don't
Starting point is 00:16:36 understand it, but, like, I think I blame the documentation here, because they open the documentation, see 'backup tool', and they have a false feeling they are fine, until they are not. Like a terabyte, or like thousands of TPS, and maybe even 100 gigabytes is already not good. So, and I see some platform builders also make this mistake. Oh, really? I'm not going to name them. Okay, good. But yeah, it happens. We should probably move on though, to physical. Like, what should people be using instead? Instead we need
Starting point is 00:17:13 to use physical backups. Unfortunately, if you are using managed Postgres, you just need to rely on what they provide. And it's a pity that... it's not a pity, I think it's a huge mistake and it's not fair that RDS and others don't provide access to backups. If those backups are regular Postgres backups, users should be able to reach them, to download them. But instead, RDS and Cloud SQL and others, they provide access only at the logical level. You cannot get your database. You can only get the data from your database. I'm trying to continue my concept that dump-restore is like cloning your database.
Starting point is 00:17:59 It's not taking your database. Taking the database means taking the data directory as it is, right? So if you're working with RDS, they do physical backups, but only behind the scenes, right? So yeah, this isn't something I realized, actually. So you're saying there's a couple of tools for doing physical backups, for example, pgBackRest or WAL-G. I can't use one of those to back up from RDS or the equivalents, is that what you're saying? You cannot, right. Right, so the basic backup is just copying the data directory with, for example, cp, scp, rsync, it doesn't matter. And of course, while you are
Starting point is 00:18:41 copying the files, they are changing, and there is an internal mechanism to deal with this inconsistency. Basically, you can copy, but you will get an inconsistent copy. And, for example, you can, again, perform a cloud snapshot of your disk. Again, it will be inconsistent. I remember times when Google Cloud said they do a consistent copy of the disk. Then they removed these words from their documentation. So, the internal mechanism to deal with such inconsistency is two functions, pg_start_backup and pg_stop_backup. And you can do it on replicas as well.
Starting point is 00:19:19 So, you say pg_start_backup, you get a copy of your data directory anyhow, and then you say pg_stop_backup. In this case, in this copy of the data directory, in the pg_wal subdirectory, there will be enough WAL to reach consistency. So yes, data files are kind of from different points of time, right? But when Postgres starts, it will know that it was a backup, it was a copy which was created between the pg_start_backup and pg_stop_backup functions, so there is WAL and we can replay it and reach the consistency point. Great.
Starting point is 00:20:01 Again, it's possible to do it on replicas as well. But, of course, it's not super convenient. It's like low level. And I remember quite experienced DevOps guys had very big fears about this approach. I told them, if we say pg_start_backup... oh, by the way, super important, this mistake is very frequent: if you say pg_start_backup, you need to maintain this connection. If you close it, you will lose the proper files. For example, if you do a full backup using WAL-G or other tools, it's the same. You need to maintain the connection where you started this.
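A minimal sketch of that low-level sequence (on Postgres 15+ the functions are pg_backup_start and pg_backup_stop; older versions use pg_start_backup and pg_stop_backup; paths are placeholders). Save it as manual_backup.sql and run it with psql -f manual_backup.sql, because the key point is that everything happens in one session that must stay open:

    -- manual_backup.sql: run with  psql -f manual_backup.sql  (Postgres 15+ function names)
    SELECT pg_backup_start('manual-base-backup');
    -- copy the (still inconsistent) data directory while this same session stays open
    \! rsync -a /var/lib/postgresql/data/ /backups/base_copy/
    -- returns the backup label and the WAL information needed to reach consistency
    SELECT * FROM pg_backup_stop();

You also need the WAL generated between the start and stop calls, typically from the archive, to make that copy consistent.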
Starting point is 00:20:42 If you close this session, then when you say stop, pg_stop_backup, it will report that the connection was lost. So the terabytes of data you just backed up cannot be used. So you need to be careful with this connection. And I remember how DevOps guys had fears about doing this. Like, how come? The database is live, it's writing constantly, I cannot copy the data directory, it's not possible. No, it's possible: if you're between pg_start_backup and pg_stop_backup, you're fine. But of course, it's low level, so to make it more convenient, there is a tool called pg_basebackup, which automates a lot of stuff. And I also remember
Starting point is 00:21:26 some time ago, by default, it didn't copy WAL files. So many backups were created without WAL files. If you don't have the WAL files to reach this consistency point, these terabytes are useless. You can move them to a trash bin. So you need WAL to reach the consistency point. But right now, by default, pg_basebackup will take care of WAL as well. So you're protected right now. I also made this mistake, of course, not once. If you just follow, like, the documentation, you just follow steps, and it's normal to forget about WAL, and then a few hours later you have a lot of terabytes backed up but no WAL. But it's not enough. Why is it not enough?
Starting point is 00:22:09 Because not only do you need this WAL to reach the consistency point, we also want to have almost, like, close to zero RPO, meaning that we don't want to lose anything, and if our backup starts at midnight but a disaster happens at 11pm, we want those 11 hours of data, right? So we need to replay all changes. The perfect way to replay changes is a WAL stream. So we just need to have... Okay, we perform full backups, for example, daily or weekly, but we also need to archive all the WAL. And for this, the natural approach is to use archive_command. Most backup tools use archive_command. The only tool I know which used a different approach was Barman. I think it's already using both approaches.
Starting point is 00:23:08 But originally Barman used a different approach. It was like: I'm a physical replica, I'm receiving changes via streaming replication, yes, and I save this data. It's also an interesting approach, but archive_command is kind of more reliable and it's the de facto standard. So archive_command just archives all the WAL. Usually, by default, it's 16 megabyte files. On RDS I think it's adjusted to 64. You can also adjust it. But the default is 16 megabytes.
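For reference, enabling WAL archiving is just two settings; with WAL-G the archive_command is typically a one-liner. A hedged sketch, run as a superuser:

    psql -c "ALTER SYSTEM SET archive_mode = 'on';"                    # takes effect only after a restart
    psql -c "ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';"  # %p = path of the completed WAL segment
    psql -c "SELECT pg_reload_conf();"                                 # archive_command alone is reloadable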
Starting point is 00:23:39 And each file is usually additionally compressed, so there are two kinds of compression, I will describe them if it's interesting, but in general it's usually compressed to something smaller, a few megabytes, and archived. So if we have the whole stream of WAL without any holes or overlapping or something, if it's consistent, right, we can replay until the latest available point, which means we can achieve near to zero
Starting point is 00:24:13 RPO, but it will take time, increasing our RTO, which is interesting, and replaying might be quite slow. Usually it's slow if your shared buffers are small. For example, sometimes we have an additional replica with a smaller number of gigabytes of memory. So we allocate fewer gigabytes to shared_buffers and observe how replaying is significantly slower than on the original. So we have a lag accumulated.
Starting point is 00:24:46 But without lag, a very rough rule is that replaying usually doesn't take longer than it originally took, right? It should be much faster. So if we need to replay 11 hours of work, and of course we will be replaying less, because some hours, it depends on the size of course, but some part of it will be replayed while we reach the consistency point related to the full backup. But then we need to replay more and more and more, and of course it should not take longer, usually a few times faster, but still a lot.
Starting point is 00:25:23 Maybe 10 times faster, it depends on the nature of the data, hardware, a lot of factors. So what are the options then for reducing that RTO? It's a great question. So some tools also offer delta backups instead of replaying changes. Like, replaying changes means we are kind of becoming a replica, temporarily. It's the same as having a physical replica based not on streaming but on WAL shipping. We just ship not from the server, we ship from the archives. By the way, I forgot to mention, usually these days we keep backups on some object storage such as S3 or Google GCS or something like that.
Starting point is 00:26:11 So if you replay, it takes time. But there is an idea to have delta backups. Instead of performing a full backup, we consider the previous backup as the basis and just back up the difference, the delta. For example, every 24 hours we have a full backup and every 4 hours we have delta backups. Or every week we have a full backup and every day or every 6 hours we have a delta backup. Why do we still want to have full backups? Because maybe we don't fully trust delta backups, they're quite fragile. But these days... And you need a full one at some point.
Starting point is 00:26:54 Yes, to start. And then you need to apply multiple deltas. And then replay additional WAL, because between deltas we still have some distance in time, right? So it's like a three-stage process: full backup, applying deltas, and then applying WAL. WAL, right. So this can speed the process up, definitely. And also reduce space, because if you perform full backups every day, it will cost you in terms of object storage price.
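With WAL-G, delta backups are, as far as I know, driven by a single knob; a hedged sketch (treat the variable name and value as assumptions to check against the WAL-G docs for your version):

    # allow up to 6 delta backups on top of the last full one,
    # then the next backup-push becomes a full base backup again
    export WALG_DELTA_MAX_STEPS=6
    wal-g backup-push "$PGDATA"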
Starting point is 00:27:29 Yeah. So what do you tend to... I think you've said previously, you tend to use WAL-G for the management of these. Let me finish with... I just realized there's also an obvious idea, and I know platforms use this approach themselves: instead of restoring a full backup, a base backup, from the archive, which will take time, we can just restore a snapshot of the disk and then apply WAL. And this can be super fast. Minutes. Interesting. Minutes for many terabytes,
Starting point is 00:28:07 it can take minutes. AWS, GCP, they all offer snapshots of disks. The restriction here is that you can usually restore only to a new disk. You cannot restore, for example, two backups onto one disk. You need to create a new disk and restore from this cloud snapshot. But it can be super fast, and unfortunately, for example, WAL-G doesn't support... I cannot say, you know, I have snapshots, let's consider them instead of full backups, let's consider snapshots and just perform all the orchestration around them and just apply WAL and so on. I'm not sure about pgBackRest, I also think
Starting point is 00:28:53 it's the same, yeah. But the idea is: restoring a full backup, and deltas as well, from scratch takes a lot of time, increasing our RTO, and we need to move faster, because if it's a disaster, our service is down. In this case, cloud snapshots and restoration of the full disk from such snapshots is good. So, I mean, we can consider them instead of full backups and then just perform point-in-time recovery to the latest point using WAL replay, restore_command is responsible for this. We just configure our fresh new Postgres to continue replaying changes from the archive and then we reach the latest point.
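A minimal sketch of that last step, after the data directory has been restored from a base backup or a disk snapshot (assumes Postgres 12+ and WAL-G as the archive tool; paths and the target time are placeholders):

    # tell Postgres where to fetch archived WAL from, and (optionally) when to stop
    echo "restore_command = 'wal-g wal-fetch %f %p'" >> "$PGDATA/postgresql.conf"
    # optional: stop right before an incident instead of replaying to the very end
    # echo "recovery_target_time = '2023-09-29 11:00:00+00'" >> "$PGDATA/postgresql.conf"
    touch "$PGDATA/recovery.signal"    # Postgres 12+: enter archive recovery on start
    pg_ctl -D "$PGDATA" start          # replays WAL from the archive, then opens up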
Starting point is 00:29:38 In this case you can achieve a quite stable and kind of predictable RTO, which will be almost... like, the dependency of this RTO on the database size will be minimal, because... well, it will be there, but it will not be as significant as for restoration the traditional way, from a full backup. So, I mean, you have, like, a 50 terabyte database. It takes like 15-20 minutes to restore from a cloud snapshot and then you just need to replay.
Starting point is 00:30:16 It depends on how active this database is during the day. If you need to replay up to 24 hours, in the worst case, for example, if we take snapshots only once per day. Actually, we should take snapshots more often. In this case, 24 hours is quite a lot, but maybe it will take a couple of hours if hardware is good. Yeah, interesting.
Starting point is 00:30:43 For smaller databases, though, you said it's not that dependent on database size anymore. I would guess for smaller databases, it would still be significantly faster. It can take a few minutes. For example, if you have one terabyte, you can restore in a few minutes. For example, one terabyte, every four hours,
Starting point is 00:31:02 And we, for example, have, I don't know, like half a terabyte. No, that's too much. 100 gigabytes per day worth of WAL data. It's good to know, by the way, the distance in terms of bytes between two points. Like how much data is written to WAL during a busy day. And you can expect backups in this case. And, for example, Barman reported just the number of gigabytes.
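A quick sketch of measuring that: record the current WAL position at two points in time and take the byte difference (the LSN values below are placeholders):

    psql -c "SELECT pg_current_wal_lsn();"    # run now, and again 24 hours later
    # the difference between two LSNs is a byte count of WAL written between them
    psql -c "SELECT pg_wal_lsn_diff('5A/7C000000', '59/10000000') AS wal_bytes;"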
Starting point is 00:31:32 It was good. WAL-G doesn't report it, but it reports LSN positions. If you take two LSN positions, you can just subtract one from the other, and the difference will be in bytes. And then, yeah, I do it all the time. So you know the difference, you know, okay, I generate 100 gigabytes per day. Replaying 100 gigabytes on quite good software,
Starting point is 00:31:58 it will take dozens of minutes, maybe just minutes. In this case, our RTO will be... we can achieve an RTO below 10 minutes in this situation, which is good. And almost zero RPO. I just explained the recipe for good characteristics of a backup system. Yeah, nice. I feel like there's one thing we haven't talked about yet that I would feel super guilty not mentioning in an episode on backups, and that's testing your backups. Right, so we need to remember two mantras about backups: pg_dump is not a backup tool. And you didn't
Starting point is 00:32:31 convince me. Not a good backup tool. Okay, not a good backup tool, okay, good. And the second mantra is: a backup that is not tested is a Schrödinger backup, it means it's not a backup as well. So we don't know... the state of this backup is unknown, even if the tool which performs the full backup exited with code zero, meaning no errors. Who knows, is it restorable? Well, and this isn't just a philosophical question, right? Like there are real things that could have gone wrong. Right. Yes. So my personal story is, I was reading Hacker News in February 2017. And I was reading about the GitLab case, of course. They lost a lot of data and they had many backup methods applied, but poor testing of backups.
Starting point is 00:33:47 And they lost data. So there is a good, not super accurate but entertaining, YouTube video about that situation in 2017, which explains in a very entertaining form what happened. But again, kudos to the GitLab team, for many years they have shared everything with the community, the wider community, not only those who work with Git or Ruby but also Postgres. And I remember I was super curious to know the details and to learn from the mistakes, of course, and this is actually how we started to work with them, and the backup system remains a super important focus, not to allow any issues anymore. So testing is super essential, and ideally you need to test every full backup, and not only every full backup but also the whole WAL stream. So you need to verify that you can restore from a backup and that Postgres starts. This is like minimal verification: Postgres starts and can receive connections.
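After restoring to a throwaway instance, a few hedged smoke tests along these lines (host, database, table and column names are placeholders, not anything from the episode):

    pg_isready -h restore-test-host -d mydb                          # instance is up and accepting connections
    psql -h restore-test-host -d mydb -c "SELECT now() - max(created_at) FROM events;"   # data is recent, not a stale copy
    pg_dump -h restore-test-host -d mydb > /dev/null                 # every table's heap data is readable
    pg_amcheck -h restore-test-host -d mydb -j 8 --heapallindexed    # heap and index checks (Postgres 14+)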
Starting point is 00:34:40 Ideally you should check the data which is there, right? Reading some data. Some people tend to dump to /dev/null to verify that all table data can be read. pg_dump to /dev/null. It doesn't mean index data can be read, for example. Indexes can be corrupted. Some people tend to use amcheck, and we discussed how to deal with corruption risks. So amcheck is a good tool, pg_amcheck with -j, as many jobs as the CPUs you have, because this machine probably is temporary. Also to check that index data can be read. But it takes time as well. So it's expensive to perform such operations, maybe you don't want to do it every time, but also you need to verify all the WAL, and ideally you restore from one backup and replay all the WAL
Starting point is 00:35:31 until the point of the next backup, right? This is, yeah, the whole verification. That's... yeah, I think it's worth saying that you're testing for the reasons it could have not worked, right? Like, you're checking that it's not empty, or that it's not... You might want to check this. Yes, you check this. It's restorable because it hasn't failed, the restore hasn't failed, so that's a great starting point. You're checking it's not empty, like, that's a potential failure. Maybe you're backing up the wrong database, maybe you're backing up one of, like... you know, I mean, there's so many things that could have gone wrong. Checking it's not empty, I like the idea of checking for recent data, right, making sure it's not an old backup. Timestamps. Yeah, exactly. So, like, there's a few kinds of smoke tests you can do.
Starting point is 00:36:19 I think I've seen some people doing counts of tables, but I don't think you need to do anything. You can do as much as you want, obviously, maybe the more the better, but there's a limit to how much we want to do, I guess. Right, so we can talk about two classes of verification here. First is the infrastructural class, so it's restorable and the WAL stream is fine, we can replay until the next backup. And actually we can skip the second part here if you have a replica which you monitor well and it's based on WAL shipping.
Starting point is 00:36:55 If that replica is fine, it's constantly replaying from backups, it means the WAL is fine. But if you don't have such a replica, you need to do it during your backup verification. It will take time, again, to replay all the WAL. But you will know your RTO better from such logs, right? And the second layer is logical. This is what you described. Timestamps, counts, we can read data, and so on. So, amcheck, corruption, logical. Last question. You mentioned it's difficult to get physical replicas from clouds. How do you tend to do it?
Starting point is 00:37:33 It's impossible, not difficult. It's impossible. They don't provide access to PGDATA. They don't provide access to backups, and they don't provide a physical replication connection. Some of them do. Crunchy Bridge. They provide access to
Starting point is 00:37:54 physical backups. This is a great example. So, when we say: this is your data, you can take it. Well, yes, but not only do I want my data, I want my database. If I pay for it, I pay a bigger cost than for EC2. Of course, I pay for backups to work, HA, everything. I want my database. If it's Postgres, give me this Postgres. I want the data directory and I want backups. I was wondering if doing logical replication to maybe a self-managed instance and then backing that up...
Starting point is 00:38:19 I'm wondering about having a secondary one off-site. It's not perfect because you've got the lag, the logical replication lag, but it's better than not having one, maybe. Well, again, we discussed there are two types of problems, again, physical and logical. Physical, like, for example, something happens with hardware; logical, a human mistake or an intentional attack and deleted data. If it's a logical replica without lag, the change is propagated there as well, and so you cannot recover from the replica in this case. Sorry, I'm talking about having a logical replica and doing a physical backup from that, like having pg
Starting point is 00:39:00 BackRest running off your logical replica. Ah, well, yes, it's possible. But again, it's the same data but a different database. I mean, you will not get your bloat. Sometimes we want to get our bloat, because, for example, for some kind of research, if some incident happens on production, we want to carefully research what happened, perform root cause analysis. In this case, we want an exact copy of our data, an exact clone including all the bloat, how data is distributed in tables and indexes, and a logical replica has a different distribution of data.
Starting point is 00:39:30 All I was thinking was maybe you might be able to achieve lower RTO, RPO on a cloud system even in the event of a major outage on that cloud, for example? Well, ideally, you should use what the provider of the managed service provides you. For example, RDS, they have good, reliable backups. I don't see details about the testing of those backups. I cannot test them myself, except doing some kind of explicit restoration and running some tests. But we kind of trust them, because if these backups were not reliable, we would see a lot of
Starting point is 00:40:12 blog posts, tweets, LinkedIn messages about it, right? We don't, so they are reliable. They have large scale, right? But I don't like that it's not open source. They kind of say it's Postgres, but I don't have access to the database. I have access only to the data. As I say, heavy clouds are gathering over open source. Is the cloud pun intentional? Right. Right. So, a quick mention: some people who control their databases, not only the data, they also consider using a delayed replica, because physical replication allows you to have a replica which is, like, eight hours behind, for example.
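A hedged sketch of that setup on the standby (the delay value is just an example):

    psql -c "ALTER SYSTEM SET recovery_min_apply_delay = '8h';"   # standby stays 8 hours behind the primary
    psql -c "SELECT pg_reload_conf();"
    # when an incident is noticed, freeze replay immediately, inspect, then resume (or promote)
    psql -c "SELECT pg_wal_replay_pause();"
    psql -c "SELECT pg_wal_replay_resume();"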
Starting point is 00:40:57 And this is exactly for the reason of faster recovery when, like, something was deleted: let's replay only a small number of minutes or hours, quickly. If we do it quickly. Because if we already passed it, like, something happened but we learned about it only 10 hours later, but our delayed replica is 9 hours behind, it's already too late. Or even if you find out about it, like, seven hours later, you're on, like, a stopwatch, you've got an hour to fix this problem. Right, right. But you can quickly pause it. Oh, true. Yeah, yeah. Good point.
Starting point is 00:41:37 Stopping replay. And then you can know when exactly the problem happened and replay exactly to, like, one second before it, for example, and extract the data manually. It happens. Sometimes it's a good idea to have such a tool. But, for example, Database Lab users, they have it even better. They can provision a clone, and there is a recipe to replay WAL: if the Database Lab Engine is working based on physical replication, WAL from the archive for example, you can create a snapshot which corresponds to a time before the point of the incident, then you can replay again, with point-in-time
Starting point is 00:42:21 recovery, there is a recipe, it's not fully automated but it's definitely possible, and then you can achieve a very, very good RTO in this case. This is for partial deletion, when we manually deleted something. Cool, good. Good, I think we covered all the main principles of backups. It's not super deep material, but at least it's something. Maybe one day we will discuss deeper things, you know, or specific tools, maybe.
Starting point is 00:42:55 Awesome. Well, thanks so much, Nikolay. Thank you everybody for listening, and catch you next week. Good, see you later. Bye.
