Postgres FM - Copying a database

Episode Date: January 13, 2023

Here are links to a few things we mentioned:
- pg_dump
- pg_restore
- pgcopydb (tool by Dimitri Fontaine)
- pg_dump and external snapshots (blog post by Michael Paquier)
- Systems Performance (book by Brendan Gregg)
- Performance troubleshooting methodologies (two-part talk by Brendan Gregg)
- pg_basebackup
- wal-g
- pgBackRest
- Database Lab Engine
- Database branching episode

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc. If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today? Hi Michael, this topic is my choice this week. So I chose how to copy a Postgres database. Let's discuss various options, the pros and cons, how to overcome obstacles, everything, and use cases, of course. Yeah, that was going to be my first question.
Starting point is 00:00:29 When do you see people needing to do this most often? Right. Well, since we agreed that we are going to talk about Postgres questions from the perspective of folks with some DBA and ops experience, but trying to explain things for a wider audience of developers, backend engineers, and so on, let's think about use cases from that perspective. First of all, you need to copy a Postgres database when you provision a new node in your infrastructure. This is the less developer-friendly area, right? So you either want to have a replica, a standby node, logical or physical, or you want to provision another environment, or you want to bring data from production to non-production, and so on. Many cases. But there are also cases when we design our system and need, for example, to split a monolith into multiple services, so we need to think about how we will perform the splitting. When we do split, we probably need to copy all the data, because it's a rare privilege if we can start a new service from an empty database. Usually we need to preserve the old data, so we need to copy it from the old, monolith database. Or some service's database becomes too big and we look at how to split it as well, so we treat it as a kind of monolith and split it. So these are the basic use cases. There are also cases when we need to bring data from one place to another place as a routine job. And of course, I think every engineer should understand latency and throughput and how to find and troubleshoot bottlenecks in the copy process. So these are probably the basic classes of use cases: provisioning, splitting, and copying just because our application is designed that way, which is also a kind of provisioning; everything is provisioning. Even if you copy a database to your laptop, as people sometimes do in small projects, and this approach has its own pros, it's also provisioning, just to your laptop.

Right, on that note, if we start there, I guess we should ignore for now worrying about personally identifiable information, privacy, and things like that. What do you see on the smaller side, before we get into larger databases? On the smaller side, copying databases around, I've seen people have a lot of success with just simple pg_dump and pg_restore, not even multi-threaded. And I've also seen, because I don't know if we're going to talk about it again, this being a good use case for CREATE DATABASE with a template, which can even be faster. Thick cloning, right? Thick cloning, so if you have 100 gigabytes of data and you clone it, you have 200 gigabytes used on disk. Yes. Well, first of all, let's start with logical. pg_dump and pg_restore are a logical approach: we copy data, transforming it into SQL statements, CREATE TABLE, COPY, or sometimes individual INSERTs. First of all, I would like to say that pg_dump and pg_restore are quite advanced programs, very advanced tools.
Starting point is 00:04:04 They have a lot of options, a lot. And also formats. There's plain text format and there are custom and directory, so-called directory formats, which support parallelization and filtering. But this is definitely a logical level. And sometimes people think a dump is a backup for example which is very like i i don't agree with that it's a backup it's kind of backup but it's logical backup so you can run it and copy it of course it's always thick so because again if you load it you consume same space again but
Starting point is 00:04:43 you will lose the bloat if you had bloat when you dump restore you lose it, you consume the same space again, but you will lose the bloat. If you had bloat, when you dump restore, you lose it. So there should not be surprises if new copy takes less disk space, even if you took everything. Indexes are optimized. They are just created. PgDump puts them to the end of dump. So they are created after data is already loaded by pgRestore. If it's plain text format, you can even load it using PSQL, also an option. But since I'm usually dealing with quite large databases, dozens of terabytes, of course,
Starting point is 00:05:17 PgDump is like for large systems, it's quite, first of all, dangerous to, if you run it in production, it will keep transaction open at repeatable read level. It opens transaction at repeatable read levels to have persistent snapshot always. So it dumps tables that correlate, like there is correlation in data. So foreign keys won't be broken and so on. That's why a repeatable read snapshot is needed. And if you do it quite a lot, of course, auto-vacuum will be affected. So on large database, my recommendation would be to do it not only on a standby node, but on a standby node either with hot standby feedback off and increased limit for transactions, I don't remember the knob, which controls to run a pgdump for a whole database,
Starting point is 00:06:27 I would say have a clone, like detach it from cluster, and then temporary clone. In cloud, it's easy, right? So you create a replica probably from cloud snapshot, and then you just promote it. It doesn't stream changes from the primary anymore it's independent so you can run dump there and then you dispose it like destroying this and of course for throughput single thread is not enough we have a lot of cpus these days usually so i would speed it up
Starting point is 00:06:58 but for small databases i would say less than 10 gigabytes, definitely it's okay to use a single-threaded approach. 100 terabytes, 100 gigabytes terabyte, it's already time to think about parallelization. And here we come into issues with pgdump, pgrestore, because you cannot use parallelization if you want to restore immediately. You want to restore it, like dump and restore on the fly. It's possible if you have single-threaded approach. So you just use pipe, you dump and restore immediately.
Starting point is 00:07:36 So you don't need additional disk space. It's good. But if you need parallelization to have good throughput, and you probably want to have good throughput if you deal with real production nodes as well, because you don't want autowacuums workers to interfere with their work. So probably you want to finish it earlier. So you might want multiple workers. So you want dump restore. But in this case, you need first dump in directory format and then restore from it. Also, pg-restore also supports parallelization-j. So in this case, this is some disadvantage of pg-dump and pg-restore.
Starting point is 00:08:13 They cannot use parallelization and avoid creation of this temporary dump, which is solved by third-party tools. There is a tool from Dimitri Fontaine called pgcopyDB, quite new. One of the benefits of which is exactly this. It can run multiple threads and dump restore on the fly. Great. So this is for larger databases. But again, if you have smaller database, it's fine to use a single thread and choose either you want to dump it or dump restore on the fly. Right? So yeah, before we move on from that, I'd also noticed that both the custom and directory formats are
Starting point is 00:08:58 compressed by default, which is pretty cool. I didn't realize that. Yep. Yep. Compression is there. It's good good compression you probably want it if you have large databases and sometimes compression is significant but they also support filtering directory format supports i always mix custom and directory and always check documentation but one of these formats support filtering dash L small, uh, in pg restore tool that we have a pair of options dash L uppercase and lowercase. One of them allows you to see the list of the content of dump. So you can see, okay, we have table creation for this table here. We have data for this table here.
Starting point is 00:09:43 We have indexes. We have materialized to, have like refresh materialized to comment and so on. And then you can using grep, like in command line, it's easy using grep. You can choose only what you want or remove what you don't want. For example, if you have hierarchy of materialized use, Sometimes there is an issue with dependencies. So I remember I removed the commands to refresh materialized use and just performed it separately. Otherwise, you cannot restore. So this is quite handy if you need partial restoration. And also worth mentioning, dump,
Starting point is 00:10:21 pgDump, by default, it will produce errors but won't stop on an error oh wow yeah it's it's quite sometimes dangerous you you haven't noticed it restore i don't remember maybe the same but in general approach as usual i don't remember right but in general you should check the commutation here always like what's the default behavior? And decide, do you need to stop on error or ignore errors and to do best effort losing some objects and then return? Because, of course, in each case it's different. But in my experience, I usually want to stop. That's why default behavior for me doesn't usually work properly. And I check, okay,
Starting point is 00:11:03 again, we have an issue here. I would like to understand there is an issue and don't continue so I want to stop. There is an option, of course, to change behavior in both PgDump and PgRestore. And the docs for this are great. I'll link them up in the show notes. Reference docs are great. I would say reference docs, use cases, usually you want to find them in blog posts. More wisdom there in terms of use cases, usually you want to find them in blog posts. More wisdom there in terms of use cases, but references perfect, of course. And dash dash help always works. Right. So one more thing if of course, you can perform surgery with PgRestore
Starting point is 00:11:38 right, you can take what you want, you can combine things from different places, a lot of things can be done. But of course, if you want, for example, default behavior to stop on error, you should understand that probably you will need to start from beginning, right? So retry logic, you are responsible for retry logic. You need to either ignore errors or to stop on error and then to perform this like dash L pair of dash L, dash L comments and continue and so on. So it doesn't have good retry logic itself, but it gives you some options as like tooling to write your scripts as you want. So this is probably it about pgdump, pgrestore. One more thing interesting for developers probably. This repeatable read transaction, it's interesting that you can control it.
Starting point is 00:12:30 For example, you can open transaction yourself, export snapshot, and then command PidgeDump to use your snapshot. So you keep transaction open with snapshot already created, preventing autovacuum to clean up freshly that tuples, as we discussed before a couple of times. And then using function pgexportSnapshot, you know the snapshot name and you can feed the snapshot name to pgdump in command line. And this is quite interesting also. And actually, I think this is how parallelization works as well. Workers know exact snapshot name and they work synchronized, right? So they deal with the same snapshot. And this allows you to design some procedures to move data between places, understanding that your data is correlated. Right? And I wish developers of logical replication commercial tools understood it better
Starting point is 00:13:31 because some of them don't. For example, the tool called Qlik, former Attunity. I tried to explain this for one of our customers. I tried to explain to this company several times, like, guys, you have issues because you export data to analytical database, which doesn't have unique keys. If initial copy performed in a way that further CDC change data capture process, this switch is rough.
Starting point is 00:14:01 We don't understand these snapshots. We will have duplicates. So the trick is you open a logical replication slot, not using SQL, but using replication connection. It's kind of a separate, very small language, commons and replication connection. There you can create the same logical slot and know not only LSN, but also snapshot name. And then you can dump using that snapshot name. Easy.
Starting point is 00:14:26 And then when you switch to CDC, you don't think about duplicates at all. Right? But you cannot achieve this if you create logical slot using select PG logical slot create. I don't remember. As usual, I'm very bad with remembering function names and so on. And in this case, it's great. And exactly like, I think in each particular case, understanding this concept of snapshots and you can synchronize various Postgres workers, backends, telling them, okay, I want to deal with these snapshots. It's quite powerful. Right.
Starting point is 00:15:04 Yes, very cool. I was reading that blog post you shared from, is it Michael Pacquiao? I'll link it up as well. Almost 10 years old, but it's very well explaining this, what I just explained, right? Create replication
Starting point is 00:15:20 slot, logical, and you know snapshot name. And then you can even use pgdump with dash dash snapshot and that's it. So yeah, I guess when it comes to a point where pgdump is no longer or not great for us, what are our other options?
Starting point is 00:15:36 Well, you mentioned create database template. In Postgres we have this very old, by the way, approach. So we have template 0, template 1 created by default. It's like reference databases. And if you want all databases to have some, for example, table always, all freshly created databases, you can put something to template 1. It will be used by default.
Starting point is 00:16:00 But you can also say directly, explicitly say, I I want new database created based on that database, it will be thick cloning. So it will create a copy objects in regular way like CP, right. And of course, it will take time. By the way, we didn't discuss throughput and so on. With logical approach, it's worth to understand that time, you cannot say, okay, like one terabyte per hour. It's very hard to say because it depends a lot on the structure, first of all, indexes. A lot of time. So the logical copy, if you check CPU utilization, disk utilization, it's not evenly distributed.
Starting point is 00:16:46 First, we copy data. So first, of course, we create tables. And then we copy data. Very IO-intensive work. CPU is like nothing to calculate here at all. But then indexes. And this can take a lot of time. If you don't have indexes, it's easy.
Starting point is 00:17:06 But if you need to create a lot of indexes, because again, logical means we're transforming to SQL and then back to database, right? So it's like there is a change of state here. We had physical state, we transform it to SQL and then from SQL to physical state again. So when you need to create a lot of indexes, it's CPU-intensive work, of course. Maintenance work, ma'am, is worth checking, of course,
Starting point is 00:17:31 so maybe tuning and so on. And of course, when we do things, for example, if it's one time, it's okay. If not, it's not perfect. But if we know we will be performing dump restore routinely, so we need this procedure to to be optimized first thing to remember as i mentioned already about the stress on production we covered that but then how to improve and how to troubleshoot throughput troubleshooting throughput
Starting point is 00:17:59 here bottlenecks is usual so like in linux we check cpu load of course average is not enough we need to be able to understand each core separately 50 on average can be 50 each core or half of them 100 and the other half is not doing at all anything because if we use dash j for pg restore we can for example say half of our available cores or we can match available cores on destination it's interesting right so we just yeah we need to check each core and we also need to check disk load of course first of all throughput in terms of IOPS and maybe bytes per second, and we should know our limits and understand how far from there, and so on. In terms of rules of thumb, I saw some advice that if you're restoring to a machine
Starting point is 00:19:00 that that's all you're doing on that machine, the reasonable starting point is the number of cores as exactly number of threads unlike pgdump for pgdump we if we do it from production node we should think about a lot especially if it's the primary and like the most expensive resource we have there probably we should be very careful with dash j and control our CPU and disk load, definitely, because we are going to read a lot. By the way, interesting to understand that if our database is, for example, more than half of it is indexes, dumping indexes is very easy, unlike restoring. Dumping indexes is just definition. It's like one line. Data, yes, we
Starting point is 00:19:46 read it from disk. But indexes, we don't read it. It's good, right? So in terms of dumping. Restoring, yes, it's CPU intensive work. You will see many cores are like 100% busy. So if you're alone on destination node, yeah, dash j match
Starting point is 00:20:01 CPU. Yeah, we sometimes even do this either cut proc CPU in four or num CPU. I don't remember. We just take number of cores and put it to dash J dynamically. If we have eight cores, okay, it will be eight workers restoring and so on. Going back to something you mentioned earlier, dumping and restoring at the same time without having to, without needing to duplicate the data, without having to double the space on disk, what are other options for that?
Starting point is 00:20:33 At logical level, well, even at physical, I guess, is that when physical... Physical is very different. Physical level is absolutely different. And if we talk about opportunities for backend developers, of course, logical looks like more flexible because many things, filtering, control of throughput and so on. At physical level, in Postgres, we cannot copy one database, unfortunately. Unlike in some other systems, for example, SQL Server, we copy whole cluster, all databases inside it, unfortunately. So sometimes it matters there's no way to take only one database because of walls that walls they cannot like we have single stream
Starting point is 00:21:12 of walls and it's it's mixing all databases there so we cannot take only one of them why do we need walls when we copy at physical level it's takes some time and if you check throughput usually i mean not throughput if you check resources monitoring you will see it's quite uniform there is no cycle like okay we did the io intensive work now we switch to cpu intensive work like in in restoring at logic with pg restore at physical level it's just taking all files and bringing them new to new place but it takes time and about the speed here i would expect these days i would say one terabyte per hour is already so so probably you know like previously i changed i like i i i decided okay 2023 it's time to say one terabyte per hour. It's not enough already.
Starting point is 00:22:05 It's like very moderate. Probably you have not modern disks, right? And so probably you don't have good network. Maybe it depends. You also need to troubleshoot it in a normal Linux way. Of course, for troubleshooting, I think any engineer who wants to be full stack or backend engineer should read Brandon Gregg's books, System Performance, and so on. And at least watch on YouTube quite old two-parts talk from Brandon Gregg about performance troubleshooting methodologies. So it gives good basic understanding.
Starting point is 00:22:45 So if your network, for example, is not good or disks are slow either on source or destination, one terabyte per hour or even less, not good. Three terabytes per hour is good these days. I was going to ask. Yeah, it's method is simple here. Modern NVMe, well, should be at least one gigabyte per second right for reading and writing maybe two already right we have good already disks and in cloud
Starting point is 00:23:15 if you take abs volume nitro system most modern instances abs volume will be based on these disks it should produce you of Of course, you need to pay for IOPS in cloud, right? So it's expensive. But talking about big systems, 1 terabyte per hour is already so-so for raw physical file copy over network.
Starting point is 00:23:38 So 1 gigabyte per second, we have 3600 seconds per hour, 3TB per second we should have. Maybe more even, sometimes if everything is well tuned, network is good, maybe more. And of course there is no parallel conflicting, competing workloads. Okay, so this process takes time. And the key here is that if our PgData is 1 terabyte, 10 terabytes, during this time, when we start and when we finish, it's not consistent, right?
Starting point is 00:24:15 So we have inconsistent PgData directory copied. And to fix that, Post, can use checkpoint and walls to replay changes and to reach point of consistency. So if you do it manually old school, we say pg start backup, select pg start backup. We can do it on replica as well, which is good. and then we copy data directory not thinking okay with rsync for example of with scp anything it can feel dangerous but it's quite reliable we copy with walls accumulated in pg wall in dpg wall directory and then we say pg stop backup then we start postgres in new place and wait we need to wait it will take time to reach a point of consistency. This is how
Starting point is 00:25:07 physical, regular, thick cloning works. And if you use automation, Postgres has automation for this, pg-based backup. It will do basically a similar thing. Taking walls. I remember days, some years ago,
Starting point is 00:25:23 by default, it didn't bring walls. And so many people, including myself, made mistake using pg-based backup to create physical copy, and then trying to start it. But without walls, you cannot start it. It's inconsistent, as I just explained. So you needed to specify dash x or something some like additional option but modern versions of postgres take care of by default they take walls as well either additional connection at postgres level or using ssh connection so pg-based backup is is a very good tool but very big disadvantage of it. If something goes wrong, goes south,
Starting point is 00:26:07 we need to start again. And this is a huge disadvantage. And if you provision your nodes regularly, on a regular basis, for example, standby nodes, if, for example, one node goes down, we need to reprovision it. It can be automated
Starting point is 00:26:23 with Patroni, of course. First of all, it's stress. Second disadvantage, it's stress. We need to read everything. Disks are very busy. If we do it from the primary node on production, it's stress from primary node on production. And if we did 90% of copy and then something is wrong,
Starting point is 00:26:43 some network issue or something. It will restart. And more hours to spend if you have a multi-terabyte database. This advantage can be mitigated if you switch to a different approach for physical copy. It's called restore from archive. Here we should discuss backups briefly so dump is not backup why because it's a single snapshot and that's it it doesn't give you opportunity you do it and then if so it's like once per day or something but if a problem of obviously not if problem occurred later, and the distance
Starting point is 00:27:27 between our dump and the problem it's called in ops language, it's called RPO, store point objective. So this is the amount of data you lose, right? For backups is very bad, right? I hear the alternative get called point-in-time recovery quite often. Right. So point-in-time recovery, any modern backup tool such as Wall-G, PG-Backrest, and others, they support it. And they usually do two basic things. Full inconsistent copy of PG data.
Starting point is 00:28:00 Why inconsistent? Because it takes time to copy. And second, continuous stream of walls. So we archive two things and then we restore first of all we restore pgData in inconsistent state and we take all walls we need to reach consistency and also to replay to the latest available point by default or to the point you specified if you want point-in-time recovery. This is a backup system.
Starting point is 00:28:29 Dump is a very, very weak backup system. PgDump, I mean, PgRestore. It's flexible. It's good. In many cases, it's much better. For example, if you want to take only one database or some tables in it, of course, PgDump is better because taking whole clusters is more expensive. So in this case, if you use this tool, usually data is stored in object store,
Starting point is 00:28:51 such as S3 or GCP, Google storage, cloud storage, GCS or others, Azure blob storage and so on. And also it takes care about encryption, or about compression, encryption as well, but compression. We need compression. Data files in Postgres and PgData are shrinked to one gigabyte files. So if you have one terabyte table, by the way, you shouldn't. You should have partitioning. You will see a lot of one gigabyte files in the data directory, in data directory base, subdirector. And then you need to navigate using OIDs.
Starting point is 00:29:35 So it compresses it. And also it also shrinks into chunks probably to be able to retry more politely. For example, I remember we did some work with LG developers to improve how it works for Google Cloud because retry logic was working differently compared to S3 there. So we needed to, like, you don't want to retry when you upload, when you backup, you don't want to retry a whole file. Even if it's compressed, it takes only 300 megabytes, for example, like three times compressed. Still retrying the whole may be not good, so you need chunks.
Starting point is 00:30:16 It's everything automated. And then to join chunks on the cloud side, on this object storage side. But it gives you opportunity to have retries both for safe to storage and to retrieve. In this case, if you use pgBaseBackup, as I said, some issue you need to restart. But with VaultG or pgBackrest, if you restore from archive, first of all, you don't put stress on any production node. And second, you have retries. If it fails at 90%, maybe this is an issue with network, for example. Okay, we will wait until our cloud provider fixes this issue,
Starting point is 00:30:57 and we retry from this exact point. This is more efficient, of course. But there is one big disadvantage of using these tools. We should trust them. Backups also can have issues. For example, corruption. Well, replica nodes and standby nodes also can have corruption compared to primary. Primary can have corruption as well.
Starting point is 00:31:22 Backups also can have corruption. That's why they try to improve everything. I know PgBackrest has quite strong work with checksums and so on to verify everything and so on. So it's an interesting topic. But in general, if you want physical, by default, I wouldn't use PgBaseBackup if I have proper archives. I would restore from archives. Also, you can parallelize this work. Unlike pgBaseBackup, you can say, okay, object storage like S3, AWS S3, it's quite good in terms of multiple workers. You can suck data using 16 workers and you see it's good. Because also you have SSD and VME SSD, it's also good in terms of parallelization. You have a lot of CPUs. Use 16 workers, for example.
Starting point is 00:32:19 And in this case, if you have good disks, like big system, you have expensive, you pay for IOPS and so on, like provisioned IOPS. In this case, I would say try to achieve 3 terabytes per hour. This is good. And this extract database from archives should be very fast. And then it's only the question of how much it will take for Postgres to start to achieve consistency point and to replay additional walls. This is actually it in terms of physical. It's maybe less interesting than dump restore. Yeah.
Starting point is 00:32:54 Well, it depends, right? I think that's the nice thing about this topic. It's a lot, depending on the use case, depending on exactly what you need at any given point, either is useful. Yeah. You know what I'm wondering still? I'm wondering already several years.
Starting point is 00:33:11 I think it's underappreciated, the approach when instead of copying whole inconsistent PgData, we take cloud snapshot, which should take only minutes, even if you have terabyte, dozens of terabytes. And then you replay walls to achieve the point you need. I think it should be like if Wall-G and PGB Crest consider themselves cloud-friendly, they should keep this as an option instead of copying a lot of terabytes. Why? We can provision disk.
Starting point is 00:33:45 If you don't need to change file system. Of course, if we need to change file system, we cannot use Cloud Snapshot. But these Cloud Snapshots, they are quite powerful. Sometimes they are really fast. Sometimes they have stress to create. I saw under heavy load, if you create snapshot on disk on production node,
Starting point is 00:34:07 at some point you have issue, like a couple of minutes, latency spike and everything slows down. So it's interesting how it's implemented under the hood. Is this on like a database as a service platform? No, I'm talking about regular snapshots of network-attached volumes like EBS volume or PD, persistent disk on GCP. It was on GCP actually, I think. Such behavior observed. And sometimes I remember some people reported
Starting point is 00:34:42 they could not restore from such snapshot. But the good thing about these cloud snapshots is that they are incremental. They are definitely like copy and write. But of course, if you want to restore, you create another volume. You pay for this volume, definitely. So for you, it looks like thick, not thin. For cloud, it's thin. They save a lot of, of course save a lot of resources on hardware.
Starting point is 00:35:09 But last thing probably to touch this also topic close to my home, it's if you want many snapshots and clones on one machine, perfect for non-production, development, testing, and so on. Of course, thin cloning using LVM2 or ZFS looks very good because you pay only for one disk and have many clones, which for application behavior is independent. And Database Lab Engine achieves exactly this. It automates it providing CLI API and UI so
Starting point is 00:35:49 it will take only a second to create snapshot it will take only a second or two to create a new clone based on that snapshot and you can have a powerful standby node right so you run it using database lab engine
Starting point is 00:36:05 and it replays walls, for example, on physical level. It works well. Like interesting that to deal with thin cloning, we needed to explore thick cloning in very detail. So because to provision our node, we also need to bring data initially. And there is physical and logical option and many, many aspects of it.
Starting point is 00:36:24 How to parallelize, how to, like, can we parallelize on the fly? We cannot using pgdump, pgrestore, pgcopyDB. We have plans to integrate with it. Not yet, but it would be good as an option. Or we can do it physically using pgbackrest, walg, from archives and so on. Like a lot of options to choose from. But then you have constantly replaying
Starting point is 00:36:49 walls at physical level, for example. It's like, it behaves like a replica, but for your environments, it looks like dozens of environments, right?
Starting point is 00:36:59 So it's quite interesting. Yeah, I think people that haven't seen it before it's pretty magical when you see a huge database and you can have your own isolated copy of it in seconds. Think cloning for the win.
Starting point is 00:37:14 Think cloning and we already developed branching. We discussed it. We had an episode about branching. So we have that Bessler-Pengin 4.0 first alpha. We are testing it internally. Soon I will present it for a wider audience to test.
Starting point is 00:37:30 And it's quite interesting because indeed it's similar to Git. You can change, you can commit, you can have new snapshot, like commit ID with comment, and you can create new clones from there to share it with colleagues and so on. So, yeah, but for how to copy database topic, I think we covered practically everything. Logical, physical, cloud clones, which, ah, okay. Also, Aurora has thin clones, right? But for storage, you don't pay for additional storage, but you need to pay for each instance that works with such clone. And finally, local thin clones we work with.
Starting point is 00:38:14 Okay, four big classes. Nice. So I think that is everything, any last things you wanted to mention? Well, uh, test everything before you decide test, optimize, test, optimize, especially these days, like recently Google cloud increased storage prices. We, and of course, economical situation is not perfect these days. And, uh, I think more people will try to optimize their costs and how you copy how you use resources in terms of storage first of all and also compute because if cloning of one terabyte takes a day that's not right so i think engineers it's good to improve skills in this area right to understand various options and how to test,
Starting point is 00:39:07 how to troubleshoot and how to make choices. So your company doesn't lose money for, for nothing. Right. Yeah. Well said. Wonderful. Well,
Starting point is 00:39:19 thanks so much. Thanks everyone. And catch you next week. See you next week.
