Postgres FM - Copying a database
Episode Date: January 13, 2023

Here are links to a few things we mentioned:
pg_dump
pg_restore
pgcopydb (tool by Dimitri Fontaine)
pg_dump and external snapshots (blog post by Michael Paquier)
Systems Performance (book by Brendan Gregg)
Performance troubleshooting methodologies (two-part talk by Brendan Gregg)
pg_basebackup
wal-g
pgBackRest
Database Lab Engine
Database branching episode

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, this topic is my choice this week.
So I chose copying: how to copy a Postgres database.
Let's discuss the various options, the pros and cons, how to overcome obstacles, everything, and use cases, of course.
Yeah, that was going to be my first question.
When do you see people needing to do this most often?
Right.
Well, since we agreed that we are going to talk about Postgres topics from our DBA and ops experience, but try to explain them for a probably wider audience of developers, backend engineers, and so on, let's think about use cases from this perspective.
Of course, first of all, you need to copy a Postgres database when you provision a new node in your infrastructure. This is the less developer-friendly area, right? You either want to have a replica, a standby node, logical or physical, or you want to provision another environment, or you want to bring data from production to non-production, and so on. Many cases. But there are also cases when we design our system and need, for example, to split a monolith into multiple services. Then we need to think about how we will perform the split, and when we do split, we probably need to copy all the data, because it's a rare privilege if we can start a new service from an empty database. Usually we need to keep the old data, to preserve it, so we need to copy it from the old, monolith database. Or some service's database becomes too big and we look at how we can split it as well, so we treat it as a kind of monolith and split it too.
So these are basic use cases. There are sometimes cases when we need to bring data from one place
to another place, and this is our routine job. And of course, I think every engineer should
understand the latency and throughput and how to troubleshoot bottlenecks, how to find bottlenecks of the copy process.
So these are probably the basic classes of use cases.
Provisioning, splitting, and some routine copying just because our application is designed that way. Although, for example, that's also a kind of provisioning; everything is provisioning.
But even copying a database to your laptop, which people sometimes do in small projects, and this approach has its own pros, is also provisioning, just provisioning to your laptop, right?
On that note, if we start there, and I guess we should ignore for now worrying about personally identifiable information, privacy, and things like that, what do you see on the smaller side, before we get into the bit larger databases? On the smaller side, copying databases around, I've seen people have a lot of success with just simple pg_dump and pg_restore, not even multi-threaded. And I've also seen, because I don't know if we're going to talk about it again, this being a good use case for CREATE DATABASE with a template, which can even be faster.
Thick cloning, right? Thick cloning. So if you have 100 gigabytes and you clone it, you have 200 gigabytes used on disk.
Yes. Well, first of all, let's start with logical. pg_dump and pg_restore is the logical approach: we copy data, transforming it into SQL statements, CREATE TABLE, COPY, or sometimes individual INSERTs.
First of all, I would like to say that pg_dump and pg_restore are quite advanced programs, very advanced tools.
They have a lot of options, a lot.
And also formats.
There's the plain-text format, and there are the custom and so-called directory formats, which support parallelization and filtering.
But this is definitely the logical level.
And sometimes people think a dump is a backup, which I don't really agree with. It's a kind of backup, but a logical backup. You can run it and copy it, and of course it's always thick: if you load it, you consume the same space again. But you will lose the bloat; if you had bloat, when you dump and restore, you lose it. So there should be no surprise if the new copy takes less disk space, even if you took everything.
Indexes are optimized; they are just freshly created. pg_dump puts them at the end of the dump, so they are created after the data is already loaded by pg_restore. If it's the plain-text format, you can even load it using psql, also an option.
But since I'm usually dealing with quite large databases, dozens of terabytes, for large systems pg_dump is, first of all, quite dangerous: if you run it in production, it will keep a transaction open at the repeatable read level. It opens a transaction at repeatable read to always have a consistent snapshot, so it dumps tables that correlate; there is correlation in the data, so foreign keys won't be broken, and so on. That's why a repeatable read snapshot is needed. And if it runs for a long time, of course, autovacuum will be affected. So on a large database, my recommendation would be to do it not just on any standby node, but on a standby node with hot_standby_feedback off and an increased limit for long-running queries (I don't remember the exact knob). Or, to run pg_dump for a whole database, I would say have a clone: detach it from the cluster, a temporary clone.
In the cloud, it's easy, right? You create a replica, probably from a cloud snapshot, and then you just promote it. It doesn't stream changes from the primary anymore, it's independent, so you can run the dump there and then dispose of it, just destroy it. And of course, for throughput, a single thread is not enough; we usually have a lot of CPUs these days, so I would speed it up. But for small databases, I would say less than 10 gigabytes, it's definitely okay to use a single-threaded approach. At 100 gigabytes to a terabyte, it's already time to think about parallelization.
And here we come to an issue with pg_dump and pg_restore: you cannot use parallelization if you want to restore immediately, to dump and restore on the fly. That's possible with the single-threaded approach: you just use a pipe, you dump and restore immediately, so you don't need additional disk space, which is good. But if you need parallelization to get good throughput, and you probably want good throughput if you're dealing with real production nodes, because you don't want autovacuum workers to interfere with this work, so you want to finish earlier, then you want multiple workers, you want dump and restore in parallel. In this case, you first need to dump in the directory format and then restore from it; pg_restore also supports parallelization with -j.
So this is a disadvantage of pg_dump and pg_restore: they cannot use parallelization and avoid the creation of this temporary dump at the same time, which is solved by third-party tools.
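As a rough illustration, the two approaches look something like this on the command line (hosts, database names, worker counts, and paths are placeholders):

# Single-threaded, on the fly: no intermediate dump stored on disk
# (the target database must already exist)
pg_dump -h source-host -d mydb | psql -h target-host -d mydb

# Parallel workers: requires an intermediate dump in directory format
pg_dump -h source-host -d mydb -Fd -j 8 -f /tmp/mydb.dump
pg_restore -h target-host -d mydb -j 8 /tmp/mydb.dump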
There is a quite new tool from Dimitri Fontaine called pgcopydb, and one of its benefits is exactly this: it can run multiple threads and dump-restore on the fly. Great. So that's for larger databases. But again, if you have a smaller database, it's fine to use a single thread and choose whether you want to dump to disk or dump and restore on the fly. Right? So yeah,
before we move on from that, I'd also noticed that both the custom and directory formats are
compressed by default, which is pretty cool. I didn't realize that. Yep. Yep. Compression is
there. It's good compression, and you probably want it if you have large databases; sometimes the compression is significant. They also support filtering. I always mix up custom and directory and always check the documentation, but these formats support filtering: in pg_restore we have a pair of options, -l lowercase and -L uppercase.
One of them lets you see a list of the contents of the dump. So you can see: okay, we have the table creation for this table here, we have the data for this table here, we have indexes, we have a refresh materialized view command, and so on. And then, using grep, it's easy on the command line, you can choose only what you want or remove what you don't want. For example, if you have a hierarchy of materialized views, sometimes there is an issue with dependencies. I remember I removed the commands to refresh materialized views and just performed that separately; otherwise, you could not restore. So this is quite handy if you need partial restoration.
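A bare-bones sketch of that filtering workflow, assuming a directory-format dump at /tmp/mydb.dump (the grep pattern is just an example):

# List the dump's table of contents
pg_restore -l /tmp/mydb.dump > list.txt
# Drop the entries you don't want, e.g. materialized view refreshes
grep -v 'MATERIALIZED VIEW DATA' list.txt > filtered.txt
# Restore only what remains in the edited list
pg_restore -L filtered.txt -d mydb /tmp/mydb.dump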
Also worth mentioning: pg_dump, by default, will report errors but won't stop on an error.
Oh wow.
Yeah, it's sometimes quite dangerous if you haven't noticed it. For pg_restore, I don't remember, maybe it's the same, but
in general, as usual, you should always check the documentation here: what's the default behavior? And decide whether you need to stop on error, or ignore errors and do a best effort, losing some objects, and deal with them afterwards. Of course, in each case it's different. But in my experience, I usually want to stop. That's why the default behavior usually doesn't work for me: if there is an issue, I would like to know about it and not continue, so I want to stop. There is an option, of course, to change the behavior in both pg_dump and pg_restore.
And the docs for this are great.
I'll link them up in the show notes.
Reference docs are great. But for use cases, you usually want to find blog posts; there's more wisdom there in terms of use cases. The reference is perfect, of course, and --help always works. Right. So one more
thing: of course, you can perform surgery with pg_restore. You can take only what you want, you can combine things from different places; a lot can be done. But if you want, for example, the default behavior to be stopping on error, you should understand that you will probably need to start from the beginning. So retry logic: you are responsible for the retry logic. You need to either ignore errors, or stop on error and then use this pair of -l and -L options to continue, and so on. So it doesn't have good retry logic itself, but it gives you options, like tooling, to write your own scripts as you want.
So this is probably it about pg_dump and pg_restore. One more thing, probably interesting for developers: this repeatable read transaction, it's interesting that you can control it. For example, you can open a transaction yourself, export its snapshot, and then tell pg_dump to use your snapshot. So you keep the transaction open with the snapshot already created, preventing autovacuum from cleaning up freshly dead tuples, as we discussed a couple of times before. Then, using the function pg_export_snapshot(), you know the snapshot name and you can feed it to pg_dump on the command line. This is quite interesting too. And actually, I think this is how parallelization works as well: the workers know the exact snapshot name and they work synchronized, right? So they deal with the same snapshot.
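A minimal sketch of this, assuming a database named mydb (the snapshot name is just an example of what pg_export_snapshot() returns):

# Session 1: open a repeatable read transaction and export its snapshot;
# keep this session and transaction open while the dump runs.
psql -d mydb
#   BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
#   SELECT pg_export_snapshot();   -- returns a name like '00000003-0000001B-1'

# Session 2: tell pg_dump to reuse that exported snapshot
pg_dump -d mydb --snapshot='00000003-0000001B-1' -f mydb.sql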
And this allows you to design procedures to move data between places, understanding that your data stays correlated, right? And I wish the developers of commercial logical replication tools understood it better, because some of them don't. For example, the tool called Qlik, formerly Attunity. For one of our customers, I tried to explain this to that company several times: guys, you have issues, because you export data to an analytical database which doesn't have unique keys, and if the initial copy is performed in such a way that the switch to the CDC (change data capture) process is rough, without understanding these snapshots, we will have duplicates.
So the trick is that you open a logical replication slot not using SQL, but using a replication connection. It's kind of a separate, very small language of commands over a replication connection. There you can create the same logical slot and learn not only the LSN, but also the snapshot name. And then you can dump using that snapshot name. Easy. And when you switch to CDC, you don't have to think about duplicates at all, right? But you cannot achieve this if you create the logical slot using SELECT and pg_create_logical_replication_slot, or whatever it's called; as usual, I'm very bad at remembering function names. In this case it's great. In each particular case, understanding this concept of snapshots, you can synchronize various Postgres workers, backends, telling them: okay, I want to deal with this snapshot. It's quite powerful. Right.
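Roughly, the trick looks like this (slot name, output plugin, and database are placeholders; the exact option syntax differs a bit between Postgres versions):

# Open a replication connection and create a logical slot there; the response
# includes the consistent LSN and an exported snapshot name. Keep this session
# open while the snapshot is in use.
psql "dbname=mydb replication=database"
#   CREATE_REPLICATION_SLOT cdc_slot LOGICAL pgoutput EXPORT_SNAPSHOT;
#   -- returns slot_name, consistent_point (LSN), snapshot_name, output_plugin

# In another terminal: run the initial copy using the returned snapshot name,
# then start CDC from the slot's consistent_point, with no duplicates.
pg_dump -d mydb --snapshot='00000003-0000001B-1' -f initial_copy.sql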
Yes, very cool. I was reading that blog post you shared from, is it Michael Paquier? I'll link it up as well.
Almost 10 years old, but it explains very well what I just explained, right? Create a replication slot, logical, and you know the snapshot name, and then you can even use pg_dump with --snapshot, and that's it.
So yeah, I guess when it comes to the point where pg_dump is no longer great for us, what are our other options?
Well, you mentioned CREATE DATABASE with a template. In Postgres we have this approach, which is very old, by the way. We have template0 and template1 created by default; they are like reference databases. If you want all freshly created databases to always have, for example, some table, you can put something into template1 and it will be used by default. But you can also say explicitly: I want a new database created based on that database. It will be thick cloning, so it will copy the objects in the regular way, like cp, and of course it will take time.
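As a sketch, with made-up database names (note that no other sessions may be connected to the source database while this runs):

# Thick clone inside the same cluster: the files are copied the regular way
psql -c "CREATE DATABASE mydb_copy TEMPLATE mydb;"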
By the way, we didn't discuss throughput and so on. With the logical approach, it's worth understanding that you cannot say, okay, one terabyte per hour. It's very hard to estimate the time because it depends a lot on the structure, first of all on indexes, which take a lot of time. With a logical copy, if you check CPU utilization and disk utilization, it's not evenly distributed.
First, of course, we create tables, and then we copy the data; that's very I/O-intensive work, there is almost nothing for the CPU to compute there. But then come indexes, and this can take a lot of time. If you don't have indexes, it's easy. But if you need to create a lot of indexes, because again, logical means we transform to SQL and then back to a database, there is a change of state here: we had a physical state, we transform it to SQL, and then from SQL to a physical state again. So when you need to create a lot of indexes, it's CPU-intensive work, of course.
maintenance_work_mem is worth checking, of course, and maybe tuning, and so on.
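For example, people sometimes bump it just for the duration of a big restore, something like this (the value is illustrative, not a recommendation):

# Illustrative only: raise maintenance_work_mem for faster index builds during
# the restore, reload the config, and set it back afterwards
psql -c "ALTER SYSTEM SET maintenance_work_mem = '2GB';"
psql -c "SELECT pg_reload_conf();"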
And of course, if it's a one-time thing, it's okay if it's not perfect. But if we know we will be performing dump and restore routinely, we need this procedure to be optimized. The first thing to remember, as I mentioned already, is the stress on production; we covered that. But then, how to improve and how to troubleshoot throughput? Troubleshooting throughput bottlenecks here is the usual story. In Linux we check CPU load, and of course the average is not enough; we need to be able to see each core separately. Fifty percent on average can be 50% on each core, or half of the cores at 100% and the other half doing nothing at all, because with -j for pg_restore we can, for example, say half of our available cores, or we can match the available cores on the destination. It's interesting, right? So we need to check each core, and we also need to check disk load, first of all throughput in terms of IOPS and maybe bytes per second, and we should know our limits and understand how far we are from them, and so on.
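The usual Linux tools for that, as a sketch (both come from the sysstat package):

# Per-core CPU utilization, refreshed every second
mpstat -P ALL 1
# Per-device IOPS, throughput, and utilization
iostat -xm 1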
In terms of rules of thumb, I saw some advice that if you're restoring to a machine and that's all you're doing on that machine, a reasonable starting point is exactly one thread per core.
Unlike pg_dump. With pg_dump, if we run it from a production node, we should think about it a lot, especially if it's the primary, the most expensive resource we have. There we should be very careful with -j and definitely watch our CPU and disk load, because we are going to read a lot. By the way, it's interesting to understand that if, for example, more than half of our database is indexes, dumping indexes is very easy, unlike restoring them. Dumping an index is just its definition, it's like one line. Data, yes, we read from disk, but indexes we don't read at all. That's good, in terms of dumping. Restoring, yes, is CPU-intensive work; you will see many cores 100% busy. So if you're alone on the destination node, yeah, match -j to the CPU count. We sometimes even do this dynamically, with cat /proc/cpuinfo or nproc, I don't remember which; we just take the number of cores and pass it to -j. If we have eight cores, okay, there will be eight workers restoring, and so on.
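As a sketch, matching the restore parallelism to the destination's cores could look like this (database name and dump path are placeholders):

# One pg_restore worker per core on the destination machine
pg_restore -d mydb -j "$(nproc)" /tmp/mydb.dump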
Going back to something you mentioned earlier: dumping and restoring at the same time, without needing to duplicate the data, without having to double the space on disk, what are the other options for that? At the logical level, well, even at the physical level, I guess, when physical...
Physical is very different. The physical level is absolutely different. If we talk about opportunities for backend developers, logical looks more flexible because of many things: filtering, control of throughput, and so on. At the physical level, in Postgres, we cannot copy just one database, unfortunately.
Unlike in some other systems, for example SQL Server, we copy the whole cluster, all the databases inside it, unfortunately. Sometimes it matters; there's no way to take only one database, because of the WAL: we have a single stream of WAL and it mixes all databases, so we cannot take only one of them. Why do we need WAL when we copy at the physical level? The copy takes some time, and if you check resource monitoring, you will see it's quite uniform; there is no cycle like "okay, we did the I/O-intensive work, now we switch to CPU-intensive work" as with restoring logically with pg_restore. At the physical level it's just taking all the files and bringing them to a new place. But it takes time, and about the speed here: these days, I would say one terabyte per hour is already so-so. Previously I thought it was fine, but I decided, okay, in 2023 it's time to say that one terabyte per hour is not enough anymore, it's very moderate. Probably you don't have modern disks, or probably you don't have a good network; maybe, it depends.
You also need to troubleshoot it in a normal Linux way.
Of course, for troubleshooting, I think any engineer who wants to be a full-stack or backend engineer should read Brendan Gregg's books, Systems Performance and so on, and at least watch the quite old two-part talk by Brendan Gregg on YouTube about performance troubleshooting methodologies. It gives a good basic understanding. So if your network, for example, is not good, or disks are slow either on the source or the destination, you get one terabyte per hour or even less; not good. Three terabytes per hour is good these days.
I was going to ask.
Yeah, the method is simple here. A modern NVMe drive should give at least one gigabyte per second for reading and writing, maybe two already; we have good disks now. And in the cloud, if you take an EBS volume on a Nitro system, the most modern instances, the EBS volume will be based on such disks and should give you that. Of course, you need to pay for IOPS in the cloud, right? So it's expensive. But talking about big systems, one terabyte per hour is already so-so for a raw physical file copy over the network. At one gigabyte per second, with 3600 seconds per hour, we should get roughly 3.5 terabytes per hour. Maybe even more sometimes, if everything is well tuned and the network is good.
And of course, that's assuming there are no competing workloads in parallel. Okay, so this process takes time. And the key here is that if our PGDATA is 1 terabyte or 10 terabytes, between the moment we start and the moment we finish, it's not consistent, right? We have an inconsistent PGDATA directory copied. To fix that, Postgres can use a checkpoint and WAL to replay changes and reach a point of consistency. If you do it manually, old school, we say SELECT pg_start_backup, and we can do it on a replica as well, which is good. Then we copy the data directory without worrying, with rsync, for example, or scp, anything. It can feel dangerous, but it's quite reliable. We copy it together with the WAL accumulated in the pg_wal directory, and then we say pg_stop_backup. Then we start Postgres in the new place and wait; it will take time to reach a point of consistency. This is how physical, regular, thick cloning works.
And if you use automation, Postgres has automation for this: pg_basebackup. It does basically a similar thing, taking the WAL too. I remember, some years ago, by default it didn't bring the WAL, and many people, including myself, made the mistake of using pg_basebackup to create a physical copy and then trying to start it. But without the WAL you cannot start it; it's inconsistent, as I just explained. So you needed to specify -x or some additional option. Modern versions of Postgres take care of it by default: they take the WAL as well, over an additional connection. So pg_basebackup is a very good tool, but it has a very big disadvantage: if something goes wrong, goes south, we need to start again. And this is a huge disadvantage if you provision your nodes on a regular basis, for example standby nodes: if one node goes down, we need to reprovision it. That can be automated with Patroni, of course.
The second disadvantage is stress: we need to read everything, disks are very busy, and if we do it from the primary node on production, it's stress on the primary on production. And if we've done 90% of the copy and then something goes wrong, some network issue or something, it restarts, and that's more hours to spend if you have a multi-terabyte database.
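For context, a typical modern pg_basebackup invocation looks roughly like this (host, user, and target directory are placeholders; -X stream includes the WAL needed to reach consistency):

# Full physical copy of the whole cluster, streaming WAL alongside the data,
# with a progress report; if it fails partway, it has to start over
pg_basebackup -h source-host -U replicator -D /var/lib/postgresql/data -X stream -P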
This disadvantage can be mitigated if you switch to a different approach for the physical copy: restore from archive. Here we should discuss backups briefly. A dump is not a backup. Why? Because it's a single snapshot and that's it. You take it, say, once per day or so, but if a problem occurs later, the distance between our dump and the problem is called, in ops language, RPO, recovery point objective. This is the amount of data you lose, right? For backups, a large RPO is very bad, right?
I hear the alternative called point-in-time recovery quite often.
Right. So point-in-time recovery: any modern backup tool, such as WAL-G, pgBackRest, and others, supports it. They usually do two basic things: a full, inconsistent copy of PGDATA (why inconsistent? because it takes time to copy), and second, a continuous stream of WAL. So we archive two things, and then on restore, first of all we restore PGDATA in its inconsistent state, and we take all the WAL we need to reach consistency and also to replay to the latest available point by default, or to the point you specified if you want point-in-time recovery.
This is a backup system. A dump is a very, very weak backup system. pg_dump and pg_restore are flexible, they're good, and in many cases much better: for example, if you want to take only one database, or some tables in it, pg_dump is better, because taking the whole cluster is more expensive.
So if you use these tools, the data is usually stored in an object store, such as S3, or Google Cloud Storage, GCS, or Azure Blob Storage, and so on. They also take care of compression, and encryption as well, but compression we need. Data files in PGDATA are split into one-gigabyte files, so if you have a one-terabyte table, by the way, you shouldn't, you should have partitioning, you will see a lot of one-gigabyte files in the data directory, in the base subdirectory, and you need to navigate them using OIDs. So the tool compresses them, and it also splits uploads into chunks, probably to be able to retry more politely. For example, I remember we did some work with the WAL-G developers to improve how it works for Google Cloud, because the retry logic there worked differently compared to S3. When you upload during a backup, you don't want to retry a whole file. Even if it's compressed three times, down to 300 megabytes for example, retrying the whole thing may still not be good, so you need chunks, and then the chunks are joined on the object storage side. It's all automated, and it gives you the opportunity to have retries both when saving to storage and when retrieving.
So with pg_basebackup, as I said, if there's some issue you need to restart. But with WAL-G or pgBackRest, if you restore from the archive, first of all you don't put stress on any production node, and second, you have retries. If it fails at 90%, maybe it's a network issue, for example; okay, we wait until our cloud provider fixes the issue and we retry from that exact point. This is more efficient, of course.
But there is one big disadvantage of using these tools: we have to trust them. Backups can also have issues, for example corruption. Well, replicas and standby nodes can also have corruption compared to the primary, the primary can have corruption as well, and backups can have corruption too. That's why these tools try to improve everything; I know pgBackRest does quite strong work with checksums and so on to verify everything. So it's an interesting topic.
But in general, for physical copies, I wouldn't use pg_basebackup by default if I have proper archives; I would restore from the archives. You can also parallelize this work. Unlike with pg_basebackup, you can say, okay, object storage like AWS S3 is quite good in terms of multiple workers, you can pull data with 16 workers and see that it's good. NVMe SSDs are also good in terms of parallelization, and you have a lot of CPUs, so use 16 workers, for example. In this case, if you have good disks, a big system where you pay for provisioned IOPS and so on, I would say try to achieve 3 terabytes per hour; that is good. And this extraction of the database from the archives should be very fast.
This is actually it in terms of physical.
It's maybe less interesting than dump restore.
Yeah.
Well, it depends, right?
I think that's the nice thing about this topic.
A lot depends on the use case, on exactly what you need at any given point; either one is useful.
Yeah.
You know what I'm still wondering? I've been wondering for several years already. I think the approach is underappreciated where, instead of copying the whole inconsistent PGDATA, we take a cloud snapshot, which should take only minutes even if you have a terabyte or dozens of terabytes, and then replay WAL to reach the point you need. I think if WAL-G and pgBackRest consider themselves cloud-friendly, they should offer this as an option instead of copying many terabytes. Why not? We can provision a disk from the snapshot, as long as we don't need to change the file system; of course, if we need to change the file system, we cannot use a cloud snapshot. But these cloud snapshots are quite powerful. Sometimes they are really fast; sometimes they cause stress to create. I saw, under heavy load, that if you create a snapshot of a disk on a production node, at some point you have an issue: a couple of minutes of latency spikes, and everything slows down. So it's interesting how it's implemented under the hood.
Is this on a database-as-a-service platform?
No, I'm talking about regular snapshots of network-attached volumes, like an EBS volume, or a PD, persistent disk, on GCP. It was on GCP actually, I think, where I observed such behavior. And I remember some people reported they could not restore from such a snapshot.
But the good thing about these cloud snapshots is that they are incremental; they are definitely copy-on-write. Of course, if you want to restore, you create another volume and you pay for that volume, definitely. So for you it looks thick, not thin; for the cloud it's thin, and they of course save a lot of resources on hardware.
The last thing to touch on, a topic close to home for me: if you want many snapshots and clones on one machine, which is perfect for non-production, development, testing, and so on, thin cloning using LVM2 or ZFS looks very good, because you pay for only one disk and have many clones that, from the application's point of view, behave independently. And Database Lab Engine achieves exactly this. It automates it, providing a CLI, API, and UI, so it takes only a second to create a snapshot, and only a second or two to create a new clone based on that snapshot. And you can have a powerful standby node, right? You run it using Database Lab Engine and it replays WAL, for example, at the physical level. It works well.
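Conceptually, the thin cloning here is what ZFS provides natively; a bare-bones sketch with made-up dataset names:

# Snapshot the dataset holding PGDATA, then create a writable clone of it;
# the clone shares unchanged blocks with the origin, so it's nearly instant
zfs snapshot pool/pgdata@base
zfs clone pool/pgdata@base pool/clone1
# Point a second Postgres instance at the clone's mountpoint (different port, etc.)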
It's interesting that, to deal with thin cloning, we needed to explore thick cloning in great detail, because to provision our node we also need to bring the data in initially. There are physical and logical options and many, many aspects to it. How to parallelize? Can we parallelize on the fly? We cannot with pg_dump and pg_restore; pgcopydb can, and we have plans to integrate with it. Not yet, but it would be good as an option. Or we can do it physically using pgBackRest or WAL-G, from archives, and so on. A lot of options to choose from.
But then you have constant replay of WAL at the physical level, for example. It behaves like a replica, but for your environments it looks like dozens of environments, right? So it's quite interesting.
Yeah, I think for people that haven't seen it before, it's pretty magical when you see a huge database and you can have your own isolated copy of it in seconds. Thin cloning for the win.
Thin cloning, and we've already developed branching.
We discussed it; we had an episode about branching. So we have the Database Lab Engine 4.0 first alpha. We are testing it internally, and soon I will present it to a wider audience to test. It's quite interesting because it is indeed similar to Git: you can change things, you can commit, you can have a new snapshot, like a commit ID with a comment, and you can create new clones from there to share with colleagues, and so on.
So yeah, for the topic of how to copy a database, I think we covered practically everything: logical, physical, cloud clones, which... ah, okay, also Aurora has thin clones, right? You don't pay for additional storage there, but you need to pay for each instance that works with such a clone. And finally, the local thin clones we work with. Okay, four big classes.
Nice. So I think that is everything. Any last things you wanted to mention?
Well, test everything before you decide: test, optimize, test, optimize. Especially these days; recently Google Cloud increased storage prices, and of course the economic situation is not perfect right now. I think more people will try to optimize their costs, how they copy, how they use resources, in terms of storage first of all, and also compute, because if cloning one terabyte takes a day, that's not right. So I think it's good for engineers to improve their skills in this area, to understand the various options, how to test, how to troubleshoot, and how to make choices, so your company doesn't lose money for nothing.
Right.
Yeah.
Well said.
Wonderful.
Well,
thanks so much.
Thanks everyone.
And catch you next week.
See you next week.