Postgres FM - Logical replication
Episode Date: September 15, 2023

Nikolay and Michael discuss logical replication — some history, initialization, change data capture, how to scale it, some limitations, and ways that it is getting better. Here are some links to some things they mentioned:

Logical replication https://www.postgresql.org/docs/current/logical-replication.html
GitLab upgraded multi-terabyte, heavily-loaded clusters with zero-downtime https://twitter.com/samokhvalov/status/1700574156222505276
pg_waldump https://www.postgresql.org/docs/current/pgwaldump.html
pg_dump and external snapshots (blog post by Michael Paquier) https://paquier.xyz/postgresql-2/postgres-9-5-feature-highlight-pg-dump-snapshots/
Failover of logical replication slots in Patroni (talk by Alexander Kukushkin) https://www.youtube.com/watch?v=SllJsbPVaow
Our episode on replication https://postgres.fm/episodes/replication

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, this is Nikolai, this is episode 63.
What are we going to talk about today, Nikolai?
Let's talk about logical replication decoding, everything logical.
Everything logical, yeah, I like it.
So this was your choice, why did you choose it?
Just because I realized we didn't talk about it.
And also I wanted to congratulate our colleagues from GitLab
on their very, very successful major upgrades —
no downtime at all
and no data loss at all.
So it was based on logical.
So I think I have something to say about logical now
based on that work.
I expect materials from them — we actually discussed it with them
and agreed that there should be materials published, written posts and also talks.
We actually could invite them to discuss it.
It's probably a good idea.
But it will be mostly about major upgrades involving logical.
But let's discuss logical itself.
Because it's interesting that more and more systems are using it.
Yeah, absolutely. And it's getting better and better, right? It's been getting better in recent
releases. It's about to get better again in the next release. And you were just telling me before
the call, there's some exciting changes coming in the year ahead, hopefully as well.
Yes, I would like to use the version from 2025, I think, or so,
because it will be definitely much better.
It's getting improvements every year.
There are still pieces that people need but don't have yet.
For example, DDL, replication of DDL,
one of the biggest pieces left unsolved.
But it's already much better than five years ago. Definitely.
Wonderful. Should we start there then? Should we start with a bit
of history?
Yeah, let's talk about history. Probably. I remember when there
was discussion that replication should not be implemented in
Postgres core. It was like 15 years ago; there was an opinion
that replication is something external to Postgres.
Let's rely on other tools.
It's not the job for Postgres to have replication.
It's so strange, similar to discussions of autofailover and so on.
And such systems were created.
One of the most noticeable were Slony, Londiste from Skype, and also Bucardo.
I used two of these, Slony and Londiste — a lot of work to set up and to maintain, and a constant headache.
Especially Slony — it was a lot of work, a lot of maintenance work. And these systems
were logical replication; they were based, for example, on triggers. Any
write that happens on a table is also written to a different table and propagated to a different
Postgres. And the overhead was not only maintenance, but also performance, obviously, because you
basically double your writes.
So you need to write to two places, and to avoid data loss it slows down all
writing transactions.
Yeah.
So, I mean, Londiste was great.
It was based on PgQ. PgQ was implemented in a very good way in terms
of scalability and in terms of performance, because it was using partitioning, and everything was
done well. But still some maintenance was needed, and it was no pleasure to deal with it. Then in a later
version — I don't remember which, maybe 9.0 —
physical replication was created, and it became the main way to replicate data from one node to
another. And so, I mean, before the physical replication implementation, people relied only on third-party tools like Slony and Londiste,
and only on logical replication, to achieve HA, high availability.
With physical replication, where whole WAL records are replicated as is,
it became much less of a headache.
I mean, physical replication is also not trivial,
but it's much more reliable, and
it's based on the idea that we have WAL already, right? WAL, the write-ahead log,
is our way to recover after crashes. So if we crash, we just redo all changes since the last successful checkpoint and we are
in a consistent and good state again. So the idea was: let's be constantly in recovery state
on standby nodes, constantly recovering. That's why, by the way, the function to check if it's
primary or standby is called pg_is_in_recovery(),
which is confusing because you might think we recently crashed.
We didn't.
I mean, we probably crashed at some point, but if we crashed and are in recovery, you cannot connect.
If you can connect and ask pg_is_in_recovery(), it means it's not really recovering from a crash — it's just a replica.
I mean, standby.
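For reference, this is the built-in check in question — a minimal example you can run on any node:

```sql
-- Returns true on a standby that is replaying WAL, false on the primary
SELECT pg_is_in_recovery();
```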
Yeah, that's funny.
I didn't know that.
It also reveals the implementation detail of how
replication was created based on recovery. And then hot standby
was created, meaning that you're not only recovering on the
standby nodes, but you also allow read-only queries. Yeah,
because why not? And so of course, a lot of work was done
to achieve this,
but I'm just like trying to
have some high-level overview
of history here.
And then,
after several years
of development
of physical replication,
the obvious idea was that
the very same WAL records
we use to provide recovery
and physical replication can also be used for logical replication.
This is what's called logical decoding.
You just read what is in a WAL record.
If it says 'insert this data' — okay, we can say, let's execute an INSERT, a SQL INSERT.
By the way, it's very confusing for those who use tools like
pg_waldump — and in fresh Postgres it's available through SQL, you can read the WAL yourself
and see what's there. It might be confusing to see INSERT, DELETE and UPDATE in uppercase, because it's
not SQL — it's a physical operation, at tuple level.
If you delete 100 records, 100 tuples,
you will see 100 independent DELETE records in the WAL.
And they will be translated to 100 DELETE SQL statements by logical replication.
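A small illustration of this, using the built-in test_decoding plugin (the slot name here is made up):

```sql
-- Create a throwaway logical slot that decodes WAL into readable text
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

-- Run some DML, e.g. a multi-row DELETE, then peek at the decoded changes:
-- each affected tuple shows up as its own DELETE record
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);

-- Drop the slot when done so it doesn't retain WAL
SELECT pg_drop_replication_slot('demo_slot');
```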
By the way, it also shows that logical can be implemented differently,
in different ways.
For example, in MySQL, it's very different.
It can be statement level.
You can just take a statement and propagate it to a different machine.
But logical decoding and replication in Postgres are based on WAL — the same WAL we use for physical replication.
So if you have an operation with multiple tuples, you will
see multiple SQL queries on the subscriber side. It's called the subscriber — publisher and subscriber;
logical replication has different terminology than physical replication.
Not primary and standby, but publisher and subscriber. So logical decoding was created, and it opened
a lot of new possibilities to replicate, based on the same WAL, at the logical level.
Meaning you recreate the cluster — you have a different cluster, you lose your bloat,
you can lose your indexes, you can have a different set of indexes, you can have a different
major version, and you can have a different database management system.
It can be not Postgres, but something else based on logical decoding, right?
So it's great.
I mean, the development of it, it's not super fast.
I mean, it's a very complex thing,
and other systems implemented things Postgres lacked for many years.
But currently, in 2023, logical decoding and logical replication — especially with release 16 and the previous release, 15 — have had a lot of work done to
make them mature, and they can already be used for very heavily loaded systems: dozens of terabytes,
hundreds of thousands of TPS.
And this can help you to achieve zero downtime operations.
Of course, you need to understand many things to achieve this,
a lot of interesting things.
And still some things are missing, but things are improving.
That's my small overview of history.
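For context, the native publisher/subscriber setup mentioned above boils down to two statements — a minimal sketch with invented names:

```sql
-- On the publisher:
CREATE PUBLICATION app_pub FOR ALL TABLES;

-- On the subscriber (the schema must already exist there;
-- connection details are hypothetical):
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=publisher.example.com dbname=app user=replicator'
    PUBLICATION app_pub;
```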
Yeah, nice.
I don't know anyone who would consider Slony or Londiste these
days. I mean, maybe some people think one of them is better right now. But
I would just always consider logical decoding and
replication, if you need the logical level.
Yeah, the only other thing that I've
seen it used for that you didn't mention is that,
unlike physical replication, you don't have to replicate everything.
So you can take a subset.
A filter, exactly.
Or maybe you're forced to use logical.
For example, if you are using managed Postgres, RDS, Cloud SQL, they don't provide physical replication connection outside of their infrastructure. So it means that if you need to replicate between clouds, for example,
or on your managed Postgres service, or you need to migrate,
you are forced to use logical replication.
It's a bad thing, but it's also a good thing because these cloud providers,
they also rely on logical replication to migrate customers from Oracle or from competitors.
And this helps develop logical replication further because they are big users of it.
Right.
And it's in order to achieve zero downtime, right?
It's kind of the change data capture part of it.
I mean, we could migrate from those systems through dump and restore, right, but not without downtime. It's the key
aspect that enables you to do it without, or with minimal, downtime.
So downtime is not related to dump restore. I'm not sure why you...
So let's say we're moving from Oracle to Postgres, and we want to...
We could take a dump, modify it,
send it through something,
and restore it into Postgres, right?
We could take the data.
Well, I'm not an expert on Oracle.
And first of all, AWS and GCP, Google Cloud,
they have so-called database migration services, DMS.
And they support from system to system different directions.
Also, there are systems like Qlik and Fivetran,
commercial systems, and also there's Debezium, open source.
So they help to create replication from different database management
systems. When we talk about logical decoding and replication, it's actually either between
Postgres or from Postgres to somewhere else. But I don't know anyone who migrated from Postgres to
Oracle. But let's discuss the general picture.
So when you need to use it, you have two stages.
First is initialization, and second is so-called CDC, change data capture.
Streaming changes, not streaming, replicating changes.
Because streaming is about physical replication, let's not mix terminology.
So initialization is very challenging if you have
a lot of data. I mean, yes, it's kind of a dump and restore, and the dump can be done with
-j, meaning in parallel. Of course, you put a lot of stress on your disk or network
system. But if you can afford it, it's good — you can move faster. And there is a big trade-off.
For example, if you consider initialization of logical replication for Postgres,
if you have, for example, 10 terabytes and 100,000 TPS,
what do you prefer?
To read from disk slower and have...
The problem is that while you are reading,
you have already opened a logical slot, because you need future CDC to start from the very beginning. And when you read, pg_dump reads with the repeatable read transaction isolation level.
Why? Because we need a snapshot,
so all data, all tables, are in a consistent state,
all foreign keys matching, and so on.
Even if you run pg_dump -j, there is a snapshot there.
And this snapshot cannot go away, because we need to finish the initial copy first,
and then start CDC from the point where this snapshot was created.
So if you read slowly, disks are okay — I mean, like
10% of your capacity in terms of disk I/O. But it means that Postgres
needs to keep the snapshot for a very long time. And this will affect — guess who? — autovacuum, right?
Because autovacuum cannot delete the tuples which became dead after our snapshot creation.
And so we have a trade-off. If we want to disturb autovacuum less, we want to move faster.
But it means we will put a lot of stress on our disk system.
And every time, you need to decide based on the capabilities of your infrastructure in the particular situation.
So there is no universal recipe here.
For example, if we say we are creating this from some system which nobody is using — for example, a dedicated replica with no-failover and no-load-balance flags in Patroni, a dedicated replica —
probably we can utilize 100% of disk I/O capacity
and move as fast as possible,
because with hot_standby_feedback
it will report to the primary.
Or we can turn off hot_standby_feedback in this case
and allow replication to be practically paused for a long time.
But we need to adjust
settings for it.
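A sketch of the kind of settings adjustment meant here for such a dedicated standby (assuming these are the relevant parameters; tune them to your own setup):

```sql
-- On the dedicated standby used only as a dump source:
ALTER SYSTEM SET hot_standby_feedback = off;        -- don't hold back vacuum on the primary
ALTER SYSTEM SET max_standby_streaming_delay = -1;  -- let WAL replay wait behind the long dump
SELECT pg_reload_conf();
```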
This is new though, right? This is new in 16?
What?
Being able to do logical replication from a physical replica.
It's a good question. You can create a slot on the primary. Remember its position in terms of LSN and snapshot.
And then you can open a transaction on the standby —
a repeatable read transaction —
asking it to use the same snapshot,
and read from there.
Yes.
Okay.
It's possible.
It's like synchronization.
And pg_dump actually has not only -j,
it also has — I don't remember the option name —
the ability to specify a snapshot ID.
So you can dump from a standby and not disturb the primary.
But on a standby, we have a different dilemma,
a different trade-off.
The hot standby dilemma — we discussed it, I think, right?
So either it will report to the primary, and we have the autovacuum issue again,
or it will not be able to replay physical replication while we are dumping,
because we have a long-running transaction.
So the hot standby dilemma — it's an interesting thing.
We can zoom into it one day.
But in general, it's possible.
And of course, if you do it from primary, yeah.
So what you're referring to is the Postgres 16 feature to have logical replication, including CDC, from standbys.
This is great.
This is super great.
Okay, cool.
Yeah.
But I'm talking about the initialization process.
Yes, good point.
And one last bit of information here. By the way, there is a very good post from
Michael Paquier — I'm sorry for the pronunciation. I'm constantly using it; there are some blog posts
which I've been using for years and recommend all the time, and people say thank you all the time. I
think we should have something like a
golden library — how to say — a golden hall of blog posts, because there are some posts which
stay useful for a very long time. So there is a post about how to synchronize the position of a slot and pg_dump — it was in 9.5, I think, that this kind of thing was implemented. The only problem: if you want to know the logical slot position,
not only in terms of LSN, the log sequence number,
but also in terms of the snapshot ID,
you need to create the slot differently.
You cannot create the slot with SQL.
You need to use the replication protocol.
So it's kind of a different type of connection.
And many vendors of logical
replication systems, including Qlik, haven't understood that for years. And I tried to
help them; they don't understand. They continue using SQL. In this case, you cannot synchronize
properly, and you get duplicates if your subscriber lacks unique keys. So inserts might overlap and happen twice.
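To make that concrete, here is a rough sketch of the approach from that post — not an exact recipe; the slot name, snapshot name and paths are illustrative, and option syntax varies slightly between Postgres versions:

```sql
-- Connect using the replication protocol, e.g.:
--   psql "dbname=app replication=database"
-- and create the slot there, so it returns an exported snapshot name:
CREATE_REPLICATION_SLOT cdc_slot LOGICAL pgoutput EXPORT_SNAPSHOT;
-- Returns: slot_name | consistent_point (LSN) | snapshot_name | output_plugin

-- While that session stays open, run the initial copy against the same snapshot
-- (shell command shown as a comment; the snapshot name is the one returned above):
--   pg_dump --snapshot=00000003-0000001A-1 -Fd -f /backups/app app

-- CDC then starts from consistent_point, so nothing is lost or duplicated.
```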
So this is initialization.
One last bit about initialization.
You can convert physical replica to logical.
This is super cool for very large databases.
You just use recovery_target_lsn.
You create the slot, you know its LSN.
You physically replicate until this LSN using recovery_target_lsn.
And then just reconfigure Postgres to
continue with CDC with
logical replication.
It's possible.
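Roughly, that trick might look like this — a sketch, not an exact procedure; all names and the LSN value are made up, and a publication must already exist on the publisher:

```sql
-- 1. On the current primary: create the logical slot and note its LSN
SELECT slot_name, lsn
FROM pg_create_logical_replication_slot('upgrade_slot', 'pgoutput');

-- 2. On the physical replica: replay exactly up to that LSN, then stop
--    (postgresql.conf settings, shown here as comments)
--      recovery_target_lsn    = '2E/89DAF000'   -- the LSN returned above
--      recovery_target_action = 'promote'       -- or 'pause', as discussed

-- 3. After promotion: attach a subscription to the existing slot,
--    skipping the initial copy because the data is already there
--    (upgrade_pub is a hypothetical, pre-created publication)
CREATE SUBSCRIPTION upgrade_sub
    CONNECTION 'host=old-primary dbname=app'
    PUBLICATION upgrade_pub
    WITH (create_slot = false, slot_name = 'upgrade_slot', copy_data = false);
```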
This is so cool. Last time when we were talking about
replication more generally, you
brought this up, but I think it was a theory.
You'd done it in tests, but you'd
never done it in production. I believe now
you must have done it. Well, I'm guessing that's
how GitLab did it. One of
the tricks is this. The only
problem if you first achieve
recovery target LSN, switch
to logical, and then upgrade, this is
a problem. And there is a post on
hackers which discusses
while on logical you cannot run
pgUpgrade. You need to first run pgUpgrade
and then switch to logical. And this
is one of the features I think which is also under development to make it officially supported
without additional manual actions. But it's doable already. So yeah, this is what was
used there. So again, like how to understand what you need, which kind of provisioning
you need. If you have like less than one terabyte, probably you will be fine everywhere. If you have one to 10 terabytes, probably
you would like to read faster to cause less bloat. By the way, if you cause bloat on old
system and you are going to switch to new one, you don't care.
Yeah, because the destination of
logical replication is fresh. In this case, we are dealing with a fresh
cluster; you're losing your bloat. I have a joke:
with RDS, they don't allow physical replication, they allow
only logical replication. So they don't allow me to download
my bloat. I can download only my data, losing my bloat.
But what if I need my bloat?
In some cases, for example, testing,
like lower environments for experimentation,
you probably want your bloat to be preserved, right?
Because this would help to have more realistic testing,
more like in real production.
I mean, I get what you're saying,
but in a way you are creating your own bloat on the logical replica,
right? If you've got autovacuum running in the same way and you're running the same
workload, you are probably getting bloat yourself.
But it's not the same bloat.
Exactly.
If you have 90% bloat,
the number of pages is much higher than the live data needs. It will affect plans.
It's not a good problem.
I'm very familiar with this problem, but you're getting
kind of like a different bloat.
Different bloat is different.
Okay, but it's a different topic.
So if you have one to ten terabytes, for example,
you probably want to be very careful optimizing
how you do this initialization of logical replication.
But it's also important whether you want everything
or only some part of it.
Right, true.
Above 10 terabytes, and if you
want everything, it's better to convert a physical replica to
logical — preserving all bloat, by the way.
And what does converting a physical replica to a
logical replica look like?
Again, you have a
physical replica, and you
replay up to a point: you set
recovery_target_lsn
to the position
of the slot — you created the slot,
you know its LSN — you put
it into your physical replica's configuration
and say pause, for example, or promote,
depends.
I don't remember, this is also interesting detail, I don't remember our decision actually
here, we experimented with it.
You can say either promote or pause when it reaches LSN.
Once it's done, you can say: okay, now it's a logical replica.
All my bloat will be with me.
It's bad and good. I mean,
it's good for testing. It's bad because, I mean, I would like to get rid of my bloat.
If you provision a logical replica the regular way, you are getting rid of bloat. It's like
a dump and restore — it's a good thing. But in this case, if we have a lot of data, like 15, 20, 30 terabytes, probably we want to convert a physical replica to logical.
But only if we need to take all of it. For example, if we need to take only 10% of it,
it's kind of a waste of resources. You will need to perform a huge cleanup afterwards.
Yeah, of course, because you're physically replicating the entire thing for only 10% of the benefit.
Right, because physical replication doesn't allow you filtering.
While with logical, you can say, I want these tables only.
Yeah, or even as of recently, you can even do just these columns from these tables.
It's pretty impressive how much you can filter that now.
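For example, the row and column filtering mentioned here looks like this in Postgres 15+ (table and column names are invented):

```sql
-- Publish only some columns and only recent rows of one table
CREATE PUBLICATION orders_pub
    FOR TABLE orders (id, customer_id, total_amount)
    WHERE (created_at >= '2023-01-01');
```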
Right. Next step would be let's probably replace PII with something, right?
Yeah, I guess that is a change data capture problem. Yeah, you
could. That's a cool idea. Yeah.
It's like, let's create some rules and then some tables,
some columns, we will apply some modification on the fly. And during initialization as well.
This would be a super cool feature, I think.
Nice.
Some regexp or something.
Yeah.
I think it's doable.
And it's doable right now already if you go down to the logical decoding level,
losing all the features native to logical replication,
and applying changes yourself, making some transformations
if needed. But again, if it's a small database — regular approach. If it's a larger
database, think about this trade-off: disk stress versus overhead related to vacuuming. And
if it's a super big database, consider the physical-to-logical approach. It's an interesting trick,
underappreciated, I think, right now.
Nice.
Do you think there's a limit size-wise? If you're accumulating
changes on the primary during that
flip, what's the impact? This is already about
CDC. Okay, sorry. Do you want to move on to that?
Yes. So CDC is interesting. First of all, most of the problems you will experience with Logical are related to CDC.
Inability to replicate from standbys, from physical standbys, until Postgres 16 — which hopefully will be released the day before we publish this podcast, since we're not recording live.
Fingers crossed.
Fingers crossed, right. So this is solved. Why is this important? Because nobody wants
to start from scratch if the primary is down, for example, and it's good to be able to replicate
from a standby. Also, in terms of overhead: of course, logical decoding has some overhead.
There are cases when the WAL sender, which sends data for logical replication, can saturate one core.
A single slot is always a single process, a single WAL sender. And
usually its utilization is quite low. In some cases, for
example, if you delete a lot of rows and then roll back, delete a lot of rows, roll back,
delete a lot of rows, roll back — this is a harmful workload for logical replication.
And at a level of one or two WALs per second, you will see a WAL sender saturating your CPU, even quite a good
one, like a modern Xeon or something. You will see 100%. And it will be the bottleneck of
logical replication, which, by the way, is very hard to solve, because even if you, for
example, decide to use four slots, all of them will be parsing everything.
Even if you need only part of the data, they will still parse all the data. So you will see four saturated WAL senders,
but it's very rare. It's interesting — in my experience, I saw it in non-production and I
thought it was a big bottleneck, but in reality our real workload is not like that.
It's not like marking a lot of
tuples with xmax,
meaning they are deleted, and then rolling back.
No, no, no — normal
writes are different. So probably you
won't see saturation of WAL senders.
If you do see saturation of WAL senders,
CPU saturation,
it's bad —
it's not solvable right now.
Bad news.
But again, it's rare.
So probably you won't.
Being able, in Postgres 16,
to perform CDC from standbys,
from physical standbys —
it's good because you don't have overhead on the primary.
And a different thing:
what happens if the primary is down? I don't know
if it's possible — I'm new to Postgres 16 also. So a big problem is how to sustain various failures.
Logical replication should also be highly available, because right now, at least until
Postgres 16, if the primary is down — or the node from where you perform CDC, change data capture, is down — you need to start from scratch.
You need to redo initialization.
And this is a huge headache.
I see work ongoing for Postgres 17 to sustain switchovers and failovers. And I also know that, using some tricks,
Kukushkin implemented support for switchover in Patroni.
So we logically replicate from primary,
but if we switch over, we don't lose slots.
Slots should persist.
This problem, I think it's not solved in 16,
but it looks like good chances it will be solved for 17.
And it's solved in Patroni using some tricks.
Use at your own risk.
I remember, I think Craig Ringer helped in the GitHub issue with some good insights how to implement it correctly.
But officially, it's not supported.
I mean, officially by Postgres, it's not supported.
And it's a big problem of CDC, right?
Yeah, that makes sense.
I didn't know it was solved in,
or there was a solution for it in Patroni.
That's very cool.
Yeah, so there are two problems, big classes of problems.
High availability and performance.
I already touched a little bit both.
About high availability.
If a node is down — okay, use Patroni, and this feature is good.
What else bad can happen?
For example, DDL can happen.
Yeah.
And DDL is not solved.
It's in the commitfest; it requires review.
There is, I think, version 4 of patch proposed.
It's good to see that it's
in work, in progress
this is super good
if this gets
solved in future Postgres versions.
Because
when DDL comes,
usually, at least, you can
mitigate it in many cases.
What happens? CDC stops.
I mean, logical replication stops.
The slot starts to accumulate a lot of changes,
with a risk of running out of disk space.
And you just need to replay the same DDL on the subscriber in most cases.
It will help.
Yeah.
And then it will restart automatically,
or do you have to kick that off again?
Yeah, it will pick it up. Here we must say: logging all DDL is a good idea. Always. All DDL
should be logged, must be logged. You will save a lot of nerves if you have logs of DDL.
For example, here you just fix it manually; I did it manually many times. Some DDL happens, especially if you have some automated partitioning.
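One simple way to get that DDL log, if it isn't enabled already (a sketch; event-trigger-based logging is another option):

```sql
-- Log every DDL statement to the Postgres log
ALTER SYSTEM SET log_statement = 'ddl';
SELECT pg_reload_conf();
```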
Yeah, true.
I've heard people, for example,
if you're doing a major version upgrade using logical,
try to put a feature freeze on, you know,
no changes while we're doing the upgrade.
But if you're partitioning, that's happening automatically.
You need to pause automatic partitioning.
Or create the next few
in advance,
depending on how you're doing
things.
Another interesting thing is that some DDL
can be left
unnoticed. For example, if some index was created
on the publisher,
nobody will notice because indexes don't
matter. But you can lose this
index and one month later you
will realize you have slow queries
again and you need to recreate it.
Because it doesn't break replication. Index creation won't
block you.
But it's also not replicated.
Well, that's also a feature
though, isn't it?
I've heard of people using it for an analytics database
where you want the data indexed in a different way
or you want to do roll-ups on it or something
and you don't want those indexed.
So I wonder if they're going to do it so it's optional
or you can control which DDL gets replicated.
Right. Yeah, I agree.
Yes, that's a good idea.
And with logical replicas,
we don't have a hot_standby_feedback problem. But we might have a problem with the lag. For example,
in many cases, one slot is not enough. And I'm talking not about the WAL sender, which is quite rare.
I'm talking about the logical replication worker on the subscriber side and its inability to catch up with all the changes
in proper time. It can be related to disk I/O or CPU saturation — I saw both.
And in this case you might want to consider multiple slots.
But here we have an interesting problem many people don't realize.
It's a minor problem actually. So the problem is that if you publish
everything
this DDL is very
light
you say I'm going to publish all tables
to this slot
it's easy
we can just continue
but if you want to balance
first of all you need some balancing algorithm
and it's good if you base this algorithm
on tuple statistics from pg_stat_all_tables, saying: okay, we want to distribute writes
between four slots or 10 slots, like as even as possible.
But then you are going to publish a bunch of tables to one slot.
And this publishing requires DDL — ALTER PUBLICATION ... ADD TABLE — and guess what it requires:
a lock on the table, unlike publishing all tables. And when you want this lock, guess who can block us?
Our old friend autovacuum, running in transaction ID wraparound prevention mode, which
every DBA should hate, because you prepared a beautiful procedure, you were going to run it at 1
AM, not sleeping, and then you forgot about this.
So with any changes, any DDL, any complex changes, you should always keep in mind that autovacuum
yields and will kill itself, unless
it's running in transaction ID wraparound prevention mode, which won't yield, and then you
need to take care of it yourself.
So this is interesting, but fortunately this kind of issue, this kind of conflict, won't
lead to downtime, unlike ALTER TABLE ... ADD COLUMN, because ALTER TABLE ... ADD COLUMN
will block selects. This kind of lock — I don't remember the details, but the kind of lock which
is needed to define a publication for a table — even if you cannot acquire this lock and sit waiting for
it for 10 minutes, for example (it happened one day in my experience), you will just
be blocked. This is the worst outcome. Selects will still be running. This is good.
What about modifications? Like updates, deletes?
Yes, yes. So, DML won't be blocked. So, likely your users won't notice it.
So, it's a super interesting thing how to distribute between slots.
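A hypothetical sketch of that distribution step (table and publication names are made up; the lock caveat discussed above applies to the ALTER):

```sql
-- Rank tables by write activity to plan the split between slots
SELECT relname, n_tup_ins + n_tup_upd + n_tup_del AS tuple_writes
FROM pg_stat_all_tables
WHERE schemaname = 'public'
ORDER BY tuple_writes DESC;

-- Assign tables to per-slot publications as evenly as possible
CREATE PUBLICATION pub_slot_1 FOR TABLE orders, order_items;
CREATE PUBLICATION pub_slot_2 FOR TABLE events, audit_log;

-- Adding a table later takes a lock on that table (watch out for autovacuum
-- in wraparound-prevention mode, as discussed)
ALTER PUBLICATION pub_slot_1 ADD TABLE invoices;
```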
And there is also another very big caveat from having multiple slots.
With single slot, all things are simple.
With multiple slots, for example, imagine table A and table B,
and they have foreign key between them.
Table A is published to one slot, table B is published to a different slot.
Like a race condition or something.
Two slots, two different legs.
Right, so foreign keys is a problem.
By default, they will be ignored, by the way,
by logical replication.
I didn't know about it until recently.
So logical replication ignores foreign keys
and allows eventual consistency
if you use multiple slots.
Wow.
So eventual consistency is an official approach in Postgres as well.
Wow. Indeed.
Just use logical and multiple slots. That's it. And you can enforce it, of course,
but you won't like it if you enforce it. So it depends on the situation, of course, but
if, for example, it's for a big migration from one major Postgres version to another,
or you deal with a glibc version change, something like that, an operating system change — in this case,
probably you want to allow it, just don't redirect read-only traffic to such replicas.
Yeah, yeah.
Yeah, yeah.
Of course, don't redirect read-only traffic because it will be split-brain.
Or multi-master split-brain.
Multi-master is almost the same, right?
I'm joking.
And you probably want a short pause
before resuming traffic.
Of course, to catch up all the slots,
to ensure no changes are lost.
Of course,
of course. And in this case, it will already be in a consistent
state again, because foreign keys are
guaranteed on the publisher side. But it's interesting, right? So if
you use multiple slots, you cannot redirect read-only
traffic either. It means the testing strategy is different.
The only thing I don't
think we've covered that I had to
look out for was I've seen
a few people hit issues with
sequences, so like those not
synchronizing between the publisher
and the subscriber. Honestly,
I don't remember details.
I think there was some work
done to support it. Eventually
it should be supported.
Not in 16, right?
Not yet.
I don't think so, no.
Right.
But it's not a big problem.
You just dump all sequence values, propagate them to the new primary, and add some constant for safety.
Sequences are 8 bytes.
So if you add 1 million, nobody will notice it.
I mean, in some cases, of course, big gaps are a problem. It depends. And it requires some additional coding. But
it's not as annoying a problem as the lack of DDL replication, in my opinion,
in the cases I had.
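A rough sketch of that manual sequence fix (the sequence name and numbers are invented):

```sql
-- On the publisher: capture current sequence values
SELECT schemaname, sequencename, last_value
FROM pg_sequences;

-- On the subscriber: bump each sequence, adding a safety margin
SELECT setval('orders_id_seq', 123456 + 1000000);  -- 123456 = value read above
```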
Yeah, it's solvable, as opposed to not solvable.
I've just seen it trip people up on their
first attempt.
Right. Or just use
UUIDs, but version 7,
please, which is not officially supported
by Postgres, but some
libraries in your
favorite language already support it. By the way,
the news from there is
that progress on the standard is happening.
Hopefully we will see the standard — I mean, the RFC for UUID version seven — released
soon. And we already have a patch for Postgres. If things go very well, Postgres 17 will
support this. Very cool. Because Postgres is waiting for the RFC to be finalized.
Nice. And these are like timestamp-ordered UUIDs?
Lexicographically ordered, yes. So basically, if you order by them, new IDs will be close to each other.
Right. Great. Orderable UUIDs.
There are many things about
logical — there are many things, I don't
remember everything. I just
wanted to share some of the
most challenging things I had
recently.
Yeah, that's wonderful. I've learned a lot,
about multi-slot especially.
I've learned many things
about it myself recently.
But I think, honestly, the good thing about logical replication is that
if you do it very carefully — not allowing a slot to accumulate so much data
that you run out of disk space... By the way, Postgres 15, I think, or 14 —
one of the recent versions — got a setting to cap the maximum number of bytes retained for a slot.
If you reach it, it's better to kill the slot than to run out of disk space.
Disk space, yeah.
Right.
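The setting referred to is most likely max_slot_wal_keep_size; a sketch of setting it and watching slot health:

```sql
-- Cap how much WAL a slot may retain; a slot that falls further behind is invalidated
ALTER SYSTEM SET max_slot_wal_keep_size = '200GB';
SELECT pg_reload_conf();

-- Watch how far behind each slot is and whether it is still usable
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```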
So if you do it carefully, users shouldn't notice your experiments, and you can run these
experiments on production, test it, and prepare a good procedure, and then,
when it comes to deployment, you already have everything verified right on production.
Right, so this is the good thing about it. And if you just start with one slot and see it's enough, it's
good — you're lucky. If you have, for example, multiple thousands of tuple writes per second — for example, 10,000 tuple writes per second on the primary —
it's quite a significant load. So in this case, probably you will need to have multiple slots.
In this case, if you have partitioning, you will say thank you to yourself, because if you have a
huge table without partitioning, it will go to one of the slots, representing, for example, like 50% of all the load. Partitioning helps here. It
doesn't help in some cases, but here it helps a lot to
redistribute writes among slots. And then if you use multiple
slots, be careful with this foreign key issue. Yeah, because
your replica will be inconsistent in terms of referential integrity
at any given time
until you perform switchover
and switchover is interesting
probably we should discuss switchover
one day also
I like your idea of the
algorithm to work out which
tables should go where but the foreign key
thing makes me think if you've got
clusters of tables that
have foreign keys between them and a separate
cluster that has its own foreign keys.
If you could separate those, that would be cool.
I explored this idea with machine
learning with k-means.
It works
quite well in some cases.
It depends on your schema.
Of course.
Right. And we know k here.
If you want four slots,
our K is four.
So it's easy.
But unfortunately,
in well-developed systems,
relationships between tables
are usually quite complex.
But it's a valid idea.
I agree.
Yeah.
Awesome.
Well, thank you so much, Nikolai.
Thanks, everybody, for listening, and see you next week.
See you.
Oh,
of course,
as usual reminder,
not to forget,
we have YouTube.
We have podcast published everywhere,
literally.
So Spotify,
Apple podcasts,
everything.
Just go to PostgresFM and subscribe and you will see it and choose your favorite method.
As usual, please subscribe.
Please leave reviews,
stars, reviews,
and leave ideas on YouTube
or on Twitter, what you want.
We have a document
where you can add your idea
for future episodes,
some feedback.
So we would really appreciate it.
And on YouTube,
we continue publishing
polished transcripts — subtitles, right? So if you want, you can recommend us to your friends who
barely understand my Russian accent or Michael's British accent and subtitles help a lot. And also
they can be automatically translated quite well to many
many languages by youtube itself so you can see the subtitles in your language
yeah so yeah youtube's great for the transcription and for the translation and the podcast ones we
do edit out some of the ums and ahs, and also some bits where we go off topic a little
bit too much so if you
if you want the slightly shorter version then those are there for you too
Right, right. So we invest quite a lot of effort into these subtitles, and we will continue improving.
Okay, thank you. See you next time. Bye-bye.