Postgres FM - Logical replication common issues

Episode Date: April 12, 2024

Nikolay and Michael are joined by Sai Srirampur, CEO and Co-founder of PeerDB, to discuss how to overcome several logical replication issues. They discuss the protocol versions, minimising slot growth, minimising lag, and some tips and tricks for scaling things well.

Here are some links to things they mentioned:

PeerDB https://www.peerdb.io/
Our episode on logical replication https://postgres.fm/episodes/logical-replication
Exploring versions of the Postgres logical replication protocol (blog post by Sai) https://blog.peerdb.io/exploring-versions-of-the-postgres-logical-replication-protocol
Logical Streaming Replication Protocol https://www.postgresql.org/docs/current/protocol-logical-replication.html
REPLICA IDENTITY FULL https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
On the performance impact of REPLICA IDENTITY FULL in Postgres (blog post by Xata) https://xata.io/blog/replica-identity-full-performance
max_slot_wal_keep_size https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-SLOT-WAL-KEEP-SIZE
Active Active in Postgres 16 (blog post by Crunchy Data) https://www.crunchydata.com/blog/active-active-postgres-16
pgEdge https://www.pgedge.com/
DistSQL (term used by Mark Callaghan) https://smalldatum.blogspot.com/2023/07/keeping-up-with-sql-dbms-market.html
Five tips on Postgres logical decoding (blog post by Sai) https://blog.peerdb.io/five-tips-on-postgres-logical-decoding
PG Slot Notify: Monitor Postgres Slot Growth in Slack (blog post by PeerDB) https://blog.peerdb.io/pg-slot-notify-monitor-postgres-slot-growth-in-slack

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM, your favorite podcast about Postgres. And my name is Nikolai. My co-host is Michael. Hello, Nikolai. Hi, Michael. And we have a very good guest today, Sai. Hey, Nik and Michael. Founder of PeerDB. We are very glad to have you here. And we just discussed that we are not inviting guests just for the sake of having guests; we choose the topic first, so it's a topic-centric discussion. But when we discussed that we need to talk about logical,
Starting point is 00:00:32 and I also saw your blog post published recently, my first thought was that you're a great person to have for this discussion, right? So that's why we invited you. Glad you found time to join. Thanks, Nik and Michael, for inviting me. And yeah, logical replication and logical decoding have been my life for the past seven, eight months. And we are trying to get into as much depth as possible to, you know, understand how it works and probably down the line contribute upstream as well. But yeah, I'm very excited to be here. Cool. I know Michael has a lot of questions, so I probably will be less active this episode, but I will have questions as well and maybe comments.
Starting point is 00:01:14 Yeah, I definitely have questions. But yeah, I completely echo everything Nikolai said. And yeah, we did a kind of a basics intro episode to Logical where Nikolai also talked us through how to handle scale in terms of major upgrades at least um so there's a common use case for logical but one thing we didn't even touch on which was the topic of one of your recent blog posts was i listened back and i think protocol versions got a single mention kind of like an off the cuff mention at one point in the episode, you know,
Starting point is 00:01:46 30, 40 minute episode. So that was where I was thinking of starting: it'd be awesome to hear a little bit about the protocol versions, why you wrote about them recently. Yeah, absolutely. Thanks, Michael. So the thing is, logical replication has this concept of protocol versions that you could specify as a part of the subscription, or if you are an external client who is reading the replication slot, you could do it as a part of the START_REPLICATION API, which lets you read the replication slot. And there are four versions that Postgres supports now with Postgres 16, right? The first version is the default version, which decodes transactions that are only committed. So it doesn't start the decoding process
Starting point is 00:02:31 before the transaction commits. And the second is the more advanced option, which does logical decoding of transactions that are not yet committed, right? Like, so it basically does decoding for in-flight transactions, right? That is number two. And then third is it lets you decode two-phase commit transactions. That is third.
Starting point is 00:02:53 And the fourth is it lets you decode in-flight transactions and apply them in parallel to the target, which is the subscriber, which is Postgres, basically, right? Now in that blog, which you're talking about, we compared version one and version two, because they are the more common ones. Two-phase commits are not very common, right? That's the reason we didn't go into that piece. But the biggest difference, the impact of two, is it improves logical decoding performance, because you're actually decoding while the transaction is happening, right?
Starting point is 00:03:16 Like you are not letting the transaction finish and only then start decoding, but rather you're decoding as the transactions go on. So the idea is you're giving more time to the decoder, the WAL sender process, to perform the decoding. And now this has a lot of benefits, right? Like number one, it reduces the slot growth, right? So in that blog, we talk about a situation where we have two long running transactions that are being committed. And with version one, the decoding process doesn't start and the slot keeps growing, growing, growing.
Starting point is 00:04:04 And only after the transactions are committed, the decoding starts. And then for a few minutes, until the decoding is finished, the slot size remains the same. Right, now this is with one. With two, what happens is, as the transactions happen, decoding starts, right? Like, so you do see the decoder doing its job. And as and when the transactions finish, the slot immediately falls. And the advantage of this is... so the summary was, with version one, the slot growth of, I think we showed like five, six gigabytes, remained for like five, six minutes until the entire decoding finished. But with version two, it remained there for an instant, because, you know, the decoding was already happening
Starting point is 00:04:50 as the transaction was going on, right? So this can have a tremendous impact in use cases which have long running or sizable transactions. And that is very common, right? We recently were working with a fintech customer, right? Like where they had sizable transactions, right? Like hundreds of, you know, operations happening in a transaction. And then these transactions were interleaved. And the advantage with two is, two would have helped them a lot, because as the, you know, operations are happening, the decoder, like, you know, just decodes them.
Starting point is 00:05:16 With one, the problem that was happening was there was one large long-running transaction which took like an hour, and then every time the decoder needed to decode committed transactions, it was decoding this long-running transaction also. So the order of time is more quadratic, basically, with version one, right? Because, you know, WAL is sequential, right? Like, so you keep writing, writing, writing, and then as and when there is a commit, the decoder starts working. Now the problem is, with version one, as the existing
Starting point is 00:05:54 long-running transaction is not getting decoded, still the decoder process decodes it for other transactions, right? But with two, this problem wouldn't have happened, because once this long running transaction is committed, I mean, it's already consumed. It's already decoded, if that makes sense, right? So the high level summary is version two is very helpful when there are, you know, long running transactions that are, you know, interleaved.
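For reference, on the native subscription side this maps to a single option; a minimal sketch, assuming a publication my_pub already exists on the source (all names here are illustrative):

    -- Postgres 14+: stream in-progress transactions, i.e. protocol version 2
    CREATE SUBSCRIPTION my_sub
      CONNECTION 'host=source dbname=app user=replicator'
      PUBLICATION my_pub
      WITH (streaming = on);
    -- Postgres 16 also accepts streaming = parallel (protocol version 4,
    -- parallel apply of large in-progress transactions on the subscriber).
    -- External clients reading the slot with pgoutput pass the version explicitly, e.g.
    --   START_REPLICATION SLOT my_slot LOGICAL 0/0
    --     (proto_version '2', publication_names '"my_pub"', streaming 'on')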
Starting point is 00:06:21 So the lesson is upgrade, guys, right? Just use the latest Postgres version. Version 2 comes with 14. Version 4 is in 16. Sorry, a small thing. I've noticed you mentioned subscriber, which is Postgres. This phrase
Starting point is 00:06:38 attracted my attention. It's not always Postgres. But parallelization, which to apply long transactions, it's done only for Postgres subscribers, not for others, right? Correct. Exactly. So the fourth version, which does like, you know, parallel apply of these in-flight transactions is more relevant for Postgres to Postgres replication, right? The apply piece is, you know, Postgres, and that's the standard logical replication.
Starting point is 00:07:08 And logical decoding is more like a subset of logical replication, where external clients like, you know, PeerDB can read the slot and replicate to homogeneous or heterogeneous targets. Super. Yeah. In your experience, it doesn't mean that there's no sense in having this feature for other situations, like replicating to Snowflake or others? The parallel apply feature, you mean, right? Right. Yeah. Good question. So that is the onus of the ETL tool, right?
Starting point is 00:07:33 Like, so in PRDB, we have a option to do parallel apply or not do parallel apply. So it's like a single flag that we provide. And the differences with parallel apply, we cannot guarantee the order of transactions across tables, right? So foreign key reference consistency, similar to multiple slots in regular Postgres to Postgres logic. Correct. Very, very similar, right? And then both, I mean, customers do it, the advantage of parallelism is it will improve latency, right? Like replication latency reduces, lag reduces, etc. But it doesn't guarantee, you know, consistency across tables. But the of our capacity, no worries if TPS grow using a single slot. If TPS grow, we know the bottleneck of its own subscriber, we will just use multiple slots. foreign keys, you need basically to agree that they are broken temporarily on the subscriber and
Starting point is 00:08:48 can cause a lot of trouble if you point the application to such nodes, right? That's interesting. Yeah, absolutely. Makes sense. And then I did want to call out, right, for that, I mean, there is a setting for disabling foreign keys. Just to complete that: there is a setting called session_replication_role, basically. You can set that to replica. That is pretty standard; that's the reason Postgres gave this setting. So you can set that to replica, and it would disable foreign keys and triggers on the target. But by default, if you use multiple slots, it's disabled. By default, is it disabled? Yes, I know it since recent work on using it for zero-downtime upgrades, so it's indeed disabled. And it's interesting.
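For reference, that setting can also be set manually on a target you load data into; a minimal sketch (table and trigger names are illustrative):

    -- on the target session: skip ordinary triggers and foreign key enforcement
    SET session_replication_role = replica;
    -- triggers that must still fire under the replica role need to be marked explicitly
    ALTER TABLE orders ENABLE ALWAYS TRIGGER orders_audit_trigger;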
Starting point is 00:09:34 so you you need to like think maybe you shouldn't use it i I mean, maybe go back to single slot and just choose different time for your work. I mean, work on weekends, guys, right? Lower traffic and you can afford working with single slot. This means only for, like, logical for
Starting point is 00:09:59 temporary time, like, for upgrades we don't need it. Exactly. It really depends on the use case. Yeah, yeah. But if for a long time, forget about foreign keys maybe. Yeah. And you said latency, but it's also about throughput. We need to process more bytes per time.
Starting point is 00:10:18 Absolutely. Legs can be nasty. I don't like dealing with logical in Postgres 11, 12. It's not pleasant at all. So these improvements are exciting and everyone should upgrade. That's what I see here. You just uncovered a lot of details. That's great. Yeah, absolutely. And the tricky thing, I mean, the beauty of Postgres of like Postgres is that like, this is already inbuilt in like Postgres core, right? Like a logical replication does this by default. It's just like, you know, a setting you need to do say that like, hey, I want to use version two, and it will be significantly more performant, reduce slot growth. But if it's an
Starting point is 00:10:57 ETL tool who's using logical decoding, the transaction logic needs to be managed by the ETL tool. Because as you're reading like in-flight transactions, we need to keep track whether this transaction is committed or not. And then only we should push it to the target. So it becomes tricky for like ETL tools, which we are working on in PADB now as we speak. But the beauty of Postgres is that it just gives you the setting out of the box.
Starting point is 00:11:21 And you just need to upgrade your Postgres version to 14 15 and 16 so yeah and uh with this uh in 16 we have example i mean the developers of these tools like your company they now have example postgres to postgres native logical shows how it can be implemented so the reference example exists so it's's good. Yeah, that's great. Sorry, Michael, I interrupted you. No, not at all. This is a really good diversion, but we've talked about three things already, right? We've talked about throughput, we've talked about lag,
Starting point is 00:11:59 and we've talked about slot size growth. And I feel like when you said initially the performance of the version 2 protocol can be better, I'm guessing we're talking mostly in terms of that lag metric. Like we can start processing things faster, therefore the lag is lower, makes total sense. And when you said the slot size growth is reduced, I think I understand where you're coming from. I think the specific benchmark in the blog post shows that the growth is the same, but then it kind of reduces quicker. But when you mentioned the interleaved long-running transactions,
Starting point is 00:12:35 I guess it's in a less synthetic workload where you've got lots of potentially overlapping long-running transactions. The peak will be lower in the version if you're using great exactly right like because in the scenario we did it's like just do two transactions that are long running and then we just had two of them right but then and then they ended at the same time but in real world workloads like it i mean it can be very arbitrary, right? But with like version two, as the slot consumption is faster, and like slot size falls quicker, right? The cumulative effect can be very significant in slot growth. If that makes sense. And that is the next benchmark we want to do where in the real world benchmark, I want to see how does the peak slot size like compare with version one
Starting point is 00:13:25 and version two, we are actually building that feature. So we can we plan to like, you know, we have a few like, customers, we are design partnering with, you know, to implement this feature. And we have like, we know, you know, what is the peak size slot size we are seeing with version one. And with this design partnership, we will get to know, okay, what is the peak slot size, my like understanding is it will fall very quickly, because like get to know, okay, what is the peak slot size? My understanding is it will fall very quickly because you're giving more time to the decoder, right? It's more efficient. And with these long-running transactions, it cannot be quadratic, the decoding process, right? But that's a very good question you pointed, Mike.
Starting point is 00:13:57 And that is the next benchmark we want to do. Nice. So then the question then becomes are there any workloads and i'm thinking now that it must only be if you had let's say you had a lot of long-run transactions that end up getting aborted or rolled back instead of committed i'm thinking that might be the only case where version the version one protocol might be like you might prefer to carry on using it or are there other cases where you might prefer to carry on using the v are there other cases where you might prefer to carry on using the v1 protocol over the v2 one when as long as you're on 14 above even when the transactions
Starting point is 00:14:30 are rolled back the wall is not removed so it still need to be like decoded basically right like for future transactions that that's the and that's exactly what happened in a customer scenario where there was a long-running transaction with the customer killed, basically. It was running for like a few hours, right? But still that impacted other transactions because Postgres is not yet smart where it removes the wall. This is my favorite use case. So you can overload wall sender if you just do a very simple trick. You create a table with like a million, 10 million rows and then delete rows in in transaction and roll it back massive delete rollback massive digital back i like this workload in many cases and this included and this makes wall sender consume 100 cpu very quickly
Starting point is 00:15:18 because it needs to yeah because this transaction spams wall writing xmax constantly for a lot of tuples and then saying this xmax this transaction ID was rolled back so it doesn't mean they're still alive the next transaction does the same
Starting point is 00:15:39 and wall sender is becoming crazy and I think there should be some way to optimize decoding to understand it, maybe. Because if you start processing, ah, when it starts processing, it's not known that this transaction is already rolled back. But maybe there is some opportunity to optimize, because if it's known already, it's already rolled back. Why? We can probably skip better rolled back. Why can't we probably
Starting point is 00:16:06 skip better or something? I don't know. I don't know details here. That's a very good point, Nick. And I was recently chatting with one of the committers and we don't even skip rows is my understanding basically. So every time we try to decode basically. So the immediate
Starting point is 00:16:22 optimization is if there is an entry in the wall of a rollback transaction, then we don't decode that entry in the wall. But you bring up a great point here. So there is a lot of scope of improvement there. Yeah. And this led us to the conclusion that we won't be able to deal with logical at a few modifying transactions per second.
Starting point is 00:16:46 It was the wrong conclusion because this was a pathological workload. And in production, I think it's a very rare case when WallSender is hitting 100% of single core. Or it's not, I'm not sure. Because the problem is you cannot scale it. You add a second slot,
Starting point is 00:17:06 but you see WallSender is hitting 100% of single core CPU with the constantly or from time to time. And you think, okay, I will try to distribute workload among two slots or four slots, eight slots. But this doesn't help. All of them are hitting 100% of different cores and that's it so this this can be bottleneck but it looks like it's very rare right now in production yeah absolutely and that is another optimization that like i mean the community could do is where
Starting point is 00:17:40 we uh i mean currently if there are multiple right, like the decoding process runs for every slot, right? If there are ways in which the decoder runs for just one slot and the other slots reuse these like decoded changes, right? Like if that makes sense. Yeah, exactly. The same work that's done multiple times. Exactly. Everyone is process everything.
Starting point is 00:18:03 This is what, like definitely there is opportunity for improvement here but i think people just don't see this as bottleneck in production often that's why it's not optimized yet in production what we see is like it's like these one-off batch operations where customers delete like a bunch of data or add like do copy with like millions of rows where there is like a wall spike and you know the recommendation that we give right like is guys have more disk i mean it is i mean that it's hard because like faster disk faster and like you know larger disk because logical replications can only scale up to 25 000 to 30 000 like you know messages per second basically so
Starting point is 00:18:42 have larger disks so that like once the spot ball spike like falls it'll catch up okay sure at that time there is more latency right but it's a good good CPU if if like 25 30 000 messages per second because for pg bouncer on modern intels and AMD I didn't see more than 20 messages per second we can compare this like also messages per second. We can compare this, like, also messages per second. PgBouncer processes transactions just bypassing them to backends, Postgres backends, and returning result. And logical replication also similar. Some messages, just some messages, right?
Starting point is 00:19:17 Different type of work, because decoding maybe is more consuming, but 30,000 sounds very good. Yeah, 100%. But the thing is,000 sounds very good. Yeah, a hundred percent. But the thing is, it has to be done well, right? No, I mean, like it has to be done properly because like one of the tip that we do, right? Like which I've seen lack in a few ETL tools
Starting point is 00:19:35 is always consuming the slot. We cannot give up on consuming the slot. We need to constantly consume the slot and flush the slot, right? And not to lag, right? Not to lag. And the thing is, if you give up and start reading it again it doesn't read from the confirm flusher lesson it reads from the restart a lesson and sometimes that restart a lesson can be very like old and and i don't want to get into when postgres like updates the restart a lesson because it's a more complicated thing
Starting point is 00:20:02 but the idea is when i give up the connection and read the connection again it starts from like restart a lesson right which can increase my decoding times a lot so this is very common when we initially built padb like if the slot size was like 150 200 gigs right we were giving up connection periodically and like whenever like this slot is big for like 10-15 minutes it it was just stuck in Valry.io. It was not even getting one change. So that is one tip I recommend, where you always consume the replication slot. Right. I was trying to recall another problem I saw when I inspected what's happening if you start using multiple slots. And it's kind of an interesting problem.
Starting point is 00:20:43 When you start publishing changes to a single slot, you use for all tables, for example. For example, for upgrade, it makes sense. And this is great. It works very well. But if you start distributing tables among multiple slots, you need to alter table, I think. I think. Anyway, like you need a lock for each table individually in the group. So you divide groups and then you need to say to to publish start publishing changes. And this is similar lockers like the deal basically. No, not similar. It's not exclusive log, but it's shareable. I remember it got stuck. It's not exclusive log. So it's not that bad as alter table, as in column. So I remember it got stuck. It couldn't complete, but at least it didn't block others. And was this while creating the multiple slots or was this like just like,
Starting point is 00:21:46 you know, creating publications? Publications. Yeah. So you specify tables for which you need to create publication and this needs to acquire logs on all of them. It can be successful for some of them, not successful for others. It's a single operation, so you're waiting and this is not good but not that bad as again as ddl which can block uh selects which come after we discussed many times how dangerous it can be under heavy load yeah that's interesting anyway multiple slots are tricky this is the lesson here so you need to test and learn before using them. Yeah, absolutely. Good. What's next? Well, before we move on from that, I'm interested in, are there any other hard lips?
Starting point is 00:22:34 So you mentioned that rule of thumb, 25,000 to 30,000 messages per second. Are there any other rules of thumb or hard limits or anything that people could be thinking, oh, we're probably we've probably not tuned it enough because we're not getting that much throughput or that low lag or something like that i guess this number depends on the on the core on the type of cpu you have if it's sold it can be 10 000 for example yeah 100 and then know, this setting is, this number that I shared is more in like, you know, on the higher end in a non-local environment where like, you know, the source, the target are not local. They are like in the same region, like, but in different boxes. So network latency is a very big factor.
Starting point is 00:23:18 One thing we do want to see is like how, what is the logical replication performance when both are on the same local server right like which is not in real world that doesn't happen so this whatever i'm saying is like you know network latency bound because it's single threaded right and the network latency kicks in and it can only scale up to say 20 30 mbps if it's like you know 30 40 mbps if it's done well also right so those are also some things to keep in mind. Now, coming to gotchas, right? Like a few things that I did, like, you know, make a note of, of logical replication. First is it doesn't support like replication of DDL commands, which includes like adding of columns,
Starting point is 00:23:58 like dropping of columns, like, you know, adding new tables, creating indexes, like, you know, truncating tables, right? Like, so these are not supported out of the box. And this is one of the common concerns that we hear from customers, because in the real world, you know, people, I mean, you add tables, you like, you have these Django migrations, which like add a bunch of indexes, et cetera, et cetera, right?
Starting point is 00:24:18 Partitions we talked about last time. Yeah, they don't do that also. Yeah, yeah. When you say customers raise it as a concern is that before starting or like realizing it later on where does that pop up for you it happens both ways right like because i mean obviously postgres logical replication is the easiest way to like you know replicate databases right they start off and then there's a column that is added and then logical replication breaks and now the good thing is you can add a column manually on the target and it would continue
Starting point is 00:24:49 but then this becomes like difficult in in production which is when they reach out saying that hey you guys like are also you know doing using logical decoding and you support postgres as a target and that's how like they come to us and we do support like schema changes but like only add columns and drop columns we don't't yet support like, you know, creating tables and like, you know, indexes and like, you know, truncate and all of that yet, basically. And sometimes it's like, it happens before itself. We're like, okay, this is a, I mean, we keep adding tables. We keep like adding columns very often and like, this doesn't work for us. So like we want another solution yeah nice a
Starting point is 00:25:25 couple of the other things that i see people confused or or struggling with are things like sequence synchronization i guess that's just a one-off task at the beginning do you see anything else catching people out that's a good question right like Like I think DDL commands is number one. Replication slot growth issues is number two, which we talked a lot about basically. You know, always keep reading the slot, avoid long running or like open transactions, you know, monitor slot growth, right? Like fourth is, you know, use protocol versions, right? Like upgrade to like latest Postgres versions and start using the version two three four which is faster right the third thing that we see is postgres logical replication doesn't replicate you know
Starting point is 00:26:11 toast columns out of the box you need to set up replica identity full to you know make that happen and replica identity full could be expensive if you are not on postgres 16 because for updates and deletes it could lead to like a sequential scan. And this thing, what happened? CTID was used or is used or like, why it's better than 16? It's using indexes. No, it's like using indexes basically. Okay. And the thing is, if you have them and if you have the right indexes and another thing is some customers, right? Like who have primary keys, but they have toast columns.
Starting point is 00:26:51 We still like need to do replica identity full and that helps. So replica identity full with primary keys is more efficient. And there is a good blog that the Zata guys wrote, like which I really liked. And that is something that I refer to customers where, hey guys, like you can add this. I think it increased like CPU and IO by 30%. That was their analysis. But that is about like toast columns
Starting point is 00:27:16 where you need to have replica identity full and the impact of that can change based on scenarios. Yeah, awesome. I didn't know about that one. I would love to read that blog post. If you send it to me afterwards, I'll include it in the show notes as well. Absolutely. And the fourth thing that we have seen is logical replication, at least, you know, logical decoding did not support virtual and generated columns. If there are like generated columns, they didn't show up on while decoding. the way we i mean we easily saw that i mean is
Starting point is 00:27:47 on the target the customer set up like a virtual column or they use like dbt or like some transformation tool to you know backfill this right like so that is number four and then last but not the least the fifth one that we see right like, like I'm still, you know, trying to discover that is like slot invalidation issues where the restart LSN becomes null. And I have seen that happen in two scenarios. So the first scenario is there is a setting called max keep slot size or something, I mean, like which actually limits the size of the slot. And in this, this setting was recently added 13 or 14 which lets you like safeguard from storage out of storage issues so as and when the slot hits you know over a two terabyte right like or if that's the setting that you have it basically invalidates
Starting point is 00:28:36 the slot and in that scenario the restart lesson becomes null and the slot gets invalidated and you need to restart logical replication the second scenarios we have seen is sometimes it happens randomly also in scene and and we are still figuring out why that can happen and now we reached out to the community right like and we heard that like it could be because of cloud providers postgres right like because cloud providers have a bunch of like backup operations some some folks you know slightly fork postgres right and then it could be i mean the community doesn't say that like i mean their point is like can we try to reproduce this on vanilla postgres right like not cloud and then that was not becoming easy for us but like we have seen that out of the seven to eight months one to two times randomly the slot gets invalidated and we don't have like an rca on when that can happen and immediately we
Starting point is 00:29:25 think that we check hey is the mall max wall like keep slot sizes like less than is it set up right but it's still minus one it's disabled but still like we run into these issues so i'm on that quest of figuring out when this happens because right but that is another scenario that's terrifying and yeah how do you even begin to go about reproducing that? Good luck. Exactly. So we were able to reproduce this on one of the cloud providers. They're like for high throughputs, like over 50k transactions per second.
Starting point is 00:29:55 Every 15 to 16 hours, it was like getting invalidated. 50,000 per second and every 15 to 16 hours. It's a lot. That's a to 16 hours. It's a lot. That's a lot of messages. That's not real. I mean, I don't think that happens often, but yeah. Wow. Cool.
Starting point is 00:30:14 I don't feel like we talked about use cases much. Is that because these aren't use case dependent? I guess other than people using two-phase commit, do you see any differences between people that are using logical replication for analytics databases versus for did you see any difference between use cases basically good question like i mean logical replication is very powerful i mean it supports like you know migration online migrations number one then it supports like online upgrades right like which nick mentioned right like number two number three it also supports like online upgrades, right? Like which Nick mentioned, right? Like number two. Number three, it also supports like HA and backup scenarios, right?
Starting point is 00:30:48 So I have seen some customers who use the, I mean, logical decoding or logical replication to have HA across regions or in like hybrid environments, right? Okay, I'm running stuff on cloud and I still want to have a backup or like a replica on my on-premise, like, you know, self-hosted environments, right? Like, and the only way it's possible is via logical replication because cloud providers don't give access to like wall, right?
Starting point is 00:31:14 That is number three, where like HA and backups is another use case, right? Four is replication to like non-Postgres like targets or even Postgres targets for like workload isolation, use case isolation, where, okay, I have my OLTP database running on Postgres like targets or even Postgres targets for like workload isolation, use case isolation, where, okay, I have my OLTP database running on Postgres. Now I want to put this to another database. Like it could be Snowflake or Postgres for analytics or for search. Like I want to put this on like a ClickHouse or like Elasticsearch, right? Like for optimizing search. Yeah. Logical replication is very powerful and I believe that it's going to be the future for Postgresgres and down the line it can open up like active active use cases also right like where you know okay you have like cross region and i recently read a blog where folks were using logical
Starting point is 00:31:53 replication for like active active right like so i mean it opens up like a plethora of use cases and makes postgres more you know powerful extensible and yeah yeah i read that blog post too i think it was version 16 change that made it possible to like not replicate changes that got made by logical replication so you could kind of like set up a logical replication in both directions but it seems super dangerous to me like do okay cool would you recommend it at the moment Or do you think we need more features before that? I think it's more, I mean, it's more like intense than that, right? Because like conflict resolution and all of this is tricky, right? Like, and, you know, I mean, if it's like very workload specific, where like, okay, I don't touch the same rows, right? Like, and, you know, maybe there it could work. But I mean, out of the box, like implementing that is tricky and it requires more like effort. And maybe we go there in the future, right? Like, because we are seeing a few like cases with customers where they want active active and, you know, there is not like a, you know, out of the box solution. So.
Starting point is 00:32:59 Why do people need active active? Good question. I think I have, I have like a lot of thoughts here so like the thing is i think it really helps with like uh ha right like for example i remember this scenario in microsoft where customers were having like a bunch of like sql server active active across regions and then this was a bank and then one of the region went down right like and then every second is like thousands of dollars. And then they immediately pivoted to the other region and it kind of worked seamlessly. But doesn't it mean that all clusters need to perform the same rights?
Starting point is 00:33:36 And I didn't buy this idea that replaying changes through logical replication is less expensive than applying them initially, which was advertised in the BDR documentation. I didn't buy this. I hope to find time to test it properly and blog post by that. So far, didn't find time yet for this. It's an interesting topic. Because in BDR, it means, for example,
Starting point is 00:34:04 you have multiple regions, four or eight. I know such cases as well. And they suffer because everyone needs to perform the same rights. And if there is a spike in one region, it's not a resilient system at all. I don't understand this. Yeah, 100% agree. I think that's the reason there is a big opportunity there and i recently saw one company right like what the pg edge who are like trying to do something like this yeah new wave of this yeah i know every cto is dreaming about this i know it because we are like we work in multiple regions maybe we should first check that all our physical standbys are in different availability zones than primaries, right?
Starting point is 00:34:52 100%. There are more basic things. Yeah, but multiple regions is a great thing to have, but still, I'm very skeptical. But I see in DBA's mind's mind there is mind shift as well like 10 years ago all dbas said you don't need this but things are changing so in postgres 16 this to avoid loops with infinite loops right this feature it's interesting so but i still i don't understand use cases and how like we can scale rights if we don't split them like in sharding. We split them in sharding and that's great.
Starting point is 00:35:31 Here we, like everyone needs everything. Well, I'm very skeptical still. I don't understand this part of, I don't understand this part of landscape. Yeah, a hundred percent. And like, I think it becomes very critical in tier zero use cases guys right like i mean not like i mean tier one tier two tier three i think it's kind of lesser but like it's more these tier zero where like it's like a chase bank or something like that right like when it becomes like hyper but you think it's possible to build good system no i don't think
Starting point is 00:35:58 it's i mean with postgres i am also very skeptical but i think there is an opportunity there right like and community both community and like you know know, I mean, community will be very critical here, right? Like, I don't think that it can happen just by logical application. Maybe if we build system categorizing data in tiers as well and replicating rights only for most critical data between regions, right? David Bainbridge We're probably tiptoeing into a different topic here. Yeah, it has been interesting. Alex Bialik It's not different. It's not different. Many people think logical leads to multi master definitely like
Starting point is 00:36:35 so called old old term. So David Bainbridge I think I think the tier zero use case like feels like it's a long way from being supported by native Postgres. Like we have a lot of providers out there for that kind of thing, and I think the general term for it is being called distributed SQL. That seems like distSQL
Starting point is 00:36:55 is what I'm referring to. But I would warn people against using logical for this in the short, like anytime soon. Okay. I agree. Then what other people against using logical for this in the short like any time soon yeah okay i agree okay then what what other questions do you have i didn't have anything else i wanted to thank sai what did you have anything else nikolai well no thank you so much it was interesting
Starting point is 00:37:17 thank you for coming absolutely guys i really enjoyed like uh chatting with you and thanks for like you know inviting me and uh yeah i hope you won't stop posting interesting technical posts no no no so content is our currency guys i mean like and the thing is for me like i'm i mean i'm building the company because uh i'm very curious right like now the thing that is like haunting me is that slot invalidation not like i'm not understanding why it can happen right like so so i mean like so because of curiosity we will be publishing a lot of content but benchmarks graphs data and so on reproducible also yeah that's great thank you so much and and also if any of our listeners have seen that as well have any theories or have a reproduction
Starting point is 00:38:02 case for it let us know absolutely good wonderful thank you thank you both take care thanks guys bye bye bye
