Postgres FM - Long-running transactions
Episode Date: March 27, 2026

Nik and Michael discuss long-running transactions, including when they're harmless, when they cause issues, and how to mitigate those issues. Here are some links to things they mentioned:

transaction_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-TRANSACTION-TIMEOUT
Our episode on transaction_timeout https://postgres.fm/episodes/transaction_timeout
Our episode on slow queries (which was also our first ever episode!) https://postgres.fm/episodes/slow-queries-and-slow-transactions
Our episode on locks https://postgres.fm/episodes/locks
lock_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCK-TIMEOUT
Transaction isolation levels https://www.postgresql.org/docs/current/transaction-iso.html
pg_current_xact_id_if_assigned() https://pgpedia.info/p/pg_current_xact_id_if_assigned.html
Monitor xmin horizon to prevent XID/MultiXID wraparound and high bloat (how-to guide by Nik) https://postgres.ai/docs/postgres-howtos/performance-optimization/monitoring/how-to-monitor-xmin-horizon
idle_replication_slot_timeout https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT
PREPARE TRANSACTION https://www.postgresql.org/docs/current/sql-prepare-transaction.html
log_autovacuum_min_duration https://www.postgresql.org/docs/devel/runtime-config-logging.html#GUC-LOG-AUTOVACUUM-MIN-DURATION
PostgreSQL Subtransactions Considered Harmful (blog post by Nikolay) https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
statement_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT
idle_in_transaction_session_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello. This is Postgres FM. I'm Nik, Postgres.AI. And as usual, here is Michael, pgMustard. Hi, Michael.
Hi, Nick.
I proposed this topic because I thought we have a gap here. Before we started recording, you said we already have an episode about transaction_timeout. It's very related, but it's not the same. So I wanted to dive deep into long-running transactions. What do you think?
Yeah, I was surprised we didn't have an episode on it specifically already.
We had our first ever episode, didn't we, on slow queries and, I guess, long transactions as part of that.
But we hadn't done a specific episode on it.
And I guess as a long-running podcast, maybe it's definitely time to revisit.
Yeah.
Sometimes idle in podcasting, yeah.
So I would like to challenge us to think, by the end of this recording (we record in advance as usual, because some preparation is needed, post-processing is needed; thank you for doing this, by the way): should this episode be called "long-running transactions" or something else? You already understand what I'm talking about, right?
Only because of our pre-discussion. I got a good hint from you.
but is it more the problems they cause that you want to talk about and therefore
maybe it's only a subset of long running transactions that are an issue?
So listeners, by the time you listen, you already see the decision.
Did we preserve the "long-running transactions" title originally,
or did we decide to change it?
But the problem in my head is simple.
I see in monitoring tools, in blog posts,
everywhere, like everywhere, actually, that long-running transactions are bad.
And at the same time, I remember, actually, it was an interesting moment.
I had an experimental educational course where I explained various concepts of database
health.
And we touched this topic of how long-running transactions are harmful to vacuum.
And I said, you just open a transaction and it will be blocking vacuum.
And I said, it's so easy to demonstrate, let's open a couple of psql sessions or something like that: psql, VACUUM VERBOSE, and we will see it has dead tuples, but cannot clean them up because there is a long-running transaction in progress.
And I just wrote BEGIN, and it didn't work.
It cleaned up tuples perfectly.
I think I also tried again with BEGIN and SELECT, doing some select and keeping the transaction open, being in idle-in-transaction state according to pg_stat_activity. And again, it didn't work. And I looked stupid.
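That failed demo can be reproduced roughly like this (a sketch, assuming a throwaway table t with an id column; the exact VACUUM VERBOSE wording varies by version):

```sql
-- Session 1: open a transaction but run nothing else
BEGIN;
-- at this point no snapshot is held and no XID is assigned

-- Session 2: dead tuples are still removable
DELETE FROM t WHERE id <= 1000;
VACUUM VERBOSE t;  -- tuples are removed; nothing reported as "dead but not yet removable"

-- Session 1: even after a read, under default read committed,
-- the snapshot is released when the statement finishes,
-- so the session sits idle in transaction without pinning the horizon
SELECT count(*) FROM t;
```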
And the reason I looked stupid is actually because we have two problems and we constantly mix them. And saying long-running transactions block vacuum is wrong. Not all of them block vacuum. Some of them block vacuum. And, in my mind, we should improve tooling and materials, and understanding everywhere, and start distinguishing two problems. One is lock-related and one is vacuum-related. We should change monitoring tools, snippets, everything, to hunt the two problems separately.
Yeah. Should we go back to the basics, though? Maybe let's first cover the long-running transactions that are harmful, or can be harmful, and then reverse our way backwards to the ones that aren't, and thereby try to distinguish between them.
Some people say "long transactions," which I think is an oversimplification, because it may be very long but fast, or it may be very short but very long-running.
Wait, what does long mean if it's not just about the time?
A lot of text.
Oh, sure. Okay. I didn't even think about that.
Like, I'm not a person who tries to be pedantic. I gave up; for example,
my scientific mentor distinguished database systems and databases, saying
you cannot say Postgres is a database. You must say Postgres is a database system, or database management
system. I already gave up. You cannot fight with all people. They will say
it's a good database, meaning database system.
So it's okay to say "long transactions."
It's not okay for the texts I write and the products we build.
There we always pursue precision, and we say "long-running transactions."
But at a very high level, if you have long running transactions, there are two dangers,
at least.
Actually, there are maybe a few more.
But the two main dangers are: first, some locks are acquired,
and as we know
like this is a simple rule to remember
we talked about it so many times
locks are released
only at the end of transaction
they cannot be released
before a rollback or commit
that's it so it means that transaction
if it lasts long accumulates
a lot of locks, and even
ACCESS SHARE locks can be harmful,
right? Sometimes
even if the statement that
acquired the lock was short,
if it's followed by statements that are
slower...
Once it's acquired...
Exactly.
Lock is held until the very end of transaction.
It might block someone like DDL.
DDL might block other selects, as we also discussed many times.
And this is a big surprise, and this two-step effect can put the whole server down,
actually.
This is a big deal.
So if you just selected something and keep the transaction open for long, then, for example,
schema changes should be avoided.
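The two-step effect can be sketched like this (table name is hypothetical):

```sql
-- Session 1: a long transaction that merely read from the table
BEGIN;
SELECT * FROM orders LIMIT 1;  -- takes ACCESS SHARE on orders, held until COMMIT/ROLLBACK

-- Session 2: a deployment tries to change the schema
ALTER TABLE orders ADD COLUMN note text;  -- needs ACCESS EXCLUSIVE, waits behind session 1

-- Session 3: ordinary queries now queue behind the waiting ALTER TABLE
SELECT * FROM orders LIMIT 1;  -- blocked as well; this is the two-step effect

-- Mitigation on the deployment side: fail fast and retry instead of queueing
SET lock_timeout = '2s';
ALTER TABLE orders ADD COLUMN note text;  -- errors after 2s instead of blocking everyone
```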
Right. Or you have a dump happening. Very common situation. There's a dump happening, or something like initialization of a logical replica, which lasts a couple of hours, for example. And at the same time, we have a deployment trying to change DDL, trying to change schema, right? And yeah, this is a very dangerous situation, because if you don't have lock_timeout and retries (we talked about it as well a lot), in these cases it can cause downtime. So, lock acquisition plus holding these locks for long. This is danger number one. And danger number two
is how MVCC in Postgres is organized. MVCC is multi-version concurrency control, for new folks; I know we have new folks all the time. And specifically, the vacuum problem. So to clean up dead tuples promptly, the server should be in a good state. And if you keep a transaction long, in some cases,
not in all, as we will discuss now.
Your long-running transaction might block, not might block, it will block.
If, for example, you inserted a row and just keep transaction open, this is a real transaction.
It has real transaction ID, right?
This definitely will block vacuum, and it will need to skip dead tuples which became dead recently, after your transaction started. And this is global. This is the biggest surprise here for people, as I see it: to realize this is global. You inserted into one table and keep the transaction open, but vacuum cannot clean any table. It can clean only those dead tuples which became dead before, because of xmax. This is a hidden column in every row; you can select xmax from any table, right? xmax for dead rows is filled with the transaction which deleted this row version, with UPDATE or DELETE. And if this xmax is in the past compared to your transaction ID, which holds this snapshot, right, this will be possible to delete. But if it's later, Postgres thinks maybe your transaction will need it.
Yeah, it could still... if that transaction wants to read that row, let's say the one that got updated or deleted, it needs to return the old version.
Yeah. According to its snapshot. So it's still needed, well, theoretically, essentially still needed.
it's a feature right
it's a feature but also we need
to think about the transaction isolation
levels here
true because by default
so if you have
a transaction ID and
so basically in
PGist activity you can see the column
back-end X-min
which defines
this like horizon that your transaction
needs right this snapshot
it defines this snapshot
And if you have a repeatable read transaction... repeatable read is already a higher isolation level. It's stricter; it's snapshot isolation, in fact. And it's not the default. But it's used during things like dumping. pg_dump will use repeatable read. It opens a transaction at repeatable read. And also logical replica initialization, sometimes in multiple threads, multiple connections. So in this case, the snapshot needs to be held, because this transaction, which is running at repeatable read, needs this snapshot for consistency. All queries will need to read from this very snapshot. So all tables you read during dumping must be from the same moment, right? While in the traditional, default transaction isolation, which is read committed, it's shifting. The snapshot is shifting, so it's not being held. And if you, for example, open a transaction, select from one table, select from another table, you will see that this backend_xmin is shifting, right? Which means that our horizon is shifting, and we are not blocking vacuum like we could block if we were dumping or just using a transaction at repeatable read. Which means that not all long-running transactions are harmful to vacuum.
Does it make sense?
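A way to see the difference (a sketch; t1 and t2 are placeholder tables):

```sql
-- Session 1, default read committed: a fresh snapshot per statement
BEGIN;
SELECT count(*) FROM t1;
SELECT count(*) FROM t2;
COMMIT;

-- Session 2, run between the statements above: backend_xmin moves
-- (and is NULL while session 1 sits idle in transaction)
SELECT pid, state, backend_xmin, age(backend_xmin)
  FROM pg_stat_activity
 WHERE backend_xmin IS NOT NULL;

-- Session 1 again, at repeatable read: one snapshot for the whole
-- transaction, so backend_xmin stays pinned until COMMIT
-- (this is what pg_dump does)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t1;
SELECT count(*) FROM t2;
COMMIT;
```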
Yeah, at least from a bloat perspective.
So... actually, probably from everything in terms of vacuum.
I was thinking there might be an exception around transaction ID wraparound,
for example, if we had a very long-running transaction.
But if it didn't have a transaction ID,
then actually that's not going to be an issue for transaction ID wraparound either.
That's an important distinction, because if we inserted, or we ran the txid_current() function (or there is a new version, I'm constantly forgetting; I'm old), you have allocated a real XID. That's one story. But if you're just reading, your transaction doesn't have an allocated XID. Basically, there is the concept of a virtual XID, which is an optimization to save space, because we have only four-byte transaction IDs, right? So it's not a lot these days. And if most of your queries, most of your transactions, are reading transactions, it would be not wise to let them consume from this space, right? Yeah. So this is an optimization with virtual transaction IDs, and purely reading transactions only consume from that separate space, right? Virtual transaction IDs. So we don't have it, which means that you need to distinguish
these topics. And my point is that not all long-running transactions are harmful. But at the same time, if you're just reading, but this reading itself is still happening, you are reading from a huge table and it lasts long: definitely, backend_xmin will be some value not changing in pg_stat_activity for this reading transaction. At read committed, it will be harmful. So I'm talking only about cases when we read something and we either read something else constantly,
small reads, right?
Or we read and then sit in idle-in-transaction state, not doing anything and not closing this transaction.
Yeah.
Are you talking, is this a bit of an edge case?
Like who's actually doing this a lot?
Like when does it happen?
What exactly?
Like a small read and it doesn't get committed for...
Oh, it happens all the time.
The number one reason is humans.
like opening an editor or something.
Yeah.
In psql or, even more commonly,
some graphical interface, right?
They run BEGIN on production, read something,
and then maybe they change something also, right?
More like usually when you are reading,
you are not opening a transaction.
Like when you're just exploring stuff,
you run some queries to understand the state of database.
Usually you don't use begin commit transaction block.
But when you try to write, sometimes you do.
And abandoned transaction from humans so common.
It's very common.
So just abandoned.
Forgot to close.
In some cases, this is called turning off auto commit.
So in some editors, you can say, okay, I don't like autocommit.
I will turn it off.
Then you start working, but not committing.
And then this is abandoned transaction.
So it doesn't do anything.
And it holds this backend_xmin in pg_stat_activity.
This is what starts blocking autovacuum.
And at first, maybe for a few hours, you won't notice.
So yes, I understand the dangerous case.
I'm talking about the non-dangerous case.
The opening a transaction begin, then a select.
You talked about just doing a read query, not doing an update.
How, like, when would that happen?
In the same situation, it might happen as well.
I'm just like, I was trying to understand all the possible cases.
In this case, I think, yeah.
But also, I just don't think it happens in the same way.
Like, I understand doing begin, update,
and then you've got the option to roll back.
If you're scared or you're traumatized maybe from doing an update once
and then you're expecting one row updates
and then you see 30,000 rows updated or something like that,
maybe you start using transactions explicitly
so you can roll back.
I get what you mean, yeah, yeah.
But who's doing that for select queries?
Like I don't have...
Our purpose is to be complete, right? To
cover all of them. No, I understand. So this is why I'm asking: is this an edge case? Yeah. Usually, I think,
if I have a bunch of selects and I combine them into a transaction, I would probably use repeatable
read, so all of them are consistent. That's why I combine them, right? This is basically what
pg_dump does, right? If I use a transaction, but then I don't use repeatable
read, why do I use a transaction in the first place? It's just a bunch of selects. Maybe to...
just thinking about what comes to my mind.
Maybe to make sure
the schema won't be changed, to block
DDL.
Or something to do with the pooler, maybe,
like to make sure you stay with the same...
Yeah, backend, maybe.
There might be some cases.
And another very common situation
when we see,
so there is this term,
"xmin horizon."
You can see it in some places
in source code.
I think you won't be able to see it in
documentation. I checked long ago. I think it's a very unofficial term. You can see it in some
blog posts, but again, I feel it's not a fully understood topic, right? So what is the xmin
horizon? I was trying to understand all aspects of it, how to properly measure it.
And my question is: why is it not properly measured by monitoring tools? Because it's not.
Check all popular monitoring tools, excluding Postgres.AI monitoring tools,
right, and you won't be able to see it. They talk: okay, there is a long-running
transaction lasting this amount of time. Okay, have you excluded transactions which are not
harmful, or have you included them? What's the purpose of this chart? Are we talking about danger
related to locking, or danger related to vacuuming? Unclear. My point is, if we start
distinguishing, and if we start thinking about the xmin horizon, we need to understand what it is
and how to measure it, how to properly understand it.
So I came up with the idea that we have four sources.
First is pg_stat_activity, backend_xmin; we just select the oldest, right?
Actually, you can apply the function age() to backend_xmin, which will give you the
difference, just an integer difference relative to your current transaction ID.
Yeah, very cool.
Yeah.
This function understands that we
have wraparound of the space, right? So it will give you a number of transactions,
the age in terms of number of transactions, not in terms of seconds, right? The second,
very, very common situation, super common: you create a logical slot and either
haven't started using it, or abandoned it, or it's lagging. So if you check pg_replication_slots,
it also has a column; it's called xmin there. And you
also can use age(). So it's another very popular reason to have something lagging.
And we need to distinguish logical replication and physical replication, because, as I remember,
if you have a physical replication slot but haven't started using it, I think it's not harmful.
I might be mistaken. Logical replication slots are definitely harmful. Unused logical replication
slots, definitely harmful. Did this change? I feel like this changed in a recent version, where I think there might be
some new setting where, by default, they expire after a certain time of not being used.
Yeah, there is such a setting. It's just a protection mechanism. And I think we should talk about
protection measures separately. Yes. Okay. Cool. I'm just trying to say that physical and logical
replication, replication streams basically, are slightly different here. Definitely,
if you have a logical slot, if it's unused, it's harmful. If it's lagging, it's also harmful.
makes sense
It starts harming you
right when it's created.
So you don't want a logical
replication slot to be unused or lagging too much.
Physical is slightly different,
and I think, if I'm not mistaken,
if you just create a slot
but nothing is connected to it,
it won't hold the xmin horizon.
Moreover, if you have
physical replication,
but this replication is not
using hot_standby_feedback,
on the replica it's set to off, which is actually not common.
Usually people prefer to set it to on.
hot_standby_feedback, for physical replication,
sends to the primary the information about which xmin horizon is needed on the replica, right, on the standby.
If it's not sending, it's better for the primary, right, because we don't care.
But this will give you a physical replica which will start lagging every time it has a
long-running transaction of its own.
While hot_standby_feedback will protect you from replication lag,
but if there is a long-running transaction on the replica,
it will be reported to the primary,
and we will have the same effect as if we had the long-running transaction
on our primary. Which explains why this dilemma
makes it really hard to run long-running transactions
for analytical purposes on regular physical nodes, right?
And you cannot offload them to replicas,
unless you can afford replicas which are lagging for hours, for example.
Anyway, we have this, we have logical and physical replication, which can also affect us.
And we have a fourth component, which is prepared transactions.
There is a pg_prepared_xacts
system view. You can see it also; I don't remember the column name, maybe "transaction" or something.
I don't remember.
But if you use 2PC, right, prepared transactions, in this case you have them, right?
In this case, some of them might be affecting, they may affect the xmin horizon.
So I found four sources for our analysis.
We dig information from all of them, take ages, and then just take the greatest of the ages.
There is such a function, greatest, right?
Yeah.
And this will be our true age of our xmin horizon.
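Pulling those sources together, a monitoring query might look roughly like this (a sketch; GREATEST ignores NULLs, so empty sources simply drop out; hot_standby_feedback from standbys without slots surfaces via pg_stat_replication):

```sql
SELECT greatest(
  (SELECT max(age(backend_xmin)) FROM pg_stat_activity),      -- long-running transactions
  (SELECT max(age(backend_xmin)) FROM pg_stat_replication),   -- hot_standby_feedback from standbys
  (SELECT max(age(xmin))         FROM pg_replication_slots),  -- physical and logical slots
  (SELECT max(age(catalog_xmin)) FROM pg_replication_slots),  -- logical slots' catalog horizon
  (SELECT max(age(transaction))  FROM pg_prepared_xacts)      -- prepared (2PC) transactions
) AS xmin_horizon_age;
```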
What is big, what is small?
This is hard to understand.
When I think about it, what is big?
Is 1,000 too much or 1 million too much or 1 billion too much?
A billion is definitely too much, because we are...
Halfway there.
Yeah, yeah, yeah.
We have 2.1 billion, which is basically
the whole capacity before XID wraparound, right?
But there's also the multixact mechanism additionally; let's not go there.
A billion is definitely too much.
Is one million too much?
In heavily loaded systems, I like to think about it by thinking about what tables we
have and how many rows they have, and how many dead tuples,
dead row versions, we consider too much.
Usually, for OLTP, heavily loaded systems, we prefer to make autovacuum very aggressive.
The default settings, autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor,
are like 20 and 10 percent, the common default numbers. It's too much.
If we have a hundred million or a billion rows, 10 percent is way too much.
So we usually set, okay, like 1%, so 0.01, or even half a percent,
and sometimes we go even lower, and make it delete dead tuples more often. Now you already have some hint:
okay, you think, I have like a 10 million row table and I have a 1% scale factor. It means that 100,000 rows
is when autovacuum should start cleaning up. And then, yes, it's a good point: the xmin horizon is lagging
that much, okay. Is it lagging a thousand?
It's okay.
We basically don't care.
Our actual horizon, when we expect autovacuum to start working, is
100,000, roughly.
But the problem is that we have different tables, small, big, right?
So we need to think about core tables and the nature of our database, to think how much of an xmin horizon,
how much xmin age, is too much for us, right?
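The arithmetic above, as configuration (values are examples from the discussion, not universal recommendations):

```sql
-- autovacuum triggers roughly at:
--   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
-- with defaults 50 + 0.2 * reltuples, a 10M-row table waits for ~2M dead tuples;
-- at 1%, cleanup starts at ~100k, which gives a baseline for "normal" xmin horizon lag
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.01;
SELECT pg_reload_conf();
```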
I think there's two... you're talking almost like a bottom-up
approach, right, working out how much bloat we can tolerate.
But there's also the top-down.
How many dead tuples we can tolerate.
It's not yet bloat.
As we remember, if we allow too many dead tuples,
autovacuum is a converter of dead tuples to bloat.
It will come and convert our millions of dead tuples to bloat.
But it's not bloat yet.
At least in my terminology. I wish we had official terminology for bloat.
Like, for dead tuples it's clear, but for bloat,
we don't have it as well.
In documentation, we won't find it.
The more dead tuples we leave around, the worse bloat issues can get.
In terms of fragmented indexes, for example; in terms of HOT updates,
if we've got dead tuples there, we can't benefit from them.
But okay.
HOT updates might benefit from bloat, because they have space.
No, because it's not space yet if you can't reclaim it.
If it hasn't been processed yet, they can't.
But then eventually a vacuum will happen.
Eventually.
It will give a lot of gaps, and these gaps might be reused, right?
So it's like...
Maybe, yeah.
There are effects in opposite directions here, right?
But the point I was trying to get to was: often people are monitoring this
in order to prevent transaction ID wraparound.
So the top-down approach could be about how much time you have.
Okay, a million might not be that bad in terms of getting towards 2 billion.
But if 2 billion could accumulate within an hour or two,
if you've got a ridiculously loaded system, you might actually want an earlier alert,
or a lower threshold, because
as soon as you find out that it's going above a healthy number, or above a not-normal number for your system,
you've got a stopwatch running. You've only got two hours. You've got to fix this problem.
So it might be that you're thinking in terms of top-down as well.
Yeah. For example, if I see age of this is 100 million, it's already, wow, it's a lot. It's already like emergency.
You might only, yeah, exactly, you might only have a couple of hours at that point.
Yeah, speed is interesting. I wish we also had good... again, monitoring systems lack a proper XID consumption, XID growth rate metric, covered properly.
If you think, okay, this is my xmin horizon, some number.
Okay, I know it's transactions, but you're right,
it's hard to map it to real time.
But here I see: okay, we consume two billion transactions per week.
Oh, it's quite a loaded database, in fact, right?
Yeah.
Real transaction IDs.
In this case, I see peaks during busy hours.
I understand my XID growth rate.
And this is helpful for me.
I understand, okay, I can already start playing with this.
I think we could go even deeper.
I just don't see proper analysis tools in this area developed.
Like this is like my thoughts from people I work with and talk to.
And I see huge potential here.
So it could be much better visualized, presented for analysis.
And as you say, separated out in terms of things that are risky in terms of locks and things that are risky in terms of...
Oh, yeah, locks are completely...
I'm going to keep saying bloat.
Yeah.
Yeah.
But that's what I mean.
You're talking about different alerts, different charts, different.
Yes.
So I would have an alert, and we actually do have alerts for the xmin horizon, based on understanding
this particular system, with its growth rate and table sizes, right, and the behavior of vacuum and autovacuum.
And we set it... for example, I think 100 million is a huge number; we should set it lower, and already
start investigating what blocks our xmin horizon. And this is the right direction, to understand it.
And we can also check logs. For autovacuum, we usually aim to log, if not everything, then
at least autovacuum processing above one second.
So log_autovacuum_min_duration = '1s'.
So log autovacomb, mean duration, one second.
Yeah, sometimes we set to zero.
Yeah, there is some risk that you might see a lot of and have observer effect,
but usually like at least one second is beneficial.
So you start, again, I think all default was 10 minutes.
It's ridiculously huge.
Like you don't see a lot.
And I think it was decreased recently, right?
Oh, cool.
Yeah. So what I'm talking about here: if you have logging for autovacuum behavior, it reports the xmin horizon, right? The transaction ID was this, the xmin horizon was that, it's this number behind. So you might extract information from those logs. And it also reports how many dead tuples it would remove but couldn't, because the xmin horizon is blocking. So autovacuum messages in logs are extremely helpful for this type of risk analysis.
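Enabling that logging (the exact log wording varies by Postgres version):

```sql
-- log every autovacuum run that takes longer than 1 second
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();
-- the resulting log entries report, among other things, how many tuples
-- are dead but not yet removable, i.e. blocked by the xmin horizon
```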
I think it might be...
Yeah.
I think it's still 10 minutes.
Maybe, maybe.
Wow.
All right.
So this is about xmin and vacuum behavior, the xmin horizon, and this concept.
Again, I think documentation could be fairer about the fact that these concepts exist.
Maybe someone should propose to define this, because everyone suffers from bloat.
And, in my opinion, everyone working with Postgres deeply enough
needs to understand the concept of the xmin horizon, to start distinguishing it from
long-running transactions in general, right?
And the second problem... like, the first problem historically: lock acquisition.
What to say about it?
So it's, yeah...
I think we should point people, I think we did a good episode on that specifically.
But yeah, maybe that's a good excuse to switch over to mitigations, though, because...
Mitigation.
Yeah, one point.
I wanted to mention there are some exotic problems also, which might happen from long-running transactions.
Oh, yeah, sure.
I point to my subtransactions post explaining four problems with subtransactions, and one of them happened to GitLab.
And we had an episode, I think, about it, right?
So it was the most interesting one.
It was happening when you have a long-running transaction, I think when you actually hold this xmin horizon on the primary, the whole snapshot,
or maybe locks.
This is a question, actually.
I have some homework to do, to return to that material and understand which of the two problems
happened there.
So: long-running transactions on the primary, and a high-TPS replica started to suffer at some point
because of subtransactions.
Once you closed the long-running transaction on the primary, the suffering ended.
So this is already exotic, exotic problems, where I need to understand
whether it's lock-related or snapshot-related.
I also think replicas are an interesting thing in general in terms of long-running transactions.
So long-running transactions on the primary and whether they affect the replica,
and then also long-running transactions on the replicas,
and even when they can affect the primary.
Well, long-running transactions on a replica, if hot_standby_feedback is turned on, as we discussed,
they will report to the primary.
They will report to the primary.
It can have the same effect.
Yeah.
So I think that's not always something people are aware of.
But yeah, in terms of mitigations, though,
it feels like timeouts are our friends in a lot of these cases.
Is that, and monitoring, we've brought up, monitoring a lot, right?
Monitoring, even in extreme cases, killing certain problematic long-running transactions,
what did you want to talk about in terms of mitigations?
So, yeah, of course, if you can get rid of all long-running transactions,
both dangers don't affect your database health, which is great.
But sometimes we need long-running transactions,
sometimes we need to accept some lag in replication slots.
By the way, this is what caused me to start thinking something is not right with how we distinguish things.
We talk about long-running transactions, but the same problem, vacuum being blocked...
Vacuum is not fully blocked, you understand;
it just cannot delete some dead tuples.
It's not only about long-running transactions.
It's also about slots.
It's also about prepared transactions.
What's happening?
Okay, prepared transactions, those are at least transactions. But slots, right, or replication...
like, anyway, hot_standby_feedback.
Okay, hot_standby_feedback is also about transactions,
they just happen on the standby. But a just-unused logical slot?
So it's a wrong explanation of the problem.
The problem is the xmin horizon is frozen.
It's not progressing, right?
Not progressing, yeah.
Maybe it's progressing, but too slow also might happen.
Yeah, so...
If it's progressing slowly, though, it's just blocked at a new point, right?
Like, it's...
Yeah, if it's progressing too slowly, your horizon of XID... not the horizon, but the actual XID, is really too far ahead.
So basically, what we should look at is the age of the xmin horizon,
the age of the oldest xmin needed.
Yeah.
Yeah, this is all we need.
And if it's too far in the past, this is a problem.
And the reasons of this, as I said, four reasons.
And mitigation is simple.
Don't allow it to be too far in the past.
How?
First of all, in OLTP, I very highly recommend
being very aggressive
with timeout settings globally.
I know the documentation says
it's controversial, and so on.
But if an HTTP server setting can afford 30 seconds or 60 seconds as a maximum for your OLTP requests,
why can the database not afford at least the same for all regular traffic?
Of course, for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY we need to set it to zero.
What to set?
statement_timeout and two other things, right? If you think about a transaction: it's either a statement that is running, or the idle-in-transaction state. So it's either statement or idle-in-transaction, and there's the whole transaction. So basically we have the whole thing and two phases, which switch after each other, right? It means we can limit the whole transaction, or the statement, or the idle-in-transaction state, and for all three we have settings, right? The transaction one, yeah, only since
recently though, only since version 17.
We had an episode about transaction_timeout
because it was my idea.
It was a super surprise.
I was walking around thinking, why doesn't this exist?
Maybe I don't understand Postgres enough
after 15 years of experience.
Then I mentioned it to Andrey Borodin
and he implemented it, which is great.
In Postgres 17 it exists.
So in all new Postgres versions we have transaction_timeout,
so we can finally limit the whole thing,
because before we could limit only a statement,
plus idle_in_transaction_session_timeout.
And if you have a series of brief statements and brief waiting periods between statements,
you could have an hour or day-long transaction easily.
Now you can limit the whole thing.
And limiting the whole thing is helpful.
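As a sketch, aggressive global defaults might look like this; the values here are illustrative, not recommendations for every workload, and transaction_timeout requires Postgres 17 or newer:

```sql
alter system set statement_timeout = '30s';
alter system set idle_in_transaction_session_timeout = '30s';
alter system set transaction_timeout = '60s';  -- Postgres 17+
select pg_reload_conf();

-- For sessions doing long maintenance (CREATE INDEX CONCURRENTLY, etc.),
-- disable the limits locally in that session:
set statement_timeout = 0;
set transaction_timeout = 0;
```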
Unfortunately, we cannot limit what we actually would like to limit, which is the xmin horizon.
Maybe we should have...
This idea just came to my mind now, honestly.
We could probably have an xmin horizon maximum,
after which we start abandoning transactions,
like cancelling transactions,
abandoning prepared transactions, like deleting them,
and maybe killing some replication slots.
Yeah, yeah.
Yeah, I know, like 100-million protection.
The ultimate protection mechanism.
And in this case, you don't need to translate what's happening
with your server from seconds
into actual problems, like how many transactions
you can allow your garbage collection to lag behind, right?
Garbage collection is vacuum, right?
So maybe we should have those settings.
Maybe it's too many settings at the same time,
but it would be flexible.
I would say 100 million is my maximum,
or 10 million is my maximum, regardless of...
Even if Postgres shipped with conservative defaults there,
it would be fine.
Even if it was like 100 million as a default,
it would be fine for most cases, wouldn't it?
I'm curious, maybe this idea was discussed. Maybe it's not a good idea; it just came to my mind. It's interesting.
If you have an opinion while listening to us, leave it as a comment under the YouTube video, or
somewhere like Twitter or LinkedIn, I don't know. Anyway, this is interesting. But what can we
do? We can protect with transaction_timeout, statement_timeout, and the idle-in-transaction timeout.
What else? We should, we must actually, monitor lags of replication slots, especially logical
replication, but physical as well if we use hot_standby_feedback, and we should not allow them to lag too much.
And finally, we should be very careful with 2PC and abandoned prepared transactions. That's it.
So only these four areas.
Yeah. When you say monitoring, you mean alerting as well, right?
Yeah. So, if some logical replication slot is unused, or it's lagging too much. Again, this is
interesting: it will also be either in bytes or in seconds. We should think about
how much in seconds; we need to map it to transactions,
understanding our XID growth rate.
So there are different metrics in play here.
So to understand health properly, it takes time.
And if you have cattle, not pets,
you should have simple rules:
if some slot is lagging too much, act.
By the way, slot lag, it's in bytes, right?
We cannot limit it in seconds of lag.
Right.
There are bytes, seconds, and then XID
growth, like the number of transactions.
So three metrics.
This is a triangle here.
Interesting.
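For the bytes corner of that triangle, slot lag can be read straight from pg_replication_slots; a minimal sketch (pg_lsn subtraction yields bytes):

```sql
-- How much WAL each slot is holding back:
select slot_name, active,
       pg_current_wal_lsn() - restart_lsn as lag_bytes,
       pg_size_pretty(pg_current_wal_lsn() - restart_lsn) as lag_pretty
from pg_replication_slots;
```

Mapping bytes or seconds of lag to transactions still requires knowing your own XID growth rate, as discussed above.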
Anyway, this is to protect vacuum from being affected.
You can also, reactively, just monitor what autovacuum reports in the logs.
It will start to complain clearly, right?
So you can set thresholds based on logs.
And if you have a good log analysis system,
you could set up alerts from there. As for locks, for me it's a completely different
topic. Yeah, long-running transactions contribute to it, but problems might happen very fast.
Unlike the xmin horizon being blocked, which you might not notice at all. Okay, you blocked it
for many, many hours; maybe it was only needed one time, so it won't affect you much.
The negative effects are postponed, and they grow. It's not a cliff;
it's slow degradation.
With locks, it might be very fast.
So, as I said, the classic example: you hold the weakest heavyweight lock possible, an access share lock on a table.
It blocks DDL, and the DDL blocks other selects.
Boom.
What to do here?
We need to use proper DDL deployments.
We need a low lock_timeout and retries.
This is the number one thing I would recommend.
And the number two thing is to craft schema changes properly
to avoid long-running exclusive locks, first of all,
but at some point also access share locks.
They are not harmful unless you have conflicting sessions.
And detecting conflicts sooner is helpful.
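A minimal sketch of the low-lock_timeout-plus-retries pattern; my_table and the column are placeholders, and the retry loop itself would live in your deployment tooling:

```sql
set lock_timeout = '100ms';

-- If the lock cannot be acquired within 100ms, this fails with
-- SQLSTATE 55P03 (lock_not_available) instead of sitting in the
-- lock queue and blocking all sessions behind it;
-- catch that error and retry, ideally with backoff and jitter.
alter table my_table add column note text;
```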
Another measure, which addresses a common situation with Postgres:
vacuum can block us when we do DDL.
We cannot acquire an access exclusive lock on the table to change its schema if vacuum is processing it.
But fortunately, in most cases, vacuum is running in normal mode, and it will kill itself after, I think, a second.
Do you mean autovacuum, or do you mean anti-wraparound vacuum?
You mean a normal one, not anti-wraparound.
Yeah.
I'm talking about autovacuum. Good correction.
Sure.
Right.
But if it's running in transaction ID wraparound prevention mode,
which happens often if you have high XID growth, right?
If we insert a lot, not in batches but single rows, for example, we consume
XIDs quickly; savepoints can also contribute a lot to XID consumption and XID growth
rates. In this case, we probably have a problem: if XID growth is high,
autovacuum in transaction ID wraparound prevention mode happens more often. It might happen, for example, every two days
in a very busy system. If it happens every two days, but we aim to deploy
five times per day, we are a very rapidly developing startup, this might clash.
You have an autovacuum in wraparound prevention mode, and you try to deploy DDL. Even if you
have retries, and you must have a low lock_timeout and retries, you try to deploy it, but the vacuum runs
on your terabyte-sized table and blocks you. This is super annoying. You cannot deploy something
you need to deploy right now. What to do about it? The question is open.
Well, you've talked a lot about partitioning, so the tables are smaller and the
vacuums are quicker. But I actually also think, at the time I thought it was a good thing,
that the threshold where an anti-wraparound vacuum kicks in is relatively low
in the grand scheme of things, in terms of the risk. I would seriously consider, if it's happening
every two days, that seems like a bad smell to me. That feels like something that shouldn't be.
Like, let's make sure normal vacuum happens enough so that it doesn't have to run in anti-wraparound
mode.
I'm not against this. It's good that it's happening often, and freezing is good.
So partitioning is helpful not only because tables are smaller,
but because more pages are all-frozen and we don't touch them anymore,
because they are like our archive.
And this is super.
It makes things much more efficient compared to a huge unpartitioned table,
where at any time a page can receive a new tuple
and is not all-frozen anymore in the visibility map, right?
The visibility map has two bits, all-frozen and all-visible.
And boom, autovacuum needs to take care of this page again.
So for me the problem is not that it's happening too often;
it's okay to have the threshold low and make it very frequent, and
partitioning is great. But sometimes I wish it were easier to raise this bar temporarily while
I'm doing my changes. As far as I remember, and I might be mistaken here, the last time
I checked this topic, maybe a couple of years ago, it was not so easy. You cannot change it
globally so easily; it requires a restart, I think. Maybe I'm mistaken again. And at the table level,
there is also some caveat. I don't remember which, but I didn't see a good way to do it
per deployment. And also, interestingly, you know, you kill that autovacuum which
runs in transaction ID wraparound prevention mode, and if you don't acquire the
lock on the table fast enough, it will kick in again. It might kick in especially quickly
if you have many workers and naptime is low. So it's an interesting problem as well.
Raising this threshold was not easy, as I remember. So I came up with two ideas.
The first idea: if this is a big deployment, like a major upgrade or
some other major change, because it's a problem during major upgrades, we had it during major upgrades when
autovacuum running in transaction ID wraparound prevention mode was a problem for us. If it's a
rare change, we probably should forecast when autovacuum will start freezing and do the freezing ourselves before.
I like that, yes.
Yes. Another thing is that probably we should sometimes just kill it ourselves. But if you want to kill it,
you need to do it properly: you issue an attempt to acquire the lock on the table you want
to change, and immediately kill the autovacuum, so your attempt to acquire the lock is successful.
And then, good luck to the transaction ID wraparound prevention while we're doing our job.
It's like... it has to wait, so that it doesn't start again?
Yeah, yeah. So, okay, I won.
That's it. The lock is mine and I'm doing my job.
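A sketch of that trick, with my_table as a placeholder; this is the dirty manual approach described above, only to be used after verifying wraparound capacity:

```sql
begin;
set local lock_timeout = '1s';

-- Terminate the wraparound autovacuum working on our table...
select pg_terminate_backend(pid)
from pg_stat_activity
where backend_type = 'autovacuum worker'
  and query like '%my_table%';

-- ...and immediately grab the lock before a new worker restarts it:
lock table my_table in access exclusive mode;

-- run the DDL here, then:
commit;
```

With many autovacuum workers and a low autovacuum_naptime, the window between terminating the worker and acquiring the lock can be very short, hence doing both in one scripted step.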
It feels slightly dirty, right?
But there is no harm if you also monitor XID and MultiXID wraparound risks.
If you know you're only at 10% of capacity, around 200 million, it's fine to go to 300 million even during a big deployment.
You will be fine afterwards, right?
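Checking that capacity is a one-liner against pg_database; age() here counts transactions against the roughly 2-billion hard limit, and autovacuum_freeze_max_age defaults to 200 million:

```sql
select datname,
       age(datfrozenxid)    as xid_age,
       mxid_age(datminmxid) as multixid_age
from pg_database
order by xid_age desc;
```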
But this feels dirty, honestly, to do this trick.
I did this trick manually a few times in my career
because I needed to finish a deployment
and I knew that we were not in danger, though.
Yeah.
Yeah, that makes loads of sense to me.
My preference, I think, would still be to avoid it
happening as often as that,
and to actually try to make sure
things get frozen before it's needed.
So it really is a failsafe, rather than
something that happens in normal operation.
But yeah, super interesting.
We started the episode talking
about what we should call it.
And I think maybe we should end it with: what do you think now, having talked about it?
Oh, yeah.
But we try to keep titles very short, right?
We didn't have an xmin horizon episode.
I tend to think this should be called the xmin horizon episode, because this is where we
spent most of the time.
And it's less about locks.
It's less, like... and I think
long-running transactions is the wrong concept to chase.
Yeah, let's call it xmin horizon, because I think that's going to be...
Yeah, let's call it xmin horizon.
But my concern is that people think in terms of long-running transactions and...
That's fine.
I think long-running transactions is interesting, and I actually don't mind that there's this
edge case of long-running transactions that aren't harmful.
Let's keep long-running transactions, because it's easier to understand what we talk about.
This is what people expect.
Let's keep the name:
long-running transactions.
The xmin horizon concept is still
not fully clear in my head.
Well, it's clear, but I want it to become official in the documentation.
Maybe it should be explained in the documentation.
And then that opens us up
so we can still do an episode on that at some point.
Yeah.
Maybe I missed some parts of it,
how to properly monitor.
I know some people who develop observability tooling
listen to our episodes.
Oh, cool.
Yeah, I hope it will be improved.
Nice.
It's all about clarity.
And I think,
thank you for listening to this.
I feel we went quite deep.
I personally enjoyed it.
Yeah,
I hope everyone else did too.
And yeah,
let us know.
Great.
Nice one.
It was great to chat to you,
as usual.
Likewise, Nik. Take care,
and see you next time.
Yeah,
bye.
