Postgres FM - Long-running transactions
Episode Date: March 27, 2026

Nik and Michael discuss long-running transactions, including when they're harmless, when they cause issues, and how to mitigate those issues. Here are some links to things they mentioned:

transaction_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-TRANSACTION-TIMEOUT
Our episode on transaction_timeout https://postgres.fm/episodes/transaction_timeout
Our episode on slow queries (which was also our first ever episode!) https://postgres.fm/episodes/slow-queries-and-slow-transactions
Our episode on locks https://postgres.fm/episodes/locks
lock_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-LOCK-TIMEOUT
Transaction isolation levels https://www.postgresql.org/docs/current/transaction-iso.html
pg_current_xact_id_if_assigned() https://pgpedia.info/p/pg_current_xact_id_if_assigned.html
Monitor xmin horizon to prevent XID/MultiXID wraparound and high bloat (how-to guide by Nik) https://postgres.ai/docs/postgres-howtos/performance-optimization/monitoring/how-to-monitor-xmin-horizon
idle_replication_slot_timeout https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-IDLE-REPLICATION-SLOT-TIMEOUT
PREPARE TRANSACTION https://www.postgresql.org/docs/current/sql-prepare-transaction.html
log_autovacuum_min_duration https://www.postgresql.org/docs/devel/runtime-config-logging.html#GUC-LOG-AUTOVACUUM-MIN-DURATION
PostgreSQL Subtransactions Considered Harmful (blog post by Nikolay) https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
statement_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-STATEMENT-TIMEOUT
idle_in_transaction_session_timeout https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello. This is Postgres FM. I'm Nik, Postgres.AI. And as usual, here is Michael, pgMustard. Hi, Michael.
Hi, Nick.
I proposed this topic because I thought we have a gap here. Before we started recording, you said we already have an episode about transaction_timeout. It's very related, but it's not the same. So I wanted to dive deep into long-running transactions. What do you think?
Yeah, I was surprised we didn't have an episode on it specifically already.
We had our first ever episode, didn't we, on slow queries and, I guess, long transactions as part of that.
But we hadn't done a specific episode on it.
And I guess as a long-running podcast, maybe it's definitely time to revisit.
Yeah.
Sometimes idle in podcasting, yeah.
So I would like to challenge us to think, by the end of this recording (we record in advance as usual, because some preparation is needed, post-processing is needed; thank you for doing this, by the way): should this episode be called "long-running transactions" or something else? You already understand what I'm talking about, right?
Only because of our pre-discussion. I got a good hint from you.
but is it more the problems they cause that you want to talk about and therefore
maybe it's only a subset of long running transactions that are an issue?
So listeners, by the time you listen, you already see the decision.
Did we preserve the "long-running transactions" title originally,
or did we decide to change it?
But the problem in my head is simple.
I see in monitoring tools, in blog posts,
everywhere, like everywhere, actually, that long-running transactions are bad.
And at the same time, I remember, actually, it was an interesting moment.
I had an experimental educational course where I explained various concepts of database
health.
And we touched this topic of how long-running transactions are harmful to vacuum.
And I said, you just open a transaction and it will be blocking vacuum.
And I said, it's so easy to demonstrate, let's open a couple of psql sessions or something like that: psql, VACUUM VERBOSE, and we will see it has dead tuples, but cannot clean them up because there is a long-running transaction in progress.
And I just wrote BEGIN, and it didn't work.
It cleaned up tuples perfectly.
I think I also tried again with BEGIN and SELECT, doing some select and keeping the transaction open, being in idle-in-transaction state according to pg_stat_activity. And again, it didn't work. And I looked stupid.
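That failed demo can be reproduced roughly like this (a sketch, assuming a throwaway table t with an id column; the exact VACUUM VERBOSE wording varies by version):

```sql
-- Session 1: open a transaction but run nothing else
BEGIN;
-- at this point no snapshot is held and no XID is assigned

-- Session 2: dead tuples are still removable
DELETE FROM t WHERE id <= 1000;
VACUUM VERBOSE t;  -- tuples are removed; nothing reported as "dead but not yet removable"

-- Session 1: even after a read, under default read committed,
-- the snapshot is released when the statement finishes,
-- so the session sits idle in transaction without pinning the horizon
SELECT count(*) FROM t;
```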
And the reason I looked stupid is actually because we have two problems and we constantly mix them. And saying long-running transactions block vacuum is wrong. Not all of them block vacuum. Some of them block vacuum. And, in my mind, we should improve tooling and materials, and understanding everywhere, and start distinguishing two problems. One is lock-related and one is vacuum-related. We should change monitoring tools, snippets, everything, to hunt the two problems separately.
Yeah. Should we go back to the basics, though? Maybe let's first cover the long-running transactions that are harmful, or can be harmful, and then reverse our way backwards to the ones that aren't, and thereby try to distinguish between them.
Some people say "long transactions," which I think is an oversimplification, because it may be very long but fast, or it may be very short but very long-running.
Wait, what does long mean if it's not just about the time?
A lot of text.
Oh, sure. Okay. I didn't even think about that.
Like, I'm not a person who tries to be pedantic. I gave up; for example,
my scientific mentor distinguished database systems and databases, saying
you cannot say Postgres is a database. You must say Postgres is a database system, or database management
system. I already gave up. You cannot fight with all people. They will say
it's a good database, meaning database system.
So it's okay to say "long transactions."
It's not okay for the texts I write and the products we build.
There we always pursue precision, and we say "long-running transactions."
But at a very high level, if you have long running transactions, there are two dangers,
at least.
Actually, there are maybe a few more.
But the two main dangers are: first, some locks are acquired,
and as we know
like this is a simple rule to remember
we talked about it so many times
locks are released
only at the end of transaction
they cannot be released
before a rollback or commit
that's it so it means that transaction
if it lasts long accumulates
a lot of locks, and even
ACCESS SHARE locks can be harmful,
right? Sometimes
even if the statement that
acquired the lock was short,
if it's followed by statements that are
slower...
Once it's acquired...
Exactly.
Lock is held until the very end of transaction.
It might block someone like DDL.
DDL might block other selects, as we also discussed many times.
And this is a big surprise, and this two-step effect can put the whole server down,
actually.
This is a big deal.
So if you just selected something and keep the transaction open for long, then, for example,
schema changes should be avoided.
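The two-step effect can be sketched like this (table name is hypothetical):

```sql
-- Session 1: a long transaction that merely read from the table
BEGIN;
SELECT * FROM orders LIMIT 1;  -- takes ACCESS SHARE on orders, held until COMMIT/ROLLBACK

-- Session 2: a deployment tries to change the schema
ALTER TABLE orders ADD COLUMN note text;  -- needs ACCESS EXCLUSIVE, waits behind session 1

-- Session 3: ordinary queries now queue behind the waiting ALTER TABLE
SELECT * FROM orders LIMIT 1;  -- blocked as well; this is the two-step effect

-- Mitigation on the deployment side: fail fast and retry instead of queueing
SET lock_timeout = '2s';
ALTER TABLE orders ADD COLUMN note text;  -- errors after 2s instead of blocking everyone
```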
Right. Or you have a dump happening. Very common situation. There's a dump happening, or something like initialization of a logical replica, which lasts a couple of hours, for example. And at the same time, we have a deployment trying to change DDL, trying to change schema, right? And yeah, this is a very dangerous situation, because if you don't have lock_timeout and retries (we talked about it as well a lot), in these cases it can cause downtime. So, lock acquisition plus holding these locks for long. This is danger number one. And danger number two
is how MVCC in Postgres is organized. MVCC is multi-version concurrency control, for new folks; I know we have new folks all the time. And specifically, the vacuum problem. So to clean up dead tuples promptly, the server should be in a good state. And if you keep a transaction long, in some cases,
not in all, as we will discuss now.
Your long-running transaction might block, not might block, it will block.
If, for example, you inserted a row and just keep transaction open, this is a real transaction.
It has real transaction ID, right?
This definitely will block vacuum, and it will need to skip dead tuples which became dead recently, after your transaction started. And this is global. This is the biggest surprise here for people, as I see it: to realize this is global. You inserted into one table and keep the transaction open, but vacuum cannot clean any table. It can clean only those dead tuples which became dead before, because of xmax. This is a hidden column in every row; you can select xmax from any table, right? xmax for dead rows is filled with the transaction which deleted this row version, with UPDATE or DELETE. And if this xmax is in the past compared to your transaction ID, which holds this snapshot, right, this will be possible to delete. But if it's later, Postgres thinks maybe your transaction will need it.
Yeah, it could still... if that transaction wants to read that row, let's say the one that got updated or deleted, it needs to return the old version.
Yeah. According to its snapshot. So it's still needed, well, theoretically, essentially still needed.
it's a feature right
it's a feature but also we need
to think about the transaction isolation
levels here
true because by default
so if you have
a transaction ID and
so basically in
PGist activity you can see the column
back-end X-min
which defines
this like horizon that your transaction
needs right this snapshot
it defines this snapshot
And if you have a repeatable read transaction... repeatable read is already a higher isolation level. It's stricter; it's snapshot isolation, in fact. And it's not the default. But it's used during things like dumping. pg_dump will use repeatable read. It opens a transaction at repeatable read. And also logical replica initialization, sometimes in multiple threads, multiple connections. So in this case, the snapshot needs to be held, because this transaction, which is running at repeatable read, needs this snapshot for consistency. All queries will need to read from this very snapshot. So all tables you read during dumping must be from the same moment, right? While in the traditional, default transaction isolation, which is read committed, it's shifting. The snapshot is shifting, so it's not being held. And if you, for example, open a transaction, select from one table, select from another table, you will see that this backend_xmin is shifting, right? Which means that our horizon is shifting, and we are not blocking vacuum like we could block if we were dumping or just using a transaction at repeatable read. Which means that not all long-running transactions are harmful to vacuum.
Does it make sense?
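A way to see the difference (a sketch; t1 and t2 are placeholder tables):

```sql
-- Session 1, default read committed: a fresh snapshot per statement
BEGIN;
SELECT count(*) FROM t1;
SELECT count(*) FROM t2;
COMMIT;

-- Session 2, run between the statements above: backend_xmin moves
-- (and is NULL while session 1 sits idle in transaction)
SELECT pid, state, backend_xmin, age(backend_xmin)
  FROM pg_stat_activity
 WHERE backend_xmin IS NOT NULL;

-- Session 1 again, at repeatable read: one snapshot for the whole
-- transaction, so backend_xmin stays pinned until COMMIT
-- (this is what pg_dump does)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM t1;
SELECT count(*) FROM t2;
COMMIT;
```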
Yeah, at least from a bloat perspective.
So... actually, probably from everything in terms of vacuum.
I was thinking there might be an exception around transaction ID wraparound,
for example, if we had a very long-running transaction.
But if it didn't have a transaction ID,
then actually that's not going to be an issue for transaction ID wraparound either.
That's an important distinction, because if we inserted, or we ran the txid_current() function (or there is a new version, I'm constantly forgetting; I'm old), you have allocated a real XID. That's one story. But if you're just reading, your transaction doesn't have an allocated XID. Basically, there is the concept of a virtual XID, which is an optimization to save space, because we have only four-byte transaction IDs, right? So it's not a lot these days. And if most of your queries, most of your transactions, are reading transactions, it would be not wise to let them consume from this space, right? Yeah. So this is an optimization with virtual transaction IDs, and purely reading transactions only consume from that separate space, right? Virtual transaction IDs. So we don't have it, which means that you need to distinguish
these topics. And my point is that not all long-running transactions are harmful. But at the same time, if you're just reading, but this reading itself is still happening, you are reading from a huge table and it lasts long: definitely, backend_xmin will be some value not changing in pg_stat_activity for this reading transaction. At read committed, it will be harmful. So I'm talking only about cases when we read something and we either read something else constantly,
small reads, right?
Or we read and then sit in idle-in-transaction state, not doing anything and not closing this transaction.
Yeah.
Are you talking, is this a bit of an edge case?
Like who's actually doing this a lot?
Like when does it happen?
What exactly?
Like a small read and it doesn't get committed for...
Oh, it happens all the time.
The number one reason is humans.
like opening an editor or something.
Yeah.
In psql or, even more commonly,
some graphical interface, right?
They run BEGIN on production, read something,
and then maybe they change something also, right?
More like usually when you are reading,
you are not opening a transaction.
Like when you're just exploring stuff,
you run some queries to understand the state of database.
Usually you don't use begin commit transaction block.
But when you try to write, sometimes you do.
And abandoned transaction from humans so common.
It's very common.
So just abandoned.
Forgot to close.
In some cases, this is called turning off auto commit.
So in some editors, you can say, okay, I don't like autocommit.
I will turn it off.
Then you start working, but not committing.
And then this is abandoned transaction.
So it doesn't do anything.
And it holds this backend_xmin in pg_stat_activity.
This is what starts blocking autovacuum.
And at first, maybe for a few hours, you won't notice.
So yes, I understand the dangerous case.
I'm talking about the non-dangerous case.
The opening a transaction begin, then a select.
You talked about just doing a read query, not doing an update.
How, like, when would that happen?
In the same situation, it might happen as well.
I'm just like, I was trying to understand all the possible cases.
In this case, I think, yeah.
But also, I just don't think it happens in the same way.
Like, I understand doing begin, update,
and then you've got the option to roll back.
If you're scared or you're traumatized maybe from doing an update once
and then you're expecting one row updates
and then you see 30,000 rows updated or something like that,
maybe you start using transactions explicitly
so you can roll back.
I get what you mean, yeah, yeah.
But who's doing that for select queries?
Like I don't have...
Our purpose is to be complete, right? To
cover all of them. No, I understand. So this is why I'm asking: is this an edge case? Yeah. Usually, I think,
if I have a bunch of selects and I combine them into a transaction, I would probably use repeatable
read, so all of them are consistent. That's why I combine them, right? This is basically what
pg_dump does, right? If I use a transaction, but then I don't use repeatable
read, why do I use a transaction in the first place? It's just a bunch of selects. Maybe to...
just thinking about what comes to my mind.
Maybe to make sure
the schema won't be changed, to block
DDL.
Or something to do with the pooler, maybe,
like to make sure you stay with the same...
Yeah, backend, maybe.
There might be some cases.
And another very common situation
when we see,
so there is this term,
"xmin horizon."
You can see it in some places
in source code.
I think you won't be able to see it in
documentation. I checked long ago. I think it's a very unofficial term. You can see it in some
blog posts, but again, I feel it's not a fully understood topic, right? So what is the xmin
horizon? I was trying to understand all aspects of it, how to properly measure it.
And my question is: why is it not properly measured by monitoring tools? Because it's not.
Check all popular monitoring tools, excluding Postgres.AI monitoring tools,
right, and you won't be able to see it. They talk: okay, there is a long-running
transaction lasting this amount of time. Okay, have you excluded transactions which are not
harmful, or have you included them? What's the purpose of this chart? Are we talking about danger
related to locking, or danger related to vacuuming? Unclear. My point is, if we start
distinguishing, and if we start thinking about the xmin horizon, we need to understand what it is
and how to measure it, how to properly understand it.
So I came up with the idea that we have four sources.
First is pg_stat_activity, backend_xmin; we just select the oldest, right?
Actually, you can apply the function age() to backend_xmin, which will give you the
difference, just an integer difference relative to your current transaction ID.
Yeah, very cool.
Yeah.
This function understands that we
have wraparound of the space, right? So it will give you a number of transactions,
the age in terms of number of transactions, not in terms of seconds, right? The second,
very, very common situation, super common: you create a logical slot and either
haven't started using it, or abandoned it, or it's lagging. So if you check pg_replication_slots,
it also has a column; it's called xmin there. And you
also can use age(). So it's another very popular reason to have something lagging.
And we need to distinguish logical replication and physical replication, because, as I remember,
if you have a physical replication slot but haven't started using it, I think it's not harmful.
I might be mistaken. Logical replication slots are definitely harmful. Unused logical replication
slots, definitely harmful. Did this change? I feel like this changed in a recent version, where I think there might be
some new setting where, by default, they expire after a certain time of not being used.
Yeah, there is such a setting. It's just a protection mechanism. And I think we should talk about
protection measures separately. Yes. Okay. Cool. I'm just trying to say that physical and logical
replication, replication streams basically, are slightly different here. Definitely,
if you have a logical slot, if it's unused, it's harmful. If it's lagging, it's also harmful.
makes sense
It starts harming you
right when it's created.
So you don't want a logical
replication slot to be unused or lagging too much.
Physical is slightly different,
and I think, if I'm not mistaken,
if you just create a slot
but nothing is connected to it,
it won't hold the xmin horizon.
Moreover, if you have
physical replication,
but this replication is not
using hot_standby_feedback,
on the replica it's set to off, which is actually not common.
Usually people prefer to set it to on.
hot_standby_feedback, for physical replication,
sends to the primary the information about which xmin horizon is needed on the replica, right, on the standby.
If it's not sending, it's better for the primary, right, because we don't care.
But this will give you a physical replica which will start lagging every time it has a
long-running transaction of its own.
While hot_standby_feedback will protect you from replication lag,
but if there is a long-running transaction on the replica,
it will be reported to the primary,
and we will have the same effect as if we had the long-running transaction
on our primary. Which explains why this dilemma
makes it really hard to run long-running transactions
for analytical purposes on regular physical nodes, right?
And you cannot offload them to replicas,
unless you can afford replicas which are lagging for hours, for example.
Anyway, we have this, we have logical and physical replication, which can also affect us.
And we have a fourth component, which is prepared transactions.
There is a pg_prepared_xacts
system view. You can see it also; I don't remember the column name, maybe "transaction" or something.
I don't remember.
But if you use 2PC, right, prepared transactions, in this case you have them, right?
In this case, some of them might be affecting, they may affect the xmin horizon.
So I found four sources for our analysis.
We dig information from all of them, take ages, and then just take the greatest of the ages.
There is such a function, greatest, right?
Yeah.
And this will be our true age of our xmin horizon.
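Pulling those sources together, a monitoring query might look roughly like this (a sketch; GREATEST ignores NULLs, so empty sources simply drop out; hot_standby_feedback from standbys without slots surfaces via pg_stat_replication):

```sql
SELECT greatest(
  (SELECT max(age(backend_xmin)) FROM pg_stat_activity),      -- long-running transactions
  (SELECT max(age(backend_xmin)) FROM pg_stat_replication),   -- hot_standby_feedback from standbys
  (SELECT max(age(xmin))         FROM pg_replication_slots),  -- physical and logical slots
  (SELECT max(age(catalog_xmin)) FROM pg_replication_slots),  -- logical slots' catalog horizon
  (SELECT max(age(transaction))  FROM pg_prepared_xacts)      -- prepared (2PC) transactions
) AS xmin_horizon_age;
```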
What is big, what is small?
This is hard to understand.
When I think about it, what is big?
Is 1,000 too much or 1 million too much or 1 billion too much?
A billion is definitely too much, because we are...
Halfway there.
Yeah, yeah, yeah.
We have 2.1 billion, which is basically
the whole capacity before XID wraparound, right?
But there's also the multixact mechanism additionally; let's not go there.
A billion is definitely too much.
Is one million too much?
In heavily loaded systems, I like to think about it by thinking about what tables we
have and how many rows they have, and how many dead tuples,
dead row versions, we consider too much.
Usually, for OLTP, heavily loaded systems, we prefer to make autovacuum very aggressive.
The default settings, autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor,
are like 20 and 10 percent, the common default numbers. It's too much.
If we have a hundred million or a billion rows, 10 percent is way too much.
So we usually set, okay, like 1%, so 0.01, or even half a percent,
and sometimes we go even lower, and make it delete dead tuples more often. Now you already have some hint:
okay, you think, I have like a 10 million row table and I have a 1% scale factor. It means that 100,000 rows
is when autovacuum should start cleaning up. And then, yes, it's a good point: the xmin horizon is lagging
that much, okay. Is it lagging a thousand?
It's okay.
We basically don't care.
Our actual horizon, when we expect autovacuum to start working, is
100,000, roughly.
But the problem is that we have different tables, small, big, right?
So we need to think about core tables and the nature of our database, to think how much of an xmin horizon,
how much xmin age, is too much for us, right?
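The arithmetic above, as configuration (values are examples from the discussion, not universal recommendations):

```sql
-- autovacuum triggers roughly at:
--   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
-- with defaults 50 + 0.2 * reltuples, a 10M-row table waits for ~2M dead tuples;
-- at 1%, cleanup starts at ~100k, which gives a baseline for "normal" xmin horizon lag
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.01;
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.01;
SELECT pg_reload_conf();
```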
I think there's two... you're talking almost like a bottom-up
approach, right, working out how much bloat we can tolerate.
But there's also the top-down.
How many dead tuples we can tolerate.
It's not yet bloat.
As we remember, if we allow too many dead tuples,
autovacuum is a converter of dead tuples to bloat.
It will come and convert our millions of dead tuples to bloat.
But it's not bloat yet.
At least in my terminology. I wish we had official terminology for bloat.
Like, for dead tuples it's clear, but for bloat,
we don't have it as well.
In documentation, we won't find it.
The more dead tuples we leave around, the worse bloat issues can get.
In terms of fragmented indexes, for example; in terms of HOT updates,
if we've got dead tuples there, we can't benefit from them.
But okay.
HOT updates might benefit from bloat, because they have space.
No, because it's not space yet if you can't reclaim it.
If it hasn't been processed yet, they can't.
But then eventually a vacuum will happen.
Eventually.
It will give a lot of gaps, and these gaps might be reused, right?
So it's like...
Maybe, yeah.
There are effects in opposite directions here, right?
But the point I was trying to get to was: often people are monitoring this
in order to prevent transaction ID wraparound.
So the top-down approach could be about how much time you have.
Okay, a million might not be that bad in terms of getting towards 2 billion.
But if 2 billion could accumulate within an hour or two,
if you've got a ridiculously loaded system, you might actually want an earlier alert,
or a lower threshold, because
as soon as you find out that it's going above a healthy number, or above a not-normal number for your system,
you've got a stopwatch running. You've only got two hours. You've got to fix this problem.
So it might be that you're thinking in terms of top-down as well.
Yeah. For example, if I see age of this is 100 million, it's already, wow, it's a lot. It's already like emergency.
You might only, yeah, exactly, you might only have a couple of hours at that point.
Yeah, speed is interesting. I wish we also had good... again, monitoring systems lack a proper XID consumption, XID growth rate metric, covered properly.
If you think, okay, this is my xmin horizon, some number.
Okay, I know it's transactions, but you're right,
it's hard to map it to real time.
But here I see: okay, we consume two billion transactions per week.
Oh, it's quite a loaded database, in fact, right?
Yeah.
Real transaction IDs.
In this case, I see peaks during busy hours.
I understand my XID growth rate.
And this is helpful for me.
I understand, okay, I can already start playing with this.
I think we could go even deeper.
I just don't see proper analysis tools in this area developed.
Like this is like my thoughts from people I work with and talk to.
And I see huge potential here.
So it could be much better visualized, presented for analysis.
And as you say, separated out in terms of things that are risky in terms of locks and things that are risky in terms of...
Oh, yeah, locks are completely...
I'm going to keep saying bloat.
Yeah.
Yeah.
But that's what I mean.
You're talking about different alerts, different charts, different.
Yes.
So I would have an alert, and we actually do have alerts for the xmin horizon, based on understanding
this particular system, with its growth rate and table sizes, right, and the behavior of vacuum and autovacuum.
And we set it... for example, I think 100 million is a huge number; we should set it lower, and already
start investigating what blocks our xmin horizon. And this is the right direction, to understand it.
And we can also check logs. For autovacuum, we usually aim to log, if not everything, then
at least autovacuum processing above one second.
So log_autovacuum_min_duration = '1s'.
So log autovacomb, mean duration, one second.
Yeah, sometimes we set to zero.
Yeah, there is some risk that you might see a lot of and have observer effect,
but usually like at least one second is beneficial.
So you start, again, I think all default was 10 minutes.
It's ridiculously huge.
Like you don't see a lot.
And I think it was decreased recently, right?
Oh, cool.
Yeah. So what I'm talking about here: if you have logging for autovacuum behavior, it reports the xmin horizon, right? The transaction ID was this, the xmin horizon was that, it's this number behind. So you might extract information from those logs. And it also reports how many dead tuples it would remove but couldn't, because the xmin horizon is blocking. So autovacuum messages in logs are extremely helpful for this type of risk analysis.
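Enabling that logging (the exact log wording varies by Postgres version):

```sql
-- log every autovacuum run that takes longer than 1 second
ALTER SYSTEM SET log_autovacuum_min_duration = '1s';
SELECT pg_reload_conf();
-- the resulting log entries report, among other things, how many tuples
-- are dead but not yet removable, i.e. blocked by the xmin horizon
```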
I think it might be...
Yeah.
I think it's still 10 minutes.
Maybe, maybe.
Wow.
All right.
So this is about xmin and vacuum behavior, the xmin horizon, and this concept.
Again, I think documentation could be fairer about the fact that these concepts exist.
Maybe someone should propose to define this, because everyone suffers from bloat.
And, in my opinion, everyone working with Postgres deeply enough
needs to understand the concept of the xmin horizon, to start distinguishing it from
long-running transactions in general, right?
And the second problem... like, the first problem historically: lock acquisition.
What to say about it?
So it's, yeah...
I think we should point people, I think we did a good episode on that specifically.
But yeah, maybe that's a good excuse to switch over to mitigations, though, because...
Mitigation.
Yeah, one point.
I wanted to mention there are some exotic problems also, which might happen from long-running transactions.
Oh, yeah, sure.
I point to my subtransactions post explaining four problems with subtransactions, and one of them happened to GitLab.
And we had an episode, I think, about it, right?
So it was the most interesting one.
It was happening when you have a long-running transaction, I think when you actually hold this xmin horizon on the primary, the whole snapshot,
or maybe locks.
This is a question, actually.
I have some homework to do, to return to that material and understand which of the two problems
happened there.
So: long-running transactions on the primary, and a high-TPS replica started to suffer at some point
because of subtransactions.
Once you closed the long-running transaction on the primary, the suffering ended.
So this is already exotic, exotic problems, where I need to understand
whether it's lock-related or snapshot-related.
I also think replicas are an interesting thing in general in terms of long-running transactions.
So long-running transactions on the primary and whether they affect the replica,
and then also long-running transactions on the replicas,
and even when they can affect the primary.
Well, long-running transactions on a replica, if hot_standby_feedback is turned on, as we discussed,
they will report to the primary.
They will report to the primary.
It can have the same effect.
Yeah.
So I think that's not always something people are aware of.
But yeah, in terms of mitigations, though,
it feels like timeouts are our friends in a lot of these cases.
Is that, and monitoring, we've brought up, monitoring a lot, right?
Monitoring, even in extreme cases, killing certain problematic long-running transactions,
what did you want to talk about in terms of mitigations?
So, yeah, of course, if you can get rid of all long-running transactions,
both dangers don't affect your database health, which is great.
But sometimes we need long-running transactions,
sometimes we need to accept some lag in replication slots.
By the way, this is what caused me to start thinking something is not right with how we distinguish things.
We talk about long-running transactions, but the same problem, vacuum being blocked...
Vacuum is not fully blocked, you understand;
it just cannot delete some dead tuples.
It's not only about long-running transactions.
It's also about slots.
It's also about prepared transactions.
What's happening?
Okay, prepared transactions, those are at least transactions. But slots, right, or replication...
like, anyway, hot_standby_feedback.
Okay, hot_standby_feedback is also about transactions,
they just happen on the standby. But a just-unused logical slot?
So it's a wrong explanation of the problem.
The problem is the xmin horizon is frozen.
It's not progressing, right?
Not progressing, yeah.
Maybe it's progressing, but too slow also might happen.
Yeah, so...
If it's progressing slowly, though, it's just blocked at a new point, right?
Like, it's...
Yeah, if it's progressing too slowly, your horizon of XID... not the horizon, but the actual XID, is really too far ahead.
So basically, what we should look at is the age of the xmin horizon,
the age of the oldest xmin needed.
Yeah.
Yeah, this is all we need.
And if it's too far in the past, this is a problem.
And the reasons of this, as I said, four reasons.
And mitigation is simple.
Don't allow it to be too far in the past.
How?
First of all, in OLTP, I very highly recommend
being very aggressive
with timeout settings globally.
I know the documentation says
it's controversial, and so on.
But if an HTTP server setting can afford 30 seconds or 60 seconds as a maximum for your OLTP requests,
why can the database not afford at least the same for all regular traffic?
Of course, for CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY we need to set it to zero.
What to set?
statement_timeout and two other things, right? If you think about a transaction: it's either a statement that is running, or the idle-in-transaction state. So it's either statement or idle-in-transaction, and there's the whole transaction. So basically we have the whole thing and two phases, which switch after each other, right? It means we can limit the whole transaction, or the statement, or the idle-in-transaction state, and for all three we have settings, right? The transaction one, yeah, only since
recently though, only since version 17.
We had an episode about transaction_timeout
because it was my idea.
It was a super surprise.
I was walking around thinking, why doesn't this exist?
Maybe I don't understand Postgres enough
after 15 years of experience.
Then I mentioned it to Andrey Borodin
and he implemented it, which is great.
In Postgres 17 it exists.
So in all new Postgres versions we have transaction_timeout,
so we can finally limit the whole thing,
because before we could limit only a statement,
plus idle_in_transaction_session_timeout.
And if you have a series of brief statements and brief waiting periods between statements,
you could have an hour or day-long transaction easily.
Now you can limit the whole thing.
And limiting the whole thing is helpful.
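As a sketch, aggressive global defaults might look like this; the values here are illustrative, not recommendations for every workload, and transaction_timeout requires Postgres 17 or newer:

```sql
alter system set statement_timeout = '30s';
alter system set idle_in_transaction_session_timeout = '30s';
alter system set transaction_timeout = '60s';  -- Postgres 17+
select pg_reload_conf();

-- For sessions doing long maintenance (CREATE INDEX CONCURRENTLY, etc.),
-- disable the limits locally in that session:
set statement_timeout = 0;
set transaction_timeout = 0;
```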
Unfortunately, we cannot limit what we actually would like to limit, which is the xmin horizon.
Maybe we should have...
This idea just came to my mind now, honestly.
We could probably have an xmin horizon maximum,
after which we start abandoning transactions,
like cancelling transactions,
abandoning prepared transactions, like deleting them,
and maybe killing some replication slots.
Yeah, yeah.
Yeah, I know, like 100-million protection.
The ultimate protection mechanism.
And in this case, you don't need to translate what's happening
with your server from seconds
into actual problems, like how many transactions
you can allow your garbage collection to lag behind, right?
Garbage collection is vacuum, right?
So maybe we should have those settings.
Maybe it's too many settings at the same time,
but it would be flexible.
I would say 100 million is my maximum,
or 10 million is my maximum, regardless of...
Even if Postgres shipped with conservative defaults there,
it would be fine.
Even if it was like 100 million as a default,
it would be fine for most cases, wouldn't it?
I'm curious, maybe this idea was discussed. Maybe it's not a good idea; it just came to my mind. It's interesting.
If you have an opinion while listening to us, leave it as a comment under the YouTube video, or
somewhere like Twitter or LinkedIn, I don't know. Anyway, this is interesting. But what can we
do? We can protect with transaction_timeout, statement_timeout, and the idle-in-transaction timeout.
What else? We should, we must actually, monitor lags of replication slots, especially logical
replication, but physical as well if we use hot_standby_feedback, and we should not allow them to lag too much.
And finally, we should be very careful with 2PC and abandoned prepared transactions. That's it.
So only these four areas.
Yeah. When you say monitoring, you mean alerting as well, right?
Yeah. So, if some logical replication slot is unused, or it's lagging too much. Again, this is
interesting: it will also be either in bytes or in seconds. We should think about
how much in seconds; we need to map it to transactions,
understanding our XID growth rate.
So there are different metrics in play here.
So to understand health properly, it takes time.
And if you have cattle, not pets,
you should have simple rules:
if some slot is lagging too much, act.
By the way, slot lag, it's in bytes, right?
We cannot limit it in seconds of lag.
Right.
There are bytes, seconds, and then XID
growth, like the number of transactions.
So three metrics.
This is a triangle here.
Interesting.
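For the bytes corner of that triangle, slot lag can be read straight from pg_replication_slots; a minimal sketch (pg_lsn subtraction yields bytes):

```sql
-- How much WAL each slot is holding back:
select slot_name, active,
       pg_current_wal_lsn() - restart_lsn as lag_bytes,
       pg_size_pretty(pg_current_wal_lsn() - restart_lsn) as lag_pretty
from pg_replication_slots;
```

Mapping bytes or seconds of lag to transactions still requires knowing your own XID growth rate, as discussed above.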
Anyway, this is to protect vacuum from being affected.
You can also, reactively, just monitor what autovacuum reports in the logs.
It will start to complain clearly, right?
So you can set thresholds based on logs.
And if you have a good log analysis system,
you could set up alerts from there. As for locks, for me it's a completely different
topic. Yeah, long-running transactions contribute to it, but problems might happen very fast.
Unlike the xmin horizon being blocked, which you might not notice at all. Okay, you blocked it
for many, many hours; maybe it was only needed one time, so it won't affect you much.
The negative effects are postponed, and they grow. It's not a cliff;
it's slow degradation.
With locks, it might be very fast.
So, as I said, the classic example: you hold the weakest heavyweight lock possible, an access share lock on a table.
It blocks DDL, and the DDL blocks other selects.
Boom.
What to do here?
We need to use proper DDL deployments.
We need a low lock_timeout and retries.
This is the number one thing I would recommend.
And the number two thing is to craft schema changes properly
to avoid long-running exclusive locks, first of all,
but at some point also access share locks.
They are not harmful unless you have conflicting sessions.
And detecting conflicts sooner is helpful.
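A minimal sketch of the low-lock_timeout-plus-retries pattern; my_table and the column are placeholders, and the retry loop itself would live in your deployment tooling:

```sql
set lock_timeout = '100ms';

-- If the lock cannot be acquired within 100ms, this fails with
-- SQLSTATE 55P03 (lock_not_available) instead of sitting in the
-- lock queue and blocking all sessions behind it;
-- catch that error and retry, ideally with backoff and jitter.
alter table my_table add column note text;
```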
Another measure, which addresses a common situation with Postgres:
vacuum can block us when we do DDL.
We cannot acquire an access exclusive lock on the table to change its schema if vacuum is processing it.
But fortunately, in most cases, vacuum is running in normal mode, and it will kill itself after, I think, a second.
Do you mean autovacuum, or do you mean anti-wraparound vacuum?
You mean a normal one, not anti-wraparound.
Yeah.
I'm talking about autovacuum. Good correction.
Sure.
Right.
But if it's running in transaction ID wraparound prevention mode,
which happens often if you have high XID growth, right?
If we insert a lot, not in batches but single rows, for example, we consume
XIDs quickly; savepoints can also contribute a lot to XID consumption and XID growth
rates. In this case, we probably have a problem: if XID growth is high,
autovacuum in transaction ID wraparound prevention mode happens more often. It might happen, for example, every two days
in a very busy system. If it happens every two days, but we aim to deploy
five times per day, we are a very rapidly developing startup, this might clash.
You have an autovacuum in wraparound prevention mode, and you try to deploy DDL. Even if you
have retries, and you must have a low lock_timeout and retries, you try to deploy it, but the vacuum runs
on your terabyte-sized table and blocks you. This is super annoying. You cannot deploy something
you need to deploy right now. What to do about it? The question is open.
Well, you've talked a lot about partitioning, so the tables are smaller and the
vacuums are quicker. But I actually also think, at the time I thought it was a good thing,
that the threshold where an anti-wraparound vacuum kicks in is relatively low
in the grand scheme of things, in terms of the risk. I would seriously consider, if it's happening
every two days, that seems like a bad smell to me. That feels like something that shouldn't be.
Like, let's make sure normal vacuum happens enough so that it doesn't have to run in anti-wraparound
mode.
I'm not against this. It's good that it's happening often, and freezing is good.
So partitioning is helpful not only because tables are smaller,
but because more pages are all-frozen and we don't touch them anymore,
because they are like our archive.
And this is super.
It makes things much more efficient compared to a huge unpartitioned table,
where at any time a page can receive a new tuple
and is not all-frozen anymore in the visibility map, right?
The visibility map has two bits, all-frozen and all-visible.
And boom, autovacuum needs to take care of this page again.
So for me the problem is not that it's happening too often;
it's okay to have the threshold low and make it very frequent, and
partitioning is great. But sometimes I wish it were easier to raise this bar temporarily while
I'm doing my changes. As far as I remember, and I might be mistaken here, the last time
I checked this topic, maybe a couple of years ago, it was not so easy. You cannot change it
globally so easily; it requires a restart, I think. Maybe I'm mistaken again. And at the table level,
there is also some caveat. I don't remember which, but I didn't see a good way to do it
per deployment. And also, interestingly, you know, you kill that autovacuum which
runs in transaction ID wraparound prevention mode, and if you don't acquire the
lock on the table fast enough, it will kick in again. It might kick in especially quickly
if you have many workers and naptime is low. So it's an interesting problem as well.
Raising this threshold was not easy, as I remember. So I came up with two ideas.
The first idea: if this is a big deployment, like a major upgrade or
some other major change, because it's a problem during major upgrades, we had it during major upgrades when
autovacuum running in transaction ID wraparound prevention mode was a problem for us. If it's a
rare change, we probably should forecast when autovacuum will start freezing and do the freezing ourselves before.
I like that, yes.
Yes. Another thing is that probably we should sometimes just kill it ourselves. But if you want to kill it,
you need to do it properly: you issue an attempt to acquire the lock on the table you want
to change, and immediately kill the autovacuum, so your attempt to acquire the lock is successful.
And then, good luck to the transaction ID wraparound prevention while we're doing our job.
It's like... it has to wait, so that it doesn't start again?
Yeah, yeah. So, okay, I won.
That's it. The lock is mine and I'm doing my job.
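A sketch of that trick, with my_table as a placeholder; this is the dirty manual approach described above, only to be used after verifying wraparound capacity:

```sql
begin;
set local lock_timeout = '1s';

-- Terminate the wraparound autovacuum working on our table...
select pg_terminate_backend(pid)
from pg_stat_activity
where backend_type = 'autovacuum worker'
  and query like '%my_table%';

-- ...and immediately grab the lock before a new worker restarts it:
lock table my_table in access exclusive mode;

-- run the DDL here, then:
commit;
```

With many autovacuum workers and a low autovacuum_naptime, the window between terminating the worker and acquiring the lock can be very short, hence doing both in one scripted step.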
It feels slightly dirty, right?
But there is no harm if you also monitor XID and MultiXID wraparound risks.
If you know you're only at 10% of capacity, around 200 million, it's fine to go to 300 million even during a big deployment.
You will be fine afterwards, right?
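Checking that capacity is a one-liner against pg_database; age() here counts transactions against the roughly 2-billion hard limit, and autovacuum_freeze_max_age defaults to 200 million:

```sql
select datname,
       age(datfrozenxid)    as xid_age,
       mxid_age(datminmxid) as multixid_age
from pg_database
order by xid_age desc;
```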
But this feels dirty, honestly, to do this trick.
I did this trick manually a few times in my career
because I needed to finish a deployment
and I knew that we were not in danger, though.
Yeah.
Yeah, that makes loads of sense to me.
My preference, I think, would still be to avoid it
happening as often as that,
and to actually try to make sure
things get frozen before it's needed.
So it really is a failsafe, rather than
something that happens in normal operation.
But yeah, super interesting.
We started the episode talking
about what we should call it.
And I think maybe we should end it with: what do you think now, having talked about it?
Oh, yeah.
But we try to keep titles very short, right?
We didn't have an xmin horizon episode.
I tend to think this should be called the xmin horizon episode, because this is where we
spent most of the time.
And it's less about locks.
It's less, like... and I think
long-running transactions is the wrong concept to chase.
Yeah, let's call it xmin horizon, because I think that's going to be...
Yeah, let's call it xmin horizon.
But my concern is that people think in terms of long-running transactions and...
That's fine.
I think long-running transactions is interesting, and I actually don't mind that there's this
edge case of long-running transactions that aren't harmful.
Let's keep long-running transactions, because it's easier to understand what we talk about.
This is what people expect.
Let's keep the name:
long-running transactions.
The xmin horizon concept is still
not fully clear in my head.
Well, it's clear, but I want it to become official in the documentation.
Maybe it should be explained in the documentation.
And then that opens us up
so we can still do an episode on that at some point.
Yeah.
Maybe I missed some parts of it,
how to properly monitor.
I know some people who develop observability tooling
listen to our episodes.
Oh, cool.
Yeah, I hope it will be improved.
Nice.
It's all about clarity.
And I think,
thank you for listening to this.
I feel we went quite deep.
I personally enjoyed it.
Yeah,
I hope everyone else did too.
And yeah,
let us know.
Great.
Nice one.
It was great to chat to you,
as usual.
Likewise, Nik. Take care,
and see you next time.
Yeah,
bye.
