Postgres FM - Subtransactions
Episode Date: November 24, 2023

Nikolay and Michael discuss PostgreSQL subtransactions — what they are, and four issues they can cause at scale.

Here are some links to things they mentioned:

Subtransactions (docs) https://www.postgresql.org/docs/current/subxacts.html
SAVEPOINT (docs) https://www.postgresql.org/docs/current/sql-savepoint.html
PostgreSQL Subtransactions Considered Harmful (blog post by Nikolay) https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
Subtransactions and performance in PostgreSQL (blog post by Laurenz at Cybertec) https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/
Notes on some PostgreSQL implementation details (blog post by Nelson Elhage) https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/
Why we spent the last month eliminating PostgreSQL subtransactions (GitLab blog post) https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to PostgresFM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard.
This is my co-host Nikolay, founder of Postgres.AI.
Hey Nikolay, what are we talking about today?
Hi Michael, let's talk about sub-transactions and be very short with this.
This is a super simple topic because I have a super simple rule,
just don't use sub-transactions unless it's absolutely necessary.
And I'm going to describe details, but not going too deep.
Let's keep this episode short.
Sounds great.
And also, it's Thanksgiving, so hopefully people are spending a lot of time with their families.
So happy Thanksgiving to anybody that celebrates.
And hopefully you'll be thankful for a short episode.
Right.
I wanted to thank you, of course, for doing this with me for so long.
Oh, likewise.
It's been a pleasure.
Where should we start then?
So let's start with the idea of sub-transactions very briefly. The idea is if you have a complex transaction and you want to handle some kind of failure,
like for example, a statement might fail, but you don't want to lose the previous statements
already completed in the same transaction.
Or you want, for example, to explicitly undo some steps, right?
It might be actually multiple statements.
You can return to a previous point.
So you can use sub-transactions, also called nested transactions,
in some cases, although it's not very nested, but kind of, okay, it's nested.
Nested transactions also. And there is a SQL standard keyword for this.
I think it's in the SQL standard, right? SAVEPOINT.
Oh yeah, I don't know if it's standard, but I think it is, because I read the Postgres docs for this,
and they mention one tiny part of it that's not standard compliant, and so therefore I assume
everything else is.
Right, yeah, I think it's standard. What was that part? I also remember it, but I don't remember which part.
Oh, it gets into the details, but it's around the naming of sub-transactions. So you can
name them so that you can then roll back to the name. And if you use the same name multiple
times, the standard says you should destroy previous ones, whereas the Postgres implementation
keeps the previous one, so you can roll back to the one with
the same name, and then, once you release it, roll back with the same
name again, and that rolls back to the previous one.
Anyway, hopefully that
made some sense.
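As a rough illustration in plain SQL (the table t and its values are made up for the example; the behavior follows the description in the Postgres SAVEPOINT docs):

    begin;
    create table t (x int);
    insert into t values (1);
    savepoint s;
    insert into t values (2);
    savepoint s;               -- same name again; Postgres keeps the old savepoint underneath
    insert into t values (3);
    rollback to savepoint s;   -- targets the newer s, so only the insert of 3 is undone
    release savepoint s;       -- drop the newer s; the older s becomes reachable again
    rollback to savepoint s;   -- now the insert of 2 is undone too
    commit;                    -- only the row with 1 remains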
The idea is nice, right? If you have
really complex transactions
and you expect that
you need to roll back sometimes, you just
put some save points in various
places and you can return to
them. And you can imagine it... like, there are two ways to imagine it. I'm not sure both are correct,
let's discuss it. So one is, like, we have some flat picture with save points, and we can jump. Another is
nested, and I think nested is more like it. If you look at the code, usually it doesn't have indentation, right?
It's like flat, vertical lines of code at the same indentation,
savepoint name one, savepoint name two,
and some statements between them.
And you can roll back to one of the previous save points.
But internally, there is nesting applied,
because when you return to some older save point,
you lose the deeper, newer, fresher save points,
because they are internally implemented deeper.
And one of the problems we will discuss,
it's related to this level of nesting, or the depth of this nesting, right?
Yes. And I think, just on the basics, it's worth mentioning that even if you know that you
haven't used save points anywhere in your code, you could still be using sub-transactions without...
Without knowing it, exactly. Yeah. Or we can call it nested transactions. So, like, I think nesting is probably
the right concept here, although, again, it looks like it's not nested, but it's nested. And I think
Django, they use the nesting word. And Django and maybe some other ORMs will provoke you
to use it, like, implicitly,
so you just do something and you
don't know but you use it. Also
PL/pgSQL will provoke you to use
it, because at some point...
if you use a lot of PL/pgSQL
code, as I do, at some point
you will want to have begin,
exception, end blocks,
exception when blah blah blah,
just to handle some failures.
And this is 100% already a sub-transaction,
because there is another begin block
which defines the main body of the PL/pgSQL function.
That's the big one, I think.
So the ORMs can be using it behind the scenes
without you realizing.
And if you've got any exception
handling in functions, whether that's
functions or triggers,
you are using sub-transactions
or you might be using sub-transactions.
Right. Any exception
handling or maybe the top-level
exception handling?
Is it possible, like, say, begin, blah, blah?
We always have
a begin and end block for the top level, right?
Like, implicitly. Oh, no, you mean in a function.
We can put exception there.
I think as soon as you're using the exception clause, you are using one.
Yeah, I think so as well, but I'm just curious. Okay. But definitely, if you use
additional begin, exception, end blocks inside the body of the PL/pgSQL function, it's already a sub-transaction.
So it's an implicit sub-transaction.
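A minimal sketch of what that looks like (the function and table are made up); in PL/pgSQL, any block with an exception clause runs its body as a sub-transaction under the hood:

    create table t (id int primary key);

    create or replace function safe_insert(val int) returns void
    language plpgsql as $$
    begin
      insert into t values (val);
    exception
      when unique_violation then
        -- the presence of this exception clause means the insert above
        -- ran inside an implicit sub-transaction (a savepoint under the hood)
        raise notice 'duplicate key %, skipping', val;
    end;
    $$;

    select safe_insert(1);
    select safe_insert(1);  -- second call hits the handler instead of aborting the transaction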
So what are the problems with them?
Yeah, great.
Let's be short, as we promised.
The very first problem is that we have, like, a four
billion, 4.2 billion, space of transaction IDs, and we can use only half of it, because one half of it is the past,
one half of it is the future. So there is the transaction ID wraparound problem: what happens
after two billion, 2.1 billion, transactions if freezing didn't happen, if you blocked
autovacuum, for example, with a long transaction or something, or just turned it off, but it
should wake up even if you turn it off. Okay, so we have four-byte transaction IDs. And
to optimize how these IDs are spent, there is also the concept of virtual transaction IDs. So if you just run a select
or some read-only transaction, it won't increment the transaction ID counter, right? So you won't waste
an XID. If you use sub-transactions, a transaction with a couple of sub-transactions,
it's already three. Yeah, that's the big issue, right? Well, not the big issue, but
in this problem.
You start
spending these quite
not cheap XIDs
faster than if you didn't
use sub-transactions. But also
there are multixact IDs.
It's a different topic, right?
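A rough way to see the XID consumption for yourself (assuming Postgres 13 or newer for pg_current_xact_id(); the table t is made up):

    begin;
    select pg_current_xact_id();  -- forces an XID to be assigned to the top-level transaction
    savepoint s1;
    insert into t values (1);     -- this sub-transaction gets its own XID once it writes
    savepoint s2;
    insert into t values (2);     -- and so does this one: three XIDs for one logical transaction
    commit;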
So this is not, I think it's not
a super big problem. I just want people to understand that we move toward wraparound faster if we use sub-transactions
quite often. Cool. Let's move on to the second one.
The second problem is this nesting level, like,
the depth of nesting. There is a per-backend cache, and if you
reach 64 sub-transactions in one transaction, I don't know who does that,
but this is quite well documented
in the mailing lists and other places,
a quite well-documented overflow problem.
Yeah, so we're talking about
reaching 64 of those sub-transactions in a single transaction.
I did see a blog post, was it by...
it was on the Cybertec blog, I think, by Laurenz.
Was that on this?
Yeah.
So I think it's quite easy to demonstrate this problem.
Right, and we had a benchmark for it,
a synthetic benchmark.
We did it.
It was easy.
So you just, too many save points,
and you reach level 65.
This is a cache per backend,
so it's like a local cache,
and in this case, the degradation is obvious.
Yeah, so it's not a hard limit, right?
It's just you pay a performance penalty at that point.
And I've never seen this in the real world.
Have you ever seen a case of this?
Well, only in synthetic tests.
Yeah, okay, fine. Yeah, great.
Right, so not in production.
And it's not a hard limit in terms of the problems it gives,
but it's a hard limit.
You cannot change it.
It's a constant in source code, so it's hard.
But it's not huge.
You can go there, but probably
don't want to go there.
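To reproduce it synthetically, a rough sketch (the table t is made up; each iteration of the inner exception block opens one more sub-transaction, so a single transaction ends up with more than 64 of them; the slowdown itself shows up in other sessions querying concurrently while this runs):

    do $$
    begin
      for i in 1..70 loop
        begin
          insert into t values (i);  -- each pass through this block is a new sub-transaction with its own XID
        exception
          when others then null;     -- never used here; it just turns the block into a sub-transaction
        end;
      end loop;
      perform pg_sleep(60);          -- keep the transaction open so concurrent sessions feel the overflow
    end;
    $$;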
The third problem is related to
multixacts, this additional
space of transaction IDs,
multi-transaction IDs,
multixact IDs.
Actually, I see these things are not very well
documented. For example, multixact wraparound can also happen. We don't see
big blog posts about it, like the ones from Sentry or MailChimp for regular transaction IDs. But it still
can happen. So the documentation doesn't have good words,
like details about how to properly monitor it and so on.
But I mean, it's possible.
Anyway, a multixact, it's a mechanism.
It's like a different space.
And it's a mechanism for when multiple sessions
establish a lock on the same object.
For example, a share lock. And, for example, when you
have foreign keys, you deal with multixacts implicitly again. So Postgres establishes
multiple locks on the same object, and a multixact ID value is assigned.
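Roughly, the situation looks like this (two psql sessions, made-up table t):

    -- session 1
    begin;
    select * from t where id = 1 for share;   -- a share lock on the row; its xmax holds one XID

    -- session 2
    begin;
    select * from t where id = 1 for share;   -- a second locker on the same row:
                                               -- Postgres now has to remember both lockers, so it
                                               -- creates a multixact ID and stores that instead.
                                               -- Foreign key checks take similar row locks on the
                                               -- referenced rows, which is why they can create
                                               -- multixacts implicitly.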
And there is a problem. There's a very good blog post from Nelson Elhage.
How to pronounce?
I have no idea, but Nelson seems like a good fit for the first name.
I'll link it up in the show notes, of course.
Right.
This is an excellent blog post from 2021.
I'm not sure which company it was, but the blog post explains several problems actually as well.
And it explains, like, there is a well-known problem. Honestly, I must admit, I didn't know about that problem, but
the blog post assumes everyone knows it. Okay, so the problem, it's not related to sub-transactions,
but there is a problem with select for update, so there is, like, a performance cliff this blog
post explains. So once I read it, I started to be very careful
when recommending to use select for update
to parallelize, for example, queue-like workloads.
But you can select for share, right?
Select for update.
Oh, okay.
It's not related to sub-transactions.
We start from there.
So it says there is a performance cliff
related to select for update.
Okay, it explains it,
blah, blah, blah.
Okay, but,
and I now understand,
okay, select for update,
we should be very careful with it.
So, and that's why I became
a bigger fan of single session
consumption of queue-like workloads
because usually with single session,
you can move very fast
because once you start
using select for update, you can meet this performance cliff, of course, at larger scale,
like when you have many thousands of TPS, for example. So, okay, forget about select for update. Then the
post explains select for share... Oh, sorry, sorry, I may have messed this up.
You think select for share is the known problem,
and select for update is where sub-transactions come into play.
Yeah, so yes, sorry, sorry.
Sorry, yes.
So select for share is a very well-known problem, as the blog post explains.
But then if we use select for update
and sub-transactions,
we have similar problems as with select for share.
I'm very sorry.
So if
you use select for update, for example, for
queue-like workloads,
and you have multiple consumers, and you
don't want them to fight over
lock acquisition, so you use select for update,
and also you use sub-transactions,
you might meet a similar
performance cliff as for select for share.
So anyway, watch
out, be very careful
if you use a lot of
thousands of TPS,
be careful with both select
for share and select for update.
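For context, the queue-consumer pattern in question looks roughly like this (made-up jobs table; skip locked is the usual way to keep workers from grabbing the same rows):

    -- each worker runs something like this in a loop
    begin;
    with batch as (
      select id
      from jobs
      where status = 'pending'
      order by id
      limit 10
      for update skip locked   -- lock a batch of rows; other workers simply skip them
    )
    update jobs
    set status = 'done'
    where id in (select id from batch);
    commit;

The cliff the blog post describes shows up when this kind of workload is combined with sub-transactions at very high TPS.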
This is the
explanation. Again, I'm, like, making mistakes
here because I didn't see it myself in production. I saw it in synthetic workloads,
again, but not in production. That's why I keep forgetting. What do you think
about this problem in particular?
Yeah, you've made me question which way around it is, and I'm not even sure it's worth rereading. This is something to look into if you make extensive use of it at high volume and want to
be aware of it. But if neither of us have seen it in the real world, especially you, it feels to me like
this is not the one most likely... Like, we've done three issues already, and I think it's the fourth
that's going to be most likely to actually affect people, and the most interesting.
I'm not sure, because again,
this post by Nelson
explains, if you use Django, for example,
and it uses nested transactions,
you have save points, and then
you think, oh, it's a good idea, I have
workers fighting with the same
rows to consume
queue-like workloads, or
lock them, I'm going to use select for update.
And here you have the similar performance
cliff as they explained for select for
share. But again, it
happens only under
very heavy loads, like
thousands of TPS.
Yeah, and one point they made in
fixing it was
they changed... For Django, even. Yes, for Django.
There was a parameter they could change,
passing savepoint equals false to transaction.atomic, yeah, yeah, which completely fixed the issue,
right? Like, it removes all the save points. Exactly, yeah. Yeah, it's great that people can do that, and
it's really interesting that that's not the... Yeah, it's not the default. Again, they fixed the problem by getting rid of sub-transactions. And they actually, before we move on from that post, they included one
more line that I really liked, which was them talking to somebody else who they considered an
expert on the Postgres side, and they quoted: sub-transactions are basically cursed, rip them out. Cursed. Yeah, cursed or cursèd, I guess.
Now I'm like the, like, how to say,
I'm bringing these thoughts to people
and sometimes I see pushbacks like,
no, no, no, like, they're still good.
Well, sorry, it's not only my thought.
I inherited it from this post as well.
But so let me explain what we saw
at GitLab.
Yeah, this is problem number four.
Problem number four. And I saw it myself and helped
troubleshoot, and they also had a great blog post about this. And this led to a big
effort to eliminate sub-transactions in GitLab's code. So there is another cache, which is kind of global, not per session,
and it also has a threshold, 16, hardcoded in source code, there is a constant, 16. And when you reach 16
sub-transactions on a replica, and you have a long-running transaction, a different one, some select, for example,
or just somebody opened a transaction
and is sitting there, not consuming XIDs anymore.
On the primary. Then on all replicas,
at some point you will see huge degradation
and SubtransSLRU contention.
It's a lightweight lock, if you check,
so it looks like a huge spike in
the activity of active sessions.
These active sessions are sitting
in the wait event SubtransSLRU.
Of course, it happens
again under heavy load, like
thousands of TPS. You won't
see it if you don't have a load enough.
It's a very hard problem
to overcome.
Also worth mentioning,
it's great that you can see it in wait events,
SubtransSLRU.
Also, it's very good that the Amazon RDS guys
documented wait events.
They have the best wait event documentation.
Postgres itself doesn't have this level of detail
for wait events.
So I'm constantly learning from those documents.
And also, if you have Postgres 13 or newer, there is a new system view, pg_stat_slru. When
we saw that at GitLab, unfortunately, it was an older version. So I was thinking, oh, so
sad I don't have this view, because it provides a lot of good counters, and
you can see, okay, the Subtrans SLRU is popping up there. So this helps with diagnostics.
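The kind of queries involved look roughly like this (the wait event and SLRU names below are the ones used in recent major versions; double-check them on your version):

    -- who is currently waiting on the sub-transaction SLRU?
    select pid, state, wait_event_type, wait_event, query
    from pg_stat_activity
    where wait_event = 'SubtransSLRU';

    -- Postgres 13+: per-SLRU counters; look at the Subtrans row
    select * from pg_stat_slru where name = 'Subtrans';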
After spending some effort, we replicated this problem in a synthetic environment
and discussed it. And the conclusion was, we need to eliminate sub-transactions.
It was, again, as I said, a big effort.
I found it really interesting, by the way, why reducing
sub transactions wasn't enough. And it was really a case of actually eliminating them.
Yeah, well, sorry, I said 16; it's 32 here. Like, 64 is for this local cache, and here 32 is the limit of the SLRU buffer size.
So yeah, you can reduce them, but it will happen only to some extent.
The problem is this long-lasting transaction on the primary. You have this, and also the level of XID consumption.
If you have high TPS... the higher TPS you have, the shorter this long-running transaction needs to be for you, or your replicas, to hit this performance cliff.
So for example, if you have lower TPS,
you can still have this problem,
but you will need to wait more
with this longer running transaction.
You will only hit it if the query is,
let's say, dozens of seconds or minutes,
or maybe even longer.
Whereas I think you showed nicely in your blog post,
or maybe it was the GitLab one,
if you ramp that up enough, this could even be achieved within seconds. Like, I think the synthetic
one you were benchmarking was like 18 seconds or something, but it was quite easy to see that
this could be low numbers of seconds as well, quite easily.
Yeah. We thought about, like, maybe
we should just control long-running transactions on the primary and be more proactive about killing them.
But 60 seconds, right?
It's like, really short.
The reason I wanted to hammer that home is I don't think most people think of two second
queries when they're talking about long running transactions.
Yeah, long is relative.
Yeah, true.
And so also, if you don't use replicas for read-only traffic,
there's no problem here.
So it doesn't happen on the primary.
It was a difficulty when we needed to troubleshoot it,
because with a single-node experiment, you cannot reproduce it.
You need two nodes.
You need a replica to see how it works.
But again, we have publicly available reproduction in synthetic environments,
so you can play with it yourself and see how it hits you. There are good attempts to improve it.
Also, Andrey Borodin is involved. This guy is everywhere. When I have some problem, I see some
effort already. He participates. So there is an effort to increase the SLRU buffer size
and also change the search algorithm there, and so on. It's interesting. I hope eventually it
will make it into the source code. And finally, my approach is, let's eliminate sub-transactions
altogether if you aim to grow to, like, many thousands of TPS and so on. But I think we need to eliminate them in the source code of
the application, because we have less control over it. But sometimes I still use sub-transactions,
for example, when deploying some complex DDL. Because when you need to do something,
do, do, do something,
and then you need to perform some additional step,
and this step might fail because you are being gentle.
I mean, you know you need to acquire a lock,
but you have a low lock_timeout and retries.
You can do this retry at higher level, but you will
need to redo all the steps before it.
Right?
Or you can retry here
with sub-transactions.
And crucially, the transaction's
giving you a lot of benefits that it will
roll back if it fails, ultimately.
So the transaction itself,
the top-level transaction,
is really valuable.
You can't get rid of that.
You can't do it in steps.
You need to do it all or nothing.
Right.
For example, you have some table, you put some data into it,
and then you establish a foreign key,
and to establish the foreign key, you need to retry something.
But you also need to make sure these retries don't last long
if you don't want to keep all
the locks you already acquired so far, because, remember the rule: locks are
being held until the very end of the transaction, rollback or commit. They cannot be freed in the
middle of a transaction, right? This is impossible. So you should be careful with all the locks already acquired.
So yeah, here we can use sub-transactions,
retry logic,
and then you just deploy it
when we have a lower level of TPS
and we definitely don't have long transactions,
read-only transactions,
or other transactions on the primary.
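A sketch of that kind of deployment (the table and constraint names are made up; the retry loop itself would live in the deployment script or driver, and the savepoint means only the lock-sensitive step has to be redone):

    begin;
    -- heavy preparatory work you don't want to redo on a failed lock attempt
    create table orders_new (like orders including all);
    insert into orders_new select * from orders;

    set local lock_timeout = '100ms';   -- be gentle: give up quickly instead of blocking others

    -- one attempt; on a lock_timeout error, roll back to the savepoint, wait a bit, and try again
    savepoint add_fk;
    alter table orders_new
      add constraint orders_new_customer_fk
      foreign key (customer_id) references customers (id);

    commit;

The retries themselves should stay short, since all the locks taken earlier in the transaction are held until commit or rollback.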
Great.
This is the key.
Yeah, yeah.
So it's interesting and it's
like a story two years old already.
But it was super interesting to
see these things.
And my last words:
I'm very thankful that it's
implemented in Postgres, and I
hope it will be improved. I mean, I'm not
an enemy of sub-transactions
like some people might think.
I think it's a great feature, but we should be very careful under heavy loads.
Yeah, it makes a lot of sense.
There's a couple more super minor things that I wanted to mention before we finish,
and that's the Subtrans SLRU in Postgres 12 and older.
So I know Postgres 12 is the only older version that is still supported,
but it had a different name, right?
SubtransControlLock, just in case.
Right. Yeah. And the last one was, the GitLab blog post ended on a really positive note,
just in case all right yeah and the last one was the gitlab blog post ended on really positive note
that because they the reason they were able to diagnose this with you with some help from other
people internally was that postgres's source code is well documented
and that the source code's open and this would have been a really difficult issue to diagnose
on a closed source system. Anyway I thought that was a really nice point that even though Postgres
does have this issue or can hit this issue at scale using save points or sub-transactions in general,
the fact that it's open gives it that huge benefit
of people being able to diagnose the issue themselves.
Right. Makes sense.
Just keep it in mind. That's it.
Use other people's issues already documented to improve.
Nice one. Well, thanks everybody.
Thank you, Nikolai. Happy Thanksgiving.
Hope you enjoy it and see you
next week. Bye-bye.