Postgres FM - Subtransactions

Episode Date: November 24, 2023

Nikolay and Michael discuss PostgreSQL subtransactions — what they are, and four issues they can cause at scale.

Here are some links to things they mentioned:

Subtransactions (docs) https://www.postgresql.org/docs/current/subxacts.html
SAVEPOINT (docs) https://www.postgresql.org/docs/current/sql-savepoint.html
PostgreSQL Subtransactions Considered Harmful (blog post by Nikolay) https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful
Subtransactions and performance in PostgreSQL (blog post by Laurenz at Cybertec) https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/
Notes on some PostgreSQL implementation details (blog post by Nelson Elhage) https://buttondown.email/nelhage/archive/notes-on-some-postgresql-implementation-details/
Why we spent the last month eliminating PostgreSQL subtransactions (GitLab blog post) https://about.gitlab.com/blog/2021/09/29/why-we-spent-the-last-month-eliminating-postgresql-subtransactions/

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today? Hi Michael, let's talk about sub-transactions and be very short with this. This is a super simple topic because I have a super simple rule: just don't use sub-transactions unless it's absolutely necessary. And I'm going to describe the details, but not going too deep.
Starting point is 00:00:29 Let's keep this episode short. Sounds great. And also, it's Thanksgiving, so hopefully people are spending a lot of time with their families. So happy Thanksgiving to anybody that celebrates. And hopefully you'll be thankful for a short episode. Right. I wanted to thank you, of course, for doing this with me so long. Oh, likewise.
Starting point is 00:00:51 It's been a pleasure. Where should we start then? So let's start with the idea of sub-transactions very briefly. The idea is: if you have a complex transaction and you want to handle some kind of failure, like, for example, a statement might fail, but you don't want to lose the previous statements already completed in the same transaction. Or you want, for example, to explicitly undo some steps, right? It might actually be multiple statements. You can return to a previous point.
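As a minimal sketch of that idea (the table name accounts is made up for illustration):

    CREATE TABLE accounts (id int PRIMARY KEY, balance int);

    BEGIN;
    INSERT INTO accounts VALUES (1, 100);        -- work we don't want to lose
    SAVEPOINT before_risky_step;                 -- opens a sub-transaction
    INSERT INTO accounts VALUES (1, 200);        -- fails: duplicate key
    ROLLBACK TO SAVEPOINT before_risky_step;     -- undo only the failed step
    INSERT INTO accounts VALUES (2, 200);        -- continue in the same transaction
    COMMIT;                                      -- the first and third inserts persist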
Starting point is 00:01:31 So you can use sub-transactions, also called nested transactions, in some cases, although it's not very nested, but kind of, okay, it's nested. Nested transactions. Also, there is a SQL standard keyword, I think it's SQL standard, right? SAVEPOINT. Oh yeah, I don't know if it's standard, but I think it is, because I read the Postgres docs for this and they mention one tiny part of it that's not standard compliant, and so therefore I assume everything else is. Right, yeah, I think it's standard. Which part was it? I also remember, but I don't remember which part. Oh, it gets into the details, but it's around the naming of sub-transactions, so you can
Starting point is 00:02:10 name them, so that you can then roll back to the name. And if you use the same name multiple times, the standard says you should destroy previous ones, whereas the Postgres implementation allows you to roll back to one with the same name, and then roll back to it with the same name again, but that rolls back to the previous one. Anyway, hopefully that made some sense. The idea is nice, right? If you have
Starting point is 00:02:35 really complex transactions and you expect that you need to roll back sometimes, you just put some save points in various places and you can return to them. And you can imagine it like... there are two ways to imagine it, I'm not sure both are correct, let's discuss it. So one is, like, we have some flat picture and save points and we can jump. Another is nested, and I think nested is more like it. If you look at code, usually it doesn't have indentation, right?
Starting point is 00:03:09 It's like flat, vertical, semi-indented lines of code: save point one, save point name one, save point name two, and some statements between them. And you can roll back to one of the previous save points. But internally, there is nesting applied, because when you return to some older save point, you lose the deeper, newer, fresher save points; they are internally implemented deeper.
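A rough sketch of that nesting behavior: rolling back to an older save point implicitly discards the newer ones (the names here are just illustrative):

    CREATE TABLE sp_demo (id int, val int);
    INSERT INTO sp_demo VALUES (1, 0);

    BEGIN;
    SAVEPOINT s1;
    UPDATE sp_demo SET val = 1 WHERE id = 1;
    SAVEPOINT s2;
    UPDATE sp_demo SET val = 2 WHERE id = 1;
    ROLLBACK TO SAVEPOINT s1;   -- undoes both updates
    -- s2 no longer exists here: "ROLLBACK TO SAVEPOINT s2" would now fail,
    -- because returning to s1 destroyed the deeper, newer save point
    COMMIT;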
Starting point is 00:03:38 And one of the problems we will discuss, it's related to this level of nesting, or depth of this nesting, right? Yes. And I think just on the basics, I think it's worth mentioning that even if you know that you haven't used save points anywhere in your code, you could still be using sub-transactions without... Without, exactly, yeah. Also it can be called a nested transaction, so, like, I think nesting is the right concept here, although, again, it looks like not nested, but it's nested. And I think Django, they use the nested wording, and Django and maybe some other ORMs will provoke you to use it, like, implicitly,
Starting point is 00:04:26 so you just do something and you don't know, but you use it. Also, PL/pgSQL will provoke you to use it, because at some point, if you use a lot of PL/pgSQL code as I do, at some point you will want to have BEGIN ... EXCEPTION ... END blocks, EXCEPTION WHEN blah blah blah, just to handle some failures.
Starting point is 00:04:42 And this is 100% already a sub-transaction, because there is another BEGIN ... END block which defines the main body of the PL/pgSQL function. That's the big one, I think. So the ORMs can be using it behind the scenes without you realizing.
Starting point is 00:05:04 And if you've got any exception handling in functions, whether that's functions or triggers, you are using sub-transactions or you might be using sub-transactions. Right. Any exception handling or maybe the top-level exception handling?
Starting point is 00:05:19 Is it possible, like, say, BEGIN, blah, blah? We always have a BEGIN ... END block for the top level, right? Like, implicitly. Oh, no, you mean in a function, we can put EXCEPTION. I think as soon as you're using the EXCEPTION clause, you are using them. Yeah, I think so as well, but I'm just curious. Okay. But definitely, if you use additional BEGIN ... EXCEPTION ... END blocks inside the body of the PL/pgSQL function, it's already a sub-transaction.
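To make the PL/pgSQL case concrete, a minimal sketch (the function and table names are hypothetical); the block carrying the EXCEPTION clause is the part that runs as a sub-transaction:

    CREATE TABLE events (id int PRIMARY KEY, payload text);

    CREATE FUNCTION upsert_event(p_id int, p_payload text) RETURNS void
    LANGUAGE plpgsql AS $$
    BEGIN                 -- main body block: no sub-transaction by itself
      BEGIN               -- inner block with an EXCEPTION clause:
                          -- this part runs inside a sub-transaction
        INSERT INTO events VALUES (p_id, p_payload);
      EXCEPTION WHEN unique_violation THEN
        UPDATE events SET payload = p_payload WHERE id = p_id;
      END;
    END;
    $$;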
Starting point is 00:05:49 So it's an implicit sub-transaction. So what are the problems with them? Yeah, great. Let's be short, as we promised. The very first problem is that we have, uh, like four billion, 4.2 billion, space, and we can use only half of it, because one half of it is the past, and one half of it is the future. So there is a transaction ID wraparound problem: what happens after two billion, 2.1 billion, transactions if freezing didn't happen, if you blocked
Starting point is 00:06:25 autovacuum, for example, with a long transaction or something, or just turned it off, but it should wake up even if you turn it off. Okay, so we have four-byte transaction IDs. And to optimize how these IDs are spent, there is also the concept of virtual transaction IDs. So if you just run a SELECT or some read-only transaction, it won't increment the transaction ID counter, right? So you won't waste XID values. If you use sub-transactions, a transaction with a couple of sub-transactions, it's already three. Yeah, that's the big issue, right? Well, not the big issue, but in this problem. You start
Starting point is 00:07:09 spending these not-so-cheap XIDs faster than if you didn't use sub-transactions. But also there are multixact IDs, it's a different topic, right? So this is not, I think it's not a super big topic. I just want to understand that we move faster if we use sub-transactions quite often.
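A small sketch of that effect, assuming Postgres 13+ for pg_current_xact_id() and a made-up table name: every sub-transaction that writes gets its own XID, so this one transaction consumes three.

    CREATE TABLE xid_demo (id int);

    BEGIN;
    SELECT pg_current_xact_id();      -- assigns the top-level XID
    SAVEPOINT s1;
    INSERT INTO xid_demo VALUES (1);  -- this sub-transaction gets its own XID
    SAVEPOINT s2;
    INSERT INTO xid_demo VALUES (2);  -- and this one gets another
    SELECT xmin, id FROM xid_demo;    -- xmin differs per row: those are the sub-XIDs
    COMMIT;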
Starting point is 00:07:29 Cool. Let's move on. The second problem is this nesting level, like depth of nesting. There is a cache, and if you reach nesting level 64, I don't know who does it, but this is quite well documented in the mailing list and other places, a quite well-documented overflow problem. Yeah, so we're talking about
Starting point is 00:07:55 doing 64 levels of that nesting depth. I did see a blog post, was it by... it was on the Cybertec blog, I think, by Laurenz. Was that on this? Yeah. So I think it's quite easy to demonstrate this problem. Right, and we had a benchmark for it, a synthetic benchmark.
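A rough sketch of how such a synthetic test can be generated in psql; the \gexec trick, the table name, and the number 70 are just one way to cross the 64-level threshold being discussed.

    -- open a transaction and create 70 nested save points, each with a write, so
    -- each sub-transaction gets an XID; past 64 the per-backend cache mentioned
    -- above overflows and the slowdown can become measurable
    CREATE TABLE nest_demo (id int);

    BEGIN;
    SELECT format('SAVEPOINT sp_%1$s; INSERT INTO nest_demo VALUES (%1$s);', g)
    FROM generate_series(1, 70) AS g \gexec
    SELECT count(*) FROM nest_demo;   -- now running at nesting depth 70
    COMMIT;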
Starting point is 00:08:12 We did it. It was easy. So you just, too many save points, and you reach level 65. This is a cache per backend, so it's like a local cache, and in this case, degradation is obvious. Yeah, so it's not a hard limit, right?
Starting point is 00:08:27 It's just you pay a performance penalty at that point. And I've never seen this in the real world. Have you ever seen a case of this? Well, only in synthetic tests. Yeah, okay, fine. Yeah, great. Right, so not in production. And it's not a hard limit in terms of the problems it gives, but it's a hard limit.
Starting point is 00:08:43 You cannot change it. It's a constant in the source code, so it's hard. But it's not huge. You can go there, but you probably don't want to go there. A third problem is related to multixacts, this additional space of transaction IDs,
Starting point is 00:09:00 multi-transaction IDs, multixact IDs. Actually, I see it's not very well documented, these things. For example, multixact ID wraparound also might happen. We don't see big blog posts, like from Sentry or MailChimp, as for regular transaction IDs. But it still can happen. So the documentation doesn't have good words, like details about how to properly monitor it and so on. But I mean, it's possible.
Starting point is 00:09:30 Anyway, multixact, it's a mechanism. It's like a different space. And it's a mechanism for when multiple sessions establish a lock on the same object. For example, a share lock; and, for example, when you have foreign keys, you deal with multixacts, like, implicitly again. So Postgres establishes multiple locks on the same object, and a multixact ID value is assigned. And there is a problem... like, there's a very good blog post from Nelson Elhage.
Starting point is 00:10:06 How to pronounce it? I have no idea, but Nelson seems like a good fit for the first name. I'll link it up in the show notes, of course. Right. This is an excellent blog post from 2021. I'm not sure which company it was, but the blog post explains several problems, actually, as well. And it explains, like, there is a well-known problem. Honestly, I must admit I didn't know about that problem as well, but the blog post assumes everyone knows it. Okay, so the problem, it's not related to sub-transactions,
Starting point is 00:10:36 but there is a problem with select for update, so there is, like, a performance cliff, this blog post explains it. So once I read it, I started to be very careful when recommending to use select for update to parallelize, for example, queue-like workloads. But you can select for share, right? Select for update... Oh, okay. It's not related to sub-transactions.
Starting point is 00:11:01 We start from there. So it says there is a performance cliff related to select for update. Okay, it explains it, blah, blah, blah. Okay, but, and I now understand, okay, select for update,
Starting point is 00:11:12 we should be very careful with it. So, and that's why I became a bigger fan of single-session consumption of queue-like workloads, because usually with a single session you can move very fast, because once you start using select for update, you can meet this performance cliff, of course, at larger scale,
Starting point is 00:11:29 like you have many thousands of tps for example so okay forget about select for update then the plus explains select for share we we're oh for sorry sorry i i i may be messed up with this. You think select for share is the known problem and select for update is where sub-transactions come to play with. Yeah, so yes, sorry, sorry. Sorry, yes. So select for share is very well known problem as blog post explains. But then if we use select for update and sub-transactions, we have similar problems as with select for share is a very well-known problem as blog post explains, but then if we use select for update
Starting point is 00:12:05 and sub-transactions, we have similar problems as with select for share. I'm very sorry. So if you use select for update, for example, queue-like workloads, and you have multiple consumers, and you don't want them to fight with
Starting point is 00:12:21 lock acquisition, so you use select for update, and also you use sub-transactions, you might meet a similar performance cliff as for select for share.
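A sketch of the kind of pattern being described: a queue consumer that combines a save point (for example, one an ORM adds implicitly) with row locking. The jobs table and all names here are hypothetical.

    CREATE TABLE jobs (id bigserial PRIMARY KEY, processed boolean NOT NULL DEFAULT false);

    -- one of several concurrent consumers; each runs roughly this in a loop
    BEGIN;
    SAVEPOINT per_request;                 -- e.g. added implicitly by an ORM
    WITH next_job AS (
        SELECT id FROM jobs
        WHERE NOT processed
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED             -- row lock so consumers don't collide
    )
    UPDATE jobs SET processed = true
    WHERE id IN (SELECT id FROM next_job);
    RELEASE SAVEPOINT per_request;
    COMMIT;
    -- at many thousands of TPS, this mix of row locks and sub-transactions
    -- is the combination the episode warns about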
Starting point is 00:12:38 So anyway, watch out: be very careful if you have many thousands of TPS; be careful with both select for share and select for update. This is the explanation. Again, I'm, like, making mistakes here, because I didn't see it myself in production; I saw it in synthetic workloads, again, but not in production. That's why I keep forgetting. What do you think about this problem particularly? Yeah, you've made me question which way around it is, and I'm not even sure it's worth rereading. This is something to be aware of if you make extensive use of it at high volume, but if neither of us have seen it in the real world, especially you, it feels to me like
Starting point is 00:13:17 this is not the one most likely... Like, we've done three issues already, and I think it's the fourth that's going to be most likely to actually affect people, and the most interesting. I'm not sure, because, again, this post by Nelson explains: if you use Django, for example, and it uses nested transactions, you have save points, and then you think, oh, it's a good idea, I have
Starting point is 00:13:37 workers fighting over the same rows to consume queue-like workloads, or lock them, I'm going to use select for update. And here you have a similar performance cliff as they explained for select for share. But again, it happens only under
Starting point is 00:13:53 very heavy loads, like thousands of TPS. Yeah, and one point they made in fixing it was they changed... For Django, even. Yes, for Django. There was a parameter they could change, passing savepoint equals false to transaction.atomic. Yeah. Yeah. Which completely fixed the issue, right? Like, it removes all the save points. Exactly, yeah. Yeah. It's great that people can do that, and
Starting point is 00:14:19 it's really interesting that that's not the... Yeah, it's not the default. Again, they fixed problems by getting rid of sub-transactions. And they actually, before we move on from that post, they included one more line that I really liked, which was them talking to somebody else who they considered an expert on the Postgres side, and they quoted: sub-transactions are basically cursed, rip them out. Cursed. Yeah, cursed or curs-ed, I guess. Now I'm, like, how to say, I'm bringing these thoughts to people, and sometimes I see pushback, like, no, no, no, they're still good. Well, sorry, it's not only my thought.
Starting point is 00:15:01 I inherited it from this post as well. But so, let me explain what we saw at GitLab. Yeah, this is problem number four. Problem number four, and I saw it myself and helped troubleshoot, and they also had a great blog post about this, and this led to a big effort to eliminate sub-transactions in GitLab code. So there is another cache, which is kind of global, not per session, and it also has a threshold, 16, hardcoded in the source code; there is a constant 16. And when you reach 16 sub-transactions on a replica, and you have a long-running transaction, a different one, some select, for example, or just somebody opened a transaction
Starting point is 00:15:47 and is sitting there, not consuming XIDs anymore, on the primary. On all replicas, at some point you will see huge degradation and some subtrans SLRU contention. A lightweight lock, if you check. So it looks like a huge spike in the activity of active sessions. These active sessions are sitting
Starting point is 00:16:10 in the wait event SubtransSLRU. Of course, it happens, again, under heavy load, like thousands of TPS. You won't see it if you don't have enough load. It's a very hard problem to overcome. Also worth mentioning,
Starting point is 00:16:26 it's great that you can see it in wait events, SubtransSLRU. Also, it's very great that the Amazon RDS guys documented wait events. They have the best wait event documentation. Postgres itself doesn't have this level of detail for wait events. So I'm constantly learning from those documents.
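For reference, a couple of diagnostic queries of the kind being discussed; the wait event names are the Postgres 13+ ones, and the pattern match on pg_stat_slru is there because the row name for the sub-transaction SLRU varies by version.

    -- who is currently waiting on the sub-transaction SLRU?
    SELECT wait_event_type, wait_event, count(*)
    FROM pg_stat_activity
    WHERE wait_event IN ('SubtransSLRU', 'SubtransBuffer')
    GROUP BY 1, 2;

    -- cumulative SLRU counters (Postgres 13+)
    SELECT * FROM pg_stat_slru WHERE name ILIKE 'sub%';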
Starting point is 00:16:45 And also, if you have Postgres 13 or newer, there is a new system view, pg_stat_slru. When we saw that at GitLab, unfortunately, it was an older version. So I was thinking, oh, so sad I don't have this view, because it provides a lot of good counters and you can see, okay, the subtrans SLRU is popping up there. So this helps with diagnostics. After spending some hours, we replicated this problem in a synthetic environment and discussed it. And the conclusion was we need to eliminate sub-transactions. It was, again, as I said, a big effort. I found that really interesting, by the way. I found it really interesting why reducing
Starting point is 00:17:31 sub-transactions wasn't enough, and it was really a case of actually eliminating them. Yeah, well, sorry, I said 16; it's 32 here. Like, 64 is for this local cache, and here 32 is the limit of the SLRU buffer size. So yeah, you can reduce them, but it will help only to some extent. The problem is this long-lasting transaction on the primary. You have this, and also the level of XID consumption. If you have high TPS, the higher TPS you have, the shorter this long-running transaction needs to be so you can hit, or your replicas hit, this performance cliff. So, for example, if you have lower TPS, you can still have this problem, but you will need to wait more
Starting point is 00:18:32 with this longer-running transaction. You will only hit it if the query is, let's say, dozens of seconds or minutes, or maybe even longer. Whereas I think you showed nicely in your blog post, or maybe it was the GitLab one: if you ramp that up enough, this could even be achieved within seconds. Like, I think the synthetic one you were benchmarking was like 18 seconds or something, but it was quite easy to see that
Starting point is 00:18:55 this could be low numbers of seconds as well, quite easily. Yeah, this... like, we thought about, like, maybe we should just control long-running transactions on the primary and be more proactive about killing them. But 60 seconds, right? It's like ready to show. The reason I wanted to hammer that home is I don't think most people think of two-second queries when they're talking about long-running transactions. Yeah, long is relative. Yeah, true.
Starting point is 00:19:22 And so also, if you don't use replicas for read-only traffic, there's no problem here. So it doesn't happen on the primary. It was a difficulty when we needed to troubleshoot it, because with a single-node experiment, you cannot reproduce it. You need two nodes. You need a replica to see how it works. But again, we have a publicly available reproduction in synthetic environments,
Starting point is 00:19:47 so you can play yourself and see how it hits you. There are good attempts to improve it. Also, Andrey Borodin is involved. This guy is everywhere. When I have some problem, I see some effort already, he participates. So there is an effort to increase the SLRU buffer size and also change the algorithm, the search algorithm there, and so on. It's interesting. I hope eventually it will make it into the source code. And finally, my approach is: let's eliminate sub-transactions altogether if you aim to grow to, like, many thousands of TPS and so on. But I think we need to eliminate it in the source code of the application, because we have less control on it. But sometimes I still use sub-transactions, for example, when deploying some complex DDL. Because when you need to do something,
Starting point is 00:20:45 do, do, do something, and then you need to perform some additional step, and this step might fail because you are gentle. I mean, you know you need to acquire a lock, but you have a low lock_timeout and retries. You can do this retry at a higher level, but you will need to redo all the steps before it. Right?
Starting point is 00:21:09 Or you can retry here with sub-transactions. And crucially, the transaction's giving you a lot of benefits that it will roll back if it fails, ultimately. So the transaction itself, the top-level transaction, is really valuable.
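As an illustration of that deployment pattern, a sketch of retrying only the lock-sensitive step inside the same transaction; the table names, the timeout value, and the manual retry comment are all just placeholders.

    CREATE TABLE orders (id bigint PRIMARY KEY);
    CREATE TABLE order_items (order_id bigint, qty int);

    BEGIN;
    -- expensive preparation we don't want to redo on a retry
    INSERT INTO orders SELECT g FROM generate_series(1, 1000) AS g;
    INSERT INTO order_items SELECT g, 1 FROM generate_series(1, 1000) AS g;

    -- the step that needs a lock and might time out if the table is busy
    SET LOCAL lock_timeout = '100ms';
    SAVEPOINT add_fk;
    ALTER TABLE order_items
        ADD CONSTRAINT order_items_order_fk
        FOREIGN KEY (order_id) REFERENCES orders (id);
    -- on a lock timeout: ROLLBACK TO SAVEPOINT add_fk; and run the ALTER again,
    -- instead of redoing the whole transaction
    COMMIT;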
Starting point is 00:21:25 You can't get rid of that. You can't do it in steps. You need to do it all or nothing. Right. For example, you have some table, you put some data into it, and then you establish a foreign key, and to establish that foreign key, you need to retry something. But you also need to make sure these retries don't last long
Starting point is 00:21:44 if you don't want to keep all the locks you already acquired so far, because locks, the rule, remember the rule: locks are being held until the very end of the transaction, rollback or commit. They cannot be freed in the middle of a transaction, right? This is impossible. So you should be careful with all the locks already acquired. So yeah, here we can use sub-transactions, retry logic, and then you just deploy it when we have a lower level of TPS
Starting point is 00:22:15 and we definitely don't have long transactions, read-only transactions or other transactions on the primary. Great. This is the key. Yeah, yeah. So it's interesting and it's like a story two years old already.
Starting point is 00:22:28 But it was super interesting to see these things. And my last words: I'm very thankful that it's implemented in Postgres, and I hope it will be improved. I mean, I'm not an enemy of sub-transactions, like some people might think.
Starting point is 00:22:43 I think it's a great feature, but we should be very careful under heavy loads. Yeah, it makes a lot of sense. There's a couple more super minor things that I wanted to mention before we finish, and that's the Subtrans SLRU in Postgres 12 and older. So I know Postgres 12 is the only older version that is still supported, but it had a different name, right? SubtransControlLock, just in case. All right. Yeah. And the last one was: the GitLab blog post ended on a really positive note,
Starting point is 00:23:15 that the reason they were able to diagnose this, with you, with some help from other people internally, was that Postgres's source code is well documented and that the source code is open, and this would have been a really difficult issue to diagnose on a closed-source system. Anyway, I thought that was a really nice point: that even though Postgres does have this issue, or can hit this issue at scale using save points or sub-transactions in general, the fact that it's open gives it that huge benefit of people being able to diagnose the issue themselves. Right. Makes sense.
Starting point is 00:23:54 Just keep it in mind. That's it. Use other people's issues, already documented, to improve. Nice one. Well, thanks everybody. Thank you, Nikolay. Happy Thanksgiving. Hope you enjoy it, and see you next week. Bye-bye.
