Postgres FM - LWLocks
Episode Date: October 17, 2025

Nik and Michael discuss lightweight locks in Postgres — how they differ from (heavier) locks, some occasions they can be troublesome, and some resources for working out what to do if you hit issues.

Here are some links to things they mentioned:

Wait Events of Type LWLock https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-LWLOCK-TABLE
Our episode on (heavier) locks https://postgres.fm/episodes/locks
Nik's new marathon posts https://postgres.ai/blog/tags/postgres-marathon
Postgres LISTEN/NOTIFY does not scale (blog post by Recall.ai) https://www.recall.ai/blog/postgres-listen-notify-does-not-scale
Explicit Locking https://www.postgresql.org/docs/current/explicit-locking.html
pg_stat_activity https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-PG-STAT-ACTIVITY-VIEW
Tuning with wait events for RDS for PostgreSQL https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.html
MultiXact member exhaustion incidents (blog post by Cosmo Wolfe / Metronome) https://metronome.com/blog/root-cause-analysis-postgresql-multixact-member-exhaustion-incidents-may-2025
pg_index_pilot https://gitlab.com/postgres-ai/pg_index_pilot
Myths and Truths about Synchronous Replication in PostgreSQL (talk by Alexander Kukushkin) https://www.youtube.com/watch?v=PFn9qRGzTMc
Postgres Indexes, Partitioning and LWLock:LockManager Scalability (blog post by Jeremy Schneider) https://ardentperf.com/2024/03/03/postgres-indexes-partitioning-and-lwlocklockmanager-scalability

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard, and as usual, I'm joined by Nik, founder of Postgres AI. Hey, Nik, how's it going?
Hello, Michael, I'm doing great. How are you?
I am good. Also, what are we talking about this week?
We're talking about what's written on the cover.
Are we going to write LW locks or lightweight locks as like a full word? What do you think?
I like the shorter version, of course.
There's also a question whether to write three uppercase letters
or everything lowercase.
Yeah, "lightweight locks", it's like it's even hard to pronounce.
But "LW locks" is also not good, right?
LW locks.
I like that some systems call them latches, you know?
Latches, yeah.
Yeah, yeah.
Why doesn't Postgres do it?
Maybe because of Linux or I don't know.
Naming things is hard.
Yeah, yeah, yeah. Because we have confusion sometimes when we say just "locks", you know. Like when we say backups, or logical backups.
Yeah, here also with locks, there are two types of locks. There are more types of locks, right, but there are two big major types.
Yeah. Is it category or type in pg_locks, or in pg_stat_activity? Good point.
Yeah, maybe types.
I don't know.
Another loaded term.
Types is another loaded term.
Or mode.
Or maybe mode.
Let me just check right now.
Maybe it's called mode in pg_locks.
I'm constantly confused about, you know, there are some terms like class type mode.
Like they can be like they are quite abstract, right?
Yeah, it's called mode.
In pg_locks, it's called mode.
Cool.
But pg_locks is about heavyweight locks; we are talking today about lightweight locks, right?
Yeah, we did a whole episode that we just called "locks", because that's generally what they're referred to, the heavyweight locks.
Yeah, if there is no additional word, it means heavyweight locks. And why so? Because heavyweight locks you can name just "locks", because they are closer to the user, right?
You can see them in pg_locks, for example.
You can sometimes, not always, but sometimes you can acquire them directly using just the LOCK SQL command, right? Just LOCK table name.
Or SELECT FOR UPDATE, and you see the transaction acquiring some locks, right?
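As a rough illustration of what they are describing (the accounts table is hypothetical), one session takes the locks and another looks at pg_locks, including the mode column they checked earlier:

    -- session 1: acquire heavyweight locks explicitly and implicitly
    BEGIN;
    LOCK TABLE accounts IN SHARE ROW EXCLUSIVE MODE;   -- explicit LOCK command
    SELECT * FROM accounts WHERE id = 1 FOR UPDATE;    -- row-level lock
    -- leave the transaction open

    -- session 2: inspect what session 1 is holding
    SELECT locktype, relation::regclass AS relation, mode, granted
    FROM pg_locks
    WHERE pid <> pg_backend_pid();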
Yeah, I also think in general more people come across them because they affect you at earlier
stages in project life cycle.
Like, you don't have to be at such extreme scale to start being affected by them or having to be aware of them.
Yeah, I agree.
In general case, because in some cases, for example, if you have read-only workloads, that's it.
In read-only, you don't, like, you have heavyweight locks, but you won't notice them because they're, like, ACCESS SHARE locks, that's it.
Right.
But if you have really a lot of, like, a lot of TPS, you might start observing some lightweight locks, right?
Yeah, I was thinking even, like, schema changes, though.
Like, even this was an edge case. Okay, in general I agree with you: heavy locks, you bump into them sooner.
Schema change is a great example.
Yeah, cool. So, I mean, starting with the difference between locks and LW locks is probably great.
Yeah, let's talk more about the differences, because there are important differences to understand and feel and take into account when you develop things or tune, optimize, scale, migrate, I don't know.
So heavyweight locks are acquired during, like, SQL operations, and they are acquired for database objects, like relation-level locks.
By the way, this is super confusing.
I will be talking about heavyweight locks a little bit, trying to make it shorter.
But as you know, I write again almost every day. I skip some weekends, but I write Postgres Marathon posts again.
And many days I already sit in between heavyweight locks and lightweight locks, and research one of the lightweight locks, the lock manager, right?
So "relation-level locks" is quite a confusing name, because in the documentation they are called table-level locks, which is misleading, because they are also, like, the same thing for other relation kinds. And inside the documentation it becomes clear that indexes are also involved, and materialized views, and views. So all relations. Sequences are also relations, right?
Yes. I never thought of it like that.
Well, if you check pg_class and relkind, I think S is sequence.
maybe I should
check again. Should we
develop a habit to check things right
online? Why not?
Yeah, so I will be checking
but meanwhile you can acquire
logs, heavyweight locks, on
tables, on indexes on
database, right? Like even higher
level, you can lock the whole database
and we know the recent problem
when Recall AI, blog post
our clients
they posted
about database level lock
acquired when Netify happens
to establish sequential
notify events at
commit time.
And also row level
logs.
Yeah.
Tapel level or raw level.
Let's leave it for another time.
So you can acquire
locks on database objects.
These are heavyweight locks.
The documentation is also confusing because it says "explicit locking", although in most of the cases where you have it, it's implicit locking.
You say ALTER TABLE and you don't say LOCK TABLE. You say ALTER TABLE. So it's actually implicit.
I always have, like, some shift in my brain when I need to Google documentation for heavyweight locks.
I just remember I need to search.
I need to ignore the fact that I'm going to look at explicit locking documentation,
although I need the implicit locking documentation, right?
By the way, before we move on, I checked pg_class, and you're right, sequences are in there. And weirdly, the relkind is capital S.
All of the others are lowercase, I think. Well, the ones I can see anyway.
Does it mean something? I don't know. Yeah. Oh, by the way, the explicit locking documentation, it mentions that you can lock indexes with ACCESS SHARE lock, for example.
But you cannot do it explicitly.
You cannot say LOCK and an index name, so I'm pretty sure you cannot do it with sequences as well.
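For reference, the check they just did looks roughly like this:

    -- relkind in pg_class: r = table, i = index, S = sequence,
    -- v = view, m = materialized view, ...
    SELECT relname, relkind
    FROM pg_class
    WHERE relkind = 'S';   -- sequences; note the uppercase S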
Yeah, anyway, so these are heavyweight locks, right? So basically your actions, I mean your SQL, this is what directly creates heavyweight locks. And why is it needed? Because we are not working in single-user mode. We need to protect resources from concurrent operations: reading, writing, changing. And usually we don't need to protect from reading. But while somebody is reading, another backend shouldn't modify it, usually, right? For example, if you read from a table, dumping it, for example, another session cannot modify it, like add a column, cannot run DDL, for example.
Or drop it, for example.
Yeah.
And the important thing about heavyweight locks
compared to lightweight locks
is to understand that once a lock is acquired,
it can be released only at the very end of the transaction.
Commit or rollback, that's it.
Only two options to release this lock.
You cannot release it midway.
Right.
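A tiny sketch of that point, with a hypothetical orders table; the lock outlives the statement and is only dropped at transaction end:

    BEGIN;
    SELECT count(*) FROM orders;   -- takes ACCESS SHARE on orders
    -- the lock is still held here, even though the query finished
    COMMIT;                        -- only now is it released (or on ROLLBACK)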
And this is super important for understanding, always.
It means that transactions should be shorter.
So your actions won't affect others.
or chances to affect others would be lower, right?
Or time that it affects others is lower, right?
Yeah, yeah, yeah.
Because you will affect people just for less time
and there's a point at which that becomes unnoticeable or acceptable.
Yeah, yeah.
Or won't affect at all if they come a little bit later.
But if you change something or even if you read something
and keep transaction open for hours,
it means nobody can modify this table, no DDL is possible, autovacuum cannot do some things, and so on. Like, it's bad.
Yeah, it's worse than that, isn't it? I know we've talked about this before, but if DDL comes along and doesn't have a lock_timeout, then actually you can suddenly be down.
Yeah, yeah.
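The mitigation they are alluding to is a lock_timeout plus retries on the DDL side; a minimal sketch with a hypothetical table:

    -- without this, a blocked ALTER queues behind a long transaction and
    -- everything else queues behind the ALTER, which can look like an outage
    SET lock_timeout = '2s';
    ALTER TABLE orders ADD COLUMN note text;   -- on timeout: fail fast, retry later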
Because, yeah, this is also a good point. So heavy locks, they have this property: they are acquired, and the release happens only at the very end. And what you say is also a good point. There is also a lock manager. By the way, I couldn't find a definition of "lock manager" anywhere, which is interesting; it's like it's obvious, right? Even in the source code it's not defined. So the lock manager is responsible for managing locks, heavyweight locks, right? And backends can form a queue waiting for a lock acquisition.
So if I'm waiting to acquire a lock, some other backends can be waiting, and they ask where the end of the line is and go there.
Right.
So it's just natural, like first come, first served, in natural order, right.
So unlike that, lightweight locks are acquired and released very quickly. I think documentation and source code mention it's like dozens of operations. There is also the underlying concept of spinlocks, which is a few operations only, a few instructions only. Lightweight locks are bigger, but they are very fast as well. They are acquired and released, and they don't wait until the end of the transaction, because they work at a lower level of abstraction. It's not closer to users, it's closer to resources like memory, right? So their main purpose is to protect some physical resources, like parts of memory, shared buffers, and so on, right?
Okay, yeah, like these things.
And so they can be acquired and released quickly. There are only two types: exclusive lock and share lock. Unlike heavyweight locks, where we have a list and interesting relationships between different ones, right? Here it's only share and exclusive. Shared lightweight locks don't conflict, but an exclusive lock cannot be acquired while a share lock is still lasting, right? The share lock should be released first, and only then can you acquire exclusive, because an exclusive lock is needed to modify the resource, right? A share lock is needed to protect reading. It's saying: I'm reading, don't change it, because I'm still reading, and when I'm done you can modify it, right? So this is lightweight locks, and that's why they are lightweight: because they are much, much shorter-living, right? So these are the main differences. What else?
Maybe types of lightweight locks, or forgive the word "types", but, you know, what should people be aware of? Because I've only really come across one type, because that's the type that seems to cause the problems.
But what should people be aware of, at least?
Yeah, so. Types, modes, I struggle, I'm mixing these terms, and it's hard to distinguish between them. So if we talk about types, exclusive and shared we just covered. If we talk about different, like, kinds, or, yeah, let's say wait events we observe in pg_stat_activity.
Pg_stat_activity is the main statistics system view.
Everyone should, like, learn about it.
Right.
It's super important because it shows what's currently happening in the database. And there are two columns called wait_event_type and wait_event. Also, by the way, it's slightly confusing because the word "type" is there and so on. I would prefer, it would be good to name that thing classes maybe, or category, I don't know, because the word "type" is so overused, or, like, overloaded, right? Anyway, wait_event_type can be, I think, fewer than 10 types, and two of them which are most interesting today are Lock, meaning heavyweight lock, and LWLock. And for wait event type LWLock, you can check the documentation: there are many, many dozens of wait events for LWLock, meaning that we have a lot of kinds of LWLock. These kinds, again, types are only exclusive and shared, but these kinds are a classification with respect to the resource we are locking. For example, the lock manager itself,
although the main purpose of lock manager is to handle, to manage heavyweight locks.
When it does it, it does it using a piece of memory, shared memory.
A special piece of shared memory called, well, it's called the main lock table.
Right.
It's a big piece of memory which is segmented, partitioned into 16 partitions, starting in Postgres, I think, 9.2.
Or 8.2, 8.2, actually.
It was very long ago. NUM_LOCK_PARTITIONS, 16. And when new information about a heavyweight lock needs to be written there, the partition of this main lock table where it needs to be written, it needs to be locked by a lightweight lock, right? To ensure nobody else is writing to it.
Yeah.
So the lock manager can have up to 16 lightweight locks, which are seen as LWLock:LockManager. 16, because we have 16 partitions of this main lock table in memory. And how it works: for relation-level locks, for example, based on the relation, there is a hash function which determines which partition to use, right? So the same table or index will always go to the same partition out of those 16 partitions, right? So this was a long time ago.
So back in the day, I'm guessing this was one thing, and there was probably too much contention before 8.2, yeah.
Yeah, okay, that's what you're talking about.
Great.
Yeah, and this, just to, like, this is a confusion because there was another 16.
Yes, that's what I think.
It's a different constant, which changed.
That behavior changed in Postgres 18.
This behavior hasn't changed.
Yeah, fast path changed.
This hasn't.
This hasn't.
This still is 16 partitions.
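Related to that, pg_locks shows whether a given relation lock used the per-backend fast path or went through the shared lock table and its partition LWLocks; a quick way to look:

    -- fastpath = true: the lock bypassed the shared lock table
    -- (and therefore the 16 partition LWLocks guarding it)
    SELECT pid, relation::regclass AS relation, mode, fastpath
    FROM pg_locks
    WHERE locktype = 'relation';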
And if you have a lot of heavyweight lock acquisition attempts for the same relation, a lot meaning like thousands or maybe tens of thousands per second, a lot, really a lot, then exclusive lightweight locks on the same partition will be competing. And while we try to establish a heavyweight lock on some index or table, but that partition is already locked by exclusive lightweight locks from different backends' attempts to write heavyweight lock information, we need to wait a little bit. And this little wait will be seen as wait event type LWLock and wait event LockManager, right? Is it clear? Because, yeah, we have a weird combination of heavyweight locks and lightweight locks in the same topic here, right?
Especially because the LockManager one, the lightweight lock, is actually about heavyweight locks. That's, yeah, a confusing part for sure. But I was just looking it up in the docs, in the table of all of the, they've called them types, "Wait Events of Type LWLock", and you're right, it's such a long list. Yeah, I think there might be 50 or more.
You will see checkpoint, autovacuum there. But isn't it great that we don't observe them? Like, it's quite well optimized, right?
Yeah, yeah, you'll find a lot of stuff.
I see many of them, I do observe, not just lock manager. We saw many of them in production, yeah. Usually the rule is: if you see the Lock wait event type, it means you need to go and think how to redesign your application.
it means you need to go and think how to redesign your application.
Because classic example is, for example,
we are doing some billing system and we have a single account
which needs to be updated for each transaction people do.
I mean, financial transaction.
Yeah.
And this is a classic example when you shoot yourself in the foot, because updating the same row will be a hotspot, and you will see a lot of heavyweight lock contention, because many, many backends try to update the same row, right?
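The hotspot pattern in its simplest form (hypothetical billing table):

    -- many concurrent sessions doing this for the same id serialize on the
    -- row lock; the waiters show up with wait_event_type = 'Lock'
    UPDATE accounts SET balance = balance + 10 WHERE id = 1;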
Yeah.
So, or, you mentioned a good example: if you do DDL without lock_timeout and retries, you also can have a chain of waiting backends which just wait until your DDL finishes, but it itself is waiting for some other SELECT. And I see examples in blog posts, people try to explain this problem, but many of them involve updates, deletes. No, just a SELECT, DDL, and many other SELECTs. You don't even need to update anything, or insert or delete, that's it. Just SELECTs and DDL. And we see a Lock wait event in pg_stat_activity, and it's bad.
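One way to see such a chain of waiting backends while it is happening (a sketch, not from the episode):

    -- who is stuck on a heavyweight lock, and which backends block them
    SELECT pid, wait_event_type, wait_event,
           pg_blocking_pids(pid) AS blocked_by,
           left(query, 60)       AS query
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock';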
But when we talk about, yeah, for Lock, you saw it's relation, object, page, also page is interesting, tuple, virtualxid, it's interesting. Advisory locks are kind of a different thing. But for LWLock, we have a lot, and among them there is LockManager.
I like the approach which I think RDS started, maybe not RDS, but they use it a lot, and I also started using it. We usually take the wait_event_type and wait_event columns from pg_stat_activity and write them with a colon in between. So it becomes LWLock, colon, LockManager. Just in, you know, texts where we discuss problems and do RCA or something.
Yeah, it's just convenient, like a naming convention.
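In query form, that convention looks something like this:

    -- snapshot of current waits in the wait_event_type:wait_event form
    SELECT wait_event_type || ':' || wait_event AS wait, count(*)
    FROM pg_stat_activity
    WHERE state = 'active' AND wait_event IS NOT NULL
    GROUP BY 1
    ORDER BY 2 DESC;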
And I wanted to highlight that this problem, which is related to both heavyweight locks and lightweight locks, has the word "lock" twice in its name: LWLock:LockManager. The first time it's about a lightweight lock, but in LockManager it's about a heavyweight lock. Yeah, that's why "lock" is encountered twice, two times. What else? We observed other types of lightweight lock problems, for example.
Yeah, well, I've spotted one in the list: SubtransSLRU.
Yeah, this is my favorite one, although I must admit I haven't touched this topic for a few years. I touched it heavily in 2021 when GitLab had the problem, and studied it. And since then, SLRU, simple LRU, these are small caches, Postgres has multiple of them. I think
since Postgres 12 or 13, we have the pg_stat_slru system view. You can see counters of work with those SLRUs. But also the SLRU mechanism got some handles, I mean settings, GUCs, right?
Yeah, you can change them and increase them, yeah, to postpone this performance cliff.
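A quick look at those counters, and (on newer versions) the knobs, might be something like this; the GUC names in the comments are from memory, so verify against your version's docs:

    -- per-SLRU cache activity counters (Postgres 13+)
    SELECT name, blks_hit, blks_read, blks_written, blks_zeroed
    FROM pg_stat_slru
    ORDER BY blks_read DESC;
    -- on Postgres 17+ the SLRU sizes became configurable, e.g.:
    -- SHOW subtransaction_buffers;
    -- SHOW multixact_member_buffers;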
So I haven't seen them often since then.
Like, there are customers we help easily, just: try to get rid of subtransactions. Although I still think by default you should avoid subtransactions, but in some cases I already see they can be used in a safe way.
You know, you need to just understand the limits and then you can use them. For example, again, DDL, sometimes complex changes of schema. You don't want to lose part of the schema. And this approach with attempts to acquire a lock, how can you do it inside a transaction? You need a subtransaction, right?
Yeah.
Because if it fails, you don't want to lose everything.
You want to lose only the last step, right?
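A minimal sketch of that savepoint-based retry, with a hypothetical table:

    BEGIN;
    SET LOCAL lock_timeout = '2s';
    -- ... earlier migration steps ...
    SAVEPOINT try_ddl;                        -- the subtransaction
    ALTER TABLE orders ADD COLUMN note text;
    -- if the ALTER hits lock_timeout:
    --   ROLLBACK TO SAVEPOINT try_ddl;       -- lose only this step
    --   wait a bit, then retry the ALTER
    COMMIT;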
And this is exactly where I think it's worth thinking about using subtransactions, but you need to understand the details. For example, you don't want to have a long transaction running on the primary in parallel, to another table, by the way, not to just any table. And replicas which receive a lot of transactions per second, they might be down because of the use of subtransactions, and you can see SubtransSLRU because the SLRU is overflowing. And again, when it's overflown, for the lightweight lock acquisition we see contention, and we see it in pg_stat_activity and wait event analysis as LWLock:SubtransSLRU, right?
There are other SLRUs, right, mentioned, like NotifySLRU. I'm pretty sure MultiXactOffsetSLRU, MultiXactMemberSLRU. Speaking of them, we had an episode about the case from Metronome, right?
Yeah, true.
Yeah, it was a great blog post. This is a great example of how a company can share with the community what happened, and others benefit. Since then we had another client, a new client, which came to us with the very same problem related to multixact member, wow, member exhaustion.
Yeah, exactly.
Wow. So it's also observed as a lightweight lock, multixact blah blah, there's a bunch of multixact lightweight locks you can see in the table. Another one, a very, very popular one, is LWLock:BufferMapping.
So usually it's called buffer thrashing, right? When the buffer pool is not big enough and a lot of eviction happens and new pages are loaded all the time. And of course, when it's happening, to protect memory Postgres needs to use an exclusive lightweight lock when writing is happening, a shared lightweight lock when reading is happening, and this is exactly when we can see some backends waiting a little bit for other backends, right? Interesting. And this is how it's seen.
So, like, if somebody's limited by the amount of shared memory, or, like, shared buffers... yeah.
Yeah, the solution is simple: we need to increase the buffer pool, we need to fight bloat, because this is what increases this problem, we need to get rid of unused indexes and other things, because they also can contribute to it, right?
Unused ones? Of course. I was just thinking they'd be evicted and not matter.
When you change something with an INSERT or a non-HOT UPDATE, all indexes need to be loaded, to be changed, yeah. And they contribute to this spam coming to the buffer pool, and also to WAL, but it's a different story.
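A common first pass at the unused-index part of that advice (just a sketch, not a drop list by itself):

    -- indexes with zero scans since the last stats reset: candidates only;
    -- check replicas, constraints, and recent stats resets before dropping
    SELECT schemaname, relname, indexrelname,
           pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0
    ORDER BY pg_relation_size(indexrelid) DESC;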
I think people underestimate how bad bloat is.
I feel it like we have a new wave of companies coming for consulting to us, which I call
AI companies.
They probably are quite old companies, but they heavily transition to AI.
And they have increasing data volumes, increasing workloads, and they underestimate the problem with bloat and unused indexes and index write amplification, all this spam coming to memory and WAL. And it means backups, replication; this thing is multi-sided.
Yeah, like it cascades, doesn't it?
Yeah, yeah. Also, it's not a performance cliff which you suddenly see and go, oh, we have a problem. It's slowly, slowly growing, or you sink into it slowly, like sand, or like a swamp. Yeah. And then you need to increase instance size, or think about sharding and so on. By the way, I like sharding a lot, but I think in many cases it's just hiding the problem, it's just distributing the problem, and you pay to not solve the problems. For business it's sometimes a valid approach, right? You just don't have time to solve this. But we also can just implement automated procedures to reduce the amount of trash you have, right?
Yeah, I hear about it all of the time as well,
even at smaller companies that just upgrade their instance,
especially at the lower sizes.
They just don't want to throw engineering time at it, because it costs so much.
I recently started asking directly on the very first call
when we have consulting, like, guys, do you care about bloat and index health?
And I usually hear no.
And then it's our job to explain why, right?
So it's not, yeah, bloat is not the problem.
Bloat is like, well, it might be the root cause,
but it's not the problem users see.
They don't see it, they're not...
Well, the unused stuff, also they don't see it.
Yeah, yeah.
They created some indexes and forgot about it, right?
Or redundant indexes.
Same problem.
I mean, similar in this case,
but then boom, like buffer mapping, LWLock:BufferMapping.
Why? We don't have enough memory.
Yeah. Also,
this might be part of it.
I think bloat used to be worse.
Like, I remember before a few optimizations, in 13, 14.
Yeah. Before then, you could come across, especially, indexes that were like 99.9%... You know, in extreme cases, you could come across very, very bloated indexes. It's just much less likely now. So I think, whilst...
It's less likely, but not much.
Yeah, well.
Okay.
Well, deduplication alone doesn't solve the problem, because, like, the Postgres B-tree doesn't have merge.
But it does have bottom-up deletion. It does a lot less splitting.
Well, if you have a half-empty page in the middle of the B-tree, this space won't be used if you write, for example, an incrementing value or a timestamp, right? You're always writing to the end; nobody will write in the middle. So you have this bloat which won't be eliminated.
True.
Until, like... yeah, anyway, let's get back to the topic. Until the index...
Well, I was going to say until, like, a logical replication upgrade or something like that.
Well, yeah, yeah. Anyway, when you rebuild the index, anyway, right. Yeah, by the way, let me
advertise something. We have an open source component which we just recently developed; it's now entering beta stage, and it aims to automatically rebuild indexes on any platform. Just reach out to me in any way, I will share details, because we are looking for more cases before we move on and make it more publicly available. You know, I don't advertise it widely because I want to understand the use cases people have. Anyway, if you want to rebuild indexes in an automated fashion, we have an open source component for you. A fresh one, very interesting, not only for B-tree, but for any index, almost any. I think BRIN is not supported. All others are supported.
I've never seen a bloated BRIN index. Have you?
Yeah, good question. It can degrade a lot if you modify.
We've got another good episode on that, actually. I mean, minmax-multi, I think, is quite good.
On every sentence we say, we had an episode already, right?
Yeah, anyway, we're looking for early adopters for this small tool.
Cool, I'll put it in the show notes.
Yeah, yeah, which is open source. Yeah, okay. WALWrite is another one we see often.
Oh, really?
Yeah, well, again, like when somebody hasn't dropped unused and redundant indexes and they write a lot. Oh, I forgot to mention, of course, you can find the queries which contribute to this buffer thrashing, right, and maybe get rid of them, or make them cause smaller storms, right? So you can optimize queries sometimes and avoid that LWLock:BufferMapping contention.
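Finding such queries usually starts from pg_stat_statements; roughly:

    -- top readers into shared buffers; needs the pg_stat_statements extension
    SELECT left(query, 60)  AS query,
           shared_blks_read,
           shared_blks_hit
    FROM pg_stat_statements
    ORDER BY shared_blks_read DESC
    LIMIT 10;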
Yeah. So about WAL writes, same thing. Like if you have a lot of indexes which contribute to WAL writes, or you have very frequent checkpoints and you have a random access pattern of writes, when you write to the same page often but between those writes to the same page you have a checkpoint, it means this page will go as a full-page write to WAL multiple times, over and over, right? You have a lot of WAL, and this can also be an issue, right?
And if disks are, maybe disk I/O is saturated as well, yeah, these things.
Yeah, or IOPS. Yeah, there are several things there, like there is IO:WALWrite, I think, and LWLock:WALWrite, I don't remember off the top of my head, but there are interesting nuances there. I probably should cover it someday in Postgres Marathon, because sometimes you are waiting on disks, but sometimes you are waiting on locking of internal structures, lightweight locks, right? So there is quite a small amount of WAL buffers in memory, so if it's already fully written, it needs to go to disk; probably you are waiting on disk. But if you are writing to... yeah. So it should be checked in detail when we have it; there are several wait events there. Also, an interesting thing which pops up recently is SyncRep, LWLock:SyncRep. Synchronous replication. When the primary
cannot continue because it waits for confirmation from replicas, synchronous replicas, right?
So, okay, and you've seen that causing a lot, like, you're seeing a lot of those.
Not a lot, but this started happening more and more if you use synchronous replication,
quorum commit.
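For context, quorum commit is configured roughly like this (standby names are made up):

    -- wait for any 2 of 3 named standbys to confirm before returning
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (replica1, replica2, replica3)';
    SELECT pg_reload_conf();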
Yeah, I come across a lot of people that think they're going to use synchronous replication, but then end up not doing it.
Do you see it quite commonly used?
Let's say so.
Big old clusters are on async.
All new clusters should be on synchronous replication,
although there is a bunch of issues with it.
And there was a great talk a few months ago presented by Alexander Kukushkin
about misunderstanding of synchronous replication.
And various anomalies you can experience in current implementation because it's actually not synchronous replication.
This is the thing.
Like because when commit happens, main thing, when commit happens on the primary, it actually happens.
It already happened.
Commit happened.
But we are just blocked, by the way, on a heavy lock, right? A heavyweight lock on our transaction is held. And we are waiting for one of the replicas to confirm.
Or this is actually a lightweight lock, SyncRep.
Yeah, this is it. This is it.
Okay.
And we are waiting, and when a replica confirms, this lightweight lock is released. This is a special case when we need to wait for something outside, which will help us unlock.
Yeah.
So I have watched that talk, and I remember a really good slide in it with, like, all of the hops, like, all of,
a really good diagram of what actually happens.
So, yeah, I'll include a link to that.
Yeah, and this, how to troubleshoot LWLock:SyncRep, it's, like, not fully understood. There are interesting new cases which are not covered in articles and talks. I think more materials are coming. I know about some.
Okay. Yeah. Can you share them with me?
Well, it hasn't happened yet.
Oh, okay, cool.
Check out the upcoming PGConf.EU.
Great.
So, yeah.
How would you feel about
calling the episode there
and actually then talking about
specifically the lock manager issues
in a different episode?
Sounds good.
Good idea.
Because there are interesting answers inside, yeah.
Great. Let's call it a day for today.
I think we covered 1.5%, because it's huge, the list is huge. Some of them I haven't seen ever.
But I think that's useful. I think it's useful for people to get a grasp on, like, which ones they are most likely to hit.
Yeah. One main thing I always mention when we talk about lightweight locks, and actually wait event analysis: the RDS documentation has a great list of knowledge-base-style troubleshooting documents for many wait events, including many lightweight locks. Not all of them, only a subset. But it's great documentation.
I hope it will be improved over time, extended, right?
Yeah.
Yeah, it's good.
I know a lot of effort was invested into building it, by many people.
I recently reread the blog post by Jeremy Schneider,
how it was done during a couple of years.
So it was a huge effort.
That's why it's so good.
It's obviously short documents, but so much wisdom inside.
Yeah, done over a long period of time, but also by very good people, like people that really know their stuff. So, yeah.
So basically it's this list of mitigation action items, but behind each step there are many RCAs, right?
Yeah, cases, case studies. It's so much time
paid to
just write one line what to do
or what to check
or how to change how to improve
that's a great example of documentation
Yeah, so.
That's it. Well, thanks so much, Nikolai, and look forward to talking again. See you.
Bye-bye.
