Postgres FM - HOT updates

Episode Date: November 18, 2022

Here are links to a few things we mentioned: Heap-only tuple updates (docs)README file (source code)Heap Only Tuple chapter (Internals of PostgreSQL by Hironobu SUZUKI) How partial, coverin...g, and multicolumn indexes may slow down UPDATEs (blog post by Nikolay) Why Uber switched from Postgres to MySQL (blog post)  Fighting write amplification with HOT updates (Adyen blog post) HOT Updates vs Bottom-Up Index Deletion (video by Lukas Fittl from pganalyze) Reduced Bloat with Bottom-Up Deletion (blog post by Hamid Akthar from Percona) What is fillfactor and how does it affect performance? (blog post by Kaarel Moppel from Cybertec)------------------------What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.If you would like to share this episode, here's a good link (and thank you!)Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artwork 

Transcript
Discussion (0)
Starting point is 00:00:00 Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I'm Michael, founder of PgMusted, and this is my co-host Nikolai, founder of PostgresAI. Hey Nikolai, what are we talking about today? Hot updates. Hot updates or hot updates? Let's discuss this. Yeah. Heap only tuple tuple. We're at that point again. Heap only tuple updates. So let's start from short story. First time I was in the US, it was 2008. I came for the conference in Maryland. And I was in New York. First time I entered to Starbucks and they asked me what I want. I say a coffee like cappuccino or something. They asked me, it should be hot or cold. And I say hot. And they give me of course cold, because they hear, you know, american pronunciation is hot not hot so hot and if you say hot they can hear cold actually because wow it's all oh so okay hot updates or hot updates who serves cold cappuccinos well yeah it's you should live here for a while to understand
Starting point is 00:01:02 yeah i wish i lived in a climate where that was a reasonable thing to do but yeah so actually this is a listener question and topic suggestion so thank you to andy for this one the question was effectively when should i think about tuning fill factor what kind of for tables yes yes for tables i guess but he actually didn't specify but yeah because he asked about hot updates i'm assuming tables and what kind of workloads would it be worth customizing for? So yeah, it's really good question. And to answer it, we kind of really have to dive into what heap-only tuple updates are, how they help, the advantages, that kind of thing. Right. And this is one of the questions which are originally closer to
Starting point is 00:01:45 infrastructure level, like DBA, DBRE. But I see it clearly that developers should learn it as well and understand it better. Because this is one of administrative topics which are very beneficial to learn for developers, for people who write SQL, not just maintain Postgres notes. Because depending on which indexes you define and how you write your queries, you can benefit from HUD updates or you can avoid it, right? So, of course, I wish we had all updates HUD, but it's not possible, of course. And, of course, fill factor by default is 100 for tables we should probably should start from basics right what what fill factor is and how updates are
Starting point is 00:02:35 happening according to postgres mvcc model yeah let's start with how updates happen without this optimization so before version 8.3, before your 2008 story, actually, it was around about that same time that this got added, all updates. 8.3, you say, right? Yes. The same version which got XML data type and functions, I remember.
Starting point is 00:02:57 Oh, wow. Of course, your one. So yes, so same version. And that means there were times before this optimization. And of course, we have regular updates or cold updates. I've seen them called at least once. So by default, Postgres, due to MVCC and I guess other design decisions, will create a new tuple for each. Like if you do an update on a row, you'll get a new tuple and you'll have an old one that gets marked eventually as dead and cleaned up once no transactions can see it anymore. And cleaned by vacuum, by vacuum. It's cleaned later asynchronously, so independently by our
Starting point is 00:03:38 activities, right? Yeah, exactly. So the main optimization here, and there are some caveats, is that we can, if there's enough space on the page. Which is eight kibibytes by default, and usually it's all. Yeah, total size of the page is eight kibibytes, and total amount left free is generally zero. So Postgres will try and pack these pages as much as it can by default that's that's the fill factor we were talking about briefly so by default we have 100 or 100 as the default fill factor so it won't leave space in them yeah exactly for tables yeah because for indexes it's 90 okay yeah or b3 yeah yeah exactly so this optimization with some caveats will try and update a row on the same page. So the new tuple ends up on the same page as the old one and tidies it up without needing a vacuum, is my understanding.
Starting point is 00:04:36 So the main caveat is there needs to be enough space on the page. And the other one is that it needs to be updating columns that are not indexed in any way. Right. And here is very important. So, right. Two conditions should be met. You're absolutely right. Enough space in page.
Starting point is 00:04:53 And by the way, we are skipping discussing toast here. If we insert value which exceeds page, it's shrinked and toast tables are used and shrink to like two kibibytes, also compressed. But we skip it here. But so, okay, of course, the space should be there inside our page. A second reason, second condition says, you said indexed columns, but indexed should be defined explicitly. What does it mean, indexed should be defined but like explicitly what what what does it mean indexed of course if you create index on some column this means you this column is indexed okay but what if you use a column column inside where clause meaning you create a partial index. Yeah, partial or expression indexes count too, I believe. Right, so if a column is not used in any indexes
Starting point is 00:05:50 inside the main body of definition, or including covering indexes, but it's used only inside where clause, it's also counted as being used, and if you update such column, you cannot have hat updates, unfortunately. And the very popular example here is having a column called for example modified ad yeah and you just modify time and that's it and then you say i want to index only part of my table where
Starting point is 00:06:19 modification happened only recently and that's it you you put like create some index blah blah blah where modified ad is in this this year only for example and that's it if you try to modify modified ad column updated you don't have hot updates never so like it's not possible or i mean you you any update will modify this column because modify that usually updated by some trigger or by application itself. But any update is supposed to insert new value inside this column. Meaning you will always be touching a column which is participating in some index definition. So you will never have hot updates. I have an article about this because So about this, because I've hit it in
Starting point is 00:07:05 production, I thought I'm optimizing things, but I lost hot updates completely. And it was, it was not good in terms of performance. Actually, question why, like, okay, these two conditions are met, but why do we want it? Yeah, that's a really good point. What's the benefit? So my understanding is that without this optimization, update-heavy workloads become very difficult on Postgres. So we get what's famously called write amplification. The Uber blog post is like the most extreme, I think, example I've seen of this in the wild.
Starting point is 00:07:40 But if we have a... Actually, there's a really good Adgen blog post on this more recently as well but they're the way they explained i thought was really neat which is if we have a single row that has let's say 10 indexed columns doing a single update to that row requires us to not just write to a new page in the heap on in the table but also to update 10 index pages too so for one update we have 11 updates even if that update doesn't update an index column so even if those are effectively unnecessary updates whereas with this optimization if we are updating a single column that's not involved in any of those 10 indexes we can can get away with a single page, right?
Starting point is 00:08:25 Instead of 11. And that's a huge benefit, obviously. Right. So yeah, all Postgres indexes are direct, unlike in some other database systems where sometimes there is discussion, should we have indirect indexes as well? These discussions pop up periodically
Starting point is 00:08:43 and probably like we should. For example, some design could be only primary keys, indexes which support primary keys are direct, meaning that they point to heap and other indexes could point to primary keys. And in this case, we would not need to update all of them. But currently, in Post postgres all indexes point to heap they say this page and some offset and if an index doesn't have visibility information this is the key also to understand it so visibility means which is this tuple dead or still alive for our transaction and it means that index should you need to consult with heap to understand it and and this means that you need to consult with HIP to understand it. And this means that if some tuple is dead,
Starting point is 00:09:32 also we forgot to say in VCC, in Postgres, basics of it. Any update means some tuple. You actually described it. A tuple is deleted, a new tuple is created. A tuple is a raw version physical raw version and since index doesn't know which is live it needs to have both old one and new one because old one probably still needed for some transactions and so right if we have 10 indexes we need to create to to change all of them and later auto vacuum or regular manual vacuum will need to clean
Starting point is 00:10:07 up all of them all that entries there and also page split can happen like very like affecting performance even more if some index and b3 index already don't have space in some page although the fill factor for the indexes and between indexes is 90 so some space should be there but if not a new page needs to be created split happens like it's it's penalty for performance while we're on the advantages or the benefits of this there's a few others that i know we've covered in other episodes briefly but as well as not needing to update the indexes as well as less bloat in the indexes as that as a result there's also potential for less bloat in the table long term right there's less
Starting point is 00:10:51 right ahead log generation normally i believe it would normally be faster the correlated data stays correlated so when we were talking about brin indexes for example so there's a bunch of other oh yeah also i think reduced vacuum overhead because some of this is being tidied up right we let's don't touch additional in-page fucking which happens on the fly here but like it's too much details usually db is usually more let's let's focus on performance for developers so right hot updates usually are much much better meaning that you your actual update is much lighter so if you check explain and analyze buffers you will see that your update modified much fewer buffers right make made them dirty or written and and this is great. And of course, here in this topic, we should not use only micro analysis.
Starting point is 00:11:45 We need to check many of update cases. It makes sense to generate some benchmark and take your data, simulate your workload and see how many updates are had. It's easy. You just need to inspect pgstat user tables. It has both number of tuples updated and number of tuples updated in hot manner. And updated includes hot, of course. So it's like you cannot summarize them.
Starting point is 00:12:19 One includes the other. And the ratio is easy to get. And the closer to 100 you are, the better performance will be. Because again, buffers rule works here as well, right? The fewer buffers will be touched because fewer indexes need to be adjusted. And that's a really good point. It's possible that you're already, even without doing any tuning, you might already be achieving close to 100 hot updates so there might be right it might there might be no need to change your settings equally in that
Starting point is 00:12:51 situation you need to be very aware of that because if you if you do anything to prevent hot updates like you mentioned briefly earlier if you add an index yeah yeah yeah it was a ruby application very interesting like very good startup acquired by a large company later in san francisco and i was advising them and i said oh here we we can reduce our index size and it's scope just making it partial let's do it we will put this column to where clouds index is much smaller faster great by the way also worth pointing that if we have a partial index if its scope in terms of how many rows is out of our change so i mean change is happening outside this index doesn't need to be touched of course so yes if one of our 10 indexes is saying like it's indexed only this
Starting point is 00:13:41 part of table but we change another part of table course, this index doesn't need to be changed. But in my case, I speeded up my query, but then we deployed it. And then I suddenly see that hot updates. I was checking pgstat user tables, gathering statistics, showing that we're good, we're good, we have so many hot updates. And then suddenly, after release, it dropped to zero, basically. And I thought, what's happening? And then I realized, so I explained this case in my
Starting point is 00:14:13 article. Actually, this is a quite bright case which says that our changes should be tested at various angles, not only for our workload, but questions should be will other parts of our workload be affected and I didn't see so far comprehensive testing like the testing whole workload every time it's quite quite big and expensive task but I like still there are some
Starting point is 00:14:40 ideas and some in some cases we achieved I mean I've built some systems which are close or it's easier, like you would press a button and have this kind of testing. But it's another topic, of course. So in terms of what developers can do, I guess being aware of what their current state is, is a really good starting point. So checking that view for highly updated tables and just getting a sense of what's the current proportion of updates that are hot is a great starting point. If it's zero, you might have room for a lot of improvement. If it's 100, you might need to be very careful. But anything in between, there might be room, I guess, for a bit of optimization. Right, right.
Starting point is 00:15:20 Having good monitoring, which observes all tables and PGStat user tables and observes it, it helps. But also, of course, having a regular update, it doesn't hurt originally. It's okay to have it, right? But if you expect that this table, write performance matters a lot. And also, data volumes are huge and data grows. So vacuum, pressure on vacuum will be high. A lot of work to delete these dead entries both in heap and all indexes, right? Of course, it makes sense to try to achieve a higher ratio for hot updates. Hot updates. Yeah. Yeah, I guess the bigger your workload, the more this percentage improvement
Starting point is 00:16:07 will help you. Question to you, actually, as an expert in explaining plans and so on. Do you think having auto-explain running with buffers options enabled for slow queries can help to identify issues? For example, we have update which was slow. We see it in auto-explain log and we see a lot of buffers involved. We know that we update just one row, but why so many buffers? Meaning that maybe many indexes were involved.
Starting point is 00:16:37 Of course, maybe this update includes select subquery. So that's why we try to find which row we want to update. This is another reason for it. But this would mean that buffers are in red. I mean, hit and red, right? But if we see a lot of buffers diverted and written, it also might be related to updating hit bits, a different story.
Starting point is 00:17:02 But do you think it's a sign? We see a big number, so probably we should try to understand what's happening with hot update ratio. Yeah, exactly. I think that's the kind of thing where you can spot that there is an issue, but you can't necessarily tell what the issue is. And this would be a very big sign of it. Like even in select queries, we now try and point out when we think there's a disproportionate number of blocks even shared blocks hit or read versus the amount of data being returned because that's normally a sign of bloat but it can be a sign of other issues as well
Starting point is 00:17:36 so it's it's one of those ones where a high number of blocks might mean you've got a low a hot update proportion but it could be a sign of other things. So yeah, definitely. Yeah, definitely feels like the kind of thing where you could point out there's a problem and suggest some things to look into. All right. All right. So checking PgStart user tables. This is number one thing here. Okay, good.
Starting point is 00:18:03 And if we, for example, achieve high number, we should keep watching, right? That's it. Yeah. Monitoring again, isn't it? In terms of what people can expect in terms of speed ups or in terms of reduction in some of these numbers, I think the IGN blog post I read was the best
Starting point is 00:18:15 real case I saw. They had a serious amount of right-ahead log generation, but they reduced it by about 20% or so. That feels like an interesting number to me. So I don't think people should expect 10x improvements in some of these things. This is a very good point.
Starting point is 00:18:34 So it's not only about timing of your update, but also you generate a lot of wall data and means that you put additional pressure to your backup system and also to replication system. So more bytes should be transferred to standby nodes, both physical and logical. So it's like a lot of work. And having Postgres 13 or later helps here because in PGSA statements and in explain analyze, you will see wall metrics, how many bytes and how many full page inserts happened and so on this is great great point also actually you mentioned this earlier having postgres 13 or 14 or maybe 15 helps as well because b3 will be more compact right in this case how
Starting point is 00:19:21 how is it related to hot updates we We have, when updates became not HUD, we have a smaller penalty in terms of maintaining index state, right? There's a good video by Lucas Vittel on this that came out around about that time. Ah, it's about, let's delete from, it was originally Percona blog post, I guess. Oh, okay, great. Right, let's delete from the end or something like this, right?
Starting point is 00:19:47 It was something about, I think the video had been titled Bottom-Up Index DA, exactly. Right, because if you start from the end, we will have empty pages or something. I may be wrong, let's not guess. Let's just provide some link and read about it later again. Right, yeah, actually, you mentioned, Lucas, and read about it later again. Right. Yeah, actually, you mentioned Lucas Fittel YouTube channel,
Starting point is 00:20:11 Paginalize YouTube channel. I think it's underestimated because I would warn our audience because it's too deep. It's very deep. Yeah, you should go there if you're already not a new user of Postgres. But once you already know basics, it's a super cool channel because it's very deep, it's very interesting. I learn a lot every time I open it. So it's good.
Starting point is 00:20:34 Yeah, and short videos as well, like five minutes. Right. Reverting us to the very beginning, I should point to the fact that if you want to learn about hot updates, previously I always said you cannot find it in documentation because documentation didn't have anything about it. Now it has since Postgres 15 or 14. 15, right? But it has only small page.
Starting point is 00:20:58 It will not be enough for you. I mean, even if you're just starting, if you want to understand how updates are working, I encourage everyone to open source code again and find README related to HUD updates. Much more interesting information there. But of course, maybe too deep as well. Yeah, very in-depth, I was going to say. But it's quite beginner-friendly.
Starting point is 00:21:20 I looked at it earlier today, and there's a glossary even of kind of definitions yeah it's great so yes i'll include links to both of those there's also one final kind of place that people could look into if they want if you want uh diagrams or you're quite a visual learner there's a good part of the guide by we've mentioned it before by hironobu suzuki internals exactly the internals book it has the online version i've i can link to with diagrams of how this works yeah and igor rogov's book also has this topic covered also internals worth mentioning well okay good and finally like related to hot updates i have a feeling that a lot of things can be improved here, definitely. For example, there was a discussion of the idea of partial HUD updates. So in the future we will see some improvements in this area. I have a strong feeling that it can be much better in the future.
Starting point is 00:22:32 But in general, definitely, if you are thinking about this area, try to reach at least 13 or maybe 14 in terms of Postgres major version. You will feel much better. Because in terms of visibility, in terms of how B3 version, you will feel much better. Because in terms of visibility, in terms of how B3 indexes behave and so on. Okay, anything else related to hot updates? Yeah, I think it's quite important to recap. So the two big things needed to achieve a hot update
Starting point is 00:22:58 are one, enough space on the page, and two, no index columns being updated. And that kind of leads developers to two things you can do about this one is make sure there's more space available on each page so that's where fill factor tuning can be looked into and then second one is make sure no so we've covered the kind of make sure your indexes you don't have indexes on columns you want to update if you don't need them. So any if you've got any unused indexes, it's a real benefit for cleaning those up. Or if there's a trade off there to consider, if you've got a very heavy workload, maybe you're OK with some reads being a bit slower as a result.
Starting point is 00:23:37 So like there's that that natural trade off there. But the one we haven't talked about much is that fill factor tuning. And it's something I didn't know until today when I was reading up on it is that if you reduce the fill factor so if you have a partition table it's quite nice you can experiment on fill factor of new partitions not in production maybe but the other point that I learned today though was that you can change the fill factor on an existing table without rewriting at all it doesn't change your past data but you can benefit going forwards at least, which is something that's quite interesting as something you might actually be able to do.
Starting point is 00:24:10 You don't have to do a full table rewrite or similar to start getting the benefits going forwards. Right. So again, fill factor is the threshold when defining when new page should be allocated and used, when new tuples are coming during inserts or updates, right? And for table, it's 100 by default. For indexes, it's 90 by default. Meaning for indexes, we are better prepared for new pages use.
Starting point is 00:24:40 I mean, to avoid them actually, right? To make them less likely to happen but the question is should we decrease fill factor for tables from 100 to 90 or 80 70 60 50 maybe i saw 50 some people use 50 oh wow yeah answer is test always right there is no magic answer and it will depend on the data and especially on workload and also on your activity of auto vacuum if you have not tuned auto vacuum you will not benefit a lot so auto vacuum will need to clean up very quickly that's such a great point and things that prevent cleanup are also problematic here so it for example if you have long-running transactions that prevent things being marked as dead so there is there's a whole
Starting point is 00:25:31 interplay in terms of system level things to consider um so you're testing and testing i think on something as realistic as possible is quite important here so your pages need to be filled with data that's roughly realistic in order to determine whether there'll be space available. Or if you have a long transaction on a physical standby with hot standby feedback on, or if you have a logical standby, which is lagging a lot, your auto vacuum will not do your proper work. And this is an issue. And in this case, fill factor won't save you actually if you factor even make will make things worse because more pages fill factor if you reduce it you will need more space to store the same data right if you've said initially yes
Starting point is 00:26:20 to twice more well not well yeah yeah initially. Right, right, right. So experiments, like I saw a blog post from CyberTech, which for some synthetic benchmark workload showed that if you decrease fill factor from 100 to 90, roughly you have improved performance improvement for updates also like roughly 10%, roughly. But further, it's not so. Next 10% don't give you 10% of performance improvement,
Starting point is 00:26:53 only like a couple of percent. So maybe like the conclusion is maybe you should have 90 for tables as well, right? Oh, you mean by default? Well, yeah. By default in your application, but we cannot say it's the rule for everyone. It should be tested. If you have an append-only workload
Starting point is 00:27:14 or if you have append-only tables, there's no point reducing it as far as I can tell. So we're only worried about update-heavy tables. If append-only means no inserts no no updates no deletes then yes yeah exactly yes like log like table exactly which definitely should be partitioned and only one or two indexes maximum should be used on it okay because inserts also modify indexes there are no hot hot inserts unfortunately, right? Insert always modify all indexes, unfortunately. So if you want very good performance for insert,
Starting point is 00:27:50 you need to have fewer indexes, as few as possible. Interesting, yeah. Delete is the same. Delete also modify all indexes. There is no hot delete. I'm just raising silly questions. Do we have hot delete? Answer is no, right i'm just like raising silly questions do we have hard delete answer is no right so delete will modify both inserts and deletes deletes they will write to wall a lot because they need to implicitly they will modify all indexes you have on table so append only case
Starting point is 00:28:20 usually like i see people prefer not having indexes at all but maybe it's not a good idea as well because like who will take care of duplicates for example yeah maybe it's not it doesn't matter a lot but still sometimes it feels important anyway it's i think we've covered that quite well i think in turn i'm looking back over the question we've covered fill factor hot updates kind of workloads performance when when it's worth customizing and the age-old advice to test control the tables yeah exactly how to monitor how to look don't forget about buffers right yeah for a second youtube viewers should see buffers buffers buffers yeah nicolai has a t-shirt on for anybody listening with explain, analyze, and then buffers in bold, which is awesome. Well, thanks, everybody.
Starting point is 00:29:09 Keep the suggestions coming. I don't think we'd have done this topic so early if we hadn't had it requested. So that's great. Thank you, Nikolai. Definitely, I would be happy to hear about some real-life stories, production stories related to fill factor and HUD updates. It's always interesting because we have some materials, but most of them are talking about some synthetic like experimental environment, not about real production. And the production stories are always interesting to hear because they tend to be reproduced in other places.
Starting point is 00:29:41 In fact, Adyen shared that they changed their field factor to 85%, which is super interesting. Again, it'd be great to hear what numbers other people have settled on based on their own testing. Yep. Good. Wonderful. Well, thanks again, everybody, and see you next week. It was episode number 20, right?
Starting point is 00:30:00 Yeah. Yeah. Almost half a year without interruptions, without breaks. Good. Thank you, half a year without interruptions, without breaks. Good. Thank you, everyone, for being with us. Absolutely. Till next week.
Starting point is 00:30:11 Bye.

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.