Postgres FM - Read-only considerations

Episode Date: April 14, 2023

Nikolay and Michael discuss a listener request — special considerations for databases that are used in a read-only mode all day, and get an update at night with additional data.

Here are links to a few things we mentioned:

- Index-only scans
- Vacuum
- UK Covid-19 dashboard
- pg_repack
- Partitioning
- Our episode on BRIN indexes
- Always load sorted data (blog post by Haki Benita)
- GIN indexes: the good and the bad (blog post by Lukas Fittl)
- Our episode on materialised views
- pg_buffercache
- Towards Millions TPS (blog post by Alexander Korotkov)
- Postgres WASM (by Snaplet and Supabase)
- Yugabyte
- AWS Aurora
- Continuous Archiving and Point-in-Time Recovery (docs)
- Our episode on checkpoint tuning
- Our episode on partitioning
- PgQ
- Neon branching
- Database Lab Engine
- Cluster

What did you like or not like? What should we discuss next time? Let us know on social media, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres AI. Hey Nikolay, what are we talking about? Hi Michael, your choice, I cannot even name it, so please help me. Yeah, let me read out the request. So this one's based on a listener suggestion, so thank you so much for that one. They were asking about special considerations for databases that are used in a read-only mode all day and then get an update at night with additional data. So they are asking us to focus on... well, they suggested the focus therefore should be on
Starting point is 00:00:38 performance during the read-only times. So yeah, I think this is fascinating. I've only seen it a couple of times myself. Coronavirus data, right? In Great Britain, you mentioned this example. So we, like, reload data nightly, for example, and then people just use the frozen state of data. I remember reading about that one. I'm not actually sure that strictly counts as one of these cases. I'm not sure, for example, if they take any downtime or any time between updates. I'm not even sure if this person is assuming we have downtime or not. In the very beginning, let's agree, we don't consider any downtime as reasonable for us.
Starting point is 00:01:17 Interesting. I like it. I just quit if someone suggests downtime. I just quit. I don't work with them anymore at all. It makes it more interesting. It makes this more interesting too. Right. Let's consider we don't allow downtime, but we consider that all data changes happen during a specific time period of the day or week. Yep.
Starting point is 00:01:36 Periodically, and then we have a frozen state of data, and we want our read-only queries to be as fast as possible, like the fastest possible. This is our goal. Sounds perfect. So yeah, kind of read-only. Like, that's how I'm thinking about it. I know it's not read-only, but that's how I'm thinking about it.
Starting point is 00:01:53 I think we can split this problem into two parts. First is how to make read-only queries as fast as possible. And second, how to refresh or update data, or change data, within a specified period of time. So then we have, again, the fastest possible read-only queries, right? Perfect. I have my notes organized exactly the same way. Okay, good, good. So which would you like to start with? I'm thinking the reads. First one.
Starting point is 00:02:21 Yeah, perfect. I don't know about you, but I thought about this from, like, the perspective of: why do writes, inserts, updates, and deletes cause problems, and therefore what problems don't we have, and therefore what can we do differently? So do you want to kind of go trying to think of what the highest-impact things are first, or just start listing things that we can do? Let's do it in a random order and then try to summarize and find the most impactful ideas. Sounds perfect. So some of mine: being performance-focused myself, naturally the first thing I went to was thinking about indexes. So, one of the biggest downsides
Starting point is 00:03:06 of creating lots of indexes in general is write overhead. So one of the things I was thinking was maybe the dial shifts a little bit towards: we can afford a few more indexes than we would in a heavily updated system. So that felt like maybe a slight shift when people were thinking, like, what are my natural instincts? And maybe you need to go against those a little bit or change the balance
Starting point is 00:03:32 there a little bit, but not just... So you basically propose to index all the columns. Well, it crossed my mind that there are a few things, like, deep in the weeds of indexes that might also be interesting. So firstly, maybe more indexes make sense. Secondly, maybe more multi-column indexes make sense. So, like, index-only scans, we don't end up getting... I think you mentioned this previously, but the heap fetches number, right?
Starting point is 00:04:02 Right. But these are two different things. First, let me comment on this here. So first of all, in my practice, I rarely see, I see it, but not frequently at all, that people try to minimize the number of indexes. Most often, I just come and see a lot of unused, redundant indexes. We had an episode about it.
Starting point is 00:04:23 This is what people usually do. Like, they do more indexes than needed already. But I agree with you: definitely we could afford more indexes if we know we are not going to run updates, writes, during the daytime. But I don't think in practice it will prevent someone from creating another index in different situations. Sometimes we have an append-only table and people try to avoid indexes at all, or just have a single index. It happens. But in many cases, people just create a lot of indexes anyway. So this I agree with, but it's hard to measure. And I think the second item you mentioned is probably the most interesting one. To reach index-only scans that run faster due to heap fetches zero: if you see the plan, heap fetches zero means that you're good.
Starting point is 00:05:12 But before explaining this, probably we should remind those who are not Postgres experts how MVCC works, in a few words, right? So every update, every delete just marks your tuple as dead, but some transactions probably might still need it. And then autovacuum cleans those tuples, marks index entries as dead as well, and also cleans them. And if we have a lot of deletes and updates (not inserts, deletes and updates), we probably start to have bloat issues in both tables and indexes, right? And before even considering index-only scans, probably we should think about a more fundamental problem, like bloat and dead tuples, right?
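For reference, one rough way to see whether dead tuples are piling up before the read-only window starts is the cumulative statistics views; a minimal sketch (the limit is arbitrary):

```sql
-- Tables with the most dead tuples and their last (auto)vacuum times.
-- This is an approximation based on statistics, not an exact bloat report.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  20;
```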
Starting point is 00:06:00 So what do you think, like, if we say: before our read-only period starts, we would like to vacuum everything and probably get rid of bloat, so repack everything? If we have a lot of time, this would be ideal. We get rid of any bloat, and we know that all dead tuples are already deleted. For autovacuum, there is no work in terms of regular vacuuming, right? Yes, I think that's a really big one, actually. So, because we have to keep it online, I think repacking makes sense pre-vacuum, right? There's no point doing a vacuum and then repacking everything. Or is it... Let's
Starting point is 00:06:37 distinguish repacking indexes and repacking tables. Indexes: it's like reindex, and the state will be fresh. Repacking tables: it's like a vacuum-full recreation, almost. So basically, repacking means you just perform something like a dump-restore, or vacuum full of the whole database, if you repacked everything. And it's good. We have a fresh state. We don't have dead tuples, and there's nothing to do for autovacuum or vacuum. Except just one thing: the state of visibility maps. Well, will they change during... I was thinking of suggesting, potentially I've missed something, but I could imagine this being one of the few exceptions to the rule of not disabling autovacuum. I wondered about running a vacuum at the end of our... just before the read period starts, to build visibility
Starting point is 00:07:33 maps. Exactly, to update... Yes, exactly. As a final step, I was actually thinking vacuum analyze, because we might have added a lot of data as well. Yeah, exactly. So, do all of our... in fact, we've jumped to the second topic of, I guess, the bulk loading, but it's important for that, for the index-only scan optimization. So the final step I was thinking would be a manual, well, not manual, but, like, automated
Starting point is 00:07:57 vacuum analyze, not via autovacuum. Explicit vacuum, right? Explicit, that's a better word for it, yeah. Which is also good in newer Postgres versions, because it can process indexes in parallel, unlike autovacuum. Autovacuum still cannot do it; it's always processing a table and all its indexes using just one worker. If you use explicit vacuum, we can do it slightly faster. Yeah, and the thing that ties this back to the query performance is that that final vacuum marks pages visible.
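A sketch of such an explicit vacuum; the PARALLEL option needs Postgres 13 or newer, and the table name and worker count are made up:

```sql
-- Explicit vacuum of one big table; index cleanup is done by up to
-- 4 parallel workers, which autovacuum does not do.
VACUUM (PARALLEL 4, ANALYZE, VERBOSE) measurements;
```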
Starting point is 00:08:32 All visible. Exactly. Updates the visibility map so that when Postgres is planning queries and when it's doing index-only scans, because we're not getting any new data, any updates, any deletes during that period, we can't have any of those pages marked as updated, right? So the first thing I would like to have is zero bloat, almost zero bloat maybe, and zero dead tuples. There is also a thing like... we could do it, actually, if we could afford downtime: we could do VACUUM FULL. But we agreed that we exclude this option, so we should run pg_repack. And for indexes it's... well, all situations are different, right? So we might see that we don't have enough time to do everything. In this case, probably, I would just take care of indexes first of all, and accept some bloat and dead tuples.
Starting point is 00:09:17 Well, dead tuples maybe not, but bloat maybe yes. And also, if we have partitioning, we could just play with partitions, some rotation or something: insert new data into a new partition, and just get rid of the old partition, drop it, and that's it. Well, it depends. But in general, I would like to see zero bloat, zero dead tuples,
Starting point is 00:09:35 and visibility maps updated. And also, in some cases, I don't know the particular situation, maybe some writes are still happening, and autovacuum does many things. It has multiple goals. Getting rid of dead tuples is one goal. Keeping the visibility map updated is another goal.
Starting point is 00:09:56 And the visibility map has two bits for each page, not only all-visible, but also all-frozen. And who knows, maybe some writes will trigger vacuum to run to prevent transaction ID wraparound. So it would probably be good to do vacuum analyze freeze as well, to mark the all-frozen bits for each page in the visibility map. And finally, recollect stats, but that's the analyze part of our explicit vacuum, so we have fresh statistics in pg_statistic. So all these things are good.
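Putting those goals together, the very last step of the nightly window could look roughly like this; the pg_visibility check and the table name are only for illustration:

```sql
-- Freeze tuples, update the visibility map (all-visible and all-frozen
-- bits) and refresh planner statistics across the whole database.
VACUUM (FREEZE, ANALYZE, VERBOSE);

-- Optional sanity check: how many pages are all-visible / all-frozen.
CREATE EXTENSION IF NOT EXISTS pg_visibility;
SELECT * FROM pg_visibility_map_summary('measurements');
```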
Starting point is 00:10:26 Maybe one more, like a dessert. Let's add one more thing on top of it, like a cherry on our cake. pg_repack supports reordering the heap, the table, defining a specific physical order of tuples. We can choose a specific order, but just one, of course, just one. Some queries will be super fast because they will deal with fewer buffers, because they will be getting data in proper order. If fewer buffers are touched, they will be super fast.
Starting point is 00:10:57 And we can afford it if we repack. Yeah, absolutely. If you're familiar with other databases, this is what people often call index-organized tables. And it makes a lot of sense. There's a chance, like, for example, the examples I've seen, these are often quite time-series-heavy use cases, there's a chance your data will already be ordered. If you take care to insert it in order, then inserting your new data overnight, or whenever you do it, in an ordered fashion is also a way of potentially getting that clustering without having to do a very heavy, long process. So yeah, inserting it, making sure it ends up in order, feels like a good one.
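A minimal sketch of loading in sorted order, assuming a hypothetical staging table and a time-based access pattern:

```sql
-- Writing rows in the order queries will read them keeps the heap
-- naturally clustered, so range scans touch fewer buffers.
INSERT INTO measurements (sensor_id, recorded_at, value)
SELECT sensor_id, recorded_at, value
FROM   staging_measurements
ORDER  BY recorded_at, sensor_id;
```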
Starting point is 00:11:38 Right, and since we don't have deletes, updates, or inserts, we know that it won't change during the day. So the state is frozen. It's a perfect situation. Yeah. This reminds us of BRIN indexes. We also had an episode, right? Okay.
Starting point is 00:11:58 Well, I wanted to bring up BRIN indexes because I think this might be another benefit that we have here. I think it might be that we... So I know we've talked about BRIN indexes and some of the benefits that have come in. Was it Postgres 15 or 14? 14, maybe. Yeah. But one of the downsides of BRIN indexes is that they degrade as data gets updated. In our case, they don't, but also they are slower than B-tree.
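For reference, a BRIN index on a column the data is physically ordered by might look like this; the names and the pages_per_range value are illustrative:

```sql
-- Tiny compared to a B-tree; works well here only because rows arrive
-- in recorded_at order and are not updated during the day.
CREATE INDEX measurements_recorded_at_brin
    ON measurements
    USING brin (recorded_at)
    WITH (pages_per_range = 32);
```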
Starting point is 00:12:18 So probably we don't need them, because we don't want to improve update performance. So I would still choose B-tree again. Well, it depends, right? Like, they also are much, much smaller, so they don't spam our caches. Right, exactly. If we have a huge data set, and RAM... RAM's expensive, right? There might be several trade-offs here, hard to say, right? But all I meant is that in general day-to-day thinking, maybe you don't think of BRIN indexes that much, but if you're in this case, maybe you can think of them a little bit more as something that's an option to you. The other index type that I wanted to bring up along those lines is another one that's expensive to... well, it's one that's expensive to update, which is GIN indexes. So because we don't have these updates... I wasn't considering that it would have to be online the whole time, so eventually we are going to have to update it or
Starting point is 00:13:14 drop and recreate it. Pending list and so on: like, we definitely can tune our GIN indexes to benefit from the read-only situation. It's a good point. Yeah, so you make a good point that we could tune... Well, the overhead would be overnight, right? It would be while we're doing our updates, not during the day, so read performance is still good, right? So yeah, definitely we can benefit from the read-only state. If I remember, there's pending lists, and I quickly checked right now, the fastupdate option. The thing is that we want everything to be already in place. So we don't want to have additional lookups internally for selects. So we can say the speed of selects is most important for us.
Starting point is 00:13:59 Updates can be slow. It's fine. So we make proper choices when we create a GIN index or rebuild it, and that's it. Yeah. So fastupdate probably off. Yeah. My understanding is that that won't make all updates slow, but it will make some updates extremely slow.
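A sketch of that choice at index-creation time; the column, opclass, and index name are assumptions:

```sql
-- With no daytime writes we only care about SELECT speed, so skip the
-- pending list entirely and keep lookups in the main GIN structure.
CREATE INDEX measurements_payload_gin
    ON measurements
    USING gin (payload jsonb_path_ops)
    WITH (fastupdate = off);
```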
Starting point is 00:14:17 Right, right. When the pending list limit is reached, by default, if I remember, it's four megabytes. It's not good. I had issues with it in some cases. It's not fun. We have some blog posts. In fact, actually, there's one I was going to mention earlier by Haki Benita that talks about loading data in a sorted fashion.
Starting point is 00:14:42 And there's a good one by, I think, Lukas Fittl talking about GIN updates. So I'll link up both of those in the show notes. It's too much fine-tuning to me. I still like what we discussed: order of tuples and a fully vacuumed state, up-to-date visibility maps, zero risk that autovacuum starts working. Sometimes we don't want it to be working during our daytime at all.
Starting point is 00:15:00 It's perfect. We discussed visibility maps: all pages are marked all-visible. And it means that our index-only scans will be as fast as possible, because they will have heap fetches zero in plans. If, when we look at plans, we see heap fetches non-zero, it means that the index lookup was already not index-only. The heap was inspected to check visibility.
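To check this in practice, one can look at the plan of a critical query and, if needed, add a covering index so it can stay index-only. Query, table, and column names below are made up:

```sql
-- Covering index: the filter columns plus the returned column via INCLUDE.
CREATE INDEX measurements_sensor_time_covering
    ON measurements (sensor_id, recorded_at) INCLUDE (value);

-- In the plan, look for "Index Only Scan" and "Heap Fetches: 0".
EXPLAIN (ANALYZE, BUFFERS)
SELECT recorded_at, value
FROM   measurements
WHERE  sensor_id = 42
  AND  recorded_at >= date '2023-04-13'
ORDER  BY recorded_at;
```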
Starting point is 00:15:24 But if our visibility map is up to date and no writes happen to the table, it means that heap fetches is zero. It means the index-only scans will be super fast. And in this case, I would check my queries and rewrite them to use index-only scans everywhere I can. Sometimes you need to use an index-only scan quickly, and then you already fetch separately, using
Starting point is 00:15:51 WITH ... AS MATERIALIZED, so a materialized CTE with plan fencing. Or somehow else: you just want to find rows using index-only scans, and then you can already read all the columns you need from them and so on. So I would check all the plans and ensure
Starting point is 00:16:08 that I use index-only scans as much as possible and heap fetches are indeed zero. This is super important and, spoiler, I will consider this as probably the number one thing I would like to have in our overall discussion:
Starting point is 00:16:24 heap fetches zero and index-only scans. Yeah, awesome. There's a couple of other things in terms of queries that you can do to favor index-only scans. Just checking that you do need all the columns that are being requested. Yeah. Like, there's something, especially if you're using an ORM or something, that can really lead to all columns being requested when actually only a few are needed for the application. This is what I was meaning. Or if you do need these columns, okay, include them to have a covering index
Starting point is 00:16:54 and to have everything in the index. Yeah, or even just a straight-up multi-column index. So I would fight hard to have a single index-only scan in the most critical queries. In this case, I know this is the best performance Postgres can give me. So here's a related topic, and I've got two more for you. This one is materialized views, or pre-aggregation in general. Ah, yeah. Okay. Well, they don't have to be materialized views, right?
Starting point is 00:17:28 It could just be that we load the data in this... like we aggregate before even loading. Which is pre-aggregation in a broader meaning. Yeah, exactly. The point is it's not going to change. Exactly. So it helps with index-only scans, right? Because if you have data in two tables and you get a materialized view,
Starting point is 00:17:47 like we talked about this before, right? You can now index on columns across multiple tables. Yeah, I know you wanted to go to, like, DBA low-level stuff, like backups and so on. Let's do it slightly later. This is a great topic. Like, okay, as one of my old customers told me one day: I love materialized views, but they feel like a huge hammer for jewelry work. So sometimes it doesn't work too well. So imagine if you created a lot of materialized views, prepared them,
Starting point is 00:18:22 created indexes on them, have index-only scans. But then my question is, what's the state of your buffer pool and the file cache? Is it good? Because you started to keep the same data multiple times, like many times, the same records, basically. You have denormalized the situation. And probably you need more memory to keep,
Starting point is 00:18:47 because you have more pages, and this is kind of bloat as well, because you could do much better if you avoid materialized views, for example. Maybe. It depends, I don't know, there is no single answer here. I think it depends a little on the number of similar queries that are hitting you. Like, if you have a dashboard that is having the same things being asked of it over and over again, that's one thing. If you allow people to set all different filters and they could be slicing the data in any way they like, completely custom, maybe there's a different trade-off.
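As mentioned above, a materialized view lets you index across data coming from several tables; a rough sketch with invented names (the CONCURRENTLY refresh requires a unique index):

```sql
-- Pre-aggregated view, refreshed only during the nightly window.
CREATE MATERIALIZED VIEW daily_region_stats AS
SELECT r.region_id,
       m.recorded_at::date AS day,
       count(*)            AS readings,
       avg(m.value)        AS avg_value
FROM   measurements m
JOIN   regions r ON r.region_id = m.region_id
GROUP  BY r.region_id, m.recorded_at::date;

CREATE UNIQUE INDEX ON daily_region_stats (region_id, day);

-- During the nightly load; does not block readers, needs the unique index.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_region_stats;
```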
Starting point is 00:19:26 Oh yeah. But what I mean is, if you don't allow that much customization, maybe you materialize everything and only load that data. So in some cases, probably, I would choose materialized views, but I would think twice, maybe even more, like three or four times: do I need them? Because what I would do, I would try to understand what's my working set during the day. How many bytes of memory, gigabytes, terabytes, I don't know, how much memory I really need. And probably I would try to use, in reality, the pg_buffercache extension to inspect the current state of our shared buffers and understand which indexes are currently loaded, how much of them,
Starting point is 00:20:10 and same for tables. And from there, also checking buffer pool efficiency from pg_stat_database, and monitoring, everyone should have it. I would ensure that efficiency is more than 99%, so at least two nines we should have there.
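Two queries along those lines, assuming the default 8 kB block size; the 99% target is the one from the discussion:

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Which relations currently occupy the most of shared_buffers.
SELECT c.relname,
       count(*)                        AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached
FROM   pg_buffercache b
JOIN   pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE  b.reldatabase IN (0, (SELECT oid FROM pg_database
                             WHERE datname = current_database()))
GROUP  BY c.relname
ORDER  BY count(*) DESC
LIMIT  20;

-- Buffer cache hit ratio for the current database (aim for > 99%).
SELECT round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_ratio
FROM   pg_stat_database
WHERE  datname = current_database();
```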
Starting point is 00:20:26 And in this case, it's good. Like, okay, we have space. We have a buffer pool big enough. We have room for materialized views, right? Well, yeah. And the main reason I wanted to bring it up was not that I thought it was necessarily a great idea. It's more that we don't have one of the big downsides
Starting point is 00:20:42 in this use case. We don't have the fact that the data goes stale. So it just opens up the possibility that they might be a better idea than they would be generally. But this is why it's super important to understand the content of the buffer pool, and probably the page cache, and understand how many pages are going to be evicted because there is not enough space in the buffer pool.
Starting point is 00:21:04 Because when pages need to be evicted, we have contention issues. For example, in Postgres 9.5, 9.6, a lot of work was done specifically for select-only, read-only workloads. And I remember an excellent small post from Alexander Korotkov when he was working, by the way, with MySQL people from Percona and so on, like, towards 1 million TPS on one machine. It was long ago. We've shared it before.
Starting point is 00:21:31 I'll link it up again. It's great. And because it shows exactly how we can also run pgbench easily with select-only workloads, prepared statements and so on. This shows that contention is possible not only when we change data. It's possible in a read-only state as well. And I would try to minimize it and to keep our working set as stable as possible
Starting point is 00:21:54 in our buffer pool. Ideal situation, we have a lot of memory. Our database is very small. It's enough. Well, there will be a different question about cost optimizations from our non-technical people, probably financial people and so on. But in this case, we are good. We can afford a couple of more materialized views and so on, right? But in reality, we
Starting point is 00:22:15 usually don't have enough memory. The database is quite big and we try to optimize it. In this case, I would try to keep as much as possible of our working set in the buffer pool. And materialized views are our enemies in this case, right? They increase demand for memory. Yeah. I had one more topic that I wanted to run past you on the query performance side. And that was, again, thinking about some of the downsides of a higher churn of our data. And that's replication: can we afford
Starting point is 00:22:46 more replicas, or more geo-distributed replicas, to get data closer to end users in this workload? During the day, we just don't have to worry about lag at all. You know what I like about sharding? It's because if we have shards, and each shard has one or two (better two) standby nodes, it's so much better compared to a situation when you have 10 standby nodes. And we spend not only disk space, but also a lot of RAM, to store the same data many, many times, just to scale reads. It's so inefficient when you need many, many standby nodes. It's like you spend a lot
Starting point is 00:23:26 of money just to store. It's again too redundant storage, in terms of, not storage, I mean temporary storage, of memory. So I don't like to have a lot of standby nodes, but sometimes we do need it. And again, in the context of materialized views, since we're supposed to use physical replication, we will need to store them on all nodes and pay for memory many times. You see, I don't like materialized views, actually. I've sensed a pattern. But what about the idea of reducing latency globally? Like, some of these use cases were in the same country, like the UK COVID dashboard:
Starting point is 00:24:05 probably most of their traffic was coming from the UK. I understand what you're probably trying to say. If we have a read-only state during the daytime, our replication is small, and we can bring servers closer to users: read-only state, perfect situation. Even if the latency distance is big, we don't care, because the data is already there. That's a good point, like edge computing, almost, right? Yeah, well, exactly. I know we're again paying for RAM in multiple places, CPU in multiple places, but it feels like maybe that's a trade-off you'd be more willing to take, because you don't have the normal issues there. Yeah, so we should use this... remember this project which runs Postgres
Starting point is 00:24:50 right in the browser, all right? Yeah, using WebAssembly and a virtual machine inside it and so on. Or even something like Yugabyte, right? Yeah, so it's a valid point and definitely worth considering in a regional case. And for example, I remember AWS Aurora with its global database idea, like read-only clusters and so on in this context, and it makes more sense. But what did you want to tell me about backups? Well, do we need them? We can have a daily one, for example, as part of that nightly load. Do we need anything more than that? So I would keep them, but there is a setting, I don't remember it off the top of
Starting point is 00:25:31 my head, but this setting is responsible for it, archive_timeout maybe. So the setting says: even if the WAL file (16 megabytes max by default) is not filled yet, on this timeout, archive it. In this case, of course, if you know there are not many writes and recovery time is not critical, probably we should make it less frequent, one hour, for example, right? So we archive WALs less frequently. But it's worth remembering some selects can lead to writes, and they can write to some WAL and so on. But if we vacuumed everything, and we propagated everything to replicas (wal_log_hints, if our hint bits on standby nodes also have everything from the primary), in this case, we should be good. And indeed, we can archive less frequently. But I think it's a small question. It's just a matter of space occupied in backups. I don't think we will notice the performance overhead from archive_command working. It's not like checkpoint or autovacuum.
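A sketch of that setting; the one-hour value is just the arbitrary example from the discussion, and the parameter can be reloaded without a restart:

```sql
-- Archive a (possibly half-empty) WAL segment at most once per hour
-- during the quiet daytime period.
ALTER SYSTEM SET archive_timeout = '1h';
SELECT pg_reload_conf();
```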
Starting point is 00:26:19 Well, should we move on to checkpoints, then? What would you do with those? Well, checkpoints: I think if writes are not happening at all, we don't care. We can keep the default settings, every five minutes, I don't care. If all buffers are clean, no dirty buffers at all, or almost none, it won't
Starting point is 00:26:59 be noticeable. It becomes noticeable only when writes are happening. And if you remember checkpoint tuning, we discussed a lot how to tune for heavy writes. I was thinking we could therefore make it a lot less frequent, but it doesn't matter. It doesn't matter, exactly. If we checkpoint with very few dirty buffers, it will be very fast. And that's it. So checkpoint is... And autovacuum we already discussed. If we do vacuuming, even if we do freezing ourselves, in this case, vacuuming won't trigger. That's it. Right. What about the overnight process? Should we get to that?
Starting point is 00:27:35 Yeah, this is where things start to become less good-looking, because we probably don't have a lot of time to vacuum, freeze, and analyze all of our database. We would both probably need to see the details of the process. In this case, we need to make decisions, dealing with trade-offs and weighing two or sometimes multiple choices and choosing the best one. For example, we might say, okay, we don't have time to rebuild and to repack our table using some clustering, or we do it infrequently, or probably we don't even touch some indexes. It depends. So it depends on how strict, how narrow, our window to apply changes is. But I would definitely consider partitioning here, because in this case we have more control and data locality and so on. More control also in terms of how much work will be needed for vacuum. Because if you have a huge table, you just, you know, like, okay, we update
Starting point is 00:28:36 10% of our huge table, which is 10 terabytes, every night, for example. Okay, 10% means one terabyte; probably it's too much for a few hours. But anyway, we don't know to which pages our updates will go, right? It's distributed who knows how, and in this case we don't have control. But if it's a partitioned table, we discussed it very recently, we insert new data into a new partition, or deal with fresh partitions; old partitions are already in an almost frozen state and we touch them very rarely. In this case, vacuum works much faster. Rebuilding: we don't need to rebuild everything, we need to rebuild particular indexes and tables. So partitioning is our friend here, unlike materialized views. I'm joking. Maybe materialized views are not that bad, but they are outdated,
Starting point is 00:29:28 the approach Postgres currently has. We live in a time when we need much more powerful materialized views already, and there are projects that exist trying to solve it, but I wish we already had it everywhere inside
Starting point is 00:29:44 Postgres. In terms of that, you mentioned having a deadline or having a certain compressed period of time where we needed to do... Maybe we don't have that now. Like a couple of hours, for example, our window. So it's an interesting point because some of the things I was thinking,
Starting point is 00:29:59 I'm not sure they apply anymore because we've said we need to think about zero downtime. Except could we do blue-green deployments so we could have one cluster live and then switch to... Maybe branches. Yeah, or whatever you want to call it, where you flip
Starting point is 00:30:18 the connection, so that we could have been doing all this data loading in a blocking manner. Well, partitioning can provide this already. For some cases, with partitioning, you just create a new partition, and you can switch to reading from it when you have already filled it. Why not?
Starting point is 00:30:33 No, like, it's almost blue-green or something. Yeah, interesting. I hadn't thought of it like that. It's like in PgQ: three partitions. One is being used right now. Another, we are working on it. And the third one, we used it recently and we will be processing it. So the rotation of partitions, this is from the queuing-in-Postgres approach used by Skype in PgQ, and we could do a similar thing here. And it's similar to blue-green, or the branching concept. But if you have branching, for example Neon, you can use it, or you can have it if you install Database Lab Engine on production.
Starting point is 00:30:52 queuing in postgres approach used by skype in pgq and we could do similar thing here and it's similar to like blue green or branching console but if you have branching, for example, neon, you can use them if you install database lab engine on production. If you're okay with using ZFS there, or you can implement it with hardware as well. In this case, you can have branches, but it's like we can just have some rotation here, right? Why not? But what I wanted to say also, here I would tune checkpoints.
Starting point is 00:31:27 So one of the trade-offs we want to make here: we probably want to generate less WAL. So having less frequent checkpoints would be good. The distance increases, fewer full-page writes are happening. But the price we pay here is a longer restore time after a crash, right? So I would increase max_wal_size a lot, checkpoint_timeout a lot, like 30 minutes, and max_wal_size like 100 gigabytes (checking free disk space, definitely), maybe even more. And we know if we crash, we will be in recovery for like 10, 15 minutes.
Starting point is 00:32:09 It's okay. Well, and it doesn't matter, right? Because we're already… It depends on the case. Yeah, true. I don't know. Maybe it matters. If people continue using it and we are down for 15 minutes, maybe it's not acceptable. But if we can afford this risk, if we say, okay, a restoration time of 15 minutes is fine, in this case, we say checkpoint_timeout, 30 minutes, max_wal_size, 100 gigs.
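In configuration terms, that could be applied before the nightly load roughly like this; the numbers are the arbitrary ones above, and both settings are reloadable without a restart:

```sql
ALTER SYSTEM SET checkpoint_timeout = '30min';
ALTER SYSTEM SET max_wal_size = '100GB';
SELECT pg_reload_conf();
```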
Starting point is 00:32:30 Again, I provide some arbitrary numbers. They should be tested for a particular situation. But then we produce fewer WAL records. Our intensive-write performance is better. Then, other choices: maybe I should drop indexes before, when I do... Yeah, yeah, so this is what I was thinking about with the blue-green approach. Like, we probably can't drop indexes before we insert if we're keeping... A new partition? Why not? We can create them after we inserted. Yeah, that's the blue-green idea. Doesn't the partitioning have the same issue for you as the materialized views, though,
Starting point is 00:33:11 like in terms of buffer cache? This is new data, right? Sorry, so yes. So you're saying put just the new data into the partition. Materialized view, it's not an original source of data. It relies on tables and just copies, transforming somehow, and copies the same data. It's derivative, right? I completely misunderstood how you meant you were using partitioning.
Starting point is 00:33:33 You meant, like, basically new partition per day. Makes sense. For example, yeah. Yeah. Or we can have, like, seven partitions, and when Monday starts, we reuse old Monday's partition, for example, and have rotation. Depends. I don't know the particular case, and we can design something or just use timescale, and that's it. But the idea is that partitioning is super beneficial because we, again,
Starting point is 00:33:57 like vacuum, we have data locality. Data is already stored in a more compact way, less sparsely. Yeah. So probably we even don't need to apply clustering. It's called cluster, but it's reordering. And pgRepack has this option to cluster data. So maybe we even don't need it because we know all fresh data is one partition already, a daily partition.
Starting point is 00:34:23 So probably we don't even need to apply clustering. It's called CLUSTER, but it's reordering, and pg_repack has this option to cluster data. So maybe we don't even need it, because we know all fresh data is in one partition already, a daily partition. Vacuum is faster. Indexes are under control. We don't need to rebuild the whole index; we build just the index on our partition. All others are already good. Visibility maps are good. Index behavior is good for all partitions. If tomorrow we'll be using yesterday's partition, we still keep it, but it's
Starting point is 00:34:40 in a kind of frozen state. So it's quite good. We definitely have quite a lot of stuff to use for an optimized case here. So, both max_wal_size and checkpoint_timeout can be changed without restart.
Starting point is 00:34:55 When we prepare for bulk updates: do bulk updates without indexes, or with as few indexes as possible, and recreate them afterwards; adjust checkpoint_timeout and max_wal_size to move faster, to produce less WAL, and to put less pressure on backup and replication systems, and on disk, of course. What else? Maybe drop constraints beforehand, before doing the inserts? Maybe, but it depends again. Like, with constraints or triggers, if we do see some overhead from
Starting point is 00:35:27 them, yeah, worth considering, but creating them also takes time, right? Yep. It's worth testing both, right? Like, test without and test with. If it's a new partition, if it's like the blue-green approach, if we create a new partition, nobody is reading from it yet. We can block it and create indexes without CONCURRENTLY. True. I like that. Constraints as well. We know it's already big.
Starting point is 00:35:52 Like, okay, we have like a 10-gigabyte partition. But we know nobody is using it yet. So we can move faster, and a regular CREATE INDEX is roughly two times faster than CREATE INDEX CONCURRENTLY. And the same with constraints: we don't need the NOT VALID and then VALIDATE two-phase creation. We can just create it, blocking this partition, not taking into account others at all, before we open the gates to it.
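A rough sketch of that nightly flow for a hypothetical daily range-partitioned table: build the new partition on the side, index it without CONCURRENTLY while nobody reads it, then attach.

```sql
-- 1. Build tomorrow's partition as a standalone table.
CREATE TABLE measurements_2023_04_15
  (LIKE measurements INCLUDING DEFAULTS INCLUDING CONSTRAINTS);

-- 2. Load the data pre-sorted so the heap stays clustered.
INSERT INTO measurements_2023_04_15
SELECT * FROM staging_measurements
ORDER  BY recorded_at, sensor_id;

-- 3. Plain CREATE INDEX is fine here: no one is reading this table yet.
CREATE INDEX ON measurements_2023_04_15 (sensor_id, recorded_at) INCLUDE (value);

-- 4. Freeze and collect stats while it is still detached.
VACUUM (FREEZE, ANALYZE) measurements_2023_04_15;

-- 5. "Open the gates": attach it to the partitioned parent.
ALTER TABLE measurements
  ATTACH PARTITION measurements_2023_04_15
  FOR VALUES FROM ('2023-04-15') TO ('2023-04-16');
```

Adding a CHECK constraint matching the partition bounds before the ATTACH would also let Postgres skip the validation scan, but that is an extra detail beyond this sketch.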
Starting point is 00:36:17 One last idea from my side. If we can't afford to do all maintenance every night, we could stagger it, right? Like, we don't have to do the same every night. We could do a section of them on Mondays, a section of them on Tuesdays, a section on Wednesdays, like, we can fit it however you want. Or also, maybe we can avoid duplicated actions and do some of them less often as well. So instead of changing something every day, we change it once per week.
Starting point is 00:36:47 Although everything else is changing. We have daily partitions, blah, blah, blah. But something is changed only once per week just to avoid. I don't know. It's like some fantasy. I guess it depends on the use case, right? But all I meant is in terms of maintenance, if you get rid of bloat once per week, it doesn't mean you have to do a lot of work once per week.
Starting point is 00:37:08 It could be you do a little bit of work. Yeah. For example, recreate some table, repack it only like once per week. And then we know we accumulated some changes and like some bloat and so on. It's like an order changed and we process it in a heavier way just once per week. It's also some optimization considering trade-off we have.
Starting point is 00:37:31 But in an ideal world, we have enough time to insert our data and to say: vacuum all, repack. Repack everything, all tables, all indexes, with clustering, and then run vacuum, without FULL, without FULL.
Starting point is 00:37:44 Vacuum analyze verbose freeze on the whole database. Nice. And in the end we have a quite ideal state, and that's it. I would return max_wal_size afterwards, just in case, not to keep it very big.
Starting point is 00:38:00 But maybe this dynamic play with max_wal_size and checkpoint_timeout is not that needed; maybe we can keep it quite large all the time. Brilliant. Did you have anything else? I don't think so. I think it's enough. We already, like, brainstormed and designed some project here, I guess. Yeah, I hope it was helpful for whoever requested this. And it was definitely interesting for us in terms of thinking it through. Maybe a last thought. All I explained comes from my experience, but I never saw such a case where I could afford dealing with, like, the whole database like that. Usually we have only part of the database behaving in this pattern. So an additional question would be: if only some tables behave
Starting point is 00:38:49 like this, what to do? But there things become much more complex, because trade-offs become harder and so on. I'd say that becomes much more normal, though. I think this one's interesting. Yeah, this one's interesting, and I'd be really interested to hear from anybody that does maintain a system like this. And if there's anything we've forgotten or anything we've missed that you do that is good in this situation, it'd be great to hear. The coronavirus database in Great Britain, for example. Yeah. I think a lot of government data in many countries, it's quite static, and it gets refreshed. So it should be popular. Some huge data sets get done like this, like geographic ones, spatial ones. There's all sorts that I've seen that only get a refresh once per day.
Starting point is 00:39:34 Yeah. Good. Okay. Thank you for the idea. Yeah, absolutely. Thanks to our listener who gave us this idea. Yeah. And thank you, Nikolay. Thanks everyone for listening. See you next week. Thank you. Bye-bye.
