Postgres FM - Index-Only Scans

Starting point is 00:00:00 Hello, hello, this is PostgresFM. As usual, I don't remember episode number. Actually, I don't care. I know 100 something. And my name is Nikolai, founder of PostgresAI. And as usual, my co-host is Michael, Pidgey Mustard. Hi, Michael. Hello, Nikolai. How are you doing? I'm doing great. How are you? Yeah, good as well. I actually feel very good after therapy session we had last week. I'm glad to hear it. Yeah. So I know it's your choice.

Starting point is 00:00:33 I'm glad you chose very practical topic, technical, practical. So yeah, what is it? Yeah, I chose index-only scans. I was aiming for technical and practical, but I also wanted something that's still beginner-friendly. Like I think there's enough depth we can go into with index-only scans. I'm hoping there's something in here for everybody. But I definitely see some beginners not fully understand index-only scans, how to get them, or even knowing that they exist,

Starting point is 00:01:03 really truly understanding their benefits and also the downsides when they go wrong or when they aren't so efficient. So yeah, lots to cover, I think. But we can start nice and gently. Yeah, and since this topic is related to query optimization, it's your field, I would say, because of PgMaster. So I will let you to be in driver's seat.

Starting point is 00:01:32 Yeah, well, I feel like you are as much, probably more of an expert in this field than me still. But yes, it's definitely something I've had to look into a lot for the product I work on. And we've got a couple of tips around this within the product. But I actually wanted to start like super basic, like index only scans a type of scan. I think it's worth covering that we've got other scan types

Starting point is 00:01:54 and Postgres gets to choose between them in certain situations. We can cover like when it gets the choice of doing an index only scan, but I wanted to start with even the absolute basics. If for example, we don't have any indexes, the only scan option we have for a table is sequential scan. Yep.

Starting point is 00:02:11 Going through the pages in order, one at a time. Sequential or parallel sequential, if we distinguish them. True. You consider that a different type? Yeah, interesting. Well, technically it's different. The planner has to make a choice. True, true.

Starting point is 00:02:33 Oh, actually, this came up. Maybe this is too off topic, but do you think parallelism should be on by default, like with Postgres? Like max-parallel workers per gather? I think JIT should be off by default this is when it came up yeah i got beaten very badly again and it was in my like in my home i had bad bad bad bad bad situation i don't understand why jit is on i completely don't understand like it every it just workloads, and that's it.

Starting point is 00:03:10 So, yeah, I think random page cost should be closer to secbench. We will discuss this, right? Yeah. Random page cost adjustment, and we discussed it a few times. And as for parallelism, I saw Jeremy Schneider's opinion that maybe in some cases it's better to switch it off, to have more predictable behavior of query processing. Because if you, well, there are many things like that. If you have parallelism turned on, which is on by default, two additional workers, usually,

Starting point is 00:03:53 but there are several such things. And definitely it's our topic today. index only scans is unpredictable thing. And heap only tuple updates is unpredictable thing. And I saw quite recently I saw talking recently, I saw talking with customers, I saw opinion, which is probably reasonable opinion from from back end engineers, or like, oh, this is something quite unpredictable. We have many, many, many installations, various situations, different policies, infrastructures. So if we think we cannot, if it was only one cluster, we should think, oh, this is like our pet, we will take care of behavior closely, watch it closely. But if you have 1000s of setups, you cannot rely on such things. And maybe you will prefer, like, let's like raise this question.

Starting point is 00:04:41 And in the end, maybe we'll try to answer ourselves. So to have predictable behavior, maybe you should switch off parallelism, stop relying on heap-only tuple updates, hot updates, and also stop relying on index-only scans because they can degrade to regular index scans, right? Or even slightly worse, I think. So yeah, let's go back to basics. Basics.

Starting point is 00:05:10 Also, question. Do you feel the name is kind of off? If we had name choices made right now, maybe index only scans would be named as index scans, and index scans would be named, I don't know, like index heap scans or something, right? Because it's hard to, like, you need to explain it, and if people don't work with Postgres Planner every day,

Starting point is 00:05:38 they forget, and they expect index scan is what deals with only indexes, but then index-only scan, what is it? Yeah, that's a really good transition into what these are. Yeah, so index scans came first. And I think that's probably why we're stuck with that name. And in a lot of other database management systems, not including Postgres, you have index-organized tables. But in Postgres, we don't have that.

Starting point is 00:06:08 We have heap, and index is what is often called secondary. So to look something up via an index, we'll check first in the index, and then that will point us to the right place in the heap to find that row. Probably right.

Starting point is 00:06:23 Yeah, yes. At least page is right, but offset heap to find that row so you're right yeah yes at least pages right offset might be wrong right yeah well back to hot again let's keep it simple to start with yeah for simple yes for index points but it doesn't know if this tuple is is alive or dead this is yes this is why we need to check heap. Or even, I guess not even like alive or dead, but visible. Like it might be alive to new transactions. Right, visible to current transaction. Visible, exactly. Cool. So that becomes important in a moment when, in fact, you know when index index only scans were added to postgres i found

Starting point is 00:07:06 out it was it was not very long 9.2 yeah so yeah not that long like in the in the history i remember living without them them very long of course and this mantra like indexes don't store visibility information i first heard 20 years ago and it was a rule like we index scan must consult heap to understand visibility yes so and that and when looking back at in fact the wiki includes the postgres wiki includes really good entry on index only scans and explains that the majority of the work to add index only scans was work on the visibility map to make it crash safe so that i found that fascinating looking into it for this episode i didn't know that until yesterday so yeah super fascinating so the index only scan probably should actually explain what

Starting point is 00:07:58 it is and does if the data that we need for the query is already in the index, so very typical for a B-tree index, not for all index types, but in a B-tree index, we have the data that we're indexing in the index. If that's all we need for the query we're running... If we don't use select star, select asterisk, if we only select, for example, columns which are in the index, this is a condition, right? Yes, exactly.

Starting point is 00:08:27 So if the columns we're selecting are in the index, or, for example, if we're doing an aggregation like count star, we're not really selecting a column, but we... So yeah, there's a couple of cases. Yes, exactly. And if the index type supports it, like B-trees do, then we can get an index-only scan. So yeah, really cool feature.

Starting point is 00:08:54 This is like, okay, we deal only with values which are present in index keys. Index keys maybe can be called. Maybe not. Maybe it's a bad idea to call that. But yeah, the planner might choose, may choose or may not choose index-only scan. It still may think index scan is better because the second component,

Starting point is 00:09:21 like I agree with everything you said so far, but then you say we don't consult heap well in in general case index only scan consults heap because uh we don't have visibility information still right but uh it in some cases it skips it because of do you count the visibility map as the heap? Yes. No, no, no, no, no. I mean, we check visibility map and see what beat it has in terms of all visa bit for the for the page and heap we need. If it is if it's if that page is marked as all visible, we like this is a win but it's not guaranteed it may be not marked

Starting point is 00:10:08 as all visible in this case it's kind of degradation to behavior of index scan for this particular value of in reference to tuple right and we need to go to to heap and and see first see if this tuple is visible to our session and second probably we will jump inside page to proper position because of heap only tuple chain right but yeah that's it so i mean you everything is right but uh it's not guaranteed that we won't consult heap yeah good yes exactly good point um so heap this is you'll see this in explain plans as heap fetches if you're running explain analyze even if buffers is not used in explain as buffers yes even if you skip buffers this kind of buffers you still get. Yeah. In fact, imagine if you didn't have heap fetches and you only had buffers,

Starting point is 00:11:14 you'd have to be running it twice to spot these, like once. Anyway, so I really like that they call this out explicitly. It's really helpful. And as you mentioned, it's really helpful and as you mentioned like let's say we're we're scanning many many rows for our index only scan so we're doing like a range scan across maybe returning 100 rows or something and we only have to do one or two heap fetches that can still be a huge win it's only if the proportion grows to like a decent. So, yeah, two big benefits. Let's go back to like why even have, why even bother with all that work to make the visibility map crash safe?

Starting point is 00:11:53 Why even go to the trouble of doing this? It's not only the benefit of not having to do those reads on the heap, but also I think data locality. If we're doing a range scan, the hundreds values on the index are likely in a much smaller range of index pages than they would be heap pages. I mean, if they're all inserted in order

Starting point is 00:12:19 and there have been very few updates, maybe they are, of course, only a relatively small number of pages but they could be anywhere it could be really random reads so i think historically especially if you consider when this was added doing those random reads on the heap could be uh quite a lot more expensive than doing the well reads right and uh if uh exactly if there is no data locality each tuple sits in its own page, basically buffer and we have a lot of buffer operations additionally, to just to check

Starting point is 00:12:52 visibility, basically, in this case, if we if we have all values already in index, this a lot of buffer operations just to check visibility, it feels super inefficient and index only scans is just an optimization of index scans. But they'd be super good if visibility, like if you have, for example, if you just loaded your data into a table, run a vacuum once on the table, and don't do any writes anymore. This is perfect situation, heap fetches will be zero, the best case for index-only scans. But in the worst situation,

Starting point is 00:13:34 you will have the maximum number of heap fetches, and this should be maybe the same number of buffer operations as index scans. Plus, you've had to check the visibility map, which admittedly is not many reads. Like a visibility map is tiny, but you don't have to check that if you're doing an index scan. Oh, so you think index-only scan in this extreme case

Starting point is 00:13:58 is worse than index scan? Tiny bit. I know Planner might decide. I think based on the state of visibility map, in some cases it might decide to choose index scan. Even the main condition is met. I don't know that it does. In theory, that would make sense.

Starting point is 00:14:21 I don't know for sure that it does. It's worth checking. It's a simple experiment. Worth checking and looking at buffers. For simplicity, number one thing to memorize is index-only scan. If you have AutoVac untuned, index-only scan is the best what you can get from Postgres. For example, for aggregate, select count, things like that. If you get index-only scan, even if it's like a lot, a lot of rows,

Starting point is 00:14:53 this is the best. Well, I completely agree. Index-only scan and heap fetch is zero. This is an ideal situation. And one more condition. Rows removed by filter zero as well. So I see sometimes people get excited because they see an index scanned or an index-only scan, and then they stop because they think, I've got an index-only scan, but it's returning 100 rows for every... Just one row, right?

Starting point is 00:15:22 Yeah, exactly. So there is this one other gotcha that sometimes catches beginners out you would never be caught out by that but some people see index only scan they think great i'm done and actually it's still hideously inefficient so yeah that they're the they're the two things we look out for okay so yes uh rephrasing this, you really need those rows that index only scan brought to you, right? Yes. You really need them. Because if you see 99% was filtered out, it means those rows were not needed in the first place,

Starting point is 00:15:59 and you just need to find better index, for example. Exactly. Also, let's touch this thing. You created index, you run vacuum, no writes yet. You run select count star from this table or select column, indexed column, like ID, for example. Yeah. Which technically is the same because it cannot be null.

Starting point is 00:16:21 We know count won't count rows, which is null. And star cannot be null. We know count won't count rows which is null. And star cannot be null even if or like it's complicated. Maybe we should save it for another episode about counts, right? But you run it, but you still see sequential scan in the plan. This is happening all the time, right? That's a good point. Yeah, good point. Or parallel sequential scan. Or parallel sequential scan. Why? Index is better, right?

Starting point is 00:16:50 Because index-only scan is better. I know it. It's much fewer buffer hits or reads or both. Right? What to do? What to do? You know the answer, right? Yeah. So before we jump straight to it you can use

Starting point is 00:17:09 if you want to ask this kind of question yourself there's like some parameters that we can set locally to help us diagnose these things so we could use enable sex scan and set that to false briefly and then hopefully that will encourage an index-only scan. We can compare the costs, and once you compare the costs, it should become clear. How to compare the costs? Disable scan and look? Run, explain, analyze, select counts, or maybe with buffers.

Starting point is 00:17:40 But you don't need it. In fact, you don't even need analyze. Yeah, yeah. Just because we want to compare planning cost. And if sequential scan was chosen, it means the cost was lower than for index only scan. Yes, exactly. And then I'm guessing where you're going with this is probably random page cost. Right. This happens if you have default random page cost. And it doesn't make sense if you have fast disks. Random access is close to sequential access on SSD and VME, right? In memory, it's also so.

Starting point is 00:18:14 You might have very slow magnetic disks, but database is well cached, and you rely on it somehow, right? It also can happen still. In this case, random page cost is 4 compared to sequential page cost is 1. And to walk in B3, it's random access compared to page-by-page reading of heap sequentially. So index only scan, the planner looks at random page cost. If it's four times more expensive, it might prefer sequential cost.

Starting point is 00:18:50 Especially if it can do it in parallel. Yes, yes. Well, index scan and index-only scan also can be parallel. Oh, true. Good point. Kind of like we have duplicated. Yeah, actually, maybe that doesn't matter. Yeah, good point. Bitmap scan, index scan, index only scan, sec scan, and then parallelized versions

Starting point is 00:19:09 of them, right? Yeah, I skipped bitmap scan to keep things simpler. But maybe we should do a whole episode on that because it is interesting enough. Yeah. And for index scan, parallelization cannot be, as I remember, it cannot be applied to navigating inside B3, inside the tree, going from root to leaf. But it can be applied to fetching leaf pages in tree and also accessing heap and also traversing leaf nodes so it has bidirectional list for all the leaf nodes allowing not going up and down all the time like not starting from root for each entry right we just need the first entry and then i think it can be paralyzed dealing with leaf nodes. This is what I remember.

Starting point is 00:20:07 I might be slightly wrong about internals here. But again, index-only scan is the best if you need a lot of rows. For select count star, you might still see sequential scan. It's a good sign you have not tuned random page cost, which we recommend to everyone to do sooner because later it will be more feel it will feel more risky if you have yeah yeah yeah scarier because you have a heavy workload and changing this might flip your plans in some unpredictable manner. So, yeah, it's a big task. But if you do it early, live with it,

Starting point is 00:20:53 it feels more reasonable than keeping defaults. Defaults, yeah. And did you have cases when you see this, okay, cost for sequential scan is lower, that's why I chose it. But when you say enable sex count to off, giving some huge penalty to sequential operations in Planner's mind, then you see timing and buffer

Starting point is 00:21:12 operations so much better for index on this scan. Like you think oh wow, Planner is so wrong here. You had it. I have it all the time. Yeah. Right. And this is a good sign we need to change random page cost by default in all databases, basically.

Starting point is 00:21:28 Yeah. I think we've discussed this in previous episodes, but in the past I've seen, like, conservatively people go down to two, but more aggressively… I mean, ah, not two. I was thinking, why default is still four? You know, like, why is it still 4, the default? Yeah, I don't know. I've seen people mention that they don't want,

Starting point is 00:21:53 like people have old systems and they don't want to change those. Old systems already exist. I agree with you, but that is the argument i've heard defaults are applied to new systems new systems i understand are on ssd mostly but that's what people say and i actually think also even if they didn't like also you know it's not it's not about old systems versus new systems it's about ssds versus spinning disc like it's it's about this type mostly most people aren't setting up new systems on spinning like most people setting up yeah so in that case why are we forcing the majority to have to change their default instead of like default should be for the majority

Starting point is 00:22:39 yeah it's second place after jit or maybe default shared buffers third place third place goes to random page cost let's have maybe some chart of like defaults we don't like we would like to be changed by the way things are changing slowly yesterday I learned that pgdump is not a backup tool anymore it's off topic but it's very similar to yeah so many years of discussing and discussing it's like it's not a backup you're like swimming against current all the time like you

Starting point is 00:23:16 say something but you open documentation and it says this and you know every experienced guy says this but still so defaults are similar. So Postgres until version 18, in documentation stated, pgdump is a backup tool. Like first sentence about pgdump.

Starting point is 00:23:38 And finally, like five days ago, Peter Eisenkraut committed the change. pgdump is a tool to export PostgreSQL databases. Yay! Not backup. But this will change only in a year, in PostgreSQL 18. But it's already a win. So good. I hope defaults also...

Starting point is 00:23:57 They are slowly changing. Log checkpoint was on. Workmem. Hashmem multiplier has been... Workmem has been increasing very slowly over. Yeah. Hashmem multiplier has been, like Workmem's been increasing very slowly over the years. Hashmem multiplier doubled in like not that long ago. Like there was. AutoVacuum default throttling 10 times,

Starting point is 00:24:17 but it was many years already ago. Yeah, slowly it's changing. It's good. So random patch cost, most people say decrease it. 1.1 maybe. You said some guys decide to go with 2. Well, I can

Starting point is 00:24:34 say some guys decide to put 1 there, but put 4 to sec patch cost. What? Yes. Okay, so I've seen sensible people that have done testing, I've seen said anywhere between TechPatchCost. What? Yes. Okay. So I've seen sensible people that have done testing. I've seen set anywhere between 1.1 and 2. And I tended to go closer to 1.1 if the performance results.

Starting point is 00:24:56 Crunchy, which after we did a episode about something, they tested it and published some details. Maybe we'll find a touch. So their benchmarks showed that 1.1, in their case, I think it's AWS and GCP or something, right? In their case, they found out that 1.1 is slightly better than 1.0, according to some benchmarks. Not mine.

Starting point is 00:25:24 I don't know. Maybe. 1.0 according to some benchmarks not mine i don't know maybe so since then i also got kind of like put 1.1 before i i tried to put 1.0 1.0 so i don't know like 1.1 is good to start right enough maybe here let's go through a few like other recommendations okay i think this is like one recommendation of something that you can do to encourage index only scans or like recommendations that are around index only scans we did a whole episode for example on over indexing and i think that's relevant here before we go to over indexing one second it's super important to tune out to vacuum because otherwise hip fetches will be super high, and you need auto-vacuum. Auto-vacuum is not only about vacuuming

Starting point is 00:26:06 or collection of statistics or fighting with transaction and multi-exact ID wraparounds. It's also important because it maintains visibility maps. Yes, and it's the only thing that maintains visibility. It's the only thing that can set those bits. Directly, yes. I think this is important because any like let's say we've got a page in postgres that has 20 tuples on it like really common to

Starting point is 00:26:33 have like a couple of dozen of tuples on a page if any of those rows are changed the visibility of that like the bit gets out of the of the game yeah this page yeah it goes from one to zero right it goes it goes from all visible to not all visible or not guaranteed to be all visible and no no it's not it's not all visible that's it like we we don't know but it's already cannot be considered all visible that's it yeah technically these tuples might be visible to everyone auto vacuum just didn't mark it didn't get there yeah yeah so what welcome needs needs to do this as well that's why frequent visits of tables are so important not only because of deleting that tuples yes now is i guess it is possible that you will never benefit from index-only scans if you have maybe a small table that the rows are constantly being updated.

Starting point is 00:27:30 But most of us have tables where, or at least the cases we're thinking about here, you have some data that's pretty old that doesn't change often that we do still want to read and like in those cases as long as you can yeah tuning auto vacuum super valuable for maintaining the performance of index only scans yeah right what um is there anything else around that like i actually shared just recently in my newsletter an old post by thomas vondra i thought i revisited it he's updated it for the latest versions of Postgres. I'll share it again because I think it's so good on YouTube. Oh, yeah. It's also so. That's great. I actually expected this because after they acquired Second Quandrant,

Starting point is 00:28:12 they broke that blog and some links didn't work. I used archive.org to fetch some old pages. That's great. This post needed to be updated. I also think they should be part of documentation. Nice. Well, high praise. Yeah. So, but documentation tends to avoid good how-tos and just like kind of so far, like only basics. And yeah, there are how-to notes and documentation, but they are scattered among general documentation reference style and so on. Yeah, so I wish documentation grew like huge how-to section,

Starting point is 00:28:53 but it's a different story. But let's not lose this important point in this context. So autovacuum maintains visibility map, which has this all visible. It also has all frozen, but it's a different story. All visible, two bits for each page. And if we have a lot of writes, like many writes are happening, many pages are having zero for all visible bit. And what's important here to understand that partitioning is so good yeah right like if

Starting point is 00:29:28 you just consider unpartitioned one terabyte table and partitioned one where old partitions like most of writes go to the latest or few latest partitions and all partitions are like they can be they can receive rights but it happens rarely in vacuum aggressively fairly frequently visit such pages so okay some right happened in all partition but I immediately mark this page all visible and it's only one page in this partition that's why like archive archive data locality and archive kind partitions, old ones and only new or the latest receives all inserts, some updates they usually for fresh data. It means this like all partitions most of the time will be all visible for all their pages. This is so good if massive reads are needed for all data. Without partitioning, any page can suddenly receive a write, even if all rows on this page are five years old.

Starting point is 00:30:36 And then suddenly, oh, there is some place here, let's insert new tuple. And it's out of, so everything is shuffled, right? New data and old data, they can live in the same page. And for visibility map, it's kind of a lot of work for autovacuum and visibility map is most of the time is kind of like in bad shape. So what I'm trying to say,

Starting point is 00:31:03 partitioning is needed, not only because indexes become smaller this this reason is like over years i realized this one of the smallest reasons for me because b3 height growth very slow like i mean you know right but uh this particular thing this may be number one reason since recently I realized it. Wow, I think so. For partitioning, we can say, okay, this partition is barely touched in terms of writes. We just read it because people want to see archive data.

Starting point is 00:31:37 But reading is well organized most of the time. If we need counts or something it's index index only scans with heap fetches close very close to zero right without partitioning impossible right i mean and unpredictable less predictable also yeah i mean in the case you're talking about where it's appended mostly and it's to the right of the... I've definitely seen cases where you don't touch old data anyway, whether it's partitioned or not. But I definitely see this as yet another benefit of partitioning. I'm surprised you said you see it as the biggest, but that's really cool. Well, actually, I think if we, for example,

Starting point is 00:32:27 decide to create additional indexes to support index-only scans, and we will talk about methods right now, right? And index-right amplification, I know we plan it. But if you, for example, decide... Usually, we keep the same sets of indexes on all partitions, right? But technically, you can create an index for

Starting point is 00:32:43 a particular partition. And if, for example, you created it for archive-style partitions to support index-only scan, to prioritize them for reads, and you maybe don't care about write amplification because, again, these partitions barely receive writes, and hot updates also we don't care we care about we so we might decide to create more indexes for all partitions uh to and benefit from yeah i've seen that but it's a cool idea it's just like i don't know maybe it's a crazy idea maybe we'll have some walls hit uh if we implement it but seems reasonable and for fresh i never did it like just because for fresh partition you avoid creating these indexes because you want hot updates hot updates occurring so right yeah let's let's maybe dive into details a little bit i yeah well or we could

Starting point is 00:33:41 point people towards the we did a whole episode onexing, and we did another one on hot updates. So let's point people at those, because I think there's a couple more. Definitely want to get your thoughts on the difference between multi-column index and include, like when you use each. When it was released, again, like 11, 12, 10, 10, 11, I don't remember. many some many years already

Starting point is 00:34:08 many years ago and i was thinking is it only for unique indexes honestly i remember i had anastasia albenicova presented on postgres tv details some details and yeah or maybe it was interview because she participated in this include keyword covering indexes right so my my understanding we only need like we it might be size difference also right because if if we want to put in a new column to have multi-column index, one more column we just included, it will participate in structure, right? Yes. But if we say include, it's just additionally stored in leaf nodes, right?

Starting point is 00:34:54 Leaf page, exactly. Yeah. And it means that kind of for, to achieve index-only scans, both cases work, but first case can be also used since it's in structure, it can be used to verify uniqueness

Starting point is 00:35:10 or maybe also constrain exclusion. Exclusion constraints, right? Maybe for range types. It's a different story, but definitely for uniqueness. And I now realized actually we cannot put columns of some data type to multi-column index, but we can put them to include. This is also a benefit of include.

Starting point is 00:35:34 So unique constraint, we don't want to put index because it will break the logic of uniqueness. Right? But we can put additional column to include and have index only scan, not changing logic of unique constraint. I see what you mean. But also to include, we can put data types which are not B3 friendly, which are not supported by B3. Like what?

Starting point is 00:36:02 Array, for example, right? Yeah, yeah, yeah. Polygons or something like points. Although you do need to be careful with size, right? Of course, yes, of course, size matters. But imagine we have like, point two numbers, right? So we can put it to include and the index only scans will happen. I've got one more thing that I wanted to make sure we mentioned, which was something practical and very beginner-friendly. In fact, probably the main thing that I see stopping index-only scans

Starting point is 00:36:33 in the real world, which is people selecting more data than they need. And I know it's kind of obvious, but it's really common in RRMs to have select star be the default, which is almost always going to be giving you more columns than you actually need for that specific query. But I see people doing it like people deleting features, for example, like changing how a page is structured in an application. And they no longer need all the data from the the queries returning but don't change the query like there's there's so many ways it can creep into people's code that they no longer need all the data they're returning so being like careful on that front and also just

Starting point is 00:37:16 remembering that there are these optimizations if you can be even if the application does currently use all of the columns do we need to like what value is that adding at the moment right can you make things much faster by not including that information at that point right yes and finally maybe let's mention also hot updates i will already like kind of mentioned that indirectly but directly like you have, if you need to put extra one more column to index, either to, doesn't matter if it's to extend this multi-column list or to include, to have coverage for your selects,

Starting point is 00:37:59 you can lose hotness of some updates, right? Because even if it's include, still, if this column is updated frequently, these updates cannot be hot. They will be much slower. They will be regular updates. So it's better sometimes to avoid this. So this trade-off here, you won't select only scans, but you lose hot updates. What's either selects scans, but you lose hot updates. What's either?

Starting point is 00:38:28 Selects or writes? I mean, updates. Yeah. I know this is very off-topic, but do you know heap-only tuple updates are no longer heap-only in some cases? Like in Brint, there's some changes to Brin indexes. Have you seen this? No, I haven't seen it.

Starting point is 00:38:43 And now it updates the Brin index. I thought that was funny. Yeah, yeah. Okay, so I think we covered enough, right? Yeah, 100%. For iOS. When I hear iOS, I think index-only scans, right? That's so funny.

Starting point is 00:39:01 Are you on Android or iOS? I'm on iOS. My phone, I mean. Index only scans. Cool, cool. Yeah. Yeah. Okay.

Starting point is 00:39:10 Good. See you next week. Bye. See you next week. Bye.

Your Ad Here

Postgres FM - Index-Only Scans

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.