Postgres FM - Index-Only Scans
Episode Date: August 9, 2024Nikolay and Michael discuss Index-Only Scans in Postgres — what they are, how they help, some things to look out for, and some advice. Here are some links to things they mentioned:Index-On...ly Scans and Covering Indexes (docs) https://www.postgresql.org/docs/current/indexes-index-only-scans.htmlDiscussion on Twitter about JIT and Parallel Query defaults https://x.com/jer_s/status/1819749688184373742Postgres Wiki on Index-Only Scans https://wiki.postgresql.org/wiki/Index-only_scansHeap Fetches https://www.pgmustard.com/docs/explain/heap-fetchesRows Removed By Filter https://www.pgmustard.com/docs/explain/rows-removed-by-filterrandom_page_cost https://postgresqlco.nf/doc/en/param/random_page_cost/pg_dump docs change https://x.com/samokhvalov/status/1820539826363588755Crunchy Bridge changed random_page_cost to 1.1 for new servers https://docs.crunchybridge.com/changelog#postgres_random_page_cost_1_1Autovacuum Tuning Basics (updated blog post by Tomas Vondra) https://www.enterprisedb.com/blog/autovacuum-tuning-basicsOur episode on over-indexing https://postgres.fm/episodes/over-indexingOur episode on HOT updates https://postgres.fm/episodes/hot-updatesOur episode on partitioning https://postgres.fm/episodes/partitioning~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith special thanks to:Jessie Draws for the elephant artworkÂ
Transcript
Discussion (0)
Hello, hello, this is PostgresFM. As usual, I don't remember episode number. Actually, I don't care. I know 100 something. And my name is Nikolai, founder of PostgresAI. And as usual, my co-host is Michael, Pidgey Mustard. Hi, Michael.
Hello, Nikolai. How are you doing?
I'm doing great. How are you?
Yeah, good as well.
I actually feel very good after therapy session we had last week.
I'm glad to hear it.
Yeah.
So I know it's your choice.
I'm glad you chose very practical topic, technical, practical.
So yeah, what is it?
Yeah, I chose index-only scans.
I was aiming for technical and practical, but I also wanted something that's still beginner-friendly.
Like I think there's enough depth we can go into with index-only scans.
I'm hoping there's something in here for everybody.
But I definitely see some beginners not fully understand index-only scans,
how to get them, or even knowing that they exist,
really truly understanding their benefits
and also the downsides when they go wrong
or when they aren't so efficient.
So yeah, lots to cover, I think.
But we can start nice and gently.
Yeah, and since this topic is related to query optimization,
it's your field, I would say, because of PgMaster.
So I will let you to be in driver's seat.
Yeah, well, I feel like you are as much,
probably more of an expert in this field than me still.
But yes, it's definitely something I've had to look into a lot
for the product I work on.
And we've got a couple of tips around this within the product.
But I actually wanted to start like super basic,
like index only scans a type of scan.
I think it's worth covering that we've got other scan types
and Postgres gets to choose between them
in certain situations.
We can cover like when it gets the choice
of doing an index only scan,
but I wanted to start with even the absolute basics.
If for example, we don't have any indexes,
the only scan option we have for a table is sequential scan.
Yep.
Going through the pages in order, one at a time.
Sequential or parallel sequential, if we distinguish them.
True.
You consider that a different type?
Yeah, interesting.
Well, technically it's different.
The planner has to make a choice.
True, true.
Oh, actually, this came up.
Maybe this is too off topic,
but do you think parallelism should be on by default,
like with Postgres?
Like max-parallel workers per gather?
I think JIT should be off by default this is when it came up yeah i got beaten very badly again and it was in my like in my home
i had bad bad bad bad bad situation i don't understand why jit is on i completely don't
understand like it every it just workloads, and that's it.
So, yeah, I think random page cost should be closer to secbench.
We will discuss this, right?
Yeah.
Random page cost adjustment, and we discussed it a few times.
And as for parallelism, I saw Jeremy Schneider's opinion that maybe in some cases it's better to switch it off,
to have more predictable behavior of query processing.
Because if you, well, there are many things like that.
If you have parallelism turned on, which is on by default, two additional workers, usually,
but there are several such things. And definitely it's our topic today. index only scans is unpredictable thing. And heap
only tuple updates is unpredictable thing. And I saw
quite recently I saw talking recently, I saw talking with customers, I saw opinion,
which is probably reasonable opinion from from back end engineers, or like, oh, this is something
quite unpredictable. We have many, many, many installations, various situations, different
policies, infrastructures. So if we think we cannot, if it was only one cluster, we should think, oh, this is like our
pet, we will take care of behavior closely, watch it closely. But if you have 1000s of setups,
you cannot rely on such things. And maybe you will prefer, like, let's like raise this question.
And in the end, maybe we'll try to answer ourselves.
So to have predictable behavior, maybe you should switch off parallelism,
stop relying on heap-only tuple updates, hot updates,
and also stop relying on index-only scans
because they can degrade to regular index scans, right?
Or even slightly worse, I think.
So yeah, let's go back to basics.
Basics.
Also, question.
Do you feel the name is kind of off?
If we had name choices made right now,
maybe index only scans would be named as index scans,
and index scans would be named, I don't know,
like index heap scans or something, right?
Because it's hard to, like, you need to explain it,
and if people don't work with Postgres Planner every day,
they forget, and they expect index scan is what deals with only indexes,
but then index-only scan, what is it?
Yeah, that's a really good transition into what these are.
Yeah, so index scans came first.
And I think that's probably why we're stuck with that name.
And in a lot of other database management systems,
not including Postgres, you have index-organized tables.
But in Postgres, we don't have that.
We have heap, and index is
what is often called secondary.
So to look something up
via an index, we'll check first in the index,
and then that will
point us to the right place in the heap
to find that row.
Probably right.
Yeah, yes. At least page is right, but offset heap to find that row so you're right yeah yes at least pages right offset might be wrong right
yeah well back to hot again let's keep it simple to start with yeah for simple yes for index points
but it doesn't know if this tuple is is alive or dead this is yes this is why we need to check heap.
Or even, I guess not even like alive or dead, but visible.
Like it might be alive to new transactions.
Right, visible to current transaction.
Visible, exactly. Cool.
So that becomes important in a moment when, in fact, you know when index index only scans were added to postgres i found
out it was it was not very long 9.2 yeah so yeah not that long like in the in the history i remember
living without them them very long of course and this mantra like indexes don't store visibility
information i first heard 20 years ago and it was a rule like we index scan
must consult heap to understand visibility yes so and that and when looking back at in fact the
wiki includes the postgres wiki includes really good entry on index only scans and explains that
the majority of the work to add index only scans was work on the visibility map to make it
crash safe so that i found that fascinating looking into it for this episode i didn't know that until
yesterday so yeah super fascinating so the index only scan probably should actually explain what
it is and does if the data that we need for the query is already in the index,
so very typical for a B-tree index, not for all index types,
but in a B-tree index, we have the data that we're indexing in the index.
If that's all we need for the query we're running...
If we don't use select star, select asterisk,
if we only select, for example, columns which are in the index,
this is a condition, right?
Yes, exactly.
So if the columns we're selecting are in the index,
or, for example, if we're doing an aggregation like count star,
we're not really selecting a column, but we...
So yeah, there's a couple of cases.
Yes, exactly.
And if the index type supports it, like B-trees do,
then we can get an index-only scan.
So yeah, really cool feature.
This is like, okay, we deal only with values which are present in index keys.
Index keys maybe can be called.
Maybe not.
Maybe it's a bad idea to call that.
But yeah, the planner might choose,
may choose or may not choose index-only scan.
It still may think index scan is better
because the second component,
like I agree with everything you said so far,
but then you say
we don't consult heap well in in general case index only scan consults heap because uh
we don't have visibility information still right but uh it in some cases it skips it
because of do you count the visibility map as the heap? Yes. No, no, no, no, no. I mean, we check visibility map and see
what beat it has in terms of all visa bit for the for the page
and heap we need. If it is if it's if that page is marked as
all visible, we like this is a win but it's not guaranteed it may be not marked
as all visible in this case it's kind of degradation to behavior of index scan for this particular
value of in reference to tuple right and we need to go to to heap and and see first see if this tuple is visible to our session and second probably we
will jump inside page to proper position because of heap only tuple chain right but yeah that's it
so i mean you everything is right but uh it's not guaranteed that we won't consult heap
yeah good yes exactly good point um so heap this is you'll see this in explain plans as heap
fetches if you're running explain analyze even if buffers is not used in explain as buffers yes
even if you skip buffers this kind of buffers you still get.
Yeah. In fact, imagine if you didn't have heap fetches and you only had buffers,
you'd have to be running it twice to spot these, like once. Anyway, so I really like that they call
this out explicitly. It's really helpful. And as you mentioned, it's really helpful and as you mentioned like
let's say we're we're scanning many many rows for our index only scan so we're doing like a range
scan across maybe returning 100 rows or something and we only have to do one or two heap fetches
that can still be a huge win it's only if the proportion grows to like a decent.
So, yeah, two big benefits.
Let's go back to like why even have,
why even bother with all that work to make the visibility map crash safe?
Why even go to the trouble of doing this?
It's not only the benefit of not having to do those reads on the heap,
but also I think data locality.
If we're doing a range scan,
the hundreds values on the index
are likely in a much smaller range of index pages
than they would be heap pages.
I mean, if they're all inserted in order
and there have been very few updates,
maybe they are, of course,
only a relatively small number of pages
but they could be anywhere it could be really random reads so i think historically especially
if you consider when this was added doing those random reads on the heap could be uh quite a lot
more expensive than doing the well reads right and uh if uh exactly if there is no data locality
each tuple sits in its own page,
basically buffer and we have a lot of buffer operations additionally, to just to check
visibility, basically, in this case, if we if we have all values already in index,
this a lot of buffer operations just to check visibility, it feels super inefficient and index only scans is
just an optimization of index scans. But they'd be super good if visibility, like if you have,
for example, if you just loaded your data into a table, run a vacuum once on the table,
and don't do any writes anymore.
This is perfect situation, heap fetches will be zero,
the best case for index-only scans.
But in the worst situation,
you will have the maximum number of heap fetches,
and this should be maybe the same number
of buffer operations as index scans.
Plus, you've had to check the visibility map,
which admittedly is not many reads.
Like a visibility map is tiny,
but you don't have to check that if you're doing an index scan.
Oh, so you think index-only scan in this extreme case
is worse than index scan?
Tiny bit.
I know Planner might decide.
I think based on the state of visibility map,
in some cases it might decide to choose index scan.
Even the main condition is met.
I don't know that it does.
In theory, that would make sense.
I don't know for sure that it does.
It's worth checking. It's a simple experiment.
Worth checking and looking at buffers.
For simplicity, number one thing to memorize
is index-only scan. If you have AutoVac untuned, index-only
scan is the best what you can get from Postgres.
For example, for aggregate, select count,
things like that. If you get index-only scan, even if it's like a lot, a lot of rows,
this is the best. Well, I completely agree. Index-only scan and heap fetch is zero. This is
an ideal situation. And one more condition. Rows removed by filter zero as well.
So I see sometimes people get excited
because they see an index scanned or an index-only scan,
and then they stop because they think,
I've got an index-only scan,
but it's returning 100 rows for every...
Just one row, right?
Yeah, exactly.
So there is this one other gotcha
that sometimes catches beginners out you would never be caught out by that but some people see
index only scan they think great i'm done and actually it's still hideously inefficient so
yeah that they're the they're the two things we look out for okay so yes uh rephrasing this, you really need those rows that index only scan brought to you, right?
Yes.
You really need them.
Because if you see 99% was filtered out, it means those rows were not needed in the first place,
and you just need to find better index, for example.
Exactly.
Also, let's touch this thing.
You created index, you run vacuum, no writes yet.
You run select count star from this table
or select column, indexed column, like ID, for example.
Yeah.
Which technically is the same because it cannot be null.
We know count won't count rows, which is null. And star cannot be null. We know count won't count rows which is null. And
star cannot be null even if or like it's complicated. Maybe we should save it for another episode
about counts, right? But you run it, but you still see sequential scan in the plan. This
is happening all the time, right?
That's a good point. Yeah, good point. Or parallel sequential scan.
Or parallel sequential scan.
Why?
Index is better, right?
Because index-only scan is better.
I know it.
It's much fewer buffer hits or reads or both.
Right?
What to do?
What to do?
You know the answer, right?
Yeah. So before we jump straight to it you can use
if you want to ask this kind of question yourself there's like some parameters that we can set
locally to help us diagnose these things so we could use enable sex scan and set that to false
briefly and then hopefully that will encourage an index-only scan.
We can compare the costs, and once you compare the costs,
it should become clear.
How to compare the costs?
Disable scan and look?
Run, explain, analyze, select counts, or maybe with buffers.
But you don't need it.
In fact, you don't even need analyze.
Yeah, yeah.
Just because we want to compare planning cost. And if sequential scan was chosen, it means the cost was lower than for index only
scan. Yes, exactly. And then I'm guessing where you're going with this is probably random page
cost. Right. This happens if you have default random page cost. And it doesn't make sense if you have fast disks.
Random access is close to sequential access on SSD and VME, right?
In memory, it's also so.
You might have very slow magnetic disks,
but database is well cached,
and you rely on it somehow, right?
It also can happen still.
In this case, random page cost is 4 compared to sequential page cost is 1.
And to walk in B3, it's random access compared to page-by-page reading of heap sequentially.
So index only scan, the planner looks at random page cost. If it's four times more expensive,
it might prefer sequential cost.
Especially if it can do it in parallel.
Yes, yes.
Well, index scan and index-only scan also can be parallel.
Oh, true. Good point.
Kind of like we have duplicated.
Yeah, actually, maybe that doesn't matter.
Yeah, good point.
Bitmap scan, index scan, index only scan, sec scan, and then parallelized versions
of them, right?
Yeah, I skipped bitmap scan to keep things simpler.
But maybe we should do a whole episode on that because it is interesting enough.
Yeah.
And for index scan, parallelization cannot be, as I remember, it cannot be applied to navigating inside B3, inside the tree, going from root to leaf.
But it can be applied to fetching leaf pages in tree and also accessing heap and also traversing leaf nodes so it has bidirectional list for all the leaf nodes
allowing not going up and down all the time like not starting from root for each entry right
we just need the first entry and then i think it can be paralyzed dealing with leaf nodes. This is what I remember.
I might be slightly wrong about internals here.
But again, index-only scan is the best if you need a lot of rows.
For select count star, you might still see sequential scan.
It's a good sign you have not tuned random page cost,
which we recommend to everyone to do sooner because later it will be more
feel it will feel more risky if you have yeah yeah yeah scarier because you have a heavy workload and
changing this might flip your plans in some unpredictable manner. So, yeah, it's a big task.
But if you do it early, live with it,
it feels more reasonable than keeping defaults.
Defaults, yeah.
And did you have cases when you see this,
okay, cost for sequential scan is lower,
that's why I chose it. But when you say enable sex count to off,
giving some huge penalty to sequential operations
in Planner's mind, then
you see timing and buffer
operations so much better
for index on this scan. Like you think
oh wow, Planner is so wrong here.
You had it. I have it all the time.
Yeah. Right. And this is a good sign
we need to change random page cost
by default in all databases,
basically.
Yeah.
I think we've discussed this in previous episodes, but in the past I've seen, like, conservatively
people go down to two, but more aggressively…
I mean, ah, not two.
I was thinking, why default is still four?
You know, like, why is it still 4, the default?
Yeah, I don't know.
I've seen people mention that they don't want,
like people have old systems and they don't want to change those.
Old systems already exist.
I agree with you, but that is the argument i've heard defaults are applied to new
systems new systems i understand are on ssd mostly but that's what people say and i actually think
also even if they didn't like also you know it's not it's not about old systems versus new systems
it's about ssds versus spinning disc like it's it's about this type mostly most people aren't setting up
new systems on spinning like most people setting up yeah so in that case why are we forcing the
majority to have to change their default instead of like default should be for the majority
yeah it's second place after jit or maybe default shared buffers third place third place goes to random
page cost let's have maybe some chart of like defaults we don't like we would like to be changed
by the way things are changing slowly yesterday I learned that pgdump is not a backup tool anymore it's off topic
but it's very similar to yeah so many
years of discussing and discussing
it's like it's not a backup
you're like swimming
against current all the time like you
say something but you open documentation
and it says this and you know
every experienced guy says
this but still so
defaults are similar.
So Postgres until version 18, in documentation stated,
pgdump is a backup tool.
Like first sentence about pgdump.
And finally, like five days ago, Peter Eisenkraut committed the change.
pgdump is a tool to export PostgreSQL databases.
Yay! Not backup.
But this will
change only in a year,
in PostgreSQL 18. But it's already
a win. So good.
I hope defaults also...
They are slowly changing.
Log checkpoint was on.
Workmem.
Hashmem multiplier has been... Workmem has been increasing very slowly over. Yeah. Hashmem multiplier has been,
like Workmem's been increasing very slowly over the years.
Hashmem multiplier doubled in like not that long ago.
Like there was.
AutoVacuum default throttling 10 times,
but it was many years already ago.
Yeah, slowly it's changing.
It's good.
So random patch cost, most people say
decrease it.
1.1 maybe. You said some
guys decide to go with 2.
Well, I can
say some guys decide to
put 1 there, but
put 4 to
sec patch cost.
What? Yes.
Okay, so I've seen sensible people that have done testing, I've seen said anywhere between TechPatchCost. What? Yes. Okay.
So I've seen sensible people that have done testing.
I've seen set anywhere between 1.1 and 2. And I tended to go closer to 1.1 if the performance results.
Crunchy, which after we did a episode about something,
they tested it and published some details.
Maybe we'll find a touch.
So their benchmarks showed that 1.1, in their case,
I think it's AWS and GCP or something, right?
In their case, they found out that 1.1 is slightly better than 1.0,
according to some benchmarks.
Not mine.
I don't know.
Maybe. 1.0 according to some benchmarks not mine i don't know maybe so since then i also got kind of like
put 1.1 before i i tried to put 1.0 1.0 so i don't know like 1.1 is good to start right
enough maybe here let's go through a few like other recommendations okay i think this is like
one recommendation of something that you can do to encourage index only scans or like recommendations
that are around index only scans we did a whole episode for example on over indexing and i think
that's relevant here before we go to over indexing one second it's super important to tune out to
vacuum because otherwise hip fetches will be super high, and you need auto-vacuum. Auto-vacuum is not only about vacuuming
or collection of statistics
or fighting with transaction and multi-exact ID wraparounds.
It's also important because it maintains visibility maps.
Yes, and it's the only thing that maintains visibility.
It's the only thing that can set those bits.
Directly, yes.
I think this is important because
any like let's say we've got a page in postgres that has 20 tuples on it like really common to
have like a couple of dozen of tuples on a page if any of those rows are changed the visibility
of that like the bit gets out of the of the game yeah this page yeah it goes
from one to zero right it goes it goes from all visible to not all visible or not guaranteed to
be all visible and no no it's not it's not all visible that's it like we we don't know but it's
already cannot be considered all visible that's it yeah technically these tuples might be visible to everyone auto vacuum just didn't mark it
didn't get there yeah yeah so what welcome needs needs to do this as well that's why
frequent visits of tables are so important not only because of deleting that tuples
yes now is i guess it is possible that you will never benefit from index-only scans if you have maybe a small table that the rows are constantly being updated.
But most of us have tables where, or at least the cases we're thinking about here, you have some data that's pretty old that doesn't change often that we do still want to read and like in those cases as long as you can yeah tuning auto vacuum super valuable for
maintaining the performance of index only scans yeah right what um is there anything else around
that like i actually shared just recently in my newsletter an old post by thomas vondra i thought
i revisited it he's updated it for the latest versions of Postgres. I'll share it again because I think it's so good on YouTube.
Oh, yeah.
It's also so.
That's great.
I actually expected this because after they acquired Second Quandrant,
they broke that blog and some links didn't work.
I used archive.org to fetch some old pages.
That's great.
This post needed to be updated.
I also think they should be part of documentation. Nice. Well, high praise.
Yeah. So, but documentation tends to avoid good how-tos and just like kind of
so far, like only basics. And yeah, there are how-to notes and documentation, but they are scattered among general documentation reference style and so on.
Yeah, so I wish documentation grew like huge how-to section,
but it's a different story.
But let's not lose this important point in this context.
So autovacuum maintains visibility map, which has this all visible.
It also has all frozen, but it's a different story.
All visible, two bits for each page.
And if we have a lot of writes, like many writes are happening,
many pages are having zero for all visible bit.
And what's important here to understand that partitioning is so good yeah right like if
you just consider unpartitioned one terabyte table and partitioned one where old partitions
like most of writes go to the latest or few latest partitions and all partitions are like they can be they can receive rights but it happens rarely
in vacuum aggressively fairly frequently visit such pages so okay some right happened in all
partition but I immediately mark this page all visible and it's only one page in this partition
that's why like archive archive data locality and archive kind partitions, old ones and only new or the latest
receives all inserts, some updates they usually for fresh data. It means this like all partitions
most of the time will be all visible for all their pages. This is so good if massive reads are needed for all data. Without partitioning,
any page can suddenly receive a write, even if all rows on this page are five years old.
And then suddenly, oh, there is some place here, let's insert new tuple. And it's out of, so everything is shuffled, right?
New data and old data,
they can live in the same page.
And for visibility map,
it's kind of a lot of work for autovacuum
and visibility map is most of the time
is kind of like in bad shape.
So what I'm trying to say,
partitioning is needed,
not only because indexes
become smaller this this reason is like over years i realized this one of the smallest reasons for me
because b3 height growth very slow like i mean you know right but uh this particular thing
this may be number one reason since recently I realized it.
Wow, I think so.
For partitioning, we can say, okay, this partition is barely touched in terms of writes.
We just read it because people want to see archive data.
But reading is well organized most of the time.
If we need counts or something it's index index only
scans with heap fetches close very close to zero right without partitioning impossible
right i mean and unpredictable less predictable also yeah i mean in the case you're talking about
where it's appended mostly and it's to the right of the...
I've definitely seen cases where you don't touch old data anyway, whether it's partitioned or not.
But I definitely see this as yet another benefit of partitioning.
I'm surprised you said you see it as the biggest, but that's really cool. Well, actually, I think if we, for example,
decide to create additional indexes to support index-only
scans, and we will talk about methods
right now, right? And index-right
amplification, I know we plan it.
But if you, for example, decide...
Usually, we keep the same sets of
indexes on all partitions, right? But technically,
you can create an index for
a particular partition. And if, for example, you created it for archive-style partitions to support index-only scan,
to prioritize them for reads, and you maybe don't care about write amplification because, again,
these partitions barely receive writes, and hot updates also we don't care we care about we so we might
decide to create more indexes for all partitions uh to and benefit from yeah i've seen that but
it's a cool idea it's just like i don't know maybe it's a crazy idea maybe we'll have some
walls hit uh if we implement it but seems reasonable and for fresh i never did it like just because
for fresh partition you avoid creating these indexes because you want hot updates hot updates
occurring so right yeah let's let's maybe dive into details a little bit i yeah well or we could
point people towards the we did a whole episode onexing, and we did another one on hot updates.
So let's point people at those,
because I think there's a couple more.
Definitely want to get your thoughts
on the difference between multi-column index and include,
like when you use each.
When it was released, again, like 11, 12, 10, 10, 11,
I don't remember. many some many years already
many years ago and i was thinking is it only for unique indexes honestly i remember i had anastasia
albenicova presented on postgres tv details some details and yeah or maybe it was interview because she participated in this include keyword
covering indexes right so my my understanding we only need like we it might be size difference
also right because if if we want to put in a new column to have multi-column index, one more column we just included,
it will participate in structure, right?
Yes.
But if we say include,
it's just additionally stored in leaf nodes, right?
Leaf page, exactly.
Yeah.
And it means that kind of for,
to achieve index-only scans,
both cases work,
but first case can be also used
since it's in structure,
it can be used to verify uniqueness
or maybe also constrain exclusion.
Exclusion constraints, right?
Maybe for range types.
It's a different story,
but definitely for uniqueness.
And I now realized actually
we cannot put columns of some
data type to multi-column index, but we can put them to include. This is also a benefit of include.
So unique constraint, we don't want to put index because it will break the logic of uniqueness. Right? But we can put additional column to include
and have index only scan,
not changing logic of unique constraint.
I see what you mean.
But also to include, we can put data types
which are not B3 friendly,
which are not supported by B3.
Like what?
Array, for example, right?
Yeah, yeah, yeah.
Polygons or something like points.
Although you do need to be careful with size, right?
Of course, yes, of course, size matters. But imagine we have like, point two numbers, right?
So we can put it to include and the index only scans will happen.
I've got one more thing that I wanted to make sure we mentioned, which was something practical and very beginner-friendly.
In fact, probably the main thing that I see stopping index-only scans
in the real world, which is people selecting more data than they need.
And I know it's kind of obvious, but it's really common in RRMs
to have select star be the default,
which is almost always going to be giving you more columns than you actually need for that specific query.
But I see people doing it like people deleting features, for example, like changing how a page is structured in an application.
And they no longer need all the data from the the queries returning but don't
change the query like there's there's so many ways it can creep into people's code that they
no longer need all the data they're returning so being like careful on that front and also just
remembering that there are these optimizations if you can be even if the application does currently
use all of the columns do we need to like what value is that
adding at the moment right can you make things much faster by not including that information at
that point right yes and finally maybe let's mention also hot updates i will already like
kind of mentioned that indirectly but directly like you have, if you need to put extra one more column to index,
either to, doesn't matter if it's to extend
this multi-column list or to include,
to have coverage for your selects,
you can lose hotness of some updates, right?
Because even if it's
include, still, if this column is updated frequently, these
updates cannot be hot. They will be much slower. They will be
regular updates. So it's better sometimes to avoid this. So this
trade-off here, you won't select only scans, but you lose hot
updates. What's either selects scans, but you lose hot updates.
What's either?
Selects or writes?
I mean, updates.
Yeah.
I know this is very off-topic,
but do you know heap-only tuple updates are no longer heap-only in some cases?
Like in Brint, there's some changes to Brin indexes.
Have you seen this?
No, I haven't seen it.
And now it updates the Brin index.
I thought that was funny.
Yeah, yeah.
Okay, so I think we covered enough, right?
Yeah, 100%.
For iOS.
When I hear iOS, I think index-only scans, right?
That's so funny.
Are you on Android or iOS?
I'm on iOS.
My phone, I mean.
Index only scans.
Cool, cool.
Yeah.
Yeah.
Okay.
Good.
See you next week.
Bye.
See you next week.
Bye.