Postgres FM - Multi-column indexes
Episode Date: August 22, 2025

Nik and Michael discuss multi-column indexes in Postgres — what they are, how to think about them, and some guidance around using them effectively. Here are some links to things they mentioned:

Multicolumn Indexes (docs) https://www.postgresql.org/docs/current/indexes-multicolumn.html
Our episode on Index-only scans https://postgres.fm/episodes/index-only-scans
Combining Multiple Indexes (docs) https://www.postgresql.org/docs/current/indexes-bitmap-scans.html
Enable BUFFERS with EXPLAIN ANALYZE by default https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=c2a4078ebad71999dd451ae7d4358be3c9290b07
“PostgreSQL includes an implementation of the standard btree […] The only limitation is that an index entry cannot exceed approximately one-third of a page” https://www.postgresql.org/docs/current/btree.html
Our episode on HOT updates https://postgres.fm/episodes/hot-updates
Our episode on LIMIT vs Performance https://postgres.fm/episodes/limit-vs-performance

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello, this is Postgres FM. My name is Nik, Postgres.ai, and as usual, my co-host is Michael, pgMustard. Hi, Michael.
Hello, Nick. How are you doing? I'm doing great. How are you?
I'm good, thank you. I tried to say, I thought... but actually, it was you who thought there aren't enough episodes starting with the word "multi".
Yeah. So what is the topic today? Tell us.
Yeah, so I chose multi-column indexes. And I know we've
talked about them a bunch. We've done episodes on things like index-only scans, for example,
but it felt to me, looking back at those, that we didn't really get into some of the
details around multi-column indexes that I find particularly interesting. I think also
this is still an area I see people, especially fairly experienced full-stack engineers
and back-end engineers, struggling with in terms of performance, or not realizing how big a difference
they can make or how helpful they can be. Especially since a lot of the workloads I see
are web applications: just a lot of reads, maybe 80 to 90% reads, and a lot of trying
to optimize performance of those reads, especially with the odd aggregate-type query.
And index only scans help a bunch, but only once you understand multi-column indexes.
And I think there's some details in there that I'd like to discuss with you.
So, yeah, thought it was about time we actually had one dedicated to this.
Yeah, great.
I guess we will be going slightly beyond just multi-column indexes, right?
Well, possibly, yeah, what do you mean?
Well, maybe the question is, like, how we can compare them,
for example, having just indexes on single column, but many of them.
And sometimes I see a case where people, intentionally and understanding the consequences,
chose to cover every column with an index,
a single-column index each, I guess.
And, yeah, we usually say it's an anti-pattern, right?
Well, I mean, yeah, I guess it depends.
I've not seen a workload, like,
with a set of access patterns where that would make the most sense.
Personally, I tend to see tables that have some columns
that get queried a lot more than others
and the groups of columns that get queried very often together,
lots of equality searches amongst, for example,
account ID, user ID,
but there may be a range search in terms of date.
And that's the kind of query that ends up being
so much more efficient with a multi-column index
than with a series of single-column indexes.
So, yeah, the kind of access patterns I see,
I can't imagine living without them.
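As a concrete sketch of that pattern (the table and column names here are hypothetical, not from the episode):

```sql
-- Equality filters on account_id and user_id, range filter on created_at:
-- one multi-column index can serve all three together.
CREATE INDEX posts_account_user_created_idx
    ON posts (account_id, user_id, created_at);

-- The kind of query this index supports with a single index scan:
SELECT *
FROM posts
WHERE account_id = 42
  AND user_id = 7
  AND created_at >= now() - interval '30 days';
```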
But I'm sure that, well, I'm looking forward to hearing
about the case where it made most sense to...
Well, it was a multi-tenant
application where the freedom of queries is unpredictable. Every tenant can choose different types
of filtering and ordering, and we don't know in advance, and those tenants are not engineers,
right? So we need to give them some support at least somehow. But index write amplification,
I guess, is a killer
in this use case if
data size grows, right?
Yeah,
and given it's a multi-tenant application,
are we talking a
single database per tenant?
Separate tables.
That makes more sense.
Yes.
So that starts to make some sense, then at least you're not...
Yeah, they are smaller tables, of course.
And this is isolation, complete isolation.
So all blocks,
and all tuples in these blocks,
buffers or pages,
however you name it,
they belong only to
specific tenants, so all problems
are, like,
localized.
Yeah. Anyway,
if we think about columns
A and B, and we
have two indexes, one on each column,
when does it make sense to consider
a two-column index instead?
This is the key question, I guess.
Yeah, yeah, I like that a lot. And I guess we probably should go one step back and say
we're probably going to be talking about B-tree indexes for most of this.
B-tree, GIN indexes. Yeah, well, there's a few the docs mention, and it's quite funny, the word
"only" is still in there, I think from the initial commit. It says: currently, only the B-tree,
GiST, GIN, and BRIN index types support multiple-key-column indexes. And
"key" is the important word there,
but basically
that means
key columns, not INCLUDE columns.
So there is a subtle difference there,
as a GIN index cannot have INCLUDE columns.
So GIN can,
BRIN can, GiST can, and
B-tree can. So when it says
"only", I think the only
in-built index
type that
doesn't support
multi-column indexes is
hash. Yeah, right.
So I think "only" is a bit of a dubious word there.
Is that it? Is that it? That's it.
I can't think of a sixth one.
Okay.
Or maybe SP-GiST, but, like, I think that's a subcategory of GiST.
Anyway, basically, I think we're probably going to be talking about B-tree.
Like, the vast majority of indexes are B-trees, and the vast majority of multi-column indexes I see are B-trees.
Have you got any different examples other than, well, yeah, maybe btree_gin?
btree_gin,
yeah. There's a problem with
GIN I always like to talk about:
if
you have a scalar
column, a column of scalar values
like timestamp or
bigint, and you need to order by it
with a limit,
like 125,
then it can be inefficient,
because GIN doesn't know order,
like, the planner needs to choose
to use only
B-tree or only GIN,
and this is where btree_gin
can help combine them
into a multi-column index, right?
So we can talk about
hybrid indexes, when one
column, one layer,
is GIN and another is B-tree,
for example. Yeah, well,
and I think that kind of starts
to get to why we need
or want multi-column indexes in the
first place, because I think
a large part of it
is ordering, or
clustering. Like,
B-tree is a
simple example. Beautifully, it has a single
order through the entire structure,
right?
A single dimension. Yes, exactly.
So the order is really useful for things like
enforcing unique constraints. Like things
are placed in an order and you can see
is there one of this already in place?
And multi-column
just gives you, kind of: order
by the first of these columns first, and then, for any that are equal, order by the second column
next. And I don't think the docs make this super clear. I think they give a good example,
but it could be clearer, I think, that it's a single order: order by A, then order by B,
and then C, and then D, however many you list. And that's if they're key columns. INCLUDE
is a bit different. That doesn't affect the ordering, and that has some benefits,
which we can discuss, whether it does, I guess, in a bit.
that's one of the main things I wanted to discuss with you actually
I think that's a particularly interesting part
but that ordering thing is really really important
in terms of then choosing which order to index things in in the first place
So I've never come up with, or come across,
a guideline that I'm totally happy with,
in terms of, well, it's straightforward...
We discussed this 20 years ago in some circles.
So, by the way, it's not only ordering, but also filtering,
if your filters involve a range comparison or, like, the BETWEEN operator, right?
Or in the case of GiST, it can be something else.
So anyway, it can be not only strict ordering, but also just: is it inside the range or not?
Is it more or less, with those operators?
But as for the rule, let me this time also take one step back and say, super roughly, at a high level, forgetting about parallel operations and so on:
we have three types of access to data.
Sequential scan on one side: for lots of data, it's the slowest. Index and index-only scans,
let's combine them in this case,
are on the other side, the fastest. A single index scan,
especially a single index-only scan, is the fastest
if you have a lot of data.
And between them, there is a bitmap scan,
which is a combination of a bitmap index scan
and a bitmap heap scan. And if you have...
like, this is, like, my
proposal for an
explanation of why we need a multi-column index
in the case of two columns and filtering
or ordering on them.
If you have two separate single-column
indexes, you likely
will end up using either
one of them, or maybe
both of them, but
in the form of a
bitmap scan. Yes.
Which is not the most efficient.
And a
multi-column index can bring you
the most efficient:
either an index scan, or even an
index-only scan. Right?
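One way to see this difference yourself, sketched with a hypothetical table `t` and columns `a` and `b`:

```sql
-- With only single-column indexes, the planner can combine them
-- via a BitmapAnd of two bitmap index scans plus a bitmap heap scan:
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_b_idx ON t (b);
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a = 1 AND b = 2;

-- With a multi-column index, the same query can use a plain index scan,
-- which usually reads far fewer buffers:
CREATE INDEX t_a_b_idx ON t (a, b);
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a = 1 AND b = 2;
```

Comparing the `Buffers:` lines of the two plans is the point of the exercise.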
Yeah.
And I think there are exceptions or times where they're pretty much equal in efficiency.
Of course.
But it's very, very easy to demonstrate with non-overlapping, like, data sets.
Like, you quite often use the social media thing, right?
If we had an index on user ID and a separate index on created_at, or, you know, some kind of timestamp,
looking at all of a user's posts in a certain time range:
even if we can do the bitmap index scan on user ID
just to find all of that user's posts,
and then a bitmap index scan on all of the posts in that time range,
it's very easy to see how we're having to scan a ton more data that way.
And even though, when we're
doing, like, the bitmap index scans,
we're only having to look at the index entries,
and the bitmap itself will be relatively small
if that user doesn't have many posts,
and then the eventual heap scan will be relatively small
and relatively efficient,
the actual initial scans to get the right data
are huge compared to having a multi-column index
with all of the user's posts next to each other
with their created_at timestamps.
So it's, yeah... in terms of, we often talk about buffer reads, right?
We could create statistics, right?
And so the planner would know that this user was recently active,
so we need to follow the index on created_at.
And another user is not recently active,
so it's better to fetch all entries for this user
and then order by in memory,
a top-N sort in memory.
But both, with any sufficiently active system,
both would still lose massively in terms of buffer reads
to the index-only scan,
or even an index scan, on a single multi-column index.
Yeah, yeah.
Anyway, what also matters:
random_page_cost matters here, right?
Because, in reality, an
index scan might still be better, or vice versa,
but the planner might choose a
bitmap scan in favor of an index scan
because random_page_cost is 4, for example, the default.
Anyway, let's return.
While we're on the topic of outdated things,
I actually think there's a relevant part of the docs that I have an issue with that I wanted your opinion on.
It's in this area.
It says: multicolumn indexes should be used sparingly.
In most situations, an index on a single column is sufficient and saves space and time.
Indexes with more than three columns are unlikely to be helpful unless the usage of the table is extremely stylized.
I don't remember these words from the docs.
Interesting.
They're still there in the latest versions.
and, to be fair to them, they've existed
pretty much since 7.2.
I understand.
In reality, it depends on the application.
In some applications, there are absolutely critical queries
which must have an index scan,
and sometimes even an index-only scan,
with autovacuum aggressively tuned
and the visibility map maintained very well,
so the index-only scan doesn't degrade.
So I think this is like,
I can see why this was written,
but I cannot agree with it in all cases
because we definitely have three-column indexes
sometimes, because we choose to have the most efficient path
specifically to support specific queries
which are most important for us.
They receive a lot of traffic,
so we cannot allow degradation here.
Yeah.
What about four or five columns?
I guess it's rarer.
I don't see many, but I've seen a few.
Well, let's postpone this question a little bit, because I know you also wanted to touch on covering indexes, right? Maybe that's where we will discuss this. But let me return to your previous question, yeah, how to choose the order. And I remember the rules. Originally, there were two rules, and you need to spend time understanding them. I'm pretty sure users, I mean,
listeners who are most experienced,
they know this very well,
but I know our podcast
receives a lot of
inexperienced listeners.
So, two rules. One rule,
and this is maybe the
most popular in terms of
being mentioned:
an
index on columns (A, B)
makes a separate
single-column
index on column A
not needed, like,
you can drop it,
right? Because this two-column index can support
queries that need only an
index on column A.
And this means that if you
have queries
dealing with both A and
B in terms of ordering and
filtering, and also
you have queries that deal with only
A, you should put A
in the first place. This is
rule number one. And there are nuances,
of course. And the second
rule is:
if you cannot choose, if you have
patterns that, kind of...
imagine you have only one query,
which involves both columns.
How to choose the order?
And the rule says: choose
the column which will give you the highest
selectivity and put it in the first place.
So the scope of
your scan narrows down
faster.
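Both rules can be sketched like this (hypothetical table and columns):

```sql
-- Rule 1 (leftmost prefix): an index on (a, b) also serves queries
-- filtering only on a, so a separate index on (a) can usually be dropped.
CREATE INDEX t_a_b_idx ON t (a, b);
SELECT * FROM t WHERE a = 1;              -- can use t_a_b_idx
SELECT * FROM t WHERE a = 1 AND b = 2;    -- can use t_a_b_idx

-- Rule 2 (selectivity): if only one query uses both columns, put the more
-- selective column first, e.g. (user_id, status) rather than (status, user_id),
-- so the scan narrows down faster.
```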
yeah I like it and I think
for single quick and I think there's a
another one I've heard is
your equality conditions
in the leftmost
and then if you have a
range condition or an inequality
condition like a greater than or less than
that needs to come last so you
all of your quality conditions first
then a range condition like created
that's with previous one
I imagine, but it's really important.
For, let's say, you have
a query that has, like, WHERE A equals one, B equals two, C equals three, and then D is between two values.
It's way more efficient to have... again, depending a little bit on selectivity, I'm sure you can come up with
examples where it isn't true, but having A, B, and C in some order and then D last is really important
to making sure that the index scan reads as few buffers as possible.
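Sketched with hypothetical columns, the equality-first, range-last rule looks like:

```sql
-- a, b, c are equality conditions; d is the range condition, so d goes last.
CREATE INDEX t_abc_d_idx ON t (a, b, c, d);

SELECT *
FROM t
WHERE a = 1 AND b = 2 AND c = 3
  AND d BETWEEN 10 AND 20;   -- the scan lands on one contiguous index range
```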
But imagine we have a query,
forgetting about order by
and limit. We have a query
which selects
with two filters. One is
some Boolean
column, and another is a
time range.
So what should go first?
Well,
so, think about it, though, because,
like, let's say it's 50-50.
It's the simplest case, right? We've got 50%
true, 50%
false. Probably if we have a
different distribution, it's trickier.
It should be a
partial index, actually, but yeah.
Maybe, depending on whether we need to,
like, query the opposite one sometimes.
Maybe it should be,
maybe it should be two partial indexes.
Oh, I don't, like...
I don't think so. I don't
think two partial indexes are more efficient than that.
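For reference, the partial-index idea mentioned here might look like this (column names assumed):

```sql
-- If queries almost always ask for one Boolean value, a partial index
-- keeps the Boolean out of the key entirely and stays much smaller:
CREATE INDEX t_created_at_active_idx
    ON t (created_at)
    WHERE is_active;

-- Served by the partial index above:
SELECT * FROM t
WHERE is_active
  AND created_at BETWEEN '2025-01-01' AND '2025-01-31';
```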
But do you agree that
the filter on the date, usually... if, for example,
we know that we usually select
narrow, narrow ranges,
a day, or a month at maximum,
and we have a lot of data for many years,
do we agree that the
filter on
the
timestamp gives you better selectivity
in the general case?
I think it has to,
right? Yeah. What do you mean?
Well, of course, if the
Boolean column
has some, like...
it has two values, and most
rows have one value,
but you're selecting the other one,
it's a different story,
like
WHERE is_deleted,
and we know most
rows are not soft-deleted,
right, or something like this.
In this case,
the Boolean filter
can give you better
selectivity. But it's less...
But I think, for a scan,
let's think about, like, the buffer
reads. If we ask:
it's true,
and between these dates,
or between these timestamps,
our index scan
can be perfectly efficient if we index by the Boolean first, then the timestamp,
because we only have to look at the true values between a certain date range.
If we flip that, if we do date range first, then the Boolean values, we discard half the values we read.
We read all of the date range and have to discard half the Boolean values.
Why should it read all the date... I don't get it.
Why?
We should... I reckon there will be
double the buffer reads.
I forgot to mention my main rule:
always doubt, and test in reality.
Based on my understanding,
you'll get double the buffer reads
for putting the date first
and the Boolean second.
So this is what I meant.
All of the rules of thumb are good,
but they're not perfect,
and actually they're slightly contradictory.
And I would say,
even "the most selective first":
think about the shared
multi-tenancy thing.
I prefer account ID
before user ID, even though user ID is
definitely more selective,
like, partly because we're never...
really, well, in most cases...
Data locality, I mean, right?
Data locality is one thing.
Yeah, data locality is one thing, but also
we're very rarely serving a single
query with a single index. Normally we're
serving more than one query
type.
But then, like, the
rules start to, like... they start to become a little bit more of an art form, and you're
trying to think: what will they be in future, what's most likely, what might we want to
query together? Like, it's quite likely we want to query users in the same account together at some
point. Maybe we want to look at all posts by this account or this team, and having that data
closer together in the index has benefits compared to if we do it by user ID. And, let's say,
no matter how you're doing user ID, maybe they'll be clustered, like, in teams.
Like, maybe if you've got, like, a bigint, maybe they'll be clustered close together,
but they won't all be.
You might add a team member many years after setting up your account, that kind of thing.
So I prefer some breaking of those rules.
There's one more rule, which is foundational.
Data locality, by the way, I would like to think about it,
and also partitioning,
if we have it.
Of course,
we don't want to
scan all partitions,
right?
And actually,
partitioning serves
as a type of,
kind of, an index,
kind of a leading...
like, partition keys are
kind of the leading
column, or leading columns.
I remember
I was trying to create
an index
on the partitioning key
in each partition,
and it was so
strange,
because this partition,
the partition had only one value,
and I was trying to
include this
as an additional column,
and somebody said:
we have partitioning, why do you do this?
And I realized: oh, partitioning eliminates the need for an additional step in indexes, or some
indexes.
So, what I was trying to say: an additional rule is to remember the order of execution of the SELECT
clause.
Oh, not the SELECT clause, the whole query.
Yeah, yeah.
So first, like, we should think... for example, let's make it simple, keep it simple:
ORDER BY goes last,
right? So if we have ORDER BY
created_at DESC LIMIT 25,
it means created_at should be the last column
in the multi-column index,
right?
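A sketch of that rule (hypothetical table):

```sql
-- created_at goes last in the key, after the equality column(s):
CREATE INDEX posts_user_created_idx ON posts (user_id, created_at);

SELECT *
FROM posts
WHERE user_id = 7
ORDER BY created_at DESC
LIMIT 25;
-- The index already yields this user's rows in created_at order,
-- so the scan can walk the index backwards and stop after 25 rows, no sort.
```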
Yeah, I never thought about that,
but, yeah, that fits with the range thing
I was talking about.
Yeah, because of the order of execution,
if you put created_at in the first place,
it won't be helpful.
It's a simple rule,
like, intuition should tell you this,
but maybe not.
And yeah.
But I agree with you
that rules can be confusing a little bit
and it's better to test and learn
and sometimes doubt and test again.
And testing, we should pay attention to buffers.
Yeah.
And this is going to get better in Postgres 18, right?
People are going to get it by default.
Yeah.
We already have customers migrating to 17.
So next year
I expect BUFFERS will become
default, and I hope we will see
a growing number
of plans
with BUFFERS, I mean, plans
with BUFFERS involved,
because it's difficult. Yeah.
We definitely will.
Cool. What about
covering indexes?
Well, yeah. When I saw
the commit for adding
INCLUDE to
CREATE INDEX... so CREATE INDEX
added this option to
include certain columns in the leaf pages, so in the tuples that are stored in the leaf pages,
so you can return more data.
Like, if a query, for example, wants to filter by user ID
equals this and name equals this, order by this date, the name could be a payload.
It doesn't have to be in the index ordering, but it could be there
in the index leaf pages.
To achieve index only scans.
To achieve index only scans,
yeah, very, very important.
That's, I think, the only use case for this.
But it's a very important one,
important enough to get added.
To avoid heap fetches.
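A minimal sketch of INCLUDE for index-only scans (table and column names assumed):

```sql
-- name is stored only in the leaf pages as a payload, not in the key:
CREATE INDEX users_id_incl_name_idx ON users (user_id) INCLUDE (name);

-- Can be satisfied by an index-only scan, with no heap fetches
-- (assuming the visibility map is well maintained):
SELECT user_id, name FROM users WHERE user_id = 7;
```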
But that also works, under heavy load,
only if you've tuned autovacuum properly
and avoid blockers of the xmin horizon.
Yes, yes.
Otherwise, your index-only scans
will slowly degrade to regular index scans, basically. Yeah. So, yeah. But covering index,
like, the word "covering", for me, I think, is slightly loaded, because, for me, an index is
covering for a query if you can serve it with an index-only scan. And, in my opinion, we had
covering indexes before this, before INCLUDE, because if you had a multi-column index with
all of the columns that you
need, that was
a covering index for that query.
My favorite topic:
criticizing the choice
of terminology.
It's important. I agree with you.
But I'm only
reiterating this because I think it's important
for people to understand.
Like, I think it could be confusing
if you hear that covering indexes
are only
when you use INCLUDE,
because it's confusing.
So, yeah, it got added in Postgres 11, 2018.
And I'm interested in, kind of, the benefits over... like, why not just add it as another key column?
Like, what are the downsides to doing that?
I spent some time thinking about it.
And I came to a conclusion. When they were added in 2018, I spent some time,
and, yeah, I came to the conclusion that the key difference
is the semantics of
uniqueness,
right?
So if you have strict,
like,
if you add an
additional column
to your multi-column
indexes,
you change the uniqueness semantics,
right?
That's a great point,
yeah.
If you want...
an index on this can work well,
but you cannot...
if this is a unique index,
this is where you should use
INCLUDE,
but it's quite a rare case,
actually.
So let's say we want
So let's say we want,
an index on email, we kept back to the last week's episode or whenever it was,
we want it to be unique on email, but we want to include the name as a payload,
but we want to let people have the same name as each other, even if they...
Yeah.
But wait, because it's unique on email already, wouldn't it always be unique on email and name?
Oh no, because then we could, if they had a different name, they could use the same email.
Yeah, that's a problem.
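The distinction being discussed, as a sketch (names assumed):

```sql
-- Uniqueness is enforced on email alone; name rides along as a payload:
CREATE UNIQUE INDEX users_email_uniq ON users (email) INCLUDE (name);

-- By contrast, making name a second key column changes the semantics:
-- uniqueness then applies to the (email, name) pair, so the same email
-- could be inserted twice with different names -- not what we want.
-- CREATE UNIQUE INDEX users_email_name_uniq ON users (email, name);
```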
So we don't want uniqueness on pairs of values
if we want uniqueness only on email, and the
name or something is just, as you said, a payload. That's a good point of view on it: you're just
putting it there as a passenger. Yeah, that makes perfect sense. Yeah. I also think... do you remember,
is there any size difference? Oh, well, yeah, I was going to bring up the exact same thing.
I wondered if it also got done for that reason a little bit, because if we've got the structure
of a B-tree, once it gets sufficiently large, you've got levels of, kind of...
They're kind of like an org chart, aren't they?
Like layers of management, almost.
And if you have a multi-column index with keys,
so where they're part of the ordering,
you need to preserve that information at the different levels,
like, for ordering purposes.
But because INCLUDE columns aren't needed for ordering,
they're not relevant there,
they only live in the leaf pages, at the bottom level of the B-tree.
So size-wise, there should be a difference.
But, and I think this has changed recently,
deduplication, I think, changes this.
Not for unique indexes,
but in cases where you can have duplicates,
which is quite a lot of indexes,
I actually think the deduplication will normally beat
the benefit of not having to store it at multiple levels,
obviously depending a little bit on how big this data is,
and you're probably not putting huge payloads in this.
But, yeah, I think there was a size benefit.
I think it's less now that we've got the B-tree deduplication work.
Yeah. Obviously, the columns you put into the INCLUDE part of CREATE INDEX,
you cannot use them in filtering and ordering.
They are like passengers.
They're only beneficial for index-only scans.
But since they are sitting, like, basically in the leaves...
Do you remember, there is a problem with B-tree,
there is a limitation due to key size, right?
So you cannot exceed, like, 2,000-something bytes, or something like this.
I guess in this case, for columns used in INCLUDE, there is no such limitation, right?
I'm not...
There must be a limit, right? Because it's still going in normal data pages.
I'm not sure.
I'm not sure.
Let's not jump to...
Wait, yeah, but I read this in the documentation, and I haven't made a note of it,
but it is in the docs, I'll find it.
But I must admit, this is a rare beast to meet in the wild, you know.
I don't see the INCLUDE keyword often.
I see it, but not often at all.
Oh, yeah, I was going to ask.
So generally you see people with multi-column indexes, but they're all keys.
Is that fair?
Yeah, yeah.
Well, this is what my observations tell me:
INCLUDE is still kind of exotic.
It's good to have
it, but it's not so obvious
when exactly you should use it.
Well, we discussed it, but still, in normal practice,
you say: I will just add it as a key column.
If it's not a unique index, why should I
think about it?
If there are no limitations
in the key to be met,
if it's just some timestamp or
number,
it's easier to add it as a key, and that's it.
So, I wanted to touch on one more topic here:
when we should completely avoid putting another column into a multi-column index, or these
types of indexes we just discussed, which are called covering, but you don't like that word.
When should we avoid adding such columns, another column there? For example, we see: oh, there is an obvious
benefit. So when you say benefit, like, a benefit to a specific set of read queries: we could be
faster. Imagine we have some simple case, like a table, some filter we filter by,
and it's already working well, but, oh, we see there is also a column, status. It has several
values, and we see most of the time we select 70% of everything, like, status equals active,
or something. But we think: actually, we could shave off... we see some rows are filtered out
dynamically. This is what pgMustard
always suggests immediately:
oh, there, like, the index
works inefficiently, because
later you exclude some rows,
right? Rows Removed by Filter, or something.
Like, how is it called?
So, well, yeah, there's a couple
of ways that it can show up:
either a filter or an index
recheck, for some types of index.
And sometimes, like, for example, with
bitmaps, I was looking at this today, actually,
because Postgres 18 changes this a little bit,
but exact heap blocks
and lossy heap blocks,
so that can end up
with an index recheck.
But I'm talking about rows removed
because we have an additional part
of the workload.
So, a filter.
And they are filtered out dynamically.
And we think, oh, it's not efficient.
Let's go fully efficient.
And we start adding, for example,
status as an additional column
in our multi-column index,
or it was a single-column index,
but it becomes two-column, right?
Or we decide
to use it as a condition and make the index partial, so we say WHERE status equals active, because
we know most of the time it's active, or something like this, it doesn't matter. So the index starts using
this column somehow, right? Any of three ways: another key column, INCLUDE, or the WHERE keyword.
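The three options, sketched with assumed names:

```sql
-- 1. status as an additional key column:
CREATE INDEX t_user_status_idx ON t (user_id, status);

-- 2. status as an INCLUDE (payload) column:
CREATE INDEX t_user_incl_status_idx ON t (user_id) INCLUDE (status);

-- 3. status in a partial-index predicate:
CREATE INDEX t_user_active_idx ON t (user_id) WHERE status = 'active';

-- In all three cases, status is now referenced by an index, which matters
-- for the HOT-update discussion that follows.
```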
I know what you're going to talk about
Are we talking
Are we going into hot update territory
Exactly, exactly.
So this is what happened to me.
I was, like, enthusiastically
optimizing some project.
It was...
I remember the first time,
it was very long ago,
it was DocSend,
later it was acquired by
Dropbox,
it was in San Francisco.
And I was like:
oh, we optimized additionally,
like, shaved off
20-30% of buffer
hits and reads, and it's great.
Like, it's a small optimization, but it's helpful,
and we can pay this price, okay, we will...
we need to maintain
this index anyway, it's slightly
bigger now, but it's still
worth it. And then later,
I see degradation of updates.
And you know why,
right? Because we had HOT updates before,
because basically
most of the updates, they updated status,
yeah,
yeah, and now it's
participating in an index.
So it makes HOT updates not possible at all. And HOT updates, just a reminder,
we know there is a new audience usually: heap-only tuple updates. This is an update which
changes only table pages, pages of table data, and doesn't touch indexes. Because normally,
a regular update needs to touch every single index, which produces more dirty pages,
and the checkpointer, or
bgwriter, or even a
backend, needs
to flush them to disk
first, but that doesn't matter so much,
and then it generates
more WAL,
and it slows down updates.
Like, regular updates
are much slower
than HOT updates.
So just putting another
column into your
multi-column index
might affect updates.
This is worth remembering,
if it wasn't indexed already.
Like, if it was indexed already
in some other way,
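To check how a table is doing on HOT updates, one option is the cumulative statistics view (table name assumed):

```sql
-- Share of updates that were heap-only (HOT) since the stats were reset:
SELECT n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE relname = 'posts';
```

A drop in `hot_pct` after adding a column to an index is the degradation being described here.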
fair enough.
Yeah, yeah. It's a really good point.
And I think those index efficiency tips are the ones,
like, I think people are most surprised by. Like, people look through their query...
I'm talking about experienced engineers, but not experienced Postgres folk.
They look through a query plan,
see index scan, index scan, index scan, maybe there's a lot of loops.
Maybe there is Rows Removed by Filter, but maybe it only says one.
But it's, like, 20,000 loops,
and they don't realize that that's the most inefficient part of their query.
Like, anyway, so I love those. But normally, for index efficiency, if
we're talking about a percentage like 0.1%, or less than 0.1%, great.
If it's 70%, we score it much less well.
It's a combination of factors, there's choice.
But one particular case, which is very common, and when I see it, I immediately recognize the problem:
if I see SELECT blah blah blah WHERE some filters, and then ORDER BY created_at DESC LIMIT
100, definitely there are multiple indexes which include updated_at, and there is a trigger,
or just some Ruby or Django, it doesn't matter, application, which always updates updated_at. So this table
doesn't see HOT updates, 100%. And, yeah, how to work around that issue without
losing the
mechanics of the updated_at
timestamp?
I try to sometimes
to avoid indexing this.
If you present to your users
values,
really like you need
order by updated ad desk,
limit something.
You need it, right?
I try to say, okay,
updated ad, it's very close to
created ad usually.
Let's use created ad
and then like somehow
like it does work well.
It doesn't.
I would have thought in most cases it would actually, but yeah.
Depends, of course.
It depends.
Sometimes it's good.
Like, you basically already have almost everything, and on the fly...
Yeah, sometimes there is correlation, a very good correlation.
But sometimes, if it's a partitioned table, with updates happening all the time, it's a mess.
And updates might happen a month later, for example.
This, like,
screws this up.
I will,
for newer listeners,
link up
our episodes
on HOT updates,
and we did
one on LIMIT
that's really good
for this.
Yeah yeah
I just wanted to say,
this is a warning.
Yeah.
Yeah.
Multi-column
indexes, as
we said,
everyone needs them,
but we should be
careful
if we need
fast updates
and
fewer WAL bytes
written to WAL,
you know.
Yeah,
because WAL,
WAL bytes,
can be an issue,
and
they put pressure
on both backups and replication systems.
Right.
Yes.
And this is the old read-versus-write thing, isn't it?
And also, like, if you've got a write-heavy application,
or if you're not doing many reads at all,
you might not want many indexes at all.
Like, it is a trade-off in general, and not just for multi-column, right?
Sometimes, you know, sometimes here it makes sense to split
and have a one-to-one relationship between tables,
and in one you update a lot,
and you give up...
but you have only, like,
three or two columns, and that's it,
right? And the other one has everything else,
and search, and so on.
But in this case, you will deal
with the problem that we cannot create
an index on two tables.
For some queries,
we will lose a single index scan,
we will need to join.
Back to the denormalization topic again.
Yeah, yeah. I had one more
on INCLUDE. I wondered
if it might end up with less bloat in some workloads.
So, when we're updating that column that's been included as a payload, for example,
the tuple doesn't need to move in the index.
Whereas in the case where we've got them both as key columns,
we're having to move the tuple to a different page, most likely.
If it's, like, ordered by email, then name, for example,
if their name changes, probably their entry will move page,
and you could end up with a split.
So I was just thinking, in terms of updates,
I wonder if we'd end up with less index bloat
if we used INCLUDE instead of multiple keys.
It seems like a minor advantage,
but I wondered if that could help as well.
Yeah, it's an interesting point.
I should think about it.
Yeah, yeah.
But, yeah, it feels to me like INCLUDE feels like
a scalpel, like a very specific tool that can help.
Including... it's, yeah, just... the word is "including".
Are you sure?
No?
I'm not sure.
Include?
Yeah, it's "includes".
I know, it was a bug in my mind.
Sorry.
Yeah, include.
It's not even "includes", plural.
It's "include", singular.
Yes.
Yeah.
Yeah.
But that shows how commonly you see it.
Yeah.
Super rare.
"Including" would probably make more sense, but it's just "include", yeah.
Yeah.
So, okay.
Good.
Factor naming.
I think we're done?
Yeah, it was a bag of, like, some pieces of advice.
And I think we'll come back to some of the other index types at some point.
By the way, I found out, just as, like, an interesting tidbit:
for GIN and BRIN, because of their structures, the ordering
of the columns doesn't actually matter, which I found fascinating.
Say it again?
GIN indexes and BRIN indexes:
the order of the columns.
Doesn't matter?
Doesn't matter, yeah.
Interesting.
Okay.
I mean, mostly we're worried about B-tree,
so it's not too important.
Yeah, 90-plus percent.
Yeah.
All right.
Nice one, Nikolay.
Thank you so much.
Catch you next week.
Bye, bye.
Bye.