Postgres FM - Row estimates
Episode Date: December 23, 2022

Here are links to a few things we mentioned:

ANALYZE (docs)
Autovacuum config (docs)
Statistics used by the planner (docs)
CREATE STATISTICS (docs)
Row count estimates (pgMustard blog post)
pg_hint_plan
Optimizer methodology (talk by Robert Haas)
Tomáš Vondra on statistics and hints (an excellent interview we forgot to mention, sorry!)

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to PostgresFM, a weekly show about all things PostgreSQL.
I'm Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, it's your turn. Tell us please.
Oh, flip it on me. Yes, I picked this one. I think we're going to call it Row Estimates.
But this is, yeah, this is about the planner, the query planner, and it's again based on a listener suggestion or request. They didn't ask about this specifically, but they did ask about nested loops causing performance issues, and looking at any one particular performance issue would be quite a dull podcast, I think (maybe a better video). But really, a lot of these, especially when people think the nested loop is the problem, come down to a bad or inaccurate row estimate at some point, and a bad planner choice based on that row estimate. So I wanted to dive into how Postgres is making those decisions, and also some things that we can do as users to help it get better statistics and to help guide it in the right direction, or things that we maybe don't have at our disposal at the moment that other databases do. So that's the kind of thing I was hoping to talk about.
Sounds good, but what's wrong with nested loops?
They're good.
Yeah, and this is the reason I didn't want to call this nested loops causing issues.
It's mostly...
So yeah, what's wrong with them is a good point.
I think it's about trade-offs, right?
So the reason we have multiple... A lot of the time, the planner has multiple choices for the join algorithm it chooses, and even for the scans that it does. So it's not even just about joins; it has a lot of choices and different options. And because SQL is declarative, it can make its mind up about the fastest or most efficient way to do that. Nested loops have some real advantages: they're very quick to get started, they're very flexible, and they can be used in a lot of different cases.
But when the two relations that are being joined get large,
there can be more efficient or quicker ways of joining those.
In Postgres, we have hash joins and we have merge joins.
I think that's it for the join algorithms.
Yeah, for the algorithms. Yeah.
So nested loops are good when the relation on one side is smaller, quite small, right? A low number of rows on one side.
Yeah, exactly. Both sides, or one side. And even if one side is large, an indexed nested loop can still be really good, right?
And if they are both quite large, a nested loop is probably not a good idea.
Yeah, exactly.
So we'll probably be better off with a hash join
or in some cases, if they're both sorted, a merge join, I guess.
But sometimes we don't have those options, right?
Sometimes the only option is a nested loop, if we're not doing an equality operation, I think. But equally, when people are looking into this, if there's a good plan and a bad plan, that's often when people think it's the nested loop that's the problem. There was another option, right? There was a better plan, a better path that could have been chosen, but it wasn't, because it was being scored as higher cost: the costs were estimated to be higher. A big input for those costs is how many, I'm going to say rows, I guess this might be another argument for tuples, are being returned at each stage. Or tuples. Sorry, back to that old one.
Yeah, your version of tuples versus mine; I proposed one pronunciation, but you switched. Okay, so if you see a nested loop and we think it should be different, we suspect some issues with statistics, right? And the first thing to check is to try to analyze the table, to recollect statistics, or no? What do you think?
This is, I think, where we get into... it's probably worth us explaining up top: analyze. I guess what you're talking about there is the process Postgres has for gathering statistics. So you can analyze a relation; I learned today, actually, looking into it, that you can analyze just a single column. And there's a lot of flexibility there.
Or an expression.
Nice. Okay, cool. And also, it happens automatically as part of autovacuum, or auto-analyze, I guess, is the part of that.
Well, yeah, autovacuum has three tasks: vacuuming, preventing transaction ID wraparound (or freezing), and also, the third option, auto-analyzing, like recollecting statistics.
Yes, this is cool. Yeah, maybe. But wait, what was the fourth one? Does the visibility map count as separate?
Well, yeah, but that can be considered part of vacuuming.
Okay, cool.
Maintaining free space map and visibility map.
But analyzing may happen together with autovacuuming, or auto-analyzing may also happen separately.
So everything happens inside Postgres, right?
Many options. But, right, not every time autovacuum runs to process some table does it recollect statistics. It's controlled by several settings: a threshold and a scale factor related to the analyze part.
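For reference, the settings being described here are autovacuum_analyze_threshold and autovacuum_analyze_scale_factor: auto-analyze kicks in roughly once the number of changed rows exceeds the threshold plus the scale factor times the table size. A minimal sketch of checking and tightening them cluster-wide (the new value is purely illustrative):

    -- defaults are 50 rows and 0.1 (10% of the table)
    SHOW autovacuum_analyze_threshold;
    SHOW autovacuum_analyze_scale_factor;

    -- make auto-analyze trigger after ~2% of rows change (illustrative value)
    ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.02;
    SELECT pg_reload_conf();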
Yeah, and I think it's fairly recent.
I didn't actually check, but one of the newer versions lets us set an analyze scale factor even for insert-only workloads.
Right, there was a problem for append-only use cases, when you just add a lot of data and that's it; the vacuum didn't process it. Well, vacuum is probably not needed, because you don't produce dead tuples if you just insert, right? But analyzing, recollecting statistics, is definitely something I would like to have. And actually many examples, educational examples and so on, they show: okay, let's load something into a table, and before that you must run analyze yourself to have proper plans. But now you can just wait a little bit, right, and it will do the job, and then you can play with it. But of course, if you want to control it, you probably just need to say ANALYZE table_name and that's it.
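Spelled out, the commands being talked about look something like this (the table and column names are made up for illustration):

    -- recollect planner statistics for one table
    ANALYZE orders;

    -- or only for a single column of that table
    ANALYZE orders (customer_id);

    -- VERBOSE prints progress messages as it works
    ANALYZE VERBOSE orders;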
Yeah, so actually, the reason I wanted to focus on that... I think it was 11 or 12 when it came in, like a couple of versions ago, maybe three versions ago. Wow, that's getting quite a long time ago. I remember when that was new.
Even me. I remember Darafei, I cannot pronounce the last name, sorry, Darafei participated in this discussion, and he was active in the Russian-speaking community as well. So I remember this case, and it was quite an obvious improvement. It was one of those changes where you have the feeling: how come it didn't happen before, right?
Yep.
But that's the nice thing.
It does keep getting better.
And the reason I wanted to be specific about the analyze you were talking about there is that there's a few places people might see that word.
And there's analyze itself as a keyword that gathers statistics. You might see vacuum analyze as like a kind of parameter to vacuum. That is the same thing. So that's doing both vacuum and analyze. But you might also see explain analyze,
which I guess we're going to talk about in a moment,
which is not related at all, just totally different thing.
So naming things is difficult.
And sadly, that's just one we need to,
like, once you know what it does,
you realize it's unrelated,
but it can be confusing for beginners.
Right. Terms, words are overloaded everywhere.
The statistics term is also overloaded. We can talk about statistics in different senses, and there is this statistics which keeps the column and expression stats; this is what is recalculated when you run analyze. And also in an explain plan we see some statistics, right? We can call it that too. So the terms are overloaded indeed; there are too few words in language, unfortunately, in any language. But okay, back to our topic: if we see a nested loop and we say we would like to have something else, like a hash join, probably we just need to check it with analyze? Like, what will you do: just run analyze and double-check the plan,
or what?
Yeah, so, I think this goes back a step: how are you even spotting that problem? I would suggest that you should be running explain analyze and trying to work it out from there.
So that's explain analyze buffers.
Yeah, ideally, maybe even verbose, format JSON.
By the way, on Twitter there was a new saying, like, "in buffers we trust".
I like it. Right, I think I saw you had a t-shirt like that as well. So that's fun. Yep. So, explain analyze with buffers, to check it before you run analyze to recollect statistics.
Yeah.
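As a concrete sketch, the kind of invocation being described is something like this (the query itself is just a placeholder):

    EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
    SELECT o.*
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'NL';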
But specifically, explain alone won't help us here, because explain alone only gives us the estimated number of rows being returned at each stage of the execution plan. Explain analyze is the important part for this topic, because that will give us not only the estimated number of rows but also the actual number of rows being returned, and it's the mismatches there that can really help us diagnose where this problem is arising, where the bad estimates are.
Let me disagree with you here, at least partially. You're talking about straightforward examples, when we can run explain analyze buffers, so we see all the details for both planning and execution, and we understand something is wrong, we don't want to have a nested loop here. But what if our execution takes 100 years?
Yeah, of course.
So if you say just explain is not helpful, it's still helpful. It will show a nested loop. It will give us an understanding of how the planner thinks about our data. And we know our data, right? We know how many rows are there and there. So we say, like, no, if I were you, planner, I would do it differently. So just explain is also kind of helpful in some cases, in some extreme cases, when execution is long.
Very, very good point. Yep, completely agree. But then, I agree with you about the second port of call: if you can see the relations involved, and you think maybe analyze hasn't run recently on those, you can just run it; it's not locking anything up, or at least it comes with very low risk.
And that might be enough to solve your problem.
It might be that the statistics are out of date.
Yeah, we can check.
We can check when analyze ran.
In the logs, if autovacuum logging is enabled. In recent versions the threshold, log_autovacuum_min_duration, is 10 minutes by default; in my opinion it should always be zero, so that all log entries related to autovacuum are there. But we also have pg_stat_all_tables and pg_stat_user_tables, and we can check there. It has four relevant columns, two pairs, and all of them are timestamps. I'm not talking about the count columns; a count is quite a silly metric, right? Okay, five, now five... well, zero means something, but five, ten: when? So, two pairs of timestamps: one pair for vacuum, where one is for autovacuum and one is for manual vacuum, and two timestamps for analyze, one for auto-analyze, meaning the analyze job of autovacuum, and the second for manual analyze. And we can see when it happened, right?
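A quick way to look at those four timestamps for a table (a sketch; the table name is hypothetical):

    SELECT relname,
           last_vacuum, last_autovacuum,
           last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
    WHERE relname = 'orders';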
It might still be happening, so it's also worth checking pg_stat_activity; you will see autovacuum: VACUUM ANALYZE or autovacuum: ANALYZE entries there.
Yeah, good point, it could be happening right now. In pg_stat_activity the query column has the details for autovacuum jobs as well, so it's seen as a regular session. On huge databases, how long do you tend to see analyze taking?
Hours.
Well, it depends on how detailed you want your statistics to be. By default we have default_statistics_target 100, meaning 100 buckets. But many people increase it, for example to 1,000, sometimes more. And the bigger default_statistics_target is, the longer analyze will be. And it will produce noticeable disk I/O, of course, reading data if it's not cached in the buffer pool or the page cache.
And also individual columns can be tuned. So default_statistics_target is like a cluster-wide setting, but some columns can be adjusted additionally with ALTER TABLE, I don't remember exactly, ALTER COLUMN SET STATISTICS or something like that. You can say: I want a thousand buckets here, but 100 buckets there, and only 10 there, something like this. It's like fine-tuning. Usually I recommend avoiding it unless you are 100 percent sure you know what you're doing. In my experience I saw cases when people tuned it, but they were not very clear in their intentions, you know; it was kind of an experiment, and then it was abandoned, and then you just have some leftover of past experiments, that's it. A systematic approach should be: you know what you're doing, and here we definitely want more detailed statistics. So how long analyze takes depends on table size and on the number of buckets you want to collect for each expression or column.
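The command being reached for here is the per-column statistics target; a minimal sketch with made-up table, column names, and values:

    -- more buckets for a skewed column, fewer for a boring one
    ALTER TABLE orders ALTER COLUMN status  SET STATISTICS 1000;
    ALTER TABLE orders ALTER COLUMN note_id SET STATISTICS 10;

    -- takes effect at the next ANALYZE
    ANALYZE orders;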
Yeah.
So that's, in fact, going on to the statistics target, and I think that's a useful next step. So if analyze doesn't fix your problem, there's something interesting here. If you're aware now of where the problem is, and you know the distribution of that data, if you've got a very skewed distribution, that's where I see increasing that target. Well, I don't know if I have the same opinion as you there; I personally don't think it's sensible to increase the global statistics target from 100 to 1,000, but I've seen some people have success doing it for specific columns that they know are distributed a certain way.
Before I started to work with thin cloning,
I worked on a holistic approach to this problem. So, for example, you say: okay, I have my database, I have many clones of it, and I have my workload, and I can reproduce it reliably, at least part of it, right? And then we were able to run, for example, 10 VMs in parallel. We can do it sequentially, of course, but if we can do it in parallel, why not? By the way, there will be a question: do we have the same hardware everywhere? So we also added some micro-benchmarks to compare baselines, same CPU, same disk, and so on. But the parallelization of benchmarks is quite an interesting topic in itself. And then you just compare: you take 10 values for your knob, in this case default_statistics_target, and then you compare many, many things, starting from regular latencies and throughput, and you can also use pg_stat_statements to analyze particular query groups and see deviations. And it's a very good holistic approach. By the way, it's also possible to do it using thin clones, but then you need to focus, of course, on buffers, not on timing, because timing will be quite a volatile metric, while buffers will be reliable. So it's possible to do something there and see how the amount of work, in terms of I/O or buffers, will change depending on default_statistics_target. And in this approach the goal was: okay, we want to improve queries, and if we change the global value, which queries will be improved and which queries will maybe have some penalty, right? Of course, you also need to compare and analyze that in this experiment. And this holistic approach is quite interesting, actually. I like this approach; I consider such approaches enterprise-ready, because you answer all the questions, cover all the topics, and then you can make a decision with a lot of data. But it requires effort, of course.
Yeah, right.
And I haven't seen as many cases where other queries are slowed down by this, definitely in other performance cases. But I hadn't considered the really good point you made around the speed of analyze being very dependent on this, and then...
Well, that's crucial for things like major version upgrades, because when we do a major version upgrade, plans change.
Plans change, and we don't get the statistics automatically, do we? So we have to run analyze as part of the downtime, and I don't see any way around that. So if we've now increased our statistics target to a point where it's going to take half an hour to run analyze across our entire database, then that's another half an hour of downtime every time we do a major version upgrade.
Yeah, that's not good.
No, exactly. So, considering these other costs, I hadn't thought so much about that, but it makes me think even more that only increasing it on a case-by-case basis might be more sensible, right?
And analyzing in stages: I saw cases when it didn't help, so sitting in some intermediate state was not good at all. The conclusion was: let's just do the last step, the maximum statistics, and wait a bit more. And that's it.
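For what it's worth, "analyze in stages" here is what vacuumdb --all --analyze-in-stages does after an upgrade: it runs ANALYZE several times with increasing statistics targets so that rough statistics are available quickly. Per session, it is roughly equivalent to something like:

    -- stage 1: very rough stats, available almost immediately
    SET default_statistics_target = 1;
    ANALYZE;
    -- stage 2: slightly better stats
    SET default_statistics_target = 10;
    ANALYZE;
    -- stage 3: full stats at the configured target
    RESET default_statistics_target;
    ANALYZE;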
It would be great if pg_upgrade would just export and import pg_statistic, understanding the changes if they happen between versions. But by the way, I remember I wanted to interrupt you in one more place, but I didn't.
So you mentioned that if we have issues, probably we need to change this.
We started to discuss this default statistics target knob.
But maybe there is another mitigation action.
For example, we can tune auto-vacuum
to recollect statistics more often.
Yep.
So I would start from there, maybe.
Not from changing the number of buckets.
Yep.
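If only a few tables have fast-changing data, that tuning can also be done per table rather than globally; a small sketch with made-up values:

    -- auto-analyze this table after ~1% of rows change instead of the default 10%
    ALTER TABLE orders SET (
      autovacuum_analyze_scale_factor = 0.01,
      autovacuum_analyze_threshold    = 1000
    );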
Well, and it depends a little bit on the problem, right?
It depends if it's a distribution issue or if it's a data being stale issue, or if it's a correlation issue.
So there's another...
Before we go there, before we go...
Oh, go on.
Right, so it's a question, what is it?
And to answer the question, we need to experiment. That's why I started with: will you do a manual analyze to check it? Right, okay, we see that autovacuum did it yesterday, and we inserted a lot since then, for example, so obviously we should run a manual analyze, right, and double-check the plan, compare before and after, right?
I mean, I feel like I want to say yes, but equally I feel like it's a trap.
You feel it because you know me already, right? It is a trap, it is a trap. Well, 99.9% of DBAs will do it, and I personally will do it as well, but I don't like it, because later we may want to understand what's happening, right? And this is a one-way ticket, always.
You can't undo it, you mean.
Exactly.
But we do then know what the problem was. At least then we know the problem was in the stats.
But how can you be 100% sure that you won't have any more questions for the previous version of the statistics? Maybe you would like to check other plans as well, right? You run analyze, and that's it. The door is closed. Bye-bye. It's not good, right?
That's why, guess what? Thin clones.
Well, just to question that slightly: if we have point-in-time recovery, for example, and we want to, let's say, restore to a previous time before we ran analyze, can we get the old statistics back?
Exactly, yes. The only problem with this approach is that it is something like a terabyte per hour.
Yeah, yeah. Okay, cool.
That's it. So if you have 10 terabytes, wait 10 hours to try again, and then you run analyze. That's why thin clones and Database Lab Engine.
So I think my answer to your question is still yes, if this is a production issue and we need to heal it ASAP.
Yeah, exactly.
I would as well, but equally, really good point, and it's worth taking a split second to think about the consequences before hitting execute.
Oh, by the way, is analyze transactional?
I don't know. In this case you can detach one of the clones, right, and make it... analyze, rollback.
Well, yes, exactly.
I think it is, because it's just the pg_statistic table; it should be. I don't remember 100%, but it should be transactional. So you could just begin, analyze, check your plan, roll back. Why not? It should be so.
Cool.
Very well.
Maybe I'm terribly wrong, but.
It's a great idea.
Someone should check it.
Well, one of us will check it.
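For the record, ANALYZE, unlike VACUUM, is allowed inside a transaction block, so the check is easy to run; the idea is that ROLLBACK should discard the updated entries in pg_statistic (table name hypothetical):

    BEGIN;
    ANALYZE orders;                                      -- recollect stats inside the transaction
    EXPLAIN SELECT * FROM orders WHERE status = 'new';   -- inspect the plan with the new stats
    ROLLBACK;                                            -- throw the new statistics away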
In this case, you can iterate: basically reset the statistics once again and check, using regular, traditional Postgres, not the thin clones which I like so much. So okay, we've covered that bit: how we can play with many different query plans and see what's happening if we... Now let's talk about the correlation issue you wanted to talk about, right?
Well, yeah, I guess we've only covered a couple of the different ways
that this could be a problem.
And another famous one, I think we have discussed it on here before, is: let's say we're looking at two columns that are highly correlated, and by default Postgres is going to assume...
Go on; each time I try to invent some example I end up either racist or sexist or something, I cannot...
I've got a good one for you: car makes and models. For example, if we say WHERE car make is Toyota AND car model is Prius, those are going to be extremely highly correlated; I'm not aware of any Priuses that aren't Toyotas. But by default Postgres is going to assume those are independent variables, and is going to look at the proportion of rows that are Toyotas, if they're in the most common values, and the proportion that are Priuses, and work out a much lower estimate of the total than is actually the case. So that's probably the best, well, currently the best, use of extended statistics. So, thanks to people like Tomáš Vondra, and I think there are a few more people involved, as of a few versions ago we can now educate Postgres about these.
Yes, there are three options. And in this case I don't remember the exact words, but something about distinct values, most common values, and the third is about one column depending on another, right? Functional dependencies, or so. In this case I would choose that third option, because obviously Prius means Toyota. And if we don't do that, Postgres just multiplies: okay, we have like 20% Toyotas, we have 1% Priuses, multiply, and we get a very low number. This is a bad idea, obviously. And also, if our query includes, for example, I don't know, Nissan and Prius, obviously we should have zero, right? But the multiplication will give some non-zero number, which is also wrong.
In this case, definitely.
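A sketch of what that could look like for the cars example (table and column names are made up; dependencies is the "one column depends on another" kind, ndistinct and mcv are the other two):

    -- teach the planner that model functionally depends on make
    CREATE STATISTICS cars_make_model (dependencies) ON make, model FROM cars;

    -- extended statistics are only populated by the next ANALYZE
    ANALYZE cars;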
But the question is how to... okay, I usually try to find some holistic approach, right? How to build some framework. You're right, by the way, that obviously, if we just create statistics or we increase the number of buckets, none of the queries should slow down. It should only help, right?
I think so, but I'm not 100% sure.
Right.
But the framework, and this is why I talked about it, also asks the question: are there any parts of our workload which suffer from our change? This is the idea. We should check everything and ensure that we improved something but kept everything else at least the same. Not improved, but at least the same. And here I also ask a holistic question: how do we understand that we need it? Do we need to guess that this column depends on that one, or can we...
Oh, well, based on the query plan we can often see that that's where a misestimate is happening, right? I mean, I'm definitely somebody who's guilty of having a hammer and then everything looks like a nail, so that's typically the way I've seen it, but there may be easier ways to spot it. So yeah, that would be my point, but is that the place you would look as well?
I haven't... it was just the question, no answers.
Okay. Because this is a common issue in query plans, and it's not always clear from the query plan where the issue is. We have a tip in pgMustard for particularly bad row estimates, especially when, well, only when they're part of a slow part of the query plan, a slow subtree, and we link out to a blog post that we've written going through, kind of, helping people try this, then try that, going through basically the things we've discussed here, so I will link that up. But I think I need to update the blog post based on a few things we've mentioned here; it's a few years old.
Yeah, and I like that you also started to advertise your product, because we invest a lot of effort to try to help people and to improve things, tooling and so on. I think this statistics topic is very, very underrated, because you need to make a lot of effort to get into it. So if, for example, explain analyze would suggest, or if pgMustard suggests more explicitly, what to do, more people would start using it, right? So we need some helpers that would do it. Actually, we don't have such helpers for indexes either; there are still some works in progress trying to improve that. But very interesting questions.
Yeah.
Another thing, well, the last thing I wanted to make sure we covered as part of this, it feels odd not to, is that sometimes people get to the end of their tether with this.
And, you know, maybe create statistics doesn't yet support their use case.
Like we've talked before about two correlated columns, but from different tables.
I think you mentioned quite a good idea using materialized views for that.
But as a bridge to have an index on columns from different tables.
Yes. Yes.
Yeah.
It's also quite a silly thought, but it's what we have.
Can we do create statistics on materialized views?
I didn't even think about that.
Great.
So that makes sense then.
And even on foreign data wrapper tables, so foreign tables.
Yeah.
Makes sense.
Well, the thing I wanted to mention, though, is that sometimes people, as a last port of call, or because they're used to using them in previous databases, would also like hints for this. So I know, I know, people often talk about index hints...
Hints, you mean, to command the executor, the planner, what to do?
Not necessarily to command them...
Or something that says it's time to do that, to create it?
No, I meant...
Regular notifications?
Okay, like normal ones. Yeah, I did mean regular ones, because I think there's pg_hint_plan, and I think one of the things it lets you do is give an idea of the number of rows you're expecting from certain things.
So I think there's an interesting area there.
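For context, pg_hint_plan's row-correction hint looks roughly like this (the aliases and the number are made up, and the syntax is from memory, so treat it as a sketch rather than a reference):

    /*+ Rows(o c #1000) */   -- tell the planner to expect about 1000 rows from joining o and c
    SELECT o.*
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.country = 'NL';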
And I did see a talk by Robert Haas where, when somebody asked about hints, he seemed very open to them specifically for row estimates, not for other things.
So that was super interesting to me.
Yeah, and the common approach to hints
from the core hackers is we don't want them.
Yeah.
But still, people need them sometimes. So I don't have a 100% strong opinion here. But let me tell you a small story.
We added hints to Database Lab. Database Lab is non-production, and the idea was to allow people to experiment more, in a what-if way: what if the plan were different? But some time later, people started to ask: okay, this is good, I found a better plan, optimized the query; now why don't we have this on production, right? And I realized that if you add something to the lab, you also need to think about the fact that there will be some people who will want the same on production. So they're already asking: okay, should we add it to production? Maybe no.
Maybe yes.
There are different opinions here.
Maybe we should do a whole episode on that, actually.
That feels like a good one.
Wonderful.
Was there anything else you wanted to make sure we covered?
No.
Analyze more often.
Yeah, absolutely.
Well, thanks, everybody, for listening.
Thank you, Nikolay.
And see you next week. Or have a good Christmas for everybody who's celebrating.
Thank you. Likewise.