Postgres FM - Advanced EXPLAIN
Episode Date: October 11, 2024

Nikolay and Michael discuss some more advanced topics around EXPLAIN, including some tips for complex query plans, some recent improvements, and an idea or two that might make it even better.

Here are some links to things they mentioned:

Michael's solo episode on EXPLAIN basics https://postgres.fm/episodes/explain
Our episode on auto_explain https://postgres.fm/episodes/auto_explain
Which cloud providers support auto_explain? (blog post by Michael) https://www.pgmustard.com/blog/which-cloud-providers-support-auto-explain
psycopg https://www.psycopg.org
Our first episode on BUFFERS https://postgres.fm/episodes/buffers-by-default
Our second episode on BUFFERS https://postgres.fm/episodes/buffers-ii-the-sequel
track_io_timing https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING
compute_query_id https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-COMPUTE-QUERY-ID
EXPLAIN: beyond the basics (talk by Michael) https://www.youtube.com/watch?v=sGkSOvuaPs4

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello, this is Postgres FM. My name is Nikolay, Postgres.AI, and with me, as usual, is my co-host Michael, pgMustard. Hi, Michael, how are you doing?
Hello, Nikolai, I'm good. How are you?
Very good, very good. So it was my turn to choose the topic. And I chose the topic, naturally, I thought we didn't discuss explain, never ever we discussed explain, right? We didn't do it.
But then I googled and found out that you discussed it without me.
So I thought, it doesn't count.
Let's discuss once again, but maybe going deeper in some places.
Yeah, I covered the basics.
And I think it's a good suggestion to cover more advanced topic. I think explain is one of those topics that you can just keep getting more advanced.
And there's definitely a level at which I'll get way out of my depth.
But yeah, there's definitely room for a second episode where we go into more advanced topics,
for sure.
Right.
And just for those who don't know, you spend most of your professional time dealing with plans,
because pgMustard is helping people analyze and improve queries,
particular queries, looking at explain plans, right?
Yeah, exactly.
It's a visualization tool for explain
and then also
tries to provide performance advice based on that. So we normally get people's more complex query plans, so I guess this is an area where I've got more practical experience than most. Obviously, as everyone knows, when you're building something... people that build healthcare software aren't always intimately familiar with everything about healthcare. There are certain things you get really familiar with, but you also end up having to deal with things that are not to do with that at all. I've probably seen most variations of how performance problems can be shown via explain plans. I have
some blind spots, for sure, but yeah, it's been a big thing for me. But also for you, right? Like, a big part of your work has been performance consulting, and it's hard to do that without explain.
For me, explain is not the number one tool, usually, except cases when a customer asks us to help with a specific query, which happens as well. But usually we deal with analysis, like top-down analysis of the whole workload, and we start with pg_stat_statements and wait event analysis, Performance Insights or something.
And then we go down, and when we identify first an aggregated query, or normalized query in official pg_stat_statements terminology, only then we find examples of the badly behaving query. Then we discuss how to improve it, where we need a playground, a proper playground, and maybe we will want to discuss how to make it.
So that explain plans match what we see in production.
And only then we go and check the actual plan and try to improve it and so on.
Sometimes we have all the auto-explain, right?
Auto-explain is a great tool.
Yeah, we did an episode on that too.
Yeah, yeah, I remember.
And yeah, if you have it, it's great.
But it's not always so.
It's not enabled by default.
It requires some effort to be enabled.
Although it's present, it's available everywhere.
So I wish I saw it more often.
Did you see I did a blog post on this, this year,
on which cloud providers
provide access to auto_explain,
and which parameters?
I don't remember, honestly.
I need to check it.
I'll share it.
It's not the most interesting read,
but it's like a good reference.
And I find it useful for when somebody mentions a cloud provider to me, like one of our customers.
So everyone has it, right?
Yes, pretty much. Not quite everyone, but pretty much everyone, and most of them provide access to most parameters. Explain also has a bunch of parameters, and I found it hard to check which providers offered which parameters. And I've got good news as well, actually: I got a nice direct message from somebody who works at AWS, and the parameters they didn't offer, there's like two or three that they didn't offer, they've submitted a pull request for. So I'm looking forward to the next release of RDS, which I think will include those.
That's good.
That's good.
So, yeah, of course, I will also deal with explain, but it's just one of things and we probably don't dive too deep as you do.
So where do you want to start this discussion?
It was your suggestion.
What do you want to start?
But you're an expert, right?
Okay, all right.
Maybe.
I'm not sure.
I still feel really uncomfortable with the word expert.
I'm feeling very uncomfortable with the word analyze.
It should be execute, not analyze, right?
In the explain command.
I mean, naming.
I've been thinking about naming for a while with this, because part of our tool, we give people something to copy and paste, so they can easily copy it and put it in front of their query if they're using psql or an editor or something. And that snippet gets longer and longer the more parameters get added to explain. And the more information the better, when you're using a tool, if you don't have to worry about the output getting long; your visualization tool should be pointing people at the right place anyway. The amount of data, it's like the more the better, really: the more information you get, the more likely it is you can point out the problem. However, it's become explain and then, in parentheses, analyze, buffers, verbose, settings, format json. And there's more: now there's serialize and memory. But most of these are off by default, so you have to specify all of them to get all of the information. So analyze is one of them.
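For reference, the kind of invocation being described might look like this on Postgres 17 (orders is a hypothetical table; SERIALIZE and MEMORY are new in 17):

    -- All of these options are off by default, so you have to spell them out:
    EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS, SERIALIZE, MEMORY, FORMAT JSON)
    SELECT * FROM orders WHERE customer_id = 123;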
But the reason I was thinking about naming was, I was wondering about suggesting something like explain all. The problem with that is there's so much advice out there in the wild about when you run explain analyze: be careful, because your query will be executed; if your query is data-modifying, it will modify the data; if your query is long, it will take that amount of time to run. And all of that advice would become, not obsolete, but if we change the name, all of that advice becomes difficult. Like, you have to go back and change all of the advice, or you have to add: if you want to, explain analyze, or explain, or...
For five years, the old word could be supported, but it's so confusing to explain to people that analyze is actually executing the query, and it's not the same analyze as 'analyze table_name', which also probably affects the plan, but you need to understand this. If you run 'analyze table_name', probably after that you will see a different plan in explain and explain analyze. But it's not because these two words are the same word; they are very different words.
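To spell out the two unrelated meanings of the word, a minimal sketch (orders is again a hypothetical table):

    -- "analyze" inside EXPLAIN actually executes the query and reports real timings:
    EXPLAIN ANALYZE SELECT * FROM orders WHERE id = 1;

    -- The standalone ANALYZE command recomputes planner statistics for a table,
    -- which is why plans often change after running it:
    ANALYZE orders;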
Remember we discussed Postgres 17; it was confusion resolution. I wish this confusion would one day be resolved as well.
I agree in principle.
I just think in practice it's going to be painful.
Why painful?
The word execute is not used yet.
First we introduce synonym,
and then in five years we ditch all the words.
It's used in prepared statements, right?
Like prepare, execute.
Ah, yeah.
But, well, execute means execute.
And analyze and explain also means execute.
Okay.
True.
By the way, let me confess about something related to prepared statements. A few weeks ago, we had an episode about planning time, right? Which is also related to today's topic, because planning time we check using explain, mostly. Of course, we can check it in pg_stat_statements, but again, not everyone has track_planning enabled in pg_stat_statements. So usually we just check it manually, in an ad-hoc way, using explain. Not auto_explain, sadly.
It cannot show planning time?
Wow.
No.
Okay.
Good to know.
But in pg_stat_statements, if track_planning, yeah.
Yeah, if track_planning.
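If track_planning is on, planning time becomes visible per normalized query; a sketch of the kind of check being described:

    -- Needs pg_stat_statements loaded and pg_stat_statements.track_planning = on
    -- (off by default because of its overhead):
    SELECT queryid, calls, mean_plan_time, mean_exec_time
    FROM pg_stat_statements
    ORDER BY mean_plan_time DESC
    LIMIT 10;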
So if we, for example, deal with partitioning, with a high number of partitions, planning time increases, as we discussed. But I just blog posted about it, and right after that we had a discussion on Twitter, and it turned out that my experiment was wrong, and my ideas, not fully wrong, but not fully accurate, because planning time is affected only the very first time a query is executed in a session.
If you execute it once again, we already have the relation metadata cache, relcache, and the overhead becomes almost zero, at least for simple queries.
Yeah, simple queries that can prune down to a single partition, or a low number of partitions. And it was David Rowley who pointed this out.
Yes. Of course, if there's no pruning, then it's a different story; with the number of partitions growing, more work needs to be done at planning time. But yeah, I'm talking about partition pruning happening at planning time. The idea is only the very first execution suffers.
It's so, I mean, I'm really impressed David noticed that so quickly.
Yeah.
I mean, he knows his stuff around this topic.
I'm pretty sure he was involved in the fixes.
I think it was in Postgres 12 to improve that exact metric.
So, yeah, kudos to him for pointing that out so politely as well.
I think he said it was misleading, not wrong.
And also, really good reminder.
Do you remember when we had Melanie on to talk about benchmarking, how difficult benchmarking was, to make sure you're measuring the thing you think you're measuring? It's just a good reminder: these things are so difficult, but by publishing your findings, and by the Postgres community being so open to correcting us when we're wrong, we've all learned something.
Exactly. It was very good.
And I cannot agree it's wrong because there was no mistake there.
There was a problem that the experiment didn't do next step,
which of course I should do.
So actually misleading is a good term here.
So I extended the blog post right away. And it turned out, if you use just explain, no execution, no analyze word, you see, as you add partitions (we checked from zero, 100, 200, and up to 1,000), using explain of a select doing a primary key lookup, basically a partition key lookup: first of all, you see an index scan on only one partition, so partition pruning is working at planning time. But planning time grows linearly with the number of partitions, and the number of planning buffers also grows linearly, for this kind of query, and it was range partitioning. We didn't check others.
But once you run explain once again in the same session, it's only three buffer hits, always, regardless of the number of partitions.
It's also good relief, right?
So you think, oh, it's great.
I'm going just to relax and I can have 10,000 partitions
for simple queries like this.
Of course, complex queries will be different,
I'm quite sure.
But if partition pruning works at planning time,
I'm happy, right?
It just tells me I need to take care of proper connection poolers. So I need to have them always.
And it's interesting, the problem with our discussion and my ideas a few weeks ago, when I said: oh, if you have overhead of planning time, which obviously it's not, it's only for first-execution planning, in this case, just use prepared statements.
But this is definitely maybe not only misleading, but also wrong, because, as I actually learned in detail, if you use prepared statements, you see that partition pruning shifts to execution.
So the generic plan includes all partitions, right? Because we cannot cache a plan which deals with only a single partition in this case; what if we need to execute the prepared statement with different parameters, right? We'd need a different partition, probably. So the cached plan includes all partitions, and you run explain for the prepared statement for a specific parameter, and you see, at execution time, "Subplans Removed", right? This phrase in explain. This is interesting, which means that the overhead shifted to execution. It's not good, right? But again, overhead, which overhead? If we have relcache here, all good, right? So it means that for a high number of partitions, it's not prepared statements that matter most. They can help, but they don't help fully, because of this shift of partition pruning. But what actually helps a lot are poolers. We have poolers in all heavily loaded projects. And if you are preparing to have huge tables, like we always say, if you exceed 100 gigabytes, you need partitioning, but it also means you do need proper connection pooling, to avoid the initial overhead of planning; basically, with a high number of partitions, planning will be not good, right?
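The shift they describe can be reproduced with something like this (same hypothetical measurements table):

    -- Force the generic (cached) plan so pruning cannot happen at plan time:
    SET plan_cache_mode = force_generic_plan;
    PREPARE by_id (bigint) AS SELECT * FROM measurements WHERE id = $1;
    EXPLAIN (ANALYZE) EXECUTE by_id(42);
    -- The generic plan covers every partition; at run time the executor prunes
    -- them away and the Append node shows a line like "Subplans Removed: 999".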
Yeah. So, back to explain. A question, back to explain, kind of linking them:
When you did the second explain, extracting the buffers and things,
did you use format JSON for that and parse the output?
Or what did you do?
It was in, yeah, it was text, regular.
Yeah, just parsed it.
Settings, yeah.
Yeah, and it was done by our AI, so it doesn't use JSON at this point, just text; we don't want a very bloated format. And actually, yeah, we needed to fix how it works, because right now it works through psycopg, so Python. And if you run a couple of explains in one shot, it received only one result. We fixed that, so now it sees both results. And the second result is always, in our case, three buffer hits. So planning time remains way below one millisecond, like maybe 10 microseconds or so.
So super tiny, almost zero.
And the bottom line here is that, you know, usually we say: if you run explain analyze, with buffers of course (we had a couple of episodes about it, right?), if you run it, always run it a couple of times, maybe a few times, because of caches.
If you want to check timing, if you're too obsessed by timing (not my case; I'm obsessed by buffers, not timing), you need to run it multiple times and check the second execution, because the first time probably everything is not cached: data is not in the buffer pool and not in the page cache. So run it twice. That's a good piece of advice, right? But what we just discussed means that, even for planning, run it twice in the same session, because you may observe a lot of work if partitioning is involved, especially a lot of work related to lack of relcache.
I've seen this.
The other weird thing I've seen, and I don't know if you know the answer: you can do a simple ID lookup, like a primary key lookup, and you will get planning time but no planning buffers reported, even if you include the buffers option.
Yes
What's that?
It's a bug I think
Yeah. Inside my team last week, since we worked on this blog post, a couple of others, many people actually, were involved, but this particular problem was very annoying. Very annoying. Like, why are buffers not reported for planning? Okay, now they are reported, and we cannot understand what it depends on. Right? So once we understand, probably it will be a bug report already, right? Because something wrong is there. Because we requested them.
It's weird, but I mean, the text format's hurting you there, because by default in the text format, if things are zero, they don't get reported; that's roughly a general rule.
You think so?
Whereas in JSON format, you'd still get buffers reported, but they would have zeros.
It's not much better. In our case, it should be reported, because it always was something, like some number of hits.
Oh yeah, so that's a different bug, right? All I meant is, if planning buffers are zero... but I don't understand how planning buffers can be zero if it's actually doing work, if it's actually, you know, picking which index to scan.
So I found that, in the case of a prepared, cached plan, if the plan is cached, there is still some planning time. But I suspect this planning time occurs during execution, because of these subplans removed and partition pruning happening. I'm not sure, by the way; this is some nuance.
I think maybe parsing might count as planning, or there's other steps, I think.
Well, I'm looking at it right now. I'm looking at the plan, which checks execution
with a forced generic plan, plan_cache_mode set to force_generic_plan, and we check execution for a prepared statement. It's cached, the plan is cached, and we see no planning-related buffers reported at all; like, it's missing. But planning time is like 38 microseconds. And partition pruning happened at execution time. So I wonder what those 38 microseconds are.
If plan is cached, maybe just additional checks.
I don't know.
It's interesting, right?
You know there's multiple stages.
I can't remember all of them, but one of them, for example, is parsing.
One of them is rewriting, so you might rewrite the query.
I think some of those get bucketed at the moment as planning. So probably you're right: there are no buffer hits involved at all, and that's why it's not reported at all. But I've seen this, I've seen zero reported when it's not from a plan cache, like if it's a primary key lookup that you've run.
Well, yeah, so where there is some work for sure, right?
Yeah, right. Or, like, in my head it would make sense that there's work. Anyway, it's an interesting quirk.
And even if it's not a bug, the fact that you raised this topic and we discussed this topic inside my team, it's already something, right? Which is not clear.
And yeah, if someone who watches us is more experienced than us, please tell us. This is a super interesting piece of information, right?
Yeah.
So what I wanted to say: you need to check the execution of explain for planning a second time inside the same session, just to exclude the possibility that you observed overhead from lack of relcache. This is a lesson I learned last week. And this is a correction to ideas I expressed a few weeks ago in the planning time episode.
And there are some good practices that help avoid, mitigate, this kind of issue anyway. Like, normally people, when they're benchmarking or testing things, will run something five times and maybe take the best of the five, you know, running things multiple times. I don't know if you've ever seen how some sports do it, like diving, for example: they give people scores out of 10, and then they cross out the top-scoring judge and the bottom-scoring judge.
Yeah. And actually, we did do that. We did it, but, due to some reasons, it was not... well, we plan to improve this. And this kind of analysis should be done in exactly that way: you run multiple times, you check which data points are way off, exclude them, and then you take the average. That makes sense, totally. But in this case, just how we did it, it was a single-session experiment, and we ran all the explains there. So, bottom point: if you run explain five times and every time it's a new connection, the problem still persists. You need to run it multiple times inside the same connection. And then I would not agree with excluding the first one due to lack of relcache, because it's also an interesting fact that, with a growing number of partitions, the very first execution can reach dozens of milliseconds, and even a hundred milliseconds. It's a lot.
So it means connection poolers are needed, and they need to be configured properly if you have thousands of partitions. Otherwise, very often new connections will be established, a query will execute for the first time against a huge number of partitions, and you have a penalty of dozens of milliseconds; for heavily loaded projects, it's a lot, maybe 100 milliseconds. It's, like, huge overhead. And this fact is also interesting, right? So we learned two facts: the overhead is huge, linearly growing, for this particular simple, very trivial case; and there is basically no overhead for subsequent queries. Wow. This was like a sequence of surprises for me in this work. I like it, actually.
So yeah, let's maybe switch to another topic. Maybe you have some ideas for what to discuss next.
Yeah, well, when I think of advanced explain, I'm thinking more like advanced query optimization. Maybe you're familiar, and can get a good read on explain plans, when it's a relatively simple query: maybe some normal scan types, normal join algorithms you're somewhat familiar with; you can work your way around that. But I've spent several years looking at this, and I still, from time to time, see operation types that I've never seen before. And things get complex, just in terms of how many operations there are, or things that complicate things, like CTEs and init plans, and things happening at the same time. I think we had Haki Benita on relatively recently, talking about how, if you have multiple CTEs, those happen at the same time and independently of each other. So that's confusing. And how those appear in explain is interesting, because it's shown as a tree, so they have to kind of work out different display mechanisms for showing that. That's not necessarily misleading, because it's hard to say how you should visualize that. But that's what I'm thinking in terms of advanced topics around explain: how do you interpret those? And also, what can you ignore? If you want to do this, normally your aim is: can I speed this up? And as things get complex, you often want to simplify. What can I do to discard a bunch of this information? Which bits of this are already fast, so I can ignore them? Maybe I can run a simpler query that's only the part of the plan that's problematic. That's what I'm thinking in terms of more complex topics.
Right, I agree. Reading complex plans, this is it. Yeah, I quickly scan buffers, and they are cumulative,
like they are accumulated up to the root, right? And the root includes everything. So you can go from the root to the leaves, basically. It's an upside-down tree, right? So it's top-down analysis again. Okay, we understand the number of buffers, we understand how many megabytes, or even gigabytes sometimes, it is, if you multiply by the block size, which in most cases is eight kibibytes, right? And then we go down and see where exactly this huge volume came from.
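For the arithmetic, one convenient way to turn a buffer count into a volume (the count here is illustrative):

    -- 8 KiB default block size; e.g. 1,310,720 buffer hits is 10 GB of pages touched:
    SELECT pg_size_pretty(1310720 * 8192::bigint);
    -- SHOW block_size; confirms the actual block size of the server.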
This is my default type of work with plans, including execution, of course, because if you don't have execution, you only have cost, and buffers are reported only for planning, as we thoroughly just discussed. So in this case, usually it helps; in most cases, it helps. And I wish I was also able to see lock-related information right in the explain plan.
Yeah. Right, like waits?
Yeah, it's difficult to see how they would do that. But not necessarily waits... what do you mean by waits? What do you mean by locks?
Yeah, actually, maybe you're right, yeah: what's happening under the hood, slightly. But it's already super complicated output, right?
So, yeah. And there's macro versus micro issues here. If you're on a test environment where you're the only person there, it's much less likely that you're going to be incorrect.
Originally, when I said locks, I was meaning heavy locks. So, you know, at planning time, all indexes are locked, and all tables, all relations, are locked. And that's a lot of overhead. I wish I saw it; it's interesting information. AccessShareLock for all of them. And if it's an update, okay, what kind of lock do I have here?
okay, what kind of lock I have here.
Oh, interesting.
Yeah, that's not what I was thinking at all.
Yeah, you thought about lightweight locks.
Well, no, I was... oh, even a different thing.
Or wait events?
No, I was thinking: the easiest experiment to show this is to start one transaction and do some DDL, like drop a column or something, anything that's going to take a really heavy lock on that table. Then open a second transaction and just do a simple select query on that table. Go back to the first transaction and commit, go back to the second one, and see that your query was just waiting that entire time, if you ran that query with explain in that second session.
Once you've committed, it's not waiting anymore.
Well, but the execution time of that explain will be...
Oh yeah, yeah, you're right. And this execution time, it can be explained if we saw, like, basically a pie chart or something, or a distribution.
There will be no node, there will be no operation in that explain plan, that shows it was waiting on an exclusive lock. It will just show that one of the operations was extremely slow.
Simple wait event analysis for this particular session would help.
Yes.
And explain technically could show it.
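A minimal sketch of that experiment, with the wait showing up in pg_stat_activity rather than in the plan (t and its payload column are hypothetical; each block is a separate session):

    -- Session 1: take a heavy lock and hold it by not committing.
    BEGIN;
    ALTER TABLE t DROP COLUMN payload;  -- ACCESS EXCLUSIVE lock on t

    -- Session 2: blocks until session 1 commits; the eventual plan only
    -- shows a suspiciously slow node, not the lock wait itself.
    EXPLAIN ANALYZE SELECT count(*) FROM t;

    -- Session 3: meanwhile, the wait is visible here.
    SELECT pid, wait_event_type, wait_event, state, query
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock';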
It doesn't come up that often to me.
Maybe it's like one of the blind spots
because I don't see those,
but I don't see plans
that have these really unexpected bottlenecks.
I would like to be able to,
not by default,
but I would like to be able to see these
wait event analysis and have locks as well
because I want to understand this query,
which locks it is going to acquire when I go to production, right?
When execution finished, what locks were acquired, right?
Yeah, I think those are interesting extensions, but it's already so huge. Yeah.
And as I mentioned, right, we're getting more and more, I think this is part of the reason
people are adding more parameters to explain.
So write-ahead logging information.
It should not be by default. It's too much.
Well, I actually have a proposal here.
I haven't suggested it on any of the mailing lists yet,
but I would like your opinion.
I thought about adding them by default
when you include the parameter verbose.
So if you do explain analyze verbose
give me everything like you just don't want to update your your website yes yeah i don't want
to update that snippet for sure but also i want people to only have to remember to do that and
they'll get whatever postgres can give them. Verbose means like really long.
It means everything.
It kind of means like I'm okay with loads of information.
Give me it all.
So it's already kind of in the meaning.
So I don't, anyway, I haven't proposed it yet,
but that would also come with buffers.
So I thought you might be happy.
And also, if we talk about what's missing and what you cannot get from explain: at some point, maybe some extension of this could be, imagine if we see operations at the disk level as well. And we understand not only hits and reads for the buffer pool, but also hits and reads for the page cache, for example. And maybe CPU analysis as well, for this particular execution, if it is explain analyze, right?
That's a good point. It's not exactly what you're talking about, but loads of people don't realize, because it's not an explain parameter, that with track_io_timing on you can start to get some insights into the disk-level performance. And that's improving in each release; even in 17, we got some improvements there. So, like, buffers are split into shared, local, and temp; I/O timing wasn't initially, and is now split into all three.
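For example (hypothetical query; track_io_timing needs superuser or an equivalent grant to set):

    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 123;
    -- Nodes that touch disk now carry lines along the lines of
    --   I/O Timings: shared read=12.345
    -- split across shared/local/temp buffers in recent versions.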
Yeah, that's great.
But I'm talking about, basically, pg_stat_kcache, but for explain.
I understand, but the timings are a good start, right?
I agree, yeah.
And actually,
I think it's possible to have all of this right now already, but you need to connect many pieces. For example, it could be a tool which runs explain, but also looks at wait event analysis for this PID, in pg_stat_activity, and in the logs as well. And what else? It collects everything into a big report for a particular query. And also physical: if we have physical access, we can do it from /proc, knowing the PID. I think we had a prototype for our old bot. So you just take counters from /proc.
Because if you've got the query identifier, which is the normalized one that you can match to, like, pg_stat_statements, I'm imagining you could start to tie all of these things together.
Yeah.
A lot of stuff is possible.
Who is doing this, by the way?
Maybe someone is doing this.
I don't know the monitoring tools well enough
to know who's doing some of this micro level.
But if they are, let us know.
If you're using one or if you're building one that does,
that would be cool to hear.
To be honest, though, this goes back to some of the conversations we've had about the macro-to-micro zoom. Once you've done your macro analysis and you've identified a few problematic queries, like the main consumer, if you've got a few bottlenecks, explain is a really good tool for doing that micro analysis. Once you've got a sensible set of parameters that you know will be slow, it's not that hard to reproduce. Normally the difficult thing is then going through a 200-line explain plan, and you asked what can help. I think we've put a lot of effort into helping with that stage, and a lot of the other tool providers have as well. One thing we do that I think is really helpful is, if a subplan is fast as a proportion of your query, we collapse it by default. We don't hide it, we just say: there are, like, 20 nodes here, everything is fine, you probably don't want to look here. Which means that a 300-line plan can go down to, like, 20.
That's cool.
But those kinds of things, I think, can really help initially. It's not good for a hacker, a Postgres hacker, like trying to work out the cost model for a new feature. But for somebody just trying to speed up a single query, you don't need to look at the 200 partitions that were never executed, or that took zero milliseconds.
Yeah, and you mentioned query ID. For example, I think it was last week, we were working with one of the customers. We saw that auto_explain was used, and it's great. And there was a task to analyze a lot. And we still plan it. And I'm very thankful to you that we have integration with AI right now, and your system helps our system. And I plan to perform some massive analysis at a larger scale, to be able to do it. But the interesting thing is that if you want to analyze a lot of queries from auto_explain, you do need some... you basically need the compute_query_id parameter, right? You need to turn it on, because by default it's auto, and for auto_explain it will not work.
It does if you have verbose on.
Okay, in auto-explain.
Yeah.
Okay, in our case, we didn't have verbose on, I think, right.
But only as of 16, I think.
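A sketch of the kind of postgresql.conf setup being discussed, with illustrative thresholds:

    # auto_explain must be preloaded:
    shared_preload_libraries = 'auto_explain'
    # Either force query IDs globally...
    compute_query_id = on
    # ...or, on Postgres 16+, log_verbose adds the Query Identifier to logged plans:
    auto_explain.log_verbose = on
    auto_explain.log_min_duration = '1s'
    auto_explain.log_analyze = on
    auto_explain.log_buffers = on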
Ah, it's better to turn it on actually, maybe. Yeah, yeah. It's a question of overhead; let's, by the way, talk about overhead in a second. But imagine you have a lot of query cases, various queries from the log. The presence of a query ID first gives you an understanding of how many different normalized, or aggregated, queries you are dealing with, right?
And second, you can then join this information with pg_stat_statements and go back to the macro level and understand: okay, this query is not very influential, because it's a tiny part of the whole; or maybe, vice versa, it's influential, although we see only a few occurrences in our slow log, because we have a threshold by time, right: auto_explain.log_min_duration. We usually set it to one second, for example. And if we have only a few cases, but we see in the workload, oh, this is actually 50% of all total time, total exec time: oh, we do need to think about it properly, and the influence of this optimization will be high. So let's do it. This helps you understand, when you have a lot of queries to deal with, where to start: you want to start with the most influential steps.
So query ID is a good thing. And I'm glad we have it everywhere: in explain, in the logs, in pg_stat_statements; you can connect all the dots. And pg_stat_activity as well, right? So wait event analysis is possible, just joining by query ID as well, if it runs many, many times, right?
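For example, something like this could tie a query identifier from a logged plan back to the macro picture (the literal id is a placeholder):

    -- Macro level: how influential is this normalized query overall?
    SELECT queryid, calls, total_exec_time,
           round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2)
             AS pct_of_total
    FROM pg_stat_statements
    ORDER BY total_exec_time DESC;

    -- Live sessions (Postgres 14+): pg_stat_activity carries the same id.
    SELECT pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE query_id = 1234567890123456789;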
So, yeah, great. What about overhead? I think it's also an advanced topic. I know you blog posted about it, and Ongres did. You talked about auto_explain, right? But explain itself also has overhead.
Well, explain can have both: it can under-report and it can over-report the timing.
Timing again... I'm not a fan of timing.
Well, but bear with me. I think people focus a lot... there are different things. The Ongres thing was about the queries being slower: you actually cause some overhead by observing.
And that's true when you're running explain analyze, because timing is on with analyze by default. But it's also true if you're running auto_explain: there's also some overhead to measuring and to logging; if you log a lot of plans, that's some overhead as well. So there are those overheads. But there's also the possibility that it under-reports, which I think throws people. So this is one of the reasons they added the parameter serialize: if you are doing a lot of work after the query is executed, in order to send the data to the client. So, like, nobody really cares about the server-side time... the server-side time is important, but it's what we get from explain analyze. What they normally care about is the user experience, the full client-to-server-to-client time.
I cannot agree here, but okay, go on.
Okay, well, what I mean is: if you're getting a report that "my dashboard is slow", there's no point only speeding up the server-side time.
Yeah, we need to look at amounts of data, yes.
And part of that work, so serialize is part of it, is getting the data ready to transmit, and then there's the network overhead as well, of actually sending all that data.
I guess it's not network cost, it's not transmission cost... "cost" is maybe not a good word, it's overloaded here, right? It's time, right? Or cost.
Serialize is reported as time and amount of data. What cost? Okay, I actually don't think it has costs; I think it's one of those things that gets reported in the summary section, like just-in-time compilation and planning.
It's about serialization, not transmitting.
Yes, it's a step before: getting the data ready to transmit. Still everything on the server.
Yeah, for example, uncompressing, or decompressing, TOAST data.
And it's Postgres 17 feature, right?
Yes.
So it just goes deeper, to understand what's happening on the server side, like execution, and what part of it is associated with serialization, attributed to serialization, right?
Well, and not just that. Before, if you ran explain analyze on a select of some data that had been toasted from this table, you would get a really fast execution plan, because it wouldn't do the serialization. It would say this executes really quickly. So it wasn't just that you couldn't see the time: a query that might take 100 milliseconds when you run it from psql could run in one millisecond or 10 milliseconds if you put explain analyze in front of it, much, much faster, because it didn't have to do that expensive step.
I see. Yeah.
So this is one of those cases that used to be... and not just used to be: by default, this is still the behavior in version 17.
If you run explain analyze on some basic...
Select star from blah, blah, blah.
Yes. Before 17, we didn't see this part at all. Even in 17, you don't see it unless you also explicitly ask: explain (analyze, serialize).
That's bad.
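On 17, the explicit form might look like this (documents is a hypothetical table):

    -- Postgres 17+: actually detoast and convert the result set, and report it.
    EXPLAIN (ANALYZE, SERIALIZE) SELECT * FROM documents;
    -- Adds a summary line along the lines of:
    --   Serialization: time=123.456 ms  output=45678kB  format=text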
I like defaults.
That's just what I've been saying.
Yeah.
Yeah.
And what about verbose?
Verbose includes it, right?
No, it doesn't.
No?
This is my, but this is my proposal that verbose should include all of these additional parameters.
It should, of course, I agree.
Yeah.
Well, okay.
It is as it is right now. And I have to actually say: explain is awesome. Explain in Postgres in particular is the best I've seen of any database that I've worked with. Others, like MySQL for example, are catching up; they got explain analyze in version eight. But the amount of data we can already get is beyond the other databases I've seen. So it is good; it just can always be better, right?
Yeah, yeah. If somebody is watching us thinking, oh, a couple of haters...
It's just a lot of stuff, yeah. But a lot of stuff is helping all the time.
It's true.
Do you want to discuss overhead from, like, the buffers option? I did one really basic experiment where I tried very, very small queries, but I couldn't get any additional overhead; there wasn't a measurable extra overhead of requesting buffers. I think you said you did an experiment once that did show it, but I don't remember.
On timing, it cannot... Like, buffers are basically invariant. Of course, they can be hits or reads, they move back and forth, but if everything is cached, it's very stable: you're reading the same data, same query, just a select; run it multiple times and you will see the same number of buffers all the time. That's one of the reasons I like them.
But if you include buffers, I did see in the past, it was some time ago, that timing increases, because a lot of buffers need to be reported and the counters inside need to be incremented. If it's a complex query, in many places, this overhead adds up, and the timing without the buffers option is very different compared to with buffers.
I've seen, yeah, I've seen single queries where that's true, and I've seen, I think, Lukas Fittl from pganalyze has done some examples of this. I think I've only seen it in queries that have nested loops with lots of loops. That's the example everybody turns to.
For example, a join with a nested loop.
Yeah, but the reason I think it's slightly overhyped
is when you're optimizing, you don't always care.
If the explain analyze is four seconds,
or maybe let's pick 400 milliseconds,
and your real execution time is 300 milliseconds,
if you can speed the 400 milliseconds up to 50 milliseconds,
the 300 might go down to 40,
and directionally, you're still looking for the same things.
And by asking for buffers information,
you're getting more information,
you're more likely to be able to optimize your query.
So I don't care as much as most that it's slightly different.
I only care that including buffers doesn't change buffers.
That's it.
That's it.
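One practical compromise, when per-node timing overhead is a worry (say, nested loops with millions of loops), is to keep buffers but switch off per-node timing; a sketch with hypothetical tables:

    -- TIMING OFF skips the per-node clock reads; total runtime and buffer
    -- counts are still reported, and buffer counts stay exact:
    EXPLAIN (ANALYZE, BUFFERS, TIMING OFF)
    SELECT * FROM orders o JOIN order_items i ON i.order_id = o.id;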
So for me, when I start working with buffers, I'm less concerned about timing. Focus on buffers, optimize, optimize, optimize, and then go back to the result and see: oh, timing has improved drastically, because we reduced the number of buffer hits and reads. Right, so that's it; this is a very, very common path for me. Just reduce: because, like, a sequential scan, lots of buffer hits and reads; we add an index, okay: a couple of, oh, three, four, five buffer hits, that's it. Time to look at timing: oh, of course, as expected, a thousand times faster, right?
I like, you know, I like buffers a lot, but for me they'll always be secondary, just because, if you look at timings first and last, that's the main aim: the main aim is to get things faster. And yes, in maybe 99% of cases, it's going to be I/O that's the limiting factor. But in some queries, there'll be a sort in memory; there'll be something that doesn't report extra buffers in explain; there'll be just-in-time compilation; or there'll be some limit, like maybe trigger timing, or just-in-time compilation... Or planning time; well, for planning you now do get buffers. But trigger times, just-in-time compilation, there are a few other examples: sorts in memory, hashes in memory. If some of these other bottlenecks don't report buffers, why not look at timings and then buffers? And I almost always see query plans with both these days, so that's great.
Yeah, yeah, that's great that buffers are present. At some point I started saying: if there are no buffers, I'm not dealing with this query. This is what I say to customers: go and give me buffers, because we are missing something super important. Okay, if you put it in second place, fine, but this place should always be present.
I'm sure we've talked about this before.
I think that's probably enough.
I've got a talk that I gave a few years ago called Explain Beyond the Basics, which I would have thought we had covered some of the bits of, but actually I took a completely different path. I went down some of the less common types of issue that you can spot with explain, which we haven't really talked about. So I can include that, if anybody wants to know a bit more of the specifics. That format is good because you can actually see examples of explain plans, so it's probably a bit better video-wise. And the reason I see buffers more often now is everybody suggests adding it, but also some of the other tools, like Depesz, added it to the default thing that he suggests people get. So there's loads of people.
That's news to me.
I remember it was missing and it was a big disadvantage.
He added it.
It's great.
That's good.
That's good.
Yeah.
Anything else you wanted to add?
I think it's enough.
Me too.
Nice one.
Thank you.
See you later.
Bye.
Thanks so much, Nikolai.
Take care.
Bye.