Postgres FM - auto_explain
Episode Date: April 28, 2023

Nikolay and Michael discuss auto_explain — what it is, how it can help, and how to check its overhead.

Here are links to a few things we mentioned:

auto_explain (docs)
ScaleGrid guide to ... auto_explain
Can auto_explain, with timing, have low overhead? (blog post by Michael)
pgBadger
pg_stat_monitor
EXPLAIN ANALYZE may be lying to you (blog post by Álvaro from Ongres)
pg_test_timing
Our episode on benchmarking
Database Lab Engine

~~~

What did you like or not like? What should we discuss next time? Let us know on YouTube, on social media, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, the topic is quite obvious for me.
Over the last week or two, it popped up several times in various places.
Let's talk about auto_explain.
Yeah, I think this might be our first purely extension-focused
episode, which is cool. And I know
I count it as an extension. Do you?
In terms of it being like a contrib module?
It's an extension, right? But what else?
It's a little extension.
There are two kinds of extensions.
Main extensions, which are the so-called contrib modules, the old name, right?
And they come with Postgres.
And you can install them from the official package, postgresql-contrib.
And there are all other extensions.
But probably we should talk about extensions separately.
It's a good episode idea, maybe, right? But this auto_explain, it's a contrib module.
It's a very, very, like, super official extension. It probably has more potential risks in terms of bugs compared to Postgres itself, but much, much less compared to other extensions, because it's maintained by the same team as Postgres.
Yeah, that's a really good point. It comes out on the same schedule, it's shipped with Postgres, so if you install community Postgres you get it, off by default. But it makes it that bit easier for cloud providers to offer it. They don't all offer it, but it does make it easier.
Who doesn't offer it?
I was looking at Heroku's docs, and I'm pretty sure they still don't offer it. There's a chance that the docs are out of date. I don't even think some of the newer ones do, so, like, Supabase, I'm not sure they offer it based on their docs. Maybe that's out of date.
And there are even some new ones that don't offer extensions at all. But RDS, GCP Cloud SQL, and Azure, they all offer it, right?
Yeah, exactly. Quite a few of the other popular ones do. Right. Maybe we should cover what auto_explain is.
Automatic explain?
Yeah, so it logs them, right? So it logs execution plans like you would get if you ran explain.
Explain and analyze.
Yep, so you've got access to all of the parameters. There are some minor differences,
but yeah, you've got access to all the parameters you do when you run
explain and a couple of additional ones,
right? So you can set a threshold
so that it will only log
plans above a certain
duration or execution time.
It's a competitor to log_min_duration_statement, which logs queries that exceeded it. log_min_duration_statement logs queries after successful execution, right? Not at the beginning, as log_statement would do, but when it's successful and if the duration exceeds the log_min_duration_statement value, for example one second, right, or half a second. I remember, by the way, one of Postgres' advantages like 15 years ago compared to MySQL was that Postgres allowed a fraction of a second for this parameter, while MySQL was like, one second was the minimal value you could use. So auto_explain is a competitor here, right? Because sometimes people do it: you have both log_min_duration_statement set to one second or 500 milliseconds, and auto_explain set to the same value. But does it make sense at all to use this approach? Because auto_explain output will include the query as well.
I thought about this quite a lot in the past. And I think there are some
differences that might make it worthwhile,
but you are duplicating effort for sure.
One example is another bonus parameter we get with auto_explain, which is the sample rate.
So because there is a small amount of overhead...
Which is not a bonus anymore, because log_min_duration_statement also has this capability since Postgres 14, if I'm not mistaken.
We can also sample.
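For reference, a rough sketch of the sampling knobs being discussed: log_min_duration_sample and log_statement_sample_rate cover sampling of plain statement logging (they appeared around Postgres 13), and auto_explain.sample_rate is the auto_explain one. The values below are only illustrative, and setting the auto_explain parameter assumes the module is already loaded:

    -- Log only ~10% of statements that ran longer than 100 ms.
    ALTER SYSTEM SET log_min_duration_sample = '100ms';
    ALTER SYSTEM SET log_statement_sample_rate = 0.1;

    -- Sample auto_explain the same way (assumes auto_explain is loaded,
    -- e.g. via shared_preload_libraries).
    ALTER SYSTEM SET auto_explain.sample_rate = 0.1;

    SELECT pg_reload_conf();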
So what I meant was that if we're sampling, we might want to sample one, but not the other,
for example. So we might want to get all... Ah, that's a good point.
Yeah. So you understood. So I feel like we've actually covered two of the big points. One is that we've got this log_min_duration for auto_explain, and we can set that in milliseconds. It's off by default; it's minus one by default, which means don't log anything, so you have to set it to something. I would heavily, heavily suggest you do not set it to zero, which will log every statement. In many cases it will just kill you if you do it.
Yeah, the logging overhead is severe, not just in terms of performance, but also in terms of size, the sheer volume of logs.
Yeah, kill, I mean, the logging subsystem will probably quickly become a bottleneck and it will be impossible for Postgres to execute any queries at all.
And we can discuss the details here. It's an interesting topic separately.
Yes, so a typical value for that
on a transactional system
might be a few hundred milliseconds.
So it's in milliseconds by default.
You can set it to 100
and that would be 100 milliseconds
or 1,000, that would be a second.
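As a rough illustration of the threshold being described (the value is only an example, and this assumes auto_explain is already loaded):

    -- -1 (the default) disables plan logging; 0 would log every statement.
    -- A bare number is interpreted as milliseconds.
    ALTER SYSTEM SET auto_explain.log_min_duration = 200;   -- 200 ms
    -- or with explicit units: ALTER SYSTEM SET auto_explain.log_min_duration = '200ms';

    SELECT pg_reload_conf();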
Yeah, and remember, we discussed what a slow query is.
And here it's very connected.
So why 200 milliseconds? Because 200 milliseconds is exactly when the perception of the human mind starts working. So if something exceeds 200 milliseconds, a human notices this latency. So we don't want queries to exceed it in OLTP systems, and that's why 200 milliseconds. But unfortunately, in some systems it's quite a low value, and the volumes you will be seeing in logs and the observer effect from logging will already start hitting you. So sometimes people choose 500 or even a second. It depends.
This is something where if you're worried about the overhead, or you feel like you've got a lot
of slow queries at the moment
and you just want to work your way down,
you could start with a higher value
and reduce it over time as you have fewer of those.
Right. This can be changed without a restart.
What you cannot change without a restart is switching to the logging collector, which I would highly recommend doing.
But switching to the logging collector requires a restart.
Yeah, actually, that's a really good point.
I have seen guides that suggest installing or loading auto_explain in shared_preload_libraries, but I think the general advice is to do it in session_preload_libraries, because then you don't have to restart to get it.
Well, it depends, of course. But usually when you need it, if you didn't think in advance, you don't have it, right? And it's hard to do quickly; even in a session, it's hard to implement it. So I would still recommend putting it into shared_preload_libraries, and probably even turned off, or with a very high threshold in terms of duration, and just be prepared to use it for some incidents.
Yeah, that's an interesting point. A high duration would mean you don't log too much at least, but you would still be paying the overhead of running explain all the time. So maybe also a low sample rate.
There are multiple overheads here indeed. So let's finish discussing why it's useful, and then we'll talk in detail about various overhead things.
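A minimal sketch of the "be prepared, but quiet" setup Nikolay describes; the values are illustrative, and note that ALTER SYSTEM SET shared_preload_libraries overwrites the whole list, so in practice you would append auto_explain to whatever is already there (changing that one requires a restart):

    -- Load the module at server start (restart required for this change).
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements,auto_explain';

    -- Keep it effectively quiet until needed: a high threshold and a low sample rate.
    ALTER SYSTEM SET auto_explain.log_min_duration = '10s';
    ALTER SYSTEM SET auto_explain.sample_rate = 0.05;

    -- During an incident, tighten the threshold without a restart:
    -- ALTER SYSTEM SET auto_explain.log_min_duration = '500ms';
    -- SELECT pg_reload_conf();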
So why is it useful?
It's useful because Postgres doesn't have... We have pg_stat_statements, but it keeps only query bodies, in a normalized way. So the first question we usually have when we try to optimize some particularly bad query group, or normalized query, is that we need examples. Examples can be taken from various places, but log_min_duration_statement, slow examples from logs, it's quite a good idea.
But then we think,
okay, we experiment in some environment,
for example, it's an exact copy, or it's, for example, Database Lab Engine.
We have a thin clone branch and we check the plan there.
But then we think, maybe in reality, a few hours ago,
when the incident happened in production,
maybe this query had a different plan.
And we don't know, right?
Maybe, maybe.
And we can't reproduce it, right?
If you run explain and analyze.
You can reproduce it.
With Database Lab Engine,
you can do point-in-time recovery
to go to a particular point in time
exactly in the middle of an incident.
And then you can use the exact same planner settings and work_mem, and then Postgres should behave similarly.
Like 99% it will be so.
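One way to capture those settings alongside a plan, as a sketch: EXPLAIN's SETTINGS option and auto_explain.log_settings (both available since Postgres 12) print any planner-relevant settings that differ from their defaults; the query here is only a hypothetical placeholder:

    -- Manually, while reproducing on a clone:
    EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
    SELECT count(*) FROM my_table WHERE created_at > now() - interval '1 day';

    -- Or let auto_explain record the same information with every captured plan:
    SET auto_explain.log_settings = on;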
But sometimes you don't have database lab engine.
Sometimes you still have doubts.
In this case, auto_explain is a good way.
It's a very reliable way to tell that this is what we had in terms of the plan.
Because we have it logged, right?
Maybe the most reliable way to understand
what plan Postgres used when executing this query.
Yeah, exactly.
And the metrics involved, right?
And the buffer numbers, the timings.
So it's the only way.
You can choose both, none of them,
or just timing, or just buffers.
It's possible to tune it and choose.
And you have very good articles saying that sometimes timing overhead is not that big.
But let's finish pros of it.
So to me, auto_explain, this approach, working with actual plans, feels like Generation 1. Same as, remember, we didn't have pg_stat_statements 15 years ago. And we all used pgFouine first and then pgBadger, normalizing queries, processing logs. Same approach as pg_stat_statements, but only for slower queries, those which exceeded the threshold. And the tool normalizes them, and we have reports, nice reports.
We understand that probably we don't see full picture.
Okay, maybe sometimes we allow ourselves to turn on full query logging for some short period.
This was considered a valid approach, understanding the risks, overhead, and so on.
We turn it on for some minutes just to collect the whole iceberg, right? And then we have query analysis,
but without plans, right? We never saw plans there. But then pg_stat_statements almost replaced this approach, because, like, okay, logs are good, but it's a tip of the iceberg. pg_stat_statements is a very good way to control almost all of it.
It has like a 5,000 limit in terms of normalized queries.
Sometimes it's not enough.
But still, you see almost the whole iceberg. But without examples and without plans. So logs are still good for examples. I would choose pg_stat_activity, with increased track size. I don't remember, it's 1,024 by default. Usually what you want is to put 10,000 there or so and see real examples by sampling from pg_stat_activity. But where to get plans? This is the big question, because if you want to gather samples of plans, or maybe even more, normalized plans, and understand what's happening, auto_explain is the way to go still.
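A sketch of the pg_stat_activity sampling Nikolay mentions; the 10,000 value is only an example, and changing track_activity_query_size requires a restart:

    -- Allow longer query texts in pg_stat_activity (default is 1024 bytes).
    ALTER SYSTEM SET track_activity_query_size = 10000;

    -- Then periodically sample running statements to collect real examples:
    SELECT now() AS sampled_at, pid, state, query
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid();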
I think in the future, we will have some Generation 2 tools, maybe based on eBPF, by the way, maybe some extensions.
But right now, this is the best way to collect actual plans, auto_explain. That's why it's so important and many people want it.
Yes, and I think it's the only way of running your queries, getting the output of your queries, and also logging the plan. If you run explain analyze manually, the queries run, but you don't get back the results.
So it's an interesting best of both worlds there.
I don't get it.
auto_explain doesn't log results. It logs query bodies and the plan.
It doesn't log them, but whichever client requested that gets them. So let's say you're running queries in client A over here. It's getting results back from those queries, and separately we're logging the plans. Whereas if we start up a session as client two and run explain analyze on one of those queries, we get the plan back, but we don't get the results.
I see.
Yeah, so if you request explain analyze,
you say I want plan but not results.
But sometimes you want both.
Yeah, it makes sense.
Makes sense.
So yeah, so that's…
And you can play with parameters of auto-explain in your session
and say, okay, now in this session,
I want all queries to be logged by auto_explain, with plans, with timing, with buffers, let's go, like tracing all plans.
And then you work normally, obtaining results.
It can be some application you use, right? Or just a human who is working with the database using SQL.
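A minimal sketch of that session-level tracing; LOAD typically needs superuser unless the library is made available via session_preload_libraries, and the final query is just a hypothetical placeholder:

    -- Load auto_explain only in this session.
    LOAD 'auto_explain';

    -- Trace everything this session runs, with full detail.
    SET auto_explain.log_min_duration = 0;
    SET auto_explain.log_analyze = on;
    SET auto_explain.log_buffers = on;
    SET auto_explain.log_timing = on;

    -- Work normally: the client still gets results, and every plan goes to the server log.
    SELECT count(*) FROM my_table WHERE created_at > now() - interval '1 day';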
And for this session, we log all plans because we want to understand what's happening, right?
Yeah. Let's say, for example, you have something that's intermittently slow and you can't work out why it's intermittently slow. You run explain analyze 10 times and it's not slow in any of those. But then when you run it via your application, one in a hundred is slow. So how do you know when to run explain, like how do you reproduce that? auto_explain is a really good way. In terms of the v2 of pg_stat_statements, that's a really interesting topic and not one that I prepared for, but there is that extension from Percona that they're trying to replace it with, and I think that does it.
I would consider it as v2.5, not v3, but v2.5.
Nice.
Yeah, because it has more things than... pg_stat_monitor, we'll get the correct name and I'll put it in the notes. The replacement for pg_stat_statements. Very promising, interesting, but I think, to become popular, we need to check its overhead.
It's very interesting.
I would like to learn more about it.
And also, probably, it should be in contrib modules.
Well, I suspect that's their goal long term.
Yeah, I talked to them when they started.
I talked to Peter, actually, and asked him about this particular decision to work separately instead of improving pg_stat_statements.
And one of the reasons
was obviously independent
release cycle. Because if you
want to improve
something from contrib modules,
you understand that release cycle is one year.
And you need to convince a lot of people and
experimental things and so on.
It's hard to get into it, right? So it will take years to build something very breakthrough, because it's hard. But separate, you can move faster and do many things. I think it's a smart strategy.
So you've mentioned overhead a couple of times, and I think this is a really interesting area, not just for auto_explain but for a lot of the things we've mentioned. I have looked in the past and tried to find some even basic benchmarks of the overhead of things like pg_stat_statements or auto_explain, or pg_stat_monitor, you know, anything like this. I'm just interested, I understand that. But the flip side of that is that in quite a few places people mention horror stories, or there were really severe warnings in the docs saying, for some of them, not all of them, that the overheads can be really massive, so be careful. But no one ever kind of says how much it is. And then if you go digging in the mailing list, every now and again you see some people mention a few percent here or there, and one person mentioned like two or three x overhead. So there are some really scary stories out there.
So yeah, that's the thing, it can be a killing overhead if you don't use the logging collector, if you use syslog with journald. Well, maybe it's improved, but I benchmarked it like many years ago. And then you say the threshold is zero, meaning that you log everything. I think you don't even need to track timing and buffers, it will be enough to kill your system just like that.
So, speaking of benchmarks, let's be a little bit in context. This week, yesterday actually, OtterTune published an article saying the worst thing in Postgres is MVCC, like they hate it most. Well, I understand, but I dug into the underlying research, and there is a paper, like a scientific paper, comparing MVCC in various systems, and it doesn't provide any discussion, any numbers, like how exactly they tuned autovacuum, for example, and so on.
So details are very important here.
And we all know
it should be written on the front page of postgresql.org that defaults are very old. Don't use defaults.
You do need to tune it, unfortunately.
This is Postgres. You do need to tune it. You need to understand some things, and you probably need help from an expert, and tune it, right? So auto_explain is a great thing, but of course, if you enable... Like, saying auto_explain is bad, it's the same as saying log_min_duration_statement is bad. They both are bad, but they both are good. Let's talk about overhead. There are maybe three places where I see overhead.
First, we already touched on it, actually. First is logging and disk I/O overhead. So if you log too much, since you log to disk, it's not good. For example, by the way, I did it in some tmpfs, like into memory, a small disk, but with very, very quick rotation, understanding the risk that it can be saturated in terms of space. But if you rotate it very quickly and send the details offloaded to somewhere else, you don't touch disk at all, you can log everything. In this case, it's possible to use it. Here, I would blame Postgres for not having some easy ways not to log to disk. I would rather, for example, send it over UDP to some different place, and that's it. Just send logs. And in this case, we could log much more, and just the network would be our bottleneck, right? But if you log to disk, as we do in most cases with Postgres, this is the number one thing to consider. And sometimes people store logs on a separate disk. And I saw several times that this disk is magnetic. So yeah, this is easy, like you can shoot yourself in the foot very quickly here. So this is number one.
Number two, CPU, right?
Maybe you can tell more here.
So there have been good blog posts in the past, like one from the Ongres team.
Álvaro.
Yeah, yes.
So he said, EXPLAIN ANALYZE is lying.
Yeah, yeah, exactly, it's a very good title. But the main point he has is that if the timing calls to measure timings... If you're running explain analyze with timings, which is the default for explain analyze, then even the fact of measuring them... So there's an observer effect. Classic observer effect.
Yeah, classic observer effect.
Yeah.
Once you measure it, it's changed.
You measure something else already.
Yeah, exactly.
So the timing's reported,
and so the query will be longer because you're measuring it.
Now, this is true if…
But it depends on CPU, right?
On Xeon, for example, modern Xeon, like Cascade Lake, for example.
Well, relatively modern.
It's not that bad if you check it.
No, there is a tool within Postgres.
I'll link it up in the notes, but I think it's called pg_test_timing or pg_track_timing, something like that.
In the directory.
Yeah. And it's very quick and easy to check if yours is, like, relatively fast. The only place in recent times that I've seen a not-well-performing one is when somebody was running Postgres in Docker, and most production systems I've seen run on sufficiently decent hardware that it hasn't been a big issue. But I do trust that there are some people, very well respected people, who do recommend turning timing off when you use auto_explain for this reason.
And I think it's mentioned in the docs still as well.
So I suspect there are cases where people are still running production systems... Or there were workloads where, I think Álvaro mentioned nested loops having a particularly high overhead, for example. So if you have a workload that has lots of loops that do things in the tens of thousands, hundreds of thousands...
So for some probably not-good query plans. But it's possible that certain workloads, possibly, yeah.
I wonder what's the overhead on ARM, like Graviton2 and so on. Have you seen anything about it?
I think I've published the only blog post on this that's ever been published. I would love to see more on this, but I haven't seen any.
It's a good area to research, definitely interesting.
I think it's interesting, but based on the reaction I got, which is, like, just a comment from you and nothing else, I think there's limited interest in this for some reason.
And I've seen quite a few production systems with auto_explain on, sample rate one, log_min_duration 100 milliseconds, and timing on. So not just timing, right? Buffers as well. So I think, you mentioned, it's not just... I think you saw in the past that buffers might have a small overhead as well.
Buffers definitely have overhead.
If you say explain analyze, timing off, buffers off, you see... I mean, manually, right? And you can do the same with auto_explain. You get rid of the timing and buffers overhead. It's also already useful, right? Because you see the structure of the plan. And you have total timing, because it's already calculated, it's similar to log_min_duration_statement log entries. So it's already useful to see the structure and just total timing. Good. But then, buffers and timing, I would choose buffers. They have some overhead. It's also interesting to understand for your system and for your queries what it is.
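For reference, those knobs map to auto_explain parameters roughly like this (a sketch; whether the trade-off is worth it is exactly the per-system question being discussed):

    -- Plan structure and total runtime, with per-node buffer counts but no per-node timing:
    SET auto_explain.log_analyze = on;
    SET auto_explain.log_timing  = off;
    SET auto_explain.log_buffers = on;

    -- The manual equivalent for a single query:
    -- EXPLAIN (ANALYZE, TIMING OFF, BUFFERS) SELECT ...;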
But buffer numbers in this podcast are honored.
We have timing already, right?
For overall timing.
We don't have to choose either or though, right?
Like we can get both.
Well, we can get both, right?
But let's agree on this.
Even if someone performs very good benchmark of overhead of auto-explain,
I would consider it as valid only for a particular system,
particular workload.
That's it.
In each particular case, if we are serious guys and we want our system to be reliable, performant, and scalable, to handle growth and so on, we need our specific benchmark, for our workload, which is a very interesting topic and hard; we probably also need to talk about it one time. And for our hardware, for our Postgres version, and so on and so on. And in this case, we should measure two overheads, from buffers and from timing.
And consider four points, like without anything, with both, with only timing, with only buffers.
Here I would exclude with timing only.
Because buffers, in my opinion, in terms of internal metrics inside the query, buffers are more important than timing numbers what do you think
uh more important i i think it's like an argument for the sake of an argument i don't i don't see
it as an either or like if timing overhead is not worth it for your system if it's too much
don't track it and buffers are useful but would i like if i had the choice of just buffers or buffers and
timing i would choose buffers and timing i'm never in a situation where it's a choice one or the other
i've not seen like if they had similar overhead and we could if we could afford one but not the
other sure but i've never seen that and i can't imagine it's true because for example timings tracked per loop right but buffers are reported not per loop they're reported
per um per node so i wouldn't be surprised if it doesn't have the same looping overheads for
example um and there were no warnings about it in the docs so it's i've anyway yes sure in principle
i think you're probably right but i just don just haven't seen that be a practical question ever.
Interesting.
Okay, well, okay.
Maybe you should enable both and have sampling.
Maybe.
But if you do check this on your system, I'd love to hear,
even if you can't share details,
I'd love to hear what order of magnitude were the overheads.
Yeah, I have it enabled in a couple of places actually, quite loaded ones, recently. But we didn't... Somehow I missed this. It doesn't follow my principles. My principle is to always have some benchmarks and make changes based on numbers and so on. But since it was enabled with some quite low sample rate, I mean, like below 10%, for example, it was considered as low risk, with threshold and sample rate. So, like, let's have something, right? Already have something, already useful. We didn't discuss it, but one of the use cases, probably maybe a very important one, is when we enable it and, if we have big teams, we use this to support their work on optimization, like continuous optimization. We say, okay, these are bad slow queries, and we have examples with plans. Look at them. And in this case, if you enable it above this, like a 500 milliseconds threshold, and also sampling 10%, enable both, right? But it's very bad if we do it blindly, we need to run benchmarks and understand what's the price here. How would we do it in some system? For example, we have some system, OLTP, and we want to measure it. How?
We did a whole episode on this, right? Like benchmarking.
Well, yeah, but this is specific benchmarking. First thing I would do here is to ensure that
we don't have overhead from observer effects from logging system, right?
So, to test our timing.
Yeah, we should separate these things. And benchmarks should not bump into this ceiling.
We don't want to saturate our disk I/O and then make conclusions like, oh, you know, auto_explain is to blame.
This is a mistake you can make very easily.
So, you need to make sure that you have room in terms of disk IO.
And then you can start thinking about overhead.
Well, probably I would run two systems exactly the same.
One with auto-explain, one without.
And I would probably make sure I don't saturate anything at all.
It means that probably I would do it in a single session. I don't need multiple sessions here at all,
which means that it's possible also to run it
on some non-production environments
like Database Lab Engine, for example.
You can just compare and see infrequent queries,
like a single session, run this query, run that query,
and compare details.
Probably latency, mostly.
Yeah, latency,
but do it many times with the same query.
Yeah.
Or a workload
that you can replay on both, right?
Right. So you don't want to
reach the edge in terms
of CPU and so on.
It's still a performance experiment, it's performance research, and you repeat the same queries many times to have reliable data, and then you just compare without it and with it, what's the difference. And yeah, latency probably. What else do we have? We can have pg_stat_statements enabled and check not only latency but also, I don't know, like, buffer numbers as well. Well, buffer numbers should not change, right?
They shouldn't, right.
What else? Probably wait event analysis could be applied here as well, maybe. Like, I'm trying to invent something on the fly, but definitely latency is the main point here. Yeah.
The main danger in this benchmark is to bump into some different limit, like disk I/O or anything else, and measure it instead of the real overhead from auto_explain.
Yeah. Even I fell into that trap in my blog post; I was probably running far too high a workload for the machine I was running it on. And I also tested the logging overhead, and on my system there was about 25% overhead just from logging everything, and then I turned that off and got it down to less than a couple of percent.
By the way, eBPF analysis would be great here, and, like, you should see that the overhead comes from auto_explain. This would be, like, the best: if you measure, okay, for this type of query we have like 5% overhead, because we know this query executes with nested loops, like, for different parameters. By the way, different parameters: in pg_stat_statements we have one query, but with different parameters we have different plans, right? So we can have the same query, but depending on parameters the overhead may be different, because the plans are different. So parameters also define the plan. And of course, we need to take care of planner settings and work_mem and so on.
I mean, this experiment can be done on different hardware.
I mean, CPU should be the same,
but I don't care about disks unless they are saturated.
If we don't saturate disk I/O, disks can be different, the file system can be different. Different file system, that's why ZFS and Database Lab can work. But cores, CPU cores, should be the same. Postgres version should be the same. Planner settings should be the same.
And then we play with queries in single session and compare.
Interesting experiment, by the way.
Yeah.
And if we have the ability to check with eBPF, like, to perform on-CPU analysis and see that indeed, like, 5% overhead for this particular query with particular plans, 5% overhead comes from auto_explain?
This is success.
So you mentioned three types of overhead.
I think you said disk IO for logging, for example,
CPU for timing.
Was there a third?
CPU for buffers.
CPU for buffers, yeah.
Makes sense.
Yeah.
Well, what else?
What about additional memory?
Probably no, right?
What about network?
Probably no.
So I don't see any.
Maybe if there are some other types, I just don't see them.
No.
There are a couple of limitations I wanted to make sure we covered that I've seen catch people out in the past.
One is that it doesn't track planning time.
So these timings are all about execution times and there's no reporting of planning time.
So if, for example, your client is reporting that a query is taking 1.1 seconds and it's only a couple of hundred milliseconds in auto_explain in the logs, the difference, I mean, you can verify this by running explain analyze, chances are it's a very high planning time overhead. And that's, I guess,
that's another argument for...
We have it, since Postgres 13, we have it, the planning time. And of course, it would be interesting to dive into... auto_explain's power is the tree, the execution tree it shows, right? It would be interesting to dive into planning time for particular nodes. It's not possible. Planning time is a whole. And as a whole, we have it in pg_stat_statements. If we divide it by calls, we have average planning time, and so on. Actually, we have it there. Like min, max, it's there. It's tracked there.
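A sketch of that, assuming Postgres 13+ with pg_stat_statements installed and pg_stat_statements.track_planning turned on (it is off by default):

    SELECT queryid,
           plans,
           mean_plan_time,    -- average planning time, in ms
           min_plan_time,
           max_plan_time,
           mean_exec_time
    FROM pg_stat_statements
    ORDER BY total_plan_time DESC
    LIMIT 10;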
I think this might be another argument for the log_min... what's the other setting, the logging one we were talking about... for having both on. I think that one does include planning time.
That's interesting. So any difference there?
Anyway, so that's kind of a bit of a gotcha. The other small difference is that, and I think this is coming in Postgres 16, I saw a commit for it, it doesn't currently log the query identifier, which can be really useful.
Yeah, query ID.
auto_explain currently doesn't log it, even if you use verbose, which is the parameter that adds the query ID in explain. So that's another, like, minor difference between them.
And log_min_duration_statement does it, right?
Yes. Good point. Another reason.
Since some... or yes, I remember, in some version.
14, I think.
14, yeah. Since that version, we have a query ID matching the pg_stat_statements query ID in both the logs, the slow log, we call it the slow log, but it's not separate, unfortunately, and in pg_stat_activity. So we can basically load it to some tables and then join them and have quite good analysis. And some people do this.
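A sketch of the kind of join that becomes possible once the query IDs line up (Postgres 14+, with compute_query_id enabled, which pg_stat_statements normally takes care of):

    -- Match currently running statements to their aggregate stats by query ID.
    SELECT a.pid,
           a.query_id,
           a.query          AS current_text,
           s.calls,
           s.mean_exec_time
    FROM pg_stat_activity a
    JOIN pg_stat_statements s ON s.queryid = a.query_id
    WHERE a.state = 'active'
      AND a.pid <> pg_backend_pid();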
Right, yeah, that makes sense. Interesting.
And one other thing I forgot to mention that is different between them is that in auto_explain we have an extra parameter, which is log_nested_statements. And that's another use case for this. We don't have access to that when we want to explain, for example, inside a function. Like, I know a couple of people that use a lot of...
You already go too far.
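For reference, the parameter Michael is referring to; useful when the interesting statements run inside functions, since plain EXPLAIN on the outer call won't show them:

    -- Also capture plans of statements executed inside functions and other nested contexts.
    SET auto_explain.log_nested_statements = on;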
So I saw discussions on Twitter
the last couple of weeks,
and there are different approaches.
Some people say, no, no, no,
not in production, not in production.
But some people say, yes, yes.
And I can say yes, just carefully.
I like it. Match it.
But details, it's slowly to...
Oh, by the way, sampling is very interesting.
I didn't realize it until
maybe a couple of years ago when
you wrote this article
that sampling exists for
many versions already.
Unlike query logging
for which sampling appeared in Postgres 13 or 14, only a couple of years ago,
in auto_explain it has existed for many years and many Postgres versions.
So we can benefit from, for example, in Postgres 12 or 11.
All supported versions of Postgres for sure.
So it's great. It's great.
I like sampling in heavily loaded systems. Even if we concluded that we have overhead like we enable it, and nobody is using this information.
But if we do need it, it's okay to pay some price in some cases and have this very important information to optimize our system.
Completely agree.
It will pay off very quickly.
All monitoring of any description has some overhead, right?
We can't monitor at all, and most people think monitoring is a good idea.
So then the question becomes
how much overhead is worth it for the benefit?
Yeah, observers and observer effects are everywhere.
Yeah.
Wonderful.
Should we call it a day?
Yeah, I think, like, the summary is: measure it on your system. But be quite optimistic, I think. Like, the fears that are discussed around sometimes come from not understanding where it can hit you, right? And if you understand where it can hit you, you'll be fine. Like we discussed.
And sampling.
So you can be good, and even if the overhead for your system is quite significant, with sampling and a high threshold you can still find it very useful, because plans are a very good thing.
Yeah. And don't set log_min_duration to zero, that's my final parting advice.
Right, right. And don't send queries as is to ChatGPT.
And plans also.
Don't send plans as is.
Nice.
All right.
Well, thank you, Nikolai.
Thanks, everybody.
And see you next week.
See you.
Bye.