Postgres FM - pg_stat_statements
Episode Date: May 5, 2023

Nikolay and Michael discuss pg_stat_statements — why everyone should use it, but also some downsides! Here are links to a few things we mentioned: pg_stat_statements (docs), pg_stat_statements (PgPedia), PGSQL Phriday, Observer effect in pg_stat_statements and pg_stat_kcache (Postgres Hacking session on Postgres TV), track_io_timing (docs), Overhead comment (by Tom Kyte, via Jeremy Schneider), pg_stat_monitor, PGCon, Our episode on query analysis, Marginalia.

~~~

What did you like or not like? What should we discuss next time? Let us know on YouTube, on social media, or by commenting on our Google doc. If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by: Nikolay Samokhvalov, founder of Postgres.ai, and Michael Christofides, founder of pgMustard. With special thanks to Jessie Draws for the amazing artwork.
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.AI.
Hey Nikolay, I picked the topic this week and we're going to be talking about pg_stat_statements.
Let's do it. Good choice.
You find this one interesting, finally. The reason for this is I'm also hosting this week a blogging event that was started by, I believe, Ryan Booz, probably amongst others, called PGSQL Phriday. So each month somebody hosts this — somebody different — and they pick the topic, and the topic I've chosen for this week was pg_stat_statements. So I'm hoping to give our entry as an audio one. I'll write a blog as well, but it'd be nice to have an audio entry from us as the podcast.
Okay.
We count on you.
Yeah.
But, uh, it's a good choice.
This is an extension...
It's hard to find any Postgres user who doesn't use it.
Yeah.
It's really popular. I mean, we call it an extension, but — we've both done the... there's a really good series of interviews by Andreas "ads" Scherbaum. I'm not sure if I'm pronouncing that correctly, but in this interview series that he does, he asks people: what's your favorite extension? And it's not everybody saying this one, but it's super useful, and also, as it's another one of these contrib modules, because it comes with Postgres, most people have access to it if they want to use it.
Yeah. Well, pg_stat_statements is a major observability part of the Postgres ecosystem,
in addition to logs — a single log file, actually, right? Compared to auto_explain, which we discussed last year, I think pg_stat_statements is much, much more popular. By the way, let's briefly mention that our previous episode, about auto_explain, was released on Thursday, and in it you claimed that Heroku doesn't have auto_explain, right?
But we recorded it on Wednesday, and on the very same day, Heroku released it.
It took like 10 plus years, right?
Because among all those years,
they could choose any day,
but they chose the same day you claimed it doesn't exist.
It's funny.
Yeah, exactly.
And to let listeners in on kind of what happens behind the scenes,
we normally record on a Tuesday,
but this week we couldn't get a time
we could both do until the Thursday.
So we really were recording last minute on the Thursday
for a release on the Friday.
So I couldn't believe the timings.
You sent me some history from there.
I think it's from their GitHub
showing that they were working on it a week before.
So I could have found out if I'd looked deep enough.
It's proof to decrease our ego and say that it was not a reaction.
It was like an independent event, but it's a very interesting coincidence.
So Heroku has auto_explain.
And of course, we have pg_stat_statements.
Yes.
And actually, a little bit of trivia to link the two.
I didn't realize that they were both added in the same release of Postgres.
8.4.
Interesting.
2009, I think.
Yes, 2008 was 8.3, when the XML data type and functions went into core. I remember it by heart. But the next year was 8.4.
So super old as well.
It keeps getting better.
Lots of additions to pg_stat_statements, even in recent years.
So it's an old extension, but one that people keep improving.
Right, with a lot of disadvantages.
Go on.
Well, we should discuss why it's needed and so on.
If you want to start from negative parts, we can start from them as well.
I don't care.
I'm intrigued.
For the start, it doesn't register failing queries. So some failing
queries can take a lot of time and resources,
consume a lot of resources, but you
don't see them if you
analyze only pg_stat_statements.
And this is a big issue
because probably it's like a half of your
traffic which is failing queries, right?
If you're in a disaster.
In extreme cases, right.
But pg_stat_statements tells you it's okay.
Yeah, let's start with those then.
So any other big limitations or downsides?
Yeah, of course.
For example, if our listeners check out the latest Postgres Hacking recording on Postgres TV. By the way, Postgres TV reached
2000 subscribers, not a huge number, but it's growing pretty well. Thank you. And also our
podcast published there, I think, contributed a lot. And so we had a session with Kirk and
Andrey, as usual, and we wanted to improve the performance overhead of pg_stat_kcache. It's very related to the performance overhead of pg_stat_statements because basically it's the same code. So pg_stat_statements has a special list of queries in normalized form — so-called normalized form. In my materials, I quite often call them query groups, to highlight that inside those normalized queries we can have many different queries with different plans, right? So I call them query groups. So when a new entry needs to be inserted, it needs an exclusive lock on the shared memory area for pg_stat_statements or pg_stat_kcache. And in this case, if some selects are happening concurrently, in parallel, those selects hold a shared lock and your insert needs to wait. So, for example, imagine we have 5,000 records — that's the default — in pg_stat_statements, and somebody is slowly reading them during, for example, 100 milliseconds. It's quite slow, and while they are reading it, that one session holds the shared lock, and any new query which is not yet known to pg_stat_statements, trying to be inserted, is blocked, and we have a spike of up to 100 milliseconds in latency for that query.
It can be any query which is trying to insert.
Sometimes it's some SET command.
We can discuss it also.
That's also a disadvantage.
The setting is called track_utility — it tracks utility commands.
SET application_name, SET something, things like that.
It's any SET.
SET LOCAL.
Off by default, right?
On by default, I think, no?
Okay.
We should check.
I usually see it from my perspective in client setups.
You know, we have many layers of defaults,
including cloud defaults from RDS or so.
I usually see it being on,
and it's an issue exactly because of what I explained here. So if you have cases when a new entry needs to be inserted quite often into pg_stat_statements — for example, you call pg_stat_statements_reset() every minute, an extreme case. By the way, track_utility is on by default. Yeah. So if you reset it, it's empty, and a lot of new entries need to be written, and all of them need to lock the whole structure — the whole pg_stat_statements content — quickly, for a very short time, but an exclusive lock is needed.
And if some monitoring is reading all the content often — for example, every 10 seconds, why not — you have a contention issue in this shared memory area. pg_stat_statements has its own shared memory area, not the general shared buffers, and there are plans to improve this. So, for example, if you have a lot of different query groups, but pg_stat_statements keeps only 5,000, according to some algorithm.
I actually don't know the details of this algorithm, but it's some kind of least recently used, maybe.
I don't know.
It says it gets...
Are you talking about the ones — like, how does it deallocate, or how do things drop off?
It says the least used.
Yeah, sometimes some query is known, but it's evicted because there is no space — pg_stat_statements.max is 5,000 by default. You can increase it, but not more than 10,000, I guess, right? Which is almost the same — just 2x, right? So if a query is evicted and needs to be registered again, the problem is that registration is a synchronous process: when you call some query and pg_stat_statements doesn't know it, it needs to be inserted — not in the background, but synchronously, by the same backend. You have a latency spike for your query execution. So this eviction and back-and-forth is also a problem,
this can be definitely the case when you for example have track utility in pgsa statements
on track utility means all set commands also considered as queries and the problem is that
all current postgres versions utility commands are not normalized and if if you have a Java app, I had it in my life,
a Java app which calls set application name,
to, and then session ID.
In each session, we do this.
Every time this session ID is quite unique,
it's a new session.
For several queries in the same session,
it will be the same,
but we need to set it for each session.
And it spams our PGSAS Advanced content. queries in the same session it will be the same but we need to set it for each session and it
spams our pg-status-advance content i saw like 50 percent of all normalized queries in pg-status-advance
which i call query groups it was set commands and in this case this amplifies the problem of
this contention. And I observed it for some e-commerce company during Black Friday. It was an interesting case. I was wondering why we had latency spikes — up to 500 milliseconds — every five minutes, but shifted: minute 02, 07, 12, 17, and so on. And then I recalled: oh, it was me who put in some polling — like SELECT * FROM pg_stat_statements — remotely, to have some poor man's storage, because monitoring was weak. And I implemented this and shifted it just because I know sometimes people have a lot of other cron jobs starting at even times. So I shifted it by two minutes, and that helped me recognize that it was my own script doing this.
And then I reproduced this problem.
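A "poor man's" snapshot collector like the one described could be a sketch as simple as this — the table name is hypothetical, and the schedule would live in cron at a shifted offset, as mentioned:

-- one-time setup: a table shaped like the view, for accumulating snapshots
CREATE TABLE IF NOT EXISTS pgss_snapshots AS
  SELECT now() AS captured_at, * FROM pg_stat_statements LIMIT 0;
-- run periodically; note that each read holds a shared lock on the
-- pg_stat_statements shared memory area while it scans the entries
INSERT INTO pgss_snapshots
  SELECT now(), * FROM pg_stat_statements;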
So a couple of weeks ago, we had this session on Thursday with Andrey and Kirk, and we implemented
like some proposal.
When a new entry needs to be inserted, instead of a synchronous, unconditional insert, we try to insert it —
we make several attempts during like 10 microseconds.
And if it's failing, we give up
and just do it next time.
Why not?
And I proposed it — I think it's a good idea. We discussed it with the pg_stat_kcache maintainer. There are reasons to say that maybe it's not a good idea, so there are ongoing discussions in the pg_stat_kcache repository. But again, since the code is very similar, I agree with the point that it should be fixed in both extensions — pg_stat_kcache and pg_stat_statements — and maybe somehow differently.
But the problem is quite serious in heavily loaded setups.
So the current advice is: turn track_utility off.
Makes sense.
And this problem goes down.
Not fully disappears, but goes down.
Pressure decreases.
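For reference, a minimal sketch of checking and disabling it — assuming you can change cluster settings directly; on managed services this usually goes through the provider's parameter settings instead:

-- see the current value (track_utility defaults to on)
SHOW pg_stat_statements.track_utility;
-- turn it off; a configuration reload is enough, no restart needed
ALTER SYSTEM SET pg_stat_statements.track_utility = off;
SELECT pg_reload_conf();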
But good news: in Postgres 16, utility commands are normalized by pg_stat_statements. So it's good. So SET application_name TO something will become a single entry in pg_stat_statements.
Yeah, so there's a commit for this, right?
We should probably say that it's not guaranteed
that that will be in 16, but it's very, very likely.
Yes, yes, yes.
It's not huge.
It's now merged, right? So I hope it'll make it into Postgres 16.
Fingers crossed.
So thank you to everyone involved in that.
Any...
You want more bad things?
Yeah, let's do it.
Why not?
Okay.
Well, of course. Sometimes people listening to me say, "You are a Postgres hater," you know.
Anybody that knows you knows that's not true.
I had it with a new audience. When they join, they think I talk about so many bad things about Postgres, so I'm a hater. Well, if I'm a hater, why have I been using it for 18 years, everywhere?
Yeah, and it's not like you're locked in. But one thing I did want to check quickly,
you mentioned pg_stat_statements.max having at most 10,000. I can't find documentation that
suggests differently,
but when I go to edit it on a cloud provider, for example,
it says the max value is much higher than that.
So I'm a bit confused.
I might be easily wrong here.
And we should check source code, of course.
Cloud providers can restrict additionally.
For example, I was checking Cloud SQL documentation yesterday
and learned that, for example,
they allow you to set shared_buffers to not more than 60% of memory, and effective_cache_size to not more than 70% or 80% — I don't remember. But sometimes I want more than 100%, because I want to fool Postgres, to make the planner work as if it were on a different-size machine.
So cloud providers can add additional limits, but here I would check the source code, and maybe
it's more than 10,000, right?
Cool.
Interesting question.
What's the overhead of having a bigger number here?
Yeah, I was wondering whether that would cause more contention or less, but actually…
Contention depends on workload.
Yeah, that makes more sense.
Of course, on this number as well,
but on workload, definitely.
And when pg_stat_statements appeared, as you mentioned, 14 years ago, it was like a relief. Because before that, we used only logs — pgFouine and then pgBadger — understanding that it's the tip of the iceberg, because only queries above some log_min_duration_statement, like 1 second or 500 milliseconds, get there. Definitely the tip of the iceberg. pg_stat_statements registers everything — well, almost everything, as we discussed. It's limited in the number of normalized queries, but for each query which is already known to pg_stat_statements, it increments metrics.
Like aggregated?
Right, in a reliable way,
cumulatively in a reliable way.
Yes, that's great.
Unlike if you use log-based query analysis,
you always have the problem
that you see only the tip of the iceberg in terms of duration.
You see only slow queries,
but fast queries can be an issue in many cases.
Yeah, exactly. That probably leads us nicely into the times that it's most useful. So like
one of the times I see it used most...
No, no, no, no, no. We're talking about bad things. That was a small break. So another bad thing is that the overhead from pg_stat_statements is significant. It's noticeable.
So have you seen any...? Like, I've looked for benchmarks on this kind of thing. I've seen numbers mentioned, but very, very little data behind them.
There is a common consensus that it's something like 7%, but it's a number out of nowhere. It depends on many things. It's hard to measure; it's worth measuring for your particular system, your particular workload — with it and without it. But it's worth mentioning that if you create the extension — if you actually add it to shared_preload_libraries and create the extension — it works globally for all logical databases inside this cluster, right? I guess it's some kind of several percent.
That's it.
I've heard a lot of single-digit percents. Probably more commonly, I've heard 2% as a rough estimate, but I guess there's no point throwing numbers out. Well, we didn't even talk numbers when we talked about auto_explain, did we, really? But people need to measure it themselves. I wish there were a few more publicly available benchmarks on this kind of thing.
Also, track timing enabled versus not enabled, right? Because pg_stat_statements also can have it or can skip it.
Are we talking track_io_timing in this case?
Yeah, track_io_timing, sorry.
Yeah, yeah. So, well, that's a recent addition, I think.
Was that Postgres 15, I think? Or maybe a couple more versions back than that. But it's good that we can have that now in pg_stat_statements, though there is a warning in the docs, again, that it does cause additional overhead.
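A minimal sketch of enabling it, if measurement on your hardware shows the timer is cheap enough (the pg_test_timing utility shipped with Postgres helps estimate that cost first):

-- is per-block I/O timing being collected?
SHOW track_io_timing;
-- enable it; a reload is enough, and pg_stat_statements then fills the
-- I/O timing columns (blk_read_time / blk_write_time) for each query group
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();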
Right, so I think it makes sense
to measure overhead only for particular systems,
but in general, we can claim quite reliably that it should be below 50%. And I personally would
pay up to 10%. There was a good discussion about overhead from observability tools recently,
raised by Jeremy Schneider, mentioning Oracle experts — Tom Kyte, I guess — and so on,
saying that this overhead is negative. Because even if you see it as a positive number, you've optimized queries using this tool, so you're already in a very good place, right? So it's negative overhead. It's an interesting point.
Yeah, it's an interesting point. So if you analyze the whole picture, having this tool is beneficial, and you can improve your performance using it, especially if you do it routinely. But speaking of the bad sides of pg_stat_statements again: a lack of visibility of ongoing queries as well.
Yeah, things need to have completed.
Right. And also a lack of plans. And planning is off by default, right? Like, we don't even get planning time. Even in the newer versions that do include planning time — which is not that many — it's off by default. I use Cloud SQL at the moment, and they don't even let you turn it on, so it's quite difficult to get planning information.
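Where the provider does expose it, a minimal sketch — pg_stat_statements.track_planning exists since Postgres 13, is off by default, and adds some overhead:

-- are planning statistics being collected?
SHOW pg_stat_statements.track_planning;
-- enable it; after a reload, total_plan_time / mean_plan_time start accumulating
ALTER SYSTEM SET pg_stat_statements.track_planning = on;
SELECT pg_reload_conf();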
Yep. But I'm talking about the fact that a single normalized query can contain many cases.
A simple example I usually use: SELECT * FROM some table WHERE a column equals one, or equals two. For example, if 99% of the values are one and only 1% are two, in one case you will get a sequential scan and in the other case you will get an index scan. Two different cases, very different in terms of performance. And pg_stat_statements doesn't allow you to distinguish them at all.
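As an illustration — hypothetical table and data — both statements below collapse into one normalized entry, even though the planner will likely pick different scans for the skewed values:

-- skewed data: roughly 99% of rows have status = 1
CREATE TABLE orders (id bigserial PRIMARY KEY, status int);
INSERT INTO orders (status)
  SELECT CASE WHEN random() < 0.99 THEN 1 ELSE 2 END
  FROM generate_series(1, 1000000);
CREATE INDEX ON orders (status);
ANALYZE orders;
-- likely a sequential scan (common value)...
EXPLAIN SELECT * FROM orders WHERE status = 1;
-- ...likely an index scan (rare value), yet both are tracked as
-- "SELECT * FROM orders WHERE status = $1" in pg_stat_statements
EXPLAIN SELECT * FROM orders WHERE status = 2;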
No, good point.
So yeah, no example plans.
A lack of wait event analysis, because it could be combined with pg_stat_statements. For example, for a normalized query, it would help to understand — if you could build some pie chart — where we spend time. Okay, we have total time, mean time, min and max time, we have the standard deviation.
Some of the building blocks, yeah.
Like query identifier as well now.
Right.
In some cases, we spend time doing some actual work.
It can be CPU, it can be IO.
In some cases, we just wait, being blocked by other sessions.
True.
In this case, CPU is free, it can do something else, right?
But where is this info, right?
We don't have this info at all.
There are many other things, and I know Percona tried to address them with pg_stat_monitor. It's quite good and worth looking at, but its overhead I would definitely measure — as we discussed recently — I would definitely measure and understand it before going to production. But it's an interesting alternative to pg_stat_statements.
If they give you even more ability to tune queries, maybe it's even more negative as an overhead.
Maybe, yeah.
It addresses many things like erroring queries and so on.
By the way, the biggest disadvantage, in my opinion,
is the lack of examples.
Until very fresh Postgres versions, it was very hard to find examples in logs and so on. Like, we have some obviously not-good normalized query in pg_stat_statements. We see it's slow, or it's too frequent. Okay, what are our examples? Especially — okay, if it's frequent, probably we can find one, but how do we find the proper place in the code? And when I call something a disadvantage, it doesn't mean something is bad. Maybe it's just not developed yet. There is opportunity here.
So for example, we found some query
which is not frequent,
but just slow in terms of average latency.
We need to troubleshoot it.
We need to run EXPLAIN and EXPLAIN ANALYZE to perform micro-optimization — single-query optimization, right? And in this case, we need examples, because, as mentioned, parameters define the concrete query; we need an example to work with. And here we can go either to logs, or to pg_stat_activity, or we can use a modern approach with BPF or something like that.
Or, like, sometimes developers are familiar enough with their application to know what kind of parameters could be inserted. So, like, if it's a slow dashboard that's been loading in their SaaS application, and they know that an account ID is likely to be of this format, and then you've got, you know, probably a date range or something — you could infer what the other parameters are likely to be.
Exactly. Well, it happens, and that's great, but in the general case it's not uncommon to hear "it works for me." Well, yeah, right? Because different parameters were checked, and for 90% of cases for this query we have a good situation, but for 10% we have a very bad situation. We need those examples.
And the good news is that in modern Postgres versions — 15, the future 16, maybe even 14, the previous one — there is a connection between the observability parts. pg_stat_statements has had a query ID for ages.
Always.
It had it always. If you consider currently supported Postgres versions, all of them have query ID in pg_stat_statements — some weird numbers, sometimes negative. And logs also have the same number in modern Postgres, and also pg_stat_activity. And pg_stat_activity is my favorite choice, because logs usually contain only queries above log_min_duration_statement. By the way, auto_explain doesn't log query ID, right?
I think, again, it's committed to 16, but not...
Oh, great, great. So we log to the slow log — sometimes it's the easier way, but in many cases we have a problem case but we don't have good examples, because none of them reached log_min_duration_statement. In this case, we go to pg_stat_activity, and then
we bump into the problem of our limit, track_activity_query_size — I don't remember, I think it's 1,024 by default. And to change it, we need a restart. In a serious setup, that's a big ask. Well, you need to weigh this trade-off because of the restart. So it's worth increasing track_activity_query_size to 10,000 or something, because this will allow you to collect samples of queries more easily, and with the query ID you join this data with the pg_stat_statements data and understand the whole picture: this is according to pg_stat_statements, and here are the examples. Even fast examples — we will capture them just with some sampling. It's easy, right?
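A rough sketch of that join, assuming Postgres 14 or newer (where pg_stat_activity exposes query_id, and compute_query_id is effectively enabled once pg_stat_statements is loaded):

-- sample active statements and attach their cumulative stats;
-- running this periodically collects concrete parameter examples per query group
SELECT a.query AS example_query_text,
       s.calls,
       s.mean_exec_time
FROM pg_stat_activity a
JOIN pg_stat_statements s ON s.queryid = a.query_id
WHERE a.state = 'active'
  AND a.pid <> pg_backend_pid();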
Yeah. So this is also kind of a disadvantage: it requires some effort to benefit from pg_stat_statements, because otherwise it's just a building block — you cannot use it directly
without any dance around it. Because you need examples, and you also need snapshots — because, okay, we have cumulative metrics, they are growing, growing, growing, but it doesn't remember when it was reset. By the way, was that also added in 16? The reset time, for pg_stat_statements — or no?
I remember the patch, but I don't know — was it committed or not?
What do you mean, specifically?
If you reset pg_stat_statements, then you have beautiful numbers which you know can be analyzed. You don't need two snapshots. You can work with a single snapshot if you know the reset time. For example, you reset it one hour ago. You have a snapshot. You know these numbers are observations over one hour. You have averages, right? But the reset time is not remembered. Unlike regular statistics — pg_stat_reset — where you can check in pg_stat_database when it was reset. But for pg_stat_statements, you cannot check. That's why I always reset.
There's a view, isn't there? Isn't it pg_stat_statements_info or something?
Maybe in some new versions; in current versions it's not possible to understand. I always reset statistics — if I do it. I try to do it less often, because of the problems we discussed earlier. But if I do it, I usually reset both database statistics and pg_stat_statements, and that's how I know the number.
Or deal with two snapshots.
Yeah, it was added in Postgres 14. You can now see...
Oh, I'm lagging behind. Good. So it's in 14 and 15 then. Okay, good, good.
Yeah, it contains two things. One is a timestamp, in UTC, of when the last reset was, and the other is about deallocation. So you can see how many queries have been kicked out — how many query groups evicted — from, let's say, the 5,000 you've got.
I missed that. I don't have 16. I still don't have a good excuse — it's even in 14. I work with production systems, I observe myself, but I work with older versions, so that's how I missed it. Okay, that's great news. Yeah, thank you for correcting me.
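A minimal sketch, assuming Postgres 14 or newer where this view exists:

-- when were the statistics last reset, and how many entries have been
-- evicted (deallocated) since then?
SELECT stats_reset, dealloc FROM pg_stat_statements_info;
-- resetting everything (requires appropriate privileges):
SELECT pg_stat_statements_reset();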
So anyway, we need to deal with two snapshots. And here we come to: okay, it's a building block, and if you want to work with it, you can work with it directly — it requires additional effort — but it would be good if all monitoring systems supported it well. But almost none of them do. For example, consider calls and total time. We need all aspects, all dimensions. We need to take total time and understand, every second, how many seconds we spend processing queries — assuming we know all of them; of course, with 5,000 entries by default, not all of them. And by the way, if you disable SET tracking, you exclude those from consideration as well, which is also not good, right? I mean, in terms of the observer effect. But this seconds-per-second figure is a beautiful metric. If you forget about contention in terms of transactions and blocking issues, it roughly gives you an understanding of how many cores you need to process this workload, right? If it's like two seconds per second, okay, two cores would probably be enough — but we probably have a hundred. It's a very, very rough approach, but it gives you an understanding of your workload. And the same with buffer block numbers: shared blocks hit and read, temporary blocks, local blocks, and so on. We need to divide them. We need to deal with two — actually three, maybe — derivatives here. The first is to divide all metrics by calls. This gives you the average for an individual query: total time divided by calls will be mean time — you have it anyway — and all block numbers divided by calls will be, on average, each call, each query takes this number of blocks hit, this number of blocks read, and so on.
I.O. timing as well.
So we can divide by calls. And also we can divide by the duration — the time distance between two snapshots. Monitoring systems should do it. And this is also super needed.
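A rough single-snapshot sketch of the "divide by calls" idea (column names as in Postgres 13+; older versions use total_time instead of total_exec_time, and per-second rates additionally need a second snapshot, or the reset time, to diff against):

SELECT queryid,
       calls,
       total_exec_time / calls           AS avg_exec_ms,   -- also available as mean_exec_time
       shared_blks_hit  / calls::numeric AS avg_blks_hit,
       shared_blks_read / calls::numeric AS avg_blks_read
FROM pg_stat_statements
WHERE calls > 0
ORDER BY total_exec_time DESC
LIMIT 20;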
And for example, Datadog provides some of them,
but not all of them.
And in some cases I was like: why don't I have it here?
And I'm speaking not about dashboards — the final dashboards in Datadog — but about the metrics they collect. They don't collect this, and you cannot calculate it yourself. So there is a lack of info. They don't collect everything. This is the problem.
And if you check out all the other monitoring systems, the problem is the same. And once it's already past — like, if you analyze yesterday, for example — if you don't have this data, you cannot understand what was happening.
And sometimes.
The reason this is a problem is because even if we've got direct access to pg_stat_statements, we only have it since the last reset. We only have the current snapshot's information.
Of course, we need to see spikes when an incident started,
incident ended.
And inside it,
I would like to have all the metrics pg_stat_statements has, divided by calls and divided by the time diff. And I mentioned a third one: percentage.
So it's interesting to understand.
For analysis,
like top-down analysis,
you want to understand which query is very data-intensive in terms of hits and reads in shared buffers.
I would combine them, divide and see.
Maybe hits and reads separately are also good.
Because I had incidents where reads were not a problem, hits were the problem, and it was not collected by monitoring, not presented. But eventually we realized that if it had been, we would have understood the root cause much better, much faster, because some query was dealing with data fully cached in the buffer pool, but it was so intensive — not only because of the frequency of calls, but just because a single query was reading so many gigabytes from the buffer pool. And it was not visible, because nobody expected that blocks hit should be present in monitoring as well. It was a surprise. So this is why I would take everything and divide.
I know there's a price in terms of monitoring storage and so on,
but it's worth it because we already spent some percent of overhead
in Postgres itself to collect this data.
Please deliver it to monitoring, right?
So currently I see that monitoring systems fail to do this job well.
And one highlight: at PGCon, I will have a tutorial — a three-hour tutorial. We will talk about various aspects of Postgres monitoring, and this is a very big aspect of it: how we work with pg_stat_statements and similar extensions,
and how we present data,
which metrics we need
and which derived metrics we need as well.
Right.
Nice.
Am I allowed to talk about any positives yet?
Okay.
So let's talk about positives.
100% you should install it.
It's super useful, despite all the downsides.
I imagine there's an exception to this,
but for most applications I've seen,
they would benefit from having this information. A few times I've tried to help people and they didn't have it installed, or the hosting provider didn't put it on by default, and we were kind of hamstrung until they turned it on for a while. It's very difficult to start to get an idea of what's going on without this kind of information, without any logging. So even if you aren't doing anything with it currently, in the future, if you need help from somebody, having it on can be really beneficial. One thing: you mentioned dividing by the number of calls a lot, and I see quite a lot of guides where that's the kind of example query they give — querying pg_stat_statements, looking at a few things like total execution time, and then mean execution time using that calculation, and a few other things — but then they order it by mean execution time. And I understand looking at the slowest queries by average time, but more often I see people who want to look at queries by total load. They're looking to reduce total load on the database more than the average — you know, the worst-performing in terms of length of query.
So I prefer to start looking at that ordered by total time.
And that goes to what you were saying about looking at even fast, fast in inverted commas queries that are executed a huge number of times.
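A sketch of the two orderings (columns per Postgres 13+), including the share-of-total figure that comes up below:

-- "machines" view: which query groups consume the most total time overall
SELECT queryid,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round((100 * total_exec_time
              / sum(total_exec_time) OVER ())::numeric, 1) AS pct_of_total
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- "humans" view: the same query, ordered by mean_exec_time DESC instead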
Well, yeah, you know, I have my approach here.
I think we need to order by almost everything. Like, we need to have different tables. And if, for example, our monitoring also limits the entries — not 5,000, it cannot handle all of them and it needs to limit them — we probably need to have a few lists. For example, if we limit to 100 or 500, we need to have several ordering approaches. And of course, total time versus mean time is the most popular, but there are also I/O metrics: order by hits plus reads, or order by hits and order by reads — two lists.
But speaking of total time versus mean time in ORDER BY, I have a simple approach here. If you care more about humans, you should order by mean time. If you care more about machines — computers — you need to order by total time. Because if your optimization goal is to reduce resource utilization, you order by total time, because total time is like the final metric. Of course, again, there are some lock issues, blocking issues, which can increase total time. And unfortunately, we cannot easily distinguish this — if we had wait event info combined into pg_stat_statements, right, we could say: okay, we order by total time but exclude lock waits. We don't want them, because a lock is some application logic problem, it's not resource utilization. But a rough approach is: order by total time and optimize all queries which exceed 20 percent of it. Calculate the percentage; if some query group — normalized query — exceeds 20 percent, it's a good candidate for optimization, if you care about machines.
Probably users won't notice.
Cost, yes. Well, yes, if you reduce utilization, it sometimes also converts to dollars.
And users care if a system is overloaded — like, that's affecting everybody on the system, for example.
In the edge case, if it's saturated.
But if it's 20% CPU and you order by total time and you reduce it, maybe people won't notice, because maybe it's a query which executes below one millisecond on average but is so frequent, right? It's a super fast query, already well optimized, but it's called a thousand times per second. I had many cases — for example, SELECT 1. And you see 30% of total time is this SELECT 1. What? Why do we need it? We pay for it, right? But users don't see it, because it's just one millisecond or even below one millisecond — for SELECT 1 it should be some dozens of microseconds, probably. But if you care about humans, or you have not OLTP but some analytical system — and again, it's users that matter, right? — we need to order by mean time, average time, and find the users suffering due to those queries. And if we optimize them, probably our computers won't notice, because, for example, it's responsible for less than 1% of total time, but still, in terms of mean time, it's very bad.
Like, above one second, users need to wait.
Or analytical system, like, one minute.
Like, it's very low resource usage, but very bad user experience.
So, resource usage is total time; UX — user experience — is mean time. This is my rule. It's quite easy to explain, and people usually understand.
I like it.
But let's not forget about buffers, right? Our motto is buffers first — maybe not quite here, but if you think, okay, total time is big, or mean time is big — what's happening here?
We should take shared blocks hit and read, and then either take them as a whole between two snapshots — let's say, if we care about total time — or divide by calls, if we care about mean time. And we understand: okay, mean time is huge, but that's because, on average, a single query deals with gigabytes of data. And that's the problem. Probably it's a SELECT count(*) or SELECT sum(...) — some aggregate, right? Let's think about how to improve it.
Yeah. So it's really good for splitting out all of the different buffer numbers. I think there are about 10 different types that you get — the same ones you see in EXPLAIN (ANALYZE, BUFFERS) you get via pg_stat_statements. And it could be used — we could look at temp blocks, for example. We could look at which queries are most often using the most temporary space, and what they use on average. That kind of thing could be really handy.
The other thing that they've added — that isn't buffers-related, but that I didn't realize they had, and could be really useful for one certain case —
Well, what did they add?
But I was going to say just in time compilation statistics,
I've seen quite a lot of people considering
upping the cost limits for that or turning it off completely.
But this would be a really good way of-
My advice is to turn it off completely for OLTP.
That's it.
Yes, but it's quite cool
that you can look in pg_stat_statements and see which queries are using it, and which ones on average.
So you could order by that
for that ad hoc analysis.
Data-driven decision
versus just listening to me.
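A sketch of that ad hoc analysis, assuming Postgres 15 or newer where pg_stat_statements has the JIT columns:

-- which query groups spend the most time in JIT compilation
SELECT queryid,
       calls,
       jit_functions,
       round((jit_generation_time + jit_inlining_time
            + jit_optimization_time + jit_emission_time)::numeric, 1) AS jit_total_ms
FROM pg_stat_statements
WHERE jit_functions > 0
ORDER BY jit_total_ms DESC
LIMIT 20;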
Right, right, right.
Yeah, that makes sense.
Definitely.
But speaking of hits and reads and so on: if you divide by calls, you have the average. It's super important to have the average for hits and reads. It makes sense also for, like, temporary files — but for temporary files, sometimes we just quickly realize we need to increase work_mem, that's it. But if you think about it, sometimes, yes, it makes sense. And shared buffers is the most interesting, and this is like: okay, our average is this — gigabytes for a single query, not good. But for total-time-centric analysis, for machines, right,
if we consider hits, we again can divide by the duration — the number of seconds between two snapshots — and we can imagine the flow: okay, this query group is responsible for, say, one gigabyte per second or 100 gigabytes per second of dealing with shared buffers, and 100 megabytes per second of dealing with the underlying page cache — because reads are maybe not only disk, it's the page cache as well, depending on the sizes. We don't have visibility into that without pg_stat_kcache. And this is super good for engineers, to understand these streams: okay, this generates this I/O. And even without timing, it's already a super useful number — buffers per second.
Especially in a world where we're seeing, you know, providers that charge by that — you know, if that's the metric that you're paying for, for example.
Aurora, for example.
Yeah.
Yeah, makes sense.
But WAL numbers — that was added in Postgres 13 maybe, or 14?
13, I think.
Both to EXPLAIN ANALYZE and...
13.
Both to EXPLAIN ANALYZE and pg_stat_statements, and it's super good.
Just checked, yeah.
Sometimes you have terabytes of WAL generated per day. You enable WAL compression to reduce it — it's already better, but still a lot. It keeps increasing, your backups are growing, replication needs to transfer all of these things. Probably it's because of index amplification, probably because of something else. And pg_stat_statements can show you which queries are responsible for, for example, 20% of all WAL generation among all queries. Of course, CREATE INDEX will also go there, right? It will be registered by pg_stat_statements — CREATE INDEX, or REINDEX CONCURRENTLY.
If you have — is it utility? Would it be under utility commands?
No, I think it's DDL. It's not utility. Utility is SET.
Yes. And autovacuum won't show up here. Autovacuum also generates some WAL — processing tables and indexes generates some WAL — but we won't see it in pg_stat_statements. But normal SQL, including DDL or index maintenance like REINDEX CONCURRENTLY — we will see it, and we can again understand the percentage for each query group, for each
normalized query, and make conclusions: okay, our index maintenance is a lot of WAL, but we move it to the weekend, so we are fine. Or this whole group of inserts is responsible for — I don't know — maybe not inserts, updates, for example, responsible for 10 percent of all WAL generation, or 20 percent. And we check: we probably adjust or drop some index, we make these updates HOT, reduce the index amplification problem — or maybe drop some indexes, sometimes unused, redundant ones. We had an episode about index maintenance. It's important. And in this case, less WAL will be generated, and we will be better off in terms of replication, backups, and so on. Less pressure on those subsystems.
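A sketch of that WAL breakdown (wal_records, wal_fpi and wal_bytes exist since Postgres 13):

-- which query groups generate the most WAL, and their share of the total tracked
SELECT queryid,
       calls,
       wal_records,
       pg_size_pretty(wal_bytes) AS wal_total,
       round(100 * wal_bytes / NULLIF(sum(wal_bytes) OVER (), 0), 1) AS pct_of_wal
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 20;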
I feel like we've gone into quite a lot of detail
here. I'm glad we have, but I feel like we might be coming to the end.
Yeah, I'm just illustrating why everyone should enable it — CREATE EXTENSION pg_stat_statements — in just one database. It's not needed to install it everywhere. It's a single thing, actually. You can install it in the postgres database, for example, and work with a superuser, and you will see everything from all users, all databases, all queries, globally.
And there's a chance on some cloud providers it's on by default, so you might not have to — but worth checking.
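For self-managed setups, a minimal sketch of enabling it (shared_preload_libraries requires a restart, and note that ALTER SYSTEM overwrites the whole list, so append rather than replace if other libraries are already preloaded):

-- 1. load the module at server start (restart required afterwards)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- 2. after the restart, create the view and functions in one database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- 3. verify it is collecting
SELECT count(*) FROM pg_stat_statements;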
Well, yeah, rds_superuser and so on. By the way, there are also interesting dimensions: dbid — the database — and the user. Like, which user? If we have segregation, if we split — divide and conquer — these parts of our application use this DB role, this DB user; humans have their own database users. So we can perform analysis, and we have segmentation here. We can understand who is using what, who is generating — who is responsible for — total time and all the I/O metrics and so on.
And also, a super interesting case,
like, okay, we found some query.
We talk about performance analysis starting from the database point, but the database is just the heart of any system, right? We also have applications, humans, many things. And sometimes we need to trace and understand which code is doing this, and this is a super hard topic. Because, for example, if you take Datadog, it has application performance monitoring, APM. So if you analyze performance from the application side, you can trace and say: this is the code generating it. You also have network latency included in the latency analyzed — there are pros and cons here as well. It's beneficial because for the application you have the final latency, including the network round-trip time, but you cannot distinguish how much of those milliseconds or microseconds is your network. From Postgres, you don't see this network latency, and you cannot trace to the code unless you use some tricks. For example, sometimes people start
putting comments in queries. For example, for Ruby there is a library called Marginalia. It can help you trace. It can be a URL, for example, put into a comment. And pg_stat_statements will ignore comments in the normalization process — it will keep the first comment it sees, but all subsequent ones are ignored. And this can be used: you put some comment there and you can trace it — okay, this query group is from that part of my application, that URL or this module or this file. You need to invent some way to do it, and it requires some effort.
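A hypothetical illustration of the idea — the comment text is made up; libraries like Marginalia generate comments of this shape automatically:

/* controller: dashboards, action: show */
SELECT * FROM orders WHERE customer_id = 42;
-- the normalized entry in pg_stat_statements keeps that first comment in its
-- stored query text, so the query group can be traced back to the code path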
There is an interesting effort from some Russian developers, for pg_stat_kcache originally. The idea is to tag: you have some dimensions for pg_stat_kcache entries, you tag them — for example, application module equals this — and you start distinguishing different calls and metrics, and you can have specialized segmentation depending on your application logic. I think it would be good to consider a similar improvement for pg_stat_statements as well, because of this tracing problem... sometimes we, database experts, lose — with our tooling we lose to application tooling, this APM stuff. They say, you know, it's more convenient for us because we see exactly which code generated it. With your pg_stat_statements, we cannot do that.
Okay, I'm talking about downsides again.
I think it's time to call it a day.
That was really helpful.
Thank you so much, Nikolay.
Thanks, everyone, for listening.
Thank you for this topic.
You see, it resonates with me a lot
because, yes, I spent many years working in this area.
I hope it was helpful.
Good luck with your blog.
Thank you. Okay, good one. See you next week. Bye-bye.
Bye. Thank you. Bye.