Postgres FM - Buffers II (the sequel)
Episode Date: April 26, 2024
Nikolay and Michael return to the topic of using the buffers explain parameter — with a new analogy, some (conspiracy) theories of why it's still not on by default, and some related chat about the serialize parameter coming in 17. Here are some links to things they mentioned:
BUFFERS by default (episode 4) https://postgres.fm/episodes/buffers-by-default
Lightning talk by Michael at pgDay Paris (5 mins) https://www.youtube.com/watch?v=WfY-mSpUzaQ&t=1470s
Waiting for SERIALIZE https://www.depesz.com/2024/04/11/waiting-for-postgresql-17-invent-serialize-option-for-explain/
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard
With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello, hello, this is PostgresFM, episode number 94.
And we are going to repeat episode number 4, but not really.
So today we're going to talk about buffers again.
So let's call it buffers 2.
But I forgot to say hi, Michael.
Hello, Nikolay.
I was thinking about buffers already too much.
This topic is super, super important, I think.
Yeah, so last time we called the episode buffers by default, because you especially, and I'm a convert to this, think that buffers, which is an explain parameter, should be on by default, at least with explain analyze.
There have been a couple of patches over the years trying to put this in place, or trying to make steps towards it as a goal, and we were in support of that. But we also wanted to explain why we thought people doing query optimization work, so developers, should include the buffers parameter when they run explain analyze manually, or with auto_explain, so that they can more effectively do query tuning.
Is that a good summary?
Yes, that's a good summary.
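For reference, the auto_explain route mentioned above involves a few settings; a minimal sketch with illustrative values only, not a recommendation:

shared_preload_libraries = 'auto_explain'   # in postgresql.conf
auto_explain.log_min_duration = '500ms'     # log plans for statements slower than this
auto_explain.log_analyze = on
auto_explain.log_buffers = on               # include the buffers numbers in logged plans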
Have you ever thought about why, in explain plans, buffers are not turned on by default, even though there is consensus, I think, that they should be, and the reason it hasn't been changed so far is purely technical? I'm sure, since you checked it, you'll probably explain the details here. But don't you find it interesting that explain plans have buffers off by default, while pg_stat_statements has them on? It's just interesting why, because if you want to connect the data, and I think we should, we should look at the whole picture, the macro level, and we should also work with individual plans provided by the explain command, the explain analyze buffers command. And if you forget buffers and just do explain analyze, which many, many blog posts still suggest, unfortunately, then it's hard for you to connect the dots between the individual plans and the macro level.
Yes, an interesting point about pg_stat_statements. I reread
the two most recent threads about turning buffers on by default. The first one was started by Vik Fearing, and he actually took a step back and said: I personally would like to turn buffers on by default, but because I realize that might not be what everybody wants, and that might be a wider discussion, here's a patch proposing that we allow multiple new parameters for Postgres. Each one allowed you to turn an explain parameter on or off by default, so you could set the default yourself for your instance. You could set buffers on by default for your own instance, and other people could leave it off by default. And it was widely thought that other people did want buffers on by default, but because he also included settings to enable analyze by default, and verbose, and write-ahead log settings, I actually can't remember which ones, but basically every explain parameter that existed at the time the thread was started also came with one of these settings, people fixated on the fact that an administrator turning on the analyze parameter by default would mean that running explain on its own would execute the query, which was deemed to be a foot gun, too dangerous.
So generalization
played a bad trick here. And honestly, I'm not interested in the ability to turn it on for specific servers. I want a real default for everyone, a global default, because I deal with many, many servers and I cannot tell them all. This is what we do right now: we tell them all, use buffers by default when you provide plans. It works only partially; it's never 100% coverage.
So if we can do
it by some additional settings, okay.
Maybe some cloud providers
would do it. I don't know.
But this partial solution I don't like
at all. I do think we're in a
new world with cloud providers. I think
now we have this, you know,
for example, pg_stat_statements is off by default, but in the cloud, most cloud providers have it on by default. So a lot of users get it. Some providers even learned about template databases, template0, template1, right? And put it there, so any newly created database has it. This is how it's implemented, right? So we're in this new world where we don't necessarily have to have things on by default in Postgres core for it to be on by default for most users in the real world. So there's a slightly weird subject here, but to move on to the second...
That's still a partial solution. I want Postgres 18, for example, to have this by default. What should we do? But actually, we jumped straight to this question of why it's still not the default, and we didn't explain what it is, right? So maybe we should explain first, because it's also important why we care so much about this. Why, if a plan doesn't have buffers data, so what? Right?
Yeah, let's do a short version
because I think we did quite a good job of that
in the first episode we did on this,
which we can point people at episode four.
But do you want to give a short version?
Yeah, let's do a short version.
So I see this data.
As I mentioned, this data is always present in pg_stat_statements. So if pg_stat_statements is installed, you have buffers: shared blocks read, shared blocks hit. There are also local and temp numbers, two additional pairs, but they're not as interesting here; let's talk only about the buffer pool numbers.
There are also two more columns, dirtied and written. So if some query changes data, it's dirtied; if it's written out of the buffer pool, it's written. And hit and read are interesting, of course, because they can give us an idea about cache efficiency, in terms of buffer pool cache efficiency, because there is also the underlying page cache additionally, always, right?
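For reference, those counters can be pulled straight from the extension's view; a minimal sketch using the standard pg_stat_statements columns, with the ordering just as an illustration:

SELECT queryid,
       shared_blks_hit,
       shared_blks_read,
       shared_blks_dirtied,
       shared_blks_written
FROM pg_stat_statements
ORDER BY shared_blks_hit + shared_blks_read DESC
LIMIT 10;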
And in the explain plan: if you run explain, it's only the plan. If you run explain analyze, it's the plan with execution. But if you don't explicitly put the buffers word, and if you do put it, you need parentheses additionally, so explain analyze, buffers in parentheses, then you don't see this buffer data. You only see timing.
You cannot, for example, understand cache efficiency in your particular execution of this particular query.
Like how much was taken from the buffer pool, how much was taken outside, maybe from memory, maybe from disk.
If it's a read, it's from outside.
You have no idea.
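As a reference point, the syntax being described is just the option in parentheses; a small sketch with a hypothetical table t:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t WHERE id = 42;
-- plan nodes then include lines like "Buffers: shared hit=... read=..."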
Of course, you can enable I/O timing, not regular timing, which is always enabled in explain analyze plans as actual timing, right? But if you enable I/O timing, this can give you an idea about how much time was spent on I/O. And as I understand it, I/O in that sense means I/O outside of the buffer pool, right? It's so tricky, by the way. What is I/O? Does I/O include communication with what? With disk? With memory? Which memory, the page cache or the buffer pool? Both, or just the page cache?
In the case of the track_io_timing setting, I guess everything except communication with the buffer pool is counted.
Yes, but I would count everything. So it's tricky. Anyway, you can see timing there, but cache efficiency still wouldn't be visible. If you're curious about that, you do need buffers.
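For reference, that setting can be enabled per session and combined with buffers; a rough sketch, where t is a placeholder table:

SET track_io_timing = on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM t;
-- with track_io_timing on, plan nodes can also carry an "I/O Timings: ..." line next to Buffers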
But in my opinion, cache efficiency is not the number one benefit that buffers provide. The number one benefit is the amount of work. Even if we know timing, to understand the real efficiency of our mechanism we need to understand how much work was done. If we see only the number of rows returned, that's of course the final result, right? Delivered. But what's inside? Inside, we need to understand how many actions were needed. And these actions, in the database world, are I/O. Databases are mostly about I/O, of course. We don't do a lot of calculations; usually we deal with files. The goal is to find the data we need as quickly as possible, to efficiently change it, rewrite it, store it, and so on. So only buffers can show the actual amount of work that was performed while achieving the final result. And this is, again, hit, read, dirtied, written for shared buffers. Of course, temp buffers are also interesting: if you see temp buffers written and read, it means work_mem was not enough and the query needed to write to disk. That's also an amount of work, and it can explain why I/O timing and overall latency increased significantly. So, the amount of work. Let's try maybe a new analogy. For example,
you need to replace the cabinets in your kitchen, I don't know. And we're not interested in details; you just need the final result. Then you hire some contractor, or some team of contractors, right? You pay them some money and they say, oh, you know, this took one month. And you say, well, okay, I see the final result, I understand, one month, I pay, and bye-bye, right? But is it fair? Maybe it's too much, right? How do we understand whether it's too much? We can compare to different similar cases, but it's better to dive in, to understand what exactly they are going to do. Let's check step by step: you will remove this thing, you will put in that one. You start understanding the real amount of work to be done. And then you think, really? You guys spent one month on this kind of work? This is strange. But in some cases they can really explain: oh, you know, you have a difficult case here, you need to change this and that, and it's really a lot of work. And then you start understanding: okay, in this case, probably one month is really reasonable, right?
Let us go back to databases.
If you see that to return some rows, database needed to perform sequential scan, it's reasonable to have bad latency,
very high value, right?
Because it need to scan very big table.
It's like a lot of work.
And then if it can be optimized,
of course, yes, we can create index, right?
Maybe we can buy tools for our contractors,
better tools than they use and so on.
I don't know, but diving inside this is the key
you understand the work to be done does make sense yeah it does i don't think the analogy is great
like i think but i can like for example there's there's one parallel that could make sense how
many people were working for the month like was it four people flat out for a month?
Parallelism is an interesting analogy there. But I think you can see all the steps that explain is doing, you can see what operations it's doing, even without buffers. We can see a sequential scan or an index scan.
When you see, for example, a sequential scan, you see rows. You see an index scan, okay, you see rows filtered out. But, for example, let's move slightly to deeper topics.
If the database is bloated, then to read 1,000 rows, for example with a sequential scan... 1,000 rows can be packed into a few pages, or they can be scattered, sparsely distributed across a thousand pages, right? A thousand blocks. And we can immediately see: okay, to read these thousand rows, we needed to deal with a thousand pages, a thousand shared buffer hits, ideally, not reads, right? In this case you think, okay, something is not good here, there's no data locality. What can I do about it? Maybe we're dealing with bloat, or maybe we just need to think about, I don't know, clustering our data or something; not a great choice, but I mean physically reorganizing the order of storage. But this is good, I mean, you can understand it, and without buffers you don't see it, right?
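For what it's worth, the physical reordering mentioned there is the CLUSTER command; a minimal sketch with made-up table and index names (note it takes an exclusive lock and is a one-time rewrite):

CLUSTER orders USING orders_created_at_idx;  -- rewrite the table in index order to improve locality
ANALYZE orders;                              -- refresh planner statistics afterwards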
Yeah, well, I mean, I'm completely on board with this, I was just trying to push back on the analogy. We've discussed this before; we both agree it should be, we'd love it on by default. I make a tool that really benefits when people use buffers, and we give a tip in exactly that scenario; we call it read efficiency. When we say the read efficiency is not good, it might be bloat, it might be data locality, there might be a different reason, but yeah, you can't spot read efficiency issues without buffers, or other read-related issues like read speed. So if you've got bad disks, or, you mentioned, you're being throttled by a cloud provider or something like that, you can't spot that. Maybe you can get a clue if you're really tuned into your data volumes and the time being taken, but if you've got the number of blocks or pages and can convert that in your head to, you know, we're reading 100 megabytes and it's taking, let's say, 20 seconds, that's only five megabytes a second. That is terrible. You've got an idea that...
You touched several interesting points here. First of all, sequential scan: for example, you have a sequential scan, some filters, and limit one. We need just one row. It can be very quick if this row was found quickly, or it can be very slow; it takes a lot of time if it needs to scan through a lot of irrelevant rows. And without buffers, we have no idea in the plan, right? Because we don't see how many rows were filtered out. An index scan will report how many irrelevant rows there were.
You do see rows removed by filter for a sequential scan as well.
How come? Ah, rows removed by filter... ah, okay, sorry, yeah, you're right, you're right. Okay. But yeah, I understand, I agree with you, it's only about the cases when storage is not efficient, for example bloat or dead tuples; that won't be reported, and if we see buffers, we understand: okay, too many buffers are present here somehow, let's understand why. And
the second point I wanted to extract from your speech: it's super important not only to use buffers but to always convert them to bytes. You said megabytes per second. Yeah, buffers per second is hard to understand; it's also not what disks report, disks don't tell us how many pages per second they read. Exactly, we cannot compare it to our disk characteristics. We usually know our random access and sequential access throughput numbers, maximum numbers; for NVMe, for example, a gigabyte or two gigabytes per second, and then we understand, okay, we have powerful disks, and 100 megabytes per second is already very noticeable. And of course memory is several orders of magnitude faster. But if we say how many buffers per second, I don't know. So we need to always multiply by the block size, which in the absolute majority of cases is eight kilobytes.
Right. And I wanted to emphasize this because I think not everyone watched our episode number four about buffers.
So I wanted to emphasize how important it is.
If you deal with these numbers and you start using them in plans and explaining something, you should understand that your colleagues might not understand you unless you convert them to bytes, kilobytes, megabytes, or kibibytes, mebibytes, gibibytes.
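For reference, the conversion is just blocks multiplied by the block size, which you can check on the server; a small sketch:

SHOW block_size;  -- typically 8192 bytes
-- so, e.g., 131072 shared hits + reads is roughly 131072 * 8192 bytes = 1 GiB of data touched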
So you start putting in these numbers, and it's magic. I saw it so many times. People say: oh, we are reading one gigabyte here for 10 rows? Okay, what's happening here? When they saw boring buffer numbers, they didn't have such a reaction at all. Once they see gigabytes: wow. The reaction is completely different. I had cases when, instead of diving deep into the specifics of the plan, and it was actually a mistake to dive straight into the details of the explain plan, the number one optimization tip was: okay, guys, how many rows do you reasonably want to return, or to touch, I mean change? And let's see the whole work done by the database here. This gives you a very good sense of how inefficient it is. And in many cases it might be some forgotten limit or something, and you understand: oh, okay, this is not what we want, actually, at all. This is a very high-level optimization tip. I see you're skeptical about it, but it works very well.
I don't mind it.
I think I've described it, I've heard it described by others and described it myself as: the best way of doing work faster is not doing the work at all.
So if you can avoid doing it entirely,
if you can do less work in the first place or no work,
if you don't even need to run the query, great, get rid of it.
But you're talking about if you do decide you do need to run the query, can you just
get, like, are you doing the smallest amount of work that's possible?
So what I think would be good, you know, there's a summary in the end of the explain plan.
Yeah.
And in this summary, we should have data in bytes.
It would be great to have it.
Like how much was read, hit?
You know, I've done this.
You know, it's actually...
We both did this with our tools.
But I think I did it in a controversial way.
I did it in a way...
I think this gets down to the trade-off between being accurate and being helpful. If you're already talking about splitting it into the four types... like, I guess there are actually eight, actually ten, you know: you've got shared hit, shared read, shared written, shared dirtied, local read, local hit, and so on. So there are ten statistics already if you want to be accurate about even the overall reporting. And what we ended up doing is horrible and it doesn't make any sense, but it's nice from, like, a... well, we just sum them all together
and give an overall number.
What if we said shared block reads, shared block hits, talking about operations, not the amount of data? Because we both know that for hits, for blocks hit, there might be multiple hits of the same block, many, many times; a nested loop is a very simple example. For reads it's also possible, I think; let's not spend time on that here. So when you summarize it, instead of blocks hit I would say how many block hits happened and how many block reads happened, and then you summarize them and say how many I/O operations happened. Though I would maybe not sum them all together; I would maybe still distinguish the reading and writing directions of operation, you know.
I understand. But the reason I love this approach is...
I'd keep the I/O directions separate.
The reason I like this is most of the time we're looking at read queries.
That's just the general case.
Don't you have write-intensive workloads?
Yeah, of course there are.
But if you're looking at people trying to speed up queries,
the vast majority are read queries.
Or they take an update, extract the search part of it, and optimize that first as a select.
Yeah, that also happens, I agree with you. But my main point was: even if it's a write-intensive query, it's the before versus the after. And it's funny that you used the word magic a few minutes ago, because this is the technique that's magic; it kind of flips the switch for people. I use that word, but in the opposite direction: I think it takes the magic out of query optimization. I think sometimes junior developers, or people beginning, think that adding an index is magic, and then it's gone. You know, we've seen the blog posts: this query is now 10,000 times faster. It's like, there's no magic here. This is why.
So let's agree.
There's no contradiction here at all.
When I started Postgres AI,
my idea was that Postgres database administration
and optimization and all other areas,
it should not be black magic.
It should be white magic.
And black magic happens when people need to spend 10 years because of a lack of transparency in observability tools or regular tools, and a lack of good materials for understanding. Now we have much better materials than 15 years ago, for example, and a lot of good tools. And buffers is one of the cornerstones
of tools and approaches
which really clarifies everything.
And so the black magic disappears and only white magic remains.
It's white magic when you say,
okay, I did this and now the query is a thousand times faster.
It's great, right?
But looking only at timing,
you cannot achieve it usually.
You do need to look at the work to be done, or being done, and this is buffers.
Well, and if you see, for the one that's a thousand times faster, that instead of the sum of buffers being a gigabyte, the sum of buffers is now two megabytes, something on the order of a thousand, then it becomes obvious. No surprises anymore.
Yeah, exactly.
And it doesn't seem like magic to people anymore; you start to realize how the index is working and why it's therefore so much faster. So that's the reason I love the simplicity of it.
But I think it also... I want to get back to... is this now a good time to go back to why adding buffers by default didn't work the second time around?
Yeah, please. I'm all ears.
Well, you started the thread.
Yes, maybe. I forgot. It got lost because of tests, right? Tests.
Yeah, that was the main thing. So, adding it by default: I think one of the problems was that the patch set that ended up getting proposed contained about four different patches that all did slightly different things, and I think that was part of the problem, that it wasn't kept as a simple change. And actually the biggest part of the change proposed was around tests, as you say. So if we turn buffers on by default, the first question, which you asked and which the patch ended up doing, was: should it be on for explain on its own as well as for explain analyze? And I think that was a mistake, I think it overcomplicated matters.
Because you can also have buffers for planning, right?
Yes. But the reason I think that was a mistake is that there are far more plain explains in Postgres' regression tests than there are explain analyzes. So firstly, it complicates that. And secondly, timing is off by... or rather summary, which gives you the planning time, is off by default in plain explain. So it's really weird to have buffers on by default and not summary on by default. So I don't think it makes sense logically, but it also complicates the patch set and the regression tests. So if I was redoing this, I would have a go without that, you know, just have it on for explain analyze, not for explain.
Okay, well, that's an opinion. I don't actually know if it'd be any better without it. But the second issue was then: what do we do with explain analyze, which has timings, right? There is a regression test setting that turns those off, so that the fluctuating timings don't make the tests fail each time the regression test suite runs, and there was an attempt to add a similar thing for buffers, to disable them for tests only, instead of rewriting the tests, I think.
Yeah. And maybe it would be better to rewrite the tests,
especially because
I forgot to mention
one of the very good things
I like about buffers
is that they don't lie.
They're always stable.
So you can use them in tests
and rely because,
well, of course...
They're not quite as stable as...
Hits and reads?
Not just that. Planning buffers as well.
Sometimes you get a lot of planning buffers, sometimes you don't.
It can depend on...
Okay, at least to some extent they are stable.
Even if the query, for example, is blocked by some different session, and our session is waiting for a lock to be acquired, we can spend a lot of time on this wait, but the buffers will be fine, I mean it won't affect them. And this is one of the signs: if you see buffers are low but latency is high, maybe locking issues are involved here, right?
Yeah. But they shouldn't block tests passing, right? You can handle this.
By the way, I would bring my bad, or maybe good, analogy back and say: if your guys are wasting a lot of time but doing little work, maybe there is some blocker, right? I mean, with the kitchen, I see some analogy here.
I see. At least, it's my... I like this approach, when you dive deep into the details of what's happening, what people are really doing. It can be applied to anything, maybe lawyers or accountants or something. If you don't look inside at the real amount of work to be done, you usually end up paying much more. And this is true with databases as well: if you don't use buffers, you will be having inefficient plans, you will need to provision a bigger instance, for example, and you will pay more.
The same. Maybe we found the real reason they won't turn it on by default: too many hackers are working at cloud companies and they are not interested. It's a conspiracy.
To be clear, I don't believe that.
Before we finish, because we don't have much time left, let's touch on the very fresh topic: serialize.
Sure.
Postgres 17 is going to have yet another option in explain,
and it's called serialize,
and everyone is excited about that, right?
As I said.
I have heard quite a lot of people quite excited about it.
It's good.
I think it is good.
We're getting the option to see another case where
your query can be slow,
but explain analyze at the moment won't
show that it's slow.
So yeah, for sure, good. But again, off
by default.
What does serialize show, first of all? Just briefly, it's the amount of what?
Oh, come on. We're back to buffers, but bytes and timing.
Yeah, by default it shows timing and bytes, and it shows buffers as well if you have buffers enabled. But what is it about? It's about serialization. So if you...
It can connect us to the last episode when we discussed select star.
And I said select star is actually not a bad thing
because I was comparing this in my head to the idea
to explicitly list all the columns.
And select star for me is a more convenient way.
But, of course, if you need only, for example, one column,
numeric column, for example, integer column,
but there is a JSON column next to it, which is TOASTed. Of course, if you select it when you don't need it, it's a big piece of inefficiency in this query, because Postgres will need to deal with the TOAST table. And the serialize option can show this very well, right?
A lot of work
to do. Instead of selecting
just one number, we need
to do a lot.
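For reference, in Postgres 17 the new option goes in the same parentheses; a small sketch with a made-up table name:

EXPLAIN (ANALYZE, BUFFERS, SERIALIZE)
SELECT id FROM documents;   -- cheap: the wide TOASTed column is not selected

EXPLAIN (ANALYZE, BUFFERS, SERIALIZE)
SELECT * FROM documents;    -- now TOASTed values must be detoasted and serialized
-- the output gains a "Serialization:" line with the time spent and the output size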
Well, I think the issue is more that
the explain analyze at the moment
doesn't show the inefficiency.
So if you did select star from that table from your client, you would get a really slow result, depending on how much data there was. You'd see in the client that it was really slow, and there's a couple of reasons: one is because a lot of data is being sent across the wire, but also because of the serialization. So there's two reasons.
But in explain analyze, you wouldn't see it, yes.
Actually, you know, now I understand that it's a pity I didn't see the discussion beforehand, because there is demand, actually, and I've seen it more than once, when people want to see the plan and the actual result as well.
Yeah, it's a different conversation.
Different, yeah. There is auto_explain, of course, but... Anyway, this is only the serialization part, and it's off by default again. So it's more output; I like it. We can add it to the list of parameters we ask people to use.
And you think it should be on by default? I mean, it's the same argument, isn't it?
Like, we want to see the actual work being done.
You know, this off by default, that off by default, buffers off by default; and also analyze is a super confusing word because it's used in other areas of Postgres administration. How about having new words invented that would show proper plans? Of course, we can do it in psql, for example, with \set.
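For reference, one way to do that in psql today is a variable shortcut; this is only a sketch, and the alias name and the table t are made up:

-- in psql (for example in ~/.psqlrc), define a shortcut once
\set ea 'EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS) '
-- then prefix queries with it
:ea SELECT count(*) FROM t;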
I did consider... I would be interested in your thoughts on this, and listeners' thoughts, on the idea of an explain all, or something, so that whatever version of Postgres you're on, it will turn on all of the parameters, including analyze, I think. Full is loaded, for sure, because of vacuum full, I think.
All what? All queries? Yeah, okay, let's not pretend we are hackers.
No, but I think there is something to... what's the end goal here? Is explain going to have 300 parameters? How many are we going to keep adding?
Yeah.
Yes.
So, buffers definitely; and serialize, I need to test it myself for some time to conclude.
I would say I have a reservation.
I think it's helpful for showing people where the problem is.
But because it's still off by default,
I think a lot of times when it would have been helpful,
it won't be used, which is a shame.
Yeah, that's interesting.
Everything here... serialize is off by default.
I actually didn't give this thought before we recorded this podcast.
It's funny.
Yeah, it actually should be default on.
Of course, I understand someone will say it will produce an observer effect, but honestly, I don't care, because timing itself produces an observer effect, and is maybe the number one player there.
And also this observer effect is maybe not an observer effect, but an actual, normal effect, because regular clients need this work anyway, right?
Yeah, it's true. This is the opposite. Observer
effect, normally we're talking about
the additional timing added
by observing, whereas
this is a case where explain, analyze will
under-report the time taken
because of this. It's skipping some work that is actually needed.
Weird.
So I need to...
My fingers need to memorize.
By the way, I type explain analyze buffers super fast.
Now I need to add serialize as well.
It's already too much.
I actually now use a text expander.
So every time I can type explain analyze
and it will change it to the full list.
But obviously you need to work in lots of different terminals all over the place, so yeah, probably best to learn to type it anyway.
Okay, so we like the thing, but we don't like that its default is off. Maybe we should change that, actually. That's my opinion, for the same reasons.
Yeah. Okay, it probably makes the tests difficult, of course.
That's difficult, of course. But I would emphasize once again: in tests, in regular application testing, buffers are gold. Because, yeah, I understand that sometimes there may be slight differences, because for example the data is packed slightly differently physically, right? The layout is slightly different. But there is no order-of-magnitude difference. With timing, you can have that: disks are slower, disk I/O is saturated, or there are locking issues, something like that, and you have timing which you don't understand. With buffers, it's a reliable number you can use in testing.
Cool. Thanks so much,
Nikolay. Thank you.