Postgres FM - work_mem
Episode Date: December 31, 2023
Happy holidays! Today's topic is work_mem — how it's used and how to tune it. Related post: https://x.com/samokhvalov/status/1740813478150189172?s=46&t=HSxrW86NZqzPQP9vXVJEWg. ...
Transcript
Hello, hello, this is PostgresFM episode number 78, live, because I'm alone today again.
Michael is on vacation for the holidays, but I cannot allow us to miss any weeks, because we have been doing this for a year and a half already.
Episode 78, so no weeks have been missed, and this week we'll do it as well, but it will be a small episode. First of all, happy holidays, everyone. It's December 28 or 29, so it's holiday season, and let's have a small episode about work_mem. Somebody asked me to cover this topic.
And I actually wrote a how-to.
I just haven't published it yet.
I'm in my Postgres Marathon series where I publish how-tos every day.
Almost.
I'm lagging a little bit as well because of the holiday season.
But I'm going to catch up.
So, work_mem. work_mem is something that everyone uses, right?
We all run queries; we need to use work_mem.
But this is a super basic setting, because everyone needs it, and it's also super tricky, because every statement can utilize less than work_mem — it's a limit, it defines an upper bound. If we need less, we use less. It's not a pre-allocated amount of memory like shared_buffers, the size of the buffer pool. So any query can use less. But the trickiest part is that one statement can use this amount multiple times — each allocation may be up to work_mem, but there can be several of them, because this is the amount of memory used for hashing and sorting operations. So if we have, for example, multiple hashing operations inside one query — for example, multiple hash joins — it can be used multiple times, and this adds a kind of unpredictability.
It's hard to define a good algorithm to tune it precisely, because we don't know, right? For example, we have some amount of memory and we know our buffer pool size. That one is simple — it's another topic, but you define it once and you cannot change it without a restart. So we apply some rule — most people use 25% — okay, we allocate 25% for the buffer pool. What's left we can use, and the operating system can also use it for its page cache.
And we should not forget that there is also maintenance_work_mem, which is more predictable, but has some trickiness as well because of vacuum. There is autovacuum_work_mem, which is -1 by default, meaning maintenance_work_mem will be used, and we have multiple workers for autovacuum. So if we set it to, for example, two gigabytes — I see people do that — it's quite a lot. You need to ensure that, for example, when you create an index, two gigabytes is indeed helpful. But we can have multiple autovacuum workers, and I usually advocate raising the number of autovacuum workers a lot, so we can have many of them — say 10 if you have a lot of CPUs, or maybe even 20. It means that if you set maintenance_work_mem to two gigabytes, and autovacuum_work_mem is -1, which means it inherits from maintenance_work_mem, autovacuum alone can use up to 20 gigabytes. It also depends, because it won't use all of that for everything, but anyway, we need to subtract these 20 gigabytes
from the remaining memory.
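For reference, a quick way to inspect these related settings — just a sketch; autovacuum_work_mem = -1 means the maintenance_work_mem value is used:

    SHOW maintenance_work_mem;
    SHOW autovacuum_work_mem;     -- -1 by default: falls back to maintenance_work_mem
    SHOW autovacuum_max_workers;  -- each worker can use up to the value above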
We also have some overhead for additional processes. And then, with what's left, we can just say: okay, we have max_connections, say 200, so we divide the remaining memory by 200, and this is roughly what each backend can use. But we don't know how many times backends will use work_mem, right?
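Just to make the arithmetic concrete, here is a back-of-the-envelope sketch with made-up numbers — 128 GB of RAM, 25% for the buffer pool, 20 autovacuum workers at 2 GB each, max_connections = 200 — all purely illustrative:

    -- ~ (RAM - buffer pool - autovacuum reserve) / max_connections
    SELECT pg_size_pretty(
             (((128 - 32 - 20 * 2)::numeric) * 1024 * 1024 * 1024 / 200)::bigint
           ) AS rough_per_backend_budget;  -- roughly 287 MB per backend, before
                                           -- considering multiple work_mem allocations per query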
So let's discuss the approach I kind of developed while writing this how-to. First, my favorite rule is the Pareto principle, the 80/20 rule. So we take, for example, PGTune — pgtune.leopard.in.ua, I always forget the name — which is a very simple, heuristic-based tuning tool. It's quite good, good enough in many cases. You just use it and it will give you some value for work_mem, a quite safe value. Again, the main thing with work_mem is to check your max_connections, because we don't want to run out of memory — this is the main thing to be careful with. So, okay, it will give you some rough value, and I think: let's go with this value, that's it.
Then we run it for some time in production, and the second important step is to have very good monitoring. Monitoring can provide you some very useful insights about temporary file creation. When work_mem is not enough, it doesn't mean Postgres cannot execute the query. Postgres will execute your query, but it will involve temporary file creation, meaning it will spill to disk instead of using memory, right? And this is, of course, very slow. It will slow down query execution a lot, but it will eventually finish, unless statement_timeout is reached, right? So, okay, we applied this rough tuning and we started monitoring work_mem.
Oh, by the way, how to monitor these temporary files?
I see two sources of information about temporary file creation. The first, very high-level one, is pg_stat_database. For each database, you can see the number of temporary files already created and their total size — the columns temp_files and temp_bytes. So if your monitoring is good, or if you can extend it to include this, you will see the rate of temporary file creation and also the size. Size is also interesting — we can talk about the average size, or maybe the maximum size per file. Well, we can probably play with this data more, but it's only two numbers, right? The number of files and the number of bytes. It's not a lot — we cannot get a p95 here, for example, right?
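For illustration, a query along these lines pulls those two counters per database (the average is just bytes divided by files):

    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_bytes,
           pg_size_pretty(temp_bytes / greatest(temp_files, 1)) AS avg_temp_file_size
      FROM pg_stat_database
     WHERE datname IS NOT NULL
     ORDER BY temp_bytes DESC;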
The next, more detailed source of information is the Postgres logs. If we adjust the log_temp_files setting, we can have details about every occurrence of temporary file creation in the Postgres logs. Of course, we need to be careful with the observer effect, because if we set it, for example, to zero while our work_mem is very small and a lot of queries need to create temporary files, then not only will the temporary files slow us down, but the logging itself will also produce a lot of overhead. The observer effect can be bad here, so we should probably be careful and not set it to zero immediately, but start with some safe value first, then go down a little bit and see. But eventually, if we know that temp files are not created often, we can go all the way to zero. Again, you should be careful.
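If you go this route, a cautious sketch could look like this — the threshold is illustrative; zero means log every temporary file, -1 disables the logging:

    ALTER SYSTEM SET log_temp_files = '100MB';  -- start conservatively: log only files of 100 MB or more
    SELECT pg_reload_conf();
    -- later, once the log volume looks acceptable:
    ALTER SYSTEM SET log_temp_files = 0;        -- log every temporary file created
    SELECT pg_reload_conf();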
And finally, the third source of important monitoring data here is pg_stat_statements. It has a couple of columns, temp_blks_read and temp_blks_written. So for each normalized query — I call it a query group — we can see, same as at the database level... oh no, not the same. We don't have the number of files here; instead, we have blocks read and written, and the written blocks are the interesting part. The good thing is that we can identify parts of our whole workload and understand which queries are the most active in terms of temporary file creation — that means they need more work_mem, right? They lack work_mem.
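Assuming the pg_stat_statements extension is installed, a query like this shows which query groups write the most temporary data (8 kB is the default block size):

    SELECT queryid,
           left(query, 60)  AS query_sample,
           calls,
           temp_blks_written,
           pg_size_pretty(temp_blks_written * 8192::bigint) AS temp_written
      FROM pg_stat_statements
     ORDER BY temp_blks_written DESC
     LIMIT 20;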
So once we build our monitoring — or maybe we already have it; I'm not sure everyone has a very good one, as usual I'm quite skeptical about the current state of Postgres monitoring in general — but assuming we have this covered in our monitoring tools, and we have some details in the logs, the next thing, of course, is that we can identify parts of the workload and think about optimization first. Instead of raising work_mem, we can have the idea: let's try to reduce, let's be less hungry for work_mem, right? Let's reduce the memory usage. Sometimes it's quite straightforward, sometimes it's tricky. Again, here I recommend using the Pareto principle and not spending too much effort on this optimization. We just try; if it takes too much time, too much effort, we just proceed to the next step.
The next step is raising work_mem. Monitoring can already suggest what the average temporary file size is and what the maximum temporary file size is, and from that information we can understand how much we need to raise it. Of course, jumping straight to the new value may be risky — sometimes I see people do it. I mean, we know our max_connections value, and we know that each statement can consume up to work_mem multiple times because of multiple operations.
Also, since Postgres 13 there is a new setting — I always forget the name, hash_mem_multiplier — which defines a multiplier for hash operations. As I remember, by default it's two, meaning that you have work_mem, but hash operations can use up to two work_mems, which adds complexity to the logic and the tuning, right? And again, it makes it even trickier to tune.
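A quick way to check both knobs on Postgres 13 and newer — just an illustration:

    SHOW work_mem;             -- per sort/hash operation limit
    SHOW hash_mem_multiplier;  -- hash-based operations may use work_mem * this value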
So if you want to be on the safe side, you take the available memory, you take your max_connections, and you add some multiplier, like two, three, maybe four. But usually this leads to a very low work_mem. This is why we take the iterative approach, understanding that our workload won't suddenly change as a whole tomorrow. Usually, in an existing project, we understand: okay, the realistic memory consumption is this, so we are fine — we can start raising work_mem.
And if you apply the formula, you may see: oh, we have risks of out-of-memory — but no, we know our workload, right? Of course, if we release changes to the application, the workload can change as well, so we should be careful with it. Especially, we should be careful about raising max_connections after this tuning of work_mem, because this can lead to higher out-of-memory risks.
So instead of raising it globally, I recommend thinking about raising it locally. For example, you can say: I want to raise it for a specific session, because I know this is a heavy report, it needs more memory, and I want to avoid temporary files. I just set work_mem to a higher value in this session, and that's it. Other sessions still use the global setting of work_mem.
We can even set a local work_mem in a transaction, so when the transaction finishes, work_mem is reset within the same session. Or we can identify some parts of the workload — and it's good practice to split the workload by users. For example, we have a special user that runs heavier queries, analytical-like queries maybe, and we know this user needs higher work_mem, so we can ALTER USER ... SET work_mem.
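Here are the three local options sketched out — reporting_user and the 256 MB value are made up for the example:

    SET work_mem = '256MB';                -- current session only
    BEGIN;
    SET LOCAL work_mem = '256MB';          -- resets automatically at COMMIT/ROLLBACK
    -- ... heavy reporting query here ...
    COMMIT;
    ALTER ROLE reporting_user SET work_mem = '256MB';  -- new sessions of this (hypothetical) role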
And this is also good practice, because it avoids raising it globally. But of course, this makes the logic more complex, and we need to document it properly. So if we have a bigger team and we need to think about other people dealing with it, of course, this needs proper documentation. A setting doesn't have a comment, unlike database objects.
So maybe — by the way, I just realized — maybe it's a good idea to have some commenting capabilities in Postgres for configuration settings, right? So anyway, as a final step, of course, we consider raising it globally. And we do it all the time — I mean, we see max_connections quite high, and we still raise work_mem, so that if you multiply max_connections by work_mem, you see that you already exceed the available memory. But this is tricky — I mean, this is risky, of course. But if we have observed our workload for a long time and we know we don't change everything drastically, we change only parts of the workload, sometimes it's okay. But of course, we understand there are risks here, right? So raising work_mem globally is kind of risky and should be done with an understanding of the details I just described. Okay, I think maybe that's it.
Oh, there is also, since Postgres 14, a function, pg_get_backend_memory_contexts. It's very useful. I mean, I don't use it myself yet because it's quite new — Postgres 14 is only a couple of years old. But there is a drawback: it can be applied only to the current session. So this is only for troubleshooting, detailed troubleshooting. If you deal with some queries, you can see what's happening with memory for a particular session.
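For example, something like this (Postgres 14+, current backend only — a sketch of how it can be queried):

    SELECT name, parent,
           pg_size_pretty(total_bytes) AS total,
           pg_size_pretty(used_bytes)  AS used
      FROM pg_get_backend_memory_contexts()
     ORDER BY total_bytes DESC
     LIMIT 10;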
I saw discussions about extending this function to make it usable for other backends, for any session. And when I was preparing my how-to — these days I use our new AI bot — of course, it hallucinated, thinking: oh, you can just pass a PID to it. No, no, it doesn't have any parameters; you cannot pass anything to it. I would expect that too, so I would probably hallucinate as well, but the reality is that it supports only the current session, that's it. Maybe in the future it will be extended, but as I understand, that discussion hasn't led to accepted patches yet. But anyway, this is an additional extra.
I think what I just described is already quite practical.
Just remember that any session — any query — can use multiple work_mems, but usually it's not so. So the approach based on temporary files is the way to go these days: just monitor temporary files.
It's not a big deal if we have a few of them happening sometimes, especially for analytical queries — they are probably slow anyway, and with temporary files we can check how much we can win if we raise work_mem. But for OLTP, of course, you want to avoid temporary file creation. And by default — I forgot to mention — work_mem is just four megabytes, which is quite low these days. In practice, for mobile and web apps on bigger servers with hundreds of gigabytes of RAM, we usually raise it to 100 megabytes. With a few hundred max_connections and a connection pooler, we usually tend to have around 100 megabytes of work_mem — maybe even more sometimes; again, it depends.
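When a global raise is finally justified, it can be as simple as this — 100 MB is the kind of value mentioned above, not a universal recommendation:

    ALTER SYSTEM SET work_mem = '100MB';
    SELECT pg_reload_conf();   -- work_mem changes do not require a restart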
I think that's it. So, hello, chat — I see several people have joined, thank you for joining. Honestly, I record live just because this is more convenient for me, but this is a podcast anyway, and it will be distributed as usual.
I want to again say thank you for being
a listener. Happy holidays. And I hope we will have very good topics in new year. And
I hope every Postgres production server is up and running with very good uptime
and with as few failovers as possible
and with as few temporary files as possible as well.
So this is my wish for you in the new year
and thank you for listening and watching.
Bye-bye.