Postgres FM - work_mem
Episode Date: December 31, 2023
Happy holidays! Today's topic is work_mem — how it's used and how to tune it. Related post: https://x.com/samokhvalov/status/1740813478150189172?s=46&t=HSxrW86NZqzPQP9vXVJEWg. ...
Transcript
Hello, hello, this is PostgresFM episode number 78, live, because I'm alone today again.
Michael is on vacation for the holidays, but I cannot allow us to miss any weeks, because we have been doing this for a year and a half already.
Episode 78, so no weeks have been missed, and this week we'll do it as well, but it will be a small episode. First of all, happy holidays, everyone. It's December 28 or 29, so it's holiday season, and let's have a small episode about work_mem. Somebody asked me to cover this topic.
And I actually wrote a how-to.
I just haven't published it yet.
I'm in my Postgres Marathon series where I publish how-tos every day.
Almost.
I'm lagging a little bit as well because of the holiday season.
But I'm going to catch up.
So, work_mem. work_mem is something that everyone uses, right?
We all run queries; we need to use work_mem.
But this is a super basic setting, because everyone needs it, and it's also super tricky, because every statement can utilize less than work_mem — it's a limit, it defines an upper bound. If we need less, we use less. It's not a pre-allocated amount of memory like shared_buffers, the size of the buffer pool. So any query can use less. But the trickiest part is that one statement can use this amount multiple times — each allocation may be up to work_mem, but there can be several of them, because this is the amount of memory used for hashing and sorting operations. So if we have, for example, multiple hashing operations inside one query — for example, multiple hash joins — it can be used multiple times, and this adds a kind of unpredictability.
It's hard to define a good algorithm to tune it precisely, because we don't know, right? For example, we have some amount of memory and we know our buffer pool size. That one is simple — it's another topic, but you define it once and you cannot change it without a restart. So we apply some rule — most people use 25% — okay, we allocate 25% for the buffer pool. What's left we can use, and the operating system can also use it for its page cache.
And we should not forget that there is also maintenance_work_mem, which is more predictable, but has some trickiness as well because of vacuum. There is autovacuum_work_mem, which is -1 by default, meaning maintenance_work_mem will be used, and we have multiple workers for autovacuum. So if we set it to, for example, two gigabytes — I see people do that — it's quite a lot. You need to ensure that, for example, when you create an index, two gigabytes is indeed helpful. But we can have multiple autovacuum workers, and I usually advocate raising the number of autovacuum workers a lot, so we can have many of them — say 10 if you have a lot of CPUs, or maybe even 20. It means that if you set maintenance_work_mem to two gigabytes, and autovacuum_work_mem is -1, which means it inherits from maintenance_work_mem, autovacuum alone can use up to 20 gigabytes. It also depends, because it won't use all of that for everything, but anyway, we need to subtract these 20 gigabytes
from the remaining memory.
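For reference, a quick way to inspect these related settings — just a sketch; autovacuum_work_mem = -1 means the maintenance_work_mem value is used:

    SHOW maintenance_work_mem;
    SHOW autovacuum_work_mem;     -- -1 by default: falls back to maintenance_work_mem
    SHOW autovacuum_max_workers;  -- each worker can use up to the value above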
We also have some overhead for additional processes. And then, with what's left, we can just say: okay, we have max_connections, say 200, so we divide the remaining memory by 200, and this is roughly what each backend can use. But we don't know how many times backends will use work_mem, right?
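Just to make the arithmetic concrete, here is a back-of-the-envelope sketch with made-up numbers — 128 GB of RAM, 25% for the buffer pool, 20 autovacuum workers at 2 GB each, max_connections = 200 — all purely illustrative:

    -- ~ (RAM - buffer pool - autovacuum reserve) / max_connections
    SELECT pg_size_pretty(
             (((128 - 32 - 20 * 2)::numeric) * 1024 * 1024 * 1024 / 200)::bigint
           ) AS rough_per_backend_budget;  -- roughly 287 MB per backend, before
                                           -- considering multiple work_mem allocations per query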
So let's discuss the approach I kind of developed while writing this how-to. First, my favorite rule is the Pareto principle, the 80/20 rule. So we take, for example, PGTune — pgtune.leopard.in.ua, I always forget the name — which is a very simple, heuristic-based tuning tool. It's quite good, good enough in many cases. You just use it and it will give you some value for work_mem, a quite safe value. Again, the main thing with work_mem is to check your max_connections, because we don't want to run out of memory — this is the main thing to be careful with. So, okay, it will give you some rough value, and I think: let's go with this value, that's it.
Then we run it for some time in production, and the second important step is to have very good monitoring. Monitoring can provide you some very useful insights about temporary file creation. When work_mem is not enough, it doesn't mean Postgres cannot execute the query. Postgres will execute your query, but it will involve temporary file creation, meaning it will spill to disk instead of using memory, right? And this is, of course, very slow. It will slow down query execution a lot, but it will eventually finish, unless statement_timeout is reached, right? So, okay, we applied this rough tuning and we started monitoring work_mem.
Oh, by the way, how to monitor these temporary files?
I see two sources of information about temporary file creation. The first, very high-level one, is pg_stat_database. For each database, you can see the number of temporary files already created and their total size — the columns temp_files and temp_bytes. So if your monitoring is good, or if you can extend it to include this, you will see the rate of temporary file creation and also the size. Size is also interesting — we can talk about the average size, or maybe the maximum size per file. Well, we can probably play with this data more, but it's only two numbers, right? The number of files and the number of bytes. It's not a lot — we cannot get a p95 here, for example, right?
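For illustration, a query along these lines pulls those two counters per database (the average is just bytes divided by files):

    SELECT datname,
           temp_files,
           pg_size_pretty(temp_bytes) AS temp_bytes,
           pg_size_pretty(temp_bytes / greatest(temp_files, 1)) AS avg_temp_file_size
      FROM pg_stat_database
     WHERE datname IS NOT NULL
     ORDER BY temp_bytes DESC;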
The next, more detailed source of information is the Postgres logs. If we adjust the log_temp_files setting, we can have details about every occurrence of temporary file creation in the Postgres logs. Of course, we need to be careful with the observer effect, because if we set it, for example, to zero while our work_mem is very small and a lot of queries need to create temporary files, then not only will the temporary files slow us down, but the logging itself will also produce a lot of overhead. The observer effect can be bad here, so we should probably be careful and not set it to zero immediately, but start with some safe value first, then go down a little bit and see. But eventually, if we know that temp files are not created often, we can go all the way to zero. Again, you should be careful.
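If you go this route, a cautious sketch could look like this — the threshold is illustrative; zero means log every temporary file, -1 disables the logging:

    ALTER SYSTEM SET log_temp_files = '100MB';  -- start conservatively: log only files of 100 MB or more
    SELECT pg_reload_conf();
    -- later, once the log volume looks acceptable:
    ALTER SYSTEM SET log_temp_files = 0;        -- log every temporary file created
    SELECT pg_reload_conf();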
And finally, the third source of important monitoring data here is pg_stat_statements. It has a couple of columns, temp_blks_read and temp_blks_written. So for each normalized query — I call it a query group — we can see, same as at the database level... oh no, not the same. We don't have the number of files here; instead, we have blocks read and written, and the written blocks are the interesting part. The good thing is that we can identify parts of our whole workload and understand which queries are the most active in terms of temporary file creation — that means they need more work_mem, right? They lack work_mem.
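Assuming the pg_stat_statements extension is installed, a query like this shows which query groups write the most temporary data (8 kB is the default block size):

    SELECT queryid,
           left(query, 60)  AS query_sample,
           calls,
           temp_blks_written,
           pg_size_pretty(temp_blks_written * 8192::bigint) AS temp_written
      FROM pg_stat_statements
     ORDER BY temp_blks_written DESC
     LIMIT 20;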
So once we build our monitoring — or maybe we already have it; I'm not sure everyone has a very good one, as usual I'm quite skeptical about the current state of Postgres monitoring in general — but assuming we have this covered in our monitoring tools, and we have some details in the logs, the next thing, of course, is that we can identify parts of the workload and think about optimization first. Instead of raising work_mem, we can have the idea: let's try to reduce, let's be less hungry for work_mem, right? Let's reduce the memory usage. Sometimes it's quite straightforward, sometimes it's tricky. Again, here I recommend using the Pareto principle and not spending too much effort on this optimization. We just try; if it takes too much time, too much effort, we just proceed to the next step.
The next step is raising work_mem. Monitoring can already suggest what the average temporary file size is and what the maximum temporary file size is, and from that information we can understand how much we need to raise it. Of course, jumping straight to the new value may be risky — sometimes I see people do it. I mean, we know our max_connections value, and we know that each statement can consume up to work_mem multiple times because of multiple operations.
Also, since Postgres 13 there is a new setting — I always forget the name, hash_mem_multiplier — which defines a multiplier for hash operations. As I remember, by default it's two, meaning that you have work_mem, but hash operations can use up to two work_mems, which adds complexity to the logic and the tuning, right? And again, it makes it even trickier to tune.
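A quick way to check both knobs on Postgres 13 and newer — just an illustration:

    SHOW work_mem;             -- per sort/hash operation limit
    SHOW hash_mem_multiplier;  -- hash-based operations may use work_mem * this value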
So if you want to be on the safe side, you take the available memory, you take your max_connections, and you add some multiplier, like two, three, maybe four. But usually this leads to a very low work_mem. This is why we take the iterative approach, understanding that our workload won't suddenly change as a whole tomorrow. Usually, in an existing project, we understand: okay, the realistic memory consumption is this, so we are fine — we can start raising work_mem.
And if you apply the formula, you may see: oh, we have risks of out-of-memory — but no, we know our workload, right? Of course, if we release changes to the application, the workload can change as well, so we should be careful with it. Especially, we should be careful about raising max_connections after this tuning of work_mem, because this can lead to higher out-of-memory risks.
So instead of raising it globally, I recommend thinking about raising it locally. For example, you can say: I want to raise it for a specific session, because I know this is a heavy report, it needs more memory, and I want to avoid temporary files. I just set work_mem to a higher value in this session, and that's it. Other sessions still use the global setting of work_mem.
We can even set a local work_mem in a transaction, so when the transaction finishes, work_mem is reset within the same session. Or we can identify some parts of the workload — and it's good practice to split the workload by users. For example, we have a special user that runs heavier queries, analytical-like queries maybe, and we know this user needs higher work_mem, so we can ALTER USER ... SET work_mem.
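Here are the three local options sketched out — reporting_user and the 256 MB value are made up for the example:

    SET work_mem = '256MB';                -- current session only
    BEGIN;
    SET LOCAL work_mem = '256MB';          -- resets automatically at COMMIT/ROLLBACK
    -- ... heavy reporting query here ...
    COMMIT;
    ALTER ROLE reporting_user SET work_mem = '256MB';  -- new sessions of this (hypothetical) role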
And this is also good practice, because it avoids raising it globally. But of course, this makes the logic more complex, and we need to document it properly. So if we have a bigger team and we need to think about other people dealing with it, of course, this needs proper documentation. A setting doesn't have a comment, unlike database objects.
So maybe — by the way, I just realized — maybe it's a good idea to have some commenting capabilities in Postgres for configuration settings, right? So anyway, as a final step, of course, we consider raising it globally. And we do it all the time — I mean, we see max_connections quite high, and we still raise work_mem, so that if you multiply max_connections by work_mem, you see that you already exceed the available memory. But this is tricky — I mean, this is risky, of course. But if we have observed our workload for a long time and we know we don't change everything drastically, we change only parts of the workload, sometimes it's okay. But of course, we understand there are risks here, right? So raising work_mem globally is kind of risky and should be done with an understanding of the details I just described. Okay, I think maybe that's it.
Oh, there is also, since Postgres 14, a function, pg_get_backend_memory_contexts. It's very useful. I mean, I don't use it myself yet because it's quite new — Postgres 14 is only a couple of years old. But there is a drawback: it can be applied only to the current session. So this is only for troubleshooting, detailed troubleshooting. If you deal with some queries, you can see what's happening with memory for a particular session.
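For example, something like this (Postgres 14+, current backend only — a sketch of how it can be queried):

    SELECT name, parent,
           pg_size_pretty(total_bytes) AS total,
           pg_size_pretty(used_bytes)  AS used
      FROM pg_get_backend_memory_contexts()
     ORDER BY total_bytes DESC
     LIMIT 10;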
I saw discussions about extending this function to make it usable for other backends, for any session. And when I was preparing my how-to — these days I use our new AI bot — of course, it hallucinated, thinking: oh, you can just pass a PID to it. No, no, it doesn't have any parameters; you cannot pass anything to it. I would expect that too, so I would probably hallucinate as well, but the reality is that it supports only the current session, that's it. Maybe in the future it will be extended, but as I understand, that discussion hasn't led to accepted patches yet. But anyway, this is an additional extra.
I think what I just described is already quite practical.
Just remember that any session — any query — can use multiple work_mems, but usually it's not so. So the approach based on temporary files is the way to go these days: just monitor temporary files.
It's not a big deal if we have a few of them happening sometimes, especially for analytical queries — they are probably slow anyway, and with temporary files we can check how much we can win if we raise work_mem. But for OLTP, of course, you want to avoid temporary file creation. And by default — I forgot to mention — work_mem is just four megabytes, which is quite low these days. In practice, for mobile and web apps on bigger servers with hundreds of gigabytes of RAM, we usually raise it to 100 megabytes. With a few hundred max_connections and a connection pooler, we usually tend to have around 100 megabytes of work_mem — maybe even more sometimes; again, it depends.
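When a global raise is finally justified, it can be as simple as this — 100 MB is the kind of value mentioned above, not a universal recommendation:

    ALTER SYSTEM SET work_mem = '100MB';
    SELECT pg_reload_conf();   -- work_mem changes do not require a restart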
I think that's it. So, hello, chat — I see several people have joined, thank you for joining. Honestly, I record live just because this is more convenient for me, but this is a podcast anyway, and it will be distributed as usual.
I want to again say thank you for being
a listener. Happy holidays. And I hope we will have very good topics in new year. And
I hope every Postgres production server is up and running with very good uptime
and with as few failovers as possible
and with as few temporary files as possible as well.
So this is my wish for you in the new year
and thank you for listening and watching.
Bye-bye.