Postgres FM - Parallelism
Episode Date: May 26, 2023

Nikolay and Michael discuss parallelism — both parallel query execution and other parallel operations.

Here are links to a few things we mentioned:

Parallel query (docs)
Parallelism in PostgreSQL 11 (talk by Thomas Munro)
Parallelism in PostgreSQL 15 (talk by Thomas Munro)
Towards Millions TPS (blog post by Alexander Korotkov)
Memory resource consumption (docs)
Our episode about index maintenance
Our episode about partitioning
Patch to make postgres_fdw parallel-safe (by Swarm64)
PostgreSQL Parallelism Do's and Don'ts (talk by Sebastian Dressler)
Increasing max_parallel_workers_per_gather (blog post by Michael)

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we going to talk about today?
About parallelism. Parallelism.
Are you trying to say it multiple times at the same time?
I did it sequentially.
Nice.
Do you know that parallelism works well with sequential operations? This is where it shines: if something needs to be done sequentially, then parallelism can be applied.
Yeah, it makes sense. I guess it's like an oxymoron, right? I see what you mean, like a dichotomy. It almost feels like it shouldn't be the case, but it very much is.
It's not my thought, by the way. It's from Thomas Munro, who contributed a lot to the parallelism of various operations.
This is one of those topics where we have had multiple requests, not in parallel, I guess.
But yeah, so people are keen to know how parallel query works specifically, some pros and cons.
I think talking about parallelism overall makes sense as well, like a bit of an overview. So I'm glad you added those bits in. So yeah, looking forward to this one. Do you want to start with what it is or a bit of history?
Yeah, let's start with history. I must admit, I'm not a big expert on this topic. Of course, I know many things, but I'm far from being an expert. And in some places I don't see the bottom, if you get what I mean. Like, in some places I see I need to dive deeper to see the bottom.
Well, I think from previous conversations we've had that sometimes it's a sign that you are an expert, if you realize quite how much there is that you don't know still.
But this is one of the topics where I feel my weakness, so, like, disclaimer.
Sure, yeah. And I've only looked into this from... like, I've obviously done a bit of reading for this episode, but previously only from looking into performance of queries in general and having to read around things, having to write some things for it as well. So, yeah, mine is only surface level too, compared to people that have written this.
But I think a lot of people don't even necessarily realize that Postgres can run queries in parallel,
that it can do certain operations.
It's not just can, it does all the time.
And usually my experience is closer to OLTP.
And it makes sense because Postgres is the best OLTP system.
But some people, of course, do some heavy analytical kinds of queries. And, of course, I remember how parallelism first appeared in 9.6, in 2016. Easy to remember: 9.6, 2016, right? But you need to remember that 9.7 became 10, so 2017. But it was not enabled by default in 2016, in 9.6.
Correct.
And I remember there were fears that in an OLTP context, it's not a good idea to have it, because it's more for analytical queries. Let's enable it dynamically for particular queries only, because of course it has some overhead.
Yep.
So for OLTP, we want only single-core operations. But eventually, right now, we live with it enabled by default, and we see benefits and so on.
Of course, sometimes the application produces not optimal, suboptimal queries, sometimes even having sequential scans or heavy index scans. And in this case, parallelism works well, improving the speed and so on. Unlike, for example, just-in-time compilation, JIT, which we usually recommend disabling for OLTP, this we usually recommend enabling.
And it's interesting. I think it's a similar conversation, and the difference is that the defaults are better for the parallelism settings. With the JIT one, basically, I don't think the cost estimations are quite right. I think maybe we'd be able to leave that one on more if it had a higher cost threshold, or had some costing improvements. I think the parallelism costings are a bit better, and it tends not to cause as many issues.
What about planning time? Planning time is also affected, right?
Yeah. Good point. I'm not sure about that. I haven't seen many parallel queries that also explode planning time. So I don't think it's as bad as things like the number of joins. So, yeah, I haven't seen it cause excessive planning time issues. I don't know if you have.
The reason I'm talking about this is because we definitely now live in a world where JIT probably should be disabled for OLTP cases, but parallelism should be enabled, sometimes tuned. And we see it works well. Every release we have improvements, and it works well. And it makes total sense to me. By the way, we needed to mention that we are not going to talk about parallel execution on multiple nodes. Like, I don't know, like Greenplum, Citus, or all those systems. We talk only about single-node parallelism, inside one machine.
Yeah, core PostgreSQL.
This is important to distinguish.
Right. And also to frame what we have: we have it only for read operations, plus at a different level for some other operations like vacuum (only manual vacuum, explicit vacuum), and pg_restore and so on. But for queries, it's only for reads. We don't have it for writes still. For example, parallel COPY would make total sense, but it's still under development.
Yeah, we do have it for similar things like CREATE TABLE AS, CREATE INDEX, CREATE MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW.
Oh, those are also writes.
Yeah, we can consider them write queries also, but DDL queries. For DML queries, it's only for read-only, for reads.
Yeah.
So SELECT queries.
And this is, like, since 2016, first non-default, and the next year it already became default, in version 10, the former 9.7. Then it was developed better and better. Every year we have a bunch of improvements.
Yeah, I think it's interesting, like, a couple of things about those first few versions. I think it was really interesting when 9.6 was released. It was very limited in scope, but still useful for people. I think it only supported sequential scans and a couple of join operations. And, as you say, it was off by default, and that was controlled by a very important setting, if we're going to be talking about this: max_parallel_workers_per_gather was set to zero. So that meant each query couldn't use any extra workers.
And then in version 10, a couple of big improvements. One was that more scan types were added. So index scans, index-only scans as well for B-tree indexes, and bitmap heap scans as well. The heap part of those could be parallelized. Huge, huge improvements. And max_parallel_workers_per_gather was bumped to two, which meant we could get three processes by default on a query that supported parallelism. So on one that suited it, we could get up to three cores working on our query at the same time.
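As a quick illustration, these are the stock parameters being discussed; you can inspect and change the per-query worker limit like this (the value 4 is just an example):

```sql
-- Check the current parallelism-related settings
SHOW max_parallel_workers_per_gather;  -- default 2 since version 10
SHOW max_parallel_workers;             -- global cap on parallel workers
SHOW max_worker_processes;             -- changing this requires a restart

-- Allow up to 4 extra workers per Gather node, for this session only
SET max_parallel_workers_per_gather = 4;
```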
Right. And let's maybe mention in this context: I remember in 2016-17, we played with some SELECTs, trying to understand, is it worth having it enabled by default, even in 9.6? And I remember examples where we saw very clearly that the overhead brings some penalty, so it's better not to enable it. But then somehow I lost traction, and now we already live in a world where it's enabled everywhere, and we see it's beneficial.
And if we check examples, I think it's very doable to find examples where you disable parallelism for some query, and you see that it's better.
In terms of what, by the way? Time and buffers, maybe, right?
Buffers... well, eventually time. Time is the most important for the end user. The user might care about buffers, but only those which produce the final result which goes to the user, because if it's like a terabyte to download, it's not good. But buffers here are also important, and overhead, I think, exists in terms of both. So it's good to check both when you analyze and make decisions based on that. But let's maybe mention why parallelism is important at all, philosophically and theoretically and so on.
I think I know where you're going with this, but I have some other... Like, yeah, let's do the obvious ones first. We have a lot of CPUs.
This is, yeah, a growing number. Like, in the modern world it's quite common to have a lot of CPUs available, because people often scale up their database size. And especially in the cloud, that comes with lots of CPU.
The key problem is that CPU development, processor development, went in a direction where one core cannot be improved as it was before. Like, we had this, how is it called, the law...
Moore's law, I think.
Right, right, right. So obviously the makers of CPUs, Intel, AMD, they decided to bet on the multiple-core approach, and we now easily have hundreds of cores in the cloud. It's relatively cheap, especially on some kind of spot instances you can play with. And this means that the application needs to be adjusted,
because you cannot just say, okay, now we want to benefit from having multiple cores. Unfortunately, the application needs to be changed to benefit from it. It cannot be done transparently in most cases. Sometimes it can be done in some trivial cases, when the work can be parallelized somehow. Well, again, it's a different topic, but some algorithms can be parallelized. I remember even some languages exist for this, like extensions to C and so on. It's from my university 20 years ago; I've already forgotten everything.
But obviously, to make Postgres benefit from multiple cores, it needed to be heavily reworked. By the way, I remember also in 9.6, a lot was done to reduce contention in shared buffers. It's also important. So a lot of areas needed to be improved to benefit from multiple cores. And of course, if you have a few users but many more cores (you have only 10 sessions, but you have 96 cores, okay, and three autovacuum workers by default), it sounds like most of the cores will be underutilized, and it's not good. This is the key: the development of modern hardware dictates how to build software.
Yeah, there's an opportunity there, right? Like, especially for things that are very easy to parallelize. By giving it access to three times the workers, we're not going to get a three times improvement, because, as you said, there is some overhead, but we could get close. Like, it's not uncommon, especially once you bump that number up, to get close to the number of workers times the benefit. So there's huge potential. Not as much as in indexing, you know. Whenever you see a blog post that says "how we sped up our database 1,000x in one easy step," that's never going to be parallelism.
Unless you switch from one CPU to a thousand CPUs and scale linearly, which is very... theoretically, it can be imagined already.
But an interesting example: if you dump pgbench... You have pgbench tables, you do some benchmark, you create a one-terabyte database, and obviously all of that one terabyte will go to the pgbench_accounts table. The other tables are initially empty. And then you say, okay, I'm going to test the dash-j option of pg_dump, because pg_dump allows you to benefit from multiple cores. And you say, okay, I have 96 cores, so I expect my pg_dump... okay, not 100x, but at least 20x, I will already be almost happy. And you start dumping and see no benefits at all. Why? Because dash-j can only help when you have many tables. If your whole database is a single table and the other tables are empty, pg_dump cannot parallelize it.
So is that... I thought that was, like, a practical limitation, because the work hasn't been done. Or is it a theoretical limitation? Is there some reason why it couldn't...?
It's just how parallelization of pg_dump is implemented. It works per table: it cannot parallelize a single table unless it's partitioned.
Sure, okay, great.
Yeah, it makes sense. But it's not really one table anymore. But yeah, it's a nice link back to that.
And a couple of years ago, a special option for pgbench was added. So now, when you create a pgbench database with dash-i, initialize, you can say: I want to have a partitioned pgbench_accounts. And then you can speed up your dump. And this is an interesting example showing that there are many places where parallelization can be tricky.
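A rough sketch of that experiment (the scale, partition count, job count, and paths are made-up values for illustration):

```bash
# Initialize a pgbench database with pgbench_accounts split into 32 partitions
pgbench -i --scale=10000 --partitions=32 --partition-method=range bench_db

# Now a parallel dump can actually use multiple jobs, because each
# partition is a separate table for pg_dump; -j requires the
# directory output format (-Fd)
pg_dump -j 16 -Fd -f /path/to/dump_dir bench_db
```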
While we're talking about the parallel query side of it, I think it's obviously worth giving a shout-out to the people that did this. It's a huge amount of work to get it in at all and make it safe, make it performant. Then it did improve a lot in versions 10 and 11, and it kept improving. There were kind of smaller improvements and performance improvements in 13, 14, even 15 recently. So it's something that went in and has continued to get better, which is normally a sign that people are using it and want it to be better. I'd say another reason it got added, this is like the same reason but a second-order effect, was that this was around the time, I think, that people were starting to think about alternatives to Oracle and SQL Server. So I think a lot of people migrating were probably struggling. This is a theory based on who was working on the feature and, like, which consultancies they were working for. But I don't think this was a passion project for people who were just excited about working on this.
Yeah, there's demand, there's real need.
Exactly, right. And let me finish with the reasons dictated by hardware. It's not only about CPU. I see in many articles, people mention CPU: like, we have many cores, let's do it. But it's not only that.
Ah, and of course, very important that Postgres is still process-based,
not thread-based.
It's also important, and this is how Postgres works.
So for parallelism, it needs to run multiple processes
and then find a way to communicate between them.
But I want to highlight, to emphasize, a very important point. It's not only about CPUs — of course it's about CPUs and multiple cores and so on — but it's also about disks. This is often overlooked. When people explain why we need parallelism, they forget to mention that now we are on SSDs, and to get the maximum out of an SSD, you need multiple threads, I mean multiple processes. You cannot get the full speed of an SSD if you read in a single thread. You need like 16 threads. To saturate disk I/O, you need multiple. For example, fio, a very good tool to benchmark disks: if you use a single-threaded benchmark, you will never reach saturation. You need like 8, 16 threads. And this means that to go at full speed, especially if you have heavy queries, you definitely need multiple threads or multiple processes and communication between the processes. So disks, in my opinion, are very important.
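For instance, a minimal fio sketch of that comparison (the file path and sizes are placeholders; the flags are standard fio options):

```bash
# Single-threaded random read: will typically not saturate a modern SSD
fio --name=single --filename=/path/to/testfile --size=10G \
    --rw=randread --bs=8k --direct=1 --numjobs=1 \
    --time_based --runtime=30s --group_reporting

# Same workload with 16 parallel jobs: much closer to the device limits
fio --name=parallel --filename=/path/to/testfile --size=10G \
    --rw=randread --bs=8k --direct=1 --numjobs=16 \
    --time_based --runtime=30s --group_reporting
```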
But couldn't... so here's a question. Couldn't that be an argument for having more connections, like allowing more users, for example? Rather than, like, it's not necessarily a...
It's both.
Okay, yeah.
Of course. Well, if you have a lot of small queries in parallel, of course, this is a different area. This is what I mentioned: in 9.6 there was a big improvement to reduce contention in how the buffer pool is organized. I remember great work, a blog post from Alexander Korotkov, towards 1 million TPS on a single node. It was interesting that it was done around the same time when MySQL did it as well.
We've mentioned it a few times, yeah.
Right, it's interesting. But it's not related to the processing of a single query. It's related to multi-user, multi-session processing. But parallelism for a single session is needed when, again, you have many cores and you have much fewer (and this is actually normal), much fewer active sessions right now. If you have more active sessions than cores, it's not a good position already. I mean, usually we have some room. If we don't have this room, if we have 96 cores and more than 96 active sessions, I suspect we're saturated already. It's CPU, most likely CPU. It's not a good position. But usually we have some room, so why not use some of that room to speed up one query and reduce contention as well? It will be executed faster, and we will have the average active session number going slightly down. It's good.
Yeah, I was going to ask, is this because... So there's a good note in the docs. I think, actually, the Postgres docs on parallelism are great. They go into far more detail than I was expecting when I first looked it up. I think they really do explain it well. A note that they make is that this isn't just about slow queries. This isn't just about querying lots of data. It's about querying lots of data and returning few of those rows. It's not as likely to make a big difference to your queries if you're having to return all the rows anyway, because the overhead of sending that data over the network is likely to dominate the time. But if we're doing aggregations...
It depends.
Well, sure, sure. But the big benefit here is in kind of aggregate queries, or heavy queries that are returning some summary of that information, or only returning a small...
Which normally is the only justification for doing a heavy query, right? If you're doing a heavy query and then not returning most of it, it's bad, right?
That's why I think buffers-first analysis of a single query matters here at all. A lot. You need to pay attention to buffers and see what's inside: how many bytes, how many pages were involved in query processing, and compare parallel, non-parallel, tuned, with this in mind.
That's a good point. You return just one row, like an average, an aggregate, but inside it you needed to process a lot of buffers. And this explains why it's so heavy. So I was a bit surprised when you said... like, I get the impression you don't turn off parallelism for some of your... like, let's say you've got a lot of OLTP. But would you, if you had the choice, do you think?
It depends. I would probably do some benchmarks. I would analyze the content of pg_stat_statements, create some, I call it, simulated workload or something; I've already forgotten what I call it, sorry, the word is out of my mind. So I would create some test set and just run multiple iterations, checking, even probably just in a single thread, as usual. Like, I do it sometimes in shared environments where I don't care about time a lot. I just see how much data, how many buffers were involved in processing without parallelism, with parallelism. It will give me an idea. Of course, it's not directly converted to time, but it will influence the resulting time the most. And here, of course, with parallelism, we can understand: reading a thousand buffers using a single process versus three processes, or four workers, or more. It's different, right? So we need to understand that timing will be affected by parallelism. But still, it's interesting to see the overhead and so on. So buffers analysis is important, right?
Yes, absolutely. In fact, there are a few tips. So when we're talking about this, we're probably talking about looking at explain plans. So we get information about parallelism through explain plans.
ANALYZE, BUFFERS, right?
Yes, but initially just EXPLAIN. You can see whether the planner is going to use it.
Exactly.
Not just whether it is going to use it, but how many workers it will request as well. So you can get some information about how much parallelism. It doesn't have to max out the option; it can choose anything between zero and, well, initially, max_parallel_workers_per_gather, for each Gather or Gather Merge. So yeah, there's that. But then, additionally, I wanted to say that VERBOSE is quite important here as well. If you want to see some of the per-process statistics, VERBOSE is quite a useful thing.
Yeah, good point.
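To make that concrete, a sketch of what this looks like (the table name and plan shape here are hypothetical):

```sql
-- Plan only: shows whether a parallel plan was chosen
-- and how many workers were planned
EXPLAIN SELECT count(*) FROM big_table;

-- Execution stats, buffer counts, and (with VERBOSE) per-worker details
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT count(*) FROM big_table;
-- Look for nodes like:
--   Gather (Workers Planned: 2, Workers Launched: 2)
--   Parallel Seq Scan on public.big_table
--   Worker 0: actual time=... Buffers: shared hit=...
```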
So let me finish my thought.
Probably, yes. Sorry.
Of course. Again, currently, if we forget about vacuum (we should discuss vacuum, CREATE INDEX, and dump/restores separately, which have parallelization, and amcheck, which we mentioned last time; let's talk about them separately), for normal DML queries, it's only about reads. So, for example, if we need to deal with 1,000 buffers originally fetched from the heap using a sequential scan, or from an index using an index scan, index-only scan, or also a bitmap scan, we understand that, okay, if we have three workers, will it be three times faster? And here it's interesting to see the overhead, and the BUFFERS option will also provide us some additional insight about the overhead of parallel versus non-parallel, and parallel with however many workers.
And when the planner chooses a plan, it only takes two kinds of input besides the query itself. First is the statistics for all tables involved. And second, the planner settings and work_mem. Planner settings: in pg_settings you have categories, and there is a special category for planner settings. There are some settings related to parallelism.
Yeah, actually, work_mem, that's a really good, important point here. Yeah, if you're going to change your parallelism settings, you also need to consider your work_mem setting at the same time. Like, why am I saying that? It's because most people realize that work_mem can be used multiple times in a query: if you have multiple hashes or multiple sort operations, work_mem can be used multiple times. That's why it's a scary parameter to set, and why people often go relatively conservative, but they want to bump it up so that operations can happen in memory. Once you introduce further parallelism, that multiplies. So each parallel operation that's using work_mem can use it the number of times that there are workers. So it multiplies quickly. So if you're going to bump that number up, it's something to be aware of. And it might be an argument for reducing it sometimes. Maybe you can get away with a higher work_mem setting for your entire workload if you reduce it. But, like work_mem, if you do have the odd really big query that you need to run, you can set some of these settings locally, right?
Oh, like at a session level?
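For instance, a sketch of scoping these settings to one heavy query (the values here are arbitrary, not recommendations):

```sql
-- Session-level: applies until the session ends or a RESET
SET work_mem = '256MB';
SET max_parallel_workers_per_gather = 8;

-- Or transaction-local, so nothing leaks outside this transaction
BEGIN;
SET LOCAL work_mem = '256MB';
SET LOCAL max_parallel_workers_per_gather = 8;
-- run the big analytical query here
COMMIT;
```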
So work_mem is a super interesting topic, and we need a couple of minutes to discuss it. But before that, let me mention what I understand about which read queries can be parallelized in Postgres. First of all, if you want to benefit the most, you need the most recent version, because each version had improvements.
Yes, makes sense.
Like 10, 11, 12, all of them had improvements. Second, the easiest are the sequential scan, the index scan, and the bitmap scan. They all can be parallel. And it's quite obvious because, well, an index is more complex, but it still can be parallelized. All good. We can benefit a lot. And if we expect only a few users using our 96-core, or maybe 244-core AMD EPYC Milan CPU, so many cores, of course... if we have this beast running Postgres 15 and only 10 users doing some analytical queries, we definitely need to reconsider and increase max_parallel_workers_per_gather. There is one setting, max_worker_processes, which requires a restart. All the others don't require a restart.
Max worker processes, yeah.
Right. It's usually increased also for logical replication. But in this case, if you expect heavy queries and only a few users on some beast of a machine, you need to increase it a lot. And this requires a restart. Everything else can be adjusted later, and dynamically in a session, to experiment and so on. These access method nodes in the plan, which are the leaves of our explain plan, are relatively straightforward. But then, interesting: aggregates, joins, right? These operations are interesting.
And in 10, merge join was supported. And later, a couple of years later, in 12 maybe, I don't remember exactly, hash join was supported as well. But they have some difficulties, right? And also, I see in the release notes of 16, which is already beta, right? Beta was released last week, I guess. Full outer join can also be parallelized, and so on. It's interesting. And if you look, for joins it's possible, for aggregates it's also possible, but things become much more difficult there. And now, memory. Memory, even without parallelism, is so tricky in Postgres. Memory management... what does a backend developer expect? I say, okay, I have 600 gigabytes.
25% of that goes to shared buffers, and also 200 gigabytes, probably, or maybe like 300 gigabytes, I'm ready to give to normal query processing. Everything else will be the second, underlying layer of caching, which is the page cache. I want some setting. I want to say: okay, 200 gigabytes can be used by Postgres. There is no such setting. And it's so hard to understand... Like, if we set up a new server, it's so hard to predict how much memory it will be using. It's super...
Well, I think a lot of people don't even realize what happens if it goes wrong. So we're talking out of memory. Everything shuts down. That's the failure case.
We can enable swapping, and some people advise enabling it, or enabling a small swap for protection. But in general, yes, it's not a good position. Either the server goes down, or you become very, very slow. So, a latency spike. You have a latency spike; it's not good. You definitely don't want the out-of-memory killer to be involved, or to be very slow. So you don't want to be out of memory. But how to properly plan it? Memory planning is a super difficult topic. Maybe we should allocate a special episode for it. But in a few words, you take work_mem. Of course, there is also maintenance_work_mem, and autovacuum_work_mem, which by default inherits from maintenance_work_mem. So you understand how many workers can build indexes (and this can be parallelized as well, which is good). Also, you know how many autovacuum workers you have: autovacuum_max_workers, which is by default three, not enough for modern hardware at all. So you allocate some autovacuum max workers; you increase it. So you can plan this part relatively easily. It's simple.
But normal query processing, DML processing, is super complicated. So you take work_mem, and then you should understand that work_mem is not what a single session can consume; it can consume multiple times work_mem, because it's how much memory is allowed to be consumed by ordering and join operations, hashing operations. Which becomes very complicated: okay, one query can have multiple joins, and so it can consume multiple times work_mem. For safety, you need to take work_mem and multiply, for example, by two or three. But in reality, it's never allocated as a whole. Usually it's some part of it. So you need to understand your workload in production to make proper decisions. And I see many servers are kind of overprovisioned, because they know that most of the queries are relatively light, so they don't reach even one work_mem for a single query. So they allow quite a lot of max connections. There are many reasons to try to reduce max_connections, especially for older Postgres. But in general, you need to take max_connections, multiply by work_mem, and also by two or three, because there is some uncertainty about how many megabytes can be used by a single session. But, of course, worth mentioning: the default work_mem, if I'm not mistaken, is 4 megabytes. It's for a teapot Postgres.
Yeah, it's too small.
Right. So 100 megabytes is good.
Well, yeah, for big ones.
But I think if I was setting defaults, I might try and argue for a bump to 16 at least.
I feel like there's almost no downside, even for tiny instances.
Depends on your resources.
I know it depends.
I know, I know.
We should probably go back to parallelism.
Right.
So I wanted to draw this picture really quickly, and then we add parallelism here. It means that our max_connections, which for example is 300 for our 96-core server... Okay. We have work_mem at 100 megabytes max, and for safety we multiply by two or three. It gives 100 megs times roughly 100. It's 10 gigs already, and max_connections is 300, I said. It's already too much, right?
Way too much.
So probably you should reduce work_mem. But in reality, you see it's never reached, so real memory consumption is good, so you try to increase it to avoid temporary files. But parallelism, what does it bring to this picture? It brings more complexity, because it says: okay, now each session, each session out of our 300 or 500 max connections, can have multiple parallel workers and consume even more work_mem, right? And for safety, you need to multiply by the expected number of parallel workers on average, which is also hard to predict without observing real production behavior. So you have two multipliers now.
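A back-of-the-envelope version of that estimate (all numbers are the illustrative values from the discussion, not recommendations):

```python
# Worst-case-ish memory budget for query processing:
# max_connections * work_mem * safety multiplier * avg parallel processes
# per session. This is a rough planning sketch, not an exact model: real
# sessions rarely allocate their full work_mem allowance.
def query_memory_budget_gb(max_connections, work_mem_mb,
                           safety_mult, avg_parallel_procs):
    total_mb = max_connections * work_mem_mb * safety_mult * avg_parallel_procs
    return total_mb / 1024

# 300 connections, work_mem = 100MB, x2 safety, no parallelism
no_parallel = query_memory_budget_gb(300, 100, 2, 1)

# Same, but assuming 3 processes per query on average
with_parallel = query_memory_budget_gb(300, 100, 2, 3)

print(round(no_parallel, 1), round(with_parallel, 1))
```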
Yeah, and that was my point. I think people are often kind of aware of the status quo, but if they change this setting, they might not think to change the other one as well.
So my main point is that (I do some consulting, my team does some consulting) we say: okay, if you want to be super safe, you need to use these multipliers, but you will end up having a lot of temporary files, unfortunately. Later, in iterations, you can adjust and try to get rid of temporary files, going maybe into theoretically unsafe space, but practically you see that your workload is good. I mean, you still have a lot of free memory, which is used by the page cache. So also, you don't need to steal all the gigabytes from the page cache. So this is the approach. This is the theoretically safe approach, but it's very wasteful in terms of memory, and it leads to latency overhead because of temporary files. So parallelism definitely adds complexity to this picture. Oh, there is also a multiplier for hashing.
Okay, yeah, hash_mem_multiplier. But if you're thinking about work_mem, hopefully you're thinking about that as well, because it's in the docs in the same place and stuff. So yeah, that's two now by default.
Yeah, another tip before we move off parallel query. If you have a query plan that's not parallel, or something you think should be, one thing to be aware of, for example, is that user-defined functions by default are not marked parallel safe. So that catches a lot of people out. So you need to explicitly mark them as parallel safe.
And they should actually be parallel safe as well, right?
If they are parallel safe, yeah.
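For example, a sketch (the function itself is a made-up placeholder):

```sql
-- Functions default to PARALLEL UNSAFE; a query calling this one
-- cannot use a parallel plan
CREATE FUNCTION add_tax(amount numeric) RETURNS numeric
    LANGUAGE sql IMMUTABLE
    AS $$ SELECT amount * 1.2 $$;

-- Mark it parallel safe only if it really is (no writes, no sequence
-- access, no changes to session state)
ALTER FUNCTION add_tax(numeric) PARALLEL SAFE;
```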
Also, maybe you're dealing with a very small table, a tiny table, a tiny index. There are two thresholds, one for tables and one for indexes, and below them parallelism cannot kick in. You can adjust them and go lower if you think it's worth it, but it depends, again.
Yeah, good point. So if you don't change any settings, you'll never see a parallel scan on a tiny table.
Oh, by the way, you know, CPU is one reason we discussed this: CPU development dictates software development. SSD disks also.
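The two thresholds just mentioned are min_parallel_table_scan_size and min_parallel_index_scan_size; a quick sketch:

```sql
-- Below these sizes the planner won't consider a parallel scan of a relation.
SHOW min_parallel_table_scan_size;  -- default 8MB
SHOW min_parallel_index_scan_size;  -- default 512kB
-- Lower them (here, per session) if you want parallelism on smaller relations:
SET min_parallel_table_scan_size = '1MB';
```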
But memory, we have some.
It's so affordable.
Hundreds of gigabytes now.
Well, it depends, of course. But if you're a startup which grows really quickly, you have some money, but you cannot be slow.
So you sometimes prefer just scaling vertically
and putting the whole database, or a major part of it, into memory.
It's possible, I mean.
In the cloud, it's one of the options we have.
And it's quick.
Of course, it requires some additional budgets.
And also, this is very closely connected to partitioning. Because when I say put the whole database into memory,
it might not be the whole database, but the working set.
And if you have tables partitioned,
you probably have some old partitions which are out of memory.
And you have fresh partitions which are in memory,
and you start benefiting from parallelism
because you don't touch disk and so on.
And work with memory, of course, can be well parallelized.
Unlike magnetic disks, right?
Is that a good transition onto things like parallelism for create index and vacuum?
Yeah, for DDL and other.
Yeah.
So for CREATE INDEX it's definitely a good thing,
because, well, recall the drama with Postgres 14.
It means that when you build,
and again, it's related to partitioning.
Everything is related to partitioning.
Everyone should partition tables which exceed 100 gigs.
So if you build an index over a large table,
and as we discussed in our episode about index maintenance,
we definitely want to rebuild indexes from time to time
because their health tends to degrade.
It degrades slowly in modern Postgres, like 14, 15,
but things are much worse in older Postgres versions.
So while you rebuild an index, the xmin horizon is held back,
autovacuum cannot delete freshly dead tuples,
and bloat accumulates eventually.
The attempt to improve this in Postgres 14 was reverted in 14.4.
So it means that you want your index build to go faster,
because if you move faster, fewer dead tuples are produced by concurrent write transactions in the meantime.
If you move faster... if you have room, of course,
if you have capacity in terms of CPUs and disk.
So you move faster, fewer dead tuples are generated, and less bloat is produced eventually
when autovacuum cleans those dead tuples. Makes sense, right? So you want to go faster, but of course you
don't want to saturate your IOPS limits, for example, or disk throughput, and so on. And of course, if you rebuild 10 indexes in parallel and each of them is also parallelized, having
more workers, so dozens of backends are working on index maintenance, you are
probably also going to saturate your CPUs, which is also not good.
So this needs careful planning. Probably during a weekend,
moving fast enough,
but not saturating anything.
Controlling the utilization of CPU and disk I/O
is the key here.
But it's a great feature,
because you can rebuild an index faster,
especially on a large table. It's good.
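A sketch of such a parallel index build; the table and index names are hypothetical:

```sql
-- Parallel btree builds are capped by max_parallel_maintenance_workers
-- (default 2); the planner picks the actual worker count from table size.
SET max_parallel_maintenance_workers = 4;  -- illustrative value
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders (created_at);
```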
Then what else?
Vacuum. Unfortunately, for vacuum,
parallelization works only for indexes and only for explicit VACUUM.
Autovacuum still cannot do it.
And maybe I missed something, but it's only for manually invoked vacuum runs.
But a table having multiple indexes is quite common,
and those indexes can be processed using additional workers, which is a really good thing.
I hope eventually vacuum will be parallelized even for a single table. But, you know, in
general, again, partitioning is important, because you can parallelize processing using multiple
workers if you deal with multiple partitions. Even without this feature for the explicit VACUUM command,
which applies only to specific tables and their indexes, you don't care:
we have many workers,
we have a lot of partitions,
we can process them using autovacuum, and that's it.
This is the way to parallelize in practice if you have a lot of cores and a lot of resources.
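A sketch of the explicit parallel vacuum being described (the table name is hypothetical):

```sql
-- PARALLEL applies to the index-processing phases only, and only for manual
-- VACUUM; autovacuum cannot use it. The worker count is also capped by
-- max_parallel_maintenance_workers and by the number of eligible indexes.
VACUUM (PARALLEL 4, VERBOSE) my_table;
```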
Yeah, and I think the defaults in these areas are still quite low for most people.
Like, in terms of...
OLTP settings, OLTP users.
Cool.
What about analyze, actually?
I didn't look it up.
Again, I think for a single table,
ANALYZE of a single table...
as I remember, it's not parallelized.
It might be, but the key here is, again,
that we usually rely on autovacuum,
so we need more workers.
My recommendation is to consider: you have a number of cores; take something between 10% and maybe 30-40% of them, and that is your autovacuum worker count. Because, if you think about it, writes just mark xmax and so on, and then we need to really process those dead tuples later.
And you need to allocate a significant amount of resources for that,
to keep up with your writes, right?
So you definitely need to increase autovacuum_max_workers.
Three is not enough if you have dozens of cores, or hundreds of cores.
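A sketch of this sizing heuristic for, say, a 32-core server; the numbers are illustrative, not recommendations:

```sql
-- 10-40% of 32 cores, per the heuristic above.
ALTER SYSTEM SET autovacuum_max_workers = 8;  -- default is 3; requires restart
-- All autovacuum workers share one cost limit, so consider raising it too,
-- otherwise each extra worker just gets throttled harder:
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
```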
So for ANALYZE, you either rely on your autovacuum workers.
And again, partitioning is good here, because analyzing large tables is expensive,
especially if you globally increase
default_statistics_target
like some people do; it's not good.
But there is also an important case
when you want explicit ANALYZE,
especially after pg_upgrade.
In this case, you run the vacuumdb CLI tool,
for example, with -j.
And you're going to have a problem if you have huge tables,
because -j again applies at the upper level:
it takes tables and runs ANALYZE on them.
So if you have partitioning,
you can say -j... After pg_upgrade, for example,
we have a maintenance window and we can use all cores, we can saturate
resources, because we want to minimize the duration of the maintenance window. With parallelization, with
vacuumdb -j, you take the number of cores and go full speed. But then you are limited again, similar to
my example with pg_dump, by the size of the largest table. That's why it's also important to partition,
because in this case physical tables
will be much
smaller and parallelization will work much
better. So you need
to shrink those tables into
pieces. Nice.
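The post-pg_upgrade run being described might look like this (a shell sketch; adjust the job count to your hardware):

```shell
# -j/--jobs parallelizes at the table level, so one huge unpartitioned
# table still bounds the total duration; partitioning splits the work
# into smaller pieces that parallelize well.
vacuumdb --all --analyze-in-stages --jobs "$(nproc)"
```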
And we can link up our episode on partitioning as well.
Right. It's very related to parallelization.
Yeah. On that note, for further reading, or if anybody wants to learn more about this, where would you point them?
Oh, by the way, it's not only related
to parallelization, but also related to DML, to reads. If you select from many partitions,
you need constraint exclusion, you need to deal with as few partitions as possible, and also
parallel workers might help you there as well,
definitely: to read from different partitions and then to aggregate the results.
Yeah, I haven't actually looked into that, but it's still using the same worker processes, right?
It's still doing the same parallel scans.
What I'm trying to say is that if you need to deal with many partitions in your query, multiple workers can be helpful.
Yeah, true.
It's a very related topic. Partitioning is very related to parallelism, obviously.
Yeah.
So vacuum we covered, create index covered, analyze covered, what else?
I think we should leave it there.
No, no, dump and restore are important.
pg_dump supports -j, and pg_restore also supports -j.
And there are improvements.
I don't remember the details.
If you check the release notes, some fixes were done.
So in newer versions, it works better.
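A sketch of parallel dump and restore; the paths and database names are made up:

```shell
# pg_dump -j requires the directory format (-Fd).
pg_dump -Fd -j 8 -f /backups/mydb.dir mydb
# pg_restore parallelizes across tables and index builds.
pg_restore -j 8 -d mydb_restored /backups/mydb.dir
```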
Also, there is work on foreign data wrappers and so on.
Something can be parallelized, but I don't remember the details about that.
I think postgres_fdw, for example, doesn't support it by default.
There is a fork of it by Swarm64, but I think that was experimental when they got sold. So I don't
know any more about that. On that note, there is a really good talk by somebody that used to work
at Swarm64, called Sebastian, that I'll link up: PostgreSQL Parallelism Do's and Don'ts. It's about
an hour, I think.
Oh, great. I don't remember it.
Also worth mentioning: I would like to have COPY parallelized. Sometimes
you need to deal with a single large table, and pg_restore's parallelization applies only at
a higher level, between tables. So if you have two tables, you can probably move roughly two times faster.
But if you have a single table,
parallel COPY would be an interesting feature.
And you can cook it yourself with
dump and restore, dealing with snapshots and so on;
knowing your IDs, you can split your table,
and so on. It's possible.
But there is also CREATE TABLE ... AS,
which is parallelized.
And also materialized views: creation with initialization of data.
And refreshing materialized views as well.
Maybe something else.
I'm sure there is something else.
Check the release notes; there are a lot of things there.
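A quick sketch of the statements just mentioned; the table and view names are made up for illustration:

```sql
-- CREATE TABLE ... AS can use a parallel plan for its SELECT part.
CREATE TABLE recent_orders AS
  SELECT * FROM orders
  WHERE created_at > now() - interval '30 days';

-- The same applies to materialized view creation, and in recent versions
-- to refreshing them.
CREATE MATERIALIZED VIEW order_totals AS
  SELECT customer_id, sum(total) AS total
  FROM orders GROUP BY customer_id;

REFRESH MATERIALIZED VIEW order_totals;
```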
Absolutely.
Also, for anybody wanting a recap of the basics,
I have a blog post for people that want to increase max_parallel_workers_per_gather, covering the basics around that.
I think we've covered it all here, but that's a shorter summary.
There are many nuances there, but yeah, you need to experiment.
Wonderful. Well, thank you, Nikolai. Thank you for the people that requested this. And I'll see you next week.
I have a strong feeling we didn't cover everything,
but I hope we did a good overview.
Yeah, absolutely.
And I expect many more features in the future
because it's obviously a huge topic
and a very important one for modern hardware.
Thank you so much.
Thank you, all listeners.
See you next time.
Bye.
See you.
Bye.