Postgres FM - 102 Query optimization
Episode Date: October 7, 2022

Here are links to a few things we mentioned:
pg_plan_advsr
How partial, covering, and multicolumn indexes may slow down UPDATEs (blog post by Nikolay)
Why Uber Switched from Postgres to MySQL (blog post)
pganalyze index advisor
Nancy bot (project is not active)
pgreplay
pgreplay-go
Real Application Testing on YugabyteDB with pgreplay (blog post by Franck Pachot)
pg_query
Database Lab thin clones
Migrating to Aurora: easy except the bill (blog post by Kimberley Nicholls)

------------------------

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @PostgresFM or by commenting on our topic ideas Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hello, let's talk about query optimization, like the second step, or maybe third step, right? The first step is that you found some queries which don't behave well.
The second step is that you found some optimization idea.
And what's next?
Yeah, exactly.
How do we know whether it'd be safe to make the change that we're thinking about,
whether it will make a difference, that kind of thing.
Yep.
And this is one of the topics our listeners requested.
Yeah, exactly.
I think they phrased it quite nicely.
It was like a 102, I think they said, like the American class in terms of university courses, right? Is that how it works? 101 is the intro and then 102 is the next step.
I'm not the first person who can explain this, but yeah, I saw it. 101 is everywhere, right?
Yeah.
So where to start? Let's first take a step back and think about how we verify our optimization idea. As usual, I have a strong opinion here
because I see a usual approach is either to rely on experience
or to test it right on production.
Because the problem is that non-production environments
have different data, different size of database,
everything is different, sometimes different hardware,
but actually hardware is less important.
Much less important.
We can tune our Postgres on weak hardware to behave, in terms of plan choices, exactly like on production.
It's possible.
But the data, statistics,
and the planner settings, of course,
these things are the most important.
So the question is how you verify your ideas.
How do you do it?
And the answer, great.
So what do you think?
Well, I'm interested.
Yeah, well, so I think you've probably gone really advanced there
already talking about how hardware is less important.
I think the first lesson is that testing in as similar a setup as possible
is really useful.
And maybe there's some exceptions to that,
but by similar, I think the first thing I always try to encourage people to do is make sure the
quantity of data is similar. And I guess we're testing two things, right? The first thing we want
to test is for the single query that I was trying to optimize or for the single problem I was trying
to solve, does this solve that problem? And then there's a secondary level,
does it introduce other problems or does it work well amongst the whole load of the system?
Right, so we want our non-production Postgres to ideally behave in exactly the same manner as production would. Actually, production can change. Yesterday
it chose one plan,
today there is no guarantee it chooses the same plan.
And this is quite well known in Postgres.
And unfortunately, Postgres doesn't provide a good way to solve this problem.
I mean, to freeze the plan choice
and avoid sudden plan flips for critical queries.
But it's another topic.
So this is another topic I mentioned just to give the feeling that the planner behavior is not stable.
So achieving the same planner behavior in a non-production environment is quite a challenge.
But there are specific tricks that can help us.
For example, of course, ideally, we want the same data.
Because the same data will mean the pg_statistic content is the same, so we are fine. There are tricks to export and import pg_statistic content even if you don't have the data. There are some not very popular but interesting extensions, I think they came from Japan, I don't remember the names, but we will provide links. So it's possible to export and import statistics. This approach has limitations.
It's similar to hypothetical indexes, partitions.
Very interesting approach, and sometimes it's good.
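For reference, hypothetical indexes of the kind mentioned here are available through the hypopg extension; a minimal sketch, with a made-up table and column, might look like this:

```sql
-- Hypothetical index: the planner can consider it in EXPLAIN without
-- actually building it (hypopg extension; table/column names are made up).
CREATE EXTENSION IF NOT EXISTS hypopg;
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Plain EXPLAIN (no ANALYZE) now shows whether the planner would use it:
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
```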
But what if we want true behavior?
So executor really executes our plans,
and we can see the behavior of Postgres for our query.
In this case, we want the same data.
Of course, there are several levels of how close we can get to the same. We can just take it as is at the physical level. This is an absolutely good approach. We just grab the same PGDATA. If we have managed Postgres, we just clone it. Most managed services allow you to clone. It takes time, it costs money, but it will give us the same data. And so we can see behavior there. But sometimes we cannot copy at the physical level, so we can only copy at the logical level. But as you said, the number of rows, for example, will be the same. So we're
good. A problem, not really in the topic of performance, but a problem I see quite a few
customers coming across with that is the kind of access to data.
So giving developers
access to production data
can be problematic
from a privacy point of view.
It's very problematic.
And in my opinion,
it's not about...
There are two problems that giving an arbitrary developer in a company access to production can bring us.
First problem is security.
And second problem, I think it's more important, even more important.
Security, we can ask them to sign something,
to do security trainings, a lot of things.
But second problem is more interesting.
Developers want to develop, so they add a lot of instability. If we allow developers to go to production, it can introduce challenges in terms of stability, because we want to have an established process of development, testing, and deployment. We don't want developers to go and check, for example: oh, I have a great index idea, I will create it right on the production primary node and check. That's not right. But security is also a concern. And even if we don't talk about security,
which is a very big concern,
if we need to use a logical copy of production,
for example,
the question is,
will the plan be the same?
Row counts are the same, right?
Statistics should be the same.
And, for example, we took the same, even the same hardware, and we put the same Postgres settings into the configuration. The question is: does this guarantee that the planner will behave in exactly the same manner, or not?
So the place my head's going to is ANALYZE and its sampling. Even when you run ANALYZE, you get samples of tables, right? So maybe that's not the only way it could be different, but that's one reason it could be.
I agree.
Yeah, some instability there.
But also, if you have a lot of bloat, for example, relpages is different. On your new node, on your logical clone, relpages will be smaller. And this directly affects the planner behavior, so it can choose a different plan. So we can see different access nodes in the plan, not access nodes, execution nodes in the plan, when you compare production and this logical replica.
Unfortunately, I don't know a solution to this problem, but it's quite an interesting problem.
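To get a concrete feel for the drift being described, relpages and reltuples live in pg_class; comparing them between production and a logical clone (the table name here is hypothetical) shows how different the planner's inputs can be:

```sql
-- relpages and reltuples feed directly into planner cost estimates;
-- run this on production and on the clone and compare (table name is hypothetical).
SELECT relname,
       relpages,
       reltuples,
       pg_size_pretty(pg_relation_size(oid)) AS on_disk_size
FROM pg_class
WHERE relname = 'orders';
```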
And actually, not just bloat,
also, I guess, insert order, right?
Like correlation of data.
Physical distribution.
Yeah.
Well, you mean the buffer numbers will be different.
This I understand.
For example, we need to read 1,000 rows.
We had 1,000 buffer reads and hits on our source, but we have far fewer hits and reads on our replica because of more compact storage of these rows, for example.
This is possible, right?
But the question is, will the structure of the plan be the same?
And unfortunately, we cannot guarantee it. Usually it's very close, but unfortunately relpages is different, and it directly affects the planner behavior.
But we're talking about edge cases, right?
Not the majority of the time if we're talking about simple OLTP type queries.
Does it use the index?
Does it not?
The kind of things people are probably worried most about, they're likely to be easily replicated
once you have a similar sample of data.
Right, right.
I agree.
And sometimes, for example, we might have multiple production
environments, and they are slightly different. So things can be very interesting in various cases,
but at least we can be as close as possible. If we aim to do it, we say: okay, we have a billion rows in the table, so let's have a billion rows in the table in our non-production environments. And then we can see the plans.
Now back to the topic. If we have a query and a plan, and we want to ensure that it's a good idea, we need to approve it. Of course, a direct comparison before and after is already good, right? So we have one query, and we see that our change, for example we create an index or we do something else, we change the query somehow, and we directly see that the I/O numbers improved. So the conclusion is: let's do it. But as you mentioned in the very beginning, of course, there are some cases when our change can affect other queries. And the question is how to find that holistically, how to test it holistically. And this is an unsolved problem. First of all, let's consider one case
when it can be possible. I had it and I wrote an article about it; we will provide a link. It's an interesting case where an optimization of an index led to all updates becoming worse in terms of execution. Why? The idea was simple. For example, we have some SELECT. It's quite slow, maybe not slow, but we see that it uses some index which we can reduce in size significantly by adding a WHERE clause. We add a WHERE clause, everything is fine. This SELECT has better performance because the index size is smaller and so on.
Because now it's partial, smaller, right?
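A minimal sketch of the kind of change being discussed, with hypothetical table, column, and index names:

```sql
-- Replace a full index with a much smaller partial one to speed up a SELECT
-- that only ever looks at pending orders (all names are hypothetical).
CREATE INDEX CONCURRENTLY orders_pending_created_at_idx
    ON orders (created_at)
    WHERE status = 'pending';
-- The catch described next: if UPDATEs modify "status", this index now
-- references that column, which can prevent HOT updates.
```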
But then we deploy it, and suddenly you see that the latency of all updates on average got worse.
Why?
Because we lost HOT updates. Adding some column to an index definition, even if it's only in the WHERE clause, leads to different behavior of updates, because HOT updates, heap-only tuple updates, are possible only if no indexes mention the column we are changing in their definitions.
Yes.
Right? So if we change some column, a usual case, for example an updated_at timestamp, we usually tend to change it when we do updates, right? Because that's the semantics of this column. But if we have an index on it, no such updates can be heap-only tuple. Heap-only tuple updates are much faster than regular updates, because during these updates Postgres doesn't touch all indexes; it doesn't need to change all indexes. But with regular updates, Postgres needs to update each index. This is the index write amplification issue, one of those which Uber highlighted in their article. So losing HOT updates may be painful.
Especially on a heavily loaded system, for sure.
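One way to spot this regression after a deploy is the HOT update ratio in pg_stat_user_tables; a query along these lines:

```sql
-- Share of updates that were HOT (heap-only tuple); a sudden drop after
-- adding or changing an index is the symptom described above.
SELECT relname,
       n_tup_upd,
       n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_update_pct
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC
LIMIT 20;
```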
The place I'm starting to see some progress in terms of tooling is people even being aware that some of these things are a problem. So pganalyze added an index advisor in the last year or two, and one of the things it points out is an estimate of write performance overhead. So not only can indexes be a problem in terms of preventing HOT updates, they also add write overhead for new rows, for example.
Oh, yeah.
Any index, if you create one more index, you definitely add overhead. Definitely. And we can actually estimate it, we can measure it. And this is interesting. Here it's definitely possible to improve the tooling. Definitely. I don't see the best tools here yet.
I made a couple of attempts to create tools. One of the ideas was to create a benchmarking tool which would be used for any change related to the database. It was called Nancy bot. The idea was that we would replay the workload before and after our change, collect all metrics, for example from pg_stat_statements, from logs, auto_explain, everything, and then provide a direct comparison before and after.
And we see exactly which queries degraded
and which queries improved.
And this was a great idea.
We even had a good interface and so on.
But then I realized that practically it's not possible
to run full-fledged benchmarks
every time developers do any change.
Because it just doesn't work economically.
It's too expensive.
You need a separate machine or a couple of machines, sometimes several in a batch. Sometimes we ran several experiments, like: what if we do this, this, this? We had 10 options and we executed them. Then questions arise. We executed them in parallel, not sequentially, because sequentially is too slow; if each experiment takes 10 minutes, you want to execute them in parallel.
But then questions arise: okay, but are all virtual machines the same? Or are there some deviations? For example, one of them got a worse disk or CPU or something.
Or noisy neighbor.
Right, right, right. So we added a lot of logic,
like micro benchmarks
to ensure that some baseline is met
and so on and so on and so on.
And complexity grew,
but the main problem was
you just need to pay a lot
to do such experiments.
Some companies can afford it.
Also, one of the biggest problems was how to replay workload. There was pgreplay and pgreplay-go. Just the other day, maybe yesterday, I saw the article from Franck Pachot at Yugabyte about how to run pgreplay, or pgreplay-go actually, for YugabyteDB. But collecting queries is also a challenge, and I did it several times. So the bottom line: it's quite complex. It's possible to do it, some companies actually implemented something, but it's very hard to create a universal tool here. But still, there is a need here; we want to cover all changes with some testing, right? But the good news here is that we don't need full-fledged clones.
We can do it in a shared environment. We can focus on I/O metrics and run many experiments on just one machine. This is what we do with Database Lab, thin clones, and database branching. And it's possible. So we can have thin clones. Of course, they have different timing because of a different file system, for example ZFS or anything else; Btrfs is also an option. And then you run multiple experiments right on one machine, and you don't care about timing, you care about plan structures and I/O numbers, and you can verify many things.
So the next question is what to verify. Okay, we have our query we want to optimize. We found some index idea. We verified that the query is improved. How to check other queries? Well, some new tooling needs to be built here,
right? And I think it consists of two areas. The first area is a reactive approach.
We want to grab the content of pg_stat_statements periodically from production and have some regression testing. We can reduce the scope if we can parse the queries, already normalized queries, using the pg_query library from Lukas Fittl, right?
To identify which tables are involved.
And if we consider optimization
for a particular table,
we can find only those queries
from top 100 or top 500
by total time, by calls.
Somebody needs to decide what is the most important metric
and which queries are most critical for our application.
But we can find those queries which potentially might be affected.
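As a rough illustration of that shortlisting step (a crude text filter stands in here for proper parsing with pg_query; the table name is hypothetical, and the column names assume Postgres 13 or newer, where total_time was split into total_exec_time):

```sql
-- Candidate queries for regression testing: top statements by total time
-- that mention the table we are about to change.
SELECT queryid,
       calls,
       total_exec_time,
       mean_exec_time,
       shared_blks_hit + shared_blks_read AS shared_blks
FROM pg_stat_statements
WHERE query ILIKE '%orders%'   -- crude filter; pg_query parsing is more precise
ORDER BY total_exec_time DESC
LIMIT 100;
```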
Of course, we can also think about triggers and various dependencies, foreign keys and so on.
So this is tricky.
But anyway, we can find it automatically.
And using thin clone, using single connection,
sequentially we can test many queries.
But the problem will be how to find parameters,
which parameters to use for those queries.
And this is an unsolved problem.
I already mentioned that this is a big problem.
Which parameters to test?
In reality, what do you see?
In reality, what I see people doing, and I think it's probably at a smaller scale than what you see,
is that they reason about the potential effects, discuss it amongst the team during the PR process,
during the discussions. Then they deploy to production and monitor.
Maybe they provision a little bit extra.
A purely reactive approach, right?
Exactly.
So we deploy and see.
Yeah, this is possible.
First of all, of course, code review helps.
And having a bunch of very experienced Postgres experts helps a lot, right?
They can say, oh, you know, this is a good idea, but what will happen?
What's overhead from it?
Let's think about it.
Of course, we can manually analyze and try to predict.
And if you have 10-plus years of experience, it will work quite well.
Unless you're very tired, very busy, have days off and so on.
So I still think about a fully automatic approach.
But it's possible and we can check metrics and have some alerts.
For example, if updates are very important for us,
if the ratio of hot updates dropped after our release,
it can be somehow flagged, alerted, and so on.
This is a good approach, yes.
But it's purely reactive, right?
And actually, most projects don't even have this.
So I agree, this could be a good step towards it.
But back to the fully automated approach: which queries to check?
We need to check queries from production for regression testing, and that's possible.
And actually, in a couple of cases, we already have something like that with our Database Lab Engine. Everything is done in CI/CD,
and the users see the difference
and potential degradation before we deploy it.
It's very good.
But additionally, we need to think about
absolutely new queries.
For example, if it's a new feature,
new feature branch,
and it introduces some new queries
or has a rewrite of existing queries, production doesn't have these queries. So the second part of it,
it's even more challenging. We need to test absolutely new queries here. And I'm just
explaining our thoughts. We moved in this direction with Database Lab, and I'm excited that in the future
we will have fully automated
testing of all
things related to databases
even for those who are not experts
at all. And when experts
are involved, they have a lot of
artifacts, metrics to
analyze and understand much quicker
what to do, how to fix it.
So everyone can move faster and
with better quality and so on. So right now, without this tooling, the idea is: we need to have code review, and for every database schema change we need to test the change itself, but also think about how it will affect the workload.
Yeah.
And I guess first step,
some people don't even check the locks involved, right?
That's like, I guess that maybe that's 101 rather than 102.
But making sure that it's not a blocking change.
I know this has got a lot better in more recent Postgres versions.
Right, but I don't even discuss that, considering it too trivial, but right. So, like, CREATE INDEX should use CONCURRENTLY, and updates should be done in batches.
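For completeness, a sketch of those two basics, with hypothetical table, column, and index names:

```sql
-- Build the index without blocking writes.
CREATE INDEX CONCURRENTLY IF NOT EXISTS orders_customer_id_idx
    ON orders (customer_id);

-- Backfill in small batches instead of one huge UPDATE;
-- repeat until it reports 0 rows updated.
UPDATE orders
SET status = 'archived'
WHERE id IN (
    SELECT id
    FROM orders
    WHERE created_at < now() - interval '2 years'
      AND status <> 'archived'
    LIMIT 1000
);
```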
Yeah, and this is actually something we solved with Database Lab. We have fully automated verification that no exclusive locks last more than a specified number of seconds, for example, one second. This is possible to fully automate.
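In plain SQL, a common way to bound how long a schema change may wait for or hold locks is a session-level lock_timeout; this is only a sketch of the idea, not necessarily how Database Lab implements the check, and the table change is hypothetical:

```sql
-- Fail fast instead of queueing behind long transactions and blocking traffic.
SET lock_timeout = '1s';
ALTER TABLE orders ADD COLUMN note text;  -- hypothetical change; retry later if it times out
```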
For the change itself, we spent some time understanding how to test it, and we have pretty much solved it; we can put it into your CI/CD. But now the question, and I think it's a bigger question, is this: for example, some developer created something but forgot an index. How do we see that it will be bad if we deploy it? And how do we see it without involving very expensive and very experienced DBAs every time, right? We should involve them only for hard questions, not simple ones like a forgotten LIMIT or an index which degrades our other queries and so on.
Yeah, and you mentioned pgreplay-go. That's from the old company I used to work at, GoCardless, and I think that's how I remember it being used most: it was big changes, it was risky things, it was things that people already knew were potentially problematic.
For example, big functionality, we deploy big functionality, or we had some
refactoring during several months, right? Large migration, that kind of thing.
Well, in my opinion, benchmarks with multiple sessions should be applied very rarely, when something big changes: either a big new release happens, or a migration to a newer major Postgres version, or a change of infrastructure, or a change of operating system.
It happens a couple of times per quarter maximum usually, right?
Yeah, much less normally.
Much less normally.
It doesn't depend on the size of company actually,
because anyway, it's a very expensive change, usually.
And it's good that we need benchmarks, full-fledged benchmarks, not every day.
But application changes might happen every day, sometimes several times per day.
And they might change some small part of the database.
Good news, only small part of the database, right?
We don't perform huge refactoring every day. So if we change only one table, or two tables, we should be able to test it in a lighter manner. This is why I'm such a big fan of thin cloning. And I'm so excited to see others also thinking about database branching. Neon, they talk about it; they say you can test a lot of things in CI/CD and it will be cheap.
Actually, as we discussed, Aurora has thin clones, but it's not cheap.
You need to pay for each clone
for compute power separately.
I hope Neon
will be different. Running on
one machine and you pay for a single machine
but run many tests
at the same time on one machine.
Otherwise it doesn't scale because you sometimes need dozens of tests happening in parallel if you
have a big organization. It can be very costly. So you will turn it off and you will be without
testing, returning to bad times again. But the advice right now is: yes, have database reviews, try to automate some things, learn your pg_stat_user_tables, right? Because it shows you how many updates and HOT updates you have, tuple statistics. It's very good to understand it. I don't agree with the Datadog approach: if you go to the database dashboard in Datadog, they have a lot of improvements lately, over the last year or two, but if you check what they name it, they present tuple statistics as throughput. No, no. Throughput is TPS, QPS; that is throughput. Tuple statistics for our data is not. Tuple statistics are important, but they should be shown in detail.
But it's a long topic. Anyway, when you change something for a table, understanding the tuple statistics from pg_stat_user_tables is a good thing to have. And you can see: oh, I lost HOT updates.
Oops.
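For reference, the tuple statistics being referred to, for a table you are about to change (the table name is hypothetical):

```sql
-- Tuple-level activity for one table: inserts, updates (and how many were HOT),
-- deletes, dead tuples, and how it is being scanned.
SELECT n_tup_ins,
       n_tup_upd,
       n_tup_hot_upd,
       n_tup_del,
       n_live_tup,
       n_dead_tup,
       seq_scan,
       idx_scan
FROM pg_stat_user_tables
WHERE relname = 'orders';
```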
Yeah.
How would your advice change if we're talking about smaller teams?
If we're talking, let's say, like your six-person startup,
you've got a decent amount of data, but not necessarily, you know, billions of rows, and maybe the team is currently prioritizing moving fast, and if they have a few problems that they have to fix, that's okay. Are there any changes in this? Because this feels like a very safe approach; the kind of tooling we've discussed seems very much optimized for trying to avoid a problem at almost any cost.
Well, yeah, good question.
First of all, I would anyway change the mindset to use buffers instead of timing, even if you're small, because it will save you so much. Of course, there is such a thing as premature optimization, when you try to optimize too early, too much. But still, if you know that some table will be used with a queue pattern (insert, update, delete), it will be bloated a lot. You need some experiment anyway. This experiment, by the way,
can still be a single-user, single-session
experiment in a shared environment.
Because you don't actually need to use pgbench and utilize all CPUs. You don't need some kind of background workload, unless this workload also deals with your table. What you do need is to think about the future: what will happen in one or two years with this table, how big it will be. Run some benchmark with one session to fill it, then run one more session to update it heavily and so on, and see what the bloat will be and whether you have good indexes here or not. If you focus on buffers, you will quickly... This mindset change is very important, because you will start comparing. Like in your tool, pgMustard, you compare, for example, rows versus buffer numbers. To get one row, we deal with 1,000 buffers. It doesn't feel good at all. And if in one year it will be even worse, probably we should prepare for growth right here in this area. So we need to test it. We need to test our queries and see what will happen if we have more data, for example.
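The rows-versus-buffers comparison being described comes from EXPLAIN with the BUFFERS option; a sketch with hypothetical names:

```sql
-- Focus on the buffer counts, not just the timing: fetching one row
-- through thousands of buffers is the smell to look for.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 1;
```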
Yeah, the only other thing I would add is that I see a couple of extreme failure cases here.
I see the failure case where people have not really started adding many indexes at all.
They almost have no indexes.
And then they start to scale and they have every problem.
Or index every column, right?
Yeah, exactly.
Or the other extreme where they've added an index every time they've seen a problem.
Or, yeah, I've not seen an every column case.
I've heard about them.
But definitely, you know, every time they see a potential for a multi-column index, they add one. So a single table, even a table of six columns, could have 20 indexes on it, in different orders. So I've seen both of those failure cases, and I guess if you know you're in one of those, then it kind of changes your approach quite a lot. If you're in the case of already having too many indexes, maybe you need to be a bit aware of that and be looking at reducing them.
If you know you're in the case of we don't have enough,
maybe you've got a little bit more leeway
to not have to worry about as much impact of adding extra.
You need to understand your workload to know what to test.
And like, this is true,
but the change of mindset to buffers
and just experiment a lot.
This is a good thing.
Like experiment, experiment, experiment.
And you will see and then you will probably start.
If everything is based on human decisions, like based on gut, right?
In this case, at some point you will start skipping some checks because you already know that it's okay.
But sometimes you will be wrong anyway.
This is a bad thing.
Yeah.
In fact, the buffers thing came up just the other day.
I saw a really interesting blog post from a company
that moved from RDS for Postgres to Aurora for Postgres.
Okay.
And they, in my opinion, must have not had enough indexes.
They must have been doing too much I/O, because their costs maybe tripled or something according to the graphs. It was really interesting to see that, and to see that they thought this simple migration would be straightforward, but actually, because they hadn't optimized for buffers, they hadn't been efficient on the I/O front, and that hit them with a big bill and some headaches.
On Aurora, optimization is not only about performance,
it's about money as well, right?
And I/O-centric optimization is the best thing there, because you focus on reduction of I/O. I don't remember, is it about reads or writes? I remember something specific was there; this detail I've already forgotten. But either reads or writes, if you have a lot of them, you need to pay for I/O on Aurora.
So you definitely want to reduce it, and you want your buffer pool to have a better efficiency ratio in terms of hits versus reads. And you can just check your queries and optimize them and so on.
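A rough way to check that hits-versus-reads ratio at the database level (exact Aurora billing details aside):

```sql
-- Buffer cache efficiency: hits served from shared_buffers vs. blocks read in.
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS cache_hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```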
And if, for example, your new index broke all your HOT updates, so the HOT updates ratio became zero, and this is what happened to me a couple of times, in this case you will start paying more. So you optimize one query, but not only do you degrade other queries and get worse performance for them, you're also paying more. It's not good. So my advice
is know your workload.
Learn it. From pg_stat_statements, for example, focus on buffers, not timing. When optimizing, timing is our final goal, but it should not be used during the process of optimization. It should only be checked at the beginning and at the end. Inside the process of optimization, we should be focused on buffers.
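In pg_stat_statements terms, that means ranking by block counts rather than by time, for example (again assuming Postgres 13+ column names):

```sql
-- Which statements touch the most data, regardless of how fast they happen to run.
SELECT queryid,
       calls,
       shared_blks_hit + shared_blks_read AS shared_blks_total,
       shared_blks_read,
       total_exec_time
FROM pg_stat_statements
ORDER BY shared_blks_hit + shared_blks_read DESC
LIMIT 20;
```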
And finally, experiment a lot.
And to experiment a lot,
you should just be ready
to clone and reset a lot,
and so on. That's why thin cloning is good. And as I've said, Aurora has thin cloning,
so you can run additional
load and test it there and stop it.
I guess they have also like per minute or even per second billing
as AWS in general has.
EC2 nodes, they are per second, so it can be very efficient.
You will need to wait some time, like minutes, to provision a node.
But at least you can quickly stop it, right?
But pure thin clones, which I call local thin clones:
So we have one VM and many clones inside it.
So we pay only for one VM.
It's the best for experimenting.
So having such a platform for experimenting
is a good thing.
Anyway, general advice is experiment a lot.
Yeah.
Sounds great.
Anything else you wanted to add?
No, as usual, subscribe, like, share,
provide feedback.
Feedback is very important.
Share is also very important
in your social networks, working groups,
Slack, Discord, Telegram,
what you use, please share.
And bring us more ideas.
Actually, we have a line, we have a queue of ideas anyway, but we want more.
Yeah, exactly.
It's not just always a request comes in and we do the exact topic.
Sometimes it gives us an idea for a slightly different one,
or we put two together and that makes a good episode, that kind of thing.
So we'll see.
Exactly.
Thank you so much, everybody.
Take care.
Cheers, Nikolay.
Bye.