Postgres FM - Plan flips
Episode Date: March 6, 2026

Nik and Michael discuss query plan flips in Postgres — what they are, some causes, mitigations, longer term solutions, and the recent outage at Clerk.

Here are some links to things they mentioned:

Recent postmortem from Clerk https://clerk.com/blog/2026-02-19-system-outage-postmortem
The real cost of random I/O (blog post by Tomas Vondra) https://vondra.me/posts/the-real-cost-of-random-io
autovacuum_analyze_scale_factor https://www.postgresql.org/docs/current/runtime-config-vacuum.html#GUC-AUTOVACUUM-ANALYZE-SCALE-FACTOR
default_statistics_target https://www.postgresql.org/docs/current/runtime-config-query.html#GUC-DEFAULT-STATISTICS-TARGET
pg_hint_plan https://github.com/ossc-db/pg_hint_plan
Aurora PostgreSQL query plan management https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/AuroraPostgreSQL.Optimize.Start.html
pg_stat_plans https://github.com/pganalyze/pg_stat_plans
pg_plan_alternatives https://jnidzwetzki.github.io/2026/03/04/pg-plan-alternatives.html
Waiting for Postgres 19: Better Planner Hints with Path Generation Strategies https://pganalyze.com/blog/5mins-postgres-19-better-planner-hints

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello and welcome to Postgres FM.
We share about all things PostgreSQL.
I am Michael, founder of pgMustard,
and I'm joined as always by Nik,
founder of Postgres.AI.
Hey, Nik.
Hi, Michael.
How are you?
I am good.
How are you?
Very good.
Fantastic.
Oh, great.
What are we talking about this week?
Not about AI.
Yes, phew.
We're having a detox.
Everyone is talking about AI,
so we are not going
to talk about AI.
Great. So what are we talking about?
Let's talk about plan flips. It's quite a complex topic, because I feel
Postgres lacks a lot of stuff here, and I'm not sure what it needs to have in future to be
better. And there is a lot of criticism, including a recent one.
A recent wave of criticism against Postgres in this area, right?
After some incidents at one company. The company
name is Clerk, right? Yeah, Clerk. How do you pronounce it? I say Clark. Okay, that's
interesting. So it's E, but that might be... I don't know. Yeah. Okay. Yeah. And there was an outage, or a
series of outages, I don't know. Yeah, quite a long one, two weeks ago. And, as usual,
kudos to all companies who share their experience. This is great. They shared their experience. They had
a blog post with a timeline and root cause analysis, to some extent. And then on Twitter
I saw discussions which looked like "Postgres is not good enough, let's choose better systems".
Because look what you might have with Postgres. Yeah, look what can happen.
Yeah, might happen, exactly. Something quite innocuous. So they said the plan flip was,
maybe not the root cause, but the thing that took them down, due to insufficient
statistics.
But yeah, they're an infrastructure provider, right?
So they do authentication.
And if auth is down, that means all of their customers are down.
Yeah, exactly.
So this was a huge issue for them and for all of their customers.
They are very wary of things that can change unexpectedly, understandably.
But equally, lots of services that need
extreme high availability run on Postgres.
So how can they do it?
I think that would be a good discussion.
So in my opinion, their root cause analysis is halfway done.
And naming a plan flip the root cause is not all right.
Why?
Well, "five whys".
You know this approach.
Yeah, I think Toyota.
Yeah, the basic approach to root cause analysis is to ask "why" until you cannot answer
at all. So their
analysis was saying
the problem happened
because Postgres triggered
analyze, obviously an
automatic analyze by the autovacuum
process, which
caused a plan flip. So
they did only two steps in this
five-whys approach. They said,
okay, what happened? Why are we
down? Because statistics
changed. Why is that? Because of a plan flip.
Why? Because it was triggered
by autovacuum. Two or three steps.
But you can go deeper, right?
Yeah.
You can go deeper.
I think there was one more.
I think they did increase the statistics target, for example,
to make it less likely to happen again,
but there still are more steps, for sure.
Yeah, did they increase default_statistics_target?
No, maybe not the default, but for the table, I think.
Ah, or for the column, I forget which.
Yeah, it's either per column or global.
Anyway, that's interesting.
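For listeners following along, a minimal sketch of the statistics-target changes being discussed. The table and column names here are hypothetical, not from the Clerk postmortem:

```sql
-- Raise the statistics target for one skewed column
-- (table and column names are made up for illustration):
ALTER TABLE events ALTER COLUMN external_id SET STATISTICS 1000;

-- Or globally, for all columns without a per-column override:
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();

-- Either way, the new target only takes effect at the next ANALYZE:
ANALYZE events;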
because in my opinion
there are different
plan flips,
different kinds of plan flips
and I'm still trying to figure out
for myself
but if you have a plan flip
where suddenly,
like, it flipped after
recalculation of statistics,
that's interesting,
because that might happen...
this is actually a difficult type,
this is a hard type of plan flip.
But the question is why
it happened only now, and not one hour ago, for example.
And what's to stop it happening again?
Yeah, yeah.
And this is interesting, right?
So I have some thoughts on this specific case, but I think it might actually be useful for us to go back to basics a little bit and define what we mean by a plan flip, and what the normal causes of them are.
And I think this is a little bit of a special case that should be interesting and might be completely avoidable, I think.
But a lot of cases aren't avoidable.
And I think that's an interesting discussion.
But yeah, in terms of plan flips, I wanted to ask you, do you define them only when they're bad?
Because we have a cost-based optimizer, right?
The reason we can flip query plans is because the planner has choices, and it makes those choices based on estimated costs of doing different join algorithms or different scan types.
But flipping is part of its design, right?
The same query running on different data, as the data changes, perhaps should use a different
execution plan. It would be faster with a different execution plan. So flipping is designed, it's
deliberate. But the idea is it flips to a better plan when the data suits that. So a sudden
plan change is normal. Yes, yes. But when people say "plan flip", though, it feels like they're
only ever talking about the bad case. Yes, exactly.
Exactly.
Yeah, because this is when it changes to a wrong state.
Because the planner sometimes makes mistakes.
Yes, or misestimations, yeah.
For example, if you have random_page_cost at 4, which is the default...
I saw many situations where we have two candidate plans
which have very similar cost
under these circumstances. So for this data, and what we have in pg_statistic,
and for these settings, including random_page_cost at 4, which is the default, I saw cases where
there are two plans. One is obviously wrong: for example, a sequential scan of a huge table.
And another is obviously right, which is an index scan. I'm remembering some simple case.
Simple in terms of: it's obvious to a human, or even to AI, that this plan, for
this table, shouldn't be chosen, because it's a sequential scan of the whole table.
But since, for example, Postgres thinks that random I/O is so expensive,
it suddenly chooses the sequential scan plan instead of the index.
And you don't notice until some point, while these costs are very close to each other.
And then suddenly, when a little bit more data comes to the table and analyze happens,
the sequential, bad one wins, and that's it. This we usually solve by adjusting
random_page_cost, telling the planner: please don't choose sequential plans. I know we have,
like, Tomas Vondra... yeah, yeah... he has great proof that maybe random_page_cost at 4 is
not bad, maybe it should even be bigger. But I have a brain split here, because
I see that his reasoning makes sense theoretically, but what I saw in production
proves the opposite. I saw these plan flips happening just suddenly, like
some ticking bomb. It was sitting there, and then boom, suddenly the
planner chooses the sequential scan and we are down, because it times out, right? Yeah, I have a theory
about this. I've been thinking a little bit about it. And I think,
It's like, Tomas's work is very good in theory, because he cares about the optimal time to flip.
He cares about how much data we need to be reading before a sequential scan is faster.
But in practice, with large tables, the flipping point is not that important.
The main point is we almost never want to request that much data.
Often, especially in OLTP, we want very small selects:
we're only selecting one record, or a very small number of records.
So the flipping point, we want to put so far out of sight that even if our row estimate is really bad...
because often it's a combination of two things, right?
It's not just the cost of doing random access, it's also how many rows we expect to be reading.
And if that's out by a lot and the cost is high, those multiply together, and suddenly Postgres is thinking:
you know what, I might as well scan the whole table here, because I'm expecting to read so much.
But if we decrease the random_page_cost, we get away with the bad row estimate.
So I think you can fix either and get towards it.
But if you do both, then you massively decrease your chance of this kind of issue.
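A sketch of the kind of change being discussed. The value 1.1 is a common community suggestion for SSD or cache-heavy systems, not a number from this episode:

```sql
-- Lower random_page_cost on an SSD/cache-heavy OLTP system (default is 4):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- Verify the running value:
SHOW random_page_cost;
```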
I hear you, and in my mind this connects very well to the topic of OLTP versus analytical versus hybrid,
because there are good advancements in the area of analytical query processing for Postgres, in various ways.
And there is also... I have a strong belief that there is a very simple case called pure OLTP.
Pure OLTP means that we need to serve users who are mostly humans, maybe.
And they have human perception.
Our very first episode was about these 200 milliseconds.
This is when they recognize something
is already happening.
And if it's significantly more than 200 milliseconds, it feels slow.
Which means we need to have our queries below 20 milliseconds, because one web page consists
of multiple queries, usually roughly 10.
Say 10, some guess, right?
It means we need to be below 20 milliseconds.
In this case, by no means do I want any sequential scan to be applied.
Never.
I don't want it.
Just forget about sequential scans.
I want to turn it off
completely.
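Worth noting, as a hedge: Postgres has no hard switch for this. enable_seqscan = off only adds a large cost penalty, and the planner will still fall back to a seq scan when no other path exists. A sketch (table and role names hypothetical):

```sql
-- "Turning off" seq scans is really a strong discouragement: the planner
-- adds a huge cost penalty but can still pick one if nothing else works.
SET enable_seqscan = off;

EXPLAIN SELECT * FROM events WHERE id = 42;  -- hypothetical table

-- Often safer scoped to the OLTP application role than set globally:
ALTER ROLE app_oltp SET enable_seqscan = off;
```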
I would counter that sequential scans on tiny tables are fine, right?
If it's only got 100 rows in it,
but also an index scan on that table would be fine as well
because it's still only got 100 rows.
So the times where sequential scan is best
don't matter anymore because an index scan whilst being less efficient
or even a bitmap scan would be absolutely fine
even if it's twice as slow
or even five times as slow because it's such a tiny table.
But we also have different...
And there is also the bitmap scan, the bitmap index scan plus bitmap heap scan.
So it's like in the middle.
And how the planner thinks about the cost is interesting.
So, how we tune some service for OLTP...
and this is obviously OLTP,
people didn't want this service to be down.
Of course.
Yeah.
We definitely want very strict timeouts.
I think 15 seconds is a lot.
We should go down to 10 seconds.
10 seconds. So statement_timeout should be 10 seconds. Even transaction timeout should be 10 seconds,
if we're radical. Let's introduce the concept of radical OLTP. Yes, you may have some workload that needs to be
analytical, and some people can wait, or some background jobs. Yes, create index taking a couple of hours
is also a different story. But if it's a pure OLTP mobile app, which many people depend on,
put transaction timeout and statement timeout at very low values, be very strict,
and just cut those queries which start running longer.
In this case, if you, for example, have a bunch of them happening,
there's a chance that on a powerful machine you will survive.
If you don't have it, there will be a snowball effect of many sessions getting stuck,
running sequential scans or something very far from optimal,
and it will be consuming resources.
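A sketch of these safeguards. The 10-second values are the episode's examples, not a universal recommendation, and transaction_timeout only exists in Postgres 17 and newer:

```sql
-- Strict "radical OLTP" safeguards; values here are illustrative.
ALTER SYSTEM SET statement_timeout = '10s';
-- transaction_timeout is available in Postgres 17+:
ALTER SYSTEM SET transaction_timeout = '10s';
SELECT pg_reload_conf();

-- Long maintenance work (e.g. CREATE INDEX) then needs a session override:
SET statement_timeout = 0;
```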
The other thing it might do is it might more quickly point you in the right direction
as to what's causing the issues.
Yeah, how to diagnose.
Let's maybe, I think I'm pushing us to some like problems and resolutions already,
but let's go back.
You raised a very good point about what plan flip is.
So in my head it's also, like, a negative term.
A plan flip is a sudden plan change in the wrong
direction, when basically the planner chooses a plan we don't want, like a sequential scan, or maybe
a bitmap scan which is timing out, for example. Yeah, I think it might even need more than that.
I think it has to be noticeably worse. If it flips and it's 1.1 times slower,
I think most people don't really care. But if it's double or triple, or it takes
you from being at 90% CPU to 100% CPU, if it crosses that threshold... maybe again it's the
Tomas episode we did on performance cliffs. If it's
a cliff for you, then...
In my opinion, the performance cliff term is different, and it's a macro term,
while plan flip is a micro term. It's just one session, and we see that suddenly
the execution of this query became much worse.
It was 10 milliseconds, and suddenly it became like one minute, boom.
An obvious example is: we had an index scan, but suddenly the planner... and usually it happens
in complex queries, right?
So joins, subqueries, CTEs, multiple stages, something complex.
And then boom.
While performance cliffs for me are macro level: like, some lightweight lock contention didn't happen, and then suddenly it happens.
There is no official terminology here, right?
So I'm just reading some blog posts and trying to adjust my own mind to them, mostly.
So this is how I feel.
So plan flip is a negative term, right?
So we don't want it to
happen. And we started with a difficult example. This company obviously experienced this difficult
example. And my question to them is: what random_page_cost did you have? Did you have statement_timeout,
idle transaction timeout (maybe not related in this case), and transaction timeout?
So, some safeguards, safeguards which would protect from a huge wave of sessions
experiencing this bad plan being executed. And also, what were your autovacuum settings?
You think that's relevant? Before we go to... before we go to an
increased number of buckets with the statistics target, these are macro settings we need to keep in
mind. And this is what I had... yeah, I had this thought. Yeah, sure. This for me felt like a
data skew issue. They mentioned the column in question was 99.996% null values, and this query was
looking for the not-null values, of which there were still 17,000. So we're talking about
billions of records, but only 17,000 were not null. In cases like that, where you've got such a data
skew, you could be looking at increased statistics for those columns,
or often you want a partial index on a column like that, right?
Because you don't care about indexing the billions of null values.
You only care about indexing the not-null values.
And once you've got the perfect index on that...
I know it could be a more complex query where you want different columns,
but if you've got those columns in the index with WHERE the value IS NOT NULL,
then actually the planner is almost certainly going to choose that,
because it's such a perfect index for that query.
And you don't even have to worry about mis-
estimations; it's never going to pick a seq scan in that case.
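A sketch of the partial-index idea for a mostly-NULL column. The schema here is hypothetical, not from the Clerk postmortem:

```sql
-- Hypothetical schema: billions of rows, external_ref NULL for ~99.996%.
-- The partial index stores only the rare not-null values, so it stays tiny.
CREATE INDEX CONCURRENTLY idx_events_external_ref
    ON events (external_ref)
    WHERE external_ref IS NOT NULL;

-- An equality search implies IS NOT NULL, so the planner can prove the
-- index predicate matches and use the tiny partial index:
EXPLAIN SELECT * FROM events WHERE external_ref = 'abc-123';
```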
Yeah.
Yeah.
So we definitely lack information here.
Yeah.
We didn't see the plans.
We don't know of their settings.
We even don't know the version.
True.
Right.
So we can only guess.
And again, there are difficult situations with plan flips.
I saw quite difficult ones, where obviously everything was normal.
But what I'm trying to say is: you can get a plan flip because you didn't do a good job
configuring Postgres well enough and putting in these safeguards for OLTP workloads. And this is what
we do, number one, with all the clients during consulting, and with the clients who install our
monitoring and checkup service. This is, like, the number one thing: we recognize we are
OLTP, the priority is to have super fast queries, and we need to prepare the basics, right?
Unfortunately, they cannot protect us in 100% of cases.
But if you have your autovacuum not tuned... for example, there is this relatively new setting.
I remember Darafei raised this topic. We didn't have autovacuum_vacuum_insert_scale_factor.
And autovacuum_vacuum_insert_threshold is there too, right?
So it triggers on inserts.
And by default, it's 20%.
And we also have the old regular autovacuum_vacuum_scale_factor,
which triggers on updates and deletes, also 20%.
And for analyze, it's 10%, 0.10, right?
Which means 10% of your table needs to change.
If it's a huge table, that's a lot of data.
And again, this data skew, it's a good point.
If you had one distribution,
but suddenly all the new records are very different,
and you don't have a properly tuned autovacuum, you accumulate a lot of that data.
And then, only after 10 or 20% of the data changed or was inserted, autovacuum finally ran analyze.
Suddenly we have a flip.
If it had happened earlier, maybe it also would have been bad.
That's actually a good point.
Maybe we still need the statistics target.
But in my opinion, it's better to have a more frequent process
running on large tables, so statistics would represent the actual data better, more often, right?
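The more-frequent-analyze idea can be applied per table via storage parameters. A sketch with illustrative values (the table name and the specific fractions are assumptions, not recommendations from the show):

```sql
-- Hypothetical per-table override so autovacuum analyzes a large table
-- far more often than the global defaults would:
ALTER TABLE events SET (
    autovacuum_analyze_scale_factor       = 0.01,  -- default 0.1 (10%)
    autovacuum_vacuum_scale_factor        = 0.02,  -- default 0.2 (20%)
    autovacuum_vacuum_insert_scale_factor = 0.02   -- default 0.2 (20%)
);
```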
To allocate more resources, to have more workers: this is what we do all the time with larger
services. And unfortunately, we see, I would say, two out of three clients coming in like this.
Most of our clients are startups which have, say, a database exceeding 100 gigabytes,
up to 10 or 20 terabytes.
This is like a sweet spot for us,
where they've experienced some problems already.
And two out of three, roughly,
have poorly tuned autovacuum.
Only two out of three is quite surprising to me.
This is my rough estimate.
I can check the statistics.
You need to update your statistics.
My internal... yeah, I need to ask AI
to maintain statistics.
Not sure about that.
Anyway... but I had a question, though.
Yeah.
default_statistics_target is 100 by default,
and I quite often see people increasing it to 1,000 when an issue like this happens,
or at least for specific columns.
Other than increased analyze time,
which is important, but maybe not that important,
and it doesn't increase that much...
and also it's not that critical for OLTP systems, right?
It's a background thing.
I'm wondering...
It matters during upgrades, though, a little bit.
Only some types of upgrades, right?
Because if you've got such an important infrastructure,
an OLTP system, you're probably not going to be doing pg_upgrade.
You're probably not going to be doing those types of upgrades that require that analyze.
You're probably going to be doing a logical failover.
It still needs analyze.
You create this logical replica;
you need to analyze this replica.
Not while you're down, though.
You don't have to be down during that.
That could be done while the other system is serving
queries. You are right. You are right.
And the statistics will have been getting
updated while it's being populated,
because vacuum will be running
and analyze will be running.
So it's not an issue in this case.
So I was wondering
for really important OLTP systems,
why not just boost
the default statistics target everywhere
and just pay that extra cost?
Did we raise this question
with Tomas Vondra, or no?
I don't think so.
Maybe we raised it when he was on the
Postgres TV YouTube channel.
I remember the opinion that
100 buckets is a lot.
This topic
connects to partitioning
for me. Because when I see
a table with 10 billion records,
like
a table multi-terabyte in size,
it's...
this is already
quite late. We should have it partitioned.
In this case
we would have many tables,
and at least
we would have a localized problem:
only for this partition do we have a problem.
And let's say it had 30 partitions:
we would have 30 times more data in the samples.
Different statistics, yeah.
Yeah.
Yeah.
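The per-partition statistics point can be sketched like this. Names are hypothetical; the key property is that each partition is analyzed and sampled separately, so skew stays localized:

```sql
-- Hypothetical range-partitioned table: each partition gets its own
-- pg_statistic sample instead of one sample for the whole giant table.
CREATE TABLE events (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_03 PARTITION OF events
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');

-- ANALYZE on the parent gathers inheritance stats and recurses into
-- each partition, updating its individual statistics:
ANALYZE events;
```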
So, I don't know.
I see people raise it.
Okay, does it affect analyze?
Yes, it does.
How much can we afford?
Okay, set it to 2,000 globally.
Okay.
How to prove whether it helps or not?
We need holistic analysis of
plans. Yeah. So we need... this is actually what we tried to do many years ago, before the current state
of the art of things with branching and so on. Maybe we should revisit this topic. I've just realized
I disagree. I don't think this is about improving things on average. I don't think this is a case of
"on average, I want a slightly better execution plan". I think this is a case of "I never want to
completely flip to an awful plan". So it's about risk, mitigating the outliers, not about improving.
I would actually accept slightly worse average latencies.
I'm not talking about averages.
I'm not talking about averages.
I'm talking about complete analysis, when you have, for example,
thousands of plans based on production.
You have a clone of production,
and using branching, you can create replicas and compare plans before and after the change.
This is the "what if" framework which we have with DBLab.
And you can compare plans.
There's a question of how to properly compare plans at a grand scale.
We talked about this
before. Our approach is to just compare structure, and cost, and the buffers, and also timing. But timing
is tricky, because it's volatile and so on. If cost changes seriously, this is already a signal
to investigate what happened. If structure changes significantly, also a signal. Structure, by the
way... maybe you don't need the whole structure, probably just the access methods. Like,
you had three index scans, and suddenly you have two index scans and one sequential scan, right? That's bad.
Right. So what happened? Anyway, this holistic approach is great. You can have a huge amount of plans, and then you have the ability to make the change and compare before and after.
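The structure/cost/buffers comparison described here maps naturally onto EXPLAIN's machine-readable output. A sketch (the query and table are hypothetical):

```sql
-- EXPLAIN in JSON form exposes structured fields (node types, estimated
-- costs, buffer counts) that two runs of the same query can be diffed on:
EXPLAIN (ANALYZE, BUFFERS, COSTS, FORMAT JSON)
SELECT * FROM events WHERE external_ref = 'abc-123';
```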
Again, I haven't visited the topic of default_statistics_target, particularly in this context, for quite a while. But it would be great to do it. And again, I have a bunch of unfinished thoughts. This topic of plan flips is unfinished in my head. I'm still
thinking about what's better. So once again, you don't need averages or anything. It's a classic
optimization problem. You want to optimize something, and you don't want everything else to
degrade significantly. You say: I'm okay if something degrades, like, a maximum of 10% on some metric.
There's a question of which metric to choose, also. But I want this to be improved. This is one
thing. So we have a framework. We don't have all the answers yet.
The second thing is how to understand we are on the edge of a flip, or cliff if you like, doesn't matter.
Yeah, how to understand.
I think I actually don't know if this is important as much as how do we stick with what we've got.
I want to predict.
Yeah, but I think it's really difficult to predict bad plan flips without also predicting good ones.
There will be so many false positives,
if you think about how you would do it.
I forgot to mention that in our case, we execute on a real database: full size, a full clone,
using database branching, provisioned very quickly.
And we see not only the planner but the executor in action, like realistic testing, right?
Full-fledged realistic testing.
We execute hundreds or thousands of query samples, properly collected.
So it's not like we have 100 samples and 90 of them are the same
query ID. We take a query ID, we collect multiple samples for each query ID, and then we execute before and after the change. We developed this framework specifically to predict plan flips for major upgrades.
Yeah, which is slightly different, right, I think. No, it's not different. I think it is different. Do you know why I think it's different? Often people talk about plan flips in the context of "I didn't change anything and this thing
just happened."
Whereas with a major version upgrade, you know in advance you're doing it.
Major change.
And you can look at, yeah, you have this reference point if we change.
You have new engine, basically.
Yeah.
Yeah.
So I think psychologically, at least, it's different.
I understand that in practice, there's a lot of the same harness.
So that's why I'm saying there are easy plan flips and there are difficult
plan flips.
Easy is when you did some action:
you upgraded Postgres,
you jumped from 13 to 18, five versions, for
example. There are new nodes in plans, right? They never existed before. And definitely new settings,
new configuration parameter defaults. In my opinion, plan flip analysis should be a part of a
major Postgres upgrade, and we have it for our clients. Yeah, we have it for our clients.
It still requires some... it's semi-automated right now. I hope at some point we will
fully automate it, and it should be automated, in my opinion. And it shines when you have
database branching properly implemented. But...
There are difficult plan flips, like...
So, to explain how to properly categorize it,
let's stop for a little bit and think about what the planner
takes into account when choosing plans.
It's only three things.
It's the version, which defines the version of the planner, basically,
which engine we are using.
And the second is settings. Not all settings... yeah, my current vision is the planner settings plus
work_mem. Why am I saying plus work_mem? Because for the planner settings, there is a
category: if you check pg_settings, one of the columns is category, right? And there is a category
of planner settings, but work_mem is beyond it. So, plus work_mem. And finally, number three is pg_statistic content.
And that's it.
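The pg_settings lookup mentioned above can be sketched like this. The exact category strings vary a bit by Postgres version, so treat the LIKE pattern as an assumption:

```sql
-- Planner-related settings live under the "Query Tuning" categories:
SELECT name, setting, category
FROM pg_settings
WHERE category LIKE 'Query Tuning%'
ORDER BY category, name;

-- work_mem sits in a different category ("Resource Usage / Memory"),
-- which is why it has to be added to the planner-input list separately:
SELECT name, category FROM pg_settings WHERE name = 'work_mem';
```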
It doesn't care about hardware or actual data.
Yeah, true.
It's not actually the data volume.
It's how much data it thinks it has, which is the statistics.
Yeah.
And if the statistics is lagging, it might live in different reality.
That's why I say I would prefer, it might still be wrong,
but I would prefer it be wrong sooner and be closer to reality, right?
Yeah, yeah.
So when we change engine, it's obvious.
We know when it happens.
And before that, we can perform this plan flip analysis holistically according to our methodology.
This is great.
We can predict this plan flip and prepare a resolution for it in advance.
This is what we do with clients.
Second, if we change settings, basically it's the same.
We also know when we do it.
For example: okay, we are going to raise default_statistics_target globally.
This will help this plan.
Question: how will the other plans behave?
We don't want them to degrade significantly, at least not now.
The same approach can
be applied.
The third thing is: the pg_statistic content is changing.
Again, I would prefer it to be changing more often, like continuously changing.
So auto-analyze should happen more often, to reflect the current reality.
For this, we need autovacuum tuning.
And then, how to predict...
Also, you missed something, which is: we could gather more statistics.
It's a sample, and we could increase it.
Yeah, this is mitigation activity already.
I agree that moving from 100 to 1,000 sometimes might make sense.
Honestly, I don't have a strong opinion here.
I have clients with 100, and with 1,000.
In the past, I had clients who thought that going down to 10 was a good idea.
I don't have them anymore.
Somehow, right now, all the thoughts are in the direction of raising this,
not reducing it.
Yeah, but I remember we had such discussions.
Maybe we should raise it.
Counterintuitive ideas, sometimes, you know.
We don't have them anymore.
Anyway, the first two things we usually control.
We don't have self-driving Postgres yet, so we control them.
And we can perform plan flip analysis if we have such tools.
Wait, there's one more thing.
It's not available everywhere, but another thing that does affect this is, for example,
if you're using hints with pg_hint_plan.
Okay.
Okay, hints.
I think that's a big topic, though.
In production.
That's a great topic.
Let's consider it an exclusion, because it's not a major approach right now.
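For context, a minimal sketch of what a pg_hint_plan hint looks like. It assumes the extension is installed and loaded (e.g. via shared_preload_libraries or LOAD), and the table and index names are hypothetical:

```sql
-- pg_hint_plan reads hints from a special comment placed before the query:
/*+ IndexScan(events idx_events_external_ref) */
SELECT * FROM events WHERE external_ref = 'abc-123';
```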
It's very rare.
Is that true?
I hope so.
I think in other databases, this is actually often one of the main approaches.
You're just trying to open another gate or something.
And I agree, let's open it.
But let's just finish with this while I think about the second-plan idea.
Again, if you change the major version, you control it, you can predict plan flips.
If you change settings, it's the same thing.
You control it, you can predict.
If changing pg_statistic content due to data changes leads to some plan flip, this is hard to predict.
How to predict it?
When I think about it, I want to be able to check the second plan,
or maybe the first five plans which were almost winning.
Because in this case, I can say: okay, I see the winning plan, the current plan, has cost, say, 500.
But there is also a second plan which has cost 490,
and it's quite close.
If it's close, say below 20%... if the difference is below 20%, I will start thinking: might it flip soon?
That's why I want the second plan.
I think this is incredibly tricky, though.
Maybe it's simple, but, for example, parameters matter a lot.
Per query, different parameters.
If you get, for example... yeah, parameters matter.
Yeah, I agree.
Yeah, a huge amount.
If you look at the solutions in SQL Server and Oracle...
bear with me, because I think SQL Server and Oracle are a bit further ahead of Postgres on this.
Probably others too; maybe DB2 sometimes has some features in this kind of area.
But the solutions they've ended up implementing
are around forcing the existing plan,
so, plan pinning.
Aurora has it.
Postgres Pro has it.
Interesting.
So yeah, these are the solutions
that other databases have ended up implementing.
Rather than trying to predict a plan flip,
what they try to do is pin it, or freeze it,
and then maybe require approval when it does flip.
So instead of trying to predict it,
they prevent it, and then have it
as an opt-in thing.
Maybe this plan changed for the better,
but maybe you want
someone to approve it first.
So there are several things here.
You touch several things.
First, regarding depending on parameters:
I agree with you.
But in a system which is quite big,
we usually have...
the Pareto principle is working, basically.
A lot of the workload is defined by a
limited number of specific plans.
And this is
how, in our plan flip analysis, we have a methodology to identify not only
normalized queries from pg_stat_statements, but also particular examples, particular parameters, so we
reproduce exact queries with exact execution. So, plans which contribute to the workload the most, for
example in terms of calls, or in terms of total execution and planning time, right? So we want
to cover like 80% of the workload in several directions, several metrics, right?
And this is the magic: how we collect those samples and prepare our testing query set,
so it would represent a majority of the workload.
Once we found a way to do it... there is some tail.
Those queries are less important, because if you just think about database time in terms of
execution and planning, if some query defines only a minority of that time, even if its plan
flips, it wouldn't affect the whole cluster so much, right?
Yeah.
Does it make sense?
It does.
And it actually reminds me of a different idea.
Maybe I'm wrong, actually.
Maybe I'm wrong.
You know what?
It might be like: okay, now it's only 1% of the whole time,
and then, due to a plan flip, boom, it's already half of the time.
It reminds me, though, of something that I think could help here.
It's not in Postgres...
actually, maybe there are a couple of examples in Postgres core at the moment,
but the idea is that the plan is chosen and then stuck to, no matter what, during execution.
It could be... I think Tomas actually brought this up at the end of our call on performance cliffs.
It could be adaptive at runtime.
If it realizes it's chosen a bad plan,
instead of sticking to it till the end...
We should invite Andrei Lepikhov, who, I think,
can talk about this adaptive query optimization a lot.
I consider him an expert in this area.
Great.
But do you see what I mean?
Yeah.
It's able to adapt during execution.
Maybe we avoid the issue,
or at least it's way less bad when it happens.
Yeah.
And also, to finish my thought:
maybe I said this approach is wrong because
some plan which contributed only like 1%
to the whole database time can suddenly be half of it.
Because of this, as I said, we need multiple directions.
One of the directions: when we collect this working set, we also order by calls.
So for a query whose plan flip influences the whole cluster so much that it causes downtime,
it should be quite frequent, I think, right?
If it's super infrequent, it should be fine;
it means it won't occupy multiple backends in parallel.
Unless it gets really bad, yeah.
Unless you forgot to tune statement_timeout or a transaction timeout.
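The guardrails mentioned here are ordinary settings; for example (the database name `app` and the values are illustrative, and `transaction_timeout` only exists since Postgres 17):

```sql
-- Cap how long a single statement or a whole transaction may run,
-- so one flipped plan can't tie up backends indefinitely.
ALTER DATABASE app SET statement_timeout = '15s';    -- per statement
ALTER DATABASE app SET transaction_timeout = '30s';  -- per transaction (PG 17+)
```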
Yeah, we have multiple vectors, and I think this is a good approach, good enough approach.
Maybe it can be improved.
So I'm very curious about this area, and I'm coming from practice.
Yeah, I do think there are still complex cases, where you might have clusters:
even the same query in pg_stat_statements could have
four different plans that are actually all good.
Let's say you're a multi-tenant system.
Maybe tiny customers have this plan,
typical huge customers have this plan,
huge customers with a weird distribution have another one,
or maybe healthcare customers look different for some other reason.
Yeah.
Yeah.
And in the analysis, again, as I said,
we collect multiple plans per query ID.
Why multiple? Because of this.
So we take, say, the top five plans for each query ID,
but all of them we take from this top N by each metric.
So we try to capture the concept of how influential this query is.
And multiple plans might appear.
It would be great to have a plan ID, right?
And to shift the aggregation logic there.
But for now we have query IDs as the main tool.
I think Lukas from pganalyze did some work on a plan ID.
Yeah, yeah.
Yeah, there's a post, and I commented on LinkedIn, and it's a great work.
Absolutely needed, I think.
Yeah, it's great work.
So then, if we have this, there are two questions. First: how to understand that some plan candidate is approaching, and the plan can suddenly flip soon?
Because in large tables, data usually is not changing too much suddenly.
I agree with you.
It's like gradual.
Yeah, it's quite gradual.
So the planner behavior involving large tables: it's dangerous when we have two plans.
This is my hypothesis, right?
So again, for me, there are a lot of open questions still.
It might be right on the edge.
And I want to feel this edge for each query ID, or for each plan,
slightly in advance, to be able to predict a flip.
I think someone recently did some work in this area to show
plan candidates. I might be mistaken, but I saw something somewhere;
didn't dive deep. And the second thing is: in my opinion, hints and plan freeze are two
great features. I put them in the enterprise basket; for me, it's an enterprise thing.
Because it's interesting: do we need hints? Do we need plan freeze? For larger systems,
probably we should have both. But I also understand the reasoning
that it makes a system complex and it requires a lot of management.
There is a price you pay.
If you put in some hint, you start to be responsible for it.
You need to maintain it.
And if you freeze some plan, it's the same thing.
You're responsible now, right?
Basically, do we have the capacity to manage it?
Because maybe later it helps less, and maybe then it's not helping but the opposite.
That's the problem.
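For context, the "hints" side of this trade-off usually means the pg_hint_plan extension linked in the show notes; a minimal sketch, where the table and index names are made up for illustration:

```sql
LOAD 'pg_hint_plan';

-- Force an index scan for this query; "orders" and
-- "orders_customer_id_idx" are hypothetical names.
/*+ IndexScan(orders orders_customer_id_idx) */
SELECT *
FROM orders
WHERE customer_id = 42;
```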
Absolutely.
But I do think it's interesting that these other databases,
given the choice between the two, went in that direction.
I'm not saying it's a good idea,
but they've clearly thought about it,
and there's some merit to it,
and it's been working for many years,
of course with issues,
of course with unintended side effects,
but it is a solution, I think, worth considering.
So, yeah, this is good.
I thought it was actually really interesting.
A lot of open questions.
Of course, I saw you sent me this about Postgres 19 improvements
in terms of plan management,
and pg_hint_plan was mentioned there.
I wish the Aurora team open-sourced their solution.
It's not really an Aurora-specific thing.
It's called the apg_plan_mgmt extension:
Aurora PostgreSQL plan management.
So it's a great thing,
and I wish it existed in the open-source ecosystem.
I think eventually it should exist in open-source ecosystem.
Because I know when I
would use it: when suddenly we have a plan flip, we know what plan we would prefer. And let's just
pin it right now, before we apply a permanent solution; just mitigate as fast as possible, without any
deployments. But let's say we're Clerk, and we know we've got
15 to 20 really important queries. They're on the hot path for every single auth request.
They're really frequent; they run thousands of times a second.
We could deliberately pin all of those, or put hints into all of those, to maintain them.
So this is different.
So what I'm saying is: temporarily pin, before we apply a permanent solution.
What you say is: permanently pin, which means there might be some cost you will pay to maintain it.
Maybe, yes, but my point is, at least we can then guarantee it won't go bad suddenly.
Maybe it will gradually degrade, and we can look at fixing it.
If it's going to degrade, I think it's more likely to degrade gradually instead of all of a sudden,
which means we can then fix it Monday to Friday, 9 to 5, instead of during a huge database outage.
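The "pin it now" mitigation maps onto Aurora's apg_plan_mgmt extension mentioned above; a rough sketch, paraphrased from memory of the AWS docs linked in the show notes, so the parameter and function names should be verified there:

```sql
-- Capture plan baselines automatically, then enforce approved plans.
SET apg_plan_mgmt.capture_plan_baselines = 'automatic';
SET apg_plan_mgmt.use_plan_baselines = true;

-- Captured plans appear in the apg_plan_mgmt.dba_plans view; a specific
-- plan can then be approved, rejected, or marked preferred, e.g.:
-- SELECT apg_plan_mgmt.set_plan_status(<sql_hash>, <plan_hash>, 'Approved');
```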
And again, just partition your tables.
I know it costs some effort, but it makes things easier.
And gather statistics on the parent partition.
Yeah, which is another topic.
It's fixed.
Laurenz Albe from CyberTec
fixed it.
I think it will be in 19.
Nice
So automatic gathering of statistics
Yeah, so it's a headache,
during upgrades especially.
I think it's solved now.
I mean, not now,
in the future.
Yeah
Yeah
Great good
Nice one
Yeah enjoy it
Thank you
Likewise
Thanks so much
And see you soon.
