Postgres FM - Planning time
Episode Date: September 20, 2024

Nikolay and Michael discuss planning time in Postgres — what it is, how to spot issues, its relationship to things like partitioning, and some tips for avoiding issues.

Here are some links to things they mentioned:

Query Planning (docs) https://www.postgresql.org/docs/current/runtime-config-query.html
Are there limits to partition counts? (Blog post by depesz) https://www.depesz.com/2021/01/17/are-there-limits-to-partition-counts
Nikolay's recent experiment https://postgres.ai/chats/01920004-a982-7896-b8cb-dfd2406359b0
PgBouncer now supports prepared statements https://github.com/pgbouncer/pgbouncer/releases/tag/pgbouncer_1_21_0
"The year of the lock manager's revenge" (from blog post by Jeremy Schneider) https://ardentperf.com/2024/03/03/postgres-indexes-partitioning-and-lwlocklockmanager-scalability
pg_stat_statements.track_planning https://www.postgresql.org/docs/current/pgstatstatements.html#id-1.11.7.42.9.2.4.1.3
pg_hint_plan https://github.com/ossc-db/pg_hint_plan

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:

Jessie Draws for the elephant artwork
 Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
                                         
I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres AI.
                                         
Hey Nikolay, what are we going to be talking about today?
                                         
                                         Hi Michael, let's talk about planning time and what can increase it, potentially.
                                         
                                         Nice, and this was your suggestion. Why is it something you're thinking about at the moment?
                                         
                                         Well, we are making circles around this topic all the time
                                         
                                         because basically almost all of our customers,
                                         
                                         which mostly are fast-growing startups,
                                         
    
                                         they reach the point where partitioning is needed.
                                         
They reach the point where lock manager contention happens,
                                         
the number of indexes becomes a concern,
                                         
and also many of them still don't use prepared statements,
                                         
                                         mostly because PgBouncer didn't support it until very recently, right?
                                         
This means that planning time, having gone basically unnoticed for quite long,
                                         
                                         suddenly becomes an issue.
                                         
                                         And we often show them,
                                         
    
                                         oh, it looks like planning time is an issue in these cases.
                                         
Actually, especially with partitioning things, there is always a question, especially for time series: how big should partitions be, which can be translated into how many partitions you will have, right?
                                         
And yeah, if you don't partition well, you may end up having many thousands of partitions.
                                         
                                         They will be very small.
                                         
                                         But if you don't have prepared statements,
                                         
    
                                         you can end up having a huge problem with planning time.
                                         
                                         And planning time can exceed execution time for simple primary key lookups, right?
                                         
                                         This is not fun.
                                         
                                         Yeah, it's not the only case of that, right?
                                         
                                         But it definitely makes sense that it's a common one.
                                         
                                         Yeah.
                                         
                                         Basically, last week we discussed with a couple of companies
                                         
                                         how to properly implement partitioning.
                                         
    
                                         And this question was in the center,
                                         
                                         what size of partitions to choose.
                                         
                                         The rule of thumb is like,
                                         
                                         don't let them be larger than 100 gigabytes, right?
                                         
Because this is like the golden rule of partitioning: it's time to apply partitioning when there's a risk the table will exceed 100 gigabytes.

But it can also be applied to partitions.
                                         
    
                                         Also, they should be balanced in terms of size.
                                         
                                         This is usually a good idea to do.
                                         
But on the other hand, if you make them tiny,
                                         
you end up having a lot of them, and it's not good. Why is it not good? Because of planning time, mostly. This is not the only reason why you don't
                                         
                                         want a lot of partitions, but this is one of the biggest reasons. If you have, for example,
                                         
                                         10,000 partitions, you might have dozens of milliseconds or maybe a second of planning time for each query.
                                         
If prepared statements are not used, every time a SQL query is executed, planning happens in Postgres, right?
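For illustration, a minimal sketch of what prepared statements avoid; the table and parameter names here are hypothetical:

    PREPARE get_order (bigint) AS
      SELECT * FROM orders WHERE id = $1;

    EXECUTE get_order(42);  -- still planned: the first five executions build custom plans
    EXECUTE get_order(43);  -- after that, Postgres may cache and reuse a generic plan,
                            -- skipping planning entirely (see plan_cache_mode)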
                                         
    
                                         Yeah, and you're talking about the best case.
                                         
                                         Planning time can expand depending on the complexity of the query,
                                         
and you're talking about the simplest possible query.
                                         
                                         Very simple.
                                         
B-tree index scan.
                                         
                                         Yeah.
                                         
Is it worth us explaining what we're talking about?
                                         
                                         What is planning time?
                                         
    
                                         Yeah, let's do it.
                                         
So when a query is executed, there are stages, right?

Do you remember the stages?

Planning means the planner, based on statistics, considers possible plans, and the goal of planning is to choose the best plan as fast as possible.
                                         
                                         It might be wrong.
                                         
                                         Very often it's wrong, actually.
                                         
Not very; it happens.

For me, it's every day that I deal with such cases, because I usually see problems, not ideal situations, right?

That's when my team and I are involved: when something is wrong.

So it needs to choose, as fast as possible, a kind of optimal plan.
                                         
    
and then, when the plan is chosen, execution happens.
                                         
                                         Executor executes according to the plan. And there are many, many, many possible plans.
                                         
Many possible plans. Sometimes I think an infinite number of possible plans, right?
                                         
Well, it depends, right? If you're doing a single-table select and there are no indexes, maybe there's only one possible plan. But yeah, once you include...

What about parallelization? Already a couple of possible plans, right?

Yeah, but all I mean is that as the complexity grows, it gets important. For example, when you start joining multiple tables, you start to realize that you have multiple scan choices, multiple join orders, multiple join types, and multiple indexes present, and planning needs to choose among them properly. So yeah.

And these things are combinatorial; they multiply to create the total number of possible plans. And as the number of joins, for example, increases, the planning time can explode exponentially. So there are some things in Postgres for this, too.
                                         
                                         There's also a genetic optimizer, right? Yes. So because it can, well, I guess the simple version is that the planner isn't always looking for the best plan in those cases.
                                         
                                         It's sometimes looking for a good enough plan in a reasonable timeframe.
                                         
                                         Because if we're taking way longer to plan the query, if we take multiple seconds to plan a query that's probably going to execute in a few milliseconds, it's probably a complete waste of time.
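Those shortcuts show up as ordinary settings; a quick way to inspect them, with defaults as in recent Postgres versions:

    SHOW geqo_threshold;       -- 12: with this many FROM items, the genetic optimizer takes over
    SHOW join_collapse_limit;  -- 8: caps exhaustive join-order search for explicit JOINs
    SHOW from_collapse_limit;  -- 8: the same idea for subqueries flattened into the FROM list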
                                         
So there's a balancing act here for complex queries. But for simple queries, it doesn't make sense to be paying dozens of milliseconds for a 0.2 millisecond primary key lookup, right?

So the process of finding the ideal plan is called optimization, plan optimization. And the planner deals with basically several things. First of all, decisions of course heavily depend on the version of Postgres you use, because improvements are being made all the time. Second of all, it uses the statistics currently present in Postgres. So that's why it's important to keep autovacuum tuned
                                         
                                         so it maintains the statistics.
                                         
And this is not only pg_statistic.

It's also reltuples, the estimated number of tuples, and relpages, the number of pages the table or index has, in pg_class.
                                         
    
                                         This is super important.
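For example, a quick look at what the planner sees; the relation names here are hypothetical:

    SELECT relname, reltuples::bigint, relpages
    FROM pg_class
    WHERE relname IN ('orders', 'orders_pkey');
    -- reltuples and relpages are estimates maintained by VACUUM / ANALYZE;
    -- stale values here can lead the planner astray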
                                         
                                         And finally, it also takes into account the settings Postgres
                                         
currently has. An interesting point, as I've mentioned many, many times; those who have listened to us for a while know it, right? It's interesting that the planner has no idea about the real physical
                                         
                                         capabilities of your machine. It doesn't know what type of processor you have,
                                         
                                         how many cores, how fast disks are,
                                         
                                         or how many gigabytes of RAM you provided.
                                         
                                         It has no idea.
                                         
    
                                         It relies on the settings.
                                         
                                         First of all, the costs, right?
                                         
So random_page_cost, seq_page_cost, and so on, like cpu_tuple_cost and so on.

And also settings like work_mem, and some estimate of available memory called effective_cache_size.
                                         
                                         And you can put any number there, even if you don't have those gigabytes of RAM. This is bad,
                                         
                                         because if you forgot to tune your server, Postgres might not know that you have a lot of RAM,
                                         
                                         like a terabyte of RAM,
                                         
    
                                         but Postgres still thinks you have only one gigabyte, for example.
                                         
                                         It's easy to be in this situation, and it's bad.
                                         
                                         But at the same time, it's good for us who deal with a lot of experiments
                                         
                                         because it means that we can fool Postgres and say,
                                         
                                         oh, you know, we have the same number of gigabytes as production has.
                                         
                                         And this helps us to reproduce the problem in lower environments
                                         
                                         when we have a much weaker machine.
                                         
                                         This is great for experimentation.
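A minimal sketch of that trick, assuming a weak clone where you want production-like plan choices; the values and the query are made up:

    SET effective_cache_size = '512GB';  -- pretend we have production's RAM
    SET random_page_cost = 1.1;          -- pretend we have production's fast SSDs
    SET work_mem = '100MB';
    EXPLAIN SELECT * FROM orders WHERE id = 42;  -- plan choices should now
                                                 -- mirror production more closely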
                                         
    
                                         So good and bad.
                                         
                                         So that's it.
                                         
                                         This is everything that matters when optimization happens,
                                         
                                         when the planner chooses the plan the executor will be using.
                                         
                                         Yeah.
                                         
For anybody that's familiar with explain,
                                         
                                         I've had success describing planning time
                                         
                                         as the time it takes Postgres to come up with that explain plan.
                                         
    
                                         That's what it's doing.
                                         
                                         And you can even see this. By default, explain on its own doesn't show planning time.
                                         
                                         But if you do explain summary, so explain with the parameter summary,
                                         
                                         it will show you planning time as part of that. So it's quite a cool way of thinking.
                                         
At the bottom.
                                         
Yes, exactly. Or if you ask the executor to be involved and you run explain analyze, then summary is turned on automatically, right?
                                         
                                         Yeah, and you'll see the planning time and the execution time then.
                                         
    
And the planning time is roughly the time it took to run explain on its own.

And the execution time, in addition to the planning time, is roughly the time taken to run explain analyze.
                                         
                                         So that's quite a cool way of thinking about the difference.
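For example, with a hypothetical table and invented timings:

    EXPLAIN (SUMMARY) SELECT * FROM orders WHERE id = 42;
    --  Index Scan using orders_pkey on orders ...
    --  Planning Time: 0.245 ms

    EXPLAIN ANALYZE SELECT * FROM orders WHERE id = 42;
    --  Index Scan using orders_pkey on orders ... (actual time=0.020..0.021 rows=1 loops=1)
    --  Planning Time: 0.251 ms
    --  Execution Time: 0.045 ms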
                                         
                                         Yeah.
                                         
And worth mentioning again, those who have listened to us for long know it very well: the word buffers.
                                         
Since some version, like several years ago, it was implemented. I think it was Postgres 13, maybe.

You can use buffers, EXPLAIN (BUFFERS), even without analyze, only the planning phase, and you can see how many buffers were involved in planning.

And this is super cool, because usually bad timing is because of a lot of I/O, which is buffers. I/O is the number one reason why Postgres can be slow in anything.
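For instance, on Postgres 13 or newer; output abbreviated and numbers invented:

    EXPLAIN (SUMMARY, BUFFERS) SELECT * FROM orders WHERE id = 42;
    --  Index Scan using orders_pkey on orders ...
    --  Planning:
    --    Buffers: shared hit=1520
    --  Planning Time: 11.9 ms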
                                         
                                         Yeah.
                                         
    
                                         And you might think, what's Postgres reading during planning time?
                                         
                                         Yeah.
                                         
                                         Actually, you know what?
                                         
I explained a picture which is maybe true in 99% of cases, but I now remember a case where something else was happening.

It was some interesting query. I don't remember how many tables, but definitely it was a join, maybe a couple of joins. And I remember the planning time was completely out of normal.

It was like half a minute or so.
                                         
    
                                         I think I remember this case.
                                         
                                         Was it a merge join?
                                         
                                         Exactly, yeah.
                                         
                                         It was not merge.
                                         
                                         Merge join was not considered the best plan.
                                         
It was not chosen, but it was the reason why planning was so slow. Considering the merge join path, the planner used not only what I just described but additional information: it tried to find minimum or maximum values for some subsets of data, actual real values from the table. So during planning time it needed real values, and this, for some reason, was super slow. So planning took many, many seconds, exceeding statement_timeout and failing. It was extremely hard to diagnose, actually.

And actually, do you remember why?

Yeah, why what?

Why it was hard to diagnose. Because explain doesn't show other options; it shows only the winner.
                                         
Well, there were also extra reasons it was difficult, I believe. I remember talking to you at the time: auto_explain doesn't show planning time, and therefore when you're trying to...

It was not a problem for us, because, well, at that time, for this project, we didn't use auto_explain at all. But we had DBLab Engine installed, and I knew when these problems started happening, so I quickly rewound, created a snapshot and a clone.
                                         
    
                                         So yeah, I worked on the clone and reproduced the problem easily.
                                         
Interesting that, like one hour later, it was different. The statistics changed somehow, and the planner stopped considering the merge join path.
                                         
                                         Yeah.
                                         
So it auto-healed, but what had happened one hour earlier was super scary, right? But thanks to DBLab Engine, we easily reproduced it.
                                         
So I agree with you: auto_explain not showing planning time was an issue at that time, but not in this project particularly.
                                         
What was an issue: you reproduce it, you see super slow planning time, but you have no idea what's happening, and you try to understand. If we had a way to see the second-best, third-best plans...

Yeah. In this case, the second best probably would be the merge join. It lost eventually, but during consideration of it, the Postgres planner spent a lot of time, because it needed real values.

I think what helped in that case, actually, is a friend's advice, because Kukushkin first said: have you tried SET enable_mergejoin TO off?
                                         
                                         This is weird.
                                         
                                         But once I did it,
                                         
                                         planning time went to almost zero.
                                         
                                         Like, boom.
                                         
                                         It was an interesting case.
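A sketch of that diagnostic as a session-level experiment; the join query is hypothetical, and this is not meant as a permanent setting:

    SET enable_mergejoin = off;  -- heavily penalizes merge join paths in cost estimation
    EXPLAIN (SUMMARY) SELECT * FROM orders o JOIN payments p USING (order_id);
    -- compare Planning Time with the setting on and off
    RESET enable_mergejoin;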
                                         
                                         And, of course, when you try to dig deeper,
                                         
    
you can use, like, perf or eBPF
                                         
                                         and understand what's happening under the hood
                                         
                                         and in which functions Postgres spends time.
                                         
And you quickly find it's the merge join path.
                                         
                                         And then read the code because it's open source.
                                         
                                         This is cool.
                                         
Read the code and understand that Postgres, when analyzing a merge join, sometimes needs real values from the table.
                                         
    
                                         Wow.
                                         
                                         It was exotic, honestly.
                                         
                                         Yeah, that is exotic, but it's also very helpful in understanding how the optimizer is working.
                                         
                                         Because it isn't exhaustively calculating every plan that is possible.
                                         
                                         It can give up on plans once it gets to a sufficiently high cost.
                                         
So, because of the way enable_mergejoin = off works, it's penalizing merge joins.
                                         
                                         It's not saying you can't ever have a merge join.
                                         
                                         It penalizes it cost-wise, at which point the planner gives up on that.
                                         
    
                                         Yeah, it discards very quickly because cost is already super high.
                                         
But I don't think that's how everybody thinks the planner works.

They think it's going all the way to the end of each option, but it kind of bails early, and this is one of those trade-offs it's making. It will sometimes make a mistake, because it seems like it's going to be an expensive plan, but then it isn't.

So it isn't always looking for, in my opinion, the fastest plan; it's looking for a fast enough plan. Well, maybe I'm wrong in that. It is looking for the fastest, but it uses these shortcuts to avoid spending excessive time doing so.
    
Yeah, yeah, exactly. As usual, I like exotic and hard problems, but let's maybe switch to simple things.

Yeah, good idea.

You sent me a very good post by depesz, Hubert Lubaczewski, right? And it was three years ago. He was curious about: is it really so that Postgres doesn't allow more than 65,000 partitions? And it turned out it's not so. He reached 100,000 partitions, right?
                                         
                                         But at the same time, Hubert checked the planning time.
                                         
                                         How exactly the planning time depends on the number of partitions.
                                         
                                         And this is back to the initial reason you were looking into planning time.
                                         
                                         Exactly, yeah.
                                         
                                         We don't want a lot of partitions, many, many thousands of them,
                                         
    
                                         because planning time can be already high,
                                         
                                         dozens of milliseconds, maybe hundreds of milliseconds, and so on.
                                         
                                         And basically, the growth is linear.
                                         
                                         And if you double the number of partitions,
                                         
                                         you double planning time, likely.
                                         
                                         So, of course, it depends on the query you use, I guess.
                                         
                                         But the simple rule is just if you have a lot of partitions, you have bad planning time.
                                         
                                         It means we need to be...
                                         
    
                                         I also think Postgres versions matter a lot here.
                                         
                                         Many optimizations were made, right?
                                         
Well, I'm aware of one. So, depesz's test: he says in the comments that he tested on version 14, which is important, because there's a commit in version 15, from a couple of years ago, from David Rowley, which in the release notes said: improve planning time for queries referencing partitioned tables. This change helps when only a few of many partitions are relevant, which is the case here, where we're doing a primary key lookup. If we're just searching for a single row in a single partition, then of the hundred thousand partitions only one is relevant to this query, and we can prune that at planning time, right? In those cases, planning time should be much reduced as of version 15 onwards.
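A quick way to check for plan-time pruning; the partitioned table and column are hypothetical:

    EXPLAIN (SUMMARY)
    SELECT * FROM measurements        -- range-partitioned by created_at
    WHERE created_at = '2024-09-01';
    -- If pruning happened at planning time, only the one matching partition
    -- appears in the plan, and Planning Time stays low even with many partitions.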
                                         
Well, you know, I used a different table structure and queries, but it also was a primary key lookup.
                                         
                                         But basically, I reproduced the same experiment
                                         
                                         using our AI system.
                                         
                                         And I'm just checking right now,
                                         
depesz had, for 10,000 partitions, like 40 milliseconds or so, right? 30, 40 milliseconds. It's pretty much the same as what we observed with our AI system. But I also used so-called shared-environment experiments, so I was not alone on the machine. That's why I involved buffers in my analysis.
                                         
                                         Because this time growth, linear time growth, according to my benchmarks,
                                         
is explained by linear growth of buffer hits associated with planning. Yeah, so if you have 10,000 partitions,
                                         
                                         it might be several dozens of milliseconds.
                                         
                                         Yeah.
                                         
                                         So what do you think?
                                         
                                         What are you looking for?
                                         
    
I was looking at depesz's post just to see
                                         
                                         if he gave enough details around how,
                                         
                                         like, did he pick the best of,
                                         
                                         you know, sometimes when you're doing
                                         
                                         this kind of experiment,
                                         
                                         you would run the query a few times
                                         
                                         and pick the fastest or the average.
                                         
                                         Several, yeah.
                                         
    
                                         I didn't.
                                         
                                         He used several, not many,
                                         
                                         I think three or maybe five runs only.
                                         
                                         I used only one run for now.
                                         
I think it's a good idea to convert this experiment into a more full-fledged experiment, maybe on a separate machine, and maybe even with pgbench, and have many, many competing... like, not competing, concurrent selects.
                                         
The idea is, it's again about buffers: many, many buffer hits for planning. And this explains the growth of planning time.
                                         
Yeah, I like how clearly linear the chart with buffers is in your experiment, whereas the one with timing looks like it might be linear, but there's enough noise that it's tricky to see. Although there are a couple of dips in the buffers chart, so there's something interesting going on there.
                                         
    
Yeah, I'm also interested. And we also discussed before this call that it really matters whether it's the first query after you've just connected, even without involving prepared statements. If you run the very first query, you have an extra penalty, because in this connection, information from system catalogs is not cached at all, and this adds extra buffer hits during planning time.

Yeah, that's what we thought it was. I don't know for sure, and it seems really interesting what it could be that's going on there. The reason I thought it might be that is I've seen in the past, when running
                                         
                                         experiments,
                                         
    
if I run analyze, nothing to do with explain in this case, just gathering new statistics for the table,
                                         
                                         I've seen planning buffers spike on the next query execution and then come
                                         
                                         back down on the second execution.
                                         
                                         So that makes me think it might be statistics related,
                                         
                                         but yeah,
                                         
                                         I've seen vacuum drastically reduce buffers
                                         
    
as well. Not vacuum analyze, just vacuum on its own, and definitely not vacuum full. So whilst buffers are much more reliable and stable than timings for this kind of experiment, there are a few things that can still affect buffers.

Yeah. So anyway, I couldn't reproduce the vacuum thing and the analyze thing, but I can imagine, if
                                         
you have a complex query, that rebuilding statistics leads to many more hits from pg_statistic, for example, of newly collected data.

Especially if you have a lot of buckets, if default_statistics_target is increased, and so on.
                                         
                                         It's a lot of data to hit additionally.
                                         
                                         I don't know. It's interesting.
                                         
                                         But anyway, back to partitioning,
                                         
    
                                         the current recommendation would be
                                         
let's not allow the number of partitions to grow
                                         
                                         to many, many thousands,
                                         
                                         unless we are okay with increased planning time.
                                         
We can be okay with increased planning time if we use... well, first of all, a pooler matters.
                                         
                                         Yeah, true.
                                         
The situation when it's the very first query in a just-established session, it's very rare, right?
                                         
                                         It's good if it's really rare.
                                         
And the pooler helps.

Even an application-side pooler helps.
                                         
Yeah. And second, of course, prepared statements just help us shave off planning time in a lot of situations.
                                         
You remember, Derk from Adyen discussed problems.
                                         
Even if you use prepared statements, still there might be interesting problems.
                                         
                                         And we had a very interesting episode, number 100, right?
                                         
    
                                         It was an interesting discussion.
                                         
                                         But it's already an advanced topic.
                                         
                                         And at bare minimum, it's good if you know planning time is going up.
                                         
                                         It's good to consider prepared statements.
                                         
And PgBouncer supports them now, so it's great.
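One way to notice planning time going up is pg_stat_statements, which can track planning separately; track_planning is off by default because of possible overhead:

    -- In postgresql.conf:
    --   pg_stat_statements.track_planning = on

    SELECT query, calls, mean_plan_time, mean_exec_time
    FROM pg_stat_statements
    ORDER BY mean_plan_time DESC
    LIMIT 10;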
                                         
                                         And I must say, like, dozens of milliseconds,
                                         
                                         it's already a super slow overhead for planning.
                                         
                                         If we aim to have all queries fast,
                                         
    
fast for me means in OLTP, as usual.
                                         
This is episode number one: what is a slow query?

A slow query is one that exceeds 100 milliseconds, 200 milliseconds, because this is human perception; at 200 milliseconds, most people already see it's quite slow.
                                         
                                         And if one HTTP request consists of multiple SQL statements,
                                         
                                         this means we should be below like 10 milliseconds already.
                                         
    
                                         But if planning time adds 50 milliseconds, for example,
                                         
                                         if we have like 20,000 partitions, for example,
                                         
                                         wow, it's not a good situation to be in, right?
                                         
So prepared statements should be a good tool to fight this problem.

Or just keep the partition count low.
                                         
                                         Well, a few times you've said, and I completely agree with you,
                                         
                                         that our primary server's CPU is our most limited resource.
                                         
                                         And planning time is CPU.
                                         
    
                                         It's a pure use of that.
                                         
                                         And if we can shave it, why not?
                                         
It's CPU, but I insist it's again buffers. It's just buffer hits. It's related to memory. But yes, it's CPU, yes.

But still, it's I/O. I consider I/O in a wide sense, where communication with memory is also considered I/O. Not disk I/O, of course, in this case.
                                         
                                         Most of the time.
                                         
It can be buffer reads as well, but in rare cases. Sometimes we see buffer reads involved in planning as well.

You mean like 'shared read' rather than 'shared hit'?

Yeah, I call them shared buffer reads because... yeah, we discussed it. Well, I consider this bad naming, because it's I/O operations, not an amount of data.
                                         
                                         Right?
                                         
    
                                         Right.
                                         
                                         Yeah.
                                         
                                         So, and what else?
                                         
                                         Indexes, right?
                                         
                                         If we have a lot of indexes, planning time is increasing, increasing, increasing also.
                                         
                                         And I suspect it's not linear even, but it's worth double checking.
                                         
So, if we have, for example, 10 indexes, that's some penalty, and planning time for a primary key lookup can already be like one millisecond or so, half a millisecond.
                                         
                                         But if we go, like if our table has many columns, we are tempted to have dozens of indexes.
                                         
    
And that's really bad. And also, it's not only that planning time itself increases due to buffer hits.
                                         
                                         Again, buffer hits also grow here.
                                         
                                         But also locking.
                                         
When planning happens, Postgres applies an AccessShareLock to each object involved.

All indexes of the table are locked with an AccessShareLock.
                                         
                                         And this is quite expensive, I would say.
                                         
                                         If you have thousands of queries per second for this primary key lookup,
                                         
                                         it might be a problem, especially if you exceed 15 indexes,

                                         because the total number of locked objects already exceeds 16,

                                         which is the threshold: above 16,

                                         Postgres cannot use fast-path locking in the lock manager anymore.
                                         
                                         And we discussed this as well.
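                                         A rough way to observe this (a sketch; users is again a hypothetical table): locks taken during planning stay visible in pg_locks until the transaction ends, including the fastpath flag.

                                             BEGIN;
                                             EXPLAIN SELECT * FROM users WHERE id = 1;   -- planner locks the table and all its indexes
                                             SELECT relation::regclass, mode, fastpath
                                             FROM pg_locks
                                             WHERE pid = pg_backend_pid() AND locktype = 'relation';
                                             COMMIT;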
                                         
                                         Is that number being increased?
                                         
                                         I'm not sure.
                                         
                                         There are discussions to make these things configurable.
                                         
                                         Actually, I think I raised this question.
                                         
    
                                         It turned out not to be so simple.
                                         
                                         I'm not sure.
                                         
                                         I think since 2017, nothing has happened with this.
                                         
                                         So what helps here?
                                         
                                         Add more CPU, more cores, or decrease the number of indexes,

                                         or again, prepared statements, right?

                                         Prepared statements.
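                                         For example, a minimal sketch (hypothetical users table):

                                             PREPARE pk_lookup (bigint) AS
                                               SELECT * FROM users WHERE id = $1;
                                             EXECUTE pk_lookup(42);
                                             -- the first five executions are planned individually; after that the server
                                             -- may switch to a cached generic plan and skip most planning work
                                             -- (plan_cache_mode, available since Postgres 12, can force this behavior)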
                                         
                                         What else?
                                         
    
                                         What else helps in this situation?
                                         
                                         Yeah, I don't know.
                                         
                                         Maybe even denormalizing.
                                         
                                         If you've got a table with 15 indexes,
                                         
                                         why do you have 15 indexes?
                                         
                                         Or this also helps:

                                         decrease the QPS of this kind.

                                         Caching on the application side.
                                         
    
                                         If you have 5,000 queries per second... I see it all the time.

                                         On every click, the application reads from the users table, checking who this is.

                                         Nothing changed. So cache it. Why are you reading this all the time?

                                         Or Rails: recently we discussed this with one customer.

                                         Rails code is constantly bombarding the Postgres system catalogs, checking pg_class, pg_attribute, and so on.

                                         Checking the schema all the time, like thousands of times per second. Why?
                                         
                                         On replicas, nothing changed.

                                         Well, it's changing, but not so fast.

                                         So, lack of caching everywhere.

                                         And this can quickly become a problem, especially if, again, you don't use prepared statements.
                                         
                                         But it's interesting that these problems can combine and amplify each other.

                                         If you have, say, 10 indexes, it's enough to have like three partitions. Even two partitions is enough if you haven't checked partition pruning and some queries involve both partitions.

                                         You already have a situation where 22 objects (2 partitions times 1 table plus 10 indexes each)

                                         need to be locked with AccessShareLock, and lock manager contention might strike you.
                                         
                                         Jeremy Schneider had a great overview of various works in this area from various companies.
                                         
                                         It's partly about execution time, but mostly about planning time,

                                         because planning is where these problems are worse:

                                         this is when Postgres locks all the objects.
                                         
                                         During execution, it locks only what is really needed.
                                         
                                         So, yeah, it's the worst case.
                                         
    
                                         You have partitioning, not many partitions,
                                         
                                         but also indexes and then a lot of QPS,
                                         
                                         and it can put your server down completely
                                         
                                         because of lock manager lightweight lock (LWLock) contention.
                                         
                                         Yeah.
                                         
                                         I think you've talked about people who could be doing things for good reasons still getting caught up. But I think there are also a couple of cases, like around partitioning, of people running a lot of queries without the partition key being supplied in them. That kind of thing can...

                                         How to catch those situations? What would you do?
                                         
                                         How to verify that?
                                         
                                         For example, imagine we are prepared.

                                         We already prepared a good partitioning approach.

                                         We have zero-downtime conversion, everything.
                                         
                                         And we need to double check that all queries
                                         
                                         will have partition pruning involved.
                                         
                                         How to check it?
                                         
    
                                         Holistically.
                                         
                                         Good question.
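                                         For a single query, pruning is easy to check with EXPLAIN; a sketch, assuming a hypothetical events table partitioned by created_at:

                                             EXPLAIN SELECT count(*) FROM events
                                             WHERE created_at >= '2024-09-01' AND created_at < '2024-09-02';
                                             -- with pruning, only one partition shows up in the plan;
                                             -- without the partition key in WHERE, every partition is scanned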
                                         
                                         Well, actually, that brings me back to an idea.
                                         
                                         We've got pg_stat_statements,

                                         and we can now look at some planning-related data,

                                         as of version 13. It was a big release for planning.
                                         
                                         By default, it's not on.
                                         
                                         You need to enable it.
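                                         A minimal sketch of enabling and using it (assuming pg_stat_statements is already in shared_preload_libraries; the plan-related columns exist since version 13):

                                             -- postgresql.conf:
                                             --   pg_stat_statements.track_planning = on

                                             -- then, top queries by total planning time:
                                             SELECT queryid, calls, plans, mean_plan_time, mean_exec_time
                                             FROM pg_stat_statements
                                             ORDER BY total_plan_time DESC
                                             LIMIT 10;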
                                         
    
                                         Yeah, why is it off?
                                         
                                         Good question.
                                         
                                         Since it's off, many people, including actually myself,

                                         still have some fears that the observer effect might strike in some cases.

                                         So, yeah, honestly, yesterday we discussed with one of our clients that we should definitely consider enabling it.
                                         
                                         It's a good thing to have.
                                         
                                         It would be good to hear what the observer effect is in that case.
                                         
                                         I think we had experiments checking this with synthetic benchmarks
                                         
    
                                         and this overhead was very low.
                                         
                                         But it doesn't eliminate...
                                         
                                         Synthetic benchmarks don't eliminate all the fears, right?
                                         
                                         Of course, of course.
                                         
                                         So, yeah.
                                         
                                         I'm curious if people have planning time...

                                         Well, I know heavily loaded projects,

                                         huge clusters, with planning time enabled.

                                         So you mean pg_stat_statements.track_planning?
                                         
                                         Not planning time, sorry.
                                         
                                         Yes, yes.
                                         
                                         It's not only time, it's also

                                         I/O, right? I mean buffers

                                         for the planning phase.
                                         
                                         They are fine, but

                                         I'm curious about the opposite: when people

                                         were not fine and decided to keep it off.

                                         Otherwise, I will call it

                                         another bullshit default.
                                         
                                         Right?
                                         
                                         We have a collection

                                         of them, like random_page_cost, or this one.

                                         I think this one's even worse, because pg_stat_statements isn't on by default, so people already have to turn that

                                         on, and they've already agreed to pay the overhead price.

                                         Yeah, but it also means that in a default install,

                                         even if this parameter was on by default, it would still be off,

                                         because pg_stat_statements isn't on.

                                         So I think this one might be one of the easier ones to change.
                                         
                                         Yeah, yeah.
                                         
                                         Well, we need to check the discussion.

                                         I might ask our AI to check what the discussion was and provide a summary.
                                         
                                         But you asked how to monitor for that.
                                         
                                         I would expect to see issues there quite quickly.
                                         
    
                                         I'm interested in such issues.
                                         
                                         So anyway, I agree this is important.
                                         
                                         Observability for planning overhead.
                                         
                                         Also just P95, P99

                                         query latencies, which I would expect to suffer.
                                         
                                         But I also have in mind an approach we discussed with a couple of clients actually over the last few weeks.

                                         If you enable partitioning, it's worth checking with auto_explain in lower environments: enabling it for everything, just having it on, and

                                         exercising the functionality.
                                         
    
                                         It's

                                         about testing coverage, right?

                                         If you cannot reproduce the whole workload, of course

                                         there are chances that you miss something.
                                         
                                         But it's good. Then you can just
                                         
                                         analyze logs and see
                                         
                                         if there are plans
                                         
                                         which involve multiple partitions.
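                                         A minimal sketch of that setup (settings and values are assumptions; lower environments only):

                                             -- per session, or via shared_preload_libraries for everything:
                                             LOAD 'auto_explain';
                                             SET auto_explain.log_min_duration = 0;  -- log the plan of every statement
                                             SET auto_explain.log_analyze = off;     -- plan structure is enough here
                                             -- then scan the logs for plans touching more than one partition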
                                         
    
                                         But you need to partition properly as well.
                                         
                                         You need to have multiple partitions for all tables which are partitioned.
                                         
                                         Yeah, true.
                                         
                                         And if you were doing this and you had thousands of partitions,
                                         
                                         that would be a lot of information in the logs.
                                         
                                         And still no planning time information.
                                         
                                         Yeah, yeah.
                                         
                                         But we are interested in plan structure in this case.
                                         
    
                                         True.
                                         
                                         And in lower environments we usually have a smaller database; it's not 1,000 partitions there.

                                         For example, we say, okay, we want to have at least 10 partitions for each partitioned table.

                                         And then we just check plans.

                                         With some analysis (it's a separate question how to analyze them), we can holistically find bad plans.
                                         
                                         But I like your idea about pg_stat_statements as well. I wish it had, you know, more information about plans, not only timing and buffers,

                                         but, for example, how many objects were locked,

                                         or involved in the plan, or, I don't know, some additional things here.

                                         For example, just: was partition pruning involved?

                                         How many times was it involved in plans?
                                         
                                         You know, there's a column called plans.
                                         
                                         How many plans, right?
                                         
                                         Yeah, partition pruning question.
                                         
                                         Because partitioning is hard.
                                         
    
                                         It has so many surprising aspects, I would say.
                                         
                                         I have some sympathy with them.
                                         
                                         I think

                                         pg_stat_statements already has a lot of columns.

                                         Yes, I agree. Still not enough.

                                         It is difficult, yeah.

                                         Still not enough. We have interesting work happening in the area of

                                         monitoring, and we discussed it: not only pg_stat_statements but query analysis in general. Our holistic

                                         table of query analysis is so

                                         wide because we take every metric

                                         and have three derivatives for

                                         all metrics:

                                         time-based differentiation,

                                         calls-based, and also percentage.

                                         Except the column calls;

                                         for this column it doesn't make sense.

                                         Calls per calls is always one.

                                         So it's like four times more columns, minus one.
                                         
                                         It's so many columns to present.
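                                         A hypothetical sketch of those three derivatives for one metric (total_exec_time), computed between two snapshots snap_a and snap_b taken 60 seconds apart; all table names here are made up:

                                             SELECT b.queryid,
                                                    (b.total_exec_time - a.total_exec_time) / 60.0         AS ms_per_second,  -- time-based (60 s interval assumed)
                                                    (b.total_exec_time - a.total_exec_time)
                                                      / nullif(b.calls - a.calls, 0)                       AS ms_per_call,    -- calls-based
                                                    100 * (b.total_exec_time - a.total_exec_time)
                                                      / sum(b.total_exec_time - a.total_exec_time) OVER () AS pct_of_total    -- percentage
                                             FROM snap_a a JOIN snap_b b USING (queryid);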
                                         
                                         Yeah, we don't need them all the time,

                                         but I can find cases where almost every one of them is useful.
                                         
                                         So query analysis becomes harder and harder,
                                         
                                         wider and wider in terms of how many metrics we have.
                                         
    
                                         And it's still not enough.
                                         
                                         That's it.
                                         
                                         So yeah, good.
                                         
                                         I think we covered this topic quite well.
                                         
                                         I have one last question for you.
                                         
                                         Yeah.
                                         
                                         Do you ever change people's from_collapse_limit

                                         or join_collapse_limit?
                                         
    
                                         Oh, interesting.
                                         
                                         Honestly, in the last, for example, 12 months,

                                         I don't remember cases of it at all.
                                         
                                         Somehow.
                                         
                                         In the past, I had cases when we considered it.
                                         
                                         I think I even applied it.
                                         
                                         But it was so long ago.
                                         
                                         Yeah.
                                         
    
                                         I think the default's quite good.
                                         
                                         The default for both of them is eight.
                                         
                                         Yeah.
                                         
                                         I think it works pretty well.
                                         
                                         It works pretty well.
                                         
                                         And also, I don't know, when I see 12 tables involved in a plan,

                                         with joins, I'm already like: oh, the genetic optimizer is already here, right? It has a threshold of 12, no?
                                         
    
                                         Yeah, I only learned this today. But I think, because of from_collapse_limit and join_collapse_limit,

                                         GEQO never kicks in.

                                         Wow, okay. I have an old memory.

                                         Okay, me too. Well, or maybe I misunderstood it originally. But unless you raise those above the GEQO threshold,

                                         this is my new understanding based on reading the docs,

                                         unless you raise them above geqo_threshold,

                                         it will never kick in.
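                                         The relevant defaults, per the Postgres docs:

                                             SHOW geqo_threshold;       -- 12: GEQO is used only for join lists at least this long
                                             SHOW from_collapse_limit;  -- 8
                                             SHOW join_collapse_limit;  -- 8
                                             -- with the collapse limits at 8, the planner never builds a join list of
                                             -- 12+ items, so GEQO stays dormant unless the limits are raised above
                                             -- geqo_threshold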
                                         
                                         Well, yeah.
                                         
                                         Somehow, recently I mostly deal with different problems,

                                         where only a few tables are involved in plans.
                                         
                                         Makes sense.
                                         
                                         Yeah, but my experience is not everything, right? So I'm sure there are cases

                                         where collapse limits matter and are worth considering, but it just didn't happen to me

                                         recently.

                                         No, I was just interested because it's one of the things we look out for with high

                                         planning time. It's one of the things we recommend looking at in case someone has tweaked them; for

                                         example, if somebody has increased them too much, that can massively balloon planning time.

                                         Interesting.

                                         But I wanted to know, like, in the real world, for you, is it something you see? But it sounds like no.
                                         
    
                                         Yeah, cool. Well, there might be more cases we haven't even seen yet.

                                         Always.

                                         Always, yeah. Because the Postgres planner is complex.
                                         
                                         It's not as complex as SQL
                                         
                                         Server or Oracle Planners.
                                         
                                         Not yet, no.
                                         
                                         Or hopefully never.
                                         
                                         Already complex enough, I would say.
                                         
                                         So, yeah. Good.
                                         
    
                                         Oh, by the way, do you know,

                                         if we use pg_hint_plan...
                                         
                                         Oh, yeah.
                                         
                                         pg_hint_plan.
                                         
                                         Right.
                                         
                                         Because we tell the planner what to do directly.
                                         
                                         Does it help fight high planning time?
                                         
                                         I don't know.
                                         
    
                                         Good question.
                                         
                                         It would make sense that it would, but I don't know where it's hooking in.
                                         
                                         Yeah.
                                         
                                         I don't have this used in...
                                         
                                         I don't observe it used in production at all, only in non-production.
                                         
                                         So I also don't know.
                                         
                                         Yeah, same.
                                         
                                         Okay, good.
                                         
    
                                         Good question for our listeners.
                                         
                                         Yeah, it's homework.
                                         
                                         Let us know if you're...
                                         
                                         And also, the other thing I wanted to ask is if anybody is using some of the more complex extensions that hook into planning.
                                         
                                         Oh, some extensions affecting it, like Timescale?
                                         
                                         Yeah, I've seen more planning time issues.
                                         
                                         It just feels like a coincidence at the moment.
                                         
                                         I haven't looked into whether it's definitely causal rather than just, you know, correlation versus causation.
                                         
    
                                         But yes, that's the one.
                                         
                                         So I don't know if it's just coincidence,
                                         
                                         but I tend to see more planning related issues with the more complex.
                                         
                                         Maybe some extensions help improve planning time, not make it worse.
                                         
                                         Well, you mentioned pg_hint_plan would be the one

                                         that would make most sense for that.
                                         
                                         Well, yeah, it's just a random
                                         
                                         guess. I'm not sure at all.
                                         
    
                                         Me too. Well, let us know.
                                         
                                         There are definitely people out there who know more about it.
                                         
                                         Let's do a summary.
                                         
                                         Try to avoid a high partition
                                         
                                         number. Try to avoid
                                         
                                         a lot of indexes.
                                         
                                         Especially try to avoid the combination of these two

                                         problems. And use prepared statements.
                                         
    
                                         Right?
                                         
                                         Yeah.
                                         
                                         You don't think avoiding lots of joins as well?
                                         
                                         Maybe yeah.
                                         
                                         Again, like somehow I deal with very simple plans lately.
                                         
                                         I don't know.
                                         
                                         Yeah.
                                         
                                         Good point as well.
                                         
    
                                         Good point. I try to avoid a lot of tables, many joins, where we

                                         expect good performance, because I always hunt for a single index scan, or better, an index-only scan,

                                         always. And if you have even one join, even two tables, you cannot build an index on two tables. Indexes always belong to their own tables, right?
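                                         For example, on a single table (a sketch; the table and columns are hypothetical):

                                             CREATE INDEX users_email_idx ON users (email) INCLUDE (full_name);
                                             EXPLAIN SELECT full_name FROM users WHERE email = 'a@example.com';
                                             -- Index Only Scan using users_email_idx on users
                                             -- (heap fetches stay low only while the visibility map is up to date)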
                                         
                                         And materialized views, as we maybe discussed,

                                         are always out of scope for my recommendations;

                                         regular materialized views, I mean.
                                         
                                         So that's why I try to think about how we can simplify things:

                                         okay, nested loops and so on,

                                         and make queries sub-millisecond.
                                         
                                         But yeah, if there are many, many tables, planning time might be huge anyway. Three... I wanted to say three lessons: a low index count;

                                         a low partition count, but not super low, so partitions should be relatively small; and

                                         prepared statements. And I think these three might take years

                                         to implement

                                         in large projects.

                                         Yeah.

                                         True.
                                         
    
                                         Yeah.

                                         Unfortunately.

                                         Cool. Nice one, Nikolai.

                                         Thanks so much.

                                         Bye-bye.

                                         Catch you next week.

                                         Bye.
                                         
