Postgres FM - Advanced EXPLAIN

Episode Date: October 11, 2024

Nikolay and Michael discuss some more advanced topics around EXPLAIN, including some tips for complex query plans, some recent improvements, and an idea or two that might make it even better.

Here are some links to things they mentioned:

Michael's solo episode on EXPLAIN basics https://postgres.fm/episodes/explain
Our episode on auto_explain https://postgres.fm/episodes/auto_explain
Which cloud providers support auto_explain? (blog post by Michael) https://www.pgmustard.com/blog/which-cloud-providers-support-auto-explain
psycopg https://www.psycopg.org
Our first episode on BUFFERS https://postgres.fm/episodes/buffers-by-default
Our second episode on BUFFERS https://postgres.fm/episodes/buffers-ii-the-sequel
track_io_timing https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING
compute_query_id https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-COMPUTE-QUERY-ID
EXPLAIN: beyond the basics (talk by Michael) https://www.youtube.com/watch?v=sGkSOvuaPs4

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM. My name is Nikolai, Postgres AI, and with me, as usual, my co-host Michael, pgMustard. Hi, Michael, how are you doing? Hello, Nikolai, I'm good. How are you? Very good, very good. So it was my turn to choose the topic, and I chose it thinking we had never ever discussed explain. But then I googled and found out that you discussed it without me. So I thought, it doesn't count. Let's discuss it once again, but maybe going deeper in some places. Yeah, I covered the basics. And I think it's a good suggestion to cover more advanced topics. I think explain is one of those topics where you can just keep getting more advanced.
Starting point is 00:00:50 And there's definitely a level at which I'll get way out of my depth. But yeah, there's definitely room for a second episode where we go into more advanced topics, for sure. Right. And just for those who don't know: you spend most of your professional time dealing with plans, because pgMustard helps people analyze and improve particular queries, looking at explain plans, right? Yeah, exactly.
Starting point is 00:01:22 It's a visualization tool for explain, which then also tries to provide performance advice based on the plan, so we normally get people's more complex query plans. So I guess this is an area where I've got more practical experience than most. Obviously, when you're building something, as everyone knows, people that build healthcare software aren't always intimately familiar with everything about healthcare. There are certain things you get really familiar with, but you also end up having to deal with things that are not to do with that at all. But I've probably seen most variations of how performance problems can show up in explain plans. I have some blind spots for sure, but yeah, it's been a big thing for me. But also for you, right? A big part of your work has been performance consulting. Yeah, well, it's hard to do that without explain. For me, explain is not the number one tool, usually, except in cases when a
Starting point is 00:02:28 customer asks us to to help with specific query which happens as well but usually we deal with analysis like top-down analysis of whole workload and we start with pg start statements and weight event analysis performance insights or something and then we go down and when we identify first
Starting point is 00:02:49 aggregated query or normalized query in official pgstat statements terminology and only then
Starting point is 00:02:56 we find examples of bad behaving query then we discuss like how to
Starting point is 00:03:02 how to improve where to we need to a playground we a proper playground, and maybe we will want to discuss how to make it. So, explain plans, match what we see in production. And only then we go and check the actual plan and try to improve it and so on. Sometimes we have all the auto-explain, right? Auto-explain is a great tool. Yeah, we did an episode on that too.
Starting point is 00:03:29 Yeah, yeah, I remember. And yeah, if you have it, it's great. But it's not always so. It's not enabled by default. It requires some effort to be enabled. Although it's present, it's available everywhere. So I wish I saw it more often. Did you see I did a blog post on this this year
Starting point is 00:03:50 on which cloud providers provide access to Xplain and which parameters? I don't remember, honestly. I need to check it. I'll share it. It's not the most interesting read, but it's like a good reference.
Starting point is 00:04:02 And I find it useful for if somebody mentions a cloud provider to me like one of our customers so everyone has it right yes pretty much not every not quite everyone but like pretty much everyone and most of them provide access to most parameters to explain and also explain have a bunch of parameters and it was i found it hard to check which ones offered which parameters and i've got good news as parameters and it was i found it hard to check which ones offered which parameters and i've got good news as well actually i just i got a nice direct message from somebody who works at aws and the one the parameters they didn't offer there's like two
Starting point is 00:04:36 or three that they didn't offer they've submitted a pull request for so i'm looking forward to the next release um of the rds, which I think will include those. That's good. That's good. So, yeah, of course, I will also deal with explain, but it's just one of things and we probably don't dive too deep as you do. So where do you want to start this discussion? It was your suggestion. What do you want to start?
Starting point is 00:05:06 But you're an expert, right? Okay, all right. Maybe. I'm not sure. I still feel really uncomfortable with the word expert. I'm feeling very uncomfortable with the word analyze. It should be execute, not analyze, right? In the explain comment.
Starting point is 00:05:21 I mean, naming. I've been thinking about naming for a while with this because part of our tool, we give people something to copy and paste so they can easily copy it and put it in front of their query if they're using a like P SQL or an editor or something. and longer the more parameters that get added to explain the more like and and the more information the better for when you're using a tool if you don't have to worry about it getting the output getting long and your your visualization tool should be pointing people at the right place anyway the amount of data is like the more the better really the more information you get the more likely it is you can point out the problem however that uh it's become like explain and then in princess analyze buffers verbose settings uh format json and there's more there's like wow there's now there's serialize and memory but all most of these are off by default so you have to specify all of them to get all of
Starting point is 00:06:21 the information so analyze is one of them but the reason i was thinking about naming was i wondered if we could i was wondering about suggesting something like explain all but the problem with that is there's so many like pieces of advice out there in the wild about when you run explain analyze be careful because your query will be executed you know if that if your query is data modifying it will modify the data if your query's data modifying, it will modify the data. If your query's long, it will take that amount of time to run. And all of that advice would become,
Starting point is 00:06:52 not obsolete, but if we change the name, all of that advice becomes difficult. Like, you have to go back and change all of the advice, or you have to add, if you want to explain, analyze, or explain, or... For five years,
Starting point is 00:07:03 the old word could be supported, but it's so confusing to explain people that analyze is actually executing the query, and it's not the same analyze as analyze table name, which also affects probably plan, but you need to understand this. If you run analyze table name, probably after that you will see different plan in
Starting point is 00:07:25 explain and explain analyze. But it's not because these two words are the same words. They are very different words. Remember we discussed Postgres 17. It was confusion resolution. I wish this confusion one day
Starting point is 00:07:42 would be resolved as well. I agree in principle. I just think in practice it's going to be painful. Why painful? The word execute is not used yet. First we introduce synonym, and then in five years we ditch all the words.
Starting point is 00:08:03 It's used in prepared statements, right? Like prepare, execute. Ah, yeah. But, well, execute means execute. And analyze and explain also means execute. Okay. True. By the way, let me confess about something related to prepared statements.
Starting point is 00:08:20 A few weeks ago, we had an episode about planning time, right? Which also related to our today's topic because planning time we check using explain mostly. Of course we can check it in PGSTAR statements, but again, not everyone
Starting point is 00:08:39 enabled track planning in PGSTAR statements. So usually we just check in auto-explain or regular, manually. In ad-hoc way, we check using explain. Not auto-explain, sadly. It cannot show planning time? Wow. No.
Starting point is 00:08:53 Okay. Good to know. But in PitchDestinedStatements, if track planning, yeah. Yeah, if track planning. So if we, for example, deal with partitioning, with high number of partitions, planning time increases, as track planning. So if we, for example, deal with partitioning, with high number of partitions, planning time increases, as we discussed. But I just blog posted about it,
Starting point is 00:09:13 and right after that, we had a discussion on Twitter, and it turned out that my experiment was wrong, and my ideas, not not fully wrong but not purely accurate because planning time is affected only in the very first time when query is executed in a session if you execute it once again
Starting point is 00:09:36 we already have relation metadata cache rel cache and overhead becomes almost zero at least for simple for simple queries yeah simple queries that that um can prune down to a single partition or like a low number of partitions and it was david david roly yeah yeah yeah we point yes of course if there's no pruning then it's a different story of course uh with with a number of partition growth,
Starting point is 00:10:07 more work needs to be done at planning time. But yeah, I'm talking about partition pruning happening at planning time. The idea is only the very first execution suffers. It's so, I mean, I'm really impressed David noticed that so quickly. Yeah. I mean, he knows his stuff around this topic. I'm pretty sure he was involved in the fixes. I think it was in Postgres 12 to improve that exact metric.
Starting point is 00:10:34 So, yeah, kudos to him for pointing that out so politely as well. I think he said it was misleading, not wrong. And also, really good reminder. Do you remember when we had melanie on to talk about benchmarking how difficult benchmarking was to make sure you're measuring the thing you think you're measuring yeah it's just a good reminder like it's these things are so difficult but by publishing them by publishing your findings and by the postgres community being so open to be like to correcting us when we're wrong, we've all learned something.
Starting point is 00:11:06 Exactly. It was very good. And I cannot agree it's wrong because there was no mistake there. There was a problem that the experiment didn't do next step, which of course I should do. So actually misleading is a good term here. So I extended blog post like right away. And it turned out if like just, if you use just explain, no execution, no analyze word. If you just use explain, you see you add partitions.
Starting point is 00:11:37 We checked from zero, 100, 200 and up to 1000. And you see using explain, explain select primary key lookup and partition key basically lookup and you see first of all you see index scan only one partition so partition pruning is working at planning time but planning time
Starting point is 00:11:58 grows linearly with number of partitions and number of planning buffers also grows linearly for this kind of query, and it was range partitioning. We didn't check others. But once you run X-Pen once again in the same session, only three buffer hits always, regardless of number of partitions.
Starting point is 00:12:21 It's also good relief, right? So you think, oh, it's great. I'm going just to relax and I can have 10,000 partitions for simple queries like this. Of course, complex queries will be different, I'm quite sure. But if partition pruning works at planning time, I'm happy, right?
Starting point is 00:12:43 It just tells me I need to take care of proper connection pullers. So I need to have them always.
Starting point is 00:12:53 And interesting that the problem with our discussion and my ideas a few weeks ago when I said like,
Starting point is 00:13:00 oh, if you have overhead of planning time, which obviously it's not, it's only for first execution planning. In this case, just use prepare statements. But this is definitely maybe not only misleading,
Starting point is 00:13:14 but also wrong because, as I actually learned in detail, if you use prepare statements, you see that partition pruning shifts to execution. So the generic plan includes all partitions, right? Because we cannot cache the plan which is dealing only with single partition in this case, but what if we need to execute the prepared statements with different parameters, right? We need different partition probably. So the cached plan includes
Starting point is 00:13:48 all partitions and you run explain for prepared statements for specific parameter and you see that all that execution time, you see subplans removed, right? This word in explain. This is interesting, which means that
Starting point is 00:14:04 overhead shifted to execution. It's not good, right? But again, overhead, which overhead? If we have real cash here, all good, right? So it means that for high number of partitions, not prepared statements matter. They can help, but they don't help fully because of this shift of partition pruning. But what actually helps a lot are pullers. We have pullers in all heavily loaded projects.
Starting point is 00:14:33 And if you are preparing to have huge tables, like we always say, if you exceed 100 gigabytes, you need partitioning but it also means you do need proper proper connection pooling and avoid the initial overhead of of planning basically with high number of partitions planning will be not good right yeah so back to explain So, back to explain. Yeah, question back to explain, kind of linking them. When you did the second explain, extracting the buffers and things, did you use format JSON for that and parse the output? Or what did you do? It was in, yeah, it was text, regular. Yeah, just parsed it.
Starting point is 00:15:20 Settings, yeah. Yeah, and it was done by our our AI so it it doesn't use JSON at this point just we we don't want
Starting point is 00:15:30 very bloated format and yeah and actually yeah we needed to fix how it works because right now it works through
Starting point is 00:15:38 through CyclePG so Python and yeah it supported so if you run a
Starting point is 00:15:45 couple of explains in one shot, it received only one result. We fixed that. So now it sees both results.
Starting point is 00:15:52 And second result is always, in our case, three buffer hits. So planning time remains way below
Starting point is 00:16:00 one millisecond, like I think maybe 10 microseconds or so. So super tiny, almost zero. And the bottom line
Starting point is 00:16:10 here is that, you know, usually we say, if you run explain, analyze, buffers, of course. We had a couple of episodes about it, right? If you run it, always run it a couple of times, maybe a few times because of caches and if you run it, always run it a couple of times, maybe a few times because of caches.
Starting point is 00:16:26 If you want to check timing, if you're too obsessed by timing, not my case, I'm obsessed by buffers, not timing. But you need to run it multiple times to check the second execution because first time probably everything is not cached, data is not in the buffer pool and not in page cache so run it twice that's a good piece of advice right but what we just discussed means also that even for planning run it twice in the same session right because maybe you observe some a lot of work if partition if partitioning is involved, especially a lot of work related to lack of real cache.
Starting point is 00:17:11 I've seen this. The other weird thing I've seen, I don't know if you know the answer to, is you can do a simple ID lookup, like primary key lookup, you will get planning time but no planning buffers reported even if you include the buffers thing Yes What's that? It's a bug I think Yeah, inside my team last week since we worked on this
Starting point is 00:17:38 blog post, a couple of other many people actually were involved but this particular problem was very annoying very annoying so like why buffers are not reported in planning okay now they are reported and we cannot understand on what it depends on on what it depends right so once we understand we're probably it will be a bug report already, right? Because something wrong is there. Because we requested.
Starting point is 00:18:08 It's weird, but I mean, the text format's hurting you there because by default in the text format, if things are zero, they don't get reported. It's like roughly a general rule. You think so? Whereas in JSON format, you'd still get buffers reported, but they would have zeros it's not much better in our case it should be reported because it always was something like
Starting point is 00:18:31 some number of hits but oh yeah so that's a different bug right like all i meant is if planning buffers are zero but i don't understand how planning buffers can be zero if it's like if it's actually doing work if it's actually you know picking which index to scan so I found that in the case of
Starting point is 00:18:49 prepared cached plan if plan is cached there is still some planning time but I suspect
Starting point is 00:18:56 this planning time occurs during execution because of these supplants removed and partition pruning happening
Starting point is 00:19:03 I'm not sure by the way this is some nuance. I think maybe like parsing might count as planning or like there's, I think there's other steps. Well, I'm looking at it right now. I'm looking at the plan, which checks execution of with force generic plan, plan cache mode set to forced generic plan and we check execution for prepared statement, it's cached,
Starting point is 00:19:28 the plan cached, and we see no buffer planning related buffers reported at all, like, it's missing.
Starting point is 00:19:36 But planning time is like 38 microseconds. And partition pruning happened at execution time. So I wonder what are those 38 microseconds.
Starting point is 00:19:49 If plan is cached, maybe just additional checks. I don't know. It's interesting, right? You know there's multiple stages. I can't remember all of them, but one of them, for example, is parsing. One of them is rewriting, so you might rewrite the query. I think some of those get bucketed at the moment as so if probably you're right there are no buffer hits
Starting point is 00:20:11 involved at all and that's why it's not reported at all but i've seen this i've seen zero reported when it's like not a not a plan cache like if it's a primary lookup that you've run well yeah so where there is some work for sure right yeah right or like in my
Starting point is 00:20:30 head it would make sense that there's work anyway it's an interesting quirk and even if
Starting point is 00:20:36 it's not a bug the fact that you raise this topic and we discussed this topic inside my team it's
Starting point is 00:20:42 it's already something right which is not clear. And yeah, if someone who watches us who is more experienced than us, please tell us. This is a super interesting piece of information, right? Yeah.
Starting point is 00:20:58 So what I wanted to say, you need to check the execution of explain for planning for the second time inside the same session just to exclude the fact that you observed overhead from lack of real cash. This is lesson I learned last week. And this is correction to ideas I expressed a few weeks ago in the planning time episode. And there's some good practices like that help avoid mitigate this kind of issue anyway like normally people when they're benchmarking or or testing things will run something five times and maybe take the best of the five or you know running things
Starting point is 00:21:38 multiple times i don't know if you've ever seen some sports do like uh like diving for example they give people scores out of 10 and then they cross out the top scoring judge and the bottom scoring judge yeah and yeah actually we did do that we did it but we like due to some reasons it was not like we we plan to improve this and this kind of analysis should be done in exactly that way. You run multiple times, you check which data points are way off and exclude them and then you take average. That makes sense totally. But in this case, just how we did it, it was a single session experiment and we ran all explains. So bottom point, if you run explain five times and every time it's a new
Starting point is 00:22:26 connection right it's still the problem persists yeah you need to run it multiple times inside the same connection and then and then i would not like i would not agree with excluding the first one due to lack of real cash because it's also an interesting fact that with growing number of partitions the very first execution like it can reach dozens of milliseconds and even hundred milliseconds it's a lot so it means
Starting point is 00:22:55 connection poolers are needed and they need to be configured properly if you have thousands of partitions otherwise it will be very often new connections established, could execute the first time, huge number of partitions,
Starting point is 00:23:10 and you have penalty of dozens of milliseconds for heavily loaded projects. It's a lot, maybe 100 milliseconds. It's like huge overhead. And this fact also interesting, right? So we learned two facts. Overhead is huge, linearly growing for this
Starting point is 00:23:25 particular simple, very trivial case. And there is basically an overhead for subsequent queries. Wow. This is like a sequence of surprises for me in this work. I like it, actually. So yeah, let's maybe switch to another topic. Maybe you have some ideas for what to discuss next. Yeah, well, when I think of advanced explain, I'm thinking more like advanced query optimization. Maybe you're familiar and got good read and explain plans when it's a relatively simple query, maybe some normal scan types, normal join algorithms you're somewhat
Starting point is 00:24:08 familiar with it you can work your way around that but i've spent several years looking at this and i still from time to time see operation types that i've never seen before and and things get complex like in just in terms of how many operations there are or things that complicate things like ctes and init plans and kind of things happening which all the things happen in uh things happening at the same time i think uh we we had haki benito on relatively recently talking about how if you have multiple c, those happen at the same time and independently of each other. So that's confusing. And how those appear in Explain is interesting.
Starting point is 00:24:53 But because it's shown as a tree, they have to kind of work out different display mechanisms for showing that that's not necessarily misleading because it's hard to say how you should visualize that. But that's what I'm thinking in terms of advanced topics around explain. How do you interpret those? And also, what can you ignore? If you want to do this, normally your aim is, can I speed this up?
Starting point is 00:25:20 And as things get complex, you often want to simplify. What can I do to discard a bunch of this information like which bits of this are already fast that i can ignore how do i maybe i can run a simpler query that's only the part of the the plan that's problematic like that's what i'm thinking in terms of more complex topics right i agree Reading complex plans. This is it. Yeah, I I read, quickly scan buffers, and they are cumulative, like they are accumulated to the root, right? And the root includes everything. So you can see and go from root to leaves, basically.
Starting point is 00:26:20 It's like upside down tree, right? So we top-down analysis again we we okay we understand number of buffers understand how many megabytes or even gigabytes sometimes it is if you multiply by the block size which is most most cases eight kibibytes right and then we go down and see where exactly this huge volume came from this is my default type of work with plans including execution of course because if you don't have execution you only have cost
Starting point is 00:26:54 and buffers are reported only for planning as we thoroughly just discussed so in this case usually it helps in most cases it helps and uh i i wish i also was able to see uh log related information right and explain plan yeah right like weights yeah it's difficult to see how they would do that but not not necessarily ways what do you mean by ways what do you mean by locks if uh we yeah yeah actually
Starting point is 00:27:29 maybe you're right yeah what what's happening under the hood slightly but it's already super complicated output right so yeah and and it's like there's macro versus micro issues here if you're on a test environment where you're the only person there, it's much less likely that you're going to be incorrect. Originally, when I said locks, I was meaning heavy locks.
Starting point is 00:27:56 So, you know, at planning time, all indexes are locked. And all tables and all relations are locked. And that's a lot of overhead. I wish I saw it. It's interesting information.
Starting point is 00:28:10 Accessory lock for all of them. And if it's update, okay, what kind of lock I have here. Oh, interesting. Yeah, that's not what I was thinking at all. Yeah, you thought about lightweight locks. Well, no, I was... Or weight events. No, I was thinking, at all yeah you you thought about lightweight locks yeah well no i was oh even a different or wait events no i was thinking like if you easiest experiment to show this is
Starting point is 00:28:33 start one transaction and maybe do some ddl or like yeah drop a column or something yeah anything that's gonna add a huge um like a really heavy lock on that table then when then open a second transaction just do a simple select query on that table go back to the first transaction and commit go back to the second one and see that your your query was just waiting that entire time if you ran that query with explain in that second if you're committed it's not waiting anymore well but the execution time of that explain will be oh yeah yeah yeah yeah yeah yeah yeah you're right and this execution time is it can be explained if we saw like basically a pie chart or something or like distribution um or like There will be no node.
Starting point is 00:29:25 There will be no operation in that explain plan that shows that it was waiting on an exclusive lock. It will just show that one of the operations was extremely slow. Simple wait event analysis for this particular session would help. Yes. And explain technically could show it. It doesn't come up that often to me. Maybe it's like one of the blind spots
Starting point is 00:29:45 because I don't see those, but I don't see plans that have these really unexpected bottlenecks. I would like to be able to, not by default, but I would like to be able to see these wait event analysis and have locks as well because I want to understand this query,
Starting point is 00:30:02 which locks it is going to acquire when I go to production, right? When execution finished, what logs were acquired? Right? Yeah, I think interesting extensions, but it's already so huge. Yeah. And as I mentioned, right, we're getting more and more, I think this is part of the reason people are adding more parameters to explain. So write-ahead logging information. It should not be by default. It's too much. Well, I actually have a proposal here.
Starting point is 00:30:33 I haven't suggested it on any of the mailing lists yet, but I would like your opinion. I thought about adding them by default when you include the parameter verbose. So if you do explain analyze verbose give me everything like you just don't want to update your your website yes yeah i don't want to update that snippet for sure but also i want people to only have to remember to do that and they'll get whatever postgres can give them. Verbose means like really long.
Starting point is 00:31:06 It means everything. It kind of means like I'm okay with loads of information. Give me it all. So it's already kind of in the meaning. So I don't, anyway, I haven't proposed it yet, but that would also come with buffers. So I thought you might be happy. And also if we talk about what's missing
Starting point is 00:31:23 and what you cannot get from explain, I also like at some point, maybe some extension of this could be imagine if we see operations at disk level as well. And we understand not only hits and reads for the buffer pool, but also hits and reads for page cache, for example.
Starting point is 00:31:40 And maybe CPU analysis as well for this particular execution. If it is explain analyze LICE, right? That's a good point. It's not exactly what you're talking about, but loads of people don't realize because it's not an explained parameter that you can, with track IO timing on,
Starting point is 00:31:54 you can start to get some insights into the disk level performance. And that's improving in each. Like there's recent, even in 17, we got some improvements there. So like buffers are split into shared, local, and temp. IO timing wasn't initially and is now split into all three. Yeah, that's great. But I'm talking about like basically PG start K cache, but for explain.
Starting point is 00:32:21 I understand, but the timings are a good start, right? I agree, yeah. And actually, I think itings are a good start, right? I agree, yeah. And actually, I think it's possible to have all of this right now already, but you need to connect many pieces. And for example, it could be a tool
Starting point is 00:32:35 which runs explain, but also looks at wait event analysis for this PID in Pagesat activity and in logs as well. And what else? Like collects everything and big report for a particular query.
Starting point is 00:32:48 And also physical. Physical, if we have physical access, we can do it from proc. Knowing PID, I think we had prototype for our old bot. So it's like you just counters from proc. Because if you've got the query identifier, which is the normalized one that you can match to like PGStats statements, I'm imagining you could start to like tie all of these things together. Yeah.
Starting point is 00:33:18 A lot of stuff is possible. Who is doing this, by the way? Maybe someone is doing this. I don't know the monitoring tools well enough to know who's doing some of this micro level. But if they are, let us know. If you're using one or if you're building one that does, that would be cool to hear.
Starting point is 00:33:37 To be honest, though, this goes back to some of the conversations we've had about the macro to micro zoom. Once you've done your macro analysis and you've identified a few problematic query like a the main consumer if you've got a few bottlenecks explains a really good tool for doing that micro like once you once you've got some a sensible set of parameters that you know will be slow it's not that hard like it's not that hard to reproduce normally the difficult thing is then going through a 200 line explain plan and you
Starting point is 00:34:12 asked what can help i think we've put a lot of effort into like helping with that stage and a lot of the other tool providers have as well one thing we do that i think it's really helpful is if a sub plan is fast like as a proportion of your query we we collapse it by default we hide like we don't hide it we just say there's like there's 20 nodes here everything is fine there probably you don't want to look here which means that like a 300 line plan can go down to like 20 that's that's cool but that like those kinds of things i think can really help initially. It's not good for a hacker.
Starting point is 00:34:47 It's not good for a Postgres hacker. It's like trying to work out the cost model for a new feature. But for somebody just trying to speed up a single query, you don't need to look at the 200 partitions that were never executed or that took zero milliseconds. Yeah, and you mentioned query ID. And for example, I think it was last week we were working with one of the customers.
Starting point is 00:35:11 We saw that AutoExplain was used and it's great. And there was a task to analyze a lot. And we still plan it. And I'm very thankful to you that we have integration with AI right now, and your system helps our system. And I plan to perform some massive analysis at larger scale to be able to do it.
Starting point is 00:35:33 But the interesting thing is that if you want to analyze a lot of queries from AutoExplain, you do need some... You basically need to compute... It's called compute query ID parameter, right? You need to turn it on because by default it's auto, and for auto-explain, it will not work. It does if you have verbose on. Okay, in auto-explain.
Starting point is 00:35:57 Yeah. Okay, in our case, we didn't have verbose on, I think, right. But only as of 16 i think ah it's better to turn it on actually maybe yeah yeah yeah question of overhead and let's by the way talk about overhead in a second but if you imagine you have like a lot of query cases various cases, various queries from the log. Presence of query ID first gives you understanding how many different normalized or aggregated queries you are dealing with, right? And second, you can then join this
Starting point is 00:36:37 information with Pitches of Statements and go to macro level back and understand, okay, this query is not very influential because it's a tiny part of, or maybe vice versa, it's influential, although we see only a few occurrences in our slow log because we have threshold, right, by time, auto-explain mean duration, right, query mean duration.
Starting point is 00:36:59 We usually set it to one second, for example, right? And if we have only a few cases, but we see in workload, oh, this is actually 50% of all total time, total exact time. Oh, we do need to think about it properly, and influence of this optimization will be high. So let's do it. This helps understand when you have a lot of queries to deal with,
Starting point is 00:37:21 where to start. You want to start with most influential steps. So query ID is a good thing. And I'm glad we have it everywhere in explain, in the logs, in PGS statements, you can connect all the dots.
Starting point is 00:37:35 And PGS activity as well, right? So wait event analysis is possible just joining by query ID as well. If you run many, many times, right? So, yeah, great. What about overhead? I think it's also an advanced topic. I know you blog posted about it
Starting point is 00:37:52 and Ongris did. You talked about auto-explain, right? But explain itself also has overhead. Well, explain can have both, can under-report and can over-report the timing. Dying ago, I'm not a fan of dying ago.
Starting point is 00:38:10 Well, but bear with me. I think people focus a lot, there's different things. The Ungress thing was about the queries being slower. You actually pause some overhead by observing. And that's true when you're running explain analyze with time because timing is on with analyze by default but it's also true if you're running auto explain like there's also some overhead to measuring and to logging like if you log a lot of plans that's some overhead as well so there's there's that there's those overheads but there's also the the possibility that it under reports which i think throws them
Starting point is 00:38:51 so if you so this is one of the reasons they added the parameter serialize if you are doing a lot of work after the query is executed but in order to send the data to the client. So I, like, nobody really cares about the server-side time. Like, that's not the big, like, the server-side time is important, but it's what we get from explain and analyze. But what they normally care about is the user experience,
Starting point is 00:39:22 like the full client-to-server-to-client time. I cannot agree here, but okay, go on okay well what what i mean is if you if you if you're getting a report for my dashboard is slow there's no point only speeding up the server side time like if you're sending yeah we need to look at amounts of data yes um and part of that work is so serialize is part of it it's getting the data ready to transmit and then there's the network overhead as well like i've actually trans of actually sending all that data so yeah i guess i guess it's not it's not network cost it's not like transmission cost it's uh cost of not cost cost maybe not a good word that's overloaded here right uh it's it's time right or cost it's or it's cost it's cost serialized serialized is reported as time and
Starting point is 00:40:14 amount of data what what cost okay i actually don't think it has costs relate like i think it's one of those things that gets reported in the summary section. So just-in-time compilation and planning. It's about civilization, not transmitting. Yes, it's a step before getting the data ready to transmit. Still everything on the server.
Starting point is 00:40:39 Yeah, for example, uncompressing or decompressing toast data. And it's Postgres 17 feature, right? Yes. So it just goes deeper to understand what's happening on server side, like execution and what part of it is associated with serialization, attributed to serialization right well and not just that before if you ran
Starting point is 00:41:07 explain analyze select some data that had been toasted from this table you would get a really fast execution plan because it wouldn't do the serialization like it would say this executes really quickly so it wasn't just that you couldn't see the the time it was like a query that might take 100 milliseconds when you run it from psql if you put explain analyze in front of it it could run in one millisecond or 10 milliseconds much much much faster so because it didn't have to do that expensive step i see yeah yeah so this is one of those cases that used to be and not just used to be like by default this will still be this is one of those cases that used to be. And not just used to be, like by default, this is still the behavior in version 17.
Starting point is 00:41:50 If you run explain, analyze, some basic... So like star from blah, blah, blah. Yes. Before 17, we didn't see this part at all. Even in 17, you don't see it unless you also explicitly ask, explain, analyze, That's bad. I like defaults. That's just what I've been saying. Yeah.
Starting point is 00:42:08 Yeah. And what about verbose? Verbose includes it, right? No, it doesn't. No? This is my, but this is my proposal that verbose should include all of these additional parameters. It should, of course, I agree. Yeah.
Starting point is 00:42:23 Well, okay. It is as it is right now so and it's i have to actually say explain is awesome explain in postgres in particular is the best i've seen of any database that i've worked with um other day like my sequel for example is catching up like they are they got explain analyze in version eight but the d the amount of data we can already get is way like it's beyond the other databases i've seen so it is good it's just always can be better right yeah yeah yeah if you if somebody is watching us thinking oh a couple of haters right haters of it's just a lot of stuff yeah yeah but a lot of stuff is helping all the time
Starting point is 00:43:07 it's true do you want to discuss overhead from like buffers option i couldn't when i i did some really basic one really basic experiment where i tried to where i did very very small queries but i couldn't get any additional overhead i couldn couldn't, there wasn't like a measurable extra overhead of requesting buffers. I think you said you did an experiment once that did show it, but I don't remember. On timing, it cannot, like buffers are like, they are basically invariant.
Starting point is 00:43:36 Of course, they can be hits or reads or they're like moving back and forth. But if everything is cached, it's very stable. Like you're reading the same data, same query, just select. Multiple times you will see the same number of buffers all the time. That's one of the reasons I like them. But if you include buffers, I did see in the past, it was some time ago, that timing is increasing.
Starting point is 00:44:04 Because a lot of buffers need to be reported and counters inside need to be presented. If it's a complex query, in many places, this overhead adds up and the timing without buffers option is very different compared to with buffers. I've seen, yeah, I've seen single queries where that's true and I've seen, I think Lucas Fittel from PG Analyze has done some
Starting point is 00:44:31 examples of this. I think I've only seen it in queries that have nested loops with lots of loops. That's the example everybody turns to. For example, join with nest loop. But the reason I don Yeah, but the reason I think it's slightly overhyped
Starting point is 00:44:50 is when you're optimizing, you don't always care. If the explain analyze is four seconds, or maybe let's pick 400 milliseconds, and your real execution time is 300 milliseconds, if you can speed the 400 milliseconds up to 50 milliseconds, the 300 might go down to 40, and directionally, you're still looking for the same things. And by asking for buffers information,
Starting point is 00:45:16 you're getting more information, you're more likely to be able to optimize your query. So I don't care as much as most that it's slightly different. I only care that including buffers doesn't change buffers. That's it. That's it. So for me, when I start working with buffers, I'm less concerned about timing.
Starting point is 00:45:36 Focus on buffers, optimize, optimize, optimize, and then go back to result and see, oh, timing has improved drastically because we reduced number of buffer hits and reads. Right, so that's it. This is very, very common path for me. Just reduce because you like sequential scan, lots of
Starting point is 00:45:53 buffer hits and reads. We are at an index, okay. Couple of, oh, three, four, five buffer hits, that's it. Time to look at timing. Oh, of course, expected a thousand times faster right i like i like you know i like buffers a lot but for me they'll always be secondary because
Starting point is 00:46:14 just because if you look at timings first and and last that's the main aim like the main aim is to get things faster and yes like, in maybe 99% of cases, it's going to be I.O. that's the limiting factor. But in some queries, there'll be a sort in memory. There'll be something that doesn't report extra buffers in explain. There'll be just in time compilation. Or there'll be some limit, like time compilation or there'll be some limit like maybe trigger timing just time compilation is off but all i'm saying or planning time like plan
Starting point is 00:46:51 actually planning time new you now do get buffers but uh trigger times just in time compilation there's like a few other examples sorts in memory hashes in memory like if some of these other like right this bottlenecks don't report buffers i would just why not like look at timings and then buff like i i also i almost always see query plans with both these days so it's that's great yeah yeah that's great that buffers are present if i'm not at some point i started saying if there are no buffers, I'm not dealing with this query. This is what I say to customers. Go and give me buffers because we are missing super important.
Starting point is 00:47:33 Okay, if you put it to second place, this place should be always present. I'm sure we've talked about this before. I think that's probably enough. I've got a talk that I gave a few years ago called Explain Beyond the Basics, which I would have thought we had covered some of the bits of, but actually I took a completely different path. I went down some of the less common types of issue that you can spot with explain, which we haven't really talked about. So I can include that as like a, if anybody wants to know a bit more of the specifics it's a bit nicer like that format is good because you can actually see examples of explain plans so it's probably it's probably a bit better video wise and yet the the reason i see buffers more often now is everybody suggests adding at you but also some of the tools some of the other tools as well like depeche added
Starting point is 00:48:21 it to the default thing that he suggests people get. So there's loads of people. That's news to me. I remember it was missing and it was a big disadvantage. He added it. It's great. That's good. That's good. Yeah.
Starting point is 00:48:36 Anything else you wanted to add? I think it's enough. Me too. Nice one. Thank you. See you later. Bye. Thanks so much, Nikolai.
Starting point is 00:48:45 Take care. Bye.
