Postgres FM - Wait events

Episode Date: March 10, 2023

Nikolay and Michael discuss wait events — what they are, why we have them, and how to use them to help diagnose performance issues.

Here are links to a few things we mentioned:
Wait events table (docs)
9.6 release notes
PostgreSQL Scalability (blog post by Alexander Korotkov)
Wait event analysis in pganalyze
auto_explain
Database Lab Engine
track_io_timing
pg_test_timing
pgBadger
Coroot
Okmeter
pgwatch2 Postgres.ai Edition
pg_wait_sampling
pgsentinel
Datadog
AWS RDS docs for PostgreSQL wait events
pgMustard newsletter
PASH Viewer
pgCenter
Intro to query optimisation episode
Monitoring checklist episode

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today? Hi Michael. Finally, it's my choice, and I chose a very interesting topic: wait events and wait event types. We have two columns for this in pg_stat_activity. I think it's still not very appreciated by backend engineers, first of all because tooling is still not perfect; in many cases most monitoring systems don't touch this area, but they should, actually.
Starting point is 00:00:32 And I think it's worth knowing that this thing exists and how to use it. So, wait events. I like it. So every time I pick the topic, it's boring; every time you pick the topic, it's exciting and interesting. I see how it works. But boring topics are also needed, you know. I cannot say it's useless. Sometimes boring is even more useful. But in this case, we have a perfect combination of an entertaining and useful topic: wait events. Yeah.
Starting point is 00:01:00 So should we go into what they are first, maybe why they were added, times they can be useful, that kind of thing? Yeah. So I've just learned from you that it was added in 9.6. I thought it was in 9.5. It means 2016, right? Or maybe 2015. But I observed this, like, since my Postgres experience is quite mature, it's 18 years old already, I can tell you that for most of my Postgres life, we didn't have it. I personally used Postgres more and more and stopped using Oracle in 2004-2005. Before that I used it for a couple of years when I was working in some companies. And I just thought it's fine.
Starting point is 00:01:43 But then in 2014-15 some interesting events happened, and actually I'm not sure how these events were connected, but observing it from being in the US and visiting Russia, I saw that somehow it was connected to politics as well. A big, big move started in Russia: let's get rid of SQL Server and Oracle and use Postgres. It was after Crimea, the first invasion, invasive actions from aggressive Russia against Ukraine. And there was an obvious thought: let's use Postgres, and Postgres Pro was founded. And at the same time, also at Yandex, I remember I was surprised that already for a couple of years there was ongoing work to migrate from Oracle to Postgres, but for economic reasons.
Starting point is 00:02:37 They named actually two reasons. Economic reasons, because if you want more machines, you need to pay a lot to Oracle; and also support and how they react to issues: it's very slow. And if you take Postgres, you can fix things yourself. This was the second reason for them. But the economic reasons were also very important. So we had a combination.
Starting point is 00:02:59 Postgres Pro was founded with the idea, let's grow Postgres inside Russia. And it was related to politics. Definitely, they have clients which, for example... I personally cannot work with Postgres Pro since last February. I just don't talk to them, actually, because of their clients and connections and the whole idea of... I will skip it. I don't want to say let's keep this podcast out of politics, because this is also a mistake, I think. I agree. But I also don't want to spend a whole episode on this.
Starting point is 00:03:33 So what happened? A lot of interesting companies migrated from Oracle to Postgres. Yes. And Yandex was one of them. They wanted to. I think this is worth mentioning. And the reason this is related to wait events is that I was actually working on Oracle tooling around that time, a bit before. And it wasn't even performance-related Oracle tooling. And I was aware of how much Oracle users loved wait events and a product called ASH, Active Session History.
Starting point is 00:04:00 Yeah, exactly. So even I was aware of how much people love this feature in Oracle. And the idea of moving to a system that doesn't have some features that you absolutely love is tough. I guess it was a big driver for "let's add the feature so that we can migrate", or to make the migration less painful, or maybe after migrating, because it's so painful not being able to have them. Does that sound about right? Right, exactly. I remember because I was actually doing quite successful meetups in Moscow and St. Petersburg at that time. I stole some ideas from Josh Berkus, observing it in San Francisco, and brought it to Moscow. And on meetup.com, we quickly started to be in second place, below New York but above San Francisco. I was very proud of that.
Starting point is 00:05:01 But so people coming from Oracle experience, they started to say, you know, Postgres is great, but why don't you have this, right? They named several things. Among them was declarative partitioning: we cannot work without declarative partitioning, you have some partitioning with inheritance, but it looks so silly at this point. Let's have declarative partitioning.
Starting point is 00:05:23 And wait events: we cannot analyze performance well. pg_stat_statements is not enough. Log analysis is not enough. We need to have wait events. At the same time, we discussed that JSON was contributed around then, and a lot happened: Crimea happened, right? And Postgres Pro, and Russia decided to use more Postgres. It's important. But at the same time, JSON was added. And also RDS Postgres was created around the same years, like 2013, 2014, 2015, those years.
Starting point is 00:05:55 And observing all monitoring systems (I liked this exercise, to observe various monitoring systems), RDS implemented Performance Insights, making the bet that this is the default tool for query performance analysis in RDS: Performance Insights. And Performance Insights is Oracle's Active Session History, basically. These colorful graphs, which work with these wait events from Postgres.
Starting point is 00:06:19 But to work with it, you need to put it into Postgres first. And this was done, if I'm not mistaken, by Amit Kapila and Ildus Kurbangaliev. Ildus was working at Postgres Pro at that time. And it was released in 9.6
Starting point is 00:06:36 in 2016. Right? That is such a good point. It was a reaction to criticism. We should have it in Postgres. This was influenced by former or current Oracle users who started to work with Postgres. Declarative partitioning and this. Yeah, and I think some performance features as well, like parallel queries. Many other things as well.
Starting point is 00:06:56 I just want to name the big, very noticeable things. The Amazon insight is fascinating because Amazon, of course, are also a famous Oracle user themselves that migrated to Postgres. So maybe that's like something... To the Aurora version of Postgres, but yes. But still with Performance Insights. Of course. And still with wait events, of course. So it's, yeah, really, really interesting intertwining.
Starting point is 00:07:20 So actually, I don't think we've actually covered what wait events are and how they can be useful. Yeah, let's do it. We have documentation in Postgres, the chapter called cumulative statistics. But some parts of it are not about cumulative metrics. pg_stat_activity is not cumulative. It has timestamps and some kind of metrics. So, for example, process ID, query, some timestamps.
Starting point is 00:07:42 And it has two columns right now: wait_event_type and wait_event. wait_event_type is a more high-level thing, only roughly 10 types exist, nine maybe. And wait_event is a more fine-grained thing: within each type we have many, many actual events possible. And what can it give you? It can give you the knowledge that, okay, this process ID, this Postgres backend, is working with this query right now, but what is it doing? Is it doing some IO or some CPU work? Or is it waiting to acquire a lock on some row, for example, or relation, a heavy lock. It's called Lock, but it's a heavy lock. Or it's waiting for some lightweight lock acquisition, which is actually latches, an in-memory physical lock, right?
Starting point is 00:08:31 So we talk about locking some pages, in-memory buffers, buffers again, right? In this case, in wait_event_type you will see LWLock, lightweight lock. And this can be, sometimes, quite difficult to optimize and so on. I remember in 9.6, actually, I think it's maybe also connected to this observability improvement, there were big improvements in terms of how the buffer pool is working. I remember a post from Korotkov also, together with MySQL folks,
Starting point is 00:09:07 like, let's achieve 100 million TPS on select-only workloads on one machine. And to achieve that, they needed to improve the buffer pool behavior to get rid of long-lasting LWLock wait events. Like, you run pgbench and you see a lot of LWLock wait events. So they have this raw data in pg_stat_activity, and it's already cool.
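For example, a quick ad hoc look at that raw data, those two columns for every active backend, is a minimal query like this (just a sketch):

    -- What is each active backend doing or waiting on right now?
    -- A NULL wait_event usually means the backend is actually running (on CPU).
    SELECT pid,
           state,
           wait_event_type,
           wait_event,
           left(query, 60) AS query
    FROM pg_stat_activity
    WHERE state <> 'idle'
      AND pid <> pg_backend_pid();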
Starting point is 00:09:29 But to work with it for end users, you need additional tools right now. Because despite the fact that this chapter is called cumulative statistics, Postgres doesn't provide you cumulative metrics for this type of information, only the current view. So you need to use some sampling to analyze it. Yeah.
Starting point is 00:09:49 You mentioned actually quite early on that a lot of monitoring tools don't offer this. I was reading a really good blog post by the team at pganalyze, who do, who added this a few years ago. And I'll share that as well because I think it's quite useful, even if you're not using that product, to get an understanding of the kinds of things this can do. Because once you're sampling it, you can see those cumulative numbers and you can see what your standard workload looks like, and then if you've got any anomalies you can start to dig into what is causing those, and you start to see patterns. Yeah.
Starting point is 00:10:19 In general, I'm usually not fully satisfied with any monitoring, because I think only a few implement very good query analysis in general. It should be done systematically, with many metrics involved, and you should be able to analyze many dimensions of query analysis, including total time, IO numbers, buffer numbers, and so on, averages, total numbers, and many things. And wait event analysis is a very important part of query analysis. It's very orthogonal to what pg_stat_statements or log analysis may provide. And only a few have implemented it well. RDS is a good example. I don't remember the implementation in pganalyze. I need to revisit.
Starting point is 00:10:59 Maybe it's also interesting. But I would like to go back to the times before 9.6, when we didn't have wait events in pg_stat_activity. What happened? Okay, we have some performance issues. We identified some queries which are concerning. For example, we see they are taking three seconds, like at night when, for example, a cron job is running.
Starting point is 00:11:21 During the daytime, we check these queries manually right on production, because we don't have Database Lab installed. Okay. And we see three milliseconds, like a thousand times faster. What's happening? We include buffers. We see only a few buffers. Okay. What's happening at night? We enable, maybe, auto_explain.
Starting point is 00:11:40 auto_explain, yeah. auto_explain with buffers enabled, to understand IO numbers. Okay. We see three seconds. I'm describing a recent real case, actually, as you maybe already guessed. So we understand, okay, three seconds and only a few buffers, like 200 buffers or maybe like 100 buffers involved. What's happening, right? We don't understand.
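As an aside, the auto_explain setup being described here is only a few settings; a minimal sketch, with purely illustrative thresholds:

    -- auto_explain ships with Postgres as a contrib module.
    -- Adding it to shared_preload_libraries needs a restart and overwrites
    -- the existing value, so append rather than replace on a real system.
    ALTER SYSTEM SET shared_preload_libraries = 'auto_explain';

    -- Log plans (with buffer counts) for anything slower than 1 second.
    ALTER SYSTEM SET auto_explain.log_min_duration = '1s';
    ALTER SYSTEM SET auto_explain.log_analyze = on;
    ALTER SYSTEM SET auto_explain.log_buffers = on;
    SELECT pg_reload_conf();  -- the auto_explain.* settings themselves are reloadable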
Starting point is 00:12:01 We suspected maybe at night we have some issues with dead tuples or something, and I would expect a lot more buffers involved. Check? No. If we have Database Lab Engine installed, we can do point-in-time recovery to the proper point and try explain analyze buffers, look at it from some angles. But without the concurrent workload, you probably won't notice the real problem. So this tooling, right, auto_explain, pg_stat_statements, doesn't explain where the time went. Okay, we can enable track_io_timing and at least see whether it was related to IO or not
Starting point is 00:12:37 IO, which is good. We should do it with double-checking. I don't remember the tool... Postgres binaries include a tool to check the timing overhead on a concrete machine. pg_test_timing, I think. Yeah, yeah. So usually you can afford it. And usually we recommend to our clients, even with very heavy loads, to enable track_io_timing, to have it in pg_stat_statements and in EXPLAIN everywhere. But still, it can only tell you: is it IO-related or not? It's just two segments only. We need to dive deeper and understand what happened.
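For reference, track_io_timing is a single reloadable setting, and pg_test_timing is the binary mentioned above for checking clock overhead first (run it from the shell on the database host). A quick sketch:

    SHOW track_io_timing;                      -- check the current value
    ALTER SYSTEM SET track_io_timing = on;     -- cluster-wide, no restart needed
    SELECT pg_reload_conf();
    -- After this, I/O timing columns show up in pg_stat_statements
    -- and "I/O Timings" lines appear in EXPLAIN (ANALYZE, BUFFERS) output.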
Starting point is 00:13:12 Maybe we had some lock waiting, like heavy lock waiting, or maybe we had some contention in the buffer pool, or maybe something else. Maybe clients are very slow and we are waiting for communication with clients. And this is exactly where wait events help. It extends query analysis beyond just pg_stat_statements, cumulative statistics, metrics, and log analysis like auto_explain, which is quite an advanced technique as well. And I'm glad the pgBadger log analysis tool still exists and is being developed,
Starting point is 00:13:47 and it supports auto_explain, actually. It's good. We move from them to a different dimension, orthogonal to the others, and we look inside. We could look inside using perf, for example, and also answer the question: where does the backend spend time executing this query? But it's hard to have it all the time, right?
Starting point is 00:14:11 Actually, there are attempts, using eBPF and so on, to have continuous observability, perf-like, flame-graph-like, and so on. There is Coroot, a new monitoring tool. Yeah, that's cool. An old friend of mine is developing it. Actually, they developed Okmeter before that. Okmeter, in my opinion, was the best implementation of query analysis for Postgres.
Starting point is 00:14:32 So, Coroot, check it out. It's interesting. And they recently integrated with some tool which provides flame-graph-like analysis continuously. In this case, maybe even without wait events. Actually, you can do it with eBPF as well and try to understand in which part of Postgres code you spend time executing this query. But with wait event analysis, you have an official way; you just need an extra layer of tooling to have cumulative metrics and to understand historically, for example at night, what happened.
Starting point is 00:15:00 and to understand historically, for example, at night, what happened. In this case, if you don't have it, my advice is to implement some poor man's tool. You just sample it every, I don't know, 200 milliseconds or half a second. Or even a second, right? You're getting more information than nothing at that point. Yeah. Sometimes people implement it in an application layer.
Starting point is 00:15:25 If you have Ruby or Python, you can do it there. And you can even combine it with additional dimensions, like which module of our application it was, or the URL or part of the URL. And so you can easily localize the problem and improve it. But in general, you need some sampling. If you have original Postgres, if it's not RDS, which has Performance Insights, if your monitoring doesn't provide it, you need some sampling. Or you can install pgwatch2 Postgres.ai Edition, which I still think we will probably bring to the new pgwatch version, which is under development, as far as I know.
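A poor man's sampler really can be just a table plus a psql loop; here is a rough sketch (the wait_sample table and the one-second interval are arbitrary choices):

    CREATE TABLE wait_sample (
        captured_at     timestamptz,
        pid             int,
        wait_event_type text,
        wait_event      text,
        query           text
    );

    -- clock_timestamp() rather than now(): now() is frozen at transaction start.
    INSERT INTO wait_sample
    SELECT clock_timestamp(), pid, wait_event_type, wait_event, left(query, 100)
    FROM pg_stat_activity
    WHERE state = 'active'
      AND pid <> pg_backend_pid();
    -- In psql, \watch re-runs the statement above every second:
    \watch 1

    -- Later: a rough profile of what the workload was waiting on.
    SELECT wait_event_type, wait_event, count(*)
    FROM wait_sample
    GROUP BY 1, 2
    ORDER BY 3 DESC;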
Starting point is 00:16:01 And in this case, you will... Ah, and you need to install the extension pg_wait_sampling. So there is the extension pg_wait_sampling, developed by Alexander Korotkov when he was working at Postgres Pro. So it's an extension, again, from Postgres Pro. Or you can also use pgsentinel, which is an additional extension from an engineer who, as I know, works on the AWS team.
Starting point is 00:16:17 I might be mistaken. Sorry, I forgot the name. Sorry. We actually interact on Twitter and so on quite often, but I'm very bad with names. Sorry.
Starting point is 00:16:32 So you can use either pgsentinel or pg_wait_sampling to have better sampling inside Postgres. If you can install extensions, that's great. Because, for example, pg_wait_sampling, by default, will sample every 10 milliseconds, accumulating everything inside. You won't miss anything, actually, with this approach, right? Because if you do infrequent sampling, you can miss important information: if your query is very fast, if you have fast, very acute spikes, you can miss it. I'm guessing the overhead is
Starting point is 00:17:02 pretty low for this kind of thing? Oh, speaking of overhead, you need to take care of proper cleanup for pg_wait_sampling. It can keep very detailed information, per process ID even, but you usually don't need that; you need it per query ID, to be able to draw a rough pie chart using wait_event_type, or a detailed, fine-grained pie chart using wait_event, and understand, okay, this query usually spends time on this. Oh, it also has history, so you can finally understand what happened at night with this query. And if you install pg_wait_sampling plus pgwatch2 Postgres.ai Edition, you will have colorful graphs similar to Performance Insights, and everything is open source and free right now.
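With the extension in place, that rough pie chart is just an aggregation over the profile view it provides; a sketch, assuming the pg_wait_sampling_profile view and default settings:

    -- Requires pg_wait_sampling in shared_preload_libraries, then:
    CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;

    -- Which wait events dominate across all sampled backends?
    SELECT event_type AS wait_event_type,
           event      AS wait_event,
           sum(count) AS samples
    FROM pg_wait_sampling_profile
    GROUP BY 1, 2
    ORDER BY samples DESC
    LIMIT 20;
    -- Group by queryid instead to break it down per query.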
Starting point is 00:17:46 So it allows you to build your... like, it can be your second monitoring. For example, some of our clients have Datadog. And in terms of consulting, we tell them it's okay to have a second monitoring, sometimes even a third. Because with Datadog, you don't have it. Database monitoring is improving inside Datadog, but it's still very far from being ideal. So it doesn't hurt, actually, to have this monitoring, probably using a pull approach. And so pg_wait_sampling collects
Starting point is 00:18:18 samples every 10 milliseconds, and then your agent, being outside of your Postgres machines, pulls this data lazily, like every 10 seconds or even every minute. But since we have very frequent internal sampling, we observe all queries quite well, right? So this can be improved inside Postgres, I think. These metrics should probably become cumulative. There should be some efforts, maybe ongoing, do you know? No, I don't. I think I got excited briefly while I was preparing for this, but I was getting confused with the recent Postgres 16 addition of pg_stat_io. So I think there are...
Starting point is 00:18:56 Oh, it's different. I know it's different, but I briefly got confused. So it feels like people are investing in these system views, though, for better observability. So hopefully at some point... Yeah, that's great. Worth mentioning that this is not like from zero to one and that's it.
Starting point is 00:19:14 When wait events appeared, the number of wait events tracked was much smaller than right now. Some Postgres versions have had improvements since then, and it's a growing area. Yeah, we were just looking at the docs, weren't we, and I think there's like well over 100 different types. Definitely, yes. So, a lot of them. So if we do start monitoring this ourselves, are we going to even notice any overhead? Is it going to be super low? It depends on how you do it. With pg_wait_sampling you can have some overhead; you need to take care
Starting point is 00:19:50 of cleaning up, not to exceed the memory allocated for it, shared memory. So yeah, I don't remember the details, but you need to pay attention to this area. But if it's properly tuned, the overhead is very low. It's just something that collects data from pg_stat_activity, that's it. So it's not a big deal if you do it yourself, like a poor man's implementation in your application code, every second or so. Well, it's just some query to pg_stat_activity. I just must warn you that you cannot use now(). If you use now() and apply it to calculate intervals with, for example, query_start or xact_start, when the transaction started, or something like this, you will sometimes have negative values. Because while you are reading pg_stat_activity... pg_stat_activity is not a table, not a regular table.
Starting point is 00:20:46 You're not dealing with some consistent snapshot. If you have a lot of backends, a lot of entries in pg_stat_activity, while you are reading them, your time is shifting. And the first timestamps and the last timestamps are from different points in time of reality. So you need to use clock_timestamp(). The function now() in Postgres provides you the timestamp of the beginning of the transaction, your transaction, so it's always some constant single value for all rows. But if you use clock_timestamp(), you have it for each row, a new value each time. There is
Starting point is 00:21:23 overhead in calculating a new timestamp, but it's quite low also. Cool. Good tip. I'm thinking, so we can monitor this and then look back historically, we can look at active connections and diagnose issues live. And the other big use case I saw for this is people using it for diagnosing complex issues, like if you've got locking issues, I guess, deadlocks and things like that. Well, if you see, for example, the IO wait event type, and you can understand whether it's reading or writing, you already understand that you need to pay attention to your disk subsystem, maybe to upgrade your disks,
Starting point is 00:22:08 give more power, more IOPS, more throughput. It's obvious. Or more cache, like reducing the load on disks. Exactly, good point. You can attack it from this angle: you allocate more memory, you increase the shared_buffers
Starting point is 00:22:24 value to have a bigger buffer pool. And in this case, less IO is needed. So, also a good approach. Sometimes you need to combine them. If you see the Lock wait event type, check which lock event is there. In this case, you just need to redesign your workload, actually, right? To get rid of long lock waiting. For example, a very good trick is to use SELECT FOR UPDATE,
Starting point is 00:22:54 with either SKIP LOCKED or NOWAIT. So either you fail in case of a collision on some lock acquisition for some row, for example, or you just skip those rows and, for processing, get the next available, not-yet-locked one. This is good for parallelization of some batch processing, for example. If you see LWLock, this is sometimes maybe the trickiest part. There are several lightweight locks.
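Before getting into lightweight locks, here is a rough sketch of that SELECT ... FOR UPDATE SKIP LOCKED pattern for batch workers, using a hypothetical jobs table:

    -- Each worker claims one unprocessed row; SKIP LOCKED means it never
    -- waits on rows already locked by other workers (NOWAIT would instead
    -- raise an error immediately on conflict).
    WITH next_job AS (
        SELECT id
        FROM jobs
        WHERE processed_at IS NULL
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    UPDATE jobs
    SET processed_at = now()
    FROM next_job
    WHERE jobs.id = next_job.id
    RETURNING jobs.id, jobs.payload;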
Starting point is 00:23:20 It's about memory, work with memory, the buffer pool and so on. And sometimes you need to optimize Postgres itself. As I've said, in 9.6 big improvements happened. But I would like to mention an interesting case I recently had with certain lightweight lock spikes. Still not solved, the work is ongoing. But what was there? We observed lightweight locks, particularly LockManager. It's called LockManager. And I just had, like, no idea originally what to do.
Starting point is 00:23:51 And here I must say big thanks to the AWS team, because they have very, very good documentation, which Postgres itself doesn't have. The Postgres documentation in this area is very dry, right? Very limited. Very, like, reference-style: very small descriptions. Of course, you can go to the source code and understand more.
Starting point is 00:24:14 Always. Even if you don't understand C, you can read some comments and it's very helpful. If you search for a particular lock... wait event, I mean, sorry, wait event, or a lightweight lock as well, you can find interesting information. But AWS,
Starting point is 00:24:31 RDS and Aurora documentation. I originally referenced only the Aurora documentation about wait events, but Aurora is a much more heavily modified Postgres. And Jeremy Schneider from the AWS team mentioned that the RDS documentation also has it
Starting point is 00:24:46 similarly, and it's closer to Postgres, definitely. So you can check what is written there; they have more how-to and explanation-style documents there for each wait event, maybe not all of them, but many of them. And if you check LockManager, for example, they provide some example of how to actually reproduce it in a synthetic environment. They also discuss what's happening. They discuss some constants in the Postgres code which lead to it. They discuss fast path, in pg_locks there is fastpath, and what it is, why it is, and so on. And they provide some very, very useful tips on how to get rid of this type of issue, this type of spike. So they have more practical knowledge
Starting point is 00:25:34 there, and I wish the Postgres documentation, the original one, would also be improved in this area. There is a big opportunity here. Yeah, I saw your conversation with Jeremy, I thought it was great. And I'm going to share, in fact I already shared the link in my newsletter just earlier today, but I'm also going to share it in the notes here. It's worth mentioning that the individual wait event types
Starting point is 00:25:57 are clickable, and you can dive into them for those, like, deeper explanations. Oh yeah. I usually understand this; like, I spent so many years in the area of UX, like social media and so on, but I still sometimes miss this. Yes, indeed, it's not obvious that they are clickable, and there are gems inside. Yeah, so yes, you can click any of those wait events and see a lot of info inside. But usually Google does a
Starting point is 00:26:26 good job: when you search for some wait event, you can find the AWS docs. Good point. Yeah. And worth mentioning, recently in Seattle they had a meetup on this topic, and Jeremy, as I understand, presented about wait events and performance improvements around this. We need to get him on Postgres TV Open Talks, unless it was recorded.
Starting point is 00:26:49 That's a good idea. Like I will double check. I think it would be interesting to watch this talk if it's available online and if it's not available
Starting point is 00:26:57 I should ask. So, just a little bit of kudos to the AWS team for this wonderful documentation, and if you don't use AWS, you can still benefit from it as well. Yeah, absolutely. Anything else you want to make sure we covered?
Starting point is 00:27:11 Well, maybe that's it. So: sampling. The poor man's approach works, just sampling in a loop in some small bash script or in your code, your Python, Ruby, Java, any code. The pg_wait_sampling extension, if you can afford it, or pgsentinel if you can afford it, and some monitoring. If you use just the extension, you still maybe need some sampling, or you need some scripts to analyze its content. It's good to visualize it, so you need to either put it into your existing monitoring, which can be a big task, or you need
Starting point is 00:27:43 to use pgwatch2 Postgres.ai Edition, or you mentioned pganalyze. Also pgsentinel, I think it's used in some monitoring, I don't remember the name, we will attach it as a third monitoring option that has it. Or if you are an RDS or Aurora user, you already... I'm 100% sure you know Performance Insights, because they put it on the first page of everything. It's where query analysis starts for RDS users, but it's not so for regular Postgres users yet. I think this should also be improved. So pg_stat_statements is great, but it doesn't answer all questions sometimes. No, including these ones.
Starting point is 00:28:22 Wonderful. Actually, vice versa: wait event analysis quite often doesn't answer all questions as well. Yeah, you need both, it's the combination. Yeah. One more thing: if you like ad hoc tools, there is also PASH Viewer, available on GitHub, also from some Russian developer. It's a Java app. You can just download it and connect to your database and start having this Performance Insights-like colorful graph with some details. You can check and see exactly which wait event type, which event, which queries are involved.
Starting point is 00:28:55 But you need to keep it open. So it's like an ad hoc tool. And finally, pgCenter, by another Russian developer, Alexey Lesovsky. This tool is more console-style, but it also has sampling for wait events. But it's more for folks who prefer the console terminal over a graphical UI.
Starting point is 00:29:11 So there are some ad hoc tools as well, which can be helpful. But without sampling, well, good luck with catching this knowledge from pg_stat_activity. So I hope future Postgres versions will have some really cumulative metrics. I don't know if it will be an additional table with 100-plus columns
Starting point is 00:29:45 or something else. Overhead is also a question, but I hope it will be created and it will become standard for all Postgres cases. Awesome. Thank you, Nikolay. Thank you, everybody. I hope I said something useful for someone.
Starting point is 00:30:02 Guaranteed. I learned a lot as well. As usual, thanks to our listeners. We see feedback, and we appreciate it a lot. We also appreciate it when you leave likes on the videos or in the podcast apps you use. And also please share with your colleagues who need to optimize Postgres. This was a very interesting topic, I think, in my opinion, related to query optimization. We had a query optimization episode in the past, and a monitoring one. And now we just had some kind of deep dive
Starting point is 00:30:33 into one of the areas. I think if you deal with Postgres performance, you do need to understand this topic. So, an interesting topic, I think. Okay, I hope I entertained a little bit too, at least. I find this topic very useful. So please share it with your colleagues who you think need it. Nice. And please vote on which episode you find the most boring. We need to have some charts: most boring, least boring. Yeah. Okay. We already have how many? 30?
Starting point is 00:31:03 I think this is episode 36, I need to check. 36, wow. Cool, cool. Without any weeks missing? Not yet. Okay, good. Okay, thank you for that. As always, it's great to chat with you. Until next time. Bye. Bye-bye.
