Postgres FM - pg_wait_tracer
Episode Date: April 24, 2026

Nik and Michael are joined by Dmitry Fomin to discuss his new tool pg_wait_tracer, as well as changes that could be made to core to allow wait event tracing with lower overhead, and on managed services.

Here are some links to things they mentioned:

Dmitry Fomin https://postgres.fm/people/dmitry-fomin
pg_wait_tracer https://github.com/DmitryNFomin/pg_wait_tracer
pg_wait_sampling https://github.com/postgrespro/pg_wait_sampling
pg_10046 https://github.com/DmitryNFomin/pg_10046
Jeremy Schneider reply on LinkedIn https://www.linkedin.com/feed/update/urn:li:activity:7414966981847748608
RDS for PostgreSQL wait event docs https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL.Tuning.concepts.summary.html
Custom wait events for extensions (added in PostgreSQL 17) https://www.postgresql.org/docs/current/xfunc-c.html#XFUNC-ADDIN-WAIT-EVENTS
Hacking Postgres with Dmitry, Kirk, and Nik (Part 1) https://www.youtube.com/watch?v=3Gtuc2lnnsE
Hacking Postgres with Dmitry, Kirk, and Nik (Part 2) https://www.youtube.com/watch?v=6kqpjnpl5Gc
Tanel Poder https://tanelpoder.com/about/
USDT static tracepoints for wait event tracing (proof of concept by Nik) https://github.com/NikolayS/postgres/pull/18
Add wait_event_timing: Oracle-style wait event instrumentation (patch by Dmitry) https://github.com/DmitryNFomin/postgres/pull/1
PgQue benchmarks https://github.com/NikolayS/pgque/blob/main/docs/benchmarks.md
The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation and Modeling (by Raj K. Jain) https://www.researchgate.net/publication/259310412_The_Art_of_Computer_Systems_Performance_Analysis_Techniques_For_Experimental_Design_Measurement_Simulation_and_Modeling_NY_Wiley
Performance Modeling and Design of Computer Systems: Queueing Theory in Action (by Prof. Mor Harchol-Balter) https://www.cs.cmu.edu/~harchol/PerformanceModeling/book.html
Oracle Performance Firefighting (by Craig Shallahamer) https://www.abebooks.co.uk/9780984102303/Oracle-Performance-Firefighting-Craig-Shallahamer-0984102302/plp
Process Mining: Data Science in Action (by Wil van der Aalst) https://link.springer.com/book/10.1007/978-3-662-49851-4

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello. This is Postgres FM. My name is Nik, Postgres.AI. And as usual, my co-host is Michael, pgMustard. Hi, Michael. Hi, Nik. And we have a great guest today, Dmitry, who is a very experienced DBA, originally Oracle, now Postgres, already for many years. Hi, Dmitry. Hi, hi, thanks for inviting me. Thank you for coming. So, for me it was a surprise when you built this tool. I didn't expect that it's coming to the Postgres ecosystem.
But obviously it's something new
and in an area which is among my personal key interests lately.
So active session history analysis, wait event analysis.
But you did it very differently compared to what we usually did
with pg_wait_sampling or other tools,
Performance Insights in RDS.
So it's a great tool. Where to start?
Yeah. So I think we can start with the time when I first started working with Postgres. I started switching from Oracle. In Oracle, one of my favorite topics and tasks was always performance-related investigations. And when I switched to Postgres, of course, I started looking for similar tasks, and I understood that the way of Postgres troubleshooting is very different from how it's done in Oracle. And I definitely felt the lack of tooling, lack of a standard approach, and so on.
Okay, so after some time,
I realized that there are some tools that can bring a similar feeling, let's say, to what I had in Oracle. And I'm talking about wait event analysis, because I know that in the Oracle world it's a given thing.
Yeah.
I'm going to interrupt.
So when did this switch happen from Oracle to Postgres for you?
Recently I checked this.
It happened 10 years ago.
2016, roughly.
I remember very well a big wave of people coming from Oracle in 2013, '14, and all of them were complaining about the lack of wait event analysis. This is how two columns were added to pg_stat_activity. It was maybe a couple of years before your switch, basically. And you also had this same feeling that these tools are needed.
Yeah, because when you work with Oracle since, I don't know, 2002 or something like that, it's a given way of troubleshooting. Any Oracle guideline, any book, any training, official or non-official, says: just open the active session history. So it's a gate that you go through and start with. And so I tried to use the same approach with Postgres, because for me it was so obvious why it's the way to analyze performance. Because basically the wait event says why it was slow, where the session spent the time, the session or the database in general. So you do not need to guess whether it was I/O or CPU; basically, where it spent the time, it says very clearly. Then why it happened that way, you need to investigate. But at least you know where.
Because the regular approach in Postgres can just say that this query was slow, on average, or with some hacking you can get some precise numbers for an execution, but it does not give the idea of why it was slow. Okay, yes, you can use EXPLAIN ANALYZE to see where in the plan you spent the time, but again, it does not explain why. Sometimes this join takes, I don't know, one second, sometimes 10 seconds. From the plan perspective, it looks the same.
And the tooling for wait event analysis evolved year by year. And so we have pg_wait_sampling. And since they introduced the new event tracking as well, it became really a tool that gives good data, a good source for wait event analysis. And I use it everywhere I manage Postgres. And if it's self-hosted Postgres, for sure I will convince the owner of that Postgres to install pg_wait_sampling. It's the default way of analyzing. On all my dashboards, it's the first panel you see when you open any dashboard that I created. So it's wait event analysis. I convinced all my colleagues to use this approach.
You convinced me to put this ASH graph at the top of the first dashboard, which I resisted initially, because the first dashboard for me was meant for very shallow and very wide troubleshooting, where you identify the areas. But you convinced me to put the ASH graph at the very top. And now I'm convinced. Yeah, it's a great idea. It very quickly helps you understand some pain points for performance.
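The ASH-style panel they describe can be sketched in a few lines: sample the backends periodically (as pg_wait_sampling does) and aggregate the samples into a wait event profile. A minimal sketch in Python; the sample data and the CPU* convention for backends with no wait event are illustrative, not real output:

```python
from collections import Counter

def wait_event_profile(samples):
    """Aggregate per-backend samples into an ASH-style wait event profile.

    Each sample is (pid, wait_event_type, wait_event): one backend observed
    at one sampling tick, so counts approximate time spent per wait event.
    """
    counts = Counter((wtype, wevent) for _pid, wtype, wevent in samples)
    total = sum(counts.values())
    return [(wtype, wevent, n, round(100.0 * n / total, 1))
            for (wtype, wevent), n in counts.most_common()]

# Hypothetical samples; "CPU*" marks backends with no wait event reported.
samples = [
    (101, "IO", "DataFileRead"),
    (101, "IO", "DataFileRead"),
    (102, "LWLock", "LockManager"),
    (103, "CPU*", None),
]
for row in wait_event_profile(samples):
    print(row)
```

Stacking these percentages per sampling interval is exactly what the graph on the dashboard shows.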
Yeah, I have a, I will try to make it brief two examples when it's the weight event analysis shines.
So one was a migration, a hardware migration from, let's call it, old hardware to new hardware.
The data files are stored on a storage appliance, so we just remounted it. So the data layout was the same. Everything was the same in terms of the data layer, the storage layer,
but the hardware, the compute part, changed, and changed to a higher spec. So the number of virtual cores increased, the clock speed increased, everything increased. All tests were good. But when we started loading production, we saw a really huge performance degradation. And so what would you think, a plan flip or something like that? But everything is the same. Just the hardware is different, so it basically cannot be the plan, because the parameters are the same.
Your slide deck has a very interesting example, very simple, so anyone can understand it, about contractors doing some work. So in this very example you describe right now, it's like you basically hired a different team to do the same work, but it does it very differently, right?
Yeah.
So that example, just to explain for those who haven't seen it. It's simple: there are contractors. They say: we are busy, we spent so much time. And without wait event analysis, you just know they were busy. But what exactly they were busy with, you don't know. And in this case, okay, whether they were busy with I/O or something else, right?
Yeah, yeah, exactly.
Switching hardware, you see a different picture. And so what was at the end of that story? So the problem was the lock manager, LWLock:LockManager. So, you know, it's one of your favorites. A lot of dots are connected here, because we spoke about so many problems which are explained with wait event analysis on this very podcast. So right, Michael?
You remember lock manager?
Yeah, of course.
It feels like whenever there's like a system-wide issue, it's such a great starting point, right?
That's, sorry.
And that's the reason next case when I would say that's, uh, low level.
I disagree with Michael, because I agree on one side, yeah, system-wide. But on the other side, more and more I think, we just don't have this tool yet, but it should exist. In pgMustard, we should have it. I mean, we have plan analysis. And for one call, it would be great to see a wait event profile as well. Because imagine we were blocked by a lock acquisition which was pending. You will see a lot of time spent, buffer numbers are low. That's it, you need to guess. With wait event analysis for a single query execution, you will see: okay, this was the Lock wait event type.
But wait, I think this is where we get into the core of what Dmitry's done, which is cool, which is moving from sampling to tracing, which it historically has been reliant on. I do think it's really important. Maybe I'm wrong, but my understanding is...
I also thought this, this is the expectation I had as well, that it's about single-backend tracing. But let's move on and hear what Dmitry will say.
Exactly.
Yeah, just the next case when the wait events really saved a lot of time. And this single backend analysis, I mean, the single backend analysis case: it does not work only at the instance level. It also works on a single backend. I had a case. I got a complaint that sometimes a query takes a few milliseconds, and sometimes it's killed by a one-second timeout. We have, I don't know, 100 calls; 10 of them failed with the one-second timeout. And the query is a primary key lookup. So they cannot mess up the plan for this case. It's just a B-tree primary key lookup, very simple. It's not a join, nothing, just a very simple lookup. And it's not lock manager contention, because lock manager contention in many cases is exactly primary key lookups. It was not that.
Yeah.
Okay.
So you open the dashboard, and you don't see any spikes of any specific wait events. So it's definitely something very low level, some specific backend problem. I also captured, just in case, the EXPLAIN ANALYZE; the plan was the same, everything was the same. But sometimes it goes from a few milliseconds to one second, and I started sampling. So I made some custom scripts that would just catch the needed backend and start sampling. And I saw that when it's slow, it's on I/O. I mean, I saw it and thought, how can it be, it's just a very simple read. It just reads very few blocks from disk. So, next step: with the wait event analysis I understood it's nothing else but just slow I/O in some cases. And with eBPF and tracing these backends, I understood that the problem is long tail latency on the storage appliance that we use. Sometimes it goes from one millisecond to tens of milliseconds. And indeed, if you're not lucky enough and you need to read these blocks from disk, not from memory, not from the OS cache, even these few blocks take a second. And yeah, I think without wait event analysis it would be just a guessing game. But with this, in just a couple of days, I found the problem.
A couple of quick questions. One is around: you can get timings in EXPLAIN ANALYZE output. Obviously it's a bit tricky if they're always getting cancelled. I don't know how you got the EXPLAIN ANALYZE for the queries that were timing out, actually. How did you do that?
I just increased the timeout for some proportion of them.
To make them finish in a few seconds, yes.
DBAs should have I/O timings on, right?
Yeah.
Did you have I/O timings on the server?
Yes, I did.
But when you have 1K calls a minute of this kind and just a very few fail, you cannot track it. It's one of my favorite topics, that averages lie. Every time. So you have to have P99, and better, more nines, to see the actual picture of your system.
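The "averages lie" point is easy to demonstrate with synthetic numbers; everything below (the counts and the latencies) is made up to mirror the case described, not measured:

```python
import random
import statistics

random.seed(1)
# 980 fast primary-key lookups (~2 ms) plus 20 that hit slow storage (~900 ms)
latencies_ms = ([random.uniform(1, 3) for _ in range(980)]
                + [random.uniform(800, 1000) for _ in range(20)])

mean = statistics.mean(latencies_ms)
p99 = sorted(latencies_ms)[int(len(latencies_ms) * 0.99) - 1]

# The mean barely moves, while p99 exposes the long tail.
print(f"mean = {mean:.1f} ms, p99 = {p99:.1f} ms")
```

The mean lands around 20 ms, which looks like mild noise; only the high percentile reveals that a small fraction of calls take nearly a second.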
Just to push back, I think this is a really interesting topic. I think actually, in this one specific case, auto_explain might have been okay. That's where it can be really good, in terms of catching the outlier queries that go beyond a certain threshold. As long as you've got the I/O timings on, in this one case we might have seen: oh, I/O timings are the issue for these slow queries. But I'm sure there are other examples where the wait events would have shown something more interesting.
But wait event analysis... auto_explain will show you the plan, will show you timing if you enable it, will show you buffer numbers if you enable it. Good. But it won't show you a wait event profile. Although now I think, why not? It's a good idea, probably. This is maybe, yeah, even better for auto_explain than for manual EXPLAIN to have it.
I think you're right, but I think I was talking about track_io_timing. You know the parameter track_io_timing?
Yeah, I/O timing, great. But I/O timing, first of all, works at a high level. But what if it's not I/O? IPC, or a lightweight lock, or a lock, or something: a full picture would be great to have, right? Also, this is only the wait event type level we're talking about. We have how many, seven, nine wait event types? But how many wait events? Hundreds. So it's a very precise understanding of what the system was doing. So, just imagine auto_explain optionally logging a wait event profile. I think it's a good idea for those who want to hack Postgres, right?
Yeah, me too. And I think it probably transitions onto: why don't we have this? What are the downsides? What is the overhead? What are the tricky parts of this?
And we will go there. Yeah, because it's definitely a thing that I cannot get out of my mind for weeks already. And so, this last case with the single backend problem: I reflected on this experience. I spent a couple of days on it. And I thought: what tooling can help me cut the time to finish this investigation to hours, and better, probably, even enable other people to do it instead of DBAs. And Oracle has a very simple answer for this type of problem: it's the 10046 event. It's tracing: when you enable it at a certain level, it writes everything to a file, every single syscall, every single wait, everything, I mean, very detailed, like real tracing.
And I decided to create the pg_10046 extension that would do it in Postgres. And the prototype was really good, so I'm really proud of it. But for wait events: Oracle also traces the wait events. And I did the same, but I had to sample them, because I didn't find, by that time, any tracing approach for wait events in Postgres. I know, when I started posting about this on LinkedIn, Jeremy Schneider came and commented: yeah, while it's really nice, you cannot call it 10046, because the wait events are not traced. Very true, but there was no way in Postgres to do it.
Yeah, yeah, one second. I just wanted to send a big hello if Jeremy Schneider listens to this. And I hope one day we will have him on this podcast. And Jeremy Schneider is important because he was making it popular, blog posting about ASH a lot. And he was on the RDS team when they started to do this Performance Insights and so on. So it's great that you connected with him over LinkedIn. I think he was also involved in the very good wait event documentation that RDS has.
Oh yes. He discussed a lot with me offline.
Yeah. Yeah. Yes. I mean, really, a lot of kudos to them for this work.
Okay.
And again, I was brainstorming with all the LLMs that I had at that moment, and I started complaining. Sometimes I talk to them like to real people. And I was complaining: yeah, I have a really nice extension, but the lack of real tracing is a big caveat. And Claude said to me: yeah, but you can trace. And I said: no way, how? And it explained to me in detail how I can really trace the wait events. And for two days I challenged this idea a lot, for a reason: it could not be that I was the first who found this. Not me, okay, Claude found it, but I'm not the first guy in the Postgres world who would use this. Either this just does not work, or it was basically the only idea and it just does not work, because otherwise someone should have found it before. Anyway, two days of testing this idea showed me that it basically works.
And I started prototyping the tool. It's still in a very immature state, but it's already showing nice pictures and can already help to find some interesting, I'd say, correlations.
For example, I found that most extensions still use the generic Extension wait event. And when an extension does something, you really cannot distinguish it doing a real job from it just sitting in a wait loop. When I used pg_wait_sampling, for example, to cross-check the numbers that I get from my tool against pg_wait_sampling, I understood that the background worker of pg_wait_sampling, even when it does nothing, because it works in the background and there is no job for it, is still shown in a waiting state in that event.
And so, I think since Postgres 17 you can have custom wait events for extensions. And I really hope, if someone is listening: please use custom wait events in your extensions. It will really help to understand whether your extension is busy doing some valuable work or just spinning in a loop.
Right, and this is why we, in my pgash, also, listening to Dmitry, I also spent some time analyzing colors. So green is what's usually called CPU, and this is why in pgash and in our monitoring tool CPU has an asterisk. So CPU* means that it's CPU, or maybe not. And we also discussed it a few times on our podcast. And yeah, so this is exactly why. Some extensions... Yes.
I was just thinking whether pgash would show up in pgash with its own custom wait events. But then I remembered it's not really an extension. So it can't. Exactly.
Fully. Sorry. Yeah. So that's basically the story of how this came to life. So, do you want me to show it?
We cannot show it on a podcast, unfortunately.
Oh, okay.
But just to recap: you complained to an LLM that something is impossible, and the LLM said: let's do it.
Yeah.
And this is how it was.
Basically, yes.
This is crazy. It means that it's good to sometimes complain.
It would be good to show it; unfortunately, we cannot.
And then maybe you just don't know how it's possible.
Yeah, brainstorming with the LLMs, I think,
It's one of the best features, yes.
Similar with our monitoring: I remember I was talking to you and you said, oh, it's really hard or maybe impossible to show the beginning of the query in query analysis in Grafana. And that's it.
Oh, you mean the query text.
Yeah, we see the chart, we see query IDs, but it would be good to see the query itself. Is it a SELECT or something? So I just went and did it with AI. And it's there right now. Because we had a limiting belief that it's not possible. It's interesting. So, yeah, the AI helps here. Sometimes it doesn't help, but here it definitely helps to explore new options. And yeah, create something.
Yeah, fast idea validation, it's also one of the fantastic features: you can really quickly check whether an idea basically works or not. Of course, making it mature software will take time. I mean, an LLM cannot do it in one night. But prototyping the idea, cross-checking it...
So this new tool is tracing. Is it true that it won't miss any wait event, or does it still have some sampling? Because when you showed me the pictures, which look amazing, I still think you need to add those pictures to the README, because this is super impressive. And it was microsecond precision. My mind was blown. So it's really crazy. But then we started talking, and Dmitry actually came to our hacking sessions, which happen on Wednesdays, online, on YouTube. So a couple of times we sat and thought about how to do something. And that something was about changing Postgres, right? So why is it needed to change Postgres for this tool?
And here, no, for this tool we do not need to change Postgres. But the thing is, this tool has overhead.
Observer effect, right?
Yes. More or less, I'd say, with what you can call regular load, not pathological, let's call it an acceptable level, like 10%, plus or minus. So you pay overhead for every wait event transition. So when you go from one wait event to another one,
So when you get from one to a one,
And if you get more transitions per second, so you pay more overhead.
And the maximum that I got was 220K transitions per second.
And in this case, yeah, 220,000 transitions.
Yeah.
And so in this case, my overhead was basically 30% of the query duration.
So was it with very tiny buffer pool, shared buffers, tiny share buffers?
Yes.
When it read, that's why.
it need to go to the OS to get the block,
but block is in OS cache.
So that's why it's really fast.
But you need to, almost for every block, you need to go.
Yes.
And even for evicting, so you need to pinning, unpinning,
so all this kind of stuff.
So it goes, I mean, in this case,
I was able to generate this amount of transitions.
And while it's kind of a, how to say,
not edge case, but Kinda, very synthetic.
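As a back-of-envelope check on those numbers: if tracing adds a roughly fixed cost per wait event transition, the figures quoted above (220K transitions per second, ~30% of query duration) imply a per-transition cost of a bit over a microsecond. The fixed-cost model and the "regular load" rate below are assumptions for illustration, not measurements from the tool:

```python
# Quoted above: ~30% overhead at the maximum observed transition rate.
transitions_per_sec = 220_000
overhead_fraction = 0.30

# Implied (hypothetical) fixed cost per transition, in microseconds:
cost_us = overhead_fraction / transitions_per_sec * 1e6
print(f"~{cost_us:.2f} us per transition")

# The same per-transition cost at a calmer workload gives roughly the
# ~10% "regular load" figure mentioned earlier:
regular_rate = 73_000
regular_overhead = regular_rate * cost_us / 1e6
print(f"{regular_overhead:.0%} at {regular_rate} transitions/s")
```

This is why the overhead is workload-dependent: it scales with how often backends change wait state, not with wall-clock time.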
But anyway, it says that the overhead, you cannot avoid it. And when Jeremy Schneider raised this concern about it not being tracing, he also said that the patch for Postgres can be very simple: you just add two probes in the functions that are used to set and unset wait events, and basically that's it.
And then you created this eBPF...
Yes, yeah, yeah.
And you created this patch for...
I wanted to say that even if, like you said, for a regular workload it's not 30%, but like 10% overhead, observer effect, it still sounds super useful to me to use it in a lab environment. So if we reproduce some problem and study it not in production but on some clone or something, right? I'm ready to pay 10%, 20%, maybe even 30%, because it gives me an exact understanding of everything that happened, right? But of course, it would be good to have a very low observer effect and have it in production as well.
To be honest, I personally would pay even 20% in production if it just instruments me to find the problem really fast. Because once I spent, I don't know, two or three days finding where the problem was, because there was some change in the security configuration of the server, and basically everything became very slow. And again, I spent a lot of time writing custom tooling, custom scripts, to find and get to the problem. But if I had the really good instrumentation we are now discussing, it would have taken a couple of hours to find the problem. And as Tanel Poder said in one of the comments, he was also, I mean, triggered by these pictures, observability is an investment. It's not for free. And, for example, in the Oracle world, you just paid for it, you just don't know it.
Right.
Because you cannot have Oracle without the instrumentation. I think they instrumented the kernel in version 7. And you cannot strip it out and run it without it.
Tanel Poder, just worth mentioning, is a very notable expert in the Oracle world.
Yeah.
It's great that paths also connected here.
Yeah.
And that's why I think the wait event analysis capabilities should be in the Postgres core. And there are two ways of doing it. A small patch, like the probes patch that you did...
I already forgot. Yes. Yeah, yeah.
But I wanted to go further and build the patch that will bring an Oracle level of wait event analysis: bring this system time model, the tracing at the backend level. And at the beginning I was skeptical, and I thought the overhead would be kind of similar to this, because why should it be much less? But in reality, it's way less. I have this patch in my GitHub. And for the last couple of weeks I was iterating on it, making it better and better. And I hoped it would be ready to propose to the community just, I mean, today. But today I was analyzing: there was a commit in master that breaks a lot. Yes, they changed some memory management, and it basically breaks some paths in my code. And I need to rework some of it.
Yeah, and now it's the heavy work of preparing Postgres 19, and maybe slightly later it will be less super active in terms of code changes, I guess.
I wanted to mention that not the only limiting belief was: okay, it's not possible. Now you think: okay, the observer effect is too high. And obviously it's not so high. And I remember the upgrade with pause and resume in PgBouncer: I had such a big belief that it wouldn't work, but we see now how it works beautifully under huge loads and on huge clusters. So we just go and check, right? Now experimentation is cheap with AI. It's great to check instead of thinking it's not possible or hard.
It's great. It's super inspiring to hear what you say.
Yeah, and back to the tool: I still think it's very applicable in case, I don't know, the patch, my patch or a similar patch with the same idea, won't be committed upstream any day, or you cannot afford a new version, so you will stick with some vanilla Postgres without these new capabilities. The nice thing is that it's fully independent from Postgres. It runs like a sidecar, so you can switch it on and off. You do not need to recompile Postgres. You don't need to change anything in Postgres. It just runs next to it. Yes, you need to self-host Postgres. You need to have access to the box. Not root, but with elevated access rights for running eBPF, basically.
Okay.
And so that's a nice thing about the tracing. As soon as it's pure tracing, not sampling at any moment, it opens doors for very interesting things that are not even implemented in the Oracle world. And here we can beat Oracle in Postgres, because it's tracing. It means that we have a full history of transitions from one state to another state to the next state, and so on. And you can reconstruct all performance problems or clashes or bursts in your system, not only in time, but also find the dependencies, who started the problem. Because even with wait event analysis, with the sampling approach, for example, you cannot distinguish whether you had ten short I/O waits or one long one. With tracing, you can do it, and you can find the session that started the problem, because in the regular way it's really hard to distinguish whether a session suffered from the problem or caused the problem. So with tracing, I believe, you can find the root of the problem and all the consequences of it.
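What pure tracing buys here can be shown with a tiny sketch: given timestamped transitions, every individual wait gets an exact duration, so ten 1 ms reads and one 10 ms read stop looking identical. The timestamps and event names below are invented for illustration:

```python
def wait_durations(trace):
    """trace: time-ordered (timestamp_us, state) transitions of one backend.
    Returns (state, duration_us) for each completed interval."""
    return [(s0, t1 - t0) for (t0, s0), (t1, _) in zip(trace, trace[1:])]

# One long 10 ms read:
one_long = [(0, "IO:DataFileRead"), (10_000, "CPU"), (10_500, "Idle")]

# Ten short 1 ms reads separated by 50 us of CPU:
ten_short, t = [], 0
for _ in range(10):
    ten_short += [(t, "IO:DataFileRead"), (t + 1_000, "CPU")]
    t += 1_050
ten_short.append((t, "Idle"))

for name, trace in [("one_long", one_long), ("ten_short", ten_short)]:
    ios = [d for s, d in wait_durations(trace) if s == "IO:DataFileRead"]
    # Same total I/O time, completely different shape:
    print(name, "count =", len(ios), "total_us =", sum(ios))
```

A sampler that observes both backends at, say, 100 ms intervals would report the same I/O totals for both; only the trace preserves the count and duration of each individual wait.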
And with all these transitions, you can see them as servers and analyze this from the queueing theory perspective. You can take an already really mature mathematical approach, and everything is already there. You can analyze how much you can serve in a given period of time. And you can find the real bottleneck. For example, with regular wait event analysis you can see a lot of I/O, so I/O looks like the bottleneck. But in reality, if you see from the tracing that before the I/O it always takes an LWLock, for example, for any reason, then even if you solved the I/O problem by, for example, finding faster disks, the I/O would be shorter, but you would still be blocked by LWLocks. So the bottleneck would just move to the next stage.
This is what makes benchmarking so hard.
Yes.
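The disk example can be modeled as a toy serial pipeline: each request passes through a lock stage, an I/O stage, and a CPU stage, and throughput is capped by the slowest stage. The stage times below are hypothetical; the point is only that shrinking the I/O stage moves the cap to the lock stage:

```python
def max_throughput(stages_ms):
    """Requests/sec through a serial pipeline, capped by the slowest stage."""
    return 1000.0 / max(stages_ms.values())

before = {"LWLock:LockManager": 2.0, "IO:DataFileRead": 5.0, "CPU": 1.0}
after = dict(before, **{"IO:DataFileRead": 1.0})  # bought faster disks

print("before:", max_throughput(before), "req/s")  # limited by I/O
print("after: ", max_throughput(after), "req/s")   # now limited by the lock
print("new bottleneck:", max(after, key=after.get))
```

Faster disks raise throughput, but far less than the I/O speedup alone suggests, because the lock stage becomes the new ceiling. Only a full trace of the per-request stages exposes that ordering.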
And what you're saying, as I hear it, I don't know all the details, but what I hear is there is a mathematical apparatus, basically, which could help here and analyze bottlenecks much better, and understand the real bottleneck reliably. That's all amazing, I want this.
Yeah, so it's not all there yet. Not yet. I have more ideas that are semi-implemented, so I'm just playing with it. Because collecting data, and even analyzing the data, is not the last step. The last step is presenting data to the human, to make it consumable, for a human to understand this data and take action. So this is process analysis.
There is, again, a field where people already spent a lot of time, really smart people, thinking, and they already have really good approaches for analyzing such graphs. When you move something from one state to another, for example tickets, when a ticket moves from open to in progress, to, I don't know, test, completed. And the same with logistics, when you need to find the bottlenecks in your graph. With this math you can find the patterns: the most frequent path is this one, and the bottleneck is this specific state. It means that, I don't know, most of your backends go through the buffer pin, for example, and the buffer pin is basically the limiting factor. You do not even need to look at the dashboard with your eyes. A system, not an LLM, simple code, can do it for you. We analyze the backends' traces: yes, everything goes through the buffer pin. And so we determine that the system can do 100K buffer pins a second, but your system demands 200K. So basically you need to solve this problem here. So the problem is here. It's a very deterministic approach. It can produce the report really easily. You do not need hallucinations. You don't need to look at your dashboards for that. And so I think it opens doors for really interesting stuff.
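The process-mining step described here reduces, in its simplest form, to counting state-to-state transitions across backend traces and reading off the dominant edge. A minimal sketch with invented traces:

```python
from collections import Counter

def transition_counts(traces):
    """Count state->state edges across per-backend state sequences."""
    counts = Counter()
    for states in traces:
        counts.update(zip(states, states[1:]))
    return counts

# Hypothetical per-backend wait state sequences:
traces = [
    ["CPU", "LWLock:BufferPin", "IO:DataFileRead", "CPU"],
    ["CPU", "LWLock:BufferPin", "CPU"],
    ["CPU", "LWLock:BufferPin", "IO:DataFileRead", "CPU"],
]

edge, n = transition_counts(traces).most_common(1)[0]
print(f"hottest edge: {edge[0]} -> {edge[1]} ({n} times)")
```

Real process-mining algorithms go much further (path discovery, conformance checking), but even this frequency table mechanically surfaces "everything goes through the buffer pin" without anyone eyeballing a dashboard.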
And one last idea that I have and have tried to implement already: the system can find deviations. For example, for a query ID, the common wait event profile is, I don't know: one millisecond spent on CPU, one millisecond in I/O, and after that, some time in locks. But instead of 40 milliseconds it took 50 milliseconds, and it's because a new wait event appeared in between. So again, I think it's very easy to write the code to make this analysis in a semi-automatic way. So, like in Oracle, you open the one-hour AWR report, but at the bottom you will have: so, I made the analysis for you. The problem is here: the query with this query ID changed its wait event profile from that to that, and you have a problem with this, I don't know, disk or file or something like that.
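That deviation check is mechanical enough to sketch: keep a baseline wait event profile per query ID and flag events whose time moved by more than a threshold. The profiles and event names below are hypothetical:

```python
def profile_deviation(baseline_ms, current_ms, threshold_ms=1.0):
    """Return wait events whose time changed by at least threshold_ms."""
    events = set(baseline_ms) | set(current_ms)
    return {e: round(current_ms.get(e, 0.0) - baseline_ms.get(e, 0.0), 1)
            for e in events
            if abs(current_ms.get(e, 0.0) - baseline_ms.get(e, 0.0)) >= threshold_ms}

# Baseline profile for one query ID vs today's profile (ms per event):
baseline = {"CPU": 1.0, "IO:DataFileRead": 1.0}
current = {"CPU": 1.1, "IO:DataFileRead": 1.2, "Lock:transactionid": 47.7}

print(profile_deviation(baseline, current))  # flags only the new Lock wait
```

Small jitter in CPU and I/O stays under the threshold; the report calls out just the new lock wait, which is exactly the "I made the analysis for you" line at the bottom of the imagined report.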
So it's a kind of a project, but I think it's doable nowadays.
Going back quickly: you mentioned, let's say, if we're self-managing Postgres today, presumably this tool would already work for us. So I'm wondering, what would the changes in core be that you want to make? What added benefit would that have? Or is it mostly about getting it to work on managed service providers or
something else? On one hand, these two, I think, would, I hope, enable a lot of DBAs to make the perfect wait event analysis. But moving these capabilities into the Postgres core will reduce the overhead, on one hand. On the other hand, you will have it in vanilla Postgres everywhere. So you would not need to bring any tool. You would not need to have access to the box. Because not all Postgres DBAs have it, right? You have an interface, maybe a SQL interface or something.
Yeah, yeah. So it's definitely going to be available through SQL views. Yes.
But one question: we haven't started any discussions on hackers yet, right? Yes.
So this is the interesting point: what will they say?
Yeah, and that's why I still want to make this patch really solid, at least at first look, because I expect a lot of pushback. There's a kind of consensus, for some reason: when I talk to production DBAs, at any conference, anywhere, everybody says yes, we need it, no doubts. But when you talk to the people who have some weight in the community, who can really promote something, they, how do I say it? I would say they spend less time with real production issues. And that's probably why they push back on any changes in core that could introduce performance degradation, even if it brings a lot of new tooling for DBAs. Because it really depends on your perspective. I look at Postgres as a production DBA. That's why I would pay 20% overhead, I don't care. Just give me a tool.
And it's already like that, so you don't need to look far. As you said, you're already working with this 10 or 20 percent overhead or something.
Yeah.
But a quick question on the tool as it is. On the GitHub repo, version 0.8 is the last one I saw. Are you working towards 1.0? What does 1.0 look like? Is it a reliability thing? Is it a more-tests thing? What's missing that makes you not want to call it 1.0 yet?
Yes. In some tests it still runs out of memory, so it seems there's something there. I mean, yes, I need more tests, to increase the quality of the code. And I want to run it on a real box, not a virtual machine from some cloud provider, on a real box, so I can tell people that it's, if not production-ready, at least safe to run somewhere in a test environment. Because nowadays I still cannot say that for sure. But again, I felt I was really close with the patch, to making it ready for the community to start a conversation. I know it's a really long journey, and I wanted to start it. And as soon as I had the feeling that I was really close, I invested all my time over the last couple of weeks into the patch, and I didn't commit anything to this tool. That's kind of paused, but I will get back to it for sure.
But yes, a couple of questions for those who are interested in trying it out early. First of all, what are the best use cases for it in its current state, not its future state? And the second question: how to start quickly? Just clone the repo and that's it?
Yeah, I hope so. At least I try to keep the README up to date, and it should be sufficient just to go through the README to compile it and run it. I tailored it for my use case, I mean, for my day-to-day work. And I hope it's a good example, and people should be able to use it right after cloning.
But I obviously see, for example, that for the next big benchmark it's probably worth using it, to see how it helps to understand what's happening with that benchmark. But maybe, what are the other cases? Like troubleshooting some complex situation in production and trying to reproduce it on a clone, or something else?
I think if, for example, you have a problem and, fortunately enough, you can reproduce it in some isolated environment, but you just don't know what happens during the problem, you can run the tool, and it's like looking at the problem with a microscope. It will trace everything, it will show everything to you. So I think at this moment it's good enough for that type of usage: investigating some problem in an isolated environment. But if you're brave enough, you can try it on something close to production, on some load in production, because you can always just kill it.
Brave enough or desperate enough. Sometimes you have a problem you cannot reproduce, so let's go to production, if it's self-managed. In that case, someone goes and runs it in production, because otherwise it's not possible to understand what's happening.
Do you recommend attaching to only one backend, or analyzing all of them? What's the best way to use this tool?
I built it to trace everything, including the auxiliary processes, the Postgres background processes, because sometimes that's where the problem is. And I built this tool with that in mind. So just trace everything and have everything. But it's possible...
When you don't know, yes, yes, yes.
It's the default mode. It's possible to narrow it down to only one session or only one backend.
Is it possible?
Possible as well.
Yes.
Good.
So the risk is lower if you decide on only one backend, right?
Yeah.
Theoretically, yes. The surface should be smaller.
Yeah, that's great.
Yeah.
I think I'm definitely planning to use it in some of the next benchmarks I'll be doing. I had some benchmarks last Friday, and I just realized I should have involved this tool. But I'm going to revisit those benchmarks. They're fully scripted, so I guess I'll just try it and see what it says. It's about this pgque tool. I'll definitely connect with you about that.
So anyway, I'm excited to see this work. It's a very unexpected angle on active session history analysis for me, one which actually dissolves one of the biggest concerns about the ASH methodology, which is sampling, right? pg_stat_statements is precise, unless a query ID is evicted, or unless we talk about cancelled statements, which pg_stat_statements doesn't see; but the numbers are precise. It's just cumulative metrics, counters incrementing, tracking everything. I really liked pg_stat_statements when it was created, because before that, pre-pgBadger, we had only pgFouine. It's a French name. It was a PHP script, and it was based on sampling. Everything related to logging is painful, and sampling is painful. So, okay, we now have an exact, precise instrument to work with. But then ASH is an impressive, great approach, but it's sampling-based. Now you say ASH can be precise. And this is like a new horizon opening for me. I'm super excited. Thank you for making this work. It's great.
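The sampling concern can be made concrete with a toy simulation. Assume a workload where every 990 ms of CPU work is followed by a 10 ms WAL-write wait: a once-per-second sampler can miss that wait entirely, while tracing accounts for every millisecond. This is an illustration of the general point, not a model of pg_wait_sampling's actual behavior (real samplers use much shorter intervals, and event names here are just labels).

```python
# Toy illustration of why sampling-based ASH can miss short wait events
# entirely, while tracing captures them. Waits are (event_name, duration_ms)
# intervals laid end to end on a single backend's timeline.

waits = [("CPU", 990), ("LWLock/WALWrite", 10)] * 100  # 100 s of activity

def sample_profile(waits, interval_ms=1000):
    """Sample the active event once per interval, like a 1 Hz sampler."""
    timeline, t = [], 0
    for event, dur in waits:
        timeline.append((t, t + dur, event))
        t += dur
    counts = {}
    for tick in range(0, t, interval_ms):
        for start, end, event in timeline:
            if start <= tick < end:
                counts[event] = counts.get(event, 0) + 1
                break
    return counts

def trace_profile(waits):
    """Tracing sums every wait interval exactly."""
    totals = {}
    for event, dur in waits:
        totals[event] = totals.get(event, 0) + dur
    return totals

print(trace_profile(waits))   # the WAL waits are fully accounted for
print(sample_profile(waits))  # the WAL waits never land on a sample tick
```

In this adversarial layout the sampler attributes 100% of the time to CPU, even though a full second was spent waiting on WAL writes.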
And while working on this tool, and working on the patch for Postgres, I understood that there are some, how to say, gaps in attributing time to queries that are not really straightforward to understand. For example, the ClientRead wait event that we see really often: is it really database time, or is it still client time? It's not that simple and easy. And if you go even deeper, for example: you open a transaction with BEGIN, you make some changes, the last UPDATE finishes, and you issue COMMIT or END. And really, you need to attribute the time and waits up to the END, to the COMMIT. I think that's the default behavior. But even after COMMIT, there is some work that Postgres does, some cleaning up. And from the pg_stat_statements perspective, it's already not there. So some part of the work is kind of unattributed; it goes nowhere. It's not a really big amount of work, but still, playing with this precise tool, if you're curious enough, will get you to the next level of understanding of how Postgres actually works, of its internals.
Really interesting.
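The attribution gap described here can be sketched as a toy classifier. The bucket policy and the "PostCommit/Cleanup" label are illustrative assumptions (Postgres does not report such a wait event); the point is only that some traced intervals fall outside both "database time" and "client time" as pg_stat_statements sees them.

```python
# Illustrative only: bucket traced wait intervals into database time,
# client time, and unattributed work (e.g. post-COMMIT cleanup that
# pg_stat_statements no longer sees). Event names loosely follow Postgres
# conventions, but the classification policy is an assumption.

DB_TIME = {"CPU", "IO/DataFileRead", "LWLock/WALWrite", "Lock/transactionid"}
CLIENT_TIME = {"Client/ClientRead", "Client/ClientWrite"}

def attribute(intervals):
    """intervals: list of (event, ms). Returns total ms per bucket."""
    buckets = {"database": 0, "client": 0, "unattributed": 0}
    for event, ms in intervals:
        if event in DB_TIME:
            buckets["database"] += ms
        elif event in CLIENT_TIME:
            buckets["client"] += ms
        else:
            buckets["unattributed"] += ms
    return buckets

# BEGIN ... UPDATE ... COMMIT, followed by cleanup work after COMMIT
# ("PostCommit/Cleanup" is a made-up label for that leftover time):
trace = [("Client/ClientRead", 5), ("CPU", 2), ("IO/DataFileRead", 1),
         ("LWLock/WALWrite", 3), ("PostCommit/Cleanup", 1)]
print(attribute(trace))
```

Whether ClientRead belongs in the client bucket or the database bucket is exactly the judgment call being discussed; moving it between the two sets changes the totals, which is the ambiguity.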
I really like it for that, because I never thought about it until I started working with it.
And it's true that not only can you understand the exact sequence of wait events for this particular load, but you can also map it to the source code, right? Of course, nowadays an LLM can explain it to you, so it's not a foreign language anymore. You can talk to it, and it's really easy to understand. Which opens doors for more optimization of Postgres itself, I hope.
Yes. Yeah, like usually with gdb or with perf; now it's also possible with this tool.
If you have some pathological workload,
you reproduce it in a synthetic environment, and let's go.
That's great.
Michael, you wanted to add something, right?
Oh, only that when you were talking about unexplained overhead: it often shows up in EXPLAIN ANALYZE, when you look at the difference between the execution time at the end of the query plan and the actual total time of the top-level operation. You almost always see a small discrepancy there as well, not accounted for anywhere, which is, yeah, the same thing you're talking about, I think.
Yeah, a few percent here and there, and you've lost 10 bucks.
Yeah.
Well, yeah, I'd like to echo what Nik's saying.
It's really interesting work that you're doing.
And thanks for sharing it.
And also, I think you published the tool under the Postgres license, which is really cool.
Nice one.
Yeah, that's how I personally feel about it. So if you develop something for Postgres, just share it.
And what book on queueing theory have you been reading for quite a long time, as I know?
Yeah, so it's quite an old one.
Because it connects to my current work, like this tool I just released. And I guess I need to read it as well.
I will send it. You can put it in the description.
Great.
Okay, good luck with the tool.
Definitely.
I hope it will be popular.
And I hope your patches will be welcomed on hackers.
Yeah, we will see how it goes.
Great.
Thank you so much for coming again.
Thank you.
Thank you.
Thanks a lot.
