Postgres FM - pg_flight_recorder
Episode Date: May 15, 2026

Nik and Michael are joined by David Ventimiglia to discuss pg_flight_recorder, a new tool he created for monitoring a Postgres database from within.

Here are some links to things they mentioned:
David Ventimiglia https://postgres.fm/people/david-ventimiglia
pg_flight_recorder https://github.com/dventimisupabase/pg_flight_recorder
Supabase https://supabase.com
pg_wait_sampling https://github.com/postgrespro/pg_wait_sampling
pg_ash https://github.com/NikolayS/pg_ash
pg_cron https://github.com/citusdata/pg_cron
pg_tle https://github.com/aws/pg_tle

~~~
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I'm Michael, founder of pgMustard, and I'm joined as always by Nik, founder of Postgres.ai.
Hey, Nik.
Hi, Michael.
And we have a special guest, David Ventimiglia, Solutions Architect at Supabase, and the creator of pg_flight_recorder, which we're talking about today.
Welcome, David.
Thank you so much.
Good morning and I guess good afternoon and good evening.
It's nice to meet you.
Yeah, I think we've got all three bases covered.
Where would you like to start?
Perhaps with why another tool in this area? What's the origin story? What's the motivation?
The motivation: as a solutions architect at Supabase, my job, evidently, is to help our customers.
And often they come to us and they say, I've got this problem. My database is slow or this
query is behaving weirdly. Please help us. And then we try to bring whatever tools we have to bear on
the subject. And to be honest, often we're starting out with no idea what the problem is. And our beautiful
customers cannot be relied upon to, you know, relay all the information perfectly. And I just needed
more. I was telling Nik this at one point: some people say less is more. I say more
is more. I needed more data. And I just wasn't getting it. And I did the usual thing that people would
do these days. Version zero of this, maybe four or five months back,
was a rush job with our good friend Claude Code.
And I just cobbled something together to get the data that I needed for a particular customer.
And we got the data that we needed.
And we were able to get through that particular instance rather swimmingly.
And I was pleased by the outcome.
And I thought, you know, let's try to turn it into something a little better and a little more real.
As a sidebar, I think most of us are familiar with the excellent
pg_wait_sampling, which is an excellent extension, but sadly is not available on all managed
Postgres services, even Supabase.
Cloud SQL is one little one.
But even at Supabase, on whose staff is Alexander Korotkov, who wrote it, we don't
have that extension on the platform.
and there's, you know, there's some resistance within managed platforms to get new extensions added.
I mean, there's a rich and vibrant ecosystem for extensions, and that's one of the strengths of PostgreSQL,
but a weakness is getting those into all of the places where you need them.
So I needed something that was a poor person's substitute for pg_wait_sampling.
That's really how it started out: just a worse version of pg_wait_sampling written in SQL and
PL/pgSQL.
And then you know how these things go.
It just took off from there.
Yeah, there's a lot to unpack here, actually.
Yeah.
Yeah, and I agree with you, starting from the end of your intro.
I keep saying, like, extensions are not extending us anymore,
because in the reality of managed Postgres,
we are limited to only the set of extensions which are present.
There is pg_tle, and we should discuss that separately,
which is, I think, a great idea.
And actually, you told me there's a lot happening there also.
So let me tell my story.
I think every experienced Postgres DBA at least once wrote this snapshot tool for pg_stat_statements and other pg_stat views.
Because it's only cumulative statistics.
Some numbers are growing.
That's it.
And we need persistent storage.
Usually, for bigger clusters, we have monitoring.
But I remember I was working with a really big company called Chewy.
And I remember they had a great monitoring already.
Some clusters were on RDS already.
So they also had, like, Performance Insights and so on.
Still, I wrote my snapshot tooling because I didn't fully trust the monitoring.
And I also wanted to verify, and some details were missing because they didn't capture all
the metrics; I needed specific metrics and so on. So meanwhile, there are some projects which
implement this idea, but they are extensions, so they are not available. pg_profile, I think, is
a great one. So that's why many DBAs, maybe not all, but many,
at least once wrote some snapshotting tool. And I remember our checkup tool was also snapshotting.
The first versions of the checkup tool were a shell script. So with snapshots, once
you have two snapshots, now you need a diff.
And since we were in Bash, we didn't want to do the diff there.
So we sent these snapshots back to do the diff on the observed Postgres.
So I saw a lot of solutions which try to make this data persistent and have snapshots
and then show everything without the need to set up full-fledged monitoring.
And even if it exists, sometimes we still need an additional lightweight solution.
This is one thing to warm us up on why it's needed.
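The snapshot-and-diff idea described above can be sketched in a few lines. This is only an illustration, not code from pg_flight_recorder or the checkup tool: the snapshot shape (a query identifier mapped to a cumulative call count) and the function name diff_snapshots are assumptions made for the example.

```python
from typing import Dict

# Hypothetical snapshot shape: query id -> cumulative call count,
# as a cumulative-statistics view would report it at one point in time.
Snapshot = Dict[str, int]


def diff_snapshots(earlier: Snapshot, later: Snapshot) -> Snapshot:
    """Return how much each cumulative counter grew between two snapshots."""
    delta: Snapshot = {}
    for key, value in later.items():
        previous = earlier.get(key, 0)
        if value >= previous:
            delta[key] = value - previous
        else:
            # The counter went backwards, so the statistics were reset in between;
            # the later value is everything accumulated since that reset.
            delta[key] = value
    return delta


t0 = {"q1": 100, "q2": 40}
t1 = {"q1": 160, "q2": 40, "q3": 7}
print(diff_snapshots(t0, t1))  # {'q1': 60, 'q2': 0, 'q3': 7}
```

Because the counters only grow, a single snapshot says little; the difference between two snapshots is what describes the activity in that window.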
Another thing is that I guess Supabase has a lot of clusters.
Some of them are...
We have a few, yeah.
Yeah, like millions, right?
It's quite unique, interesting story, and many of them are small,
and you cannot justify full-fledged monitoring, right?
And getting smaller, yeah.
Getting smaller.
Yeah, because people just experiment so much, right?
And they need so many databases.
Yeah, we are now, I mean, we have a skewed distribution, of course.
We have a few giant customers, but we have lots of medium-sized customers and many more
small ones and millions of tiny ones.
And now with the AI builders, we are shoveling millions of nano instances into the AI builder
furnaces.
Like, where are these databases going?
What's the long-term prognosis for these tiny databases?
Who knows?
Who cares?
That's a completely different animal.
But yes.
It's a very, it's a vibrant ecological niche that we've developed here.
Yeah.
On that note, who's this tool for?
Like, of that distribution, are there some where it's not appropriate and some where it's ideal?
Or like, where does it fit?
Yeah, that's a great question, Michael.
Again, as Nik indicated, you know, time is a flat circle:
all these things have happened before and will happen again.
You know, versions of this have been written before and versions of this will be written in the future.
There's nothing really that profound about this.
But this is the tool that I needed right now at this time for the reasons that Nik just described.
As for who this tool would be for, I think it would be startups, SMBs, builders,
sort of the canonical Supabase customer, those who are starting out, you know, building a business,
building a backend, building a project. They need PostgreSQL, and off they go.
You know, at Supabase, I don't think it's any big surprise:
we don't really have that many migrations onto Supabase.
I mean, we would love to have more. Anybody who's willing to bring giant workloads over to Supabase, come on over.
But we know that databases are infamously sticky tools anyway.
People don't really migrate that often.
And they're probably less likely to migrate giant workloads over from Oracle or Microsoft SQL Server to Supabase.
Although we are entertaining that option.
But if we did do those things, those folks would probably come over with DBAs, database experience, database expertise.
So our sort of customer portfolio doesn't really reflect that.
What we have are, even our largest customers, I would say tend to be, I mean, they may have
three or four or five years of experience with PostgreSQL now by dint of hard effort,
but they all started out small.
Every one of our large customers was a little acorn that grew into a giant oak.
And we try to make Supabase easy, and we do.
It's certainly easy to get into.
I've used this analogy too many times, but it's like the car dealership.
You can drive it off the lot in five minutes, but actually operating it, especially at scale,
is something different.
And Nik knows this; Michael, you know this.
We would all benefit from more and better automation.
And it's coming.
It's coming from within the community, and it's coming from Supabase.
We will be able to help these customers operate their databases more seamlessly in the future.
But right now what we need is tooling to help customers as they grow and as they scale.
So in a nutshell, like who's this for?
People who are not DBAs, not database experts, they just want to run a business.
They want to grow that business.
And they want some tools to help them.
That's it.
Some of them start with a very small database instance, paying $25 or some very low number of bucks, right?
And it's hard to justify paying right away $150 or $400 or $500 for a full-fledged monitoring solution,
and then you need to spend time there and so on.
It cannot be justified easily.
And also, I wanted to mention it's quite elastic.
So if you just inject this tool inside your Postgres database, it starts collecting inside,
like self-observed.
Yeah.
And you pay a little bit for those megabytes per day.
I don't know. I think since I helped to rewrite the storage, it's quite
efficient. I again used the PgQ approach, rotation of partitions and truncate, so it's
very efficient. And I'm just saying, you pay a little bit
and it's self-observed, right? And I was thinking about the pros and cons of self-observed versus
externally observed. Ideally we need to have both, actually, because you cannot install
an agent right on an RDS or Supabase machine.
So if you observe it outside with external monitoring tool, if something bad happens,
maybe you don't have connectivity, right?
While this thing sitting inside it still keeps observing, right?
That's right.
At the same time, if everything is down, you don't see, you cannot reach the data, right?
So like external tools also have benefits.
They have both pros and cons if you think about it.
It's interesting.
So my realization is that even bigger clusters should maybe have a small black box like this,
or flight recorder, right?
While we have a full-fledged solution outside, there's both remote telemetry and something
internal, right?
Yeah, that's right.
And I think I landed on the name pg_flight_recorder.
And then at some point I think I had some reservations because I thought, sure, but
you know, in the event of a crash, then maybe the data aren't available and it's not really
that useful.
But then, I mean, it's not.
If an actual airplane crashes, then that airplane also is not really useful either.
To find it.
That airplane is dead.
No one will be using that.
Just a side note, I just learned David has a PhD in astrophysics.
So this name is not a random thing, I guess, right?
And a master's in aerospace engineering.
But at every turn, I was trying to do something else.
And I was trying to get away from computers.
And I just kept getting sucked back in.
But I grew up in the 70s when it seemed like airplanes were
crashing all the time when they weren't being hijacked.
Mercifully, that doesn't really seem to happen all that often.
But I think, I'm not a pilot, but it's my understanding that actual flight recorders are
useful for far beyond crash investigation.
They're useful for optimization, for troubleshooting, like in-flight incidences.
And so I think the nature of this is hopefully a little bit more like that.
Nik, you would know better than I would, but I have the feeling that in reality, databases don't
really actually crash all that often.
What they do is they exhibit behavior, and we want to be able to investigate that behavior.
And that's what this helps us do. Briefly, about the tool itself:
again, all it really does is take snapshots of wait events.
That's how it started.
It's like pg_ash.
We developed it in parallel, actually.
Yeah.
So when I told David that there should be something small which self-observes,
he just sent me a link.
It's already done.
It was interesting that we had parallel courses of development of pg_ash and pg_flight_recorder.
They're very similar in this case.
Yeah, very similar.
And like that, it captures active session history, wait events. Nature abhors a vacuum,
and idle hands are the devil's workshop.
You know, with the tools available, I couldn't resist the urge to just keep pouring more into it.
So it records lock activity and checkpointer activity and background activity and I/O stats and statement stats.
and config changes, right?
Config changes as well.
And hopefully the conjecture is that there's some value in,
if not capturing everything,
having an opinionated and curated set of many things
that are captured simultaneously in a correlated fashion
so that maybe you experience a checkpoint storm.
And then you notice that there has been a config change recently
and you're able to bring these things together.
That's the idea.
As Nick indicated, it was, when we talked about this,
version zero was done.
Again, it's a pretty simple tool.
I had a few guiding principles, one of which was,
sort of the Hippocratic Oath, try to do no harm.
I put a lot of effort into making this safe to run.
Statement timeouts and so on, right?
Yeah, statement timeouts, circuit breakers,
graceful degradation of some of the components,
dozens, too many configuration settings, but then configuration profiles that capture those to make it easy to use.
I think I got three quarters of the way there or maybe 50% of the way there, but there were still some improvements to be made,
among which the storage engine, which we can thank Nik for rewriting using PgQ, or essentially the engine that is part of PgQ.
Correct, Nik?
Yeah, it's like partitions, rotation, I think daily partitions.
There is also a roll-up for old data to have it less precise, not raw, but aggregated.
And everything is already implemented.
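As a rough illustration of why rotating partitions avoids bloat, the idea can be simulated with a ring of daily buckets where the oldest bucket is emptied wholesale and reused, instead of deleting rows one at a time. This is a toy model under stated assumptions, not the PgQ or pg_flight_recorder implementation; the class RotatingStore and its methods are hypothetical names for the example.

```python
from datetime import date, timedelta


class RotatingStore:
    """Toy ring of N daily partitions; the oldest one is emptied and reused."""

    def __init__(self, partitions: int) -> None:
        self.slots = [[] for _ in range(partitions)]
        self.slot_day = [None] * partitions

    def _slot_for(self, day: date) -> int:
        # Consecutive days map to consecutive slots, wrapping around the ring.
        return day.toordinal() % len(self.slots)

    def insert(self, day: date, row: str) -> None:
        i = self._slot_for(day)
        if self.slot_day[i] != day:
            # Rotation: discard the whole old partition at once (the analogue
            # of TRUNCATE), so no dead rows are left behind to clean up.
            self.slots[i] = []
            self.slot_day[i] = day
        self.slots[i].append(row)

    def rows(self) -> list:
        return [row for slot in self.slots for row in slot]


store = RotatingStore(partitions=3)
start = date(2026, 5, 1)
for offset in range(4):
    store.insert(start + timedelta(days=offset), f"day{offset}")
print(sorted(store.rows()))  # ['day1', 'day2', 'day3']
```

With three partitions, writing a fourth day reuses the slot that held the first day, so retention is bounded by the number of partitions rather than by a cleanup job.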
And I remember I was brainstorming with Claude Code, like what kind of storage we should choose.
Because I think originally you used a lot of JSON, right?
It's quite bloated, in my opinion, sometimes.
Well, it wasn't JSON, but originally, you know,
I was using skip locked and unlogged tables in a vain
attempt to mitigate dead tuples and bloat, but if you're not diligent, then they're still there.
And so that's why you rewrote the engine.
But also like data format is interesting.
And I had multiple ideas, and I have a brainstorm document where I was thinking about what
to choose, and some ideas were compressing data quite a lot.
But it was hard to deal with because it was basically encoded so much that it's inconvenient.
So I made a trade-off choice.
It should be human-readable even in raw form.
Although I did apply some tricks from pg_ash as well.
Like timestamps are relative to, I think, I don't know, 2020 or something.
Like a Unix timestamp, but shifted.
So we have capacity until the end of the century.
And you have as few bytes wasted as possible, a very compact way.
Plus this PgQ-style rotation.
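The shifted-timestamp trick can be sketched as follows. The exact epoch and width used by pg_ash or pg_flight_recorder are not stated in the conversation, so this assumes a 2020-01-01 UTC epoch and a signed 32-bit count of seconds (the size of a Postgres integer); the function names are hypothetical.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption for illustration: seconds are counted from 2020-01-01 UTC.
EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)


def encode(ts: datetime) -> bytes:
    """Store a timestamp as a signed 32-bit number of seconds since EPOCH."""
    seconds = int((ts - EPOCH).total_seconds())
    return struct.pack(">i", seconds)  # 4 bytes, big-endian


def decode(raw: bytes) -> datetime:
    """Recover the timestamp from its 4-byte encoding."""
    (seconds,) = struct.unpack(">i", raw)
    return EPOCH + timedelta(seconds=seconds)


def last_representable() -> datetime:
    """Latest instant that still fits in a signed 32-bit offset."""
    return EPOCH + timedelta(seconds=2**31 - 1)


sample = datetime(2026, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
assert decode(encode(sample)) == sample
print(len(encode(sample)), last_representable().year)  # 4 2088
```

Under these assumptions each timestamp takes 4 bytes instead of the 8 bytes of a full timestamp, and the range runs out in 2088, which fits the "end of the century" remark only approximately.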
And also worth mentioning how it's working:
there is a soft requirement, pg_cron. It's not a requirement, but it's very recommended because
this is how it's ticking as well, right?
That's correct. So again, it's a simple tool. It's two
sort of packages, two simple install scripts, two schemas, one of which is required, one of which is
optional. The part that's required is the data model: the tables and the views and the functions
to record those data. And then the other, optional piece is a set of functions for analyzing
those data, but again, they could be analyzed in raw form in whatever way you like.
But as Nick indicated, somebody has got to generate the ticks.
Somebody has got to force the samples.
And that could be pg_cron.
It could be something like an outside scheduler.
Somebody's got to do it.
The default way is with pg_cron.
And pg_cron is available everywhere.
Yeah, pg_cron sort of snuck in before the sort of iron curtain started to drop on extensions.
Maybe. So it's in a lot of places at least, you know.
I, yeah, maybe that's true. I got the impression it solved such a useful problem and was from such a
reputable author that I think people trusted it and also thought it's simple enough that we can
maintain it. I understand why managed service providers don't just offer any extension. But if you
think about how much work it would be to maintain pg_cron if the author ditched it or, you know,
actually it's not huge and it's so useful.
And it should be in core.
That's all of it.
That's true.
I wish pg_cron was in core.
And we would have, for example,
automated new partition creation out of the box
without any extensions.
It's magic.
So simple thing actually, right?
But the one thing about pg_cron
from a database perspective is it is,
I think, once per second is the lowest you can schedule.
So what do you use, David, when you're using this with people?
Do you use pg_cron with like a one-second tick,
or do you suggest something else?
So far I have used it just with pg_cron. The resolution has been sufficient so far,
you know, because with the customers that I've worked with,
the resolution before has been, I guess, infinite,
as in they didn't have this at all.
So it's just worth having the data.
And a finer resolution, I haven't encountered a demand for that or a need for that yet, although certainly plausible.
But again, yeah, that has worked well so far.
And again, it's just a simple tool.
The idea, the objective anyway, is sort of a set it and forget it.
Install the tool, then forget that you've installed the tool.
But because it's safe to run, by virtue of Nik's
hard effort, it is safe to run. So just forget that it's there. And then you have an incident.
And then you think, oh, wait, I have pg_flight_recorder. Let's find out what happened.
And just point your AI to the data, maybe a dump of that data or something. And that's it.
And this second package you mentioned, it has interesting functions, like "what happened at" or
something, right? It's based on function names. I see already your thought: oh, AI should guess,
right? You designed it for that, so it's self-explanatory, right? So it's great.
But I also wanted to say a little bit about pg_cron: since version 1.5, as I remember, the lowest resolution is once per second, and I use it for pg_ash.
But I guess you use it by default at much less frequency, especially for non-ASH data, right?
Maybe once per 30 seconds or 60, but it's tunable, right?
It's everything is tunable.
You know, default sample collection is, I think I have it set to once per second, but there are snapshots
that are taken at a coarser resolution.
There are roll-ups that are taken at a coarser resolution.
Data are archived at a coarser resolution.
Then there's retention: for the core tables,
I think my default is seven days,
for the aggregates it's seven days,
and then for snapshots, I think it's by default 30 days.
But all of these are configurable.
So there are a few different cadences that are happening.
And there is, again, as you indicated,
On the analyze side, there is a wall of functions, appropriately named, meant to be understandable by humans and by AI alike, so that they can use these functions to analyze the data.
But then, of course, it's always available to be analyzed in raw form as well.
Yeah.
I can share an interesting story from PgQ about function names.
So when I was recently dealing with client libraries for PgQ, multiple times Claude Code
made a mistake because there is a function, force tick, but it's not ticking, it's just shifting
this pointer. And then you need to run the ticker in a separate transaction. And Claude couldn't get it
because it's a confusing name, actually. And it made the mistake multiple times developing this.
And I had a huge flashback to 15-plus years ago, when I made the same mistake manually, without
AI, because it was also confusing to me 15 years ago. So I just renamed that function to
force next tick. So you need to understand this is about the next tick. You're not ticking, you're just
preparing this job. And looking at your functions, "what happened at", "incident timeline", I'm just thinking
this is self-explanatory, maybe long, but everyone will understand what it is for. So it's worth
making it long. Yeah. That's right. And if there are too many functions, then again,
we can use AI to paw our way through and figure out which ones to use. Just the final thing.
I mean, it's meant for a few things, not just incident response, but also capacity planning, blast radius evaluation.
You know, again, where I intend to go with this is just getting back to Supabase briefly.
I have lots of customers that I have to, I should say, I'm blessed with helping, but so many of them I want to get to early.
It's, you know, again, we talked about this, but all of these databases are small.
They start out small anyway.
I would say from a certain point of view,
from the point of view of scale,
many of them are sort of doing things wrong,
but that's okay because they're small.
You can do, with a small database,
you can do everything wrong and it's fine, no problems.
But it's when you start to scale that you need to think about this.
So it's like exercise.
It's something you have to get into the habit of doing it early and often,
even though you don't want to.
And maybe you would benefit from,
a personal trainer and some encouragement to get you on the path early so that it pays dividends
when you're old like I am, and when you're a big database, like some of these eventually will
become. I wanted also to mention, by default it's consuming up to a couple of gigabytes for those seven
days, right? But again, it's tunable. Or less, yeah. It's tunable. It's a few gigabytes.
Nik, I think you and I, we benchmarked this. I don't remember exactly, but with the new storage engine I think we
estimated, on the happy path, maybe around 20 gigabytes for the
month. It depends. One of my goals is to sort of draw more attention to
this so that I could get feedback and improve it, and there is some low-hanging fruit to be plucked
insofar as data retention goes. Yeah, I think it depends on how many queries you have in
pg_stat_statements, by default up to 5,000.
And also, you collect data about indexes and tables,
so how many tables and indexes you have.
And I think you have limits there,
but still, it depends a lot on the cardinality of these things.
And about use cases.
I used it recently for benchmarking PgQ.
So for me it's so natural.
I already had multiple projects like this, and I just, okay,
I injected both pg_ash and pg_flight_recorder,
because pg_ash has more frequency and more details about ASH data,
and pg_flight_recorder brings a lot of other stuff, right?
So I just injected it into some synthetic database I provisioned with pg_cron configured.
And I just asked AI, of course, to do it, right?
So just inject it.
And then don't forget to dump after each run.
And then visualize it.
That's it.
Only three sentences.
Yeah, exactly.
And this is how I created a beautiful looking.
I actually asked to animate benchmarks because it's great to look how lines go.
And this is what brought PgQ good attention, because this data is easy to understand.
So for example, how much WAL was generated, right?
A lot of stuff.
What was the behavior of the checkpointer or autovacuum and so on?
That was going to be my next question, actually, on the WAL front.
You mentioned a few minutes ago how you originally went with unlogged tables.
Does that mean these are now logged and there is WAL generated?
It was my decision.
It was my decision to say that. First of all, an important limitation of all those tools which
are ticking on pg_cron and writing something, and it's only PL/pgSQL:
it's primary only, right?
But we live in this situation, strange for me as an old DBA, where a lot of clusters are single-node.
I even have cases where clients are coming with 10-plus, like 15 terabytes on a single node, and they are fine.
Cloud resources became quite reliable.
For some, actually, I think it's okay, because backups matter more than HA.
Because they are fine to be down, but not to pay for an additional couple of nodes and so on.
So anyway, this is primary only because we cannot write.
Yeah, can I add something to that?
Because you say single node, but I think that's slightly simplistic, because a lot of the ones I see,
they're HA, but the replicas are not read from.
They're like a failover replica,
a shadow standby node.
I wouldn't call that a one-node cluster,
but still, you only need to monitor the primary.
You're right, actually.
You're right.
But in this case, you are not interested
because you don't have any workload
on that hidden standby, right?
Exactly, exactly.
So you are interested in the primary,
and we see so many projects
reaching dozens of terabytes already
with a single node.
You inject it, and we need to understand,
okay, it's self-recording,
so it's going to produce some writes.
If it's an unlogged table,
if it crashes, it's gone.
That's the key idea.
We cannot...
At least the...
Where the data lands initially, those data would be gone, right?
When they were in unlogged tables.
You need to snapshot, you need external means.
So to understand the incident after a crash, we should use regular tables.
And now that we've redesigned the storage, it's not so super expensive.
Of course, there is some WAL to be written and some data storage to be paid for.
And of course, a little bit of shared buffers
is occupied by our data.
If you have replicas, it goes to replicas.
Maybe it's not a bad thing because if it's half of primary,
we can pay this price.
What are we roughly talking?
You mentioned a few gigabytes, up to maybe 20 gigabytes,
like that kind of amount for storage.
What are we talking about in terms of WAL generation by default,
just to give people a rough idea?
I don't remember.
I thought about baby clusters, like one gigabyte,
like free tiers, up to one gigabyte, right?
So I thought they should afford this, maybe tuned a little bit to less frequency or something retention-wise.
Are you talking about storage now or WAL?
Both, both.
Okay.
They are connected, actually.
If you need to write.
You mean on Supabase?
Anywhere, any Postgres.
If you need to write 100 megabytes to storage, you will produce, very roughly,
close to 100 megabytes of WAL, because this is the same data.
Yes, in different form.
but it's the same data, right?
If you need to write 100 times more,
expect 100 times more WAL.
Yeah, order of magnitude, it would be about the same.
Yeah, very roughly.
Of course, there are full-page writes,
WAL compression, but it's very different.
I also think of them very differently
because with WAL, I think of it as like megabytes per second always.
It's always like a time component, if that makes sense.
So it's like a constant amount that we're generating,
of course over a month or whatever it is, that's a few gigabytes.
But I guess that doesn't actually add up to very much per second in terms of, yeah.
I think we should expect something like 100 to a few hundred megabytes per day
with a lot of queries and indexes and so on.
Yeah, if you like ballpark math, if it were 30 gigabytes of data per month,
that would be roughly, I guess, by the power of arithmetic, maybe a gigabyte per day.
It's very stable.
And by 3,600, you could figure out megabytes per second of WAL generation.
Kilobytes maybe already, right?
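The ballpark math above can be worked through explicitly. This sketch only uses the 30 gigabytes per month figure mentioned in the conversation as an upper-end estimate; the function name and the choice of decimal units (1 GB = 1,000,000 KB) and a 30-day month are assumptions. Note that a day has 24 x 3,600 = 86,400 seconds, so that is the divisor for a per-second rate.

```python
SECONDS_PER_DAY = 24 * 3600  # 86,400


def average_rate_kb_per_s(gb_per_month: float, days_per_month: int = 30) -> float:
    """Convert a monthly data volume into an average rate in kilobytes per second."""
    gb_per_day = gb_per_month / days_per_month
    kb_per_day = gb_per_day * 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return kb_per_day / SECONDS_PER_DAY


print(round(average_rate_kb_per_s(30), 1))  # 11.6
```

So about a gigabyte per day averages out to roughly a dozen kilobytes per second, which supports the point that the steady-state rate is measured in kilobytes rather than megabytes.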
And it's very stable because it depends only on this cardinality.
And if you have some spikes of workload, it doesn't affect the amount of data
these snapshots write to WAL and the data directory, right?
Yeah, it's a baseline.
Yeah.
So just, you know, you're paying maybe $25 to Supabase, maybe pay $26
and just pay for a little bit more storage.
Not hundreds more, as you would pay if you installed full-fledged monitoring.
And if it's, to first order, only for the primary node,
again, all of these small projects are starting out with only one node anyway.
I mean, life is complicated and these people are just trying to get like a business off the ground
and a job done.
they're not thinking about multiple nodes, especially early on.
But they, you, I mean, we three know that they will need data to guide them on their journey.
So this is just part of that.
I mean, Michael, you work in the observability space.
We sort of dilate this out to a wider view.
This is just another entry in the observability space, like maybe a new generation
0.5 of observability tools for PostgreSQL, but, I mean, and there's pg_ash.
So it's not only about observability. I see, actually, the words "new kind of breed"
you used, I think, in our discussions. So I have pg_ash, you have pg_flight_recorder. I'm also
trying to revive PgQ in this very format: pg_cron and PL/pgSQL only, that's it. So it can be
installed anywhere and just tick. I also have an index tool, which is not yet released, which is
rebuilding indexes on pg_cron. That's it: you can inject PL/pgSQL and tick on pg_cron.
That's it. It's super simple. I'm already thinking about a tool for automated partition creation
without heavy tools. I don't know. It should be easy to use. But then you mentioned pg_tle.
Yes.
So can you maybe elaborate a little bit on why pg_tle? Why not just a single SQL file or PL/pgSQL file?
It's both. So for those who don't know, TLE is Trusted Language
Extensions, which I have my own view on. I regard it as just a little bit of extra
housekeeping that's associated with just a simple SQL install file. But, you know,
TLEs sort of dress up SQL and PL/pgSQL code as if they were a sort of managed extension,
but they can be installed without superuser privileges. pg_flight_recorder comprises both:
there are just simple install scripts, and you can use psql to install it. But it also is available as a trusted
language extension that can be installed through, I think, dbdev, because that's how some people
want to be able to install. Just to track, like, metadata. Yeah, it makes housekeeping
a little bit easier. You can slide pg_flight_recorder in with an install, and if you don't like it,
you can uninstall it in a very managed fashion. So that's all that's meant there. But yeah,
TLE is a very lightweight way to have managed extensions, and flight recorder offers that as well.
Yeah, I actually wish pg_cron and TLE were both inside Postgres itself.
Yeah.
And we would say something like create package or something, I don't know, and it's just a bunch of SQL and PL/pgSQL code,
or maybe PL/Python if you want, like, anything.
And it can just be installed anywhere with versioning and so on, with tracking of CVEs, I don't know, if there are any,
and so on. Who knows? Yeah, but extensions just don't feel like a part of the extensibility of
Postgres to me anymore. This is my honest feeling lately. Well, what also worries me is
who is testing, I mean with with major version upgrades, let alone minor versions,
who's testing all of these extensions? But this question is also applicable to any
regular backend code. You use some libraries. You just, you just,
just import them to your code somehow include, right?
And that's it.
And versions also matter there.
And it's on your shoulders, right?
It should be on your shoulders.
This idea that we are not providing some extensions
because we will need to maintain them,
give it to the shoulders of the people, right?
This is a different part of the thing.
This is in tension with products which telegraph or advertise
that we're easy to use and you don't need to worry,
we will handle it for you.
Yeah.
But I think it's great to have
flexibility, and people can choose various extensions, but they need to be responsible for
upgrades and part of the maintenance, as they already are for libraries in Go, or any language, right?
TypeScript and so on. So there is something here, and I think it's the AWS guys who created
pg_tle, right? So definitely this project was created with the realization that something is
limiting people here, so let's bring something here. That's a great idea.
And I wish they learned.
Michael, do you experience that at all with your customers?
Their challenges.
I mean, I know you work in a slightly different space, but you certainly must encounter
this as well, like tensions with extensions, with managed database providers.
Yeah, so I don't speak to people all the time about this kind
of thing, but I get the impression that people are looking for a little bit of advice, almost,
from their managed service provider, on which
extensions they should trust, which are, like, the best at what they do. You know, often there's
a choice of two or three and they kind of want their managed service provider to pick one and say,
you know, this is the one we suggest or this is the one we support. And I feel like there's a
little bit of that going on as well. So I think it is a little bit of, I don't
know if it's, like, kingmaking or something, but people saying this is the one everyone's using.
When people come to Postgres, for example, for the first time, they're like, which backup tool
should I use? Which monitoring? What's everyone else using? And there's no kind of official answer; there are barely any extension management systems. There have been about three or four created.
And I think, is it PGXN that's probably still the most used? But it doesn't have reviews, and it doesn't have a lot of the things people are looking for in terms of which ones are actually used, which ones people actually like, which ones have got a good track record when it comes to major versions, or low, maybe no CVEs,
or very few, you know, that kind of thing.
So I think trust is a big part of it,
and also people want a shortcut as to which of these they should be using.
I agree.
Supabase's database.dev is another attempt to have this registry of extensions, right?
It is yet another attempt, which seems to be somewhat honored in the breach.
But it's yet another attempt.
But Michael, I take your point definitely that there is a need
and I would say a growing need for, if not kingmaking,
at least someone to offer guidance and to bless these.
You would know as well as I would: the sort of persona for database operators
definitely does seem to be changing.
I mean, there once was a time when databases were an arena for people to sort of develop
and then project expertise, which is certainly still true.
But more and more I encounter people, customers, Supabase users, whoever it may be.
They will say to me or to us, I don't know what I'm doing.
I'm not a DBA.
Some of them will say, I'm not even a tech, I'm not even technical.
I'm a founder.
And I just vibe-coded my way into this.
And, you know, there's less of an urge now than there was in the past to sort of burnish your credentials as a database expert.
People are quite happy to be candid.
And they will say, I am not a database expert at all.
So please, can you help us?
Can you offer guidance?
If you tell me what extension to install, I'll install it.
So there's a growing need for those kinds of tools
and for greater and better automation.
Oh, yes.
That can be a topic for another session.
That's where my mind is going.
And I agree, actually, about this authority on what's good, what's reliable.
Sometimes I have cases where we have huge self-managed Postgres clusters,
and when I'm saying we should add some extension, I'm saying it's available on this and this managed platform.
So it's, like, reliable, you know.
Let's add it.
It helps me to speed things up.
So yeah.
I had a couple of last things
that it would be great to get your thoughts on.
One is, is there anything that we haven't talked about that's one of your favorite features of the tool?
I've seen there are quite a few in there.
And then also, is there anything missing that you really want to add?
In reverse order, the things that I want to add: again, I would really like to fortify this against the observer effect, against deleterious consequences.
I already know that there are some important fixes to be made, and I'm committed to doing that.
My tender hope is that people will use this to some degree so that I can get feedback.
If it needs to be strengthened, I will strengthen it. I will pour effort into it to make sure that
happens. So I wouldn't say there are features that I want to add. It's probably already
bloated in terms of features anyway. So maybe I'll just stop in terms of adding features.
In terms of favorite features, I mean, capacity planning, Nik, you know, is another
worthwhile endeavor besides incident management. And it's something that is sorely needed for
Supabase customers. There are functions within
Flight Recorder to help you project your capacity needs.
Like everything else, those functions will be strengthened and improved,
but I really would like to exercise those.
I'd rather this tool be used to forestall problems
rather than to investigate them.
Let's just not have problems at all.
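To make the idea of projecting capacity needs concrete, here is a minimal Python sketch that fits a straight line to disk-usage samples and estimates how many days remain before a limit is reached. It does not reproduce pg_flight_recorder's own functions; the function name `days_until_full`, the sample format, and the linear-growth assumption are all hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta


def days_until_full(samples, capacity_bytes):
    """Estimate days until usage reaches capacity_bytes.

    samples: list of (datetime, used_bytes) pairs (hypothetical format).
    Uses an ordinary least-squares line through the samples and returns
    None when there is too little data or usage is not growing.
    """
    if len(samples) < 2:
        return None
    t0 = min(ts for ts, _ in samples)
    # Convert timestamps to fractional days since the first sample.
    xs = [(ts - t0).total_seconds() / 86400.0 for ts, _ in samples]
    ys = [float(used) for _, used in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # all samples share one timestamp
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None  # flat or shrinking usage: no projected exhaustion
    intercept = mean_y - slope * mean_x
    day_full = (capacity_bytes - intercept) / slope
    # Measure remaining time from the most recent sample.
    return max(0.0, day_full - max(xs))


if __name__ == "__main__":
    start = datetime(2026, 1, 1)
    history = [(start + timedelta(days=d), 100 + 10 * d) for d in range(3)]
    print(days_until_full(history, capacity_bytes=200))
```

A straight-line fit is the simplest possible model; real workloads often grow in steps or bursts, so a projection like this is best read as a rough early warning rather than a forecast.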
You touched on several things.
I wish we had a separate episode on them,
and actually, I think, on how we would approach incident response
and RCA with this tool in particular,
step by step. It's possible.
And also capacity planning, I agree.
And a thing related to the observer effect:
one thing everyone who is using this new breed of tools ticking on pg_cron
should remember is that pg_cron records logs, right?
And you need to clean them up.
And I think we need to team up and bring some pull requests
to make this configurable and make it the Unix way, the Linux way:
when everything is fine, don't say anything and don't log anything,
because it works, right?
Like some levels, right?
Like a warning or error level for each job in pg_cron.
And another thing is that this verbosity
of pg_cron also depends on the frequency,
and it can produce a lot of bloated logs right in Postgres as well.
Yeah, especially if we go to sub-second frequency,
which was just implemented for PGQ,
just running a single stored procedure,
ticking 10 times within one second.
Yeah, I don't know if it's needed for pg_flight_recorder.
Maybe not at this point.
It's too much precision, right?
Yeah, anyway.
Anyway, yes.
So there is a soft dependency on pg_cron.
pg_cron is also maybe a little overly chatty.
My kitchen wall clock just silently ticks away.
It doesn't generate a daily journal of the fact that it ticked.
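The log cleanup mentioned above can be sketched as follows. pg_cron records each run in the `cron.job_run_details` table, and a common pattern is to schedule a job that deletes old rows. The Python helper below only builds that SQL statement; the helper name, the job name, the default schedule, and the seven-day retention are assumptions for illustration, not anything pg_flight_recorder ships.

```python
def cleanup_job_sql(retention_days: int = 7, schedule: str = "0 12 * * *") -> str:
    """Return SQL that schedules a pg_cron job purging old run history.

    The job name 'purge-cron-history' and the defaults are hypothetical.
    """
    if not isinstance(retention_days, int) or retention_days < 1:
        raise ValueError("retention_days must be a positive integer")
    # Keep the schedule from breaking out of its quoted literal.
    if "'" in schedule or "$" in schedule:
        raise ValueError("schedule must not contain quotes or dollar signs")
    delete_stmt = (
        "DELETE FROM cron.job_run_details "
        f"WHERE end_time < now() - interval '{retention_days} days'"
    )
    # cron.schedule(job_name, schedule, command) registers the job.
    return (
        "SELECT cron.schedule("
        "'purge-cron-history', "
        f"'{schedule}', "
        f"$${delete_stmt}$$);"
    )


if __name__ == "__main__":
    print(cleanup_job_sql())
```

The printed statement can be run once with psql; after that, pg_cron itself keeps its own history trimmed. For a tool that ticks frequently, a shorter retention window may be the more sensible choice.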
Yes.
I would like.
This is what we have to
live with. So yeah. Let's create a pull request. I can create it or you can create it, and we just support
each other. Maybe the pg_cron maintainers will agree. Actually, I opened an issue.
I didn't see feedback from them, but it's definitely an issue. It was an issue before we started
creating these tools. I have other places where pg_cron was chatty in logs. Yeah,
anyway. And a final thing from me: a good place to start is benchmarking. If you just do benchmarks,
just inject these tools, inject pg_flight_recorder,
and ask your AI to visualize the result of that data.
Like, before destroying the instance or something,
destroying the database, just dump that data and visualize it.
It's so easy these days.
It is, yeah, the nature of these tools is changing,
and this makes it super easy.
That's what I learned as well.
But yeah, that's it.
It's not very profound.
It's not very complicated.
But we need more and better automation in the community.
This is just one small contribution.
Many more to come, I think, from you too, and hopefully me and others in the community.
So looking forward to that.
Nice one, David.
And just to check, what's the license here?
It's as generous as I could make it.
Nice.
So open and permissive.
Yeah, exactly.
Supabase style.
Supabase style.
I work at Supabase, but this belongs, if to anybody, to the community.
Wonderful.
Lovely to meet you.
You as well.
Very kind.
I appreciate it.
Thank you.
Thank you.
