Postgres FM - Bloat
Episode Date: January 26, 2024

Michael is joined by Chelsea Dole, Staff Software Engineer and Tech Lead of the Data Storage Team at Brex, to discuss bloat. They cover what it is, why it's a problem, strategies for minimising it, and the options when it's really bad.

Here are some links to things they mentioned:

Managing your tuple graveyard (talk by Chelsea) https://www.youtube.com/watch?v=gAgbzvGT6ck
pgstattuple https://www.postgresql.org/docs/current/pgstattuple.html
pg_class https://www.postgresql.org/docs/current/catalog-pg-class.html
Bloat in PostgreSQL: a taxonomy (talk by Peter Geoghegan) https://youtube.com/watch?v=JDG4bMHxCH8
It's Not You, It's Me: Breaking Up with Massive Tables via Partitioning (talk by Chelsea) https://youtube.com/watch?v=TafwSuLNxe8
pg_repack https://github.com/reorg/pg_repack
pg_squeeze https://github.com/cybertec-postgresql/pg_squeeze
VACUUM https://www.postgresql.org/docs/current/sql-vacuum.html
autovacuum https://www.postgresql.org/docs/current/runtime-config-autovacuum.html
CLUSTER https://www.postgresql.org/docs/current/sql-cluster.html
HOT updates https://www.postgresql.org/docs/current/storage-hot.html
Our episode on BUFFERS https://postgres.fm/episodes/buffers-by-default
Our episode on TOAST https://postgres.fm/episodes/toast
Our episode on index maintenance https://postgres.fm/episodes/index-maintenance
Chelsea's website: https://chelseadole.com/

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and today I am joined by Chelsea Dole,
staff software engineer and tech lead of the data storage team at Brex,
and speaker at several prestigious Postgres conferences over the past couple of years.
Thank you so much for joining me, Chelsea.
Thanks so much for having me, Michael.
Wonderful. Well, I have seen several of your talks and loved how well
you've explained a few concepts. Now, one of those was bloat. And looking back at our previous
episodes, I realized we hadn't actually done an episode on bloat itself. We've done plenty that
mention it, that are around it. But I loved your explanations. And I'm looking forward to having a
chat with you about what it is, how people can think about it and some strategies around it.
Thank you. Well, you know, as a Postgres FM loyal listener, I've definitely listened to a couple of those, you know, bloat sphere conversations, let's say.
So it's nice to be addressing it more directly.
Yeah, awesome. So in terms of where to start, I guess
we should cover: what is bloat? How do you think about it?
So Postgres bloat basically occurs as a function of MVCC and its kind of extension, vacuum. So in MVCC, all the actions you're doing, inserts, updates, deletes, those are all actually writing new tuples or editing metadata on a tuple, not hard deleting anything in place. And this basically allows MVCC to be both compliant with ACID and still fast, so it doesn't slow things down dramatically. When those tuples actually get hard deleted is through a vacuum, which, you know, runs every now and again, depending on your configurations and autovacuum. But if autovacuum can't keep up, and you have lots of those inserts and updates and deletes, then you can get into a state where your Postgres pages are basically bin packed with a bunch of dead or soft deleted tuples. And that leads to bloat. So bloat is the state where you have these table pages that are full of basically useless data that vacuum is hurrying to kind of run around and catch up with. And so Postgres has to keep on adding new empty pages to the end of your heap, which leads to all sorts of, I'd say, non-optimal outcomes. So, too long didn't read: it's suboptimal tuple density in your pages.
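As a rough way to see that tuple graveyard building up, the cumulative statistics views expose live versus dead tuple counts per table; a quick sketch (the LIMIT is arbitrary):

    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;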
Yeah, I really like that. So if we end up in a situation where a large proportion of our table, maybe including the indexes, is sparsely stored on disk, that has knock-on effects, and that's referred to as a table with a lot of bloat. I've thought about it in the past as almost the diff between the table's current state and what we'd get if we completely rebuilt that table with its indexes. I don't think it's the only acceptable definition, because of fill factor and other technical details, but I like it because I think it's quite practical. In terms of what it means, though: why is this a problem? When have you seen it cause issues, and how bad are those issues?
There's a huge spectrum there, and I would first preface by saying, I wouldn't call this a hot take, but maybe a lukewarm take, that bloat is really not always a problem. You know, I think that a lot of people think of it as this "oh God, our tables are bloated, what are we ever going to do?" But there are plenty of situations where you can have a bloated table that's serviceable, and you're able to get it to a slightly better state without any drastic measures.
But the main issue that bloat can cause, which can lead to downstream issues, is really increased IO. And IO is sort of the grandfather of all these downstream issues that nobody likes, like high latency on your reads. Essentially, if you think of it logically: if I have 10 rows across two pages and I do a sequential scan, I've now scanned two pages, and that's a certain amount of IO. But if I have really bad tuple density, I've got table bloat, then maybe I'm scanning those 10 rows or 10 tuples across eight pages. So I'm scanning the same amount of data, but I just had, you know, 4x the IO. So that can lead to downstream negative effects with reads, of course. And you guys talked a couple of times about, you know, explain buffers and things like that. So that's a really good way to sort of see, off the cuff, whether you're using an appropriate amount of IO. But
as I said, there are places where you can have some amount of table bloat,
and it's not really causing an issue. I would say that where I look for it as the biggest long-term
issue to solve and really address is those workloads that are going to be very update
and delete heavy. Having some bloat on a normal workload, if your users aren't seeing effects on latency through the IO, I would
sometimes just say shrug, you know, the important thing is the end to end experience, and the long
term maintainability for you as a dev.
Yeah, love that. And very practical as well. I've seen a couple of talks on this. And you mentioned IO, but you mentioned it quite late on. I'm like, oh yeah, it is as simple as that, really.
Obviously, there are other bad things.
It's taken up more disk space, right?
It's taken up more space in memory,
but ultimately, user experience-wise,
that is normally what a lot of us are fighting with,
especially on busy systems.
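For a single query, the buffer counts make that extra IO visible. A minimal sketch, assuming a hypothetical orders table and customer_id column:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM orders
    WHERE customer_id = 42;
    -- "Buffers: shared hit=... read=..." shows how many 8kB pages were
    -- touched; a bloated table or index tends to show far more buffers
    -- than the number of rows returned would suggest.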
So slow queries that users are reporting are a natural sign. How do you go from that to
realizing, oh, my problem is a bloated table? There are sort of two paths for that, in my
experience. The first one is before you ever know what bloat is, how do you discover bloat?
And I would say that usually the path is you have a really bloated system. And there's really no
other explanation for it until you go down those
paths and try to figure it out. For myself, just sort of going through how I discovered table bloat, thinking back: I've worked, I would say, with Postgres at scale at the last two companies I've worked at. Before that, it was more a passing tool I happened to use. And I saw it at great scale when I first started at Brex. The way that I actually saw it is that we had a developer reach out, and they said, hey, you know, I have this, let's say, 50 gigabyte table, 50 gigabytes of tables total on this server. It's only one database. And for some reason, we're almost running out of disk. Why do I have... like, where's all the space going? Because I can see my table size and the remaining table space on disk. You know, we use RDS in the cloud, so we're able to see all that pretty easily. And I went and said, huh, I actually don't totally know, let me check this out. And when I went and checked it out, I could see that there was just a ton of bloat. And here, interestingly, the bloat was actually coming from TOAST table bloat.
Oh, nice.
I know, which is an interesting little side quest here, conversationally. But you know, TOAST tables are just other tables, you know, the oversized-attribute storage technique or whatever fun acronym they made up for it. Those are just tables under the hood too, so they can also get bloated. So I'd say that was my first time needing to go through the entire song and dance of bloat that needed to be fixed through, I would say, strong measures, versus maybe just tinkering with some little stats here and there. But I think that once you know what bloat is, it's pretty easy to look out for it. Operationally, there are observability tools and dashboards, you can instrument your own; I think a whole lot of DBAs have their own, you know, private stash, or ideally GitHub open source stash, of fun queries they like to use or extensions they have in their back pocket. And we can dive a little more into those if you want. But I think that there are sort of those two paths: there's the path of "oh god, what's going on?", and the path of, once you've seen it, you can kind of pattern match.
Yeah, nice. I like the system-level path in. Because of my background and the tool I work on, I see it more from the other direction.
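For the "where is all the space going?" question, the built-in size functions give a quick breakdown per relation; a sketch, with users standing in for whatever table you're investigating:

    SELECT pg_size_pretty(pg_relation_size('users'))       AS heap_only,
           pg_size_pretty(pg_table_size('users'))          AS heap_plus_toast,
           pg_size_pretty(pg_indexes_size('users'))        AS indexes,
           pg_size_pretty(pg_total_relation_size('users')) AS total;
    -- pg_table_size includes the TOAST table, so a big gap between
    -- heap_only and heap_plus_toast points at oversized attributes.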
So somebody looking at an incredibly slow read query, one that has no right to be that slow, and using EXPLAIN (ANALYZE, BUFFERS). And as you mentioned, when we say incredibly bloated: you could easily have, at least before Postgres 14, which had one especially good optimization in it, an index that was 10 times bigger than it needs to be. If you, let's say, rebuilt it with REINDEX CONCURRENTLY, the result afterwards might be 10 times smaller. So that's a significant amount of extra reads you might be doing, especially if you're returning a bunch of rows. You mentioned eight reads instead of one, but if you're looking at the last thousand rows, if you're doing 8,000 instead of 1,000, you start to notice that in latency. So it's quite cool. In fact, we used to call the tip "table bloat likelihood", and I renamed it a couple of years ago to "read efficiency", partly because, well, it goes back to your density question again. It's not necessarily bloat; it's not necessarily about the likelihood of it, it's more about the efficiency of those reads, and it might be a locality issue, which is somewhat bloat related as well. So yeah, love this. Great. So, you mentioned, and you've spoken before about, queries to estimate bloat and other techniques for getting more accurate, or depending on your definition of accurate, measures of this. What's your advice on minimizing it, dealing with it?
Yeah, I guess the first thing, on the measure of how to
identify whether you have bloat, or how much you have, and decide what to do with it: the first step there is really, again, kind of a trade-off of what matters to you. Does speed and low system impact matter to you in this sort of DBA process of figuring out how much bloat you have, or does accuracy of the exact percentage of bloat, or the exact tuple count, and being 100% sure that your stats are right, matter
to you. And so again, kind of like I said in the discussion about how bloat can kind of be okay,
or not something you really need to deal with in certain situations, this is one where you
kind of get to decide your priorities. If your priority is 100% accuracy, or I would say also, if you have downtime
hours on your database where you can easily do this without any user effect, or if you have a
small system and for whatever reason, you happen to be bloated through your own write patterns,
but it's not actually that important, then I would suggest
pgstattuple. pgstattuple is a Postgres contrib module, and basically it gives you some functions that you're able to run. They will basically do a sequential scan through a target table, or I think through a target database if you want to run it on all tables, and it will return the count of live and dead tuples, free space map stuff, as well as a few other statistics.
And that one, I would say, is on the side of you have resources to spare.
You're not going to impact users because CPU does spike during this.
And there's no way to get around the sequential scan because you can't exactly index it.
The point is to actually look at the pages.
So it's always going to be a sequential scan.
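A minimal sketch of that exact-but-expensive option (pgstattuple ships as a contrib module; the table name is illustrative):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    -- full sequential scan of the table; run it when you can spare the IO/CPU
    SELECT tuple_percent, dead_tuple_percent, free_percent
    FROM pgstattuple('public.users');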
The other option is through using, I would say, estimation queries.
And these tend to leverage internal tables already used by Postgres or kept up to date during the analyze process.
So these would be things like pg_class.reltuples, which will estimate the number of live and dead tuples you have based on a sampling during ANALYZE.
So before you run anything that's an estimate, you do want to run ANALYZE right before.
But then you're able to guess a proportion based on a sampling of your table data.
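A hedged sketch of the estimation route, leaning on the sampled stats rather than a full scan (again, users is just a stand-in):

    ANALYZE users;  -- refresh the sampled statistics first
    SELECT c.reltuples::bigint AS estimated_live_tuples,
           c.relpages          AS pages,
           s.n_dead_tup        AS estimated_dead_tuples
    FROM pg_class c
    JOIN pg_stat_user_tables s ON s.relid = c.oid
    WHERE c.relname = 'users';
    -- cheap and approximate: good enough to spot a table drifting well
    -- past your bloat rule of thumb.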
This is a better option, I would say, for really high criticality systems
or extremely large tables.
You know, if you have a 500 gigabyte, 800 gigabyte table,
or even just one that's 80 gigs,
but is very, very critical for users,
and maybe you're already kind of medium on CPU,
you can't spare those resources, then there's really no downside to just using a table sample. You know,
there's no reason that this number needs to be precise. Yeah. So what is alarming? Well, I guess
the answer is it depends. But you, in fact, I've rewatched your talk recently. So I know I'm
stealing from you there. But you've got some great rules of thumb on what you consider
to be bad on table bloat.
Yeah, well, I don't think you could possibly steal "it depends", because that's just something everybody in engineering should be saying all the time. But you know, my personal rules of thumb, speaking just for myself, are that
on very small tables, bloat is not really a problem. So a gigabyte, two gigabytes, that, as I'm saying, is very, very small. Even if you see up to 75% bloat, you know, autovacuum will be able to take care of this; the total impact to your system, there's just no way that it can be high or significant, you know, knock on wood. I'm sure somebody will comment with some counterexample, but at least in my experience. Beyond that, I would say if you get into the one to 30 gigabyte range, I would say 25% bloat, so 25% of your total table space being taken by dead tuples, is acceptable. And then as you go higher from about 30 gigs, I would say you want to inch that acceptability downwards. So I would say once you get up to 100 gigabytes, I would aim for like 18%, and then I would flatline at about 18%. One important thing is you should never expect 0% dead tuples. If you completely rebuild a table, or if you fix every autovacuum setting, or you've used an extension and repacked it, you will still have some bloat, and that's okay.
Yeah, right. Like, unless you're going to show
off because you've got an append-only table with no bloat. But if you've got updates and deletes, it's a trade-off of the system, right? It's the trade-off of the design that Postgres has given us. You've recommended this talk a few times, and I'm most of the way through watching it, but Peter Geoghegan has given another great talk that is criminally underwatched at 200 views, so I'm going to share that. It mentions some of the trade-offs that Postgres is making in this area. So it's guaranteed, I think: as soon as we start doing updates and deletes, at least past a certain frequency of those updates and deletes, you're going to have some bloat. So yeah, it's a really good point.
Yeah, there's no wholesale way to avoid it. It's more about mitigating the downstream effects and making sure you don't accidentally look up and realize you're in a crazy situation in a couple of months or years.
I'm curious, though. I totally understand that as you get larger data volumes, you want to aim for a smaller percentage of bloat, because in real terms that's more gigabytes of bloat, or more pages that could be being scanned through for reads and things like that. But where did 18 come from? Trying to get it lower than that while fighting some update-heavy tables, or where? It seems quite a specific number to me for a rule of thumb.
Yeah, great question.
And actually, this is the first time I've been asked this question, which I agree.
The first answer to why 18% or why these numbers is, as someone who watches Postgres talks
and as somebody who takes it in, I have to admit, there's part of me that gets annoyed
when nobody gives real numbers.
And I realize that it comes from the reality of it
depends as we already kind of espoused on, but I really wanted to provide something concrete.
And so when I gave those numbers, I thought about my previous projects and I thought about, okay,
well, what happens if I completely rebuild this table or I repack it and don't insert anything?
Usually you would see table bloat of around eight to 10% even then, at least in my experience, because if you're rebuilding a table, at least if you're using an extension like pg_repack, for example, or pg_squeeze, you know, it's still going to need to kind of build up and write all the live data into it as it's rebuilding the duplicate table. So you're not going to end up at 0% unless, I believe, I assume, unless you, you know, VACUUM FULL. And so 18% came out of, I'd say, like
opportunism of what I wanted to provide. And also just on experience of when I started to see a
tipping point into performance. And when you start to get far enough that you can't really recover it
without rewrites. So the reason I would say 18% is not because 20% or 25% even on a large table
is the end of the world. But because I think once you start slipping into that more bloated table
space at large tables, it's harder to recover from. And it needs more drastic measures than a little tuning
here and there. And also, if you imagine you have a, you know, one terabyte table, which I know that
I've had before at companies, then if you get to 20% of dead tuples, you now have, you know,
200 gigabytes of just bloat. So at scale, you know, that becomes money, whether you're in the cloud or not,
you're paying for that disk, that's kind of another downstream negative effect other than IO.
So it's kind of it's, it's money, it's personal experience. And it's also just wanting to put a
sticker and a number on something. So you have a target to aim at. Yeah, I really like that. I
agree. It's difficult without any real numbers. But also, you could have said 15% and I probably wouldn't have questioned it. It's just more practical, it's more likely to be based on your real experience, which I liked a lot. Cool. So I wanted to come back to the... you mentioned TOAST and the acronym, or probably backronym, of course, behind that phrase. Yeah, that's a really fascinating case that I've not seen, I've not spoken to anybody about, but it makes intuitive sense: because of the nature of TOAST, we could have potentially large values across multiple pages. But I would have guessed they would be more easily reused; that's a total guess, though. Did you have any more details on that?
Well, I'm sure I'll get some of this possibly wrong, but I think actually it's surprising that we don't see TOAST bloat in more situations, because when you're updating a value in TOAST, to my knowledge, it's not able to be as intelligent as other update methods can be.
So you're actually replacing or updating more often than not.
So TOAST tables themselves, as I guess a bit of background in case anybody listening is not familiar: you have extended and non-extended data types in Postgres. If you've got an extended field, things such as JSONB or, you know, varchars with, I think, longer limits, or bytea, then all of these are examples where a value can go over the maximum single tuple size. If it reaches that threshold, then Postgres will compress it through some method, depending on your Postgres version, and it'll just store it in a separate table, called a TOAST table. And in the main table, let's say you have a users table, then that's basically just a pointer to this TOAST table.
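Since the TOAST table is a real table with its own storage, you can find it and size it directly; a sketch for a hypothetical users table:

    SELECT reltoastrelid::regclass                         AS toast_table,
           pg_size_pretty(pg_relation_size(reltoastrelid)) AS toast_size
    FROM pg_class
    WHERE relname = 'users'
      AND reltoastrelid <> 0;
    -- the resulting pg_toast.pg_toast_NNNNN relation can be passed to
    -- pgstattuple or the size functions just like any other table.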
So in the example I mentioned before, when I ran into bloat on the TOAST table, the reason you would see this is maybe you have a small table, even. It could be a five-gigabyte table, but every row in it has, you know, extremely large data types.
You know, I'm sure we've all seen those tables where there's five columns and each of them
is like a massive JSONB value. And they could be updating those all the time. You could be doing field updates, you could be doing anything. And if you get up to a certain TPS on that, then every single time you do an update, it's going to be rewriting that value in the TOAST table. And Postgres does look at TOAST, as far as I know, like just any other table, so autovacuum is running on it the same exact way. In my talk on partitioning, I kind of say the same thing: partitioning to me is like UX, it's DBA UX. We see it as one table; Postgres just sees tables. And same thing with TOAST.
And so in that case, we had a small table that was just super high update for those very large
fields. And correct me if I'm wrong, Michael, maybe you know more about this, but I think that
the nature of how Toast does those updates is they're not able to do, I think, key value updates in Toast fields.
They have to always rewrite.
Does that sound right to you?
Yeah, well, it sounds like any other value in Postgres, right?
It's a single entity.
So even in a large text field, without it being Toasted,
if you had,
if you pasted in the entirety of War and Peace
and then only changed one word,
but it was all a single value,
I believe we'd need a new tuple with,
well, bad example,
because that one definitely wouldn't fit in a single page.
So it would be toasted.
But yeah, if you,
let's say you took a one kilobyte chunk of text
and changed only like a few characters in it, I think it would work the exact same way.
So, yeah.
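A tiny, illustrative demonstration of that point: even changing a few characters produces a brand-new row version, and (assuming no vacuum runs in between) the old one stays behind as a dead tuple:

    CREATE TABLE notes (id int PRIMARY KEY, body text);
    INSERT INTO notes VALUES (1, repeat('x', 1000));
    SELECT ctid FROM notes WHERE id = 1;  -- e.g. (0,1)
    UPDATE notes SET body = overlay(body PLACING 'y' FROM 1) WHERE id = 1;
    SELECT ctid FROM notes WHERE id = 1;  -- a new location, e.g. (0,2)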
The only caveat to that, I believe, is that when you toast something, toast an object, you toast it in chunks of a discrete size.
So if you did toast War and Peace, then, you know, it would be, you know, you would run into the tuple length
issue in toast as well. Toast doesn't have a longer, you know, maximum tuple length. It's
just that it will chunk it via some algorithm and compress it. And then War and Peace will actually be a pointer to, let's say, 10 tuples, all of them compressed. So I believe that when you edit one word, you know, you fix your spelling mistake, you know, Leo Tolstoy really has to go back and fix that, then when he re-toasts it, he has to recompress and rechunk all those 10 values again, not just the one. So I think, and we're figuring this out as we talk, it might be something where the issue scales with the increased size of the object itself.
Yeah, I'd love to hear from anybody who has dealt with this, or looked into it, or written about it. Sometimes we get some quite in-depth responses, which we love. The question, I guess, is: let's say we've got something that's spanning tens of kilobytes, so multiple pages. Your eight-page example might be good.
And we rewrite it so we get eight more pages.
The question then is what happens to the eight previous ones
once they get marked as dead.
And if they could get reused easily, we shouldn't get too much bloat.
But if they're getting, I guess if auto vacuum can't keep up, that's when it accelerates.
So it would make sense in a system, for example, where autovacuum isn't able to keep up with that toast table for some reason.
Or maybe it's been disabled on that table.
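One quick way to check for that last case is to look for per-table overrides; a sketch:

    -- tables with any per-table storage options set, e.g. autovacuum_enabled=false
    SELECT relname, reloptions
    FROM pg_class
    WHERE reloptions IS NOT NULL;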
Yeah, totally.
That can also be a place where you need to tune autovacuum_max_workers higher, because autovacuum_max_workers defaults to three. And this doesn't really have an impact depending on your table size; it has an impact depending on your table count. So if you have, I would say, more than hundreds, so getting into thousands of tables, or many of them are large and maybe an autovacuum worker takes a long time on one table, that's where you're going to want to start tuning that up, giving extra resources on your server over to vacuum compared to servicing queries, I would say, for the greater good.
Yeah. And just to be clear, when you say hundreds of tables, we're counting each partition as a table in that case?
Yeah, yeah.
Makes a lot of sense. Cool.
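For reference, that is a server-wide setting; a minimal sketch of checking and raising it (the value 6 is purely illustrative):

    SHOW autovacuum_max_workers;              -- default is 3
    ALTER SYSTEM SET autovacuum_max_workers = 6;
    -- takes effect only after a server restart, since worker slots are
    -- allocated at startup.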
So, anything I haven't asked you about that I should have done, or any other tips for this?
Let's see. I think the only thing we didn't get around to that I would want to make sure I shill is some more details about how to fix a bloated table once you see it. So if you have discovered you have a bloated table, let's say that you're above that metric, that rule of thumb, maybe well above it. Like I said, when I found that TOAST table, it was like 95% bloat, so that explained a lot. And I would say that at that point, there's a couple of routes you need to go down. The first is to recognize that bloat is caused by a combination of autovacuum configuration and update- and delete-heavy workloads. Just one of them won't get you there; it's both together. And so if you're only a little bit bloated, maybe you just kind of keyed onto it via some, you know, observability metric or warning-level ping you got in some place or other.
At that point, I would recommend going the route of tuning your auto vacuum configurations, as well as really talking to the dev team or working as a dev yourself to figure out whether there's anything you can do to reduce the volume or difficulty of those updates and deletes. Some common anti-patterns I see are things like
cron jobs that run every day or week and delete a huge amount of data. And they often think they're
being helpful when in reality, they can be kind of degrading the quality and IO return value of
that database. In terms of tuning configurations,
usually you want to go one of two broad ways.
You either give the server more resources for autovacuum through autovacuum_max_workers, or you tune it to run more frequently. So you tune autovacuum to actually run more aggressively, which I generally recommend compared to the system defaults. Autovacuum did get more aggressive in more recent versions of Postgres. However, it's still generally good to tune, like, autovacuum_vacuum_scale_factor, especially if you have those large tables. You know, the default is to only trigger autovacuum when 20% of your table is dead rows. So, you know, that's already beyond, I would say, my recommended 18% goal. So if you really want it to trigger more proactively, you would need to tune that down from 0.2 to, let's say, 0.1 or far less.
You know, I see a lot of recommendations online
that I've used as a user that suggests, you know,
every 1% or 3% of dead tuples.
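What that might look like for one large, hot table, with big_events as a hypothetical name and the numbers only as examples:

    -- server-wide default: autovacuum_vacuum_scale_factor = 0.2 (20% dead rows)
    ALTER TABLE big_events SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_threshold    = 1000
    );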
Yeah, because for a huge table, that's still a lot of tuples. We could still be talking about tens of thousands, hundreds of thousands, even millions of tuples. And by the time that it's actually done, you could be up higher than that, because that's just what triggers it. And what if it takes hours and hours?
Yeah, absolutely. If you do get really, really far behind, you know, you check the clock and you're up to that, like, 90% bloated table, that is a good time to consider rebuilding the table. If you can afford it, VACUUM FULL; you know, almost nobody can these days if it's a user-facing application, so that's the reality. But I always say that first, because there are situations in which you can,
again, looking from the user perspective, and it's a lot easier. The other thing you could do
is use an extension like pg_repack or pg_squeeze to rewrite it. This basically creates a shadow table with a duplicate schema, it will copy over all the data, and it uses triggers to keep applying the data coming in from one to the other. And then once it's all caught up, it will, within an ACCESS EXCLUSIVE lock, which lasts less than a second, definitely, basically switch the table names so the prod traffic points towards the new table.
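A rough sketch of what kicking off such a rebuild looks like with pg_repack (the extension must be installed in the database, and the client flags may vary a little between versions):

    CREATE EXTENSION pg_repack;
    -- the rebuild itself is driven by the pg_repack client from a shell,
    -- something like:
    --   pg_repack --table=public.big_events --jobs=2 mydb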
This is something that I have seen be flaky. I wouldn't personally recommend
automating pgRepack. I've seen and heard of cases in various talks and just through friends in the
industry, people that try to automate pgRepack. And, you know, I've seen it lead to a lot of
incidents and issues. You know, I've personally never run into an issue where there's any data
loss, because in the case that you just kill the PID of pg_repack, for example, you just have some dead tables you have to manually drop; the duplicate ones aren't complete, so you keep using the old one. But I've heard of it causing other issues, so I would say be careful, but I wouldn't stray away from it totally.
Nice. I've always thought of it as like REINDEX CONCURRENTLY, but for the whole table. Is that a reasonable comparison? Or I guess it's missing some subtlety around the potential flakiness. But that can fail as well, right? Like, if REINDEX CONCURRENTLY fails, you can end up with some invalid indexes.
I think that's a really good analogy, actually, mainly because they're both non-transactional. Because, you know, anything concurrent, the thing is, it's an unusual move by Postgres. And I'm
sure there was a lot of discussion on the core team about this when they first started releasing
concurrent features, because it's a decision by the core team to value the user Postgres experience
by DBAs and applications over the strict rules of MVCC.
Because when you create an index concurrently,
if it fails, you have the invalid index.
So it's not atomic.
It's not able to roll back.
And the same thing with pg_repack. If it fails, then you have these invalid tables
and invalid indexes that you need to drop.
You know, if you try to rerun it, it's not a no-op.
You'll have to clean it up first.
Yep, cool. Okay, I'm glad. So that's a really good point, in terms of autovacuum, and in terms of repacking or VACUUM FULL if you can afford the heavy locks, or if your system just doesn't have any users at a certain time of day or something like that.
Right.
It's rare, but it is common enough that I have been caught out by not recommending it a couple of times, which is super interesting. CLUSTER is in the same category, right? Like, same as VACUUM FULL, but you get to order it by an index, which can be helpful for reads.
Definitely.
Cool. Last thing: in fact, you mentioned it right at the beginning, and I had to
bite my tongue not to jump straight on the pun, because that's how my brain works. You mentioned having a hot take. So you talked quite a lot in the past about access patterns. And one point I loved was the idea of considering if you've got an update or delete heavy workload, which could be the reason you're in this situation. Do you even need to be doing those updates and deletes?
That's a question that doesn't get asked very often.
And you made a really good point.
So there's that angle that I'd love you to talk on if you want to.
And there's also the HOT update optimization we have in Postgres that can be a huge help, like, if you're aware of it, avoiding indexing a certain column if you don't have to. There are some trade-offs there that might be interesting. I don't know if you've got experience with those?
Yeah, I guess so. To first address the first part of it: I'm glad you brought it up, because this is definitely a strong opinion of mine. And I think that's something that comes from being what I would describe as somewhere in the liminal space between a software engineer and a DBA, through, you know, backend engineer to data engineer to DBA, just, you know, sinking my way down into infrastructure. And, you know, I think that I still tend to think from that side, and remember that, for so many of these access patterns and writes, the biggest hammer you can use is to just not do it, or to rewrite it.
And if you're a DBA managing a couple or hundreds of databases, you know, speaking for myself, I'm managing the hardware and the high level metrics.
So I don't really have access or knowledge, without talking to somebody, into the why.
Why do we have this query?
Why do we have this database itself?
What the heck's in it?
And so I think that if you really want to address bloat, often the best thing you can
do is to start a conversation and say, hey, what's the value of this?
Can we simplify it?
Do we need to be deleting this?
Do we need to have this table?
It's crazy how many times that I've
dealt with a problem that way. And I've never needed to delve into the more, I would say,
extensive measures. And also, if you can, keeping those relationships with people at your organization
or whoever you're working on a project with, to try to be able to let them keep that up by
themselves. You know, at my company, we've built some automation around,
let's say like auto education as far as we can.
We're still working on it,
but a way to kind of allow developers
to be proactively educated
about how they're building their systems.
And so I think that as much as you can do that
and just, I would say,
change the patterns from the code side
is the quickest way. That's a PR.
Awesome. I was not expecting it to go that way. What's this automatic education thing?
Is there anything you can share there?
Yeah, I will mea culpa here and say that I wish I could say it were a more, you know, amazing system than it probably is. But
we've used, for example, GitHub hooks and GitHub web
hooks to automatically comment documentation that we've written on PRs. For example, if we see
running a migration, we pin the migration guide to your PR rather than requesting us as reviewers,
because I work in an organization of larger than, you know, a thousand people. So I don't want to
be a blocker on your migration. I want to
educate you. Same thing with partitioning. I wrote after we dealt with partitioning stuff,
you may have noticed that a lot of the talks I write are based off whatever the heck I'm
struggling with at work. And so I wrote a very in-depth migration, like partitioning migration
guide, both for future people on my team, as well as people who might want to partition and need to
understand why, how, whether it's a good idea. So I think that creating documentation is good,
but we all know it falls out so quickly. You change one thing, it's deprecated, you forget
about it, the person leaves. So I think that the underappreciated side of it is figuring out easy
systems where you're auto-commenting it, or you're, you know, pushing it towards people in a way that actually keeps it actively read.
Awesome, sounds great. Is there anything else you'd like to plug or give a shout-out to?
Um, not particularly. I think that, you know, I'm sort of a relative newbie to the Postgres community, being involved in the open source side. You know, I went to my first Postgres conference last year and then sort of ran at it at, you know, 100 miles an hour ever since then, which has been really fun to get involved with. So I guess I would just say thank you to all you guys for inviting me in. It's been a great past year of being more involved in Postgres.
It's awesome to have you in the community. I still feel new, and I feel like I've been here for five or six years, so it's awesome having you here. You're a wonderful contributor to the community, your talks are great, and please keep giving talks on issues you're hitting with Postgres. They're some of the most fascinating that the community can have, and also not common enough at conferences, I think, personally. So yeah, appreciate it. And I appreciate the real numbers as well. So thanks so much, Chelsea.
Yeah, thank you for hosting this. I listen to you guys, as many people do, in the car. So, you know, keep me going with good things to listen to while yelling at various drivers on the road.
Well, apologies, you're probably going to have to skip a week.
Okay, yeah. Yeah, listening to your own voice is too hard.
Tell me about it. Take care, Chelsea.
Thank you.