Postgres FM - Is pg_dump a backup tool?
Episode Date: November 22, 2024
Michael and Nikolay are joined by Gülçin Yıldırım Jelínek and Robert Haas to discuss both the technical question of whether or not pg_dump is a backup tool, as well as the tone and inte...nt behind the statement "pg_dump is not a backup tool". Here are some links to things they mentioned:
Gülçin Yıldırım Jelínek https://postgres.fm/people/gulcin-yildirim-jelinek
Robert Haas https://postgres.fm/people/robert-haas
Why you should upgrade PostgreSQL today (blog post by Gülçin) https://xata.io/blog/cve-2024-7348-postgres-upgrade
If pg_dump is not a backup tool, what is? (blog post by Gülçin) https://xata.io/blog/pgdump-is-not-a-backup-tool
Is pg_dump a backup tool? (blog post by Robert) https://rhaas.blogspot.com/2024/10/is-pgdump-backup-tool.html?m=1
Why pg_dump is amazing (blog post by Robert) https://rhaas.blogspot.com/2024/11/why-pgdump-is-amazing.html
Avoid too prominent use of "backup" on pg_dump man page (commit by Peter Eisentraut) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=4f29394ea941f688fd4faf7260d2c198931ca797
Is pg_dump a backup tool? (poll by Nikolay with options Yes / No / Define backup) https://x.com/samokhvalov/status/1847015453056786771
What's the best way to make a backup (a recent example discussion on Reddit) https://www.reddit.com/r/PostgreSQL/comments/1gu4r05/whats_the_best_way_to_make_a_backup/
Hackers mailing list https://www.postgresql.org/list/pgsql-hackers/
Praise, Criticism, and Dialogue (blog post by Robert) https://rhaas.blogspot.com/2023/12/praise-criticism-and-dialogue.html
Out-of-cycle release scheduled for November 21, 2024 https://www.postgresql.org/about/news/out-of-cycle-release-scheduled-for-november-21-2024-2958/
pgBackRest https://github.com/pgbackrest/pgbackrest
Barman https://github.com/EnterpriseDB/barman
Our previous episode on backups https://postgres.fm/episodes/backups
~~~
What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!
~~~
Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai
With special thanks to:
Jessie Draws for the elephant artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I am Michael, founder of pgMustard, and I'm joined as usual by Nikolay from Postgres.ai.
Hey Nikolay.
Hi Michael.
And today we are delighted to be joined by two excellent guests
who have each contributed a lot to Postgres over many years now
and who both recently published blog posts on the topic we're going to be discussing.
Let me introduce you both quickly.
First we have Gülçin Yıldırım Jelínek, who co-founded the Prague PostgreSQL Meetup and is a staff engineer
at Xata. Welcome, Gülçin.
Hello, thank you for having me.
We're delighted to. And we're also honored to be joined by Robert Haas, long-serving
PostgreSQL major contributor and committer and VP, Chief Architect, Database Service
at EDB. Welcome, Robert.
Hello, thank you for having me.
It's our pleasure as well. So to kick us off, I've prepared a couple of questions to ask each
of you in turn, but I'd also like to encourage you to ask each other questions as we go along.
Perhaps we can start with you, Gülçin. What are your high-level thoughts on the question of
whether pg_dump is a backup tool, and why is it something you wanted
to write about recently?
It is funny, because I didn't actually want to write about pg_dump.
I had just joined my current employer, Xata, and it was my first week, and then I noticed
something in our Discord channel: somebody was having an issue with pg_dump. I was like,
oh, what's happening? And I saw a parameter that I didn't recognize.
In the error message, there was restrict_nonsystem_relation_kind.
I was like, I don't know this configuration option.
And then I noticed it had actually been introduced recently at that time.
And I was like, oh, okay, why?
And then I checked, and it is related to this CVE.
I remember the number, 2024-7348.
Doesn't matter.
But there's a blog post about it, so you can find it with this number.
And in there, it explains what this vulnerability is and how people can use it to potentially compromise your database.
Because it affects pg_dump.
People can create a non-temporary object in the database.
And then, just before pg_dump begins,
they replace this object with a different thing,
like a view or a foreign table.
So people can insert SQL there.
And then when pg_dump attempts to do the backup,
it can run the injected SQL code.
So why are we here?
Because it affects pg_dump.
And then I said, hey, this affects Postgres 12 to 16: upgrade your
Postgres versions, test whether your pg_dump scripts are working, and review
user permissions. You know, the standard recommendations when this kind
of thing happens. And then when we were sharing this blog post on Twitter, I
think our marketing team wrote something like, "It affects pg_dump, a tool to back up Postgres databases."
It is a definition of that tool, basically.
And then I retweeted it, and I noticed that people were saying the usual thing when you say something about pg_dump: it is not a backup tool.
And then I was like, okay. And then
basically it kept going, so I had to write another blog post to ask: is it really, or is it not?
Who first said this?
I don't know. I didn't know this, because there were so many, and
I know that in the Postgres community, whenever the pg_dump topic comes up,
somebody will say, you know, pg_dump is not a backup tool.
But then, actually, a few days before this discussion happened,
Peter Eisentraut committed a change, which will be in effect in PG18,
that tries to remove the backup terminology and kind of convert it to "export",
so that people do not consider it a backup tool, in a way.
So this, I think, made people more vocal, saying that, look, this was how it was before, but not anymore,
and you should not say it. And then I had to write another.
I mean, I felt like I should write something more about it, to explain why we can or
cannot consider pg_dump a backup tool.
And in my opinion, it is a tool that can be used to back up a database,
and it is a logical way of doing a backup. You can call it a dump, maybe. You can define it.
You know, Nikolay was asking: is it a backup tool, is it not, or "define backup"? So it can be a backup.
When I was working as a DBA for a long time, I was using it to back up databases.
It depends, basically, on the context of how you use it and on the nuances of how you can actually utilize this tool.
So, yeah, that's where I stand today.
I don't agree that it is not a backup tool.
It can be a backup tool.
But maybe later on in the
discussion we can discuss what the drawbacks are with it, and how regular backup tools
that are outside of Postgres can help, like Barman or pgBackRest.
Can I jump straight away with a question?
Yeah, please.
Yeah, I also saw comments that for very large databases, like many terabytes
and more, it's not a good backup tool, but at least for small databases it's good,
and also for partial backups: we can export only one table.
Imagine we have a tiny database, just, I don't know, 100 rows in one table, and
I run SELECT * from this table in psql and just take a picture on my
iPhone. This picture is a backup. We can restore from it, right?
Well, maybe. It's a snapshot, right? Yeah. Why not?
Well, a dump is also a snapshot, right?
Yeah. And I don't really see why it can't be called a backup, basically.
It is a moment, and you can use that moment to do something with it, right?
I thought it was a really good blog post, Gülçin. I'll share it in the show notes as well for anybody that hasn't
seen it. And speaking of good blog posts on the topic, I think Robert added a lot of good points
as well. Both of your first blog posts included a lot of technical details: the technical aspects of why it could be considered a backup tool, but also the
many drawbacks of it, and why you might recommend using something else as a general-purpose backup tool.
So Robert, how about yourself? How would you summarize your high-level
thoughts on the topic, and why it was something you wrote about? Well, maybe you didn't want to write about it either.
Well, I think it just kind of got under my skin because, you know, Gülçin's blog post
was not the first time that I've heard people sort of using this "pg_dump is not a backup
tool" line. And to me, that kind of came across as shouting at people
without necessarily giving a reasoning, right?
The documentation said for literally 20 years
that pg_dump could be used to make backups,
or I don't remember exactly what the wording was.
And when I looked into the history in Git,
I actually found that the language that it's been changed to now, with exporting the database, is very similar to
the original language that was used to describe pg_dump when that code was first added to PostgreSQL.
But there was a two-decade period in the middle
when the documentation said,
hey, you can use this to take backups.
And from my point of view,
it doesn't even matter whether that's true
or whether you think that's true.
If the documentation said for two decades
that X piece of software could be used to do Y,
then nobody should get in trouble for saying that. Nobody should get called out for saying that.
That just doesn't make any sense to me.
I mean, honestly, I think some of us, myself included, can be a little too
eager to jump on people's case from time to time.
And I don't think that's good for our community.
I think we want to be the kind of community where, when people show up, we give them help,
we give them good advice, and we don't come down on them like a ton of bricks. And Gülçin is not the only person I've seen who seemed to me to be kind of getting beaten up a little bit.
And I was just like, why are we doing this?
Clearly pg_dump isn't right for every purpose.
And there are lots of situations where it's probably not what you want.
But the tone is baffling to me,
because it seemed very hostile to me.
And I couldn't make any sense of really
why we should be that hostile about anything,
but especially why we should be so hostile
about that in particular.
And to that, I actually have something to add,
because after this discussion started to come up again,
I was looking at the groups where people are actually using this "it's not a backup tool" rhetoric.
And I've seen a few users who are trying to get help from these online Postgres communities, and we have a lot of them.
And I noted two of them for today.
One of them is asking whether pg_dump can limit a backup by schema.
I mean, they are using this sentence, and
there's somebody answering directly, and it's not related to the question: "pg_dump is not a backup."
And then there's another user: "Can someone send me the command to take a backup of a partial
database?" Which pg_dump actually can do, right? We can do schema only, we can do just
the data, or we can do a table, any type of object.
And the answer is, "There is no such command.
The standard backup tools take backups of the entire database cluster."
So basically, the answer doesn't consider it a backup tool, even though there is a pg_dump
command that can actually do what people are asking.
So that's what I find not helpful, right?
We could just say to people, look, there is this pg_dump command that you can use to take the table that you want, or selectively restore, whatever you want to do with it, and help people in the direction that they're actually trying to go.
Instead, they are just saying there is no such command,
it's not a backup tool anyway, because the standard backup tools take the entire database cluster.
So that I don't find
helpful, as Robert is saying. That is not helpful at all. You might not agree that it's a
good tool to use as a backup solution, in which case we can talk about where it could be improved or
why people should prefer dedicated backup solutions. But this is still not helpful.
There's a tool that we have all been using for a long time, and it can do all the things that these
people were asking for. So the nuance of the question matters, the context of it. And that's where I am, basically.
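As a rough illustration of the partial dumps described above (one schema, one table, schema only, data only), here is a minimal Python sketch that only builds pg_dump command lines. The helper name `pg_dump_args` and the database and object names (`appdb`, `billing`, `public.orders`) are hypothetical; the flags themselves (`--schema`, `--table`, `--schema-only`, `--data-only`, `--file`, `--dbname`) are standard pg_dump options.

```python
import shlex


def pg_dump_args(dbname, *, schema=None, table=None,
                 schema_only=False, data_only=False, outfile=None):
    """Build a pg_dump command line (as an argv list) for a full or partial dump."""
    if schema_only and data_only:
        raise ValueError("schema_only and data_only are mutually exclusive")
    args = ["pg_dump", f"--dbname={dbname}"]
    if schema is not None:
        args.append(f"--schema={schema}")  # only objects in this schema
    if table is not None:
        args.append(f"--table={table}")    # only this table
    if schema_only:
        args.append("--schema-only")       # object definitions, no rows
    if data_only:
        args.append("--data-only")         # rows, no object definitions
    if outfile is not None:
        args.append(f"--file={outfile}")
    return args


# Hypothetical names, for illustration only.
one_schema = pg_dump_args("appdb", schema="billing", outfile="billing.sql")
one_table = pg_dump_args("appdb", table="public.orders", data_only=True)

# Print the commands instead of running them, so the sketch needs no server.
print(shlex.join(one_schema))
print(shlex.join(one_table))
```

To actually take the dump, the returned list could be passed to `subprocess.run(args, check=True)` on a machine where pg_dump is installed and connection settings are available.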
And you know, if somebody asks about how to use pg_dump, and you want to tell them,
hey, here's how to do that thing with pg_dump, but maybe you want to consider some other
alternatives instead, cool, I've got no problem with that.
That can be helpful advice.
But pretending that the thing that they're asking about doesn't exist when it does,
I just don't understand that at all.
So yeah, I've definitely got some theories as to why people are behaving and speaking like that, but I do think...
One such person is right here. I can speak firsthand, if you want.
Yeah, I wanted
to. I think you've got some really good language around this, Nikolay, around logical backups and
physical backups, that really helps clarify. And I think if people used that language in those
sentences, it would immediately help with clarity and also limitations.
But I'd love to hear your high-level thoughts on the topic as well.
And yeah, why it is something you say.
So this statement, "pg_dump is not a backup tool", is a reaction to the statement the documentation carried for 20 years. And we saw so many disastrous situations in many companies that
tried to rely on it as a backup tool while growing.
Actually, it was not my statement, right?
I just picked it up.
I think Franck Pachot also mentioned it.
I'm not sure he was the first who reacted to Gülçin's article,
but I joined as well. And I'm sure that, not only in Discord or Slack but everywhere I've seen, many
people are picking up this motto, because it's painful to observe how many companies relied on
pg_dump dumps as backups, right? If we call them dumps and consider them backups, okay, we can do that,
but there are limitations. There is big power in this, and not only partial dumps, where you can take specific tables.
These days we have many managed Postgres offerings, and they don't share backups with us,
right? If you want a multi-cloud backup, you must use pg_dump.
You cannot get data
or copy or something.
You cannot get physical backups
out of RDS, for example, right?
But this pain, observed
for a couple of decades,
caused me to join this movement
saying that pg_dump
is not a backup tool.
At the same time, I told Michael there is a kind of professional shift in my mind here,
because when somebody says backups, I envision only physical backups,
although there are logical backups, of course.
And again, it is not my idea to introduce this language. I checked the Oracle and MySQL documentation. And I think maybe
it's a good idea to borrow this concept and state it explicitly: there are two kinds
of backups, physical and logical. They both have pros and cons, right? For example, logical backups.
If you rely on pg_dump as a backup tool, being able to take partial dumps and to escape from RDS, for example, are good pros, right?
Speaking of cons, it's always a kind of snapshot.
It puts pressure on your database in terms of the xmin horizon, affecting autovacuum behavior, which
is unacceptable if you have 10-plus terabytes and heavy load.
Also, one day some bug or corruption might happen, and you simply cannot read your data
at the logical level, while physical backups are not affected.
They just copy files, right? And there are many pros and cons
to compare, right? And I like the idea of splitting the language between logical and physical.
And for me personally, when somebody says backups without specification, I still see
by default physical only, right?
If we are considering corruption in logical backups, the corruption can also be
at the physical level.
Right, I'm okay with that, but I have a backup I can restore and deal with, right?
Well, then you can actually carry this corruption along in your physical backups if
it goes unnoticed. And then, if you had a
logical backup on top of it, maybe it could be another tool to fight this physical
corruption that you have.
Yeah, what I'm trying to say is: if I have physical backups with corruption, I
will deal with it and so on. But if I have corruption which prevents pg_dump from reading
data, it will just fail and I don't have anything.
Yeah, so I'd just like to make a couple of
comments here. I think one of the things that I find really interesting is that people who work
for different companies that all support and use Postgres can have very different experiences
of some of this stuff. And I've seen that before with other issues. And I'm seeing it here too, because my typical experience with pg_dump is not the one that you were describing
at all. In fact, since I've worked at EDB, which is the whole of my professional PostgreSQL career,
I've never had that situation happen. Not once have I run into a situation where a customer should have been using something other than pg_dump, and they were just using pg_dump, and then they got into trouble. Instead, something has gone really badly wrong for some reason, and pg_dump becomes the way that we can help that
customer to get out from under that problem. So just as your experience with the customers
that you've worked with is informing the way that you view the issue, I have a
very different set of experiences, from what it sounds like. And so the thing that to you
feels like, oh, this is the catastrophe, we've got to steer clear of this: in my experience,
that's never the problem. It's always the thing we reach for to get out from under the problem.
And I really just want to highlight that, because I'm not saying your experience isn't valid,
and I hope you'll return the same courtesy.
I actually understand partially what Nikolay is trying to say here.
Because before EDB, before working with Robert, I was working for 2ndQuadrant, and we were building our own backup solution, Barman.
And now EDB owns it.
And I know, because I was actually doing remote DBA work, that there were a lot of
customers with backup issues. They had their own home-cooked scripts. In the wrong hands,
this can go wrong, because there are some things about pg_dump and restore that you have to know:
how do you do the dump process, how do you restore, do you actually test these things?
Do you copy the whole WAL directory, or do you kind of consider
it just some logs that you can delete at some point, and so on? So if people don't know
how to put these things together, it is not really helpful for them, and
then things can go wrong, and I've seen that things actually went wrong. That's why we were steering
people: if you just do regular dumps and restores, use this tool that we have, or any other tool that is made for backups.
You can set a retention period, you can keep your backups for X days, you can restore
them and test, and you can have continuous backups as well, so it's not partial.
It can be a continuous thing that you don't need to worry about, and you can do point-in-time
recovery, and so on and so on. So I understand this rhetoric, and I was an advocate of it.
But then I also feel like it went too far, saying, you know, this is not usable,
and that I oppose, basically.
Yeah, it's like a pendulum, I agree. The
start of this pendulum is these 20 years of documentation. And you raise a very good point about restore.
When I hear backup, a full-fledged backup, it's not only physical to me, it's also verified.
And if we have a physical backup which we test, that's great.
With dumps, I'm very curious, Robert: haven't you seen pg_dump being unable to read some database which is corrupted, so that we cannot get a dump out of it?
But there's a second question here, a second case.
Okay.
That actually happens all the time. And one of the things that I often end up helping people do is fixing the
database enough that we can use pg_dump to get the data out of it. Because if the database has
incurred a lot of damage at a physical level for some reason, we're never going to be able to repair that well enough to give
confidence that everything is the way that it should be.
So a dump and restore, in my professional opinion, is absolutely essential in that situation
to get back to a clean state.
Now, you are 100% correct that the dump may also fail or the restore may also fail.
But those are problems that we can understand and fix, right?
We can look and say, ah, well, you have a pg_class entry, but you're missing a pg_index
entry.
So we need to create the one or delete the other, right?
That's a problem where we can say, aha, that's something that we as Postgres experts can look into and understand what needs to be done to bring this back to a state where pg_dump is going to run.
But the blocks being messed up at a physical level, or out of sync with each other because we've had some time travel of some kind, or something like that:
those are problems we will never be able to get out from under.
Does that make sense?
Yeah, it makes total sense. And moreover, it's a very popular approach to use pg_dump to
test physical backups, to see that we can read everything except indexes. For indexes, we use pg_amcheck,
but to test physical backups,
we use pg_dump to /dev/null, for example, right? Just to see that there is no corruption,
that we can read it for sure. And as you mentioned restore, I remember a couple of times I saw a dump that could not be restored because of a unique key violation, right? Because of corruption of a uniqueness
constraint: some duplicates appeared, and the unique key didn't save us, due to some bugs or
something. Maybe somebody disabled something, I don't know. Or foreign keys
as well: if you disable triggers, you can corrupt your data easily, right? You disable triggers, you
load something, you enable triggers, and
Postgres won't check it. And
then you can have a pg_dump,
but you cannot restore from it.
Right? So, yeah,
we definitely see some mutual points
here, and
the question is just about language, I guess.
That's it.
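The "pg_dump to /dev/null" check mentioned in this exchange can be sketched roughly as follows. This is a minimal illustration, not a complete verification procedure: the function names and the database name `restored_copy` are hypothetical, the sketch assumes pg_dump is on the PATH with connection settings supplied by the environment, and, as noted in the discussion, it does not exercise indexes (pg_amcheck is the tool named for that).

```python
import os
import subprocess


def readability_check_args(dbname):
    """argv for a pg_dump run whose output is discarded; only the read matters."""
    return ["pg_dump", f"--dbname={dbname}", f"--file={os.devnull}"]


def table_data_is_readable(dbname, run=subprocess.run):
    """Return True if pg_dump managed to read the whole database.

    `run` is injectable so the logic can be exercised without a server.
    """
    result = run(readability_check_args(dbname), capture_output=True, text=True)
    return result.returncode == 0


# Hypothetical usage against a database restored from a physical backup:
# if not table_data_is_readable("restored_copy"):
#     print("pg_dump could not read the restored database")
```

A non-zero exit code only says that something could not be read or dumped; the captured stderr from pg_dump is where the actual cause would have to be looked up.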
Well, I think it's also about experience, Nikolay.
You mentioned some disasters.
Am I right in understanding that these are folks who have come to you with some issue,
and it's not just that they're using pg_dump as a backup tool,
it's their only form of backup?
And what kind of issues is that causing?
Remember the first managed Postgres service created,
the first popular one at least? It was called Heroku. I think it still exists, but it is not being actively developed these
days. And they offered backups as dumps. You could download them. That's great, actually.
If a managed Postgres service provider allows you to download backups, that's great.
But it was just backups. And nobody does this.
I mean, none of the very popular managed Postgres providers do this. They rely on physical backups these days, right? And also
on snapshots and so on. I mean cloud snapshots, full disk snapshots.
And this also shows the evolution, right,
of the backup concept in many people's minds, not only ours. So I think it would be great just to agree on the language and discuss.
I'm okay to be alone in thinking that backup is just physical backup. Backup could include both logical and physical,
and we could clarify the documentation, the language, articles, and so on.
And I see it as a pendulum, right?
Again, this is my point.
For too long, the documentation was claiming this is a backup tool.
This language was super harsh.
And I remember,
at least a couple of times in my life,
I was trying to explain this to some customers
with growing Postgres databases,
exceeding a terabyte and approaching 10 terabytes.
I'm saying, don't rely on pg_dump as a backup solution.
And they just showed me the documentation, saying,
this is what they say.
The vendor is saying this, right?
Yeah, I think that there is maybe a difference
between something that creates a backup and a backup tool.
I mean, this does get down a little bit to what you think words mean,
so it almost seems like a silly thing to argue about, right?
But I think, you know, you asked Gülçin at the beginning,
if I take a snapshot of all of my data on a cell phone,
is that a backup?
And I think the answer is obviously yes,
but equally obviously that's a silly way to do a backup,
because your restore procedure is going to be very unpleasant,
which is not what you want, right? So I think
sometimes when people talk about a backup, or a tool that can take a backup, or a backup tool,
sometimes they mean: can I get a copy of my data from which I could recover, right? And that's
one question. And pg_dump will give you that, right? The other thing that people sometimes
mean, and they may have some particular commercial product
in mind that offers a certain feature set, is: am I going to get this feature
set where, for example, my retention times will be managed, and my actual process of orchestrating the backup
and orchestrating the recovery will be managed? And then the answer is no, pg_dump is not going
to do that for you. And you probably do want those things in most cases. So I don't know.
I think there's a lot of nuance that's possible in the language here. But for me, the important thing is to make sure that we're clearly able to explain what the benefits and drawbacks of the different approaches are, rather than spending too much time fighting about the specific language, which, for me, gets a little bit silly.
I agree.
Yeah.
I agree as well. Robert, in your blog post you make a really good
case for the tone of the statement being difficult, and I think you actually use some language
that waters it down a little bit, or explains a little bit more. It doesn't take
many extra words to do so. But I also wanted to ask: do you see this problem in other statements
in the Postgres community? Are there other things people are saying that remind you of the tone of this kind of statement
as well?
I don't have specific examples in mind off the top of my head, but definitely, yes. I mean,
it's a chronic problem on hackers. You know, I think I wrote a blog post about the tone of
dialogue in the Postgres community towards the end of last
year. And it's always a problem, because when you post your patch on pgsql-hackers,
you're essentially soliciting review. And people are rarely going to write you a review where
they're like, you know what, this patch is amazing and I love it. I mean, it happens. People actually do get those kinds of reviews,
and it's a great day when you do.
But generally, when you're reviewing a patch,
you're picking something that you actually like
and would like to see go forward.
And then you're saying the worst things about it
that you can think of to say.
You're like, so here's all the problems. Here's all of
the stuff that I think needs to be better in order for this to become part of the product, which
I hope it will, but these are the things that I think need to be fixed first. And so what
I see is that, actually, for a lot of committers in particular, people's mental health is not in a great place. You know, I kind of
thought my mental health was not in a great place around some of this stuff. And then I talked to
some other people and found that they were feeling worse about it than I was, by
significant margins. And in my opinion, it's rarely because of bad intent. I mean, obviously
people get frustrated. People say things that they shouldn't have said, or they don't say it... And the kind of engineering that we're doing
is difficult and error-prone. And even the absolute smartest people in the community make
all kinds of mistakes, you know, over and over again, right? Like, we were doing a re-wrap of a
scheduled minor release that happened last week. We're doing that this week because somebody
committed a fix for a
bug and the fix contained another bug. And it doesn't matter who made the mistake or who didn't
catch the mistake. That's not relevant. It happens all the time. And I think it's really challenging
for people, because we work in a very open environment where everybody sees every email we
write, every patch we commit, every patch we thought about committing.
It's out there constantly
and you just realize that there are so many ways
for you to screw up.
And every time you make a mistake, everybody sees it.
So I think it's a struggle for everybody.
As far as I can tell,
every single person who works on hackers
encounters this problem
of getting the tone right all the time.
And I am certainly not going to sit here and pretend like I get it right more often than
average.
I think a lot of people would say I am below average in that way.
But I can tell you I'm very aware of the problem.
And I am trying to figure out how to do it better
because at the end of the day,
it's not enough for us to deliver great software.
We need to deliver great software
while also creating a community
that people want to participate in.
And that applies for me,
first of all, to the developer community
because that's where I spent most of my time.
But it also, I think, applies much more broadly to the user community.
And I think that is part of the reason this issue set me off a little bit, because it's
the sort of thing that I'm struggling, often in vain, to do right on a daily basis.
But instead of being targeted at other developers, who at least kind of know
that the negative feedback is coming, some of this felt to me like it was targeted toward users
who don't realize that they're about to get jumped on for, you know, wading into a
flame war about whether something is or isn't something. And I just don't want
anybody to
have that experience. I certainly don't want users to have that experience.
Personally, just from hearing you articulate that, I've thought of one that
annoys me a little bit, and that's the correction of people pronouncing or spelling Postgres
wrongly. Missing the S off sometimes happens if people are new to the community, and immediately they get
jumped on. I think, oh, come on, they're clearly new. So yeah, I can definitely see that.
It also happens a lot with people based on their language of origin. You know, like,
the fact that we pronounce it PostgreSQL, I believe that's at least one of the canonical pronunciations, you know, that is much more natural for somebody who learned to speak
English in the United States than it is for somebody who learned to speak English in,
for example, India, right?
Like, it is English, but the way that English is spoken in India, it's a distinct dialect. It has its own ways that people say things, ways that people communicate, characteristic patterns of speech.
And that's not the only place, certainly.
I think, actually, there are probably other countries where the problem is even more acute because English isn't even used as a common language communication in many parts of the world. But even when it is,
it's not necessarily the same as your English, and people aren't necessarily going to be,
you know, starting from the same point, right? If I read a word that is unfamiliar, and my wife
reads the same word, we're likely to pronounce it the same way in most cases. But if a colleague
from halfway around the world reads
the same word, their instinct may not be the same as mine. And that's not necessarily a question of
me being right and them being wrong. That's a question of we went to different schools,
we were taught different things.
Yeah, I think it also points to the wider problem in
many communities, like the longevity of the projects will depend on people. And if you
are hostile to people, or like, because we all come from different parts
of the world. I didn't learn English until I was, like, you know, an older
kid, and it is always a problem. When I give a talk or when I write an email, it
is still in the back of my mind that I try to correct myself. I use multiple tools, I
try to present myself as well as I can, but there are
limits. I still confuse the prepositions. I use "in" and "at" all around, randomly. I could never fix
this. And that doesn't mean that I can't contribute to the project, and I could, and I do.
And that's what I believe, like these little statements, maybe we took it to a philosophical
approach though, it's not about pg_dump, backup
or not, but like, as you know, saying Postgres, but we should do better in how we handle communication,
because this is the way that people interact with you, they report issues.
And if you don't accept problems, well, people will not report it, or they will not actually
use this and report back what they use, so that you don't actually get the feedback from
people. And because you cut these channels that people actually try to communicate to you,
instead of opening all these channels that we should actually amplify. We should have more
channels for people to bring stuff about how they interact with Postgres or the ecosystem in general.
So that's where I was really impressed by Robert's blog about how open he was about this.
And I appreciate the efforts that are going on towards this,
because when I started,
I also felt scared,
almost reading like some of the emails.
I was like,
I wouldn't want this reaction to come to me,
for example.
So it shouldn't be like that.
And I think it's not just an issue of dialect either,
you know,
like that is definitely part of it.
But one thing that I've noticed on hackers is that clarity and extreme precision of expression is very, very highly valued.
Right. Like someone can come along with a worse idea and because they explain it extremely clearly and precisely, either it gets accepted or they get feedback on how it should be changed or positive comments.
Welcome to the community. Hey, great to have you. Right. Somebody else writes a worse email about a better idea, and it actually gets
a worse response. And I do understand some of why that happens, right? We value people whose style
of expression is similar to our own, where we feel like we can freely and easily communicate
with those people. And everybody's busy.
So you don't want to spend a huge amount of time trying to understand email A
if you could very quickly and easily understand email B.
But it's obviously super off-putting to people
when you may have proposed something that was actually great.
And if somebody had given you five minutes of their time,
they could have understood exactly what you were trying to say.
But they just flipped through the email really fast and then they moved on because they're busy.
And that's obviously going to be demoralizing to people.
To play devil's advocate a little bit, I personally err on the side of being polite and trying to be kind and trying to be welcoming.
But I also think sometimes that approach doesn't always land.
People don't always take the lesson from it or learn from the statement or realize
what, maybe, I'm really trying to say, or I'm not being clear enough, that kind of thing. And I do
think, for example, with the comment that we started with, I feel like there's a certain amount
of trying to save people from themselves or trying to shock people, deliberately trying to be provocative in order to make people think, oh, we shouldn't only be relying on this tool for this purpose.
Or, you know, maybe I should be rethinking my thoughts.
You know, it doesn't apply to all of these cases, like mispronouncing the project name. But I've seen this specific comment come mostly from consultants, some experienced consultants, some who are very kind and also
involved in, like, diversity initiatives. I've definitely seen this from people that you
wouldn't necessarily expect to be direct and unkind. So the exact phrase, "pg_dump is not a backup tool,"
I think that's coming from a place of having seen people shoot themselves in the foot and wanting to save people from that and wanting to be quite direct to avoid it.
So I don't know for sure, but I believe their intent is good.
But maybe they're deliberately choosing to be provocative or direct or I'm not sure.
I'm not sure. Maybe I'm putting words in their mouths, basically.
I think it's like, we are not calling out people for just saying, you know, this is not a backup tool, because we understand where they come
from, because we are in the same industry, working for ages. We know these people. We all have the
customer stories and so on. But I think the general idea from here, that when somebody shares the blog
post, let's say, we all wrote about it, right? I wrote two blog posts and Robert wrote two more.
And we just, you know, got together and are talking about, let's say, he's pointing out like why pg_dump is good at dependency management, let's say, you know, it's like we take it for granted,
which I wanted to bring up in today's call to just actually showcase that there are things
we should appreciate in this tool, why he says it is an amazing tool.
Then, towards this, somebody writes like,
but it is not a backup tool. Then I don't get it, because that's not what the discussion is about.
We are trying to discuss that there are ways you can have this tool in your tool set. It's not the
only tool. There are professional solutions for backup, backing up your database against disaster
recovery, as we mentioned, the retention and the whole orchestration of the database backups and recovery.
But when we are discussing this tool specifically, which I feel that is important here because there's nuance to be discussed,
and just shutting down the discussion saying, but it's not a backup tool,
that's where I feel that this needs to be improved, because then you don't really contribute to this,
because you need to say then why it is not. In this case, why don't you agree with this? Let's say, is the dependency
management thing not for you? Or like, why it could be improved. You could say that pg_dump could
be improved because, let's say, we could run vacuum after it, or we can do, I don't know, like, do
statistics better or something. I mean, just to contribute where pg_dump might be
improved, because I've seen people like, you know, in the discussions that they struggle with like
mapping, let's say, pg_dump options to pg_restore options, because they assume the order will be
the same and they don't get it and so on. So there are things maybe we could get input from why
people complain about these things and to improve. That's where I go for issues. You know, I see these comments in the forums and like, okay, this
is a good idea. Maybe I can actually talk about this. But then when we are discussing
this and coming with like, okay, it's not a backup tool, it kind of brings us back to
zero and doesn't really improve anything.
But your second article was basically agreeing that it's not a backup tool.
No, in the sense that people say, as I'm saying, as a solution, if you want to orchestrate
your backups, use a tool that is like, you know, Barman, pgBackRest or something.
But then another discussion we have, why it can't be, why pg_dump, we are discussing, because in the second blog post of Robert, for example, he gives, like, this, you know, why
it could be a nice tool for these use cases that he lists. And then getting the response
of, again, it's not a backup tool, that I don't agree with, basically. Like, okay, use a better, maybe,
solution, if you are managing production databases in multiple environments that are giant databases
and you really don't need to deal with, you know, home run. But historically it is still a tool that
we use, you know, it could be used for different cases.
What I hear is you're saying, when
people come to comment on your first blog post, saying, I think it was Franck Pachot, and I'm joining him, still joining,
and he said pg_dump is not
a backup tool, you think it
like shuts down some discussion
and so on. But I just
explained that this is a pain
from a lot of experience
and we are just reacting.
And what I hear, you still try to judge him,
right?
No, no, it's definitely not for it.
I'm just saying we discussed it.
The second blog post was about, you know, there's backup tools, you should use it.
But then when Robert was describing a part of why pg_dump is good, in my opinion, it was like very valuable points.
And there, it was not even relevant.
We were not even discussing, should you use this tool or not?
And I'm not targeting anybody.
I'm not targeting anybody.
So be clear about it.
Yeah, so the change happened only now.
It's in Postgres 18.
And recently I had discussion
this like claiming,
oh, it's not backup tool.
Somebody said,
oh, what is this about then?
And sending me a link
to pg_dump documentation.
So I think I would not judge
people who are saying pg_dump
is not a backup tool
until we have this change in documentation
and start recovering
from this stress we had 20
years. This is
my point. And I stay on
this point very strong.
I mean, I like... and common ground is, let's
start distinguishing physical and logical backups. We can clarify this in documentation as Oracle
and MySQL did. And there is already part of documentation speaking of backups. It describes
dumps, I mean pg_dump, and then file system
snapshots and then
point-in-time recovery, full-fledged
backups. And just if we
clarify documentation,
I will stop seeing customers sending
me this link saying, you're wrong,
this is documentation saying you are wrong.
I mean,
but, like, I think,
you know, I don't know, like, if you can't win an argument against a documentation link, I don't know, it feels like something's not right there, you know?
Like, I'm not trying to be harsh, but I just feel like, you know, if somebody hires you to give them good advice, and you give them advice that is actually good and their
response is...
Let me interrupt you, sorry. I'm just, like, I feel judgment in your and Gülçin's
words. Like, you tell me now how you want to be welcoming and now you judge me, like, I cannot win.
I cannot win two things. pg_dump is a backup tool. Sometimes I cannot. They say they trust the documentation more because many more minds are behind it. And
also statement_timeout, documentation says you cannot set it to positive value, keep
it zero globally because globally it's a bad idea. I already like some customers I win,
some customers I don't. I'm not genius, right? But I feel in both of you, I feel judgment. Why
don't we stop judging people and sentiment and so on? I bring you, like, improvement. Let's say there
are two types of backups, logical and physical, and then we develop language from there, and this
joins us, right? When you judge people saying they came to me with this statement,
or you say you cannot win your customer organization, this splits us.
And I start fighting with you.
I don't want to fight with you.
But, I mean, that's also my complaint about the language that you were using.
So, like, I don't know how to have this discussion without having opinions about whether
certain language is good or bad. And I don't think, I mean, you can't, right? We have to be
able to talk about what the language does and to what extent it helps or hurts. And yeah, of course,
there's some judgment there. I don't know. Like, I definitely have been in the situation of having a customer who wouldn't listen. And I,
the frustration that you feel with that situation feels very genuine to me. Like, I can totally
imagine that happening and being a bad experience. But I don't know. I'm not even saying it's a bad thing
that we changed the language and the documentation.
I was only reacting against the sort of like
conclusory statement,
pg_dump is not a backup tool,
and now I don't want to talk about it anymore.
I think we should always be talking about it more.
I think we should be trying to, as you say,
bring clarity to it and bring precision to it.
I agree with you totally. I hear you now well. And I think we will stop saying this actually,
if documentation will be, it's already fixed. I think it can be fixed even better if we say
it's a logical backup tool, for example. Everyone will be happy, I think, right? And we will stop
saying it's not a backup tool.
We will start saying it's not a physical backup tool, which is obvious, right?
And this will join everything and so on, right?
I agree with your reaction, actually, which says this statement, it's not a backup tool.
It's like too far from balance, right?
It's off balance. I agree with this. So it's not a good statement,
actually, I admit. But again, it's a reaction to another not a
good statement, which we had in documentation, which didn't say
logical backup, it said just backup.
Okay, we're pretty
much out of time. I wanted to thank you all for your thoughts on this.
I think it is a difficult subject,
and I think actually it's really nice to have three people
that all care about educating folks
and teaching people how to do things well
with different opinions on how to do so
or slightly different approaches on how to do so.
But as Nikolay says, as Gülçin pointed out in her blog post,
the language around this has been
changed in the documentation. Robert, keep fighting the good fight on the tone of
things on hackers. Are there any last words anybody else wants to add? Let's start, Gülçin, did
you want to say anything else at the end?
No, I'm happy that we are discussing it and I don't take
things personally I mean we are here just to discuss technically why this could be useful in some cases and why not.
And yeah, that was anyway, the summary of what I said in the blog post as well.
So if people like to read it and comment on it, and I'm happy to discuss more.
Thanks.
Wonderful.
Well, we're looking forward to your future blog posts, whether you want to write them or not.
Robert, any last words from you?
I just think, you know, on Nikolay's comment about making documentation better, what I would
encourage, and of course, this is much longer than we can actually do in this forum is, you know,
let's get down beyond the headline, right? Like saying in the headline that it is a backup tool,
or that it's not a backup tool, or it's an export, it's a dump, it's a lot. We got to get beyond that subject line and think about what we
say down deeper. And I think one of the areas where the Postgres documentation is sometimes
weak is it doesn't always do a good job listing pros and cons. Pros and cons very often don't get
listed for things. So, you know, that's probably an area where we could grow as a community.
Big time.
Brilliant.
Well, thank you so much, everybody.
And thanks, Nikolay.
Catch you next week.
Thank you.
Thank you for coming.
Thanks.
Bye-bye.
Ciao.
Bye.
Ciao.