Postgres FM - PostgreSQL 15
Episode Date: October 14, 2022

Here are links to a few things we mentioned:
- PostgreSQL 15 release notes
- Speeding up sort performance in Postgres 15 (blog post by David Rowley)
- Past, Present, and Future of Logical Replication (talk by Amit Kapila)
- Postgres 15 improves UNIQUE and NULL (blog post by Ryan Lambert)
- Do you vacuum everyday? (talk by Hannu Krosing)
- Why upgrade PostgreSQL? (by depesz)

What did you like or not like? What should we discuss next time? Let us know by tweeting us on @PostgresFM or by commenting on our topic ideas Google doc.

If you would like to share this episode, here's a good link (and thank you!)

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL.
I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres.ai.
Hey Nikolay, what are we talking about today?
Hi Michael, this is episode number 15, right?
It is, and because we are geniuses, we planned this really far ahead,
and we're going to be talking about Postgres 15.
Because it's released this week. I think it was actually released yesterday, right?
Yeah, exactly.
The way we record beforehand,
so fingers crossed the release went well.
And if so, it came out yesterday.
So thanks to everybody involved in that.
And sorry in advance if anything went wrong.
Right.
And also I should congratulate you and myself
because we didn't skip any weeks,
15 weeks in a row.
It's a big, big achievement, right?
And thank you all for great feedback.
Again, we received very good feedback.
Thank you.
And requests also.
We listen to requests.
Requests exceed our capabilities, definitely.
But we will try to catch up.
Please continue.
Yeah, absolutely.
I think it would have been easy to skip a week if we weren't getting lots of nice comments, so thank you, everybody. Okay, Postgres 15. What's your favorite feature? Straight to it.
You know I'm a performance fan in general, but I have a few reasons for picking the performance improvements to sorts specifically.
There are multiple ones.
Yeah, there are lots, and I know they are separate features, but when you consider them together as a group, I think they're so powerful, mostly because anybody who upgrades will benefit from them without having to change anything on their side.
Right, and everyone does it. Any project has ORDER BY, right?
Exactly. I think so.
99.99% have it.
Even, yeah, exactly.
And I think it's used in other cases as well, right?
Like, I see query plans with sorts in them that, you know, might be related to other operations as well.
Right, the question is how noticeable the improvement will be. But, like, I don't know. I haven't seen details, haven't tried it myself, but I do see many mentions of small ORDER BY improvements here and there, like with GiST indexes and other things. I noticed it as well.
Yeah. So ORDER BY was improved.
Yeah. And there's a really good blog post by David Rowley (or Rowley, I'm not sure; sorry, I've definitely got that wrong at least once) on the Microsoft blog that I can include.
Obviously, benchmarks are tricky, but it has some benchmarks on each of them.
And there's some decent wins in there.
Oh, so a whole blog post only about sorting improvements?
In Postgres 15, yeah.
By the way, I don't like the word sorting. It's official in SQL, and the PostgreSQL release notes use it as well. But sorting... sometimes normal people, not engineers, think about it like: okay, this goes here, this goes there, right? Not changing the order, not ordering. Ordering is much better in my head than sorting, but sorting is what we have in the source code and everywhere, and so on. So, yeah. One of the improvements I would like to note, out of this bunch of improvements, is the improved performance of sorting, or ordering, when work_mem is exceeded. This is interesting, right? If I'm not mistaken, it's from Peter Eisentraut; his name was mentioned there.
But I would like to test this one.
I'm not sure how much it was improved,
but it definitely sounds interesting
because work_mem sometimes is not enough.
Like we have temporary files and so on.
And this is interesting.
Yeah, there were multiple.
There were improvements to on-disk sorts.
So that's what we're talking about here. There were improvements to in-memory sorts, and there were improvements to the amount of memory needed for sorting, especially for certain data types. That's important because it means some sorts that previously would have spilled to disk, even if you don't change your work_mem setting, will now be able to happen in memory. So that'd be an extra performance boost around that threshold. So yeah, so many improvements that hopefully, in combination, will help people without them noticing. Not necessarily without noticing, but hopefully upgrading will give you a performance boost right out of the gate.
Yeah, worth testing, definitely, and checking. Interesting.
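For anyone who does want to test this, a minimal sketch of how to see whether a given sort spills to disk (the table and column names here are invented for illustration):

```sql
-- Check whether a sort fits in work_mem or spills to disk.
SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders ORDER BY created_at;
-- In the plan, look for "Sort Method: external merge  Disk: ..." (spilled)
-- versus "Sort Method: quicksort  Memory: ..." (stayed in memory).
```

Comparing the same query on 14 and 15 around that threshold is one way to see the improvements discussed here.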
How about you then? What's your favorite feature?
Very small one, small feature.
By the way, Peter Eisentraut, as I remember, was also involved there. It's collation version control. Like, Postgres will... I don't know the details, obviously, but I know the problem very well.
When we upgrade the operating system, glibc... a glibc upgrade is a silent upgrade. We talked about it in previous episodes. It might happen, right? And it usually happens if you upgrade, for example, from Ubuntu 18.04 to 22.04. And the question is: is it a dangerous upgrade of glibc, or not? And more often than I would like, it's quite dangerous. So you can have some indexes corrupted silently, and nobody will tell you.
So this is like a minefield. You can step into it. And after the upgrade, you don't see any problems. But after a couple of days, your users start to complain that some queries don't work as expected. And this is an obvious sign of index corruption. So you should test it with amcheck. So now, in Postgres 15, it will report that the actual version is not what the database expects. And it's controlled at the database level, as far as I know. So it's good at least to have an immediate error or message.
Yeah. So is that a log message? How does it report it?
I don't know, actually. I just saw that this problem was addressed at least somehow. In my opinion, it should be so: Postgres should care, and Postgres should know which glibc version was used when the table was created, when the database was created, and what is there now. So it knows, and it complains about the difference. How it complains, I have no idea, unfortunately, sorry. We will see. Unfortunately, in the systems I deal with, we will see it only in like three or so years, because it will take time to upgrade big systems. But for smaller systems, it's good that it will be there very fast.
And it's quite common when you, for example, copy... Operating system upgrade is one of the cases. You can also, for example, take your PGDATA and bring it to a different machine without noticing that the glibc version has changed. Or, for example, you run Postgres in containers: PGDATA was created using one glibc version, but in the container you have a different glibc version, and you also have problems. So finally, since Postgres 15, we will have visibility into this issue. And this is a very important thing. It feels quite small, but it's so painful not to have it. Great that Postgres 15 finally has it.
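A hedged sketch of what this looks like in Postgres 15 (the database name is invented; run this only after confirming the indexes are actually intact, for example with amcheck):

```sql
-- Postgres 15 records the collation provider version per database:
SELECT datname, datcollate, datcollversion FROM pg_database;

-- After verifying (or rebuilding) affected indexes, record the
-- currently installed provider version to clear the mismatch warning:
ALTER DATABASE mydb REFRESH COLLATION VERSION;
```

The point of the feature is exactly what's described above: the stored version and the running glibc/ICU version can now be compared, so the mismatch is no longer silent.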
Also, Merge, of course.
This is big, right?
I have checked the history of Merge.
Can you guess when the talk about it started in the Postgres project?
I can cheat, because I saw a talk by Simon Riggs at Postgres London, and I think he was involved from quite early on. But yeah, it's been a long time. Even though I was told a few months ago, I'm still probably going to under-guess. Let's say, six years ago?
2005.
So, yeah, 16 years ago.
Yeah, 16 or 17. Well, roughly.
Oh, roughly. Yeah, wow.
Right, and it was reverted in 2018 in Postgres 11.
Yeah, it had issues.
So Simon Riggs committed it,
and then he needed to revert it, unfortunately.
And this was a big disappointment moment for Postgres 11.
I remember it quite well.
So we lived with upsert, but MERGE is much more powerful. It has conditions, it has the ability to delete instead of just update or insert. And it's also standard compliant, which is very important. And also Oracle, SQL Server, DB2... like, big databases. Oracle and SQL Server support it, so if you migrate from there, it's one of the very common points of pain when you need to rewrite your queries. Now it will be much more convenient.
Yeah, exactly. Less work to do a big migration. Like, everything we can do to make those migrations from things like Oracle less work overall... I mean, the balance of cost...
And benefit, for the project, right?
Exactly.
The cost-benefit ratio keeps going in our favor.
So yeah, thanks to everyone who's worked on that for so long.
Yeah, I was surprised that suddenly, okay, it's committed again.
This time, obviously, quality is great.
So it's going to stay.
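For anyone who hasn't seen the syntax yet, a minimal MERGE sketch (the table and column names are invented for illustration):

```sql
-- MERGE, new in Postgres 15: conditional update/insert/delete in one statement.
MERGE INTO inventory AS t
USING incoming_stock AS s
  ON t.item_id = s.item_id
WHEN MATCHED AND s.quantity = 0 THEN
  DELETE                                        -- remove items going to zero
WHEN MATCHED THEN
  UPDATE SET quantity = t.quantity + s.quantity -- adjust existing rows
WHEN NOT MATCHED THEN
  INSERT (item_id, quantity) VALUES (s.item_id, s.quantity);
```

The per-branch conditions and the DELETE branch are what upsert (INSERT ... ON CONFLICT) cannot express.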
And it was a big surprise after so many years, right? That's probably the longest feature in development, MERGE.
So, yes. And the other thing you mentioned there was SQL standard compliance. We talked quite recently, I think a couple of episodes ago, about how important that is for people choosing Postgres.
Right. On that note, let's mention the other standard-compliant feature that is probably repeating the path of MERGE, because it was reverted after the first beta, or second beta, I don't remember exactly: the JSON stuff.
Right.
SQL/JSON, part of the SQL standard.
It was reverted both from 15, already after beta, right? And from the 16 development branch it was also reverted for some time, until it's polished. And it was probably the biggest disappointment of the 15 release, right?
Yeah, absolutely. But a lot of the things we've praised Postgres for recently are how high the quality bar is, and how strict the release process is, things like that. And it seemed like there were really good reasons for not committing it, and that overall we're probably better off with it coming back at a later date, in a better state.
Right. Some other feature was reverted too, but I don't remember which one.
A smaller feature also was reverted. I don't remember either.
So, already after the first beta, a couple of revert actions happened. It's an interesting observation.
It's good to see things happening in the beta phase, though. It means people are trying it, people are looking at each other's patches and making sure this is being held to the same standard across the board. I really appreciate it as somebody who mostly relies on Postgres's reliability, and performance of course, but mostly reliability. I think a lot of the community is here because it just works. And with features that go in, you can end up with weird things. Like, if you look at the data types, I know it's not related, but the data type json and then the data type jsonb: we're forever having to tell people about jsonb because of json being done first. And I'm wondering if that's the kind of thing that maybe wouldn't have happened in the current way of releasing things. But I'm not sure.
Well, between them,
there are a couple of years of development.
So maybe no, right?
Yeah, maybe.
Good point.
So what's next? What's the topic, or subtopic, to discuss?
Well, there are a couple of other things listed in the top-line features I'd be interested in your opinion on. There are some logical replication improvements.
A bunch of improvements, right?
Yeah. That seems to be something that's getting better and better each major release. It's not something I use myself, so I haven't read them in detail.
In fact, I'm using it, but not intensively on big production systems yet, because of issues with it. I've been observing improvements during the last couple of years, much more active improvement compared to the several previous years. I'm excited about it, because it feels like soon we will have much better logical replication. Much, much better. So for big systems, for example those that generate more than one terabyte of WAL data per day, or that have dozens of thousands of TPS, sized at terabytes or dozens of terabytes, the maintenance of logical replication will not be as painful as it is right now. I'm not going to describe all of the features; it's a bunch of good improvements and features. I will mention only a couple of them. For example, now you'll be able to skip some actions from the stream of changes. Because if somehow, on the recipient side, the subscriber side, a conflict occurs, a unique key violation, or something missing because of a foreign key violation or something, usually it means that's it for your logical replication: you need to start from scratch, or somehow fix it, get rid of the unique key or something. It's not good. But now there will be the ability to skip some record in the stream, so you can continue, and understand why this happened and fix it later. Because goal number one is to continue: if you have a lot of changes, you need to keep applying changes, and losing just one change is less problematic than being stuck and not applying changes at all. So this is a quite interesting feature.
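A hedged sketch of the skip mechanism in Postgres 15 (the subscription name and the LSN value below are invented; the real LSN comes from the conflict message in the subscriber's server log):

```sql
-- After a conflict stops the apply worker, tell the subscription
-- to skip the whole failed remote transaction and resume:
ALTER SUBSCRIPTION my_sub SKIP (lsn = '0/14C0378');
```

The skipped transaction is lost on the subscriber, so, as discussed above, you investigate and reconcile that one change manually later, while replication as a whole keeps flowing.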
And some other features are also related to performance and so on.
And I would like to mention that Amit Kapila, who participated in many of these improvements, gave a talk in our Postgres TV Open Talks series a few months ago. So go to postgres.tv. It's a YouTube channel. Or you're already here, right, if you watch us with our faces, not only the podcast version. So just listen to that talk. It was firsthand: a lot of insight, ideas, thoughts, and observations, both about Postgres 15 and future versions as well. So it's better to hear it from there, instead of just listening to us here. Okay, that's it for logical. What's next? What do you think?
We have a couple of other ones.
There are more compression options, for example in pg_basebackup.
Yeah. pg_basebackup is a way to create what I call a thick clone, a regular clone. You copy: if you have a terabyte, you will copy this terabyte to a different place on the same disk, or a different disk, maybe a different server. And of course, compression is good to have, because we have a lot of CPU power in many cases, but disk and network may be a worse bottleneck than CPU. So compressing everything and sending less can be beneficial in terms of time, right? So I can easily see how we would win in many places in our daily operations, DBA operations. So I'm glad this appeared in Postgres 15.
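A hedged sketch of what the new options look like on the command line (the host, user, paths, and the zstd level are all invented for illustration, and which algorithms are available depends on how the server was built):

```shell
# Postgres 15: compress the base backup on the server before it crosses
# the network, here with zstd at level 5, in tar format.
pg_basebackup -h db.example.com -U replicator -D /backups/base \
  --format=tar --compress=server-zstd:5
```

Server-side compression is the interesting part for the disk-and-network bottleneck described above: the data is compressed before being sent, not after arrival.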
Yeah, it's really cool. I think we've seen a few compression-related features in the last couple of versions, so it feels like there are probably a few people pushing those. So thank you to them as well.
Right, right. And also WAL compression can be controlled. I mean, there is no such thing as compressing the whole WAL: if you enable wal_compression, you basically enable it not for everything, but only for full-page writes. WAL is recorded in two forms. If you change some row in a table, it's recorded as a change. But with full-page writes, which are enabled by default and should be enabled to avoid corruption in many cases, the first change to a page after a checkpoint is recorded as a full page: eight kilobytes... kibibytes. And if compression is not enabled, you spend eight kibibytes; if it's enabled, you spend less. And I recommend everyone enable this WAL compression. However, I saw on Twitter somebody complained that CPU usage... some queries degraded after enabling compression. In my own experience, dealing with large, highly loaded systems, WAL compression was always beneficial, and only benefits were observed; we write much less WAL. So now, as I remember, Postgres 15 will allow you to control the compression type, because before it was quite lightweight compression, and maybe you want to compress more heavily. So now you can tune it additionally. It's interesting. So this is what wal_compression means: only full-page writes, the first change to a page after a checkpoint. Subsequent changes until the next checkpoint are recorded individually, only the data that was changed.
Okay, I'm with you now. That makes sense.
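A minimal sketch of the new knob (availability of lz4 and zstd depends on the build options your packages were compiled with):

```sql
-- Before 15, wal_compression was effectively on/off (pglz).
-- In 15 you can pick the algorithm for full-page-write compression:
ALTER SYSTEM SET wal_compression = 'zstd';
SELECT pg_reload_conf();
```

As discussed above, this only affects the full-page images written after each checkpoint, not the ordinary per-row WAL records.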
Right, right. So, more fine-tuning in the hands of DBAs. This is good.
Yeah. And the last one that made the major features list, at least in the draft release notes, was JSON format for server logs.
was JSON format for server logs.
Oh, before we go there,
I also have you noticed,
this is what is interesting item.
I'm looking at it right now.
Add support for writing wall using direct IO on macOS
from Thomas Munro.
This is interesting.
Usually Postgres has no anything with direct IO.
Some systems have MySQL, Oracle, they allow it.
So now only on macOS it's possible.
With additional conditions like max wall sender is zero and wall level is minimal.
So this looks like kind of an experimental thing. And only on macOS, which is funny, right? Who runs production Postgres on macOS?
No, but then again, I guess sometimes people do look at performance things on their local machine, and that's an interesting case, maybe a problem. But the other thing that I've heard of is people wanting to test the Mac processors as potentially interesting to run Postgres workloads on. Not in production yet, but the M1, M2 processors, if they're really good, then I wonder what kind of performance we could see database-wise.
Yeah, direct I/O for WAL, it's interesting.
I would definitely spend some time benchmarking it, just to understand what kind of benefits we can have here. But in my opinion, this is experimental, a small move. I didn't see the discussions in hackers about it, unfortunately. But I think something will happen in this area in the future, my gut tells me, and not only on macOS, on Linux as well. And this is interesting. So, yeah, that's it about WAL. Some improvements with pg_basebackup also happened, some more control and so on. This is also good.
And exclusive backup mode is removed. So, no more exclusive backup mode. Nobody was using it for several years already; it was the default, and I remember some confusion, but now we just forget it. Some cleanup is what happened here. I think that's it about backups. Let's talk about some developer stuff. What else? We discussed MERGE, we discussed the reverted SQL/JSON, we discussed some sorting, or ordering, optimizations. With indexes, there is some continued work related to duplicates. Remember deduplication and the B-tree improvements in Postgres 13, and I think in 12 and 14 definitely, from Peter Geoghegan and others. And now more cases are supported.
Right.
So, tables with TOAST: their indexes also have improvements. This will affect, as you said, everyone. If you have a quite big table... if a table is TOASTed, meaning that you have records roughly larger than two kilobytes, so big records, for example JSON or text.
Almost anybody that's storing JSON, yeah, exactly.
Right. Well, sometimes we have small JSON values, but often we have quite large ones. And in this case, if this table is indexed with a B-tree, it now gets the benefits previous releases introduced, in terms of index size and how degradation happens when you update it. So bloat growth will slow down, and so on. And this is good. So probably it's finalizing the work in this area. Maybe; I'm not sure, maybe there is something else. But it feels like, of what was done before, now we have full coverage of cases. Also, now, remember this:
In unique keys, we can say all NULLs are the same, which is not what the books teach us. Normally NULL does not equal NULL; NULLs are distinct, because NULL is unknown, so if we compare NULLs, usually the answer was no, they are not the same. But now there is the ability to say they are all the same, and we have only one single NULL in our universe in terms of unique keys. And in some cases it's useful. There's a blog post by Ryan Lambert of RustProof Labs about that.
I'd forgotten that was in 15. Just on the development front, I've been looking at a few of Peter's other commits, and I'd forgotten that hash_mem_multiplier has been increased. This was, I think, introduced in 14, so anybody running 14 might be interested in this as well. It's a multiplier on work_mem that lets you raise the amount of memory available for hashes without raising it for sorts. So you could say: I want there to be 16 megabytes available for work_mem, but I want 32 or 64 available for hashes, so you could set the multiplier to two or four.
Interesting. How do we make this decision, though? Like, how can we make the decision with numbers? We need some proper analysis before we do it, right?
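Two of the features just mentioned, sketched in SQL (the table, column, and index names are invented for illustration):

```sql
-- Postgres 15: treat all NULLs as equal in a unique index,
-- so at most one NULL nickname is allowed.
CREATE UNIQUE INDEX one_null_nickname
  ON profiles (nickname) NULLS NOT DISTINCT;

-- hash_mem_multiplier: give hash operations more memory than sorts get.
SET work_mem = '16MB';
SET hash_mem_multiplier = 2.0;  -- hashes may use up to ~32MB
```

In 15 the default for hash_mem_multiplier was raised from 1.0 to 2.0, which is the default change being discussed here.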
This is interesting.
It's super interesting that the default's been changed. I think that's something that Postgres generally shies away from doing, and I'm really impressed that they've done it.
You know my opinion about defaults, right?
No, I don't think I do.
My opinion is that a lot of defaults are absolutely outdated. They should be changed; we should care about modern servers, SSDs, and so on. And in this context, I will pull us back to the operational side: log_checkpoints is on now. It was off by default, and that was a terrible state, because checkpoint data... you always want it, right? And the discussion was like: we don't want to generate a lot of logs, because sometimes we have a small machine with a small disk, and we don't want to fill it with logging. But checkpoint data is so useful for DBAs to understand what's happening. It should be on. And this is a small win. I think this is good.
I'm in the camp of: let's make defaults much more modern, up to date. And on for checkpoints is definitely a win. The other one, log_autovacuum_min_duration, was also changed, to 10 minutes. And I think it's only a partial win. I would change it to one second; I personally sometimes use zero. Of course, it generates a lot of logs, but it's also useful for analysis. Even if autovacuum took half a second, you have interesting data to analyze; you see a lot of interesting stuff. But of course, on some heavily loaded systems, it will produce a lot of logging volume.
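A sketch of the two settings being discussed, with the more aggressive autovacuum threshold suggested here rather than the defaults (tune to your own log capacity):

```sql
ALTER SYSTEM SET log_checkpoints = on;               -- the new default in 15
ALTER SYSTEM SET log_autovacuum_min_duration = '1s'; -- 15's default is 10min
SELECT pg_reload_conf();
```

Setting log_autovacuum_min_duration to 0 logs every autovacuum run, which is the maximally verbose variant mentioned above.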
Yeah, the line in the release notes for this is fascinating, and I think it shows that maybe you have a slight difference of ideology with whoever's making these decisions, because it says this will cause even an idle server to generate some log output, which might cause problems on resource-constrained servers without log file rotation. So the question is: why are we optimizing for idle servers without log file rotation, versus the many people actually running these things?
Yeah, I think this note is about log_checkpoints, because we have checkpoint_timeout, and checkpoints happen by default every five minutes. So that's also not a good default. But anyway, I think they just want to avoid the situation where you install Postgres... The usual opinion about Postgres from the wide audience, from long ago, is that it's hard to start, but once you install it, it's just working, and you simply forget about it, right? Of course, it makes sense to think about that: it's not good if you installed it and it suddenly stopped working after a year or two because of logs.
It may happen, of course, yes.
But log rotation should be enabled. Funny fact: a couple of days ago we had a system, quite an important system in our infrastructure, where log rotation was disabled, and we hit zero free disk space. So it happens. But still: log_checkpoints on.
One thing on that note: I guess the world is changing quite a lot, and a lot of the time these defaults don't matter as much, because more and more... I think I saw a survey not that long ago suggesting it might even be close to 50 percent of instances now running on managed services, and they have their own set of defaults. And they can change that.
They can set the defaults for you, and they can do these things, like log rotation, for you.
Exactly.
So I think some of these things...
But there, this problem is solved already. I think... I'm not sure, but I'm almost sure that on RDS, log_checkpoints is on by default. Maybe I'm wrong, maybe I'm wrong, but it should be so.
This is very important information.
While we're on this topic, I think the other one that makes sense for modern systems, the first one that I thought of, was random_page_cost. And that's still being...
Well, random_page_cost should be very close to seq_page_cost if you use SSDs, definitely. Or if your database fits below your RAM, it's also different, because in a fully cached state, sequential versus random doesn't matter; they are similar, so the costs should be close or equal. But back to logging: many things can, and should, be changed. For example, log_lock_waits: you also should have that logged, but it's off by default. And some people, for example, enable logging of connections and disconnections as well, but that can definitely spam your logs.
Yeah, but based on this philosophy, an idle server wouldn't generate loads of logs for either of those, so I could see people being more open to that than maybe some of the other ones. Anyway, it seems like lots of progress.
Everything we wanted. It's so much progress. I hope this reconsideration of defaults will continue in the right direction.
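The settings from this defaults discussion, as a hedged sketch for SSD-backed or fully cached systems (the exact random_page_cost value is a judgment call, not a rule):

```sql
-- On SSDs, random reads cost nearly the same as sequential ones,
-- so bring random_page_cost close to seq_page_cost (1.0):
ALTER SYSTEM SET random_page_cost = 1.1;
-- Off by default, but worth logging:
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();
```
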
But 10 minutes for autovacuum is not enough.
What, 10 minutes? Okay, okay. The other thing I'm looking forward to this year is seeing how fast each of the cloud providers releases the new version. I think we've seen some of the newer ones, some very new players, right?
Yeah, exactly.
So I think Crunchy Bridge were pretty much the same day, if not a day or two after the release last year. They had a version of 14.
Yeah, I remember that.
Microsoft, one of their services, was too, I think. RDS started to improve in terms of Postgres suddenly. A few years ago, I was saying: don't use Google Cloud managed Postgres. Right now, things are changing; they have interesting things. Again, we had a guest on Postgres TV: Hannu. Hannu Krosing, former Skype Postgres architect. And it was a great talk about vacuum.
Yeah, it really was.
And I'm guilty of being a Google Cloud PostgreSQL user as well, and they are doing some really cool things. They did release 14 quite quickly as well, so fingers crossed.
So things are improving. I think it's competition.
Yeah, absolutely. And people wanting it, which is good news. Users asking for it.
So yeah, I'd encourage everybody to check out 15 as and when they can, and to keep upgrading. Especially, I guess, anybody on version 10 or before should definitely be thinking about upgrading as soon as possible, because that will stop getting security patches.
For 10, that's it. For 11, there's one year left.
Yeah.
And even regardless of security patches and things,
just the number of performance improvements
you could get by upgrading is worth checking out.
Yeah, exactly.
As well as all of these features we've been talking about.
Great.
Some partitioning improvements we haven't mentioned. Some other things, like for foreign data wrappers. Okay, a lot more improvements are there. So it's good, it's good. So many, hundreds.
In fact, actually, another website that I will link up is a good one. Why upgrade?
depesz.
Yes. depesz.com, why upgrade.
Yes.
I always use it to show people what they are missing.
Yeah, it's really good at showing the security patches you're missing, in bright red.
Highlighted red.
Yeah.
And additionally, it's got a nice search feature.
So if you're wondering about any changes
in the last few major versions
to a feature you really care about,
you can search for that feature by name
and see all of the commit messages related to that.
It's cool.
Great.
Wonderful.
Any last things?
Yeah. The usual reminder to subscribe, like, and provide more feedback. We like it. Thank you.
Yeah, thanks. It keeps us going, and we really appreciate it. So yeah, have a good one, people. And see you next week. Bye.