Postgres FM - Our favourite v16 feature
Episode Date: September 8, 2023

Nikolay and Michael discuss their favourite feature each from the upcoming PostgreSQL 16 release. Here are some links to some things they mentioned:

v16 draft release notes https://www.postgr...esql.org/docs/16/release-16.html
PGSQL Phriday #012 invitation from Ryan Booz https://www.pgsqlphriday.com/2023/08/pgsql-phriday-012/
Subscribe options for the podcast https://postgres.fm/subscribe
A recent closed source ClickHouse feature https://github.com/ClickHouse/ClickHouse/issues/44767#issuecomment-1683293218
Postgres TV hacking session with Andrey Borodin on \watch with limited number of loops (v16) https://www.youtube.com/watch?v=vTV8XhWf3mo
Allow \watch queries to stop on minimum rows returned (v17) https://github.com/postgres/postgres/commit/f347ec76e2a227e5c5b5065cce7adad16d58d209
pg_stat_io commit mentioning the op_bytes column (v16) https://github.com/postgres/postgres/commit/a9c70b46dbe152e094f137f7e6ba9cd3a638ee25
pg_size_pretty function https://www.postgresql.org/docs/current/functions-admin.html#id-1.5.8.33.9.3.2.2.7.1.1.1
Visualizing Postgres I/O Performance (talk by Melanie Plageman at PGCon) https://www.youtube.com/watch?v=CxyPZHG5beI
Our episode on BUFFERS https://postgres.fm/episodes/buffers-by-default
EXPLAIN (GENERIC_PLAN) blog post by Laurenz Albe https://www.cybertec-postgresql.com/en/explain-generic-plan-postgresql-16/
Running EXPLAIN on any query (video by Lukas Fittl) https://www.youtube.com/watch?v=CMftYJnqou0
PostgreSQL 16 Beta 1 New Features with Examples (English version) by Noriyoshi Shinoda https://twitter.com/nori_shinoda/status/1664481483355226114
Have auto_explain's log_verbose mode honor the value of compute_query_id (commitfest entry) https://commitfest.postgresql.org/42/4136/
Make auto_explain print the query identifier in verbose mode (commit) https://github.com/postgres/postgres/commit/9d2d9728b8d546434aade4f9667a59666588edd6

What did you like or not like? What should we discuss next time?
Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hi everyone, this is Postgres FM, episode number 62. This is Nikolay and Michael. Hi, Michael.
Hi, Nikolay.
Your time to choose the topic. Tell us, what is it?
Yeah, so I chose to go along with the monthly blogging event. So Ryan Booz invited people to
blog on the topic of what their favorite Postgres 16 feature is. So this is the upcoming release.
Next week.
Yeah, we had a release candidate one last week or so.
Yeah.
Last week release candidate, next week release.
Yeah, normally it means release is coming
unless anybody finds a bug or two.
So thank you to everyone involved in that
and figured it would be a good time
to talk favorite features.
Right.
By the way, I just realized I'm interrupting you
and making the process of subtitle and transcript creation worse.
But this is a good moment to remind our audience
that, first of all, our podcast is published basically everywhere.
One listener, a very good Postgres DBA I've known for many years,
wrote to me that it would be a good idea to publish us on Spotify,
not only on YouTube.
I told him: postgres.fm, that's it.
You can see a subscribe button there which will show you many options:
Spotify, Apple Podcasts, Google Podcasts, Google Play, anything.
It's convenient.
And yesterday I saw it's also published on Amazon.
Interesting.
And of course, it's worth not only subscribing,
but leaving a review.
You know what to leave there.
So, and yes, we have good transcripts.
On postgres.fm, we have an outdated version.
I hope we will update them soon.
But on YouTube, we started publishing
improved versions of subtitles. It's a combined effort
of the OpenAI API, Whisper, GPT-4, and manual corrections. We
have a glossary, we have an improved process, and I also plan to apply
this process to some talks. So it's an interesting content
pipeline, which will probably produce content for people who want to read it later or something.
Okay, enough.
Postgres 16, where to start?
What's your favorite feature and how did you go about working on it?
My favorite feature, yes.
I thought about it.
I have two answers here.
My most favorite feature is that all features are open.
Nice.
Okay, yeah.
In the age we live in, when companies are choosing a non-open path:
Elastic and MongoDB first, and now we have HashiCorp.
It's not related to databases, but it's adjacent technology.
Many people provision infrastructure using Terraform.
And this morning I learned that ClickHouse decided not to open source
a new implementation of, I forget already, let me check,
I have it open: the SharedMergeTree table engine.
The new implementation will not be open in ClickHouse.
ClickHouse is open source, and we appreciate it.
It gained popularity by being a very good engine for analytical workloads.
It was open from the very beginning.
It was amazing.
But they have also started to choose the dark side.
So you mean almost like a meta feature that you're grateful and pleased?
Exactly. I wanted to say meta feature.
Yes, exactly. You caught it.
So all Postgres features are open still.
Let's appreciate it.
Yeah, I want to add to that.
I love not just that the code is open,
but also the discussion around the feature is open
on the mailing list and on the commit fest.
There are great commit messages.
Here we come to a dangerous area,
not discussing versions and technology,
because I have my personal
opinion, and I see that many things
are not open, unlike in many companies
which are more open than the Postgres project.
But it's a different discussion; let's discuss it
in another episode.
So other than the meta feature
that the code is all open
and the features are all open source,
what's your favorite actual feature?
Okay, I have two more answers.
One of them is purely technical.
The other one is that the feature we discussed,
it was my idea.
You know, I like poor man's solutions.
You know, like when you don't fully trust, or you're missing something in, your monitoring, you start using system performance tools, starting from
top, ps, iostat, iotop, anything. And if you can connect, of course; I'm not talking about RDS here. So similarly, if you need to update a billion rows
and you understand that it will take ages to explain to people how to do it properly,
but it's urgent, for example, next week
your integer primary key will be out of capacity,
you just use psql scripting and backslash watch. So I always wanted the
ability to specify the number of cycles I need. And I said this to Andrey, and we coded it on Postgres
TV. It was live. Andrey sent it to the hackers mailing list, and in Postgres 16 we will have this feature. Obviously,
I like it. It's a tiny feature, but not only was it committed to Postgres 16.
In Postgres 17, I already see a continuation of this idea,
and people are adding more tweaks,
more options to the backslash watch command in psql.
It means that it resonated.
I'm happy.
So just to go into the specifics:
in the past, you could \watch,
and it would basically run the same query,
to check on the progress of something, for example. And you could do it... Yeah, go on.
Let me, as usual, explain some things I learned over the years which are tricky and
not obvious. First of all, backslash watch is a replacement for the semicolon. You don't need a semicolon. By the way, even
Andrey learned this from me when we discussed this feature. I bet 95% don't realize you don't need
a semicolon. You say select blah, blah, blah, or update blah, blah, blah, and then backslash watch,
and it already works as a semicolon, and you don't need an extra execution. So immediately you start executing
in a loop. By default it will be two seconds of wait time between loops, not loop time but delay time,
two seconds, because the query also takes some time usually, right? It's not zero. And the only option backslash watch had in the past is this wait time.
You can adjust it.
In extreme cases, I remember putting something like .1.
.1 means 0.1.
A tenth of a second?
100 milliseconds.
So it's doing something very quickly.
If the connection is good, you can, for example, monitor pg_stat_activity.
You see data changing there almost live.
It's kind of cool.
If you understand the overhead of what you are doing, it is not high.
But now you can also specify the number of loops you want.
If you want to run something 10 times,
do it. For example, with simple math you can figure out: I want this to run for approximately one
hour; the query is against pg_stat_activity, it's fast; okay, we have 3,600 runs, one every second, and this will continue for one hour. The poor man inside me is happy.
I mean, I like poor man approaches a lot because they are very trustworthy. psql is the
most trustworthy client we have, and it's a simple solution. Right. In the past, so just to clarify, in
the past it would just run infinitely until you stopped it. Right. Yes, and I used a select with a CTE, and then if there was nothing to update, for example,
I just used division by the number of rows, which is zero, and it stopped with an error.
This is how I stopped.
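As a sketch of what this looks like in psql: the named count option is the v16 addition, and the CTE with division by zero is the old "poor man's" stop condition (the table and batching logic here are hypothetical):

```sql
-- v16: run the query every second, at most 3600 times (roughly one hour)
SELECT pid, state, query FROM pg_stat_activity \watch interval=1 count=3600

-- pre-16 trick: once the batch updates zero rows, 1 / count(*) raises
-- "division by zero" and the \watch loop stops with an error
WITH updated AS (
  UPDATE accounts SET migrated = true
  WHERE id IN (SELECT id FROM accounts WHERE NOT migrated LIMIT 1000)
  RETURNING 1
)
SELECT 1 / count(*) FROM updated \watch 1
```

Note that, as discussed above, no semicolon is needed: \watch itself terminates and executes the query buffer.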
And in Postgres 17, highlighting this new feature,
which is an additional option other people developed.
Sorry, I don't remember the names, but it's
not Andrey; I'm not involved at all. The additional option will say how many rows are
required to continue, something like minimum rows and so on. And you can say: if zero rows, stop.
Now officially, without division by zero, which is very good. Just a probably stupid
question: do you know why, if what you really wanted was, let's say, please do this for an hour,
instead of having to do the math, why not have a time frame in there? Well, now the, like,
Pandora's box is open, right? We can propose new options.
And my honest belief is that
psql itself needs improved extensibility.
You know, like the idea of plugins in vi,
in tmux, in Sublime, everywhere.
Yesterday, I tried to install something for Sublime
for my daughter.
It was insane.
I gave up.
So I usually use many plugins, until I don't, because if you switch computers, switch hosts, you
are in a new environment and it's default vi; no, Ctrl-A is not working anymore, you need to
go back to Ctrl-B as the prefix. So you use Ctrl-B. So I stopped using plugins
for tmux and vi, mostly. But still, plugins for psql is a good idea, in my opinion. But
it doesn't exist yet, right? But extensibility like this for backslash watch,
it's good, I think. Very small, but useful. So let me... Actually, the
technical feature I like the most is a different one. This one, I've been related to, so
that's why I wanted to talk about it. But among all the features, the one I like the most, you know, is in pg_stat_io,
the new system view. Yeah. No, no, not the view itself: the op_bytes column inside it. This is what...
Okay. op_bytes: operation bytes. And if you check it, it will always be 8192, so 8K,
8 KiB. So yeah, it will be filled with the same number always.
Okay.
It looks kind of silly, a little bit, right?
But why do I like it? Because it changes the mindset from buffers to bytes, finally.
Yeah, okay.
And I checked multiple articles talking about pg_stat_io,
and they mentioned: oh, you know, there's this strange column.
And they immediately started to say: if you have this number of buffer operations, writes or reads, this translates to this number of gigabytes or something, maybe bytes, and so on. I like that a lot, because I know how important it is to translate this for backend engineers,
who usually
need an
additional step in
their mind to understand what
buffers mean, and it's hard;
it's not simple arithmetic, and the numbers
get big quickly. So I'm
guessing you just multiply them:
in queries where you're looking into this
new view, you multiply them.
How many times has my bc in my terminal, bc -l, you know, like a calculator,
how many times has it received commands like some number multiplied by 8 and divided by 1024,
to get the number of mebibytes.
I already type it very quickly:
times 8, slash 1024.
Just to translate to mebibytes
and explain to other engineers:
this is a lot, you know,
or it's tiny, it's nothing.
So I think this strange column,
let me highlight it,
it's a very good thing to have.
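A sketch of the kind of query this enables, reporting bytes rather than buffer counts (column names as in the v16 pg_stat_io view):

```sql
-- translate buffer-operation counts into bytes using op_bytes
SELECT backend_type, object, context,
       reads   * op_bytes AS read_bytes,
       writes  * op_bytes AS write_bytes,
       extends * op_bytes AS extend_bytes
FROM pg_stat_io
WHERE COALESCE(reads, 0) > 0 OR COALESCE(writes, 0) > 0
ORDER BY writes * op_bytes DESC NULLS LAST;
```
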
Although ideally I would prefer to see derivatives,
like derived columns computed from the original block numbers,
additional computed columns.
It's called computed, or generated;
that exists in Postgres also, right?
Yeah, we have that feature.
Although here we have...
Well, this is a view as well.
It's a view, so yeah, it could be done on the fly.
Yeah.
It would be good to see byte numbers.
There are also some functions, old ones, not new in this release,
for showing bytes, for making them prettier, right?
So if you've got a number in megabytes...
I don't like it.
You don't like it?
It's old-fashioned.
It doesn't respect gibibytes versus gigabytes.
It has issues.
I don't like it.
So what do you do?
Do you show everything in megabytes then?
I usually write my own for this.
Okay.
If it's easy.
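A minimal sketch of the kind of home-grown formatter being described; the function is entirely hypothetical, just one way to show mebibytes explicitly instead of relying on pg_size_pretty:

```sql
-- convert raw bytes to mebibytes with an explicit binary-unit label
CREATE FUNCTION pretty_mib(bytes bigint) RETURNS text
LANGUAGE sql IMMUTABLE AS $$
  SELECT round(bytes / 1024.0 / 1024.0, 1) || ' MiB'
$$;

SELECT pretty_mib(123456789);  -- 117.7 MiB
```
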
Cool.
And I know Melanie Plageman
(I don't know if I'm pronouncing that correctly, as with most Postgres names),
I think she was involved, but I'm not sure who else was.
Yeah, I attended the talk she gave at PGCon.
It was one of the very few talks I attended. It was good. I liked it.
Playing with pgbench a lot to see how I/O is working.
By the way, this view itself is of course a big achievement,
because we always needed something to answer a question like: how much I/O is related to autovacuum
activity, for example? But I would like to point out that, of course, this view shows
high-level numbers. It's not actual disk I/O; it's operations with the buffer pool.
So some of those numbers are purely memory I/O, not disk I/O.
So we still need the old tricks: find the process IDs of autovacuum
workers and check /proc/<pid>/io; those numbers are still valid.
But it's still very good.
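The "old trick" referred to can be sketched like this: get the worker PIDs from Postgres, then read the kernel's per-process I/O counters outside Postgres (Linux-specific, and not available on managed services like RDS):

```sql
-- find autovacuum worker PIDs
SELECT pid, backend_type
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';

-- then, on the host:  cat /proc/<pid>/io
-- read_bytes / write_bytes there reflect actual disk I/O,
-- unlike pg_stat_io, which counts buffer-pool operations
```
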
You can perform analysis in a convenient way.
You can understand your I/O.
And I/O in a database is the most important thing.
It's the main source of latency, right?
More I/O, more latency.
We've talked about this a lot before.
pg_stat_io is global, not the very detailed per-query buffers.
Yeah, at the system level rather than query level, right?
Right, but from there you can go to pg_stat_statements and so on,
if you see backends have a lot of I/O,
writes or reads, and so on.
So I expect many pie charts, and maybe some additional colorful graphs in monitoring, which will show spikes of, for example, checkpointer writes, or backend-produced writes or reads, and so on.
It's good.
It also has additional things, like evictions, some extra things. So,
for example, in pg_stat_statements we don't have that. Here we have it. It's great.
I mean, it's a development of analysis ideas, right? Yeah. Good.
Nice. So yeah, on that note, these system views are really helpful, and it's great to see new ones
being added. Not related to this conversation, but I saw there was one for wait events coming;
well, at least it looks like it's been committed for version 17, which is really cool. But back
to 16: it feels like this is a really good episode to talk about our favorite
features, even if they're small. We'll probably do another one on the larger features. But on that note...
I can tell you about larger features very briefly; almost nothing. Well, the gap between
what Postgres delivers every year and what people expect continues to grow. This is my opinion. I thought
there were quite big changes
around logical replication.
Perfect.
Unfortunately, DDL is still under development,
logical replication of DDL,
but the ability to replicate from standbys
is a long-awaited feature, and also the parallel apply
of long-running transactions,
a very, very important feature.
I like them a lot.
Moreover, I need them a lot.
We need them.
Yeah.
Can't wait for DDL to happen.
It's a very good thing to have, but I'm talking about things like zero downtime.
A lot of them; even ALTER TABLE in a zero-downtime fashion would be the best. But
upgrades: people need to stop thinking about upgrades,
and many things, right? So let's talk about it another time. Yeah, let's talk about that. Yeah, I have
a small feature that's, it's probably just based on the bias of what I tend to come across,
but my favorite is also a tiny little one, which might even be considered a bug that has been fixed. In version 14,
EXPLAIN VERBOSE, so EXPLAIN with the parameter VERBOSE, would give you the
query identifier, which was added to pg_stat_statements as well. So this is a
really cool feature to uniquely identify a parameterized query, so two
queries that are looking for different IDs
would still give you the same query identifier.
To join data from various sources: from logs, from pg_stat_activity,
from pg_stat_statements, and to find examples.
Exactly.
But, and you brought up the important point,
logs, there was something that was missing
that nobody noticed for a couple of versions,
or at least nobody reported
and has been fixed in version 16.
Nobody created a patch, maybe.
Pardon me?
The main part is nobody created it.
Nobody proposed the patch, probably.
Nobody proposed it, or did the work.
Or proposed it, but it was not perfect in the beginning.
This is an idea that has lived many years, and the need has been
there many years, right? Yeah. So it wasn't there: if you used auto_explain with verbose (it has an
auto_explain.log_verbose setting), even in versions 14 and 15 you wouldn't get the query
identifier in the logs. And now, as of 16, you will. So I think this is
obviously tiny, but there are a couple of really cool things about it. One is, I think this empowers
monitoring tools, like we were talking about just now. For example, even without a
monitoring tool, if you are looking at pg_stat_statements, you find some top queries, maybe by I/O;
pg_stat_statements has buffer information in there as well.
And you want an example query plan, so you can look into it.
Yeah, super important.
Either an example query plan, or even just an example query,
example parameters.
auto_explain is now a really good way to get that, as of version 16.
It's as simple as searching the logs for the same identifier.
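As a sketch, the settings involved (parameter names as in the auto_explain and compute_query_id documentation; the thresholds are arbitrary examples), plus the pg_stat_statements side of the join:

```sql
-- settings needed for the query identifier to appear
-- in auto_explain output (v16):
LOAD 'auto_explain';
SET compute_query_id = on;
SET auto_explain.log_min_duration = '100ms';  -- example threshold
SET auto_explain.log_verbose = on;  -- v16: plan now includes "Query Identifier"

-- find top queries by buffer I/O, then search the logs for the same queryid
SELECT queryid, calls, mean_exec_time,
       shared_blks_read + shared_blks_written AS shared_blks_io
FROM pg_stat_statements
ORDER BY shared_blks_io DESC
LIMIT 10;
```
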
So that's really cool. Obviously it makes it easier for tools as well. Can you remind me,
at the same time, now it's possible to use EXPLAIN (GENERIC_PLAN) to get,
if you don't have examples, at least the default plan from the planner? It was also added
in Postgres 16, right? EXPLAIN (GENERIC_PLAN). Yeah, exactly. Now, this one I don't like, right? I
mean... I can see how it is useful if you cannot find examples,
or are using prepared statements, or if you need to know about generic plan behavior in general; it's a really cool
feature, but I don't see...
The people I work with don't
tend to need it.
I suspect it's because I'm not seeing
the right use cases.
The right use case: you develop something new,
a new feature or a completely new
product, and you don't have data yet.
But the generic plan will not be good in that case either, right?
I think it's more like if something...
I think it's more around prepared...
I don't understand it well enough, clearly.
But I saw people were excited about it,
like Laurenz from CYBERTEC.
He definitely blogged about it.
Lukas from pganalyze, Lukas Fittl,
also discussed it, I remember.
Yeah. They did it at around the same time,
in spring.
Makes sense.
This spring,
yeah.
We can link those up
for people that are interested in that.
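For reference, a sketch of the v16 syntax those posts cover: it lets you EXPLAIN a parameterized statement without supplying values (the table and columns here are hypothetical):

```sql
-- before v16, explaining this would fail with "there is no parameter $1"
EXPLAIN (GENERIC_PLAN)
SELECT * FROM orders WHERE customer_id = $1 AND created_at > $2;
```
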
But yeah,
in the performance work I've been doing
and seeing customers doing,
I've not seen a case where
adding that would really help.
It can be beneficial in automation of analysis, when you have a lot of queries
to analyze at least somehow, and you need to get at least something and
highlight, I don't know, low-hanging fruit: for example, a sequential scan on a large table.
This can be helpful. But it's a rough approach, not a fine-grained approach. Well, and you
don't know that it will be used, right? It's not real performance data; it's the generic plan.
Whereas what tends to be most helpful is looking at examples that have actually
been slow, and looking at the performance data there. And that's where I love auto_explain. It's
helpful. Sometimes a query is intermittently
slow, and that's actually quite difficult to look into; auto_explain helps because it captures the plan
when it was slow. So I prefer to look at it like...
I understand the argument is you can't always get the plan for something.
You might have constraints, or you might be in an environment
where that's not possible.
I'm trying to invent ideas where it can be useful.
Another idea is to understand how many tables and indexes
can be involved in execution,
so how many objects in the database we need to lock in shared mode,
and there is a threshold of 16.
When you reach it, the fast path is not possible anymore.
And if you have a very frequent query being executed,
for example, many thousands of times per second,
in this case you might have issues with the lock manager, and it hits
you, and there is no good solution. So I can imagine a generic plan could help you understand
the number of locks this query needs. And probably, if we find the second part of the
information, the expected QPS for this query, if we
understand it's a very frequent query and we see a lot of objects are involved, probably
it's time to optimize proactively, not waiting for this lock manager wait event popping up
in pg_stat_activity.
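The fast-path limit being described (16 relation locks per backend) can be observed in pg_locks; a sketch, run in the same session as the query of interest:

```sql
-- relation locks held by the current backend, and whether each used the fast path
SELECT relation::regclass, mode, fastpath
FROM pg_locks
WHERE pid = pg_backend_pid() AND locktype = 'relation';

-- the related wait event shows up in pg_stat_activity as
-- wait_event_type = 'LWLock', wait_event = 'LockManager'
```
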
But it's kind of, like, I'm not sure I'm talking about something a lot of people understand.
We should discuss it another time as well.
And it's not my favorite feature.
Okay.
Sounds good.
Yes, I almost stole the agenda here.
Let me not forget to mention very good work which is published with every release, every year. It's from
Noriyoshi Shinoda, Hewlett Packard Enterprise Japan. It's a big PDF. We have examples of
new features, and for every release they publish it, and this time is no exception. We can
see the Postgres 16 new features published already.
Let's link it as well.
Oh, wow.
Yeah, I haven't seen that.
I'd love to see that.
It's usually around beta one, beta two,
that they publish it.
And it's great;
it can be used as a reference, you know,
if you're curious about how some feature works,
you have a practical example, and it's great.
Yeah. Oh, look at that.
In fact, you mentioning Japan
reminded me of a couple of other things I really liked
about this little auto_explain feature:
it was a cross-continent, cross-company collaboration.
I know reviewers often have to be,
at least it's considered best practice
that they're not from the same company
as the person proposing the patch. But it's really cool to see people from Japan working with people from Europe, from Amazon,
from NTT Data; just lots of different people from different walks of life and different
companies collaborating on the same things. Really cool. And the other thing I liked about it was
the discussion. I mean, it's probably a reasonable question people might have wondered when I
mentioned that this won't work in versions 14 and 15: why not backpatch it to 14 and 15?
And I think that's a very reasonable question. And I'm not entirely sure it's a good idea not to,
but I love the reason why, which was: they didn't want to break any log parsing tools that people have built.
So not even anything within Postgres, but very much something that is part of the ecosystem.
Now, I think the most popular one is pgBadger, and I don't think that would break by adding them in.
If they talked to the maintainers of pgBadger, they might even want this feature backpatched to versions that people are probably more likely to be using in the near term.
But as an external tool provider myself, I love that the developers are considering our needs and what might help or hinder us.
So really cool to see that conversation, to see the reasons, to see somebody question it, and people answer it.
Good.
Anything else to add from your side?
No, I think I'm already out of quota, right?
We wanted one feature each; I talked about too many.
We'll be back with another Postgres 16 episode, I'm sure.
Okay, okay.
Maybe we'll do an overview of a wider picture,
and let's plan to discuss features it doesn't have.
Maybe that, or that, or yeah, maybe both.
I'd like to discuss what the important features are;
this was favorites.
Whilst I claim this is my favorite feature,
I would never claim it's the one I think is the most important,
so apologies in advance to all the great features
that didn't get a mention yet.
Well, yeah, there are many things,
for example the improvement of performance
of the COPY command,
3x faster, it's great,
and so on. Yeah, a lot of good stuff.
It's like a couple of hundred features or something,
isn't it? It's so many.
Yeah. Anyway,
thank you, Nikolay. Thank you, everybody. Catch you next week. It's time not to add features but to
remove some; autovacuum should be removed. Okay, I'm joking. Thank you, everyone. It's not
April the first, Nikolay. Yeah, okay, careful. Okay, thank you everyone indeed. See you next time. Take care. Bye.