Postgres FM - Our favourite v16 feature

Episode Date: September 8, 2023

Nikolay and Michael discuss their favourite feature each from the upcoming PostgreSQL 16 release. Here are some links to some things they mentioned:v16 draft release notes https://www.postgr...esql.org/docs/16/release-16.htmlPGSQL Phriday #012 invitation from Ryan Booz https://www.pgsqlphriday.com/2023/08/pgsql-phriday-012/ Subscribe options for the podcast https://postgres.fm/subscribeA recent closed source ClickHouse feature https://github.com/ClickHouse/ClickHouse/issues/44767#issuecomment-1683293218  Postgres TV hacking session with Andrey Borodin on \watch with limited number of loops (v16) https://www.youtube.com/watch?v=vTV8XhWf3mo Allow \watch queries to stop on minimum rows returned (v17) https://github.com/postgres/postgres/commit/f347ec76e2a227e5c5b5065cce7adad16d58d209 pg_stat_io commit mentioning the op_bytes column (v16) https://github.com/postgres/postgres/commit/a9c70b46dbe152e094f137f7e6ba9cd3a638ee25 pg_size_pretty function https://www.postgresql.org/docs/current/functions-admin.html#id-1.5.8.33.9.3.2.2.7.1.1.1 Visualizing Postgres I/O Performance (talk by Melanie Plageman at PGCon) https://www.youtube.com/watch?v=CxyPZHG5beI Our episode on BUFFERS https://postgres.fm/episodes/buffers-by-default EXPLAIN (GENERIC_PLAN) blog post by Laurenz Albe https://www.cybertec-postgresql.com/en/explain-generic-plan-postgresql-16/ Running EXPLAIN on any query (video by Lukas Fittl) https://www.youtube.com/watch?v=CMftYJnqou0 PostgreSQL 16 Beta 1 New Features with Examples.(English Version) by Noriyoshi Shinoda https://twitter.com/nori_shinoda/status/1664481483355226114 Have auto_explain's log_verbose mode honor the value of compute_query_id (commitfest entry) https://commitfest.postgresql.org/42/4136/ Make auto_explain print the query identifier in verbose mode (commit) https://github.com/postgres/postgres/commit/9d2d9728b8d546434aade4f9667a59666588edd6~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artwork 

Transcript
Discussion (0)
Starting point is 00:00:00 Hi everyone, this is PostgresFM episode number 62. This is Nikolai and Michael. Hi, Michael. Hi, Nikolai. Your time to choose the topic. Tell us, what is it? Yeah, so I chose to go along with the monthly blogging event. So Ryan Booz invited people to blog on the topic of what their favorite Postgres 16 feature is. So this is the upcoming release. Next week. Yeah, we had a release candidate one last week or so. Yeah.
Starting point is 00:00:29 Last week release candidate, next week release. Yeah, normally it means release is coming unless anybody finds a bug or two. So thank you to everyone involved in that and figured it would be a good time to talk favorite features. Right. By the way, I just realized I'm interrupting you
Starting point is 00:00:45 and making the process of subtitle transcript creation worse. But this is a good moment to remind our audience that first of all, we have our podcast published everywhere, basically. One listener, very good Postgres DBA I know many years, wrote me that it's a good idea to publish us on Spotify, not only on YouTube. I told like postgres.fm, that's it. You can see subscribe or something, the button, which will show you many options.
Starting point is 00:01:16 Spotify or Apple podcasts or Google podcast, everything, Google play, anything. It's convenient. And yesterday I saw even on Amazon, it's also published. Interesting. And of course, it's worth not only subscribing, but leaving a review. You know what to leave there.
Starting point is 00:01:35 So, and yes, we have good transcripts. On PostgresFM, we have outdated version. I hope we will update them soon. But on YouTube, we started publishing improved version of subscripts. It's combined effort from open AI API, Swiss parent GPT for and manual corrections. We have glossary we have improved process and I also plan to apply this process for some talks. So it's interesting content
Starting point is 00:02:01 pipeline, which probably will produce content for people who want to read it maybe later or something. Okay, enough. Postgres 16, where to start? What's your favorite feature and how did you go about working on it? My favorite feature, yes. I thought about it. I have two answers here. My most favorite feature is that all features are open.
Starting point is 00:02:26 Nice. Okay, yeah. In the age we live in, when companies are choosing not open path, Elastic and Mongo first, and now we have HashiCorp. It's not related to databases, but it's adjacent technology. Many people provision infrastructure using Terraform. And this morning I've learned that ClickHouse decided not to open source new implementation of some, I forget already, let me check,
Starting point is 00:02:53 I have it open, share merge tree table engine. New implementation will be not open in ClickHouse. ClickHouse is open source and we appreciate it. It gained popularity being very good engine for analytical workloads. It was open from the very beginning. It was amazing. But they also started to choose the dark side. So you mean almost like a meta feature that you're grateful and pleased?
Starting point is 00:03:21 Exactly. I wanted to say meta feature. Yes, exactly. You caught it. So all Postgres features are open still. Let's appreciate it. Yeah, I want to add to that. I love not just that the code is open, but also the discussion around the feature is open on the mailing list and on the commit fest.
Starting point is 00:03:44 There's great commit messages Here we come to some dangerous area not discussing versions and technology because I have my personal opinion and I see that many things are not open unlike many companies which are more open than Postgres project but it's a different discussion let's discuss it
Starting point is 00:04:00 in another episode So other than the meta feature that the code is all open and the features are all open source, what's your favorite actual feature? Okay, I have two more answers. One of them, purely technical. One of them is that the feature we discussed,
Starting point is 00:04:19 it was my idea. You know, I like poor man's solutions. You know, like when you don't trust fully or you're missing something in monitoring, you start using some system performance tools, like starting from top PS, Iostat, Iotop, anything. And if you can connect, of course, I'm not talking about RDS here. So similarly, if you need to update billion rows and you understand that it will take ages to explain how to do it properly to people. But it's like urgent, for example, next week, your integer for primary key will be out of capacity. So you just use PSQL script scripting and backslash watch. So I always select
Starting point is 00:05:09 ability to specify number of cycles I need. And I said this to Andrey and we coded it on Postgres TV. It was live. Andrey sent it to hackers and in Postgres 16 we will have this feature. Obviously, I like it. It's a tiny feature, but not only it was committed to Postgres 16. In Postgres 17, I already see continuation of this idea and people add more tweaks, more options to backslash watch command in PSQL. It means that it resonated. I'm happy.
Starting point is 00:05:40 So just to go into the specifics of what, so in the past you could watch and it would basically run the same query to check in the past, you could watch and it would basically run the same query to check on the progress, for example, of something. And you could do it. Yeah, go on. Let me as usual, I explained some things I learned over years, which are like tricky and not obvious. First of all, backslash watch is a replacement for semicolon. You don't need semicolon. By the way, even Andre learned this from me when we discussed this extension. I bet 95% don't realize you don't need semicolon. You say select, blah, blah, blah, or update, blah, blah, blah, and then backslash watch
Starting point is 00:06:19 and it already works as semicolon and you don't need extra execution so immediately you start executing in loop by default it will be two seconds of wait time between loops not loop time but delay time two seconds because also query takes some time usually right it's not zero and the only option BackslashWatch had in the past, it's this wait time. You can adjust it. In extreme cases, I remember putting something like.1. .1 means 0.1. Tenth of a second? 100 milliseconds.
Starting point is 00:07:00 So it's very quickly doing something. You can see if connection is good, you can, for example, monitor PgSet activity. You see data changing there almost live. It's kind of cool. If you understand the overhead of what you are doing, it is not high. But now you can also specify number of loops you want. If you want to run something 10 times do it for example you with simple math you can understand i want this to run approximately one
Starting point is 00:07:33 hour the query to be just activity it's fast okay we have 3600 every second for example and this will continue one hour poor man inside me is happy i mean poor man approaches i like them a lot because they are very trustworthy psql is the most trustworthy client we have and it's a simple solution right in the past so just to clarify in the past it would just run infinitely until you stopped it. Right. Yes, and I used select CTE, and then if nothing to update, for example, I just used division by number of rows, which is zero, and it stopped by error. This is how I stopped. And in Postgres 17, like highlighting this new feature, which is like additional option other people developed.
Starting point is 00:08:24 Sorry, I don't remember names but it's it's not andrea already no i'm not involved at all additional feature will say how many rows to require to continue if like something like minimal rows and so on and you can say if zero rows stop now officially without division by zero which is very good just probably a stupid question do you know why if what you really wanted was let's say please do this for an hour and instead of having to do the math why not have a time frame in there well now the like pandora box is open right we? We can propose new options. And my honest belief,
Starting point is 00:09:08 PSQL itself needs some extensibility improved. You know, like idea of plugins in VI, in Tmux, in Sublime, everywhere. Yesterday, I tried to install something for Sublime for my daughter. It was insane. I gave up. So I usually use many plugins until i don't because if you switch computer switch hosts and they were like you
Starting point is 00:09:33 are in new environment and it's default vi no control a is not working anymore you need to go back to control b as is control for control sequence? So you use control B. So I stopped using plugins for T-Max and VI mostly. But still, plugins for PSQL is a good idea, in my opinion. But it's not, it doesn't exist yet, right? But extensibility like this for a backslash watch, it's good, I think. Small, very small very small but useful so let me um actually technical feature i like the most is different one this this feature is uh i've been related to it so that's why i i want to talk about it but among all features i like the most you know pgstat.io the new system yeah no no no not it op bytes column inside it this is what
Starting point is 00:10:27 okay op bytes operation bytes and if you check it it will be always uh 8192 so 8k 8 8 kb so yeah it will be filled with the same number always. Okay. It looks kind of silly a little bit, right? But why I like it? Because it changes mindset from buffers to bytes, finally. Yeah, okay. And I checked multiple articles talking about Pidges.io, and they mentioned, oh, you know, there's this strange column.
Starting point is 00:11:06 And they immediately started to say, if you have this number of buffer operations, writes or reads, this translates to this number of gigabytes or something, maybe bytes and so on. I like a lot because I know how important it is to translate this for backend engineers who usually it's like additional step in mind needed to understand what buffers mean and it's hard it's not simple arithmetic and the numbers get big quickly so I'm
Starting point is 00:11:40 guessing you just multiply them in queries where you're looking into this new view you multiply them how queries where you're looking into this new view. You multiply them. How many times my BC in my terminal, BC-L, you know, like calculator. How many times it received comments like some number multiplied by 8 and divided by 1024 to understand number of MIBI bytes. It's like I already already type it very quickly. Times 8 slash 1024.
Starting point is 00:12:09 Just to translate to mebybites and explain to other engineers this is a lot, you know, or it's tiny, it's nothing. So I think this strange column, let me highlight it, it's a very good thing to have. Although it's not, like, ideally I would prefer seeing derivatives,
Starting point is 00:12:32 like derived columns from original block numbers, derived additional computed. It's called computed. Like, it exists in Postgres also, right? Yeah, we have that feature. Although here we have... Well, this is a view as well. It's a view.
Starting point is 00:12:44 Just, yeah, it could be on fly. Yeah. It would be good to see, right? Yeah, we have that feature. Although here we have... Well, this is a view as well. It's a view, just, yeah, it could be on fly. Yeah. It would be good to see, like, byte numbers. There's also some functions, old ones, not new in this release, but ones for showing bytes, like for making them prettier, right? So if you've got a number in megabytes... I don't like it. You don't like it?
Starting point is 00:13:03 It's old-fashioned. It does respect the gigabytes versus gigabytes. It has issues. I don't like it. So what do you do? Do you show everything in megabytes then? I usually write my own for this. Okay.
Starting point is 00:13:17 If it's easy. Cool. And I know Melanie Plageman, I don't know if I'm pronouncing that correctly, as with most Postgres names. I think they were involved, but I'm not sure who else was. Yeah, I attended the talk she gave at PgCon. It was one of very few talks I attended. It was good. I liked it. Playing with PgBench a lot to see how I.O. is working.
Starting point is 00:13:41 By the way, this view, of course, itself is a big achievement because we always needed something to answer a question. How much I.O. is related to auto vacuum activities, for example. But I would like to point that, of course, this view shows high-level numbers. It's not actual disk I.O. It's operations with buffer pool. So some of those numbers are purely memory IO, not disk IO. So we still need old tricks to find the process IDs of autovacuums, of workers, and checking proc process, IO numbers are still valid. But it's still very good.
Starting point is 00:14:28 You can perform analysis in a convenient way. You can understand your IO. And IO in database is the most important thing. It's the main source of latency, right? More IO, more latency. We've talked about this a lot before. PG-STAT IOM is global, very detailed buffers. Yeah, by system level rather than query level, right?
Starting point is 00:14:56 Right, but from there you can go to PG-STAT statements and so on. If you see backends have a lot of IOM writes or reads and so on. So I expect many pie charts and maybe some additional graphs in, well, maybe like colorful graphs in monitoring, which will show spikes of, for example, checkpoint or like backends produced writes or reads and so on. It's good. It also has additional things like evictions, like some extra things. So usually, for example, in PGSR statements, we don't have it. Here we have it. It's great. I mean, it's development of analysis ideas. Right? Yeah. Good.
Starting point is 00:15:39 Nice. So yeah, on that note, these system views are really helpful and it's great to see new ones being added not related to this conversation but i saw there was one for weight events coming well at least it looks like it's been committed to version 17 which is really cool but yeah back to 16 we probably like it feels like this is a really good episode to talk about our favorite features even if they're small we'll probably do another one on the larger features but so in on that note um i can tell you about larger features very in short like very short about larger features almost nothing well the gap between delivers every year and what people expect continues to grow this is my opinion i thought there were quite big changes
Starting point is 00:16:25 around logical replication. Perfect. Unfortunately, DDL is still under development, logical replication of DDL, but the ability to replicate from standbys, it's long awaited feature and also parallel application of long transactions, long running transactions, very, very important feature.
Starting point is 00:16:44 I like them a lot. Moreover, I need them important feature. I like them a lot. Moreover, I need them a lot. We need them. Yeah. Can't wait to deal to happen. It's a very good thing to have, but I'm talking about things like zero downtime. A lot of them, even alter table zero downtime fashion does the best, but upgrades people need to stop thinking about upgrades
Starting point is 00:17:05 and many things right so let's talk about another time yeah let's let's talk about that yeah i have a small feature that's like it's probably uh just based on the bias of what i tend to come across but my favorite is also a tiny little one which might be even considered a bug that has been fixed which in version 14 explain verbose so explain with the parameter verbose would give you the query identifier which was added to pgstat statements as well so this is a really cool feature to uniquely identify a query parameterised query so two queries that are looking for different IDs would still give you the same query identifier.
Starting point is 00:17:51 To join data from various sources, from logs, from PgStats activity, from PgStats statements, and find examples. Exactly. But, and you brought up the important point, logs, there was something that was missing that nobody noticed for a couple of versions, or at least nobody reported and has been fixed in version 16.
Starting point is 00:18:11 Nobody created, maybe. Pardon me? The main part is nobody created. Nobody proposed the patch, probably. Nobody proposed or did the work. Or proposed it, but it was not perfect in the beginning. This is an idea that lives many lives many many years and the need is there many years right yeah so it wasn't in if you used auto explain with verbose so it has like
Starting point is 00:18:36 auto explain dot log verbose setting even in versions 14 and 15 you wouldn't get the query identifier in the logs and now as of 16 16, you will. So I think this is obviously tiny, but there are a couple of really cool things about it. One is, I think this empowers monitoring tools, like we were talking about just now, to, if you, for example, even without a monitoring tool, if you are looking at PGStats statements, you find some top queries, maybe by IO. PGStats statements has buffer information in there as well. And you want an example query plan, so you can look into it. Yeah, super important.
Starting point is 00:19:14 Either an example query plan or even just an example query, example parameters. AutoExplain is now a really good way to get that as of version 16. It's as simple as searching the logs for the same identifier so that that's really cool um obviously it makes it easier for tools as well can you remind me at the same time now you you're it's possible to explain genetic plan to get uh like if you if you don't have examples at least you can get like default plan from the planner and it's also it was added in postgres 16 right explain generic plan yeah yeah exactly now this i don't like it right i i
Starting point is 00:19:53 see you i mean it's just not i can see how it is useful if you if you cannot find examples are using prepared statements or if you need to know about generic plan behavior in general, it's a really cool feature, but I don't see the people I work with don't tend to need it. I suspect it's because I'm not seeing the right use cases. The right use case, you develop something new,
Starting point is 00:20:18 new feature or completely new product, and you don't have data yet. But generic plan will be not good in this case, right? I think it's more like if something... I think it's more around prepared... I don't understand it well enough, clearly. But I saw people were excited about it, like Lawrence from CyberTech.
Starting point is 00:20:42 He definitely blogged about it. Lucas, a PGA journalist, Lucas Fidel, he definitely blogged about it. He might even. Lucas, a PG journalist, Lucas Fiddle also discussed it, I remember. Yeah. They did it at the same time, around like in spring.
Starting point is 00:20:52 Makes sense. This spring, yeah. We can link those up for people that are interested in that. But yeah, in the performance work I've been doing and seeing customers doing,
Starting point is 00:21:01 I've not seen a case where adding that would really help. it can be beneficial in automation of analysis when you have a lot of queries to analyze at least somehow and you need to get at least something and understand like highlight like I don't know like low hanging fruits like for example sequential scandal on large table and this can can be helpful but it's a rough approach not fine fine grained approach well and and you don't know that it will be using the like it's not real performance data right it's the generic plan whereas in you what you really what tends to be most helpful is looking at examples that have actually
Starting point is 00:21:46 been slow and looking at performance data there and that's where i love auto explain like it's helpful not even just so some sometimes it's intermittently slow right if a query is intermittently slow it's actually quite difficult to look into auto explain helps because it captures the plan when it was slow so i i prefer to look at it like... I understand the argument is you can't always get the plan for something. You might have constraints or you might be in an environment where that's not possible. I'm trying to invent ideas where it can be slow.
Starting point is 00:22:16 Another idea is to understand how many tables and indexes can be involved in execution. So how many objects and databases we need to log in shared mode, and there is like some threshold 16. When you reach it, fast path is not possible anymore. And if you have a very frequent query to be executed, for example, many thousands per second, in this case you might have issues with log manager and it hits
Starting point is 00:22:46 you and there is no good solution. So I can imagine a generic plan could help understand the number of logs this query needs. And probably this could be like, if we find the second part of information, expectations of QPS for this query, if we understand it's a very frequent query and we see a lot of objects are involved, probably it's time to optimize proactively, not waiting for this log manager wait event popping up in Pgset activity. But it's kind of quite like, I'm not sure I'm talking about something a lot of people understand. We should discuss it another time as well.
Starting point is 00:23:29 And it's not my favorite feature. Okay. Sounds good. Yes, I almost stole agenda here. Let me not to forget to mention very good work, which is published with every release every year it's from Nuri Yoshi Shinoda Hewlett Packard Enterprise Japan it's a big PDF yeah we have examples of new features for and for every release they publish it and this time is no exclusion we can see postgres 16 new features published already.
Starting point is 00:24:06 Let's link it as well. Oh, wow. Yeah, I haven't seen that. I'd love to see that. It's usually like beta one, beta two, around the time they publish it. And it's great to, like, it can be used like a reference, you know,
Starting point is 00:24:19 like if you're curious about how some feature is working, you have practical example and it's great. Yeah. Oh, look at that. In fact, you mentioned in Japan, it reminded me of a couple of other things I really liked about this little auto-explain feature, was that it was a cross-continent, cross-company collaboration. I know reviewers often have to be,
Starting point is 00:24:39 at least it's considered best practice that they're not from the same company as the person proposing the patch, but it's really cool to see people're not from the same company as the as the person proposing the patch but it's really cool to see people from japan working with people from europe from amazon from ntt data like just lots of different people from different walks of life and different companies collaborating on the same things really cool and the other thing i liked about it was the discussion i mean it's probably a reasonable question people might have wondered when i collaborating on the same thing is really cool. And the other thing I liked about it was the discussion. I mean, it's probably a reasonable question. People might have wondered when I
Starting point is 00:25:08 mentioned that this won't work in versions 14 and 15, why not backpatch it to 14 and 15? And I think that's a very reasonable question. And I'm not entirely sure it's a good idea not to, but I love the reason why, which was they didn't want to break any log parsing tools that people have built. So not even anything within Postgres, but very much something that is part of the ecosystem. Now, I think the most popular one is pgbadger, and I don't think that would break by adding them in. If they talk to the maintainers of pgbadger, they might even want this feature backpatched to versions that people are probably more likely to be using in the near term. But as an external tool provider myself, I love that the developers are considering our needs and what might help or hinder us. So really cool to see that conversation, to see the reasons, to see somebody question it, and people answer it.
Starting point is 00:26:06 Good. Anything else to add from your side? No, I think I'm already out of quota already, right? We wanted one feature I discussed, I talked about too many. We'll be back with the Postgres 16 episode, I'm sure. Okay, okay. Maybe we'll do an overview of a wider picture, and let's plan to discuss features it doesn't have.
Starting point is 00:26:33 Maybe that, or that, or yeah, maybe both. I'd like to discuss what the important features are. This is favorite. Whilst I claim this is my favorite feature, I would never claim this is the one I think is the most important so apologies in advance to all the great features that didn't get a mention yet well yeah yeah yeah there are many things
Starting point is 00:26:52 for example improvement of performance of copy command 3x faster it's great and so on yeah a lot of good stuff it's like a couple of hundred or something isn't it like it's so many yeah anyway thank you nicolai thank you everybody catch you next week it's time not to add features but to
Starting point is 00:27:13 remove some auto vacuum should be removed okay i'm joking uh thank you everyone it's not eight for the first nicolai yeah okay careful okay thank you everyone indeed see you next time take care bye

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.