Postgres FM - Postgres 16

Episode Date: September 22, 2023

Nikolay and Michael discuss the release of PostgreSQL 16 — the most important new features, what they mean for us as users, whether and when to upgrade, and more.

Here are some links to some extra things they mentioned:

Release notes: https://www.postgresql.org/docs/current/release-16.html
New Features With Examples (PDF from Noriyoshi Shinoda of Hewlett Packard Enterprise Japan): https://h50146.www5.hpe.com/products/software/oe/linux/mainstream/support/lcc/pdf/PostgreSQL16Beta1_New_Features_en_20230528_1.pdf
Why Upgrade? (site by depesz): https://why-upgrade.depesz.com/
Waiting for PostgreSQL 16 (blog post series by depesz): https://www.depesz.com/tag/pg16/
Our episode on favourite features: https://postgres.fm/episodes/our-favourite-v16-feature
Our episode on logical replication: https://postgres.fm/episodes/logical-replication
Active Active in Postgres 16 (blog post from Crunchy Data): https://www.crunchydata.com/blog/active-active-postgres-16
AlloyDB adaptive autovacuum: https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum
Visualizing Postgres I/O Performance (talk by Melanie Plageman at PGCon): https://www.youtube.com/watch?v=CxyPZHG5beI
Our monitoring checklist episode: https://postgres.fm/episodes/monitoring-checklist
pgvector: https://github.com/pgvector/pgvector

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM, episode number 64. This is Nikolay and Michael. Hi, Michael. Hello, Nikolay. And we are going to talk about Postgres 16 yet another time. The release happened last Thursday, a week ago, which is good. What do you think in general? I'm really happy, and I think it's always great to see another major version.
Starting point is 00:00:24 The team's really impressive actually, shipping every year consistently, pretty much exactly every 12 months even. It's very impressive, and lots of good things again. A couple of hundred features; we definitely can't list them all. I have seen and heard people's opinions that there weren't necessarily any huge features
Starting point is 00:00:43 in this one, but I personally really like that they are continuing to improve some of the things that Postgres is best at. Continuing to improve on reliability, with a few little features that help with that, continuing to improve on SQL standard compliance and on performance. Obviously my favorite topic, but it's often a big reason for choosing Postgres. So I don't want to discount that these are important changes, even if they aren't necessarily brand spanking new shiny features. Yeah, right, right.
Starting point is 00:01:19 I'm in the camp who says there are no breakthrough features in this release, obviously. There are several very good ones; I wish I already had all of them on my production systems. But in general, yes, it's like a convenience release. A lot of hard work, of course. You mentioned roughly 200 changes, and by the way, that's a normal number for recent years: every major release has roughly 200-something changes. But I would like to say that a release without breakthrough features
Starting point is 00:01:55 is also good, because it means it's more reliable. DBAs should love it, because you get a lot of improvements. And imagine if, for example, the threads idea had already been implemented in this release. I would wait probably until 16.10 or 16.20 before trying it on production. But as it is, the system is becoming mature in terms of the features it's already providing. They keep improving them, like logical replication. It was added many, many years ago. Six, right? Or how many years ago? In Postgres 10, right? I think.
Starting point is 00:02:41 Yes, wow, six years ago now. Right. And it was not super complete in the beginning, of course. And before that, logical decoding originally, right? And in the last few years, a lot of very good improvements. Again, there are still several things on the to-do list, and I hope 17 will have more things like the replication of DDL. But this release adds two important improvements for logical replication and decoding, and maybe more than two, right?
Starting point is 00:03:17 Yeah, definitely more than two. And it becomes more mature. It means that upgrading should be less risky, and we just benefit in terms of performance, features, and capabilities. Like logical replication from physical standbys. Super good, I think. So my point is: this is not a breakthrough release, but it should be adopted with less effort.
Starting point is 00:03:47 And sooner, I think. I wish the upgrade process were simpler; we discussed it many times. Of course, an upgrade is always a task, but if you already use, for example, Postgres 14 and you go to 16, or from 15 to 16, it's just a lot of good things to have. And the risks are not as high as they would be if it brought threads, or something like changing 50% of the source code.
Starting point is 00:04:23 So, yeah. I love seeing it that way. I hadn't considered it through that lens. But yeah, great point. Yeah, DBAs are usually conservative guys. They prefer to run reliable software. So I expect 16 to be very reliable, because it's just polishing things
Starting point is 00:04:41 and improving many, many small pieces. Yeah. Well, even developers are quite keen on their database "just working", in inverted commas. So I think a big selling point of Postgres is that it has been around for so long and has been run by smart, sensible people who are also risk averse.
Starting point is 00:05:07 So yeah, that's cool. And as you say, logical replication is continuing to improve. And two headline features, or two out of the seven or eight features that get called out at the top of the release notes, are logical replication related. Would you like to start with those? Yeah, let's discuss it. First of all, performance improvements for long transactions, right? They can now be applied in parallel on the subscriber, which is an improvement
Starting point is 00:05:35 that is transparent for you: if you use logical replication and you have long transactions, you just benefit from it. That's it. And this is good, because usually if we have a bottleneck on the subscriber side, as we discussed in the last episode, which was about logical replication... By the way, it's natural for us to start with logical, because we are still in that context. So if you have bottlenecks on the subscriber side, the usual trick is to use multiple slots and multiple publications and subscriptions. But that has downsides, as we discussed.
Starting point is 00:06:13 Foreign key violations are normal in that case. Or you can say you want foreign keys to be maintained all the time, but then you will have lags. So it's quite tricky. And if you use a single slot and a single subscription, obviously you don't have a lot of choice in how to speed it up. But now, for long transactions, it's already improved automatically, transparently for you. It's a good thing.
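A minimal sketch of the subscriber-side option being discussed, with made-up subscription, connection, and publication names: in 16, large in-progress transactions that are streamed to the subscriber can be applied by parallel workers.

```sql
-- Hypothetical names; run on the subscriber (Postgres 16).
CREATE SUBSCRIPTION app_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (streaming = parallel);  -- 'on' streams to a single apply worker; 'parallel' is new in 16
```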
Starting point is 00:06:40 Another thing is that you can logically replicate from physical standbys, which is a very, very good thing to have, because we usually say it's a risk to put a higher load on the primary. Of course, if it's just one WAL sender and you have many CPUs, it won't saturate your CPU or disk or anything. But there is another risk, running out of disk space on the primary, although it can be mitigated using a relatively new setting. I don't remember the name, but you can say you don't want to exceed some specific
Starting point is 00:07:20 threshold for your slots. So yeah, but still, I just don't want to mess with the primary in any way, and now I can move this to secondaries. It's very good for, I mean, logically replicating to other systems, such as Snowflake or ClickHouse or something, for analytical needs. Exactly, that's the use case I have heard for it. Somebody asked me if this was possible a couple of years ago, and I had to say unfortunately not, because they wanted to replicate to, I think it was actually Redshift, just for analytical purposes, and they didn't want to put that extra load on their primary. But they had a replica doing
Starting point is 00:08:00 nearly nothing, you know, serving some read queries. I think they had it set up asynchronously; they didn't mind if their analytics were out of date by however much their lag was. It didn't matter that much. So that feels like a really good use case for this. It's also related to how logical replication survives switchover or failover, but it's not that. I mean, I saw that work is still in progress, and we discussed that Patroni implements some workaround to survive failovers, but that's about the primary.
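A hedged sketch of what this enables, assuming wal_level = logical on the primary, the contrib test_decoding plugin, and a made-up slot name: in 16, a logical slot can be created and consumed on the hot standby itself.

```sql
-- Run on the physical standby (Postgres 16).
SELECT pg_create_logical_replication_slot('analytics_slot', 'test_decoding');

-- Consume changes decoded on the standby, keeping the load off the primary.
SELECT * FROM pg_logical_slot_get_changes('analytics_slot', NULL, NULL);
```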
Starting point is 00:08:39 You have logical slots on the primary; how not to lose them if a failover happens, that is the question. And I think here we also have a question: if this secondary is down, we don't want to start from scratch, right? Oh, two more features about logical replication I now remember. One is related to the ability to filter by origin, right? It's like multi-master is native now. Yeah, so, well, be careful. I saw a very excited blog post about this, saying multi-primary is now possible and showing how you can set up writes on two different nodes
Starting point is 00:09:17 synchronizing to each other, because with logical replication you can now filter... I'm forgetting the wording that they used for it. Yes, origin, perfect. Anything that came in on that node can get replicated, but anything that came in via replication, you don't send out again, if you were told it by a different node. But there's a ton of issues with that, aren't there? Without such a filter, you have an infinite loop risk, right? Like, you replicated something, it was replayed, written to WAL, and it got replicated back, and you have an infinite loop. So this feature allows you to break this loop and replicate only your local changes, not those which
Starting point is 00:10:01 were already replayed by you from a different origin, right? And do you think I was wrong in saying it's not a big release? Because imagine this could be marketed as: multi-master support is here. Why not? I don't think it's a good idea, and I know you're joking, or at least teasing me, but I think it's dangerous. I'm partially joking, yeah. Why is it dangerous? Because I think people would actually use it for that, and then they would end up having loads of issues around data synchronization.
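A hedged sketch of the origin filter being discussed, with hypothetical node and publication names: each node subscribes to the other and pulls only locally originated changes, which is what breaks the loop.

```sql
-- Run on node B, subscribing to node A (Postgres 16).
CREATE SUBSCRIPTION node_b_from_a
    CONNECTION 'host=node-a dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (origin = none);  -- skip changes that arrived at node A via replication; default is 'any'
```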
Starting point is 00:10:45 Basically, I think there was a good blog post by Crunchy Data who showed the most basic version: updating a row on one of them, leaving the transaction open just to make it really easy to demonstrate, updating the same row, but with a different update, on the second primary, then letting them both go through. And you end up, in fact, ironically, I guess, with each one getting the other one's change, but not its own. So yeah, I think there are
Starting point is 00:11:05 problems that people wouldn't anticipate. Can we say now that Postgres natively supports multi-master topologies? With an asterisk? That would be my argument. Right, okay. Well, I think multi-master is intentional split-brain, right? Okay, there you go.
Starting point is 00:11:56 And still we have the question of how fast it is to replay logically. If, for example, some update touched thousands of tuples, how much cheaper is it in terms of I/O and CPU on the subscriber side compared to the publisher side? It's actually not a trivial, but also not a super difficult benchmark to do. I wish I had checked it already. It's an interesting question, because I remember BDR claimed that replaying is cheaper. We discussed it, yes. And if it's indeed cheaper in some cases, then in some cases it's good to have such topologies, a multi-master approach with bidirectional logical replication. And if a customer asked you whether they should set up a multi-primary setup with Postgres 16 tomorrow, what would you say? Well, right now I will stick with my regular answer: you don't want multi-master,
Starting point is 00:12:38 and then we dive into details and explain why. But still, we can say Postgres already supports such topologies. Okay, okay, let's change the topic, though staying with logical. There is an interesting feature related to what we discussed: in the last episode I explained how to convert a physical replica to a logical replica. And interestingly, in Postgres 16 you can specify that in your logical replica you want to initialize tables using binary format. I missed that. I never tried it, but as I understand it, it's a similar thing.
Starting point is 00:13:22 I mean, bloat will be preserved. And it's faster; I mean, indexes are just copied as files, so you don't need to wait for your subscriber node to rebuild them. That's my understanding of this feature. Again, I haven't tried it. I tried a different recipe, involving recovery_target_lsn,
Starting point is 00:13:44 which we discussed last time. But it's interesting. I think I will use it in many cases, because it's now natively supported and I can just provision my logical replica preserving all the bloat. Though that's also the downside: you don't lose the bloat, so you kind of lose that benefit. But the good thing is that it's faster than the dump-restore approach, which is the default. And natively supported, right? It just feels like there are fewer moving parts on your side. Fewer things.
Starting point is 00:14:23 Just SQL. Just one SQL command and you have it. It's good.
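A minimal sketch of that one SQL command, with made-up names. This shows the option under discussion (binary format for a subscription, which in 16 also covers the initial table synchronization); note that the speakers themselves hedge on how it copies data under the hood.

```sql
-- Hypothetical names; run on the subscriber.
CREATE SUBSCRIPTION fast_init_sub
    CONNECTION 'host=primary dbname=app user=replicator'
    PUBLICATION app_pub
    WITH (binary = true);  -- in 16, binary applies to initial sync, not just the change stream
```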
Starting point is 00:14:54 So maybe enough logical for today. There's a good segue. Yeah, there's a good segue. You mentioned the parallelization of more join types: full joins and right outer joins, I think, right? I'm curious why right, not left. I don't see many right joins in the wild, but plenty of full joins. Or do you see more? I write them sometimes. Well... Oh, really? Why not? It depends on your point of view, you know. It's relative, right? Some languages have right-to-left writing, too. Some people write with the left hand, some with the right hand. It depends.
Starting point is 00:15:21 And sometimes, if you have a quite complex query, you might want to add one more table to the list of your source tables, and it might happen that a right join is more convenient for you. Of course, I agree, left is more popular because of the way we write queries. I write queries, why not? I write all the types of queries.
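A hedged sketch with hypothetical tables: 16 allows hash joins for full (and right) joins to be parallelized, where older versions had to fall back to a non-parallel plan for these join types.

```sql
-- Hypothetical tables; with enough rows and workers available, Postgres 16
-- can choose a Parallel Hash Full Join for a query shaped like this.
EXPLAIN
SELECT *
FROM   orders o
FULL JOIN invoices i ON i.order_id = o.id;
```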
Starting point is 00:15:56 Yeah, let's maybe move away from parallelization. Let's talk about performance. What else about performance? There's one more that's mentioned in the headline features, although I think it's possibly more on the reliability and maintenance front: improved performance of vacuum freezing. It's from Peter Geoghegan, about many fewer bytes written to WAL, right? Like a 5x improvement. Yeah. And it's part of different work, part of a reconsideration of how to perform freezing more proactively, I think. Yeah, exactly.
Starting point is 00:16:33 And to avoid... well, this is why I think it's more reliability related, because it's to avoid those heavy anti-wraparound vacuums that we've talked about several times, tripping you up on various things. So I think it's more about making vacuum more predictable and not as likely to hit us when we least expect it. So I think that's really cool. It's a performance improvement in a way, but the benefits are further reaching, I think. And then the other big performance one I did want to call out was... Sorry, I just...
Starting point is 00:17:08 I smiled a little bit because this morning I added to my to-read list the article from AlloyDB, Google Cloud. Oh, yeah? About how they implemented adaptive autovacuum, which is also interesting to read about. I mean, AI performing vacuum.
Starting point is 00:17:31 Many people dream about getting rid of vacuum completely, right? But let AI solve this problem. I don't know, I need to read about it. I really like the approach Peter's taking, and presumably others as well. I think he actually explained this quite well around the indexing changes he was making, but he seems to be doing the same around vacuum, which is trying to improve the efficiency of how it works and when it works. Sometimes doing a little bit more work upfront to save lots of work in the future. That, to me, seems so
Starting point is 00:18:10 smart, and it makes so much sense for that to be in the database. And it's not artificial intelligence, it's just a deterministic approach: if you can do a little bit of extra work now to save more expensive work in the future, maybe that's worth the overhead. Right. In Postgres 13 and 14 there was an optimization
Starting point is 00:18:34 related to B-tree, and the word "deduplication", as I understand, appears here again. This optimization of WAL writes from freezing, it's also an idea of deduplication. Am I right? I've seen that word as well.
Starting point is 00:18:49 I wish I understood it. I think maybe we have to have Peter on at some point to ask him the details. But yeah, I tried reading up on some of this stuff and it started hurting my head. Well, many very small local optimizations happened as well. I like the improvement of set_config performance. It's interesting.
Starting point is 00:19:13 If you change GUCs, Postgres parameters, a lot, maybe you have your own custom parameters, the ones with a dot: you can say blah-blah.blah-blah and assign some string to it. This became much faster, maybe 10x or more. Yeah, I saw your note about that.
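A minimal sketch with a hypothetical parameter name: custom, dotted GUCs can be set and read per session, and 16 made repeatedly changing parameters like this much cheaper.

```sql
-- 'app.current_tenant' is a made-up custom parameter.
SELECT set_config('app.current_tenant', '42', false);  -- false = session scope, not transaction-local
SELECT current_setting('app.current_tenant');          -- returns '42'
```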
Starting point is 00:19:38 Yeah, yeah. And also... Huh? Who doesn't think there are enough Postgres parameters already and creates their own extra ones? That's some brave people. I do it all the time. What do you create?
Starting point is 00:19:51 I mean, first of all, extensions do it. I don't get your joke, so I'm asking. Oh, yeah, yeah, yeah. Extensions make sense, for example auto_explain, you should know about it. Also, sometimes I put a lot of settings there. Sometimes I prefer putting application settings in a table, but sometimes I put them in as GUCs.
Starting point is 00:20:13 I say, like, app.something equals something. For example, if you use Postgres, it's natural for you to put something like that in as a GUC parameter. It makes some sense, and it's good to have several choices; you just choose what's more convenient, secure, efficient, and so on. Another very small but interesting optimization: the WAL receiver is smarter about waking up. If you have a server which is not doing anything, the WAL receiver will be doing much less work. And also promote_trigger_file was removed, to avoid some checks.
Starting point is 00:20:58 So it looks like... I read one article, I don't remember where, saying this idea of optimizing these two parts looks like an attempt to save energy. If you're not using some Postgres server, it should make fewer movements, right? Like, they're eco-friendly changes. It's funny.
Starting point is 00:21:24 Maybe we shouldn't be doing checkpoints every five minutes. Yeah, and also WAL creation: there's also a timeout after which a new WAL file is created, filled with zeros, right? So maybe that too... It's interesting. I remember a discussion about checkpoint_timeout. Oh, log_checkpoints. It became the default a couple of versions and years ago, right? And I remember discussion that it's bad
Starting point is 00:21:52 because it will produce records in the log, and if nobody is using the server, we are filling the log. Why? But it was done because the benefits of having log_checkpoints on by default are much bigger: we need that information. Speaking of which, in 16, as I remember, we can now see in checkpoint log messages the LSN positions of the checkpoint itself and of the redo starting point, which is sometimes good for troubleshooting. So there are a lot of small convenience things. I think it's a convenience release,
Starting point is 00:22:31 a reliable convenience release. This is my impression, which is not a bad thing at all. What else should we discuss? Well, a brand new thing is the pg_stat_io view that we've talked about before. Yeah. The author is Melanie Plageman. Yeah.
Starting point is 00:22:53 I have no idea if you're pronouncing it right, but yes. Right. I was at PGCon and was at her talk. It was a good talk, and congratulations; this is a very noticeable contribution, and people need it. I personally am going to use it all the time. And I hope observability tool developers will integrate this into their dashboards and so on, for example Netdata and so on.
Starting point is 00:23:21 This is a good thing to have. And it provides our favorite buffers numbers, but globally, for the whole database, right? Which we didn't have before. And timings. Yeah, timings and many details. It's good. You understand who is doing the work.
Starting point is 00:23:39 I/O work is the most expensive in databases. And in most cases, slow means a lot of work needs to be done, right? A lot of I/O work, because a database is an I/O-intensive application, right? So we are usually I/O bound.
Starting point is 00:23:58 And the main thing to understand about this new system view is that it's at the Postgres level, for the buffer pool; it's not about disk. It's about the buffer pool, so the abstraction level is high. It means that if you see big I/O numbers where it says reads, maybe those are reads into the buffer pool from the page cache, because under the buffer pool we have the
Starting point is 00:24:24 page cache. So maybe it's still all about memory; we don't know at this level. So "reads" are not necessarily disk reads: they are reads from the page cache into the buffer pool, or they could be either, but we don't know, right? Yeah, at this level we don't know, so we need to consult, for example, /proc/<pid>/io using our process IDs, or some different tool. But it's already very good; at this level it's super important to have, and useful. Also, it doesn't track all I/O. It tracks only I/O related to the buffer pool; it doesn't track, for example, WAL writes or reads. Exactly. But it does for autovacuum.
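A minimal sketch of reading the new view; the filter and ordering are just one way to slice it.

```sql
-- Postgres 16: which backend types do the most buffer-pool I/O?
-- (read_time/write_time are populated when track_io_timing is on.)
SELECT backend_type, object, context,
       reads, read_time, writes, write_time, extends
FROM   pg_stat_io
WHERE  coalesce(reads, 0) + coalesce(writes, 0) > 0
ORDER  BY coalesce(reads, 0) + coalesce(writes, 0) DESC;
```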
Starting point is 00:24:58 You mentioned you're going to use this all the time. What kind of thing do you imagine yourself using it for? So, if we have a problem, in any analysis we need to divide things into segments, whether it's about budgeting, performance optimization, anything. You need to perform some segmentation to understand where to dive in further, and this is a perfect starting point. So, okay, "the database is slow", some developers say. They like to say this: the database is slow. We look at this, and it's better to draw it as a
Starting point is 00:25:39 colorful graph with segments. Okay, this I/O is from backends, this I/O is from the checkpointer, this I/O is from autovacuum workers, and so on. And I quickly understand where most of the work is done. Is it about backends, or the checkpointer, or what? And then I can go down to the pg_statio views for tables and indexes, or to pg_stat_statements. I can go down to pg_stat_kcache if I have it; that would be the physical level for queries. Or I can additionally analyze wait events. This is, I mean, a perfect starting point for performance troubleshooting.
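A sketch of one such drill-down step, using the long-existing per-table I/O view:

```sql
-- Which tables cause the most actual block reads, i.e. misses of the
-- Postgres buffer pool (which may still be served by the OS page cache)?
SELECT relname, heap_blks_read, heap_blks_hit
FROM   pg_statio_user_tables
ORDER  BY heap_blks_read DESC
LIMIT  10;
```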
Starting point is 00:26:27 Makes sense. And also, I guess, you could rule out that it's an I/O problem. I know it probably will be one, but if there's nothing lighting up in your monitoring of this view, you can rule out that it's I/O related, or maybe it's WAL related. Yeah. We discussed monitoring and troubleshooting, and we have a kind of runbook, and also a list of what monitoring should have. Troubleshooting runbook number one means you have 30 seconds,
Starting point is 00:26:47 and you need to understand which directions to dig in. And I'm almost sure we will include pg_stat_io in the versions for Postgres 16 and newer. So, as a starting point for this quick and shallow analysis, as wide as you can and very shallow, this should be a part of the methodology, I guess, for performance troubleshooting. Yeah. And in terms of tooling, it was the pganalyze team, Lukas and one of the developers, who were involved in reviews of this, so I'm pretty sure they'll be adding it to pganalyze. I suspect it's already there. Yeah, very likely.
Starting point is 00:27:29 Okay. The next thing I had, and I think it's important not because I've had many people talking about it for 16, but because people talked about it so much when it was reverted from 15, was the SQL/JSON stuff: the SQL-standard constructors and IS JSON predicates. These were reverted from 15 because they weren't quite ready, and it was a big deal; lots of people said they were disappointed. And now they've been added, or at least most of them. So I was a bit surprised that they didn't make it into a lot of the various write-ups I've seen on PostgreSQL 16 already. So yeah, I think it's cool. And I think standard
Starting point is 00:28:11 compliance is important. I hear about it quite often when people say why they're moving to PostgreSQL, or why they're picking PostgreSQL in the first place. So I know this work can sometimes be thankless, so I just wanted to say I appreciate all that hard work.
Starting point is 00:28:36 Well, yeah, Postgres SQL in many cases is the de facto standard already, I mean, its syntax. And we know a couple of Postgres folks are members of the SQL standard committee, which is good. And of course, supporting the standard is always good. I work with JSON in a SQL context all the time. I wish the syntax were less bloated, and there are ideas for how to change it, but it would require a change of grammar,
Starting point is 00:28:57 a heavy change of grammar. But still, I agree with you: standard support is a good thing to have.
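A small, hedged illustration of the kind of SQL/JSON syntax that landed in 16 after the revert from 15:

```sql
-- SQL-standard constructors plus the IS JSON predicate.
SELECT json_object('name': 'postgres', 'version': 16) AS obj,
       json_array(1, 2, 3)                            AS arr,
       '{"ok": true}' IS JSON OBJECT                  AS is_obj;
```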
Starting point is 00:29:14 On the note of syntax, I think a breakthrough change is that now we don't need to specify an alias for subqueries in FROM. It hits everyone. Everyone who writes SQL sees this error at some point, for sure. A very helpful error. It does tell you exactly what you're doing wrong, but it doesn't... Yeah, it just says: add an alias. Yeah. And I think it's not standard, this
Starting point is 00:29:30 change. I suspect the standard requires you to have one, but I know other databases... I think this has come up quite a lot in migrations, migrations from other databases that don't force you to do this. And it must be
Starting point is 00:29:45 quite annoying. This is exactly my point. Here we probably deviate from the standard, but for the sake of easier migrations it's a good thing to have. And I personally am going to stop writing those aliases. Closing parenthesis, space, underscore: I already have a habit of writing this, but now I can drop it. Well, not now; in a couple of years, when production systems are on 16, I'm going to drop this habit. Yeah, which is good, more convenient.
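A tiny illustration of the change:

```sql
-- Works in 16; in 15 and older this raises:
-- ERROR:  subquery in FROM must have an alias
SELECT count(*) FROM (SELECT 1);
```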
Starting point is 00:30:38 Yeah. Slightly back to performance: I'm one of the guys who likes to use newer (well, not new anymore) SQL syntax. For example, you can ORDER BY inside an aggregate function, which is interesting. Sometimes, as a DBA, for example, you select from pg_stat_activity and you group somehow, for example by state, right? Idle, active, idle-in-transaction. And you want a few examples of queries. You cut them using the left() function, for example, taking only the first 10 characters. And then you aggregate using string_agg or array_agg, and then you think: okay, I don't want arbitrary examples, I want the longest-running queries, for
Starting point is 00:31:16 example. And inside string_agg or array_agg, right inside there, you write ORDER BY query_start or xact_start, and a limit can be applied separately, using array indexes. But you order inside the aggregation; you can specify the order. So when I used it, I always understood that it was not going to benefit from any indexes, it would all happen in memory, so it's not a very scalable approach. I mean, performance is not good; if I have a billion rows, I probably won't do it. But for pg_stat_activity, it's fine. So now the
Starting point is 00:31:57 planner can use indexes to support this ORDER BY inside aggregate functions, which is good. And also DISTINCT inside: you can write DISTINCT inside it too, like string_agg(DISTINCT ...). This support was always there, but it was not benefiting from indexes; now it can. Yeah, these planner improvements get quite into the detail, but all of these things add up, right? It's very cool that it can do that. Yeah. What else? There are 150 more changes, right? Yeah, right.
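A hedged sketch of the pattern described above, against pg_stat_activity:

```sql
-- ORDER BY (and optionally DISTINCT) inside an aggregate; on ordinary
-- tables, the 16 planner can satisfy this ordering with an index
-- instead of an explicit sort.
SELECT state,
       string_agg(left(query, 10), ', ' ORDER BY query_start) AS examples
FROM   pg_stat_activity
WHERE  query <> ''
GROUP  BY state;
```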
Starting point is 00:32:36 Which we don't have time to cover. Yeah. What about upgrades, just generally? You mentioned updating production in a couple of years. I see some people upgrading every year, or at least most years, but they don't seem to be in the majority; every couple of years does seem to be really popular. I know there is overhead to doing major version upgrades; I think that's the reason. Is there anything else you wanted to add there? About upgrades? That's a huge topic. No, no, no. About
Starting point is 00:33:10 whether people should upgrade. How often should they upgrade? Is it okay to still have a Postgres 12 instance lying around, just keeping it up to date? So for minor upgrades, you should usually run the newest, in most cases.
Starting point is 00:33:26 But again, an enterprise approach should involve proper testing in lower environments. For major versions, it depends on each case, but in general the old-school recommendation is to wait until, for example, 16.2. A couple or three minor versions in, it will already be the third minor release, .0, .1, .2, right? And then upgrade, if you have some kind of critical system. But here, again, my point
Starting point is 00:33:56 is that with 16 we probably have a more stable release, because it's not as if half of the source code was rewritten. Although how Postgres is built was changed drastically: moving to Meson, a new build system. Oh, yeah. The same system for all operating systems.
Starting point is 00:34:14 The same approach is now used for Linux, FreeBSD, macOS, Windows. So it's a newer approach, and a lot of legacy was removed. But it's behind the curtain for users, right? You don't see this, because you just take the binaries. It's good for the Postgres developers and for people testing patches. Yeah, a more unified and more modern approach
Starting point is 00:34:43 to build Postgres, right? Okay. Actually, one thing I really did want to say quickly, and I'm surprised we haven't had more requests for this, is that I think probably the biggest thing that's happened to Postgres in the past year is outside of Postgres itself, which is, I think, the pgvector extension. I suspect we're going to do a whole episode on it at some point.
Starting point is 00:35:03 Yes, we should. But it's cool, because of extensibility, that these massive changes can happen outside of the major version release cycle. I have a different opinion here. I think pgvector should be part of Postgres as soon as possible, but obviously that would mean releases happening only once per year, which for pgvector currently would be quite slow, because the pace of development is super high
Starting point is 00:35:32 right now. But if it continues to be a third-party extension, it's also losing some attention. If you compare, for example, JSON was in core right away. Yeah. And this was good. It's kind of why I wanted to bring it up, because it could easily have been something that was decided to be put into Postgres, but I imagine it would have taken an extra couple of years. And maybe it still will; maybe that is what will happen. But it's really cool that it can be out there in the wild, getting quick releases that aren't tied to the Postgres major version cycle. And if it is in all the major cloud providers already,
Starting point is 00:36:17 it kind of is in Postgres. It is, in a way, because anybody who self-hosts can install it, and anybody on cloud platforms can. I don't agree here, because if you check, the versions may be very old, for example. They upgrade slowly. Ah, good point. Also, some providers still don't support it.
Starting point is 00:36:40 And also, if it goes into core, there are two paths: it can be a contrib module, an officially supported extension, which is already good, or it can go directly into core. And, you know, I imagine a lot of discussions where people think about what to choose: Postgres, or a specific vector database to store vectors, or, I don't know, Elastic or others. They all support vectors, either via third-party extensions or in core. I think support in core can be an argument in such choices.
Starting point is 00:37:13 It's a very high-level argument, not diving into details, but some people might think: oh, if it's in core, it's better supported. And you can try to convince them otherwise: no, no, RDS adds support, Cloud SQL adds support. By the way, Cloud SQL did add it, right? There was a question recently. It did, but I think not the latest version, not 0.5.0, which is an important release. So you make a good point: they might support it,
Starting point is 00:37:45 Without HNSW indexes, right? Yeah. Which everyone wants right now, almost everyone. Yeah. So, by the way, about pgvector: someone made a very, very good point on my Twitter when I was discussing it,
Starting point is 00:37:59 saying that, you know, with HNSW indexes it breaks the rule that indexes should not change what is returned by a query. But this approximate nearest neighbor search will produce different results: without the index you have one result, with the index you have a different result.
Starting point is 00:38:16 And this is probably something Postgres never had, I mean, never had in core; extensions can do it. Yeah, I have heard that that is a reason for not including it. But yeah, we'll see, I guess. I think it should be in core, because Postgres is good at JSON, so it should be good at vectors, obviously. But yeah, I understand some obstacles.
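A hedged sketch assuming the pgvector extension (0.5.0 or newer for HNSW) is installed, with a made-up table: as discussed, the approximate index can return a different result set than an exact scan.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);

-- Approximate nearest neighbors by L2 distance; ordering by the <->
-- operator is what lets the HNSW index kick in.
SELECT id FROM items ORDER BY embedding <-> '[1,2,3]' LIMIT 5;
```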
Starting point is 00:38:43 Anyway, sorry for the detour, but I thought that was worth adding. It's a good topic, I think, for the future. It's very interesting and very attractive for many. A very attractive topic. Okay, thank you for the interesting discussion, as usual. And let's remind our listeners that we need feedback, whether in the form of subscriptions, likes, or maybe comments and suggestions for what to talk about. And we continue working on subtitles; new episodes have them, and we're
Starting point is 00:39:20 improving, so feel free to use them. And please also give us feedback, on Twitter or on YouTube or anywhere else. Thank you. Thank you. Bye.
