Postgres FM - Postgres 16
Episode Date: September 22, 2023

Nikolay and Michael discuss the release of PostgreSQL 16 — the most important new features, what they mean for us as users, whether and when to upgrade, and more.

Here are some links to some extra things they mentioned:
Release notes https://www.postgresql.org/docs/current/release-16.html
New Features With Examples (PDF from Noriyoshi Shinoda of Hewlett Packard Enterprise Japan) https://h50146.www5.hpe.com/products/software/oe/linux/mainstream/support/lcc/pdf/PostgreSQL16Beta1_New_Features_en_20230528_1.pdf
Why Upgrade? (site by depesz) https://why-upgrade.depesz.com/
Waiting for PostgreSQL 16 (blog post series from depesz) https://www.depesz.com/tag/pg16/
Our episode on favourite features https://postgres.fm/episodes/our-favourite-v16-feature
Our episode on logical replication https://postgres.fm/episodes/logical-replication
Active Active in Postgres 16 (blog post from Crunchy Data) https://www.crunchydata.com/blog/active-active-postgres-16
AlloyDB adaptive autovacuum https://cloud.google.com/blog/products/databases/alloydb-for-postgresql-under-the-hood-adaptive-autovacuum
Visualizing Postgres I/O Performance (talk by Melanie Plageman at PGCon) https://www.youtube.com/watch?v=CxyPZHG5beI
Our monitoring checklist episode https://postgres.fm/episodes/monitoring-checklist
pgvector https://github.com/pgvector/pgvector

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork
Transcript
Hello, hello, this is PostgresFM episode number 64.
This is Nikolai and Michael.
Hi, Michael.
Hello, Nikolai.
And we are going to talk about Postgres 16 yet another time.
The release happened last Thursday, which is good, a week ago.
What do you think in general?
I'm really happy and I think it's always great to see another major version.
The team's really impressive actually,
shipping every year consistently,
pretty much exactly every 12 months even.
It's very impressive and lots of good things again.
A couple of hundred features,
we definitely can't list them all.
I have seen and heard people's opinions
that there weren't necessarily any huge features
in this one,
but I personally really like that they are continuing to improve some of the things that Postgres is best at. Continuing to improve on the
reliability, a few little features that help with that, continuing to improve on
SQL standard compliance and on performance. Obviously my favorite topic
but it's often a big reason for choosing Postgres.
So I don't want to discount that these are important changes, even if they aren't necessarily
brand spanking new shiny features.
Yeah, right, right.
I'm in the camp that says there are no breakthrough features in this release, obviously.
There are several very good ones.
I wish I already had all of them on my production systems.
But in general, yes, it's like a convenience release.
A lot of hard work, of course.
You mentioned like 200 changes.
And by the way, that's a normal number for recent years. Every major release has roughly 200-something changes.
But I would like to say that without breakthrough features,
it's also good, because it means it's more reliable.
DBAs should love it because you have a lot of improvements.
And imagine if, for example, threads had already been implemented in this release.
I would probably wait until 16.10 or .20 before trying it in production.
But with this, the system is becoming mature in terms of features it's already providing.
They are improving, improving, like logical replication.
It was added many, many years ago. Six, right? Or how many years ago?
In Postgres 10, right? I think.
Yes, wow, six years ago now. Right. And it was not super complete in the beginning, of course.
And then logical decoding originally, right?
And now, last few years, a lot of very good improvements.
Again, there are still several things in the to-do list.
And I hope 17 will have more things, like replication of DDL.
But this release adds two important improvements
for logical replication and decoding,
maybe more than two, right?
Yeah, definitely more than two.
And it becomes more mature.
It means that upgrading should be less risky.
And we just benefit in terms of performance and features and capabilities.
So like replication from physical standbys.
Super good, I think.
So my point is, this is not breakthrough release.
But it should be adopted with less effort.
And sooner, I think.
I wish the upgrade process was simple.
We discussed it many times.
Of course, an upgrade is always a task, but if you already use, for example, Postgres 14 and you go to 16, or from 15 to 16, it's just a lot of good things to have.
And risks are not super high,
as they would be if it brought threads
or something like changing
50% of the source code.
So, yeah.
I love seeing it that way.
I hadn't considered it through that lens.
But yeah, great point.
Yeah, DBAs are conservative guys, usually.
They prefer some reliable software to run.
So I expect 16 should be very reliable
because it's just polishing things
and improving many, many small pieces.
Yeah.
Well, even developers are quite keen on their database
just working in inverted commas.
So I think there is a big selling point of Postgres
that has been around for so long.
It has been run by sensible people, smart, sensible people
who are also risk averse.
So yeah, that's cool.
And as you say, logical replication is continuing to improve.
And two headline features, or two out of the seven or eight features that get called out
at the top of the release notes are logical replication related.
Would you like to start with those?
Yeah, let's discuss it.
First of all, performance improvements for long transactions, right? So
they can be applied in parallel mode on the subscriber, which is an improvement
that is transparent for you. If you use logical replication and you have longer transactions,
you just benefit from it. That's it. And this is good, because usually, if we have a bottleneck on the subscriber side,
as we discussed in the last episode, which was about logical replication.
By the way, it's natural for us to start with logical,
because we are still in that context.
So if you have bottlenecks on the subscriber side,
the usual trick is to use multiple slots and multiple subscriptions, publication subscriptions.
But it has downsides, as we discussed.
Foreign key violations are normal for this case.
Or you need to say, I want foreign keys to be maintained all the time, but then you will have lags.
So it's quite tricky.
And if you use a single slot, single subscription,
obviously you don't have a lot of choice how to speed it up.
But now for long transactions, it's already improved automatically,
transparently for you.
It's a good thing.
Another thing is that you can logically replicate from physical standbys,
which is a very, very good thing to have,
because we usually say it's a risk to have a higher load on the primary.
Of course, if it's just one WAL sender and you have many CPUs,
it won't saturate your CPU or disk or something.
But there is another risk, to run out of disk space on
the primary, but it can be mitigated using a relatively new setting.
I don't remember the name, but you can say: I don't want to exceed some specific
threshold for my slots.
So, yeah, but still, I just don't want to mess with the
primary in any way, and now I can move this to secondaries. It's very good for, I mean, logically
replicating to other systems, such as Snowflake or ClickHouse or something,
for analytical needs.
Exactly, that's the use case I have heard for it. Somebody
asked me if this was possible a couple of years ago, and I had to say unfortunately not, because
they wanted to replicate to, I think it was actually Redshift, just for analytical purposes,
and they didn't want to put that extra load on their primary. But they had a replica doing
nearly nothing, you know, serving some read queries. And I think they had it set up asynchronously.
They didn't mind if their analytics were out of date by however much their lag was; it didn't matter that much. So that feels like a really good use case for this.
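A minimal sketch of that setup, with made-up names; the publication is still created on the primary (a standby can't run DDL, and the primary needs wal_level = logical), but the subscription's connection string points at the standby:

    -- on the primary (hypothetical tables):
    CREATE PUBLICATION analytics_pub FOR TABLE orders, customers;

    -- on the subscriber, connect to the standby instead of the primary:
    CREATE SUBSCRIPTION analytics_sub
      CONNECTION 'host=standby.example.com dbname=shop user=replicator'
      PUBLICATION analytics_pub;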
It's also related to how logical replication survives switchover or failover, but it's not that.
I mean, I saw this work is still in progress,
and we discussed that Patroni implements some workaround
to survive failovers, but that's about the primary.
So you have logical slots on primary,
how not to lose them if failover happens.
This is the question.
I think here we also have a question: if this
secondary is down, we don't want to start from scratch, right? Oh, two more features about logical
replication I now remember. One is related to the ability to filter by origin, right? It's like multi-master is native now.
Yeah, so this, well, be careful.
I saw a very excited blog post about this, saying
that multi-primary is now possible
and showing you how you can set up writes from two different nodes
synchronizing to each other. Because with logical replication,
you can now filter any changes that came in,
I'm forgetting the wording that they used for it,
but any that came in on that node can get replicated.
Yes, origin, perfect. But any change that came in via replication, you don't send out again, so if you were told it by a different node. But there's a ton of issues with that, aren't there?
Without this feature, you have an infinite loop risk, right? So, like, you replicated something,
it was replayed, written to WAL, and it got replicated back, and you have an infinite loop.
So this feature allows you to break this loop and replicate only your local changes, not those which
were already replayed by you from a different origin.
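A minimal sketch of that knob, with made-up names; the subscription option is origin, and setting it to none skips changes that arrived at the publisher via replication:

    CREATE SUBSCRIPTION node_b_sub
      CONNECTION 'host=node-a dbname=app'
      PUBLICATION app_pub
      WITH (origin = none);  -- default is 'any'; 'none' is what breaks the loop in bidirectional setups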
And do you think I was wrong saying it's not a big release? Because imagine, this could be named as, like,
'multi-master support', and it's here. Why not?
I don't think it's a good idea, and I know you're
joking, or at least teasing me, but I think it's dangerous.
I'm partially joking. Why is it dangerous?
Because I think people would actually use it for that, and then they would end up having
loads of issues around data synchronization. Basically, I think there was a good blog
post by Crunchy Data, who just showed the most basic version of updating a row on one of them,
leaving the transaction open,
just to make it really easy to demonstrate,
updating the same row,
but with a different update on the second primary,
then letting them both go through,
and you end up with, in fact, ironically, I guess,
each one getting the other one's change,
but not their own.
So yeah, I think there are
problems that people wouldn't anticipate.
Can we say now that Postgres natively supports multi-master topologies?
With an asterisk? That would be my argument.
Right, okay. Well, I think multi-master is intentional split-brain, right?
Okay, there you go.
And still we have the question of how fast it is to replay logically. If, for example, some
update changed thousands of tuples, how much cheaper is it in terms of I/O, and CPU as well, on the subscriber side
compared to the publisher side? It's actually not a trivial, but also
not a super difficult, benchmark to do. I wish I had checked it already. It's an interesting question. Because I remember BDR claimed that replaying is cheaper.
We discussed it, yes.
And if it's indeed in some cases cheaper,
then in some cases it's good to have such topologies
and multi-master approach, bidirectional logical replication.
And if a customer asked you if they should set up a multi-primary setup with Postgres
16 tomorrow, what would you say?
Well, right now I will stick with my regular answer, you don't want multi-master,
and then we dive into details and explain why.
But still, we can say Postgres already supports such topologies.
Okay, okay, let's change the topic. Also logical: there is an interesting feature,
and it's related to what we discussed. In the last episode I explained
how to convert a physical replica to a logical replica.
And interestingly, in Postgres 16, you can specify that in your logical replica you want to initialize tables using binary format.
I missed that.
I never tried it, but as I understand it, it's a similar thing.
I mean, bloat will be preserved. And it's faster.
I mean, indexes are just copied as files.
You don't need to wait for your subscriber node
to rebuild them.
So this is my understanding of this feature.
Again, I haven't tried it.
I tried a different recipe involving recovery_target_lsn,
which we discussed last time.
But it's interesting. I think I will use it in many cases because it's now natively supported
and I can just provision my logical replica preserving all the bloat. But the downside is that you don't lose the bloat; you're kind of losing that benefit.
But the good thing is that it's faster than the dump/restore approach, which is the default.
And natively supported, right?
It just feels like there are
fewer moving parts on your side.
Fewer things.
Just SQL.
Just one SQL command and you have it.
It's good.
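A sketch of the option being discussed, with made-up names; in 16 the binary setting also applies to the initial table synchronization, not only the ongoing apply:

    CREATE SUBSCRIPTION logical_copy_sub
      CONNECTION 'host=source.example.com dbname=app'
      PUBLICATION all_tables_pub
      WITH (binary = true);  -- initial COPY of the tables is done in binary format in 16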
So maybe enough logical today.
There's a good segue.
Yeah, there's a good segue.
You mentioned the parallelization of more join types, full
joins and right outer joins, I think, right? I'm curious why right, not left. I don't see many
right joins in the wild, but plenty of, like, full joins. Or do you see more?
I write them sometimes. Well...
Oh, really? Why not?
It depends on your point of view, you know.
It's relative, right?
Some languages have right-to-left writing also.
Some people
write with left hand, some people write
with right hand. It depends.
And sometimes, if you have
a quite complex query, you
might want to add one more table to the list of your source tables, and it might happen,
right, that a right join is more convenient for you. Of course, I agree, left is more popular because of the way we write queries.
I write queries, why not?
I write all the types of queries.
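For reference, the shape of query that benefits; a hypothetical example with made-up tables, where in 16 the planner can consider a parallel hash join for FULL and RIGHT joins like this:

    SELECT c.id, o.total
    FROM customers c
    FULL OUTER JOIN orders o ON o.customer_id = c.id;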
Yeah, let's maybe move away from parallelization.
Let's talk about performance.
What else about performance?
There's one more that's mentioned in the headline features,
although I think it's possibly more on the kind of reliability and maintenance front, but improved performance
of vacuum freezing.
It's from Peter Geoghegan, about many fewer bytes written to WAL,
right? Like a 5x improvement. Yeah. And it's a part of
different work. It's part of a reconsideration of how to perform freezing more proactively, I think.
Yeah, exactly.
And to avoid, well, so this is why I think it's more reliability related,
because it's to avoid those heavy anti-wraparound vacuums
that we've talked about several times, tripping you up on various things. So
I think it's more around making vacuum more predictable and not as likely to, you
know, hit us when we least expect it. So I think that's really cool. It's a performance improvement
in a way, but the benefits are further reaching, I think. And then the other big performance
one I did want to call out was...
Sorry, I just...
I smiled a little bit
because this morning I added to my to-read list
the article
from AlloyDB, Google Cloud.
Oh, yeah?
About how they implemented adaptive auto-vacuum,
which is also interesting to read about.
I mean, AI performing vacuum.
Many people dream about getting rid of vacuum completely, right?
But let AI solve this problem.
I don't know.
I need to read about it.
I really like the approach Peter's
taking, and presumably others as well, but I think he actually explained this quite well
around the indexing changes he was making, and he seems to be doing the same around vacuum,
which is trying to improve the efficiency of how it works and when it works. So sometimes doing a little bit more work upfront to save lots of work in the future, that to me seems so
smart, and it makes so much sense for that to be in the database.
And it's not artificial intelligence. It's just using a
deterministic approach:
if you can do a little bit of extra work now to save
more expensive work in the future,
maybe that's worth the overhead.
Right. In Postgres 13
and 14, there was the optimization
related to B-tree.
And the word deduplication, as I understand,
appears here again.
This optimization for WAL writes
from freezing,
it's also an idea of deduplication.
Am I right?
I've seen that word as well.
I wish I understood it.
I think maybe we have to have Peter on at some point
to ask him the details.
But yeah, I tried reading up on some of this stuff
and it started hurting my head.
Well, many very small local optimizations happened as well.
I like the idea of improving set_config performance.
It's interesting.
If you change GUCs,
Postgres parameters, a lot,
maybe you have your own custom Postgres parameters,
with a dot in the name.
You can say something.something
and assign some string to it.
So this became much faster, maybe like 10x or more.
Yeah, I saw your note about that.
Yeah, yeah.
And also...
Huh?
Who doesn't think there are enough Postgres parameters already
and creates their own extra ones?
That's some brave people.
I do it all the time.
What do you create?
I mean, first of all, extensions do it.
I don't get your joke, so I'm asking.
Oh, yeah, yeah, yeah.
Extensions make sense.
For example, auto_explain, you should know about it.
Also, sometimes I put a lot of settings there.
Sometimes I prefer putting application settings in a table,
but sometimes I put them as a GUC.
I say, like, app.something equals something.
For example, if you use PostgREST,
it's natural for you to put something as a GUC parameter.
It makes some sense, and it's good to have several choices, and you just choose what's
more convenient and secure and efficient and so on.
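A minimal sketch of the kind of custom, dotted parameter being described; the app.* prefix here is just an example:

    SELECT set_config('app.current_tenant', '42', false);  -- false = not local to the transaction
    SELECT current_setting('app.current_tenant');          -- returns '42'
    SHOW app.current_tenant;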
Another very small but interesting optimization: the WAL receiver is smarter in terms of waking up. If you have a server which is not doing anything,
the WAL receiver will be doing much less work.
And also promote_trigger_file was removed, to avoid some checks.
So it looks, you know, I've read one article,
I don't remember which, and it looks like this idea to optimize these two parts
is an attempt to save energy.
If you're not using some Postgres server,
it should make fewer movements, right?
Like, they're eco-friendly changes.
It's funny.
Maybe we shouldn't be doing checkpoints every five minutes.
Yeah, and also creating WAL. There's also a
timeout when a new WAL segment is created, filled with zeros, right?
So maybe also... It's interesting.
I remember discussion about checkpoint timeout.
Oh, log_checkpoints.
It became the default a couple of versions and years ago, right?
And I remember discussion that it's bad
because it will produce records in the log.
And if nobody is using the server, we are filling the log.
Why?
But it was done because the benefits from having log_checkpoints
on by default are much bigger, because we need that information.
Speaking of which, in 16, as I remember, in checkpoint log messages we now see the LSN positions of the checkpoint itself and the redo starting point, which is sometimes good for troubleshooting.
So there is a lot of small convenience things.
I think it's a convenience release,
a reliable convenience release.
This is my impression,
which is not a bad thing at all.
What else should we discuss?
Well, a brand new thing is the pg_stat_io view that we've talked about before.
Yeah.
The author is Melanie Plageman.
Yeah.
I have no idea if you're pronouncing it right, but yes.
Right.
I was at PGCon and was at her talk.
So it was a good talk, and congratulations.
This is a very noticeable contribution and people need it.
I personally am going to use it all the time.
And I hope observability tools developers will integrate this to their dashboards and so on.
For example, NetData and so on.
This is a good thing to have.
And it provides our favorite buffers,
but globally, for the database, right?
Which it doesn't provide.
And timings.
Yeah, timings and many details.
It's good.
You understand who is doing the work.
I/O work is the most expensive in databases.
And in most cases,
slow means a lot of work needs to be done, right?
A lot of I/O work in databases,
because a database is an application
which is an I/O-intensive application, right?
So I/O bound, we are usually I/O bound.
And the main thing to understand
about this new system view
is that it's at the Postgres level,
for the buffer pool; it's not about disk.
It's about the buffer pool.
So the abstraction is high.
It means that if you see a lot of I/O numbers, if it says reads, maybe it's
reads into the buffer pool from the page cache, because under the buffer pool we have the
page cache. So maybe it's
still all about memory; we don't know at this level. So reads are not necessarily disk reads: it's reads
from the page cache into the buffer pool, or it could be either, but we don't know, right?
Yeah, at this level we don't know. So we need to consult, for example, /proc/<pid>/io using our process IDs, or some different tool.
But it's already very good.
At this level, it's super important to have and useful.
And also, it doesn't track some I/O.
It tracks only I/O related to the buffer pool.
It doesn't track, for example, WAL writes or reads.
Exactly.
But it does for auto-vacuum.
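A quick way to look at the view, as a sketch; the timing columns are only populated when track_io_timing is on:

    SELECT backend_type, object, context,
           reads, read_time, writes, write_time, extends, hits, evictions
    FROM pg_stat_io
    ORDER BY reads + writes DESC;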
You mentioned you're going to use this all the time. What kind of thing do you imagine yourself using it for?
If we have a problem, in any analysis we need to divide into segments. It's like that for budgeting, for performance
optimization, anything. You need to perform some segmentation to understand where to dive in further. And this
is a perfect starting point. So I understand: okay, the database is slow, some developers say.
They like to say this. The database is slow. We look at this, better to draw it as a
graph, a colorful graph with segments. Okay, this I/O is from backends, this I/O is from the checkpointer, or this
I/O is from autovacuum workers, and so on. And I quickly understand where most of the work is done.
Is it about backends, or the checkpointer, or where? And then I can go to pg_statio for tables and indexes,
or to pg_stat_statements. I can go down to pg_stat_kcache if I have it. This will be the physical level
for queries. Or I can additionally analyze wait events. This is, I mean, a perfect
starting point for performance troubleshooting.
Makes sense. And also, I guess, if you could rule
out that it's an I/O problem, I know it probably will be, but if there's nothing lighting up here
in your monitoring of this view,
you can rule out either that it's
I/O related or, maybe, that it's WAL related.
Yeah, we discussed monitoring and troubleshooting,
and we have some kind of runbook,
and also a list of what monitoring should have.
And troubleshooting runbook number one means, like, you have 30 seconds;
you need to understand which directions to dig in.
And I'm almost sure we will include pg_stat_io in the versions for Postgres 16 and newer.
So as a starting point for this quick and shallow analysis,
as wide as you can and very shallow, this should be a part
of the methodology, I guess, for troubleshooting performance.
Yeah. And in terms of tooling,
it was the pganalyze team; Lukas and one of the developers were involved in reviews of this,
so I'm pretty sure they'll be adding it to pganalyze. I suspect it's already there.
Yeah, very likely.
Okay.
Next thing I had, and I think it's important not because I've had many people talking about it for 16,
but because people talked about it so much when it was reverted from 15,
was the SQL/JSON stuff, so the SQL standard constructors and identity functions.
These were reverted from 15 because they weren't quite ready, and it was a big deal;
lots of people were saying they were disappointed. And now that they've been added, or at least most of
them, I was a bit surprised that they didn't make a lot of the
various write-ups I've seen on PostgreSQL 16 already. So yeah, I think it's cool. And I think standards
compliance is important. I hear about it quite often when people say why they're moving to
PostgreSQL or why they're picking PostgreSQL in the first place. So I know this work can sometimes
be thankless, so I just wanted to say I appreciate all that hard work.
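A couple of the new constructors and the IS JSON predicate, as a quick sketch:

    SELECT JSON_OBJECT('id': 1, 'name': 'Postgres 16');
    SELECT JSON_ARRAY(1, 2, 3);
    SELECT '{"a": 1}' IS JSON OBJECT;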
Well, yeah, Postgres SQL
in many cases
is the de facto standard already.
I mean, its syntax.
And we know a couple of Postgres folks
are members of the SQL
standard committee, which is
good. And of course,
supporting the standard is always good.
I work with JSON in SQL context all the time.
I wish the syntax was less bloated
and there are ideas how to change it,
but it would require change of grammar,
heavy change of grammar.
But still, I agree with you.
Standard support is a good thing to have.
On this note, syntax:
I think a breakthrough change is that
now we don't need to specify an
alias for subqueries in FROM.
It hits everyone.
Everyone who writes SQL
at some point, you see
this error for sure.
Very helpful error.
It does tell you exactly what you're doing wrong,
but it doesn't... Yeah, it just says:
add an alias. Yeah.
And I think it's not standard, this
change.
I suspect the standard
requires you to have one, but
I know other databases...
I think this has come up quite a lot in
migrations, like migrations from
other databases that don't force
you to do this. And it must be
quite annoying.
This is exactly my point. Here we probably deviate from the standard, but for the
sake of easier migrations it's a good thing to have. And I personally am going to stop writing
those aliases. Closing parenthesis, space, underscore: I already have a habit of writing
this, but now I can drop it. Well, not now; in a couple of years, when production systems will be
on 16, I'm going to drop this habit.
Yeah, which is good. More convenient.
Yeah.
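The change in one line, as a sketch: before 16 this fails with "subquery in FROM must have an alias"; on 16 it just runs:

    SELECT count(*)
    FROM (SELECT * FROM pg_stat_activity WHERE state = 'active');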
Slightly back to performance. I'm one of the guys who likes to use somewhat newer,
well, it's not new anymore, but newer SQL syntax.
For example, you can ORDER BY inside an aggregate function, which is interesting.
Sometimes, for example, as a DBA, you select from pg_stat_activity,
and you group somehow, for example, by state, right?
Idle, active, idle in transaction.
And you want a couple of examples, a few examples of queries.
You cut them using the left function, for example,
taking only, like, the first 10 characters.
And then you aggregate using string_agg or array_agg, and then you think: oh, okay,
I don't want arbitrary examples, I want, like, the longest-lasting queries, for
example. And inside string_agg or array_agg,
right inside there, you write ORDER BY query_start or xact_start,
and a limit would be applied using array indexes, which is different.
But you ORDER BY inside the aggregation; you can specify the order.
So when I used it, I always understood that it's not going to benefit from any indexes,
it will be done in memory, so it's not a very scalable approach. I mean, performance is not good. If I have a
billion rows, I probably won't do it. But for pg_stat_activity, it's good. So now the
planner can use indexes to support this ORDER BY inside aggregate functions, which is good.
And also DISTINCT inside. You can
write DISTINCT inside it also. So string_agg(DISTINCT ...). This support was always there, but
it was not benefiting from indexes. Now it can benefit from indexes.
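Roughly the kind of query being described, as a sketch against pg_stat_activity; in 16 the planner can feed such an aggregate pre-sorted input, for example from an index:

    SELECT state,
           count(*),
           string_agg(left(query, 10), ', ' ORDER BY query_start) AS examples
    FROM pg_stat_activity
    GROUP BY state;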
Yeah, these Planner improvements get quite in detail, but all of these things add up,
right? It's very cool that it can do that.
Yeah. What else? And 150 more changes, right?
Yeah, right.
Which we don't have time to cover. Yeah.
What about just generally: you mentioned updating production in a couple of years.
I see some people upgrading every year, or at least most years,
but they don't seem to be in the majority; every couple of years does seem to be really popular.
I know there is overhead to doing major version upgrades; I think that's the reason.
Is there anything else you wanted to add there?
About upgrades? That's a huge topic.
No, no, no. About
should people upgrade? How often should they
upgrade? Is it okay to still have
a Postgres 12 instance lying around
and just keeping it up to date?
So usually,
for minor upgrades, you should
upgrade to
the newest in most cases.
But again, an enterprise approach should involve proper testing in lower environments.
But for major versions, it depends on each case.
But in general, the old-school recommendation is to wait until, for example, 16.2.
A couple or three; it will already be the third
minor version, 0, 1, 2, right,
and then upgrade
if you have some kind of critical
system. But here again, my point
is that for 16 we probably
have a more stable release, because
it's not like half of the source
code was rewritten. Although how
Postgres is built was changed drastically:
moving to Meson, a new build system.
Oh, yeah.
The same system for all operating systems.
The same approach now is used for Linux, FreeBSD, MacOS, Windows.
So it's a newer approach.
And a lot of legacy was removed.
But it's behind the curtain for users, right?
You don't see this because you just take the binaries.
It's good for the Postgres developers
and for people testing patches.
Yeah, so more unified and more modern approach
to build Postgres, right?
Okay.
Actually, one thing I really did want to say quickly,
and I'm surprised we haven't had more requests for this,
is that I think probably the biggest thing that's happened to Postgres
in the past year is outside of Postgres itself,
which is, I think, the pgvector extension.
I suspect we're going to do a whole episode on it at some point.
Yes, we should.
But it's cool because of the extensibility
that these massive changes can happen
outside of the major version release cycle.
I have a different opinion here.
So I think pgvector should be part of Postgres as soon as possible,
but obviously it would mean that releases would happen only once per year,
which for pgvector right now is quite slow, because the pace of development is super high.
But if it continues to be a third-party extension,
it's also losing some attention. JSON, for example, was in core
right away, and that was good.
Yeah, and it's kind of why I wanted to bring it up, because
it could easily have been something that was decided to be put into Postgres, but I imagine it
would have taken an extra couple of years. And maybe it still will. Maybe that is what will happen.
But it's really cool that it can be out there in the wild,
getting quick releases that aren't tied to the Postgres major version cycle.
And if it is in all the major cloud providers already,
it kind of is in Postgres.
It is, in a way, because anybody that self-hosts can install it,
and anybody on cloud platforms can.
I don't agree here, because if you check, for example,
the versions there may be very old, for example.
They upgrade slowly.
Ah, good point.
Also, some providers still don't support it.
And also, if it goes to core, there are two paths.
It can be a contrib module, like an officially supported extension, which is already good.
Or it can go directly to core.
And, you know, I imagine a lot of discussions when people think about what to choose, Postgres or a specific vector database to store vectors.
Or, I don't know, Elastic or others.
They all support vectors, either by third-party extensions or in core.
I think support in core can be an argument in that choice.
It's a very high-level argument, not diving into details,
but some people might think:
oh, if it's in core, it's better supported.
And you cannot say... like, you can try to convince them:
no, no, no,
every provider, like RDS, adds support; Cloud SQL adds support. By the way, it added it, right,
Cloud SQL? There was a question recently.
It did, but I think not the latest version, not 0.5.0,
which is an important release. So you make a good point: they might support it,
but is it up to date?
Without HNSW indexes, right?
Yeah.
Which everyone wants right now, almost everyone.
Yeah.
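For reference, the index type they mean, as a sketch with a made-up table; the hnsw access method arrived in pgvector 0.5.0:

    CREATE EXTENSION IF NOT EXISTS vector;
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
    CREATE INDEX ON items USING hnsw (embedding vector_l2_ops);
    SELECT id FROM items ORDER BY embedding <-> '[1, 2, 3]' LIMIT 5;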
So, by the way, about pgvector:
someone made a very, very good point on my Twitter
when I was discussing it,
saying that, you know, with HNSW indexes,
it breaks the rule.
Indexes should not change what is returned by a query,
but this approximate
neighbor search
will produce different results:
without an index you have one result,
with an index you have a different result.
And this is probably, like, the thing
Postgres never had.
I mean, never had in core. Extensions can...
Yeah, I have heard that that is a reason for not including it. But yeah, we'll see, I guess.
I think it should be in core because Postgres is good at JSON, it should be good at vectors,
obviously. But yeah, I understand some obstacles.
Anyway, sorry for the detour, but I thought that was worth adding.
It's a good topic, I think, for future.
It's very interesting and very attractive for many.
Very attractive topic.
Okay, thank you for an interesting discussion, as usual.
And let's remind our listeners that we need
feedback, either in terms of subscriptions, likes, or maybe comments and suggestions for what to talk about. And we still
continue working on subtitles; new episodes have them, and we're
improving, so feel free to use them, and please also give us feedback
on Twitter or on YouTube or anywhere else. Thank you.
Thank you. Bye.