Postgres FM - Extensions
Episode Date: June 9, 2023Nikolay and Michael discuss Postgres extensions — what they are, how they affect your decisions around Postgres, and some things to keep in mind when using them. Here are links to a few th...ings we mentioned: Extensions (docs) https://www.postgresql.org/docs/current/external-extensions.html Extension (pgPedia) https://pgpedia.info/e/extension.html pgvector https://github.com/pgvector/pgvector PL/Rust https://github.com/tcdi/plrustZomboDB https://github.com/zombodb/zombodb Why is Postgres popular episode https://postgres.fm/episodes/why-is-postgres-popular Citus https://github.com/citusdata/citusTimescaleDB https://github.com/timescale/timescaledb OrioleDB https://github.com/orioledb/orioledbPostGIS https://trac.osgeo.org/postgis/ “There’s an extension for that” (tweet from Robert Treat) https://twitter.com/robtreat2/status/1665735485883314178 RDS supported extensions https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/postgresql-extensions.html RUM https://github.com/postgrespro/rum pg_repack https://github.com/reorg/pg_repack PGXN https://pgxn.org/ pgTrunk by CoreDB https://pgtrunk.io/ Dbdev by Supabase https://supabase.com/blog/dbdev StackGres https://github.com/ongres/stackgrespg_tle by AWS https://github.com/aws/pg_tle Modern Postgres monitoring (slides from Nikolay’s tutorial) https://twitter.com/samokhvalov/status/1664686535562625034 Awesome Postgres https://github.com/dhamaniasad/awesome-postgres ~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!If you would like to share this episode, here's a good link (and thank you!)~~~Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artwork
Transcript
Discussion (0)
Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I am Michael,
founder of BG Master. This is my co-host Nikolai, founder of Postgres AI. Hey Nikolai,
what are we going to be talking about today? Hi Michael, let's talk about extensions.
Awesome. Yeah, we've had a couple of requests for this since we started nearly a year ago,
so it's cool to be getting to it. I know we've covered a couple of extensions specifically,
but yeah, this was my choice,
and I thought it was about time we actually got to
what extensions are, how they can be useful,
some things to watch out for, that kind of thing.
Yeah, first of all, we started 49 weeks ago
because this is episode number 49,
and it amuses me that we didn't miss any single week.
And soon it will be one year.
In one month, it will be anniversary of our podcast, which is great.
And indeed, extensions is a big topic in Postgres.
And it's one of the key aspects of Postgres
where we can say it's very, very extensible, extensibility
of this system is very great.
Not only you can create obviously tables and so on and data types, for example, you can
get data types, you can create indexes for your data types and so on.
You can create languages, but also you can create extensions.
And actually, custom languages is one kind of extensions.
One of the ideas of abstract extension, I have some special data type, and I want this data type to be performing very well.
So I create a special type of index, probably based on generalized search tree,
or generalized inverted index gin.
These are two like abstract systems to create new indexes.
In this case, my data type is backed by some index.
I can give it a name and package as extension.
And this happens all the time. And recently, on the wave of LLM exploding popularity,
extension PG vector started to grow very well.
It was created before chart GPT.
GPT version 4 was released.
But of course, this fueled the growth of popularity of PG vector.
And even RDS already supports it.
I don't remember that requests to add some extension were handled so quickly by RDS.
It means something interesting.
And this extension also provides a special type of indexes for KINAN search.
You find key nearest neighbors for your vectors.
It helps you to work with embeddings for ggpt or other LLMs and so on.
So it's very helpful to build your own systems using large language models.
And this is just an example of how it can happen.
Yeah, nice example.
I didn't realize this, but I think PG Vector is, for the Ruby folks out there,
something written by, or at least maintained, mostly written by Andrew Cain,
who's really famous in the Ruby world for the number of gems.
So I don't think he's a listener, but yeah, I'm sure a lot of our listeners are very aware of who that is.
Very cool.
Right, right.
So just an example.
It's a very popular example, but of course, extension can do different things.
It can implement not only data type of indexing.
It can implement something else.
For example, it can be just a set of SQL functions which are useful for administration. It can be, of course, a full-fledged language.
And recently also, I think Ardeus also added PL Rust,
which has some high demand.
Yeah, I think that was a cool thing.
I think maybe the team behind Zombodb are the ones leading the charge on PL Rust.
But I could be wrong on that
but yeah another extension so there's yeah i've seen a couple of good definitions that i like
the simplest was an extension is a package of code and sql objects to extend postgres functionality
and so by that definition it's extremely broad And Postgres was extensible before the concept of these.
What we're talking about here specifically was added in 2011.
I've got down here Postgres 9.1, the ability to package these up in a fully supported manner.
But you could extend Postgres long before then.
In my opinion, I think we covered this actually in our episode on why was Postgres so popular, but I think it's been a really key reason that Postgres
has been so successful over the years. I agree. Yeah. This property to be very extendable,
it's very important. And it's not like marketplace, but to be able to allow people to add what they want to add,
this is super important for an open source project.
And sometimes we have competing implementation of the same,
like to achieve the same goal.
And this is normal for open source
because it doesn't have a single direction and roadmap.
Well, by the way, it's also a different topic.
Maybe kind of roadmap should exist.
But in general, if you take some company,
usually they have some direction, they have goals,
they have strategy where to move.
But open source usually it's like, you know,
cathedral versus bazaar.
And for open source, it's essential to maintain
this bazaar philosophy And for open source, it's essential to maintain this bazaar philosophy
where many, many voices are heard from everywhere, right?
And people just experiment
and the most successful, interesting ideas
resonate in other minds, right?
And this is how some extension can grow.
But of course, we have examples of cathedral
style extensions and we know them very well one of them is site also already acquired by microsoft
some people call it super extension this these kinds of extension and we have another another
one timescale it's just extension right this is great you don't not a fork right you can install it community edition
at least you can install it to regular postgres this is super cool and they they have a lot of
stuff happening it's not just data type and index it it's a lot of stuff they have their own nodes
and execution plan and so like so many things I think super extension is a really good word for it.
And I think so.
I think there are some that do so much that you can almost consider them
shifting Postgres enough that it becomes a different kind of database.
Like some things that some people, let's say Postgres experts that have never
worked with Citus or never worked with Timescale before might struggle to diagnose and
help you with certain issues because so many things have changed, but they are implemented
as extensions, which is really cool. So I think there's a new one, a new-ish one, I guess it's
been going a while now, AureolDB that I put in that category. And I've seen some people put
PostGIS in that category as well, even though it's a bit more... a bit more Hold on, AureolaDB cannot be put into
this category yet, it's a fork
and it will be difficult
I hope they will succeed but it will be
difficult to transform
to extension. Actually as I remember
Timescale was a fork as well
originally
or Citus
some people passed this
route, like first it's a fork but then
slowly moving to be able to like maintaining fork is nightmare usually we can look at a green plan
for example it's not it's a lot of work you're always lagging behind community even rds yeah
right right right well i'm not sure how rDS is. They lag a bit though, right?
It takes them a while, Aurora especially.
Aurora is a different database.
But then you become extension.
It's great because it's much easier to start working with new major version.
Of course, some work is still required, but you have interfaces,
even sometimes hooks and so on, but it's possible.
And AureolaDB has a lot of innovative stuff a lot like a lot of stuff changed and as I understand their plan of probably should talk to them one day
firsthand understand and send problems first hand but the ideas is like things
should be moved to postgres core and in the future, once these things are moved to the core,
to Postgres core, they become extension.
This is the plan, as I understand.
But now it's not so, just to clarify.
Right?
Yeah, I think you're right.
And the things that they need to put in core
would be things to make it more extensible for others as well.
So it's increasing the extensibility is is their goal which is it's cool and and there have been previous
projects around this before right around pluggable storage and things so postgres there seem to be
people in the ecosystem who want it to be even more extensive even though it's the most extensible
database i've ever seen people want it to be even more extensible which is quite cool but there are as well as these super extensions which I think are worth mentioning most extensions
are much smaller and much more kind of like small utilities to allow you to do something small
and there's a there's just so many hundreds of them in fact I saw kind of a throwaway joke tweet
I think he's said it a few times and probably it's a bit of an in-joke between people,
but I've seen people using the Apple quote, you know, that there's an app for that, but
for Postgres, there's an extension for that. So if you can think of something,
want Postgres to be able to do something, there's a chance that you'll be able to find an extension
to do it. Now, it might not be available on your cloud. This is the can this is the key this is the key so we live in
time when a lot of postgres users are not actually postgres users but some
managed service users and some people say for example postgres RDS is not
postgres which like I thought many times in various places where people get
community support they some people come and they say we have this problem and then turns out they use RDS for example or Cloud Seeker or
something and then some community members in this chat or anywhere like
Telegram for example they say oh it's not Postgres go to RDS. I find
that it's not right and I think RDS played a huge role and other managed Postgres
services. They still play a huge role in growing Postgres popularity. It's important. But at
the same time, they have a very strict list of extensions. And for example, I remember there is such extension called, there is GIN index, generalized inverted index.
And it's powerful for large data sets.
It's important for full text search, for search of overlapping arrays and so on.
There is a problem with GIN.
We want to find everything by sort according to timestamp or
according to some id so we want the freshest items go first it's a unsolvable problem for gin
so it works not good and people live with it for example 3 gram search also backed by gene and you have this problem
so planner chooses either to use b3 and then follow filtering
on the fly or to extract everything and order by in memory top and ordering and so there is also extensions called ram and you cannot install it on RDS. RAM solves this problem extending
GIN with these timestamps or IDs or numbers and then it's a single index scan you have ordered
result and order by limit n works well. But on RDS it's not available. I requested it many many years ago. I still like RAM is questionable
that it can be used in each case because the size of index is huge in this in my cases.
But still I want to be able to test it on RDS if my customer is RDS customer also. Right,
but I cannot and I asked them they didn't do it and this limits our um so in this like when
people say go to rds we won't help you in community i think it's not fair because it behaves like
postgres it helps to grow postgres popularity we should like accept this fact right and we should
help these people as well it's interesting like. Like we have many animals, all kinds of animals.
Some of them really look like Postgres.
Why not like accepting them, right?
But at the same time,
we cannot add extensions.
We won't, right?
And the list is limited
and we need to wait forever sometimes.
Well, if it's PG vector,
okay, everyone accepts.
It's very popular right now
and they edit it.
But if it's some extension like RAM, it's still not there.
And sometimes if you go to Heroku, they even don't support PG Repack, for example.
I have fresh case, fresh customer on Heroku.
And they experience blowout issues. What to do?
They say, oh, vacuum full analyzer, it's not good. I say, PGD pack.
And then I suddenly realized Heroku doesn't have PGD pack.
Maybe they added it yesterday and we haven't noticed yet. Remember the auto-explain
example from a few weeks ago?
Maybe in the end of the day, RDS and others are not true
open source Postgres indeed.
And somehow, like some things should be changed.
So either they find a way to support all extensions or we need to find a way to have all the benefits they propose.
But somehow, so we have all extensions and we have extensible fully extensible postgres again
let's come back to that because there is a there is something new in that area isn't there but
quickly to defend the people that i do have some sympathy for the people that have animosity
towards the rds support because they have a commercial product and people come to support for support for it from a free community
and for years amazon famously didn't contribute that much back to postgres aside from the
marketing which i think shouldn't be overlooked but you know development wise there was much less
involvement than there is now so i think some of it the hangover from that but coming back to the
yeah so heroku i think of a bad example because i don't
think they've implemented even some extremely popular extensions but i think the amazon list
of extensions is that is relatively like a relatively reasonable list of what the most
popular postgres extensions are there are some exceptions i think they support some things that
i've never seen anybody it's it's actually a good point. I use it in the cases when customers have self-managed Postgres.
I sometimes use this approach.
I say, you know, we should add this extension.
Even RDS has it.
Yeah, it's like proof that it's somewhat popular.
Yeah, so it's verified, you know.
It's good.
I agree this is good.
Like this helps me to sell some extension to some customer, right?
I mean, to have it, to install it.
While we're on that topic, I think there is PGXN as well, the extension network.
There are a couple of these not fully official but pretty official.
I think it's that.
I mean, it looks like a lie, but it's that, this idea.
It didn't become like the marketplace of some registry.
When I see PGXN, I usually understand.
For me, it means I will have some troubles installing it
because I need to install these additional steps.
I don't like it at all.
It doesn't work because, in my opinion,
the failure was because it was done by community, like, and so on. And if a website was polished,
if it was very social-oriented,
like, ratings, comments, and so on,
probably it would have chance to be, like,
some single registry of extensions.
But we don't have this.
Like, Postgres doesn't have good
registry of extensions right now.
And it's like, there are many lists,
and PGXN is just the ability to have some standard way to install it
and standard way to count all of them, to have registry.
It doesn't work.
I think it doesn't work.
There are new attempts.
I remember some, like CoreDB launched recent attempts.
Yeah, they called it PG trunkunk and also Superbase, right?
DB Dev.
Superbase, yeah, yeah, yeah.
Interesting to observe what will happen.
I have low expectations, but maybe something will work.
It's good to see such attempts.
I also had ideas to do this and realized like you need to be very good at marketing here so for me
the most important
would be reading comments and
people opinions, feedback
real experience but
PgXn doesn't provide this
for installing
it's a headache for me to use it for
installation, I don't know
value is not there, it's like source force
it became that eventually
source force also was good
until it wasn't right
so yeah sorry for
direct and maybe like
for some people offensive opinion
but this is how I
feel and I think
I'm not alone here
but yeah it definitely feels
like extension providers currently still link to,
people generally link to, let's say, their GitHub to show you.
This is one of the advantages sometimes a cathedral would have over a bazaar.
They can tell you if it's a closed source system with extensions or an app store,
you can point at the official one and everybody points at that.
You can see your reviews there.
You can see...
Documentation.
All sorts.
Yeah, exactly.
But we don't have that.
So yeah, it's interesting to see the efforts.
A lot of effort has gone into them,
but I've not tried to use it.
There is some demand in such registry
of a huge list of all extensions,
maybe not only extensions,
but also Postgres-related projects
like UI tools and CLI tools, everything.
Backup tools, a lot of stuff.
Of course, it would be great
if someone would maintain some registry
with comments, reviews, and so on.
Maybe someone would...
I see some efforts from different areas.
Maybe some of them will succeed but let me notice
like let's just think about it extensions currently probably one of the key areas where managed
postgres providers compete yeah they they compete in like supporting more and more extensions. This is good, but still I think it still feels like not open source
because I want full freedom.
I want to be able to install my extensions I developed yesterday right now.
Why not?
And I cannot do that.
Well, I think the reason is security, right?
I can name you five reasons.
Understandable, right? security right like i think that's the i i can name you five reasons understandable right so like if if security and if we support something we are responsible yes like for example what if
extension developer stop developing it and we find a bug who will apply it this this logic is
clear absolutely but open source source is not like that.
Open source is freedom, right?
We should be able to go wherever we want.
And I think it's a problem, unsolvable if you close these doors.
You can try to maintain a lot, a lot, a lot of extensions,
be responsible for them at different levels.
For example, Stargrass maybe has the biggest list
of extensions supported, more than 100.
But what happens if some extension,
like not maintained by original author,
who will fix bugs?
It's a question.
But still, Stargrass, I think, has the biggest list probably,
like in terms of extensions supported.
And it's interesting.
But can I install my own extension there?
Probably no.
I need to go to them and say, add it to your distribution.
But it already feels better as more like open source than managed cloud providers.
It's easier to add so it's a
difficult topic responsibility is a difficult topic i think you're completely right i think
it we probably have to mention here the project amazon announced last year i think it was last
year of tla which is itself an extension which is quite confusing. At least it confused me for a while.
But yeah, it's Trusted Language Extensions is what
TLE stands for. And I was originally quite excited by this.
I thought this was Amazon trying to really let you bring your own
extension to... Everyone
was in this trap. I also did. I saw the extension. There is some need here.
I want to install anything, but they cannot solve it because of this responsibility problem.
And when they released this TLE, many people, including both of us, started to think they
solved it somehow. But no, no, no, no, no. This is only for languages. It's very
different, right? It's not for extensions. Languages are
extensions as well, but specialized extensions.
I might have misunderstood or misremembered this, but I think they also
support some very basic level things that you would consider extensions.
For example, you've just got a few small plpg sql objects for example i think those could be packaged up like this
but they're again they're very they'll be very simple and very small and i think it's more about
cloud it's more of a cloud provider thing to let people bring some of their own code and maybe let a DBA allow their team to
install or not install certain packages that they've developed in-house on their database or
not so it's I think it's very it's a very small subset of what current extensions can do is
this allows but maybe that's like maybe that's just step one and they've got step two to ten
planned already and this is like maybe it could be a cross cloud provider collaboration effort on how
they can let people bring extensions to their clouds but yeah i'm i understand it is difficult
yeah well for me if if i again like let's like i do some bottom, what I have in my head, and we will probably discuss something else, maybe more useful for users.
Because I think we discussed some philosophical questions, which are very important, but maybe not that helpful for users.
Actually, they're helpful because they need to make decisions.
Go to RDS, go to Timescale Cloud, go to something else, go to install from Kubernetes operator or maintain everything themselves.
So currently we have not good situation.
Extensions are like very limited.
Okay, list is growing, but limited.
And full power of open source is full freedom.
And managed Postgres cannot provide it.
And this is not right.
I mean, if I want,
I pay you, right? And this is the resources you provide and Postgres you provide with some additional automation, but I paid you a lot. And if I want some extension, I would like to have it,
even if like accepting some responsibility on my shoulder. So I say, okay, I understand that something can be broken
if I install this extension.
But this right should belong to me.
I'm like paying, this is open source.
Something is totally wrong in industry right now here.
Totally wrong.
Like open source money I pay you,
and you tell me I cannot use the power of open source.
I cannot install all extensions.
I think something should be changed in the next five years or so
because right now it feels, for me, like Oracle.
I think you're onto something,
but I might even use the example that we're currently living in a world
where most cloud providers act like Apple,
and they have an app store,
and they very much have to only allow a small number of apps
onto it. And maybe what we're missing is a cloud provider that acts a little bit more like
Google or Android that allows a lot more. Perfect example, exactly. Which is based also on Java,
a lot of things are open source. It's also Linux, right? I mean, their operational system.
And it can be installed on various devices, not only one single vendor.
This is a great example, by the way.
So, yeah, I think something like that is missing in our Postgres ecosystem.
And we observe those dozens of companies who compete in the area of managed Postgres
and Timescale who switched to the idea cloud first, not only
cloud first, but
something else can be done
here and something else should appear soon
I think. And this thing
that will appear will not limit us
in the area of extensibility
of extensions. I should be able to install
extensions. Extension I developed
yesterday. Deploy to my
production, Why not?
Quick fire. I think we've
not got that long left.
So I think we should do so. I think you've made
a really good point that we haven't done that many practical things.
I've got a couple of tips for people.
I imagine you'll be able to
add a few more onto this.
But one thing to be aware of is
major version support. If you're doing major
version upgrades, please consider your extensions and check compatibility.
Everyone forgets about it.
Yeah, everyone forgets, but it's a major issue.
And then the second thing is before you check, before you're using an extension, do check how it's maintained.
I think I quite often see people having to ping maintainers to add major version upgrades, major version support.
So some are maintained impeccably and some it's a little bit slower.
So just make sure that they're in line with what you're going to need and expect or that you could have the resources in-house to help them out if you really need to allow it.
But each extension is a small separate product.
Sometimes not that small if it's a Cytos or Timescale.
But anyway, it's a separate product and it might introduce problems.
They have usually smaller release cycles than Postgres if they are developed actively.
But if you just install without thorough testing, you might have issues.
I had issues like some extension was updated and whole Postgres started to have segmentation faults.
And we need to troubleshoot to find that it's not bad.
Bad minor upgrade of some extension caused issues.
So it's an area where we need to invest some efforts to maintain.
But we shouldn't upgrade, but we should test it as well, right?
Yeah.
And for this, I would pay.
I mean, my ideal thing, through open source Postgres,
installed anywhere, and supporting any extensions,
some of them are maintained by this provider and
I pay for it because they tested very well all upgrades and they automate
those upgrades and so on. None of managed Postgres providers we currently have do
this yet so this opportunity I guess. Sounds like it. and finally what's the most
popular extension
we have
everyone knows
it right
pgstat statements
do you count that
what does it mean
it means it
should be not
an extension
I've just
returned from
from Ottawa
where I had
pgcon tutorial
and you can
find on
twitter
like a lot of
likes
slide deck unfortunately it was not
recorded. Probably we should repeat it.
And I was telling
basic things, TPS, QPS
and latency for
queries. Only TPS
can be provided by Postgres itself.
For QPS and for latency,
you need to use substatements. It should be in core.
I like that. Maybe every five
years, the most popular extension there's some effort to put it into core. That should be in core. I like that. Maybe every five years, the most popular extension,
there's some effort to put it into core. That would be
amazing. Yeah, if
something is big, it should be merged, right?
I like that.
Well, I think that's probably all we've got time
for. So I'm sure we'll go into some of those
topics in more detail in further episodes.
But any parting things
to leave people with?
Well, I also wanted to mention
there are some third-party registries of extensions.
There is awesome Postgres on GitHub.
It's not only about extensions,
but it's a good list of manually curated list of extensions
and not only extensions.
And it's good to check.
It's kind of what people consider consider interesting and there are so many
extensions out there actually there is also a saying that for each problem there is an extension
it's already solved for example you you showed me today like uh someone said maybe it's a good idea
to have a timestamp with time zone store and also answer was there is already extension for
that so yeah not just any old person that was robert haas and robert tree oh right yeah right
great well thanks so much nicolai thanks everybody for listening and catch you next week thank you
bye