Postgres FM - Extensions

Episode Date: June 9, 2023

Nikolay and Michael discuss Postgres extensions — what they are, how they affect your decisions around Postgres, and some things to keep in mind when using them. Here are links to a few th...ings we mentioned: Extensions (docs) https://www.postgresql.org/docs/current/external-extensions.html Extension (pgPedia) https://pgpedia.info/e/extension.html pgvector https://github.com/pgvector/pgvector PL/Rust https://github.com/tcdi/plrustZomboDB https://github.com/zombodb/zombodb Why is Postgres popular episode https://postgres.fm/episodes/why-is-postgres-popular Citus https://github.com/citusdata/citusTimescaleDB https://github.com/timescale/timescaledb OrioleDB https://github.com/orioledb/orioledbPostGIS https://trac.osgeo.org/postgis/ “There’s an extension for that” (tweet from Robert Treat) https://twitter.com/robtreat2/status/1665735485883314178  RDS supported extensions https://docs.aws.amazon.com/AmazonRDS/latest/PostgreSQLReleaseNotes/postgresql-extensions.html RUM https://github.com/postgrespro/rum pg_repack https://github.com/reorg/pg_repack PGXN https://pgxn.org/ pgTrunk by CoreDB https://pgtrunk.io/ Dbdev by Supabase https://supabase.com/blog/dbdev StackGres https://github.com/ongres/stackgrespg_tle by AWS https://github.com/aws/pg_tle Modern Postgres monitoring (slides from Nikolay’s tutorial) https://twitter.com/samokhvalov/status/1664686535562625034 Awesome Postgres https://github.com/dhamaniasad/awesome-postgres ~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!If you would like to share this episode, here's a good link (and thank you!)~~~Postgres FM is brought to you by:Nikolay Samokhvalov, founder of Postgres.aiMichael Christofides, founder of pgMustardWith special thanks to:Jessie Draws for the amazing artwork 

Transcript
Discussion (0)
Starting point is 00:00:00 Hello and welcome to PostgresFM, a weekly show about all things PostgresQL. I am Michael, founder of BG Master. This is my co-host Nikolai, founder of Postgres AI. Hey Nikolai, what are we going to be talking about today? Hi Michael, let's talk about extensions. Awesome. Yeah, we've had a couple of requests for this since we started nearly a year ago, so it's cool to be getting to it. I know we've covered a couple of extensions specifically, but yeah, this was my choice, and I thought it was about time we actually got to what extensions are, how they can be useful,
Starting point is 00:00:33 some things to watch out for, that kind of thing. Yeah, first of all, we started 49 weeks ago because this is episode number 49, and it amuses me that we didn't miss any single week. And soon it will be one year. In one month, it will be anniversary of our podcast, which is great. And indeed, extensions is a big topic in Postgres. And it's one of the key aspects of Postgres
Starting point is 00:01:02 where we can say it's very, very extensible, extensibility of this system is very great. Not only you can create obviously tables and so on and data types, for example, you can get data types, you can create indexes for your data types and so on. You can create languages, but also you can create extensions. And actually, custom languages is one kind of extensions. One of the ideas of abstract extension, I have some special data type, and I want this data type to be performing very well. So I create a special type of index, probably based on generalized search tree,
Starting point is 00:01:44 or generalized inverted index gin. These are two like abstract systems to create new indexes. In this case, my data type is backed by some index. I can give it a name and package as extension. And this happens all the time. And recently, on the wave of LLM exploding popularity, extension PG vector started to grow very well. It was created before chart GPT. GPT version 4 was released.
Starting point is 00:02:18 But of course, this fueled the growth of popularity of PG vector. And even RDS already supports it. I don't remember that requests to add some extension were handled so quickly by RDS. It means something interesting. And this extension also provides a special type of indexes for KINAN search. You find key nearest neighbors for your vectors. It helps you to work with embeddings for ggpt or other LLMs and so on. So it's very helpful to build your own systems using large language models.
Starting point is 00:02:59 And this is just an example of how it can happen. Yeah, nice example. I didn't realize this, but I think PG Vector is, for the Ruby folks out there, something written by, or at least maintained, mostly written by Andrew Cain, who's really famous in the Ruby world for the number of gems. So I don't think he's a listener, but yeah, I'm sure a lot of our listeners are very aware of who that is. Very cool. Right, right.
Starting point is 00:03:27 So just an example. It's a very popular example, but of course, extension can do different things. It can implement not only data type of indexing. It can implement something else. For example, it can be just a set of SQL functions which are useful for administration. It can be, of course, a full-fledged language. And recently also, I think Ardeus also added PL Rust, which has some high demand. Yeah, I think that was a cool thing.
Starting point is 00:03:59 I think maybe the team behind Zombodb are the ones leading the charge on PL Rust. But I could be wrong on that but yeah another extension so there's yeah i've seen a couple of good definitions that i like the simplest was an extension is a package of code and sql objects to extend postgres functionality and so by that definition it's extremely broad And Postgres was extensible before the concept of these. What we're talking about here specifically was added in 2011. I've got down here Postgres 9.1, the ability to package these up in a fully supported manner. But you could extend Postgres long before then.
Starting point is 00:04:40 In my opinion, I think we covered this actually in our episode on why was Postgres so popular, but I think it's been a really key reason that Postgres has been so successful over the years. I agree. Yeah. This property to be very extendable, it's very important. And it's not like marketplace, but to be able to allow people to add what they want to add, this is super important for an open source project. And sometimes we have competing implementation of the same, like to achieve the same goal. And this is normal for open source because it doesn't have a single direction and roadmap.
Starting point is 00:05:22 Well, by the way, it's also a different topic. Maybe kind of roadmap should exist. But in general, if you take some company, usually they have some direction, they have goals, they have strategy where to move. But open source usually it's like, you know, cathedral versus bazaar. And for open source, it's essential to maintain
Starting point is 00:05:44 this bazaar philosophy And for open source, it's essential to maintain this bazaar philosophy where many, many voices are heard from everywhere, right? And people just experiment and the most successful, interesting ideas resonate in other minds, right? And this is how some extension can grow. But of course, we have examples of cathedral style extensions and we know them very well one of them is site also already acquired by microsoft
Starting point is 00:06:13 some people call it super extension this these kinds of extension and we have another another one timescale it's just extension right this is great you don't not a fork right you can install it community edition at least you can install it to regular postgres this is super cool and they they have a lot of stuff happening it's not just data type and index it it's a lot of stuff they have their own nodes and execution plan and so like so many things I think super extension is a really good word for it. And I think so. I think there are some that do so much that you can almost consider them shifting Postgres enough that it becomes a different kind of database.
Starting point is 00:06:56 Like some things that some people, let's say Postgres experts that have never worked with Citus or never worked with Timescale before might struggle to diagnose and help you with certain issues because so many things have changed, but they are implemented as extensions, which is really cool. So I think there's a new one, a new-ish one, I guess it's been going a while now, AureolDB that I put in that category. And I've seen some people put PostGIS in that category as well, even though it's a bit more... a bit more Hold on, AureolaDB cannot be put into this category yet, it's a fork and it will be difficult
Starting point is 00:07:29 I hope they will succeed but it will be difficult to transform to extension. Actually as I remember Timescale was a fork as well originally or Citus some people passed this route, like first it's a fork but then
Starting point is 00:07:47 slowly moving to be able to like maintaining fork is nightmare usually we can look at a green plan for example it's not it's a lot of work you're always lagging behind community even rds yeah right right right well i'm not sure how rDS is. They lag a bit though, right? It takes them a while, Aurora especially. Aurora is a different database. But then you become extension. It's great because it's much easier to start working with new major version. Of course, some work is still required, but you have interfaces,
Starting point is 00:08:21 even sometimes hooks and so on, but it's possible. And AureolaDB has a lot of innovative stuff a lot like a lot of stuff changed and as I understand their plan of probably should talk to them one day firsthand understand and send problems first hand but the ideas is like things should be moved to postgres core and in the future, once these things are moved to the core, to Postgres core, they become extension. This is the plan, as I understand. But now it's not so, just to clarify. Right?
Starting point is 00:08:55 Yeah, I think you're right. And the things that they need to put in core would be things to make it more extensible for others as well. So it's increasing the extensibility is is their goal which is it's cool and and there have been previous projects around this before right around pluggable storage and things so postgres there seem to be people in the ecosystem who want it to be even more extensive even though it's the most extensible database i've ever seen people want it to be even more extensible which is quite cool but there are as well as these super extensions which I think are worth mentioning most extensions are much smaller and much more kind of like small utilities to allow you to do something small
Starting point is 00:09:34 and there's a there's just so many hundreds of them in fact I saw kind of a throwaway joke tweet I think he's said it a few times and probably it's a bit of an in-joke between people, but I've seen people using the Apple quote, you know, that there's an app for that, but for Postgres, there's an extension for that. So if you can think of something, want Postgres to be able to do something, there's a chance that you'll be able to find an extension to do it. Now, it might not be available on your cloud. This is the can this is the key this is the key so we live in time when a lot of postgres users are not actually postgres users but some managed service users and some people say for example postgres RDS is not
Starting point is 00:10:16 postgres which like I thought many times in various places where people get community support they some people come and they say we have this problem and then turns out they use RDS for example or Cloud Seeker or something and then some community members in this chat or anywhere like Telegram for example they say oh it's not Postgres go to RDS. I find that it's not right and I think RDS played a huge role and other managed Postgres services. They still play a huge role in growing Postgres popularity. It's important. But at the same time, they have a very strict list of extensions. And for example, I remember there is such extension called, there is GIN index, generalized inverted index. And it's powerful for large data sets.
Starting point is 00:11:12 It's important for full text search, for search of overlapping arrays and so on. There is a problem with GIN. We want to find everything by sort according to timestamp or according to some id so we want the freshest items go first it's a unsolvable problem for gin so it works not good and people live with it for example 3 gram search also backed by gene and you have this problem so planner chooses either to use b3 and then follow filtering on the fly or to extract everything and order by in memory top and ordering and so there is also extensions called ram and you cannot install it on RDS. RAM solves this problem extending GIN with these timestamps or IDs or numbers and then it's a single index scan you have ordered
Starting point is 00:12:15 result and order by limit n works well. But on RDS it's not available. I requested it many many years ago. I still like RAM is questionable that it can be used in each case because the size of index is huge in this in my cases. But still I want to be able to test it on RDS if my customer is RDS customer also. Right, but I cannot and I asked them they didn't do it and this limits our um so in this like when people say go to rds we won't help you in community i think it's not fair because it behaves like postgres it helps to grow postgres popularity we should like accept this fact right and we should help these people as well it's interesting like. Like we have many animals, all kinds of animals. Some of them really look like Postgres.
Starting point is 00:13:07 Why not like accepting them, right? But at the same time, we cannot add extensions. We won't, right? And the list is limited and we need to wait forever sometimes. Well, if it's PG vector, okay, everyone accepts.
Starting point is 00:13:22 It's very popular right now and they edit it. But if it's some extension like RAM, it's still not there. And sometimes if you go to Heroku, they even don't support PG Repack, for example. I have fresh case, fresh customer on Heroku. And they experience blowout issues. What to do? They say, oh, vacuum full analyzer, it's not good. I say, PGD pack. And then I suddenly realized Heroku doesn't have PGD pack.
Starting point is 00:13:51 Maybe they added it yesterday and we haven't noticed yet. Remember the auto-explain example from a few weeks ago? Maybe in the end of the day, RDS and others are not true open source Postgres indeed. And somehow, like some things should be changed. So either they find a way to support all extensions or we need to find a way to have all the benefits they propose. But somehow, so we have all extensions and we have extensible fully extensible postgres again let's come back to that because there is a there is something new in that area isn't there but
Starting point is 00:14:32 quickly to defend the people that i do have some sympathy for the people that have animosity towards the rds support because they have a commercial product and people come to support for support for it from a free community and for years amazon famously didn't contribute that much back to postgres aside from the marketing which i think shouldn't be overlooked but you know development wise there was much less involvement than there is now so i think some of it the hangover from that but coming back to the yeah so heroku i think of a bad example because i don't think they've implemented even some extremely popular extensions but i think the amazon list of extensions is that is relatively like a relatively reasonable list of what the most
Starting point is 00:15:16 popular postgres extensions are there are some exceptions i think they support some things that i've never seen anybody it's it's actually a good point. I use it in the cases when customers have self-managed Postgres. I sometimes use this approach. I say, you know, we should add this extension. Even RDS has it. Yeah, it's like proof that it's somewhat popular. Yeah, so it's verified, you know. It's good.
Starting point is 00:15:40 I agree this is good. Like this helps me to sell some extension to some customer, right? I mean, to have it, to install it. While we're on that topic, I think there is PGXN as well, the extension network. There are a couple of these not fully official but pretty official. I think it's that. I mean, it looks like a lie, but it's that, this idea. It didn't become like the marketplace of some registry.
Starting point is 00:16:07 When I see PGXN, I usually understand. For me, it means I will have some troubles installing it because I need to install these additional steps. I don't like it at all. It doesn't work because, in my opinion, the failure was because it was done by community, like, and so on. And if a website was polished, if it was very social-oriented, like, ratings, comments, and so on,
Starting point is 00:16:33 probably it would have chance to be, like, some single registry of extensions. But we don't have this. Like, Postgres doesn't have good registry of extensions right now. And it's like, there are many lists, and PGXN is just the ability to have some standard way to install it and standard way to count all of them, to have registry.
Starting point is 00:16:55 It doesn't work. I think it doesn't work. There are new attempts. I remember some, like CoreDB launched recent attempts. Yeah, they called it PG trunkunk and also Superbase, right? DB Dev. Superbase, yeah, yeah, yeah. Interesting to observe what will happen.
Starting point is 00:17:12 I have low expectations, but maybe something will work. It's good to see such attempts. I also had ideas to do this and realized like you need to be very good at marketing here so for me the most important would be reading comments and people opinions, feedback real experience but PgXn doesn't provide this
Starting point is 00:17:35 for installing it's a headache for me to use it for installation, I don't know value is not there, it's like source force it became that eventually source force also was good until it wasn't right so yeah sorry for
Starting point is 00:17:52 direct and maybe like for some people offensive opinion but this is how I feel and I think I'm not alone here but yeah it definitely feels like extension providers currently still link to, people generally link to, let's say, their GitHub to show you.
Starting point is 00:18:12 This is one of the advantages sometimes a cathedral would have over a bazaar. They can tell you if it's a closed source system with extensions or an app store, you can point at the official one and everybody points at that. You can see your reviews there. You can see... Documentation. All sorts. Yeah, exactly.
Starting point is 00:18:30 But we don't have that. So yeah, it's interesting to see the efforts. A lot of effort has gone into them, but I've not tried to use it. There is some demand in such registry of a huge list of all extensions, maybe not only extensions, but also Postgres-related projects
Starting point is 00:18:46 like UI tools and CLI tools, everything. Backup tools, a lot of stuff. Of course, it would be great if someone would maintain some registry with comments, reviews, and so on. Maybe someone would... I see some efforts from different areas. Maybe some of them will succeed but let me notice
Starting point is 00:19:07 like let's just think about it extensions currently probably one of the key areas where managed postgres providers compete yeah they they compete in like supporting more and more extensions. This is good, but still I think it still feels like not open source because I want full freedom. I want to be able to install my extensions I developed yesterday right now. Why not? And I cannot do that. Well, I think the reason is security, right? I can name you five reasons.
Starting point is 00:19:46 Understandable, right? security right like i think that's the i i can name you five reasons understandable right so like if if security and if we support something we are responsible yes like for example what if extension developer stop developing it and we find a bug who will apply it this this logic is clear absolutely but open source source is not like that. Open source is freedom, right? We should be able to go wherever we want. And I think it's a problem, unsolvable if you close these doors. You can try to maintain a lot, a lot, a lot of extensions, be responsible for them at different levels.
Starting point is 00:20:26 For example, Stargrass maybe has the biggest list of extensions supported, more than 100. But what happens if some extension, like not maintained by original author, who will fix bugs? It's a question. But still, Stargrass, I think, has the biggest list probably, like in terms of extensions supported.
Starting point is 00:20:48 And it's interesting. But can I install my own extension there? Probably no. I need to go to them and say, add it to your distribution. But it already feels better as more like open source than managed cloud providers. It's easier to add so it's a difficult topic responsibility is a difficult topic i think you're completely right i think it we probably have to mention here the project amazon announced last year i think it was last
Starting point is 00:21:19 year of tla which is itself an extension which is quite confusing. At least it confused me for a while. But yeah, it's Trusted Language Extensions is what TLE stands for. And I was originally quite excited by this. I thought this was Amazon trying to really let you bring your own extension to... Everyone was in this trap. I also did. I saw the extension. There is some need here. I want to install anything, but they cannot solve it because of this responsibility problem. And when they released this TLE, many people, including both of us, started to think they
Starting point is 00:22:04 solved it somehow. But no, no, no, no, no. This is only for languages. It's very different, right? It's not for extensions. Languages are extensions as well, but specialized extensions. I might have misunderstood or misremembered this, but I think they also support some very basic level things that you would consider extensions. For example, you've just got a few small plpg sql objects for example i think those could be packaged up like this but they're again they're very they'll be very simple and very small and i think it's more about cloud it's more of a cloud provider thing to let people bring some of their own code and maybe let a DBA allow their team to
Starting point is 00:22:47 install or not install certain packages that they've developed in-house on their database or not so it's I think it's very it's a very small subset of what current extensions can do is this allows but maybe that's like maybe that's just step one and they've got step two to ten planned already and this is like maybe it could be a cross cloud provider collaboration effort on how they can let people bring extensions to their clouds but yeah i'm i understand it is difficult yeah well for me if if i again like let's like i do some bottom, what I have in my head, and we will probably discuss something else, maybe more useful for users. Because I think we discussed some philosophical questions, which are very important, but maybe not that helpful for users. Actually, they're helpful because they need to make decisions.
Starting point is 00:23:39 Go to RDS, go to Timescale Cloud, go to something else, go to install from Kubernetes operator or maintain everything themselves. So currently we have not good situation. Extensions are like very limited. Okay, list is growing, but limited. And full power of open source is full freedom. And managed Postgres cannot provide it. And this is not right. I mean, if I want,
Starting point is 00:24:12 I pay you, right? And this is the resources you provide and Postgres you provide with some additional automation, but I paid you a lot. And if I want some extension, I would like to have it, even if like accepting some responsibility on my shoulder. So I say, okay, I understand that something can be broken if I install this extension. But this right should belong to me. I'm like paying, this is open source. Something is totally wrong in industry right now here. Totally wrong. Like open source money I pay you,
Starting point is 00:24:39 and you tell me I cannot use the power of open source. I cannot install all extensions. I think something should be changed in the next five years or so because right now it feels, for me, like Oracle. I think you're onto something, but I might even use the example that we're currently living in a world where most cloud providers act like Apple, and they have an app store,
Starting point is 00:25:01 and they very much have to only allow a small number of apps onto it. And maybe what we're missing is a cloud provider that acts a little bit more like Google or Android that allows a lot more. Perfect example, exactly. Which is based also on Java, a lot of things are open source. It's also Linux, right? I mean, their operational system. And it can be installed on various devices, not only one single vendor. This is a great example, by the way. So, yeah, I think something like that is missing in our Postgres ecosystem. And we observe those dozens of companies who compete in the area of managed Postgres
Starting point is 00:25:40 and Timescale who switched to the idea cloud first, not only cloud first, but something else can be done here and something else should appear soon I think. And this thing that will appear will not limit us in the area of extensibility of extensions. I should be able to install
Starting point is 00:26:00 extensions. Extension I developed yesterday. Deploy to my production, Why not? Quick fire. I think we've not got that long left. So I think we should do so. I think you've made a really good point that we haven't done that many practical things. I've got a couple of tips for people.
Starting point is 00:26:16 I imagine you'll be able to add a few more onto this. But one thing to be aware of is major version support. If you're doing major version upgrades, please consider your extensions and check compatibility. Everyone forgets about it. Yeah, everyone forgets, but it's a major issue. And then the second thing is before you check, before you're using an extension, do check how it's maintained.
Starting point is 00:26:39 I think I quite often see people having to ping maintainers to add major version upgrades, major version support. So some are maintained impeccably and some it's a little bit slower. So just make sure that they're in line with what you're going to need and expect or that you could have the resources in-house to help them out if you really need to allow it. But each extension is a small separate product. Sometimes not that small if it's a Cytos or Timescale. But anyway, it's a separate product and it might introduce problems. They have usually smaller release cycles than Postgres if they are developed actively. But if you just install without thorough testing, you might have issues.
Starting point is 00:27:23 I had issues like some extension was updated and whole Postgres started to have segmentation faults. And we need to troubleshoot to find that it's not bad. Bad minor upgrade of some extension caused issues. So it's an area where we need to invest some efforts to maintain. But we shouldn't upgrade, but we should test it as well, right? Yeah. And for this, I would pay. I mean, my ideal thing, through open source Postgres,
Starting point is 00:27:58 installed anywhere, and supporting any extensions, some of them are maintained by this provider and I pay for it because they tested very well all upgrades and they automate those upgrades and so on. None of managed Postgres providers we currently have do this yet so this opportunity I guess. Sounds like it. and finally what's the most popular extension we have everyone knows
Starting point is 00:28:28 it right pgstat statements do you count that what does it mean it means it should be not an extension I've just
Starting point is 00:28:37 returned from from Ottawa where I had pgcon tutorial and you can find on twitter like a lot of
Starting point is 00:28:43 likes slide deck unfortunately it was not recorded. Probably we should repeat it. And I was telling basic things, TPS, QPS and latency for queries. Only TPS can be provided by Postgres itself.
Starting point is 00:28:58 For QPS and for latency, you need to use substatements. It should be in core. I like that. Maybe every five years, the most popular extension there's some effort to put it into core. That should be in core. I like that. Maybe every five years, the most popular extension, there's some effort to put it into core. That would be amazing. Yeah, if something is big, it should be merged, right? I like that.
Starting point is 00:29:13 Well, I think that's probably all we've got time for. So I'm sure we'll go into some of those topics in more detail in further episodes. But any parting things to leave people with? Well, I also wanted to mention there are some third-party registries of extensions. There is awesome Postgres on GitHub.
Starting point is 00:29:32 It's not only about extensions, but it's a good list of manually curated list of extensions and not only extensions. And it's good to check. It's kind of what people consider consider interesting and there are so many extensions out there actually there is also a saying that for each problem there is an extension it's already solved for example you you showed me today like uh someone said maybe it's a good idea to have a timestamp with time zone store and also answer was there is already extension for
Starting point is 00:30:09 that so yeah not just any old person that was robert haas and robert tree oh right yeah right great well thanks so much nicolai thanks everybody for listening and catch you next week thank you bye

There aren't comments yet for this episode. Click on any sentence in the transcript to leave a comment.