Postgres FM - What’s Missing in Postgres?

Episode Date: April 3, 2026

Nik and Michael are joined by Bruce Momjian to discuss his new talk "What's Missing in Postgres?" Here are some links to things they mentioned:

- Bruce Momjian (EDB) https://www.enterprisedb....com
- What's Missing in Postgres? (Bruce's slides) https://momjian.us/main/writings/pgsql/missing.pdf
- The Wonderful World of WAL (Bruce's slides) https://momjian.us/main/writings/pgsql/wal.pdf
- Getting started with benchmarking (our episode with Melanie Plageman) https://postgres.fm/episodes/getting-started-with-benchmarking
- Myths and Truths about Synchronous Replication in Postgres (talk by Alexander Kukushkin) https://www.youtube.com/watch?v=PFn9qRGzTMc
- The Future of Postgres Sharding (Bruce's slides) https://momjian.us/main/writings/pgsql/sharding.pdf
- MultiXact member space exhaustion https://postgres.fm/episodes/multixact-member-space-exhaustion

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

- Michael Christofides, founder of pgMustard
- Nikolay Samokhvalov, founder of Postgres.ai

With credit to:

- Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. I'm joined, as always, by Nik, founder of Postgres.ai. Hey, Nik. Hi, Michael. How are you? Good. And we're delighted to be joined by Bruce Momjian, who is VP, Postgres Evangelist at EDB and a long-serving Postgres core team member, who recently gave a new talk about the missing features in Postgres.
Starting point is 00:00:23 So welcome, Bruce. It's an honor to have you here. Good to be with you. Wonderful. Perhaps we could get started with why this topic? Why is it important, or why is it important to you? So I'd love to say that I thought of this topic and I'm a genius, but no. Actually, Melanie Plageman came to me in Riga, at PGConf.EU in the fall, and she said, we'd really love you to do a new talk at the PGConf.dev conference in Vancouver. I said, okay,
Starting point is 00:00:55 and she said, how about things that aren't in Postgres? I thought, wow, that's a great idea. So then, I guess, I was showing these slides to Robert Haas once I finished them. And he's like, wow, Melanie got a mention. I'm like, yeah; I said half the job is figuring out what topic to do for a talk. So he said, I'll have to think about that now: I can get a mention just by thinking of a topic. I don't even have to do the talk.
Starting point is 00:01:22 So, yeah, I thought it was a cool idea. I never would have thought of it. There are a couple of my talks where you'll see, in the introduction, a "title concept from" credit. Because, again, I have about 65 talks, I think, now. Maybe 67. It's hard to think of topics. I've been really lucky, because I did two new ones in the fall of last year. One is this talk.
Starting point is 00:01:52 Another is The Wonderful World of WAL, which is about the write-ahead log. And then I have a third one, which is more of an AI talk. I already have two AI talks; this is another one, called Building an MCP Server Using Postgres. So I've got three new talks queued up for 2026, which is unusual for me. I write my talks and basically give them for maybe a year or so, and then come up with a new one, because I've been doing it for 30 years.
Starting point is 00:02:19 So on average, I'm making two talks a year. I have 67. But this year I get three. So I was like, wow, this is great. And Melanie, I have to give her credit for the topic on this one. Oh, that's awesome to hear. Friend of the show, Melanie, she's been a guest before. It's good to hear. Yeah, I have a question. If my memory doesn't play games with me, 20 years ago I remember you maintained the to-do list. That's right. Isn't this a to-do list? Is this a to-do list? I would say no. The to-do list is basically driven by what people have asked for. It could be a small thing,
Starting point is 00:02:55 it could be a big thing this is more of the I would say the big missing features and a lot of them are ones that we have no intention to doing so it's sort of like let's step back let's look at what big things
Starting point is 00:03:10 were missing let's look at what big things we may be working on and then look some of the big things that we'll probably never do so strategic much more strategic you're right if I drill down to the do list would be here forever but again I think it is valuable to step back and look at what is missing and what isn't and I even learn something just by writing the talk because stepping back far enough and looking at the missing things that so I think
Starting point is 00:03:40 I have what is that 12 items and what's interesting is 11 of the 12 items relate to performance that surprised me. If you had to guess beforehand, what would you thought more of them would be? I thought it would be basically missing, basically operational features or missing security features. Well, that's TD is that one. That's one of the 12, right? Missing just integration, infrastructure, some of these big missing things. But when I actually wrote them down, I kind of.
Starting point is 00:04:19 kind of like, wow. So 11 of the 12 are basically related to scaling either on a single host, seven of them, or multiple hosts, which is another area we need to work on. So you got really two kind of 11 of the 12 performance, seven of them are for single host and four of them are for multi-host. And again, I just never suspected that it would be a performance need and not an operational need. If I had done this 10 years ago, I know there would have been a whole bunch of things that... I can bring some ideas. Maybe I should do it in the end, which are not performance related and definitely big missing pieces. And they're not included to your list. So maybe... I have... What's interesting, I've given this twice, both in January, one in Prague, one in Brussels.
Starting point is 00:05:09 And nobody really came up, in those groups, nobody really came up with anything that was wrong. The only I think feeling I got was some of the people were sort of like, well, Oracle Rack has some value in these small use cases. And I'm like, yeah, so there was a little bit of, eh, you don't have this Oracle thing or multimaster. Like, well, maybe there was some value. So I get it. Yeah, it's there. But I didn't get anything that sort of shocked me. So I'll be interested to see if you have anything.
Starting point is 00:05:40 I'm sort of like, oh. Yeah, I want to hear this. Why don't we do it straight away? Sure. Right now, let's discuss this. Okay, I'm looking at your list and I don't see, for example, synchronous replication improvements because right now, synchronous replication is not all right.
Starting point is 00:05:59 Maybe you watched Kukushkin's talk last summer talking about caveats. And this is a big topic, for example, when there are new people coming to Posgues ecosystem from my system. them from my SQL, for example, and one of the big group is Multigress, who are like creators of Vitas. Yeah. Right. So some words from them, some words from Kukushkin, and I see synchronous replication is
Starting point is 00:06:26 not all right because it's based on logs. And if primary already written data and then it suddenly starts, primary things data is written, standby, never accepted. This is one thing. another thing is that if you have logical replication involved, there is a nightmare to handle that. Basically, you lose during failover with issues, you lose logical replica. And looking at other systems, including MySQL, there is a big potential to improve synchronous replication. Basically, some people won't say names, but just watch Kukushkin's talk.
Starting point is 00:07:05 It's broken right now. So I'm working on the PG-19 release notes. I do see improvements in both of those areas in the release notes that I'm working on. They're not done yet. It won't be done for both two weeks. But it would be interesting if you could come back and take a look at see what was done. So what happens in that particular example, both examples, you're now out of the 30,000 foot and you're now getting down.
Starting point is 00:07:32 And there's totally stuff. Sorry, I'm not getting down. when people with enterprise and big cluster experience come, they say, where is synchronous application? We cannot build proper clusters. And what I'm telling them, most people still use a synchronous replication and they cannot believe. They cannot believe. Really? Oh. Yes. How can they say? Other systems have proper synchronous replication with proper consensus algorithm. This is another topic. There are many attempts to bring consensus algorithm or something inside PostGGIS, right? That's similar to Puler question.
Starting point is 00:08:06 It's always there are many ways. And so this topic is not about performance. It's about H.A. No, what I'm saying, I'm saying, Dan, what I'm saying is I'm trying to look at this at a very high level to look at big holes. What you're actually looking at is we have the feature, but it doesn't work right or it needs whatever. I get that. But again, if I go down that level, I'll never get done, right? I'm trying to, from the top, look for these big holes.
Starting point is 00:08:33 holes and actually to me not only look at the big holes but look at why are they holes right because a lot of people who aren't me or you who are sitting watching everything is happening all you 20 hours a day right they're like why don't what's the problem what's the holdup and then the other thing for me is by by having that 10,000 foot view I see a pattern clearer than I could see when I was down in the... I'm not disputing that we have... That these things are not feature complete, I would say. And it's not atypical for us,
Starting point is 00:09:11 particularly with partitioning and logical replication and failover to take years as you to get to a feature complete point. And I think we aren't at feature complete point a whole bunch of areas. But I tried to stay away from, okay, we're not feature complete here, we're not feature complete there.
Starting point is 00:09:27 I'm like, what things are we not even like moving in? And there are all. a bunch of them that I would say we're just not engaged at all. Yeah, I get that. That's a good description. And when you're at this highest level, you see a lot of performance-related stuff, right? That's right. At that level, I see a lot of stuff that we're not even in the ballpark in, that we're just scratching the surface. I'm not saying all the things we've done so far that have been implemented or done perfectly or feature completely, clearly or not. But at least I feel we're moving there. What I see up top is a whole class of things where we're not even engaging and we may not need to
Starting point is 00:10:08 engage. It's more of a thought experiment for people to come in and say, okay, what things are they not even in the ballpark on? And it's helpful to know because if you look at the list, it's not a terrible list. And if you think this is the biggest stuff we don't have, we must have a lot of stuff, right? Because I can't think of any other big things except these 12. There might be one or two that somebody will think of at some point. But I think for me, it's a great exercise in seeing where we are at that big level. Where do we think we need to go five, ten years out? That's usually the area. That's the distance I'm looking in, not necessarily what comes in 19 or 20. And I think some of the stuff you're looking for is coming in those releases. But what stuff do we really need to, at least
Starting point is 00:10:54 to evaluate and say, is this a direction we need to go in or not? Looking through the list, it struck me that there's a bunch of things that you can currently do, but you need extensions or a separate tool. You can use Postgres and get those features, but you need an extension or another service, like PG bounce or something. But there's also a bunch that you literally can't do with Postgres, so you would stick with maybe a proprietary database, or it's just a different solution completely. maybe a fork or something like that.
Starting point is 00:11:26 So it felt like there's that kind of two categories there where there are things you can do with progress, but you need something outside of court, and there are things that you can't do with, like you need to do? So is there one of those that's more interesting than the other in terms of bringing into core? That's funny.
Starting point is 00:11:43 You mentioned that. I really struggled with that question. And the audience struggled with that question. I remember people saying, hey, what about this and what about that? And I think the version, these slides are, regularly updated. Like every time I give a talk, I think it's something that I'm just looking at one of the one of the capitalizations I did wrong. Anyway, I'll fix that after this talk. But my point is that you're
Starting point is 00:12:05 right. You have there's like concentric circles. So if you're willing, if you're willing to go to a proprietary fork of Postgres, it's a big circle, right? And then if you're willing to use extension or we're external tools. Let's say external tools and then proprietary extensions and then extensions and then the core right. In fact, maybe I should have a slide for this to illustrate what that span is. And in a lot of cases, we will say we have it. This is just great. You ask that question. Nobody asked it quite the same way you did. So the question, some of the complaints we have are not that it's not available as either an extension or an external tool or a proprietary work of Postgres.
Starting point is 00:12:59 The complaint we have is it's not in court postgres. Yeah. Okay. Or that there's an extension, but the extension, because it's an extension, doesn't work exactly the same way it could possibly work. It was in court. Or if it's a third-party tool, the third-party tool is not as effective. or easy to use as it would be if it was integrated,
Starting point is 00:13:23 which is the connection pooling example. Or there's a choice of them. Like, why choice of them? Right, there's a choice of them. Although the choice of them I kind of like, particularly in the PG-PoG bouncer realm, there is actually a reason for that. But my point is that some of the missing stuff,
Starting point is 00:13:41 even something like column there, right? You could say we have extensions to do that. But yeah, I guess, but I'm not really happy with that. and I'm not necessarily wedded to Citis and it has limitations and because it's an extension, it might not be available. So you get all these sort of complaints. And it brings you to a bigger philosophical question of does everything need to be in core, right?
Starting point is 00:14:06 It's probably not. I think PGVectar is a great example of something external that can be developed on its own release cycle. And it really doesn't hamper its ability to be used effectively with the code. But when you're starting to talk column there, you're starting to talk connection pooling, where you have problems with authentication going through the connection pooler. We've tried to improve that, but it's still kind of yucky. Logical replication at DDL, you know, another one that's available as an external proprietary product,
Starting point is 00:14:35 but not in the code. TDE is a classic that's available in a bunch of external stuff. It's not available in the code. So it's sometimes when people complain or it's missing, they're like, what I tried to focus on was basically in this talk, is it available either in Postgres or as an extension that has no downsides? That's my litmus test. And anything that has limitations and is an extension, limitations as a third-party tool
Starting point is 00:15:06 or obviously as a proprietary fork, those are going to be effectively not available in the community. And we have to decide, is that a good place to be, a bad place to be? maybe that's fine. I really like the concentric circles thought, and I want just on the kind of the last two, is there also a difference in your head between contraib extensions and core?
Starting point is 00:15:31 Like, for example, PG stat statements, a lot of the PG stat views are on by default when people install Postgres, but PG stat statements isn't. So like, that's an interesting. I don't know if that's too semantic a discussion, but that feels interesting to me is that missing or not really,
Starting point is 00:15:47 missing. You're right. That's another concentric circle. Yeah. So yeah, see, like PGVector, maybe that would be nice to be in Contrib instead of being external. But then if we pulled into Contrib, then it's release cycle becomes tied to the major releases. Maybe that's not a good idea. I don't know. And when you start to talk about the cloud vendors will support some Contrib extensions and not others. Yeah. And you get people pull me aside, hey, why don't you move this into Core, get out of Contrib and I'm like, what's the matter with Contrib? And you're like, well, my cloud vendor doesn't support it. I'm like, okay, so you're complaining to me that your cloud vendor doesn't support
Starting point is 00:16:24 something worried shipping is that you think that's going to be like fruitful or, you know, and they're like, well, why isn't it, you know, but there's a variety of reasons that stuff is in Contrib, it may be an edge use case. It may be like PG stat statements, creates its own tables and has its own overhead that we're not sure everyone would want. So that kind of makes sense out there. Some of them, like an oddball one, like Cube, an edgy case, PG-Trigram.
Starting point is 00:16:59 I don't know. It's hard to say. Is it better than extension? Is it better to be better in core? You can make a case either way, I would say. Nice. From the list, there are any that are like particularly your favorites or ones that you would love to see progress on?
Starting point is 00:17:16 Or do you not like to pick favorites? Yeah, the two that I've really felt called to champion are cluster file encryption, which is called TDE in the industry and sharding. Those are the two that I have the biggest impact in terms of adoption and expanding Postgres workloads for me. The other ones are interesting, but they're more, I don't know,
Starting point is 00:17:42 don't see them as expanding the Postgres adoption universe. That's, I guess, that's the phrase I would use. Yeah, the other ones are good, but they're more operational. Oh, it'd be nice to have this. And, oh, I could, I would be more optimized if I had this other thing. But for me, the sharding and the TDE really take Postgres to a new level. And I would say in both of those, I've pretty much either failed or executed poorly in terms of, because I've been working, at least championing those for at least five years now. And I don't feel I've made a lot of, I guess I've been waiting for kind of a groundswell of support because both of them are very hard to implement as an individual. I got close with TDE, but got stuck on something. And sharding, I think, is,
Starting point is 00:18:30 it really requires a team. And I just, I don't think I've seen the kind of desire behind that. I think part of the reason for sharding is that machines keep getting so big. that effectively you're better off just buying a bigger machine or renting a bigger machine than going down the sharding route because the sharding route really is very workload-specific and there's an increasing dislike of workload-specific things in an enterprise. There's a much more pushed toward just generic compute, generic solutions, generic, and the sort of, oh, I got TISA with special chips on the drive, which filter stuff, and, oh, I got this special, whatever, special hardware does something.
Starting point is 00:19:23 It just, the enterprise focus is a lot less on hardware now, a lot less on infrastructure, a lot more in solutions. Maybe that might be good. I'm not saying good or bad, but I think that, I think sharding has, as people have needed sharding, systems have increased in size to the, really in pretty much. much in lockstep to what they needed. So the number of people actually need it is limited, and the number of people who are willing to put in the effort to architect it is also limited.
Starting point is 00:19:50 Yeah, I agree with you that hardware available right now is so huge. And recently I saw a cluster which had default vacuum settings self-managed. So usually like RDS or others, they have tuned. But it was the default of the vacuum settings, three workers only, nothing tuned, and it was 20 terabytes and somehow surviving. It was insane, absolutely. So how was it possible? But at the same time, HDS and CloudSQL and others, they have hard limit. I know they work on it.
Starting point is 00:20:29 And Aurora has more 64 terabytes. And these days, to collect that amount of data is not rare already. even one person surrounded by EIs can collect a lot of data. This is one thing. Another thing is that there are hard problems, for example, lightweight lock manager problem, which was solved in PostGIS 18. There are other problems as well. When you grow to some heights, you encounter with,
Starting point is 00:20:55 some people would not like to spend time firefighting, right? So Shardink is really needed. But I guess, what's your take? For me, it's like it's really hard to find consensus, how to achieve it. And also sharding topic usually triggers topics like internal connection pooler, for example, because they are like adjacent. If you think, if you like talk about routing and so on, you think about also like
Starting point is 00:21:18 auto-failover, you think about connection pooling. And these topics usually come together in my head. But there are other ways to do sharding, right? So there are many ways and what your take. Will it ever be implemented in core or no? I've always been the opinion. I do have a sharding talk on my website, but I've always been the opinion that
Starting point is 00:21:38 the sharding has to be really developed organically within Postgres, and it really has to be built on effectively partitioning foreign data wrappers and parallelism. I don't think we're going to have the appetite to create a whole new architecture for sharding.
Starting point is 00:21:56 And every year that goes by, it becomes clearer that, yeah, we just can't. So I think we can get closer than we are now. with little impact. I think if we, for read only sharding, I think we want to do read right sharding, then we have to have a global lock manager,
Starting point is 00:22:12 global snapshot manager, becomes much more complicated. Yeah, like I keep going there and people who think they need it effectively don't end up needing it or end up getting bigger hardware or re-architect what they're doing and then they don't need it anymore.
Starting point is 00:22:29 So I think that's one of the reasons it hasn't moved forward. but I do think, I think for right only sort of a data warehouse kind of sharding, I think within two years we could have a pretty good solution in the industry. But I just haven't seen a lot of, there's one guy in Fujitsu was working on it. But that was about it and just everyone. And frankly, it's been pretty dormant for the past two years. And I haven't had time to work on it much either. Yeah.
Starting point is 00:22:56 If you stay on a single primary cluster, there is a big limitation well known for those. who achieved some heights, whether you have like 200, 300, maybe bytes per second, and we talked about this a lot with folks who develop sharing systems. You, like, you hit the limits of single-threaded process of logical, physical, logical, wall-receiver,
Starting point is 00:23:20 basically, not wall-receiver, a replay process, right? In it, it shows up as a neat and top, which basically replays the changes from the primary and it's a single-threaded process. And I know also somebody from Japan working on it since 2013, I think, on and off. I saw some conference talks and so on. And this problem is not included to your list.
Starting point is 00:23:44 It's not a small problem if you achieve heights. What do you think about that problem? Yeah, so as I remember, that was the one where they're trying to create a dependency graph from the wall and therefore identify which parts of the wall are paralyzable. and send those off to workers. CPUs do that now with CPU instructions. They figure out which parts can be run on different cores or in parallel core. Yeah, you're right.
Starting point is 00:24:13 We don't have threading, so, yeah. I don't know threading is, threading's on the list that I have. In the list, yeah. I'm not sure threading is a requirement for that. Because we do have parallelism without threading now. It seems to work fine. because you load them into shared memory queue and the process just pull out of there.
Starting point is 00:24:35 I don't know. I think you're right. I think that would be getting a paralyzable replay of logical replication would be physical doesn't seem to matter too much because it's so fast that...
Starting point is 00:24:48 Hold on one second. Logical, I think already if there is work maybe in Postgust 18 or something, I remember something for logical to paralyze it. Paralyze it, okay. I think so. Technically, you can paralyze it with multiple publication subscription pairs, multiple slots, but it will be having problems
Starting point is 00:25:04 with foreign keys. It will be like eventually consistent in terms of referential integrity. I'm talking purely about physical replication, which is... Oh, physical replication, okay. Exactly. And this is like, this is limit some companies hit, and it's super painful because there is no escape from it. You need to do either sharding or vertical split.
Starting point is 00:25:27 Yeah. you need to achieve some like a lot of rights basically. I think the reason we don't hear about it a lot in the community level and this might be different from the level that you're working at is we're more of it. We're definitely a general purpose database. And not that we can't go to the heights and we keep pushing the ceiling up higher and higher, but there is a limit to how much. much complexity we're willing to add and potentially unreliability to the product to
Starting point is 00:26:02 Postgres to get up to the super heights and it's possible that's why you know I know you specialize in that area that's possible we have what we don't hear a lot about it that didn't even that didn't get on my list is something that I've heard a lot about I don't see many email threads really addressing that this is so but I know for example in stock art he did years ago during COVID actually and there are others who hit it. Sometimes people hit it and don't notice because, okay, some replicas are lagging a little bit and so on. Yeah, but I understand it's not a super common problem. We had a, EDB had a customer who hit it on specialized storage hardware, and we identified that it was the storage hardware that was causing it. So that was, yeah, that was kind of a,
Starting point is 00:26:50 get new hardware, get new storage hardware, your problem will go away. So that was, I think, the answer to that one. that's the only case I've heard of replay complaints and lagging sounds good but what's your take on threading topic because there was a big impulse a couple of years ago from heike originally i think yeah yeah i have a blog entry about it which kind of from 2018 which references i think it might even references heikis
Starting point is 00:27:23 Hakey's, let me see if it references Hakey's email. No, it doesn't. I think we're still on the fence on that one. We keep chewing away at some of the small things like getting rid of global variables, I think was one. And I think the other thing that's going to bail us out here is that we now have, it appears as though we never have compiler support for this. So instead of having to re-architect all our code, it
Starting point is 00:27:53 looks like there's some if we can get rid of the global variables which is pretty simple it looks like the compiler will sort of read local a lot of stuff for us to make it easier without us having to like really re-architect a lot of the code so i'm hoping in that area but my understanding is that the threading prototypes don't seem to gain a whole lot they don't at least from what we found because it reduces task switching time. But again, for example, if you're doing a GUI, you really need threading because you're updating a whole bunch of stuff on the screen at once and you have one of all these background jobs to be running in your dress space and so forth
Starting point is 00:28:38 to update a whole bunch of very light processes and light statuses. But when you start to talk about a database, there really isn't a whole lot of light out there. Yeah, I guess we could do index lookups with threads. That would be cool. Visibility bits, maybe? Visibility, yeah, but the problem is there's, a lot of the stuff is really heavy. For example, when you're trying to paralyze a query, those are not light, right? So you might as well just get a process and run it and have its own everything.
Starting point is 00:29:07 So I think that's what has slowed it down. Another problem, obviously, is resiliency. Wait now if obsession crashes or runs out of memory or whatever, we keep running just fine, unless it's in a critical section. whereas with threading, it would make us less resilient to sessions that misbehave, and we have to balance those two off. We know the value of threading and then the less resiliency of it. I don't really know the answer.
Starting point is 00:29:33 I've been surprised we haven't done more in this area, but it appears to not be a huge problem if I'm not hearing about it every couple months. Let's talk about huge problems related to four-byte transaction ideas. which is included to your list, 8 byte transaction ideas. And some forks have them for years already. And also, I saw some preparation work and so on. I'm very curious what are your forecasts and take on this. But also, which is not included to your list, is on dualogue and redesigning MVCC
Starting point is 00:30:10 and these topics, which were quite popular a few years ago. There are some efforts, but now, like, silence. Maybe I don't see it. No, I think you're right on the NBCC. I mean, we had the Zheap effort by Robert Haas. That was probably 10 years ago. And he really poured into it. But I think the job just got too large.
Starting point is 00:30:32 And I think you also, once you start to look at what undo requires, things get very complicated. So I'm not saying it's not doable, but you end up with a lot of complicated challenges. and Oracle suffered from those challenges for decades until they finally figured out how to deal with them. So what would we gain by having an undo? Certainly it would make updates easier. I'm not sure it would help with something like deletes or aborted inserts. Who needs to do it?
Starting point is 00:31:11 Yeah, just update be done with it exactly. It's all the bloat and vacuum issues, I think, that people are most excited about getting rid of. Yeah, I mean, but at the same time, they keep improving vacuum every release. I saw it in the thing. You have today's vacuum. You have the vacuum from two years ago. You have the vacuum five years ago. Which one are we complaining about?
Starting point is 00:31:30 Because, in fact, it has gotten better. I'd love to just get rid of the whole thing. I don't think we, even with an undo, I'm not sure I'm ever going to be able to get rid of vacuum. Because it has different jobs. as well. Right, because of the problem with aborted inserts and deletes and so forth. So what is that going to look like? I don't know. I don't know. I'm surprised of how long we've been able to stay with what we got, what, 30, 40 years ago. Much more simpler architectural setup. It's funny, I had a discussion with
Starting point is 00:32:06 somebody, when I was in Geneva for the CERN PGDay, and somebody came to me and they were talking about page compression. Do we do page compression? I have a blog entry about it, and I showed it to him. I hoped he would read it from my phone, but he didn't want to. I said, why do you want page compression? It requires less disk space. I said, this is not the 1990s. You know, what's the actual reason? I said, I understand why it was needed like 30 years ago. But I'm not sure now, because once we compress a page, then obviously, if we do an update and the new data can't be compressed as well, then we've got a bigger page, we've got to put that somewhere, and where do we put it,
Starting point is 00:32:53 and then how do we deal with the index entries? I'm like, what are we really gaining? Once we're all done, all that computation, all that moving the data around, what have we really gained with the compression? Now, we can use storage compression, right? You can, there is storage compression. I think Postgres can run on that. But again, that pushes the problem down to the storage layer. I don't know what performance would be like, because frankly, databases don't like compression
Starting point is 00:33:17 of that size. But I guess my point is that he was asking for a very specific thing. But when I asked him why he wanted it, he really couldn't articulate it, except, okay, it uses less disk space, but I wasn't sure what the goal was there. For me, compression, as in let's compress data, is more related to column storage and what you can compress there. I told him that. I said, if you're talking column storage, that's a completely different setup.
Starting point is 00:33:47 And we do have solutions for that. But again, you're assuming a lot of duplicate data in the same column. And that's a different case than compressing an 8K page on disk. And he's like, no, no, no, I don't want that. I want the one about the page. I don't see anybody working on that. It's like, okay. We don't serve this in this restaurant.
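As an aside, column-level compression of the kind being contrasted here does exist in Postgres today for TOASTed values, separate from any page-level scheme. A minimal sketch, assuming Postgres 14+ built with lz4 support; the table and column names are hypothetical:

```sql
-- Per-column TOAST compression (Postgres 14+); affects newly stored values.
CREATE TABLE events (payload text);
ALTER TABLE events ALTER COLUMN payload SET COMPRESSION lz4;

-- Store a value large enough for compression to kick in.
INSERT INTO events SELECT repeat('abcabcabc', 10000);

-- pg_column_compression() reports the method used, or NULL if uncompressed.
SELECT pg_column_compression(payload) FROM events;
```

This compresses individual values, not whole 8K pages, which is exactly the distinction being drawn in the conversation.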
Starting point is 00:34:09 Yeah, we don't serve that. In terms of 64-bit transaction IDs, I think it's a little hard to understand what's going on when you look at the email threads, because I think the impetus is that Postgres Pro has a version of Postgres that does 64-bit transaction IDs. And therefore, you don't have to freeze, and the whole thing there. But as we talked about earlier, Nikolay, I think that the Postgres Pro customers are really at the very high end of throughput requirements, and they're willing to pay for that; they're willing to have a more complex system
Starting point is 00:34:48 that does that. But when we're now in the more generic workload case, the Russians have been very willing to give us the patches to do it. But there's a resistance in terms of exactly how to do it in a way
Starting point is 00:35:04 it's going to benefit the high-end users, no question. And one of the things I've learned in the past year is that one of the reasons that Oracle looks that way and a lot of the proprietary forks look the way they do is because they're really selling to that top 5% of market volume. Whereas when Postgres is working on its code, we're trying to hit that 50% mark, right? And the reason Oracle is so complicated, the reason a lot of these databases become complicated or add these complicated features is because they're really selling, they don't really
Starting point is 00:35:38 care. I don't say they don't care, but they're only really focused on their top 50 customers and everyone else is just along for the ride. And therefore, you get these very complicated systems with a lot of weird options, which were added only because one of the 50 wanted it. You have this very high-end group who's very demanding, who's calling the shots, and they're dragging the database into this high-end volume. But every time they're dragging it up there, they're potentially making the generic workload either slower or harder or less efficient. And they effectively don't care because the money is in that high end. Let me comment, please.
Starting point is 00:36:18 I cannot keep silent here, because we help startups. Our primary customer is a startup which grew to some terabytes, and they saw something and they hit problems. And in the last couple of years we have seen a lot of AI startups. And this is crazy, absolutely. They reach heights much quicker, with much less effort and resources, and they start hitting problems
Starting point is 00:36:43 for example, very quickly, when freezing is a big problem and we need to do manual freezing that skips indexes, for example. This is very common right now for us. And the second thing is that, yeah, so the second thing is that we are already losing some of them
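The manual freezing that skips index work can be spelled out with standard vacuum options. A minimal sketch, assuming Postgres 12+ (where INDEX_CLEANUP exists); the table name is hypothetical:

```sql
-- Freeze tuples aggressively while skipping index cleanup, the expensive
-- part on large tables with many indexes (Postgres 12+).
VACUUM (FREEZE, INDEX_CLEANUP OFF, VERBOSE) big_events;

-- Check how far behind freezing is for that table.
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relname = 'big_events';
```

INDEX_CLEANUP OFF trades some index bloat for a much faster freeze pass, which is usually the right trade when the goal is pushing back the wraparound horizon.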
Starting point is 00:37:00 because when we start saying you need partitioning, it's like you need to allocate some resources for this, for experiments and so on. They switch to a different database system, or quickly consider it, where there's less headache. And this concerns me a lot. Isn't that kind of Bruce's point, that if they were an Oracle customer and they grew to being a significant portion of revenue, worth saving,
Starting point is 00:37:25 that's different. The Postgres project doesn't have to listen to the huge customers because of their payments. Like, maybe Oracle are incentivized to listen to those customers more than the average customer, or more than all the little guys. So Postgres is a different landscape, where we can listen to people more across the board, not just the guys hitting extreme scale. My point is that smaller teams have bigger databases, with increasing speed, and they hit
Starting point is 00:37:51 problems which were hit only by big users before, more and more often. And these guys are ready to switch faster, because they are surrounded by AI themselves, because they are AI startups. What do they switch to? Sometimes proprietary databases, whatever AWS or GCP offer. Sometimes MongoDB is also on the table, trading for different problems.
Starting point is 00:38:19 But I'm just concerned that I see this different thing that I didn't see in the previous 20 years. So if we look at what's going on with the 64-bit IDs as a microcosm: they're willing to give us all the patches to make this happen, right? But what's happened is that we've only incrementally implemented those. And we have some of them in PG 19. For example, the multixact sizes are now 64-bit, the members or the groups.
Starting point is 00:38:47 I can't remember. I think it's the groups. And sorry, we had a couple of good examples last year, and some of them came to our podcast, where this space was exhausted. The multixact one. Yeah, yeah. And also, yeah, AI startups, which have a lot of data very fast. So there are two ways we can go at it.
Starting point is 00:39:08 We could just go whole hog, 64-bit everything, increase the tuple header size by 33% by making them 64-bit, and be done with it. But we're worried about the impact on the average user. So what we've been doing is slowly 64-bit enabling the server; multixacts are one. And there are some other areas. I think clog we're looking at doing. I don't know if we got to that.
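The pressure on both 4-byte counters can be watched from SQL with the standard catalog functions; a minimal sketch:

```sql
-- age() counts XIDs consumed since the database's last full freeze;
-- mxid_age() does the same for multixact IDs. Both approach trouble
-- as they near 2 billion.
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
```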
Starting point is 00:39:35 So we're trying to do it. We're trying to 64-bit the areas that are costless to us, basically, where we can increase it without really any problems. And then, once we get to that, we'll have 64 bits pretty much everywhere that we can easily do it. Then we have to look at, okay, now at the places that are going to cost us, what do we do? I think one of the great ideas that I saw was to put basically an epoch in the page header, so that you could basically say that for mass loads of data,
Starting point is 00:40:12 if you're doing a mass load of data, which a lot of these companies do, effectively all the tuples are in the same transaction. They're clearly all in the same epoch, right? So you just put one epoch, and then all your heap tuples look just the same as they did before, right? I think that's where we're going: the concept of having an epoch on the page, and then basically allowing those to not necessarily have to be frozen at all. I think if we start to bring the epoch down to individual rows and we increase the header size,
Starting point is 00:40:44 which people have already complained is too big, then we potentially get complaints. So if I had to guess, I think that's where we're going, but we're going at it very incrementally, again, hitting all the places that store transaction IDs outside of the data, the heap and index pages. We know those are going to be complicated, and getting the infrastructure done.
Starting point is 00:41:07 And I know that we have some patches in 19 for that. I've seen them coming through and getting committed. And then we have to decide, okay, now that all this stuff is in place, let's put up a proof of concept. Let's see what a per-page epoch looks like. Let's do some loading of data. Let's see, you know, how far this gets us down the road. We don't have a whole lot of empty space on the pages. So it'll be a little tricky to figure out where to put it,
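The epoch being discussed is already visible at the SQL level: since Postgres 13, pg_current_xact_id() returns a 64-bit xid8 whose high 32 bits are the epoch and whose low 32 bits are the 4-byte XID that heap tuple headers actually store. A minimal sketch of splitting it apart (the text cast is a workaround, since xid8 has no direct numeric cast):

```sql
SELECT full_xid,
       full_xid >> 32                   AS epoch,      -- high 32 bits
       full_xid & ((1::int8 << 32) - 1) AS xid_low32   -- on-disk 4-byte XID
FROM (SELECT pg_current_xact_id()::text::int8 AS full_xid) AS t;
```

The per-page idea is essentially storing that epoch once per 8K page instead of widening every tuple header.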
Starting point is 00:41:28 but I think we can find a place to put it. But I think that's where we basically are: trying to figure out, once we get all the bottom stuff done, what we do for the main stuff, the heap and index pages. Nice. I think that makes a ton of sense, and I really like the incremental approach that Postgres takes.
Starting point is 00:41:50 I've only been in and around the project for the past 10 years or so. It's a lot. It's a lot compared to some, but it pales in comparison to 30 years. And it's been really nice seeing the incremental improvements, but also that they stack up. It really has come a long way in 10 years. And things that looked incremental kind of nine years ago, eight years ago, seven years ago, they've really added up. And you mentioned vacuum, but it's not just vacuum, right?
Starting point is 00:42:17 All of the changes that have helped with index bloat. They're kind of attacking it from the other direction. So it's- That's very true. Yeah. Logical replication, yeah. Yeah. Partitioning. Yeah. Yeah.
Starting point is 00:42:29 I like that approach, and I feel like it might sound like it's going to be slow, but time will fly; in a few years I can imagine us having made significant progress on some of these things. So yeah, that's really cool. Bruce, I'm conscious of time. Is there anything we didn't talk about that you wanted to make sure we mentioned, or any last shoutouts or pointers you wanted to give people? No, it was just nice that we got a chance to at least talk about this. For me, the categories of what's missing were a big takeaway, to understand why we are here, why we don't have some of the stuff that we're missing, and some of the stuff we may never have, like optimizer hints.
Starting point is 00:43:07 Although it seems like a portion of that is coming in Postgres 19. It's not called optimizer hints; it's called advice, but it could be used in a similar way. So, again, I think that we don't have a roadmap. That's the problem. Because we're too dynamic to have a roadmap. So it's almost a surprise to me to see what gets in every release. And I think everybody will be pleasantly surprised by what's in 19.
Starting point is 00:43:31 And obviously by what we're going to be starting for Postgres 20 in July. Yeah, absolutely. And I think I saw the work from Robert. I think it was in the contrib modules, right? At least the first version. pg_plan_advice, yeah. Yes.
Starting point is 00:43:45 Yeah. Looks very cool. All right. Well, thanks so much for joining us. I'm sorry Nick had to drop off. It's been an absolute pleasure having you. Great. Thanks, nice talking to you.
Starting point is 00:43:55 Likewise. Take care. Thank you.
