Postgres FM - Anniversary mailbag
Episode Date: July 7, 2023

Nikolay and Michael celebrate the podcast's 1 year anniversary by going through several questions and suggestions received over the year. Here are the questions and some links to things we mentioned:

Question 1: Effect of wal_log_hints=on after bulk DELETEs: why does the next SELECT run slow and generate tons of WAL?
- https://twitter.com/dmx551/status/1598253188926570496
- wal_log_hints: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS
- Exploring how SELECT queries can produce disk writes: https://blog.okmeter.io/postgresql-exploring-how-select-queries-can-produce-disk-writes-f36c8bee6b6f

Question 2: How to get started reading the PostgreSQL source code — maybe a PostgreSQL-style C reference guide to consult for non-C programmers
- https://twitter.com/andatki/status/1578088843940593678
- So, you want to be a developer? https://wiki.postgresql.org/wiki/So,_you_want_to_be_a_developer%3F
- GitHub search: https://github.com/search?q=repo%3Apostgres%2Fpostgres+wal_log_hints&type=code
- The Internals of PostgreSQL (by Hironobu Suzuki): https://www.interdb.jp/pg/
- PostgreSQL 14 Internals (by Egor Rogov): https://postgrespro.com/community/books/internals

Question 3: Isolation levels primer/strategies — their uses in different scenarios, battle-tested strategies and insights, performance tradeoffs, edge cases to consider at scale (with replication and sharding, etc.). I remember reading some interesting stuff in the Jepsen analysis (https://jepsen.io/analyses/postgresql-12.3) about Postgres's serializable isolation level behaving more like snapshot isolation. Has this type of behavior, or another one similar to it, affected you or your clients in any significant way?
- Transaction isolation: https://www.postgresql.org/docs/current/transaction-iso.html
- What developers find surprising about Postgres transactions: https://blog.lawrencejones.dev/isolation-levels/

Question 4: Data encryption in Postgres
- Cybertec Transparent Data Encryption: https://www.cybertec-postgresql.com/en/products/postgresql-transparent-data-encryption/
- EDB Transparent Data Encryption: https://www.enterprisedb.com/docs/tde/latest/

Question 5: Migration from other DBMSs
- PostgreSQL transition/migration guide: https://github.com/postgresql-transition-guide/guide

Question 6: Latest failover best practices
- Patroni: https://github.com/zalando/patroni

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:
- Jessie Draws for the amazing artwork
Transcript
Hello and welcome to Postgres FM. I'm Michael from pgMustard. I'm back after a couple of weeks off.
Thank you so much for holding the fort, Nikolay. This is Nikolay from Postgres.AI. How are you doing?
Much better than the last two weeks. Thank you for returning. I was thinking maybe that's it.
You decided to stop this after one year of doing it.
Yeah, unlike Americans, us Europeans do take holidays and two weeks off must be almost all of your annual leave, I guess, over there.
But yeah, we like our summer holidays here.
I don't know what you're talking about.
I have three spots in my calendar today.
I've noticed so people still can schedule talks, even if it's holiday.
Today is July 4th, but I still have some work to do, probably.
So this is episode number 53. Today
is July 4th. And our first episode was published last year, July 5th.
Well, yeah, so this is our one year anniversary. And we decided we're going to do a mailbag
episode. We've got a lot of requests and suggestions and things from people over the year, probably more than either of us were expecting.
So thank you, everybody. But it does mean we have struggled to keep up with the suggestions.
And we've got a few that probably aren't full episodes or that we can give a little kind of quick couple of opinions or maybe even it's just a quick answer to.
So I think the idea was for us to go through some of those this week. So it's going to be a bit of a mixed bag. Hopefully a few of you that have asked those questions will get something out of it, and we can probably follow up with a few of them in more detail at some point.
All right. It's like a photo dump on Instagram. By the way, on the other side: the software we are using just told me that the subscription has expired, meaning that exactly one year ago you subscribed, right? Congratulations. You need to pay again. Okay, let's start.
Wonderful. So the first question I had on the list was: what is the effect of wal_log_hints = on after bulk deletes? Why does the next select run slow and generate tons of WAL? So this one came in via Twitter.
Yeah, that's strange. I saw this question, and it's strange that wal_log_hints leads to that — I don't think there's a connection here. Maybe I'm wrong, but wal_log_hints is needed to propagate hint bits to WAL and to replicas. So one of the key purposes of it is
failovers. For example, Patroni supports it: if a failover happens and there is some deviation — the old primary can be slightly in the future compared to the new primary — then with the regular approach we need to rebuild it. But if hint bits are WAL-logged, Patroni can apply pg_rewind, or we can apply pg_rewind, and make a standby out of the former primary much faster. But speaking of bulk deletes, I think the problem is not in WAL-logged hints. Maybe I'm wrong again. But I think the problem is in bulk deletes themselves. If you delete a lot of tuples and then read some live tuples from the same pages, or revisit them... Indexes don't have version information, so an index scan checking heap records — records in the heap, in the table — might find that these records are already dead, just deleted recently.
And there is also a mechanism like in-page vacuum: when you select something, Postgres might decide to vacuum a little bit in this page right now. And when vacuum happens, it leads to additional WAL writes, of course. So I think the problem is — again, maybe I'm wrong... wal_log_hints, if it's off, then on replicas we might see selects leading to writes, because we don't have hint bits. There's an old article on the Okmeter blog — we can attach it. If you don't have wal_log_hints, on replicas you just select something, and Postgres decides to update hint bits, and this causes some writes on replicas. But this is
different. So I think the author of this question deals with bulk deletes, and the problem is bulk deletes and lack of control of autovacuum behavior. So what we need: we need to split deletes into batches. We need to control it. We need to make sure that dead tuples are cleaned up more aggressively, more actively, by autovacuum workers. Or we can vacuum manually ourselves, right?
So this is what I think in this case, but maybe I'm wrong.
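The batching Nikolay describes can be sketched roughly like this — the table and column names (events, created_at) are invented for illustration, and the batch size and vacuum cadence depend entirely on the workload:

```sql
-- Hypothetical sketch: delete old rows in batches instead of one huge DELETE.
DELETE FROM events
WHERE id IN (
  SELECT id
  FROM events
  WHERE created_at < now() - interval '90 days'
  LIMIT 10000
);
-- Repeat (e.g. in a loop in application code) until 0 rows are affected.
-- Optionally run VACUUM events; between batches so dead tuples are
-- cleaned up before the next pass, instead of waiting for autovacuum.
```
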
Yeah. Well, I'll respond to the person on Twitter, so hopefully they can let us know more detail if they've got it. For anybody else wondering: this is off by default, so not something you need to worry about unless you've turned it on.
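For anybody who wants to check, the current value can be inspected and changed like this — a standard sketch; note that changing it requires a server restart:

```sql
SHOW wal_log_hints;                      -- 'off' by default
ALTER SYSTEM SET wal_log_hints = on;
-- wal_log_hints can only be changed with a restart, e.g.:
-- pg_ctl restart -D $PGDATA
```
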
Yeah.
If I'm wrong, it would be great to have some reproduction and explore it.
It would be interesting to explore with additional tooling.
Could it be that the select is setting hint bits and therefore causing a load of full-page writes, whereas it wouldn't have done? If you're just doing a select afterwards, you wouldn't have caused any WAL without this setting on. So you're causing a bunch of full-page writes that you wouldn't have done in the previous setup?
Maybe, actually.
Yeah, interesting.
Well, yeah, it's interesting.
We can explore this with pageinspect, for example, right? We can delete, inspect the pages before our select and after our select, and hint bits will be visible there. So pageinspect could help to understand the behavior.
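A rough sketch of that pageinspect approach — the table name 't' is hypothetical, and the infomask bits have to be decoded against src/include/access/htup_details.h:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Inspect page 0 of a hypothetical table 't' before and after the SELECT;
-- hint bits show up in t_infomask (e.g. HEAP_XMIN_COMMITTED = 0x0100).
SELECT lp, t_xmin, t_xmax, t_infomask
FROM heap_page_items(get_raw_page('t', 0));
```

Running it twice, around the SELECT in question, should show whether the infomask values changed.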
Yeah. Great suggestion. Should we move on?
Yeah, let's do it.
Right. So: how to get started reading the PostgreSQL source code — maybe a PostgreSQL-style C reference guide to consult for non-C programmers. So yeah, any advice for people first getting started? I saw on the wiki, on the Postgres wiki, there's a really good page: "So, you want to be a developer?"
The major challenge is different. It's, like, dealing with discussions with other people and convincing them that your vision is better.
Yeah.
So defending your thoughts, opinions.
But reading source code requires no...
In the beginning, it doesn't require
C coding experience.
It requires just understanding English.
Because there are good comments
and there are good readme files.
It's enough to understand
what's happening. Code is
covered with good comments, definitely.
And I think this is the main thing.
I usually just read it. And maybe some challenge is to find proper places in the source code, because it's huge. You need to find proper places, and I usually use git grep. Sometimes I use GitLab or GitHub search, but it's not super convenient. Also, there is Sourcegraph or something, which helps you to navigate. You have a function, you can quickly see... I have a browser extension for that. I can quickly jump to the definition of functions, to see all calls. So it's basic tooling for navigation in code, regardless of language. It helps as well to navigate a little bit.
But I also think there is opportunity here
and actually some small secret to be revealed.
There is opportunity to have a good tool
to talk to source code, right?
To, like, explain to me... I mean, ChatGPT, right? Or some LLM or something. Just explain to me how this is working. And based on comments — and not only comments, on the code itself — this tool can explain, show you, mention particular files and even line numbers, depending on the version, of course. Oh, also, a very good thing is to read two books, internals books: one from Suzuki, one from Rogov, Egor Rogov — and particularly Egor Rogov's book about Postgres internals. It has links to source code, like: check this file, we described something here in terms of how Postgres works, and you can find it in this file. It has these links on the side. It's very convenient. This is my overview of the problem.
Nice. I agree with you, by the way. And I've found GitHub search to be surprisingly good for getting to places where the comments are good. They improved it recently, a year or two ago.
Nice. Next one. So this is quite a long one, but it's basically about isolation levels: their uses in different scenarios, battle-tested strategies and insights, performance trade-offs, edge cases to consider at scale with replication, sharding, etc. There was some interesting Jepsen analysis that they've linked to.
And has this type of behavior
or another one similar to this
affected you or your clients
in any significant way?
Well, I'm a very OLTP guy, with a focus on mobile and web apps. And we usually used to work at the default level, read committed, in which you might see inconsistency, for example, within a single transaction. Oh, by the way, we also usually tend to have small transactions, sometimes single-statement transactions, right?
And actually, being a DBA, I hate when people use an explicit BEGIN/COMMIT block for a single statement. It doesn't make sense. It increases round-trip time and affects performance — sometimes not significantly, but sometimes a lot.
So first of all,
isolation levels matter a lot when you have multiple statements in a single transaction.
And there you can have anomalies.
But it's interesting to understand. Usually we have the asynchronous-replica problem, when writes are not yet propagated: if someone, for example, added a comment and refreshes the page, and we go to a replica in the second request, we don't see the comment which was just added by this user. So we implement something like stick-to-primary and so on. And a similar effect can be seen with just a single node, the primary node. Because if a different transaction commits inside your transaction — you read something, then you read something else — under read committed you see the newly committed data. So these anomalies... Or some concurrent session deleted something: in your transaction, you just read something, you read it again, and you don't see it, right? And this is okay. I mean, we got used to it. We just keep it in mind. And when we design our transactions consisting of multiple statements, we just understand that this might happen.
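The anomaly he describes can be demonstrated with two concurrent sessions; the table name here is invented for illustration:

```sql
-- Session 1 (default READ COMMITTED):
BEGIN;
SELECT count(*) FROM accounts;   -- say it returns 10

-- Session 2, meanwhile:
DELETE FROM accounts WHERE id = 1;   -- commits (autocommit)

-- Session 1, same transaction:
SELECT count(*) FROM accounts;   -- now returns 9: each statement
COMMIT;                          -- sees the latest committed data

-- Under REPEATABLE READ, both SELECTs in session 1 would have
-- returned the same count, from one snapshot.
```
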
So I was going to ask you, actually, what proportion of your clients,
your customers, sorry, that's a confusing word here, even change the default here?
I haven't seen many or I haven't heard of many.
Well, first of all, we all change it when we use pg_dump. For example, pg_dump works in repeatable read, because we need a snapshot: we want tables to be in a consistent state. We want to deal with a single snapshot. We don't want, like, reading of different tables breaking a foreign key, for example. And if we run pg_dump in multiple sessions — dash J, the number of jobs, is four, for example, to move faster, right? — in this case we have a snapshot with which they are synchronized. All workers of pg_dump will work with a single snapshot. We can do it in our application. It's not difficult, actually, to specify a snapshot in a repeatable read transaction. So a repeatable read transaction is needed, right? It's good.
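The snapshot synchronization he mentions looks roughly like this — the exported snapshot ID is just an example value:

```sql
-- What pg_dump -j 4 does internally, and what an application can do too:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- e.g. '00000003-0000001B-1'

-- In another session, to see exactly the same data
-- (valid while the exporting transaction stays open):
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
```
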
And we all implicitly use it when we run pg_dump. But as for explicit use, I saw only two big use cases. One use case: when people understand very well what they do, and they move very carefully to a higher level, understanding that when you move to repeatable read, and especially to serializable, you are going to start getting some deadlocks occasionally, and some slowness. But the second case is more interesting and probably happens more often: some new developers who don't yet understand the problems of moving to an upper level. I mean, sometimes we need it — pg_dump is one of the cases, or some billing system; probably sometimes we do need a higher level to avoid these anomalies. But another approach is, like, people just decide: oh, we want to be in a very consistent state, let's start with the serializable level right away. I saw it more than once. And in this case, the second thing they do is start complaining about Postgres performance.
Yeah, so that's the big trade-off, right?
Yeah. Even with few users.
Yes, even with few users. Of course, for mobile and web apps in Postgres, the lowest level, read committed, should be the default. But with understanding all...
This is the big thing I've seen.
I think when people first learn about transactions, it leads them to assume at least a repeatable read behavior.
So I think it does catch developers out when people are first learning it, which is understandable. There's a good blog post by a former colleague of mine on this that I'll link up — Lawrence Jones, who I know from GoCardless. He writes about that in detail, and I think it's a good one to share with juniors on your team if they're learning about this.
Right. So, my comparison to replicas — synchronous and asynchronous, or semi-synchronous — it's similar. It's different, but in terms of anomalies it's quite similar. I mean, users see anomalies. And we've seen asynchronous replicas: they also have anomalies, and we need to deal with it. But why — like, if you think performance doesn't matter: we have five replicas, let's make all of them synchronous, right? And no anomalies anymore — they all have the same data. And let's go to serializable, synchronous replicas.
Yeah. Oh, and in multiple regions around the world as well. Of course not. Yeah. Of course, with big latencies.
What could possibly go wrong?
With big network complexity between them, right?
Yeah, yeah, yeah.
And serializable.
Good luck.
By the way, this is probably one of the topics we should dive into deeper and explain many, many cases. And I'm refreshing my memory from time to time — I'm sitting in read committed most of my life, so sometimes I go there and find things that are new to me. So I would like to explore this and discuss it deeper, maybe.
Yeah, wonderful.
Data encryption in Postgres.
That's all we got on this one.
Data encryption can be different,
like at rest and in transit, right? Two big areas. And I think, again, security is something we need to deal with, but I'm not a big fan of exploring it. Usually I prefer to just check what's the best approach and so on.
And encryption, of course, is a good thing,
but compression is also a good thing, right?
And sometimes they go together, but not always.
And encryption should be enabled.
For example, especially if you work in cloud,
but it should be enabled at protocol level.
Yeah, so, okay, I see where you're coming from now.
I assume they were actually talking about... The time I see this come up most in Postgres is because we don't have, you know, encryption at rest at the database level.
And compression also.
Yeah, so the common argument is that you can encrypt the disks.
Using which key? Provided by you or by cloud itself?
If provided by cloud, then how good is it?
Well, I also had, well, I tend to trust cloud,
at least the major cloud providers more than I trust myself.
But you may not.
The other thing that somebody brought up to me recently
that makes a lot of sense is that also you've got to worry about your backups.
So if you've just encrypted at the disk level
and you've got a Postgres backup and you're storing that somewhere,
that can be restored and it's not encrypted.
So there are other things to consider.
And usually people store backups in object storage, which makes total sense because of its durability — in those terms, not accessibility, not high availability. High availability of S3, for example, is lower compared to EBS volumes in AWS.
But durability in terms of data loss, it's insane.
And the data won't be lost.
But they also, usually they are not surrounded by some network solution.
So, I mean, these buckets are available from anywhere.
And you just need proper keys and you can access them from different regions, from different customer accounts and so on.
It's interesting.
But of course, they usually provide a lot of things for encryption.
And sometimes it's challenging to use your own keys.
For example, it might break some processes.
Like, if you want retries for chunk uploads in some clouds: if you use their keys, it's fine, but if you use customer-managed keys, it might not work, and you need to retry the whole file, which might be one gigabyte.
In Postgres, everything is stored — like, all data and indexes are split into one-gigabyte files.
But if you compress it, it will be probably three, four hundred megabytes
depending on data and so on.
In this case,
it means that
if you want to use your own keys,
probably you need to perform
retries of whole file
and it's not super efficient.
So there are many challenges there.
And also if you encrypt
and then you want also to...
This is a security topic, so it's very big.
And, for example, one of the problems, it's not...
Like, encryption can work in both ways.
And sometimes encryption is used against the data owner.
You know, like ransomware, which encrypts all your data,
and then they say they will give you a key only if you pay.
So how to protect against that?
And sometimes people see bigger danger in this area,
like how to avoid encryption.
I mean, not to avoid,
but you probably need to store backups in two clouds, for example.
Because one cloud can be stolen, your account is stolen,
everything — like, you lost access. So there are two big risks. Of course, one risk is leaking data, a very big risk. But losing data is a different risk, and encryption can be used against you in that second case.
So, yeah. Is it worth mentioning a couple of third-party tools?
Yeah, let's... Well, not tools necessarily, but I think both Cybertec and EDB have non-core solutions here for that, if you...?
Like transparent data encryption from Cybertec?
No, it's different. It's at an upper level. It's not encrypting the disk. It's encrypting data inside Postgres.
Which then results in even your backups being encrypted, right?
To me, as a Postgres user, I think it's a very good feature we should all have.
But I remember some discussions about it and some dead end in development.
I mean, somehow it's not in core.
It cannot be brought to core yet and so on.
I don't remember details, unfortunately.
But in my opinion, it should be in Postgres as a feature.
It would be great.
I'll link to the ones that I'm aware of in the show notes, just in case anybody does want or need this and wasn't aware of those. I think we've probably got time for one more. I'm scared that the one I'm going to pick is a bit big, but do you have any on the list that you wanted to make sure we talked about?
No, but I can choose a couple. Let's... Well, migration from other databases, like Oracle to Postgres, and also from non-relational ones, like Couchbase or Cassandra, to Postgres.
It's a big topic, but fortunately in this case, I'm not a big expert in migration.
The main migration I did was from MySQL to Postgres many, many years ago in my project.
But my colleagues did a lot of work migrating from Oracle and so on, and I understand the
process.
Of course, it depends usually on how much code
you have on database side.
And by that, you mean like procedures and functions?
Right, right, right.
Migrating the schema is quite easy, but if you have a lot of PL/SQL or T-SQL code, you need to rewrite it, and this will probably take a lot of time.
And second biggest challenge,
I mean, converting schema is also a challenge,
but it's solvable.
There are automation tools.
You will deal with some issues
which probably is easy to fix.
The key is to test a lot,
as usual, experiments, right?
And the second biggest challenge after code — this server-side code — is probably if you want a zero-downtime migration. In this case, "migration" is used in its proper meaning, I think, not in the weird meaning where we change the schema. Well, we also change the schema, but we change the engine under the schema, right?
Yeah.
Yeah, in this case, you need some replication solution, logical replication solution.
Like change data capture type thing.
Yeah.
I actually,
this is something I have got a little bit of,
at least secondhand experience of,
with,
there's a cross company working group in France
that are doing a lot of migrations from Oracle,
from SQL Server,
from Sybase.
Postgres has become really popular in France amongst huge organizations.
And government organizations as well.
Yeah, exactly. Lots of large organizations
including government agencies.
And they wrote a,
they collaborated on a guide, but it
was all in French. And I
spent a while with one of
them. So I ended up translating
a few of the chapters into English.
So a few of them are thanks to me.
A few of them are thanks to other people.
So I'll link that up as well
because there's basically a tried and tested formula
for doing these.
A lot of consultancies make a lot of their money
helping people do this.
So you can get help from others.
But yeah, it's not a small project
because a lot of these databases do encourage using, sort of, procedures and functions. So yeah, if you have a lot of those, don't expect it to be a small project.
What about NoSQL database systems to Postgres?
I haven't seen much of it, to be honest, but hopefully we will in the coming years.
I saw cases, but I don't remember any big questions about it. Just do it, that's it. I guess it's quite simple, right?
I guess, by its definition, that doesn't have much schema. If it's MongoDB, there is a new project called FerretDB, which, like, speaks Mongo. But yeah, this is one of the ways: just use some extension or some project on top of Postgres, which will help. But if it's Cassandra or Couchbase, I don't know.
You can use JSON as usual, right?
Yeah. There will be difficulties, definitely. There will be difficulties.
Yeah. Well, if anybody does have experience of that, let us know. But I imagine those are much simpler projects overall.
With complex queries to JSON documents in some NoSQL JSON-like storage, it can be easy to convert them as is, but you will probably have poor performance and also consistency issues. Why do you migrate to Postgres then? You probably won't benefit from its strong sides: the strong relational data model, ACID, and so on. Probably you should move some parts of your data to a relational model, not just JSON.
Maybe even normalize it, yeah.
Yes, and in this case,
it's kind of like building something from scratch, almost.
Okay, good point.
Actually, performance is a big part of some of these migrations.
So once you've done the schema, once you've done the code, sometimes they get stuck, and you see it come up on the mailing list. Quite a lot of people saying: this used to be one second in Oracle,
and now it's 20 seconds in Postgres.
What can I do about it?
You know, you only get the squeaky wheels.
You only hear about the ones that are slower, of course,
because they're the ones holding the project up.
But it's quite common to have, you know,
if you've got thousands of different queries going through your system,
a few of them are probably going to be slower in Postgres than Oracle
without some tuning.
So it's quite common for that to be a big part.
To be fully fair, SQL Server and Oracle optimization code is much more complex, sophisticated, and bigger than Postgres's. But on the other side, Oracle has hints and a tendency to use them quite a lot — manual control of plans. But also, like, SQL Server: if you compare performance with Postgres — sometimes, I saw some comparisons — SQL Server is winning quite obviously, quite often. But it doesn't mean that in Postgres we cannot make things work well.
And it's improving also, right?
So I don't know.
Like I understand that it's not that simple,
that like it's easy to improve.
Well, the code base is huge. Some methods for query optimization used in big commercial database systems are very interesting, and many of them Postgres doesn't have yet. So it's a good point, actually. Like, sometimes it doesn't have them natively — an index skip scan, for example. Loose index scan — you need to master it yourself, manually. Other systems have it.
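The manual loose index scan he refers to is usually written as a recursive CTE — a sketch, with a hypothetical table t(col) that has a btree index on col:

```sql
-- Emulating an index skip scan (fast DISTINCT col) with a recursive CTE:
-- each step does one small index probe instead of a full scan.
WITH RECURSIVE skip AS (
  (SELECT col FROM t ORDER BY col LIMIT 1)
  UNION ALL
  SELECT (SELECT t.col FROM t WHERE t.col > skip.col
          ORDER BY t.col LIMIT 1)
  FROM skip
  WHERE skip.col IS NOT NULL
)
SELECT col FROM skip WHERE col IS NOT NULL;
```
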
We've got transaction control in stored procedures — stored procedures, not functions — and Postgres got it only in version 12, or when? A few years ago only.
Yeah, exactly.
Well, those guys have had it for many years.
So, I mean, we should not be like we are the best.
We are the best in many areas, but not in all, right?
I mean, Postgres.
All I'm saying is, pros and cons — it's important not to be like "all good, all good." No, sometimes it's not that good, and improvements are needed and so on. Workarounds are needed, like loose index scan.
Yeah, all I meant is it can be a big part
of those migrations.
And they can get stuck at that point, right?
And what I can say: Postgres has so many things. Usually we have, like — okay, maybe not in the way you got used to, but in a different way we can achieve very good performance, definitely. And you can put very mission-critical systems under big TPS — a lot of data, a lot of TPS, and so on — so it's possible to build big, reliable systems using Postgres.
Nice. And I think this is another one of those ones where we could do a whole...
Whole... yeah. Migration is a huge topic. Again, I'm not an expert, but I can explore and say some things, definitely. Well, maybe some small last thing, and that's it? Or what do you think?
Sounds good. Yeah, go ahead. One more.
Okay. Latest failover best practices.
Oh, you picked a nice small one then. Failover best practices.
Failover is
when things go wrong, right?
What do you do when failover happens?
Answer is, if
well prepared, you shouldn't do anything.
And this is the key, right? I remember
times when
no good
failover systems existed. And also there was a consensus that, in my opinion, it should have been in Postgres, inside Postgres core. But so far, it doesn't seem to be happening at all. But we have Patroni and others. The recommendation is to use Patroni, or another system which follows well-developed consensus algorithms, right?
Like Raft.
But if you use RepManager — repmgr — I have bad news for you: split-brain happens very often in large systems, or in cases when you have many clusters it's very likely. So migrate from it.
This is the key.
And just use Patroni, for example.
Patroni is an obvious winner right now.
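For orientation, a minimal sketch of what a Patroni setup involves — the names and addresses here are placeholders, and real deployments need more (DCS endpoints, authentication, etc.):

```yaml
# Hypothetical fragment of a patroni.yml
scope: mycluster
name: node1
etcd3:
  hosts: 10.0.0.1:2379
bootstrap:
  dcs:
    ttl: 30
    maximum_lag_on_failover: 1048576   # bytes of WAL a candidate may lag
    synchronous_mode: true             # trade latency for less data loss
postgresql:
  data_dir: /var/lib/postgresql/data
```
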
Are there any particularly good guides or books on it
that you recommend?
Well, just documentation.
Well, there are some tricks there, but it's worth a separate episode, probably. But one of the things, for example, you should understand: we mentioned asynchronous replicas — if you have an asynchronous replica, during a failover you might have data loss. And Patroni defines by default — if I remember correctly, it's 10 megabytes — that 10 megabytes of data might be lost in case of failover. So this should be understood. Or you need to start using quorum commit, so a commit goes to at least two nodes, and Patroni will choose the best one, and probably you won't lose any data during failover. Actually, you know, I suspect maybe the author of this question meant something different — not failover, but switchover.
Maybe high availability.
I think it might be the same one as the one above,
which is also a huge topic about high availability.
Yeah, because there are also best practices on how to perform a switchover, to avoid long downtime and not lose data. It's not rocket science, but there are some tricks there. Let's promise to explore both of these areas in the future.
Sounds good.
Thanks, everybody, for these
questions and suggestions. Thank you, Nikolay. Thank you for being back and continuing this, because I was in fear you wanted to stop it.
Yeah, you're crazy.
I said I'll be back.
Okay.
And by the way, don't watch all those draft recordings
in this Riverside.
Just delete them.
They are all bad.
That's so funny.
All right, take care.
Thank you, everyone.
Thank you, Michael.
Bye.
Cheers, bye.