Postgres FM - Constraints

Episode Date: December 8, 2023

Nikolay and Michael discuss PostgreSQL constraints — the six different types we have, how to add and modify them, including on large tables with high TPS. Here are some links to things they mentioned:

Constraints (docs) https://www.postgresql.org/docs/current/ddl-constraints.html
Unique index vs unique constraint issue (2017) https://www.postgresql.org/message-id/flat/CANNMO%2B%2B7%2BuVMj%2BZPGKB_zuRpKE0sNthy2Z-_oKSS%3DoE3_rN7QA%40mail.gmail.com
Our episode on NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown
Advanced Int to Bigint Conversions (talk by Robert Treat) https://www.youtube.com/watch?v=_IieyXzdmcM
Use BIGINT in Postgres (blog post by Ryan Lambert) https://blog.rustprooflabs.com/2021/06/postgres-bigint-by-default
How to add a foreign key (tweet by Nikolay) https://twitter.com/samokhvalov/status/1732056107483636188
Bloat, pg_repack, and deferred constraints (blog post by Nikolai Averin from Miro) https://medium.com/miro-engineering/postgresql-bloat-pg-repack-and-deferred-constraints-d0ecf33337ec

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello, and welcome to Postgres FM, episode 75. This is a weekly show about all things Postgres. I'm Michael, founder of pgMustard. This is my co-host, Nikolay, founder of Postgres.ai. Hello, Nikolay. What are we talking about today? Hi, Michael. Constraints.
Starting point is 00:00:16 Yeah. And specifically, the six DDL constraints that Postgres supports. Yeah. All of them. We're back to basics. I love this kind of thing. So nice choice. Thank you.
Starting point is 00:00:28 It was my choice. I'm writing this Postgres marathon series of how-tos and considering how to create various constraints without downtime. Already covered checks and foreign keys and something else. Saying this, those who follow me closely now understand when exactly we record this podcast, right? Okay. Because I write these how-tos every day.
Starting point is 00:00:54 So, about constraints — let's go in the same order as the documentation describes them, but also let's talk about practical complications when you have a lot of data and a lot of TPS, as usual, right? Because the documentation actually doesn't cover these topics. Yeah. Another thing I don't think the documentation covers is why we have constraints at all. I guess it's so obvious, but I think it's worth mentioning that without these we'd be in real trouble database-wise, wouldn't we? Well, yes. So we have schema, right?
Starting point is 00:01:29 And it's not good to live without schema. Let's say no to NoSQL in general. Maybe in some cases it's fine, but if it's financial data and so on, you need structure to ensure that the data has good quality. And constraints are the next step. So you have schema — you define column names and data types — and constraints are an addition to all this, to ensure even better quality of data. For example, you say no one can create more than one row with the same value in this column.
Starting point is 00:02:06 For example, email. And this is an interesting situation, because usually people forget about case sensitivity of text or varchar. Yeah. And then you say: if there is a row in this table referencing, for example, a user ID, it means that such a user should exist — and so on and so on. So it's all about data quality, but also, sometimes, for foreign key constraints.
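Going back to the email example for a second — a minimal sketch of case-insensitive uniqueness in SQL (table and index names here are hypothetical, not from the episode):

    -- A plain UNIQUE on email would treat 'Bob@x.com' and 'bob@x.com'
    -- as different values, so enforce uniqueness on the lowercased value:
    CREATE TABLE users (
        id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        email text NOT NULL
    );

    CREATE UNIQUE INDEX CONCURRENTLY users_email_lower_key
        ON users (lower(email));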
Starting point is 00:02:36 Foreign keys also provide some automation — limited automation, I would say — for how to handle, for example, deletes: should dependent rows in the dependent table be deleted or not, if the main row is deleted? Yeah. But this should be used with care
Starting point is 00:02:57 if you have a million dependent rows. Deleting one row might then take a lot of time, and this is also not good. Yeah, I feel like we've already dived into a few specifics around unique constraints and foreign key constraints there. But let's go — yeah, I think you're right. I think the Postgres documentation does cover them in a really good order, and it starts with check constraints,
Starting point is 00:03:20 which are super flexible — user-defined constraints, almost, I'd say. Is that a fair description? Where we can choose — I think it describes them as a Boolean condition. So it's a check that returns true or false for each row that's inserted or updated. Yeah, check constraints are usually, how to say, underappreciated. They're underused, in my opinion. And unfortunately, there you can define only an expression related to this table.
Starting point is 00:03:51 You cannot involve different tables, subqueries and so on. So it limits the expressive power. Yeah, so we can have a check constraint on a column, or we can have it on multiple columns within the table. Multiple columns are okay. You can say, for example, I don't know,
Starting point is 00:04:11 like, the sum of these two columns should be positive, or something — some crazy stuff. It's possible, definitely. And this is good. I mean, for example, if you want to say this is an integer but it should always be odd, or even, or something like that, right? You just define the constraint, and if you try to insert something which violates it, you will get an error.
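A minimal sketch of those examples in SQL (the table, column, and constraint names are hypothetical):

    -- Table-level check across two columns: their sum must be positive.
    ALTER TABLE accounts
        ADD CONSTRAINT accounts_sum_positive CHECK (debit + credit > 0);

    -- Column-level check: the value must always be even.
    ALTER TABLE accounts
        ADD CONSTRAINT accounts_counter_even CHECK (counter % 2 = 0);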
Starting point is 00:04:31 And this is how you can achieve better data quality. Yeah. When you say it's underused or underappreciated, what are the kind of typical cases you see for people?
Starting point is 00:04:59 Typical cases: people rely on the ORM and perform all checks there. So, the usual three parts of the architecture — front-end, back-end, and database. Usually people start with the front-end, and this is fair, because you should check a lot of stuff there, including all constraints, to minimize the feedback loop. Users should quickly see that something is wrong. Ideally before they make an action — for example, filling some form, I would prefer seeing "constraint violated"
Starting point is 00:05:22 and an explanation of how to fix it, before I press submit. It's not good to press submit, wait some time, and then have some retries — it's very annoying. So constraint checks on the frontend make a lot of sense. But then people, if they use an ORM, they
Starting point is 00:05:40 prefer checking it in code because it's more flexible. And those who write the logic in Python, Ruby, Java, anything — they prefer writing it right there, because it's their favorite language. But the thing is: the company grows, the project grows, and then you start having different users or applications. For example, someone works with the data directly in some UI, or you have another application written in different code — or in the same code but skipping these checks. Implementation of constraints in application code is weak, because it's not guaranteed.
Starting point is 00:06:18 Only the database can guarantee it. That's why check constraints — or other types of constraints — are good to have in the database: it's the safest way to safeguard, right? So you're on the safe side, and nobody will violate it unless there is a bug. Sometimes I saw some bugs, and unique constraint violations happened. It's good that, for example, amcheck exists — soon we will have a unique constraint corruption check there. Nice. Yeah, but it's a different story. So, check constraint: you just say an expression should always be followed, right?
Starting point is 00:06:53 It should always return true. If it returns false, such an insert or update should be discarded. Rolled back. Right. And is it worth us discussing what to do, or how to add one retroactively? Let's say you're listening to the podcast and you realize you should have some of these in place, but you don't, and they're on quite large tables. How would you go about adding them? Yeah — actually, let me first finish this consideration about front-end, back-end, and database, and the relationships between them.
Starting point is 00:07:26 I remember in 2004, I implemented what's called MVC, right? Model-View-Controller. A very old architecture, maybe not cool anymore, I don't know. But there I implemented a simple thing. So we define constraints in Postgres, and then, at bootstrap of our application, we analyze all constraints and build logic to inject them into forms. And also with a digital signature, because forms can be... it was long ago, but it's interesting. The forms, the
Starting point is 00:07:59 front end, followed the constraints from the database. It was cool. I think it's an interesting idea. It should probably be rediscovered, I don't know. So you just define constraints where they should be defined — in the database — but then the frontend follows exactly the same logic, and you don't need to implement it twice. Because if you implement it twice, you will have bugs — different logic, right?
Starting point is 00:08:23 Yeah. And I've just realized there's another reason, which is handling concurrent sessions. So you might have a constraint. Let's say you have an amount of stock of an item and it needs to not go below zero or an account balance that needs to not go below zero. If you have concurrent transactions, you need them at the database level to make sure you don't end up letting a user take out more money than they're supposed to. Exactly. You cannot check if such row exists and
Starting point is 00:08:50 Exactly. You cannot check if such a row exists and then make the decision outside of the database. You need to make the decision inside the database. Yes. But also, what I'm talking about is: the main constraints should be in the database, but you can mirror them in the back end and front end, and you can have automation. It would be good if somebody who develops ORMs, or GraphQL, or something, followed this approach, considering database-side constraints as the main ones, right? So a check constraint is not only quite flexible — it also has this beautiful option to be created...
Starting point is 00:09:33 Not valid. In an online fashion, so to speak. Yeah, NOT VALID. So you say NOT VALID — quite a confusing term. Yeah, very. Yes, so things to remember: when you create something NOT VALID,
Starting point is 00:09:50 it means that it's already being enforced for all new writes. So this is the super confusing part: it's only "not validated" for existing data. So you cannot create a NOT VALID constraint and then insert something that violates it — such a write will produce an error.
Starting point is 00:10:12 What the NOT VALID flag does, when you create a check constraint, is just skip the long-lasting operation — the full table scan which checks that all existing rows follow the logic of the constraint. That scan is the default behavior when you add a new constraint. So if you don't think about it and just try to create a check constraint on an existing large table, it will block DDL and DML — everything, basically, on this table. And it's not fun. So what you should do, if you want to do it without partial downtime — to do it in an online fashion:
Starting point is 00:10:50 you create it in three steps — not two; three, actually. My how-to yesterday was not complete. So first, you create it with the NOT VALID flag. Second — understanding that all new writes are already being verified automatically by Postgres — you yourself take care of existing rows. You check that existing rows are
Starting point is 00:11:14 okay with this constraint, just with simple selects. And if you find some rows that violate it, you probably want to delete them, or update them to adjust the values — depending on your application logic, or on your logic, right?
Starting point is 00:11:30 And then, only then, the third step: ALTER TABLE ... VALIDATE CONSTRAINT, which will scan the whole table, but this step won't acquire locks that would block your DML. It will block only DDL — but hopefully you don't alter the table during this; you don't issue any other ALTERs.
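A minimal sketch of the three steps (table, column, and constraint names are hypothetical):

    -- Step 1: add the constraint without scanning existing rows;
    -- all new writes are checked from this moment on.
    ALTER TABLE orders
        ADD CONSTRAINT orders_total_positive CHECK (total > 0) NOT VALID;

    -- Step 2 (optional, but recommended): find and fix existing violations.
    SELECT id FROM orders WHERE NOT (total > 0);

    -- Step 3: full table scan, but with a lock that doesn't block DML.
    ALTER TABLE orders VALIDATE CONSTRAINT orders_total_positive;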
Starting point is 00:11:53 That middle step's nice. And you can even do it in batches, I guess, if you want or need to. But I guess it shouldn't be a big deal. Yeah, it depends. But checking the whole table at once can also be fine, because it's just a select, right? Well, if you update — yes, in batches, if you found many millions of rows that violate it. Oh, goodness. But it depends. This three-step approach is a very universal, zero-downtime approach, and it's great. Well, what's the downside of jumping straight to step three?
Starting point is 00:12:20 Because you're kind of doing that if you think your data's fine. I guess, in the real world, you've more than likely got some... Jump straight there if you want — you just need to accept this risk, and that's it, of course. If you are 100% sure, step two is optional, let's say.
Starting point is 00:12:37 But also, as usual: if you issue the ALTER with NOT VALID, you still need a low lock_timeout and retries, because you still need to change the metadata of the table. And if, for example, autovacuum is running in transaction ID wraparound prevention mode and processing your table, you won't be able to acquire the lock. And without a low lock_timeout and retry logic, you will start waiting — and in this case you will block everyone, even though it's NOT VALID. It's not good.
Starting point is 00:13:11 So retries are needed, and a graceful ALTER is needed, right? I wish there was an option like "graceful", where you say how many retries and how low the lock_timeout for a particular operation should be. That would be a nice word. Instead of CONCURRENTLY, it could be GRACEFULLY. ALTER TABLE ... GRACEFULLY. Or CONCURRENTLY, something like that.
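Since no such keyword exists, the "graceful" part has to live in your migration script. A minimal sketch of a single attempt (the retry loop itself would sit in the application or migration tool):

    -- Give up quickly instead of queueing behind autovacuum etc.
    -- (and blocking everyone who would queue behind us):
    SET lock_timeout = '50ms';

    ALTER TABLE orders
        ADD CONSTRAINT orders_total_positive CHECK (total > 0) NOT VALID;

    -- If this fails with "canceling statement due to lock timeout",
    -- sleep briefly (ideally with backoff/jitter) and retry.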
Starting point is 00:13:34 Because I think 99% don't think about it until they have many thousands of TPS, and then they realize. So this basic operation was always working fine. Well, sometimes probably not, but people are like: you know, okay, we had an issue lasting 30 seconds, something was not right, but it's okay — and we don't understand why, right? I mean, you blocked everyone dealing with this table for 30 seconds, for example, but it's kind of fine and we live with it — until it starts annoying you too much, and then you realize that you need a low lock_timeout and retries. I mean, it requires an effort, unfortunately, to
Starting point is 00:14:14 have this, right? And if you're small, you don't care. But I wish it were easier — something similar to CREATE INDEX CONCURRENTLY or REFRESH MATERIALIZED VIEW CONCURRENTLY, right? Also, when you validate: if, for example, a CREATE INDEX CONCURRENTLY is happening, or autovacuum is again processing the table, you won't be able to acquire the lock, so you need to be careful there too. But in general, if you start waiting there, it's kind of fine — it just makes your operation longer, but at least no transactions performing DML operations are stuck in line behind you, right? So this is also an issue with this final step, but it's not as harmful as in the first step, when you need to quickly inject this constraint with the NOT VALID flag. Yeah, so I think we covered it, right? So let's move on.
Starting point is 00:15:10 Let's move on. I think this should be hidden behind some, I don't know, CONCURRENTLY or GRACEFULLY — that would be awesome. The next one in the documentation is the not-null constraint, probably one of the ones people are most familiar with seeing in schema definitions. But, as the documentation points out, it's just a special case of a check constraint, which I hadn't thought of before. Right. But unfortunately, you cannot...
Starting point is 00:15:38 Well, you can already. But if you have a check constraint saying this column IS NOT NULL, logically it's the same as the standard NOT NULL. Same. But a primary key needs the latter, right? It cannot use the check... well, actually it can. Since Postgres 12, if you don't have a not-null constraint and you define a primary key, or redefine it,
Starting point is 00:16:04 it will try to implicitly create a not-null constraint. And since Postgres 12, when you create a not-null constraint and you already have a validated check saying the column is not null, it will just reuse it, skipping the full table scan — which is a very good optimization. So you just create the check constraint with the three-step or two-step approach, as we just discussed, and then you can rely on it when you're creating the primary key, for example.
Starting point is 00:16:29 Or you can define the not-null constraint explicitly if you need it for a primary key or anything else — maybe your application wants a regular NOT NULL — relying on the existing IS NOT NULL check. And then you can drop the check constraint: NOT NULL is still there, and you've skipped this unfortunate full table scan. I'm telling this because, for creation of NOT NULL itself, Postgres doesn't support the three-step or two-step approach.
Starting point is 00:16:58 If you want to create NOT NULL right away on an existing table, an existing column, it will need to scan the whole table. So we don't have NOT NULL ... NOT VALID. Yeah. So yeah, these are some nuances to keep in mind. Generally, my recommendation is to think more about check constraints.
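A minimal sketch of the workaround just described, on Postgres 12+ (hypothetical names again):

    -- Steps 1-2: online-friendly check constraint, as discussed above.
    ALTER TABLE users
        ADD CONSTRAINT users_email_not_null
        CHECK (email IS NOT NULL) NOT VALID;
    ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null;

    -- Step 3: on Postgres 12+, SET NOT NULL sees the validated check
    -- and skips the full table scan.
    ALTER TABLE users ALTER COLUMN email SET NOT NULL;

    -- Step 4: the check constraint is now redundant.
    ALTER TABLE users DROP CONSTRAINT users_email_not_null;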
Starting point is 00:17:18 This is why I say they are underappreciated. They are good, and, you see, here they support not-null constraint creation since Postgres 12 — not before. But that means all currently supported Postgres versions. Yeah, true — 12 is already the oldest. Let's move on. Yeah, unique constraint. Unique constraint — this is interesting. I don't know
Starting point is 00:17:41 how much detail we should cover here. Before our recording, we discussed the case I discovered in 2017 and still saw in Postgres 16. So unique constraint, physically it relies on unique index, but it's a kind of implementation detail. You can say I want a unique constraint, and Postgres will create a unique index implicitly. And it's good that it can be done concurrently, of course, right? Because indexes can be created concurrently,
Starting point is 00:18:09 which is good. That's it, basically. You create the constraint — but I think you can say USING INDEX, right? Create a unique constraint using some index, if the index already exists. Or what? Maybe. Yeah, I don't remember the details.
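For the record, this does exist: ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX. A minimal sketch (hypothetical names):

    -- Build the index first, without blocking writes:
    CREATE UNIQUE INDEX CONCURRENTLY users_email_key ON users (email);

    -- Then attach a constraint to the existing index:
    ALTER TABLE users
        ADD CONSTRAINT users_email_key UNIQUE USING INDEX users_email_key;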
Starting point is 00:18:47 But what I do know is that although a unique constraint relies on a unique index, it's not absolutely the same. Logically, again, it should be kind of the same, right? But you can have an index without a constraint — not vice versa. You cannot have a unique constraint without a unique index, because Postgres needs the unique index to support the validation checks, right? So, imagine we created a unique index, but we haven't created a unique constraint. One place where you can see the constraint is needed is
Starting point is 00:19:03 INSERT ... ON CONFLICT, right? If I'm not mistaken. Yeah, I think so. Yeah. So ON CONFLICT ON CONSTRAINT requires the constraint to be present. If you have an index without a constraint, you cannot say INSERT ... ON CONFLICT ON CONSTRAINT with this name and, for example, DO NOTHING — it will tell you explicitly that the constraint does not exist. But at
Starting point is 00:19:32 the same time, Postgres has an interesting — I think it's a logical — bug, still not fixed. I reported it in 2017, and today I checked in Postgres 16: it still has it. If you try to insert rows that violate uniqueness, it will explicitly say that the constraint —
Starting point is 00:19:50 and it will use the index name — is violated. So in one case it says there is no such constraint; in another case it says this constraint, exactly with its name, is violated. Okay, so: inconsistency. I think it's just a bug that needs to be fixed, and that's it. That's funny. I'll link up the bug report as well.
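A minimal sketch of the two behaviors (reusing the hypothetical users table from above):

    -- The named form works only if a constraint, not just an index, exists:
    INSERT INTO users (email) VALUES ('a@example.com')
        ON CONFLICT ON CONSTRAINT users_email_key DO NOTHING;

    -- Column-list inference works with a plain unique index too:
    INSERT INTO users (email) VALUES ('a@example.com')
        ON CONFLICT (email) DO NOTHING;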
Starting point is 00:20:12 Yeah, but honestly, from the user perspective, I think it would be good to stop thinking about unique constraints and unique indexes as something very different. I cannot imagine a case when they should not go together, I think. If we have a unique index, why wouldn't we have a unique constraint in that case? I think they should go together all the time. That's it. It makes sense. In this case there would
Starting point is 00:20:33 not be any inconsistencies. When I create a unique constraint, a unique index is created implicitly. But why not vice versa? When I create a unique index, why doesn't Postgres create a unique constraint? I have no answer for this. You can't create a unique constraint
Starting point is 00:20:50 not valid as well, can you? So there's no difference there. No, I don't think so. The one thing they have, like a change in recent versions... As always, maybe I'm wrong. I think I'm not wrong here. I don't think so.
Starting point is 00:21:06 The one thing that has changed with these in the last couple of years is in Postgres 15, we got this nulls not distinct option, which I still don't... I'd love to hear from people that have good use cases for these, but it allows you to specify that you only allow a single null value rather than multiple null values. You know, null is the biggest problem in SQL model. We discussed it.
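A minimal sketch of the Postgres 15+ option (hypothetical table t; the two statements are alternatives, you'd pick one):

    -- Default behavior: NULLs are distinct, so multiple NULL rows are allowed.
    CREATE UNIQUE INDEX CONCURRENTLY t_col_key ON t (col);

    -- Postgres 15+: NULLs compare as equal, so at most one NULL row is allowed.
    CREATE UNIQUE INDEX CONCURRENTLY t_col_one_null_key
        ON t (col) NULLS NOT DISTINCT;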
Starting point is 00:21:34 You know, null is the biggest problem in the SQL model. We discussed it — we have a whole episode on it. Right. But here's why I think people might want this. According to — how is it called — ternary logic, three-valued logic: true, false, unknown. According to this logic, null means unknown, and comparing one unknown to another unknown, you cannot conclude they are the same. You cannot say they are the same, so the comparison
Starting point is 00:21:59 should always yield unknown — so, basically, another null. Mixing nulls and unknowns is another topic. And this means that a unique index — or unique constraint, or unique key; let's also introduce the term "unique key", although the documentation doesn't follow this term — unlike a primary key, of course, allows nulls in the column, or in multiple columns if it's a multi-column index or constraint.
Starting point is 00:22:25 But since we don't know if they are the same or not, we can allow multiple nulls. But historically, exactly because of the problems with manipulation of large tables and so on... For example, before Postgres 11, if we added a new column and we wanted a default — yeah, default, yeah — it was a full table rewrite; we couldn't do it. It was fixed in Postgres 11. Well, not "fixed" — a great feature, that you can define a kind of virtual default.
Starting point is 00:22:59 But if we cannot say DEFAULT — I don't want a full table rewrite — then I say: okay, I will consider null as false, just in my application. Alright, so before Postgres 11, null will be my false, and true will be
Starting point is 00:23:18 my true. In this case, I'm breaking theory concepts here — null should not be considered as false; it's not right. But I just don't want this operation to be such a nightmare — I have a billion rows.
Starting point is 00:23:36 Okay, null will be my false. And this leads me to the idea that I want null to behave like one value in the index. That's why, right? So, to avoid the long, heavy operations, I gave null a special meaning — not the one it was supposed to have. This is practice; it's not theory, right? This is usual, like in many industries: the theory was very good, we developed great concepts — and then you go to the construction site: oh, this is how it's used; we didn't expect that.
Starting point is 00:24:08 This is how nulls are used, right? So some people use nulls with a special meaning, and they consider null a normal value. In this case, they might want the index or constraint to say: there should be only one row with null. This is my... well, maybe there are many other understandings, but this is what I have from my practice. Cool. In fact, you mentioned in passing there one other important thing about unique constraints: they can be defined
Starting point is 00:24:36 across multiple columns in the table as well. A single column is really common, but you can do it across multiple as well. Okay. Should we move on? Primary key being a special case again. Primary key, okay. So: not null plus unique constraint, basically. This pair gives you a primary key — but there can be only one primary key.
Starting point is 00:24:58 That's the big... So I think we already covered it. You need not null. For not null, you probably create it implicitly or explicitly — you can do it yourself, or just rely on it when the primary key is redefined. For an existing large table, right? For small tables, no problem.
Starting point is 00:25:14 Also, by the way, just this morning we had a discussion. For example, creating this concurrently, should we use it for new tables? In my opinion, no. Because creating this concurrently or these multi-step operations, in this case, you lose the good, beautiful
Starting point is 00:25:29 property Postgres has: transactional DDL, and the ability to pack everything into a single transaction. If you just define the table, follow the normal approach — don't care about this zero-downtime stuff — and you will have a single transaction, right? All or nothing.
Starting point is 00:25:45 Same for tiny tables — yeah, maybe less than 10,000 rows — you don't care; it takes like 100 milliseconds. Yeah, let's go: you have a single step, it's atomic, great. But if you have large tables... Yeah, say you need to redefine a primary key from integer4 to integer8, for example. That's the big one. Yeah. My team and I implemented all the variants of this operation, and we helped multiple companies — a few multi-billion-dollar, public companies — to convert integer4 to integer8. I know a lot of interesting stuff around it. But in general, you just need NOT NULL. And we discussed that there are also tricks — the Postgres 11 trick: DEFAULT -1 plus NOT NULL you can define right away, virtually, right? You don't even need a check constraint. But since Postgres 12, we rely on a check constraint, remembering that the primary key needs an actual NOT NULL. And we also create a unique index. And when we create the primary key, we say USING this INDEX. This allows us to put the primary key creation, as the final step, into a transaction which will probably also swap something, right?
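A minimal sketch of that final stage (heavily simplified; it assumes the new int8 column is already backfilled and already NOT NULL, e.g. via the check constraint trick above — names are hypothetical):

    -- Build the future primary key's index online:
    CREATE UNIQUE INDEX CONCURRENTLY orders_pkey_new ON orders (id_new);

    -- Final short transaction (wrapped in lock_timeout + retries):
    BEGIN;
    ALTER TABLE orders DROP CONSTRAINT orders_pkey;
    ALTER TABLE orders
        ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX orders_pkey_new;
    -- column renames / swaps would go here as well
    COMMIT;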
Starting point is 00:26:32 Yeah. Rename columns as well. And there, of course, you also need to think about lock acquisition,
Starting point is 00:27:10 retries, lock_timeout — all this stuff. And of course, if you have many, many gigabytes — dozens, hundreds, maybe terabytes — and a lot of TPS, you need to engineer this carefully. I mean… This is probably the most involved of all the things we're talking about, and I think it's probably too big to cover today, but there is a really good talk by Robert Treat that I saw
Starting point is 00:27:32 that covers this in about half an hour in depth for people that actually have to do this kind of thing. I'm sure it's not possible to cover everything in half an hour because there are several methods. They have pros and cons, and there are many, many nuances. For example, foreign keys. If you redefine primary key, you need to deal with foreign key redefinition.
Starting point is 00:27:50 And it's also interesting. And autovacuum running in transaction ID wraparound prevention mode can block you. And also: if you decide to mark a foreign key as NOT VALID and then realize it blocks writes, or you just forgot to drop the old foreign key, and new rows arrive after the switch... So, a lot of possible mistakes. Yeah.
Starting point is 00:28:13 Please use big ints or integer 8s in your new tables. Right away. Or UUID version 7, 8. Yeah. Or UUIDs. Yeah. Cool. Foreign keys?
Starting point is 00:28:26 Or actually, one last thing — is it worth discussing multiple? You can have multi-column primary keys. I guess that's obvious from the multi-column unique ones as well. But yeah, foreign keys. I don't think it's somehow different. Yeah. So you just need to have not null on each column
Starting point is 00:28:44 participating in the primary key definition. That's it — unlike unique keys, of course. So, foreign keys involve two tables. And creation of a foreign key requires several locks to be acquired on both sides — and, unfortunately, a full table scan of both tables, to ensure that values in the referencing table are present in the referenced table. So, if you just don't care and take the brute-force approach, as the documentation describes —
Starting point is 00:29:19 just create it, that's it — well, then you have an issue, because you are going to block... probably not everything. I think the lock level there is not as bad as in the previous cases we discussed for check constraints. But you are going to block DDL for sure. DML probably won't be blocked — or will it? Yeah, probably it will be blocked as well. I don't remember the details here, although I wrote about it a few hours ago. Check out my how-to.
Starting point is 00:29:46 I specified all the locks and what you're going to block. But in general, you should care about it as well. And generally, under load, you shouldn't create a foreign key in one step — you need, again, three steps. First: creation with NOT VALID, with retries and a low lock_timeout. Then — like with checks — Postgres will start checking new writes: inserted and updated rows, or deleted in this case as well. Very important, yeah. And you need an index, but that's another story. So it will start checking this constraint for new writes, but for existing rows we're still not sure. So the optional step two is to validate and fix, if you
Starting point is 00:30:38 see problems — potential violations, right? And then the third step: you say ALTER TABLE ... VALIDATE CONSTRAINT. Again, understanding that an ongoing index creation or recreation, or autovacuum, can block you, right? So yeah, also three steps. I explained it in detail in my latest marathon post.
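A minimal sketch of those three steps for a foreign key (hypothetical posts/users tables):

    -- Step 1: add without scanning either table; new writes are checked.
    ALTER TABLE posts
        ADD CONSTRAINT posts_user_id_fkey
        FOREIGN KEY (user_id) REFERENCES users (id) NOT VALID;

    -- Step 2 (optional): look for orphaned rows and fix them.
    SELECT p.id
    FROM posts p
    LEFT JOIN users u ON u.id = p.user_id
    WHERE p.user_id IS NOT NULL AND u.id IS NULL;

    -- Step 3: full scan of both sides, without blocking normal DML.
    ALTER TABLE posts VALIDATE CONSTRAINT posts_user_id_fkey;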
Starting point is 00:31:13 So what else to say here? Foreign keys can be marked as — how is it called? — deferred, right? Deferrable constraints. So when you have a complex transaction, you might want foreign keys to be checked later, at commit time, not at each statement. And in this case — I don't remember the details again — several years ago we worked with Miro and had an issue with using pg_repack to fight bloat, and they used deferred constraints. So there is an article about it: how to use pg_repack if you have deferred constraints.
Starting point is 00:31:50 And it explains a lot of details — we will provide a link. Miro Engineering wrote it a few years ago. It was interesting — also an issue to address. What else? I think that's it.
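Before moving on — a minimal sketch of what a deferrable foreign key looks like (reusing the hypothetical tables from above):

    ALTER TABLE posts
        ADD CONSTRAINT posts_user_id_fkey
        FOREIGN KEY (user_id) REFERENCES users (id)
        DEFERRABLE INITIALLY IMMEDIATE;

    BEGIN;
    SET CONSTRAINTS posts_user_id_fkey DEFERRED;
    -- statements here may temporarily violate the FK;
    -- the check runs once, at COMMIT time
    COMMIT;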
Starting point is 00:32:06 I think ON DELETE CASCADE is worth mentioning: you can define, at the point of constraint creation, what you want to happen if — I think a really common example is a table of blogs and a table of blog posts — if you delete the blog, do you want the blog posts to be deleted? What do you want to happen in those cases? And that's where I think it's really important to mention that whilst, when we define a primary key, we get an index, for foreign keys the referencing column is not necessarily indexed — we don't check that it's indexed as well. In fact, I've even read a blog post saying you shouldn't always index that column.
Starting point is 00:32:43 But I think the cases where you should far outweigh the cases where you shouldn't. It is worth checking that you do have indexes on those columns, so that those ON DELETE CASCADEs are efficient, right? I always try to avoid using this option, but I see people in quite large databases, under large load, use this deletion propagation or update propagation logic. So, two things here. You're right about indexes — the Postgres DBA toolkit I have, and the postgres-checkup tool, have reports to find these.
Starting point is 00:33:16 Nice. Like, you have a foreign key: there is an index — the primary key — on one side, but you don't have an index on the other side. You probably don't need to delete every day, but at some point, if you do, it will be a sequential scan — very bad, very slow. But this is only one thing. What if you're deleting a blog which has a million posts? I don't like this propagation. If we know that only a low number of rows will be automatically deleted, it's okay. But if it's an unpredictable number of rows,
Starting point is 00:33:56 I would prefer having my own logic, with asynchronous propagation of changes, in batches and so on. Makes sense. Yeah. But I see people use it at quite a large scale. Makes a lot of sense. Cool.
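A minimal sketch of the blogs/blog_posts example, including the index on the referencing column that Postgres does not create for you (hypothetical names):

    ALTER TABLE blog_posts
        ADD CONSTRAINT blog_posts_blog_id_fkey
        FOREIGN KEY (blog_id) REFERENCES blogs (id)
        ON DELETE CASCADE;

    -- Not created automatically; without it, every cascaded delete
    -- has to sequentially scan blog_posts.
    CREATE INDEX CONCURRENTLY blog_posts_blog_id_idx
        ON blog_posts (blog_id);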
Starting point is 00:34:11 I think we're down to the last one. We made it to exclusion constraints. Right. I always confuse exclusion constraints with constraint exclusion, which was related to partitioning. But exclusion constraints are for things like time ranges — intervals — and you want to define,
Starting point is 00:34:29 you want to say: my intervals should not overlap. So it's kind of advanced uniqueness, right? Maybe not uniqueness, but it's for special data — for GiST and SP-GiST indexes and so on. So you define it: say, I'm building some schedule and I cannot allow overlapping; or I'm describing some, I don't know, 3D world, and I have various shapes and I don't want them to overlap. In this case, GiST — and its R-tree heritage — will support this constraint and ensure that no balls, for example,
Starting point is 00:35:06 or cubes or something, overlap in space. So Postgres supports this kind of thing, which is good, but it's quite narrow for me — I don't see it often. Yeah, very cool. I think I've seen it being used in examples where people are designing room booking apps or things like that, but no, I've not used it myself. In fact, it's a tiny little documentation entry, isn't it? So it's time
Starting point is 00:35:31 and space. Yes. But yeah, the documentation is very brief here. I honestly don't have anything to add — it's supported by some index, I guess, right? So an index you'd create with CONCURRENTLY, usually, if you want the zero-downtime approach. I'm not sure if an exclusion constraint can be created based on an existing index. It should be possible... I have no idea. Yeah, good point. I don't know. Let us know in the comments. Yeah.
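A minimal sketch of the room-booking example (this mirrors the pattern in the Postgres docs; btree_gist is needed so the plain integer column can participate in a GiST index):

    CREATE EXTENSION IF NOT EXISTS btree_gist;

    CREATE TABLE reservations (
        room_id int,
        during  tstzrange,
        -- no two rows may have the same room AND overlapping time ranges
        EXCLUDE USING gist (room_id WITH =, during WITH &&)
    );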
Starting point is 00:36:00 But in general, I think we covered the six types of constraints pretty well — at least five types we covered — and how to create all of them without downtime. Just for this final part, let's think about how to drop them. Well, I think dropping is usually straightforward. If it's, like, a unique
Starting point is 00:36:29 constraint, which is based on an index, you probably just drop the index with CONCURRENTLY, obviously, right? Others, you just drop — but you need a low lock_timeout and retries to be involved, because Postgres needs to change the table metadata, and if it's blocked during this, it's not good. So again, a low lock_timeout and retries are going to help here. What about altering? I saw a good blog post where people took some... One of the use cases for check constraints is, for example:
Starting point is 00:37:07 we often say, use text in Postgres — it's in a lot of recommendations — use text and don't be constrained on length. Not varchar, not char. Exactly, yeah. This, strictly speaking, is not a constraint, right? But in a broader sense it's a constraint — and you could implement it with constraints. Oh, exactly, yeah. And I like it, because you control exactly how and when it will be verified. And you know, again, this is exactly why check constraints
Starting point is 00:37:38 are underappreciated. And recently I indeed prefer using just text without limits — not varchar(n), just text. And then, if I need a limit, I use a check constraint. I know how to set it up without downtime, and I know how to change it: if I need to increase or decrease this limit, I know how to do it, right?
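A minimal sketch of this pattern — a text column with the limit in a check constraint, and changing the limit later by adding a new constraint before dropping the old one (hypothetical names):

    -- Add the limit online, with the usual NOT VALID / VALIDATE steps:
    ALTER TABLE posts
        ADD CONSTRAINT posts_title_len
        CHECK (length(title) <= 200) NOT VALID;
    ALTER TABLE posts VALIDATE CONSTRAINT posts_title_len;

    -- Later, to change the limit: add the new one first, then drop the old.
    ALTER TABLE posts
        ADD CONSTRAINT posts_title_len_new
        CHECK (length(title) <= 300) NOT VALID;
    ALTER TABLE posts VALIDATE CONSTRAINT posts_title_len_new;
    ALTER TABLE posts DROP CONSTRAINT posts_title_len;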
Starting point is 00:38:00 And it's good. I think we only lose one benefit, or like one optimization. I'm not sure if there is this optimization for check constraints. Let's say we want to relax the constraint from 200 characters to 300 characters. I think if you're using Varchar, Postgres will handle that knowing it doesn't have to recheck existing data. Yes. But that's the only downside.
Starting point is 00:38:24 If you increase N in Varchar, it won't rewrite. And it won't scan even. Because obviously, if you just increase the limit, existing rows already are okay with this, right? So, yeah, I agree. And in the case of check constraint, I'm not sure if we have any... I don't think so.
Starting point is 00:38:43 I've not heard of one. But we can do the three-step process you mentioned. It's just... Right, we can. In fact, we can probably add another constraint and then drop the existing one. Exactly, yeah. Great. Cool, thanks so much, Nikolay. Thanks everyone for listening. Catch you next week. Yeah, see you later. Bye.
