Postgres FM - Get or Create
Episode Date: August 23, 2024Michael and Nikolay are joined by Haki Benita, a technical lead and database enthusiast who writes an excellent blog and gives popular talks and training sessions too, to discuss the surprisi...ngly complex topic of trying to implement “get or create” in PostgreSQL — handling issues around idempotency, concurrency, and bloat. Here are some links to things they mentioned:Haki Benita https://hakibenita.com How to Get or Create in PostgreSQL (blog post by Haki) https://hakibenita.com/postgresql-get-or-create "Find-or-insert" using a single query (how-to guide by Nikolay) https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0036_find-or-insert_using_a_single_query.md?ref_type=heads Is SELECT or INSERT in a function prone to race conditions? (Answer by Erwin Brandstetter) https://stackoverflow.com/questions/15939902/is-select-or-insert-in-a-function-prone-to-race-conditions/15950324#15950324 get_or_create() in Django https://docs.djangoproject.com/en/5.1/ref/models/querysets/#get-or-create Subtransactions Considered Harmful (blog post by Nikolay) https://postgres.ai/blog/20210831-postgresql-subtransactions-considered-harmful MERGE (Postgres documentation) https://www.postgresql.org/docs/current/sql-merge.htmlHidden dangers of duplicate key violations in PostgreSQL and how to avoid them (blog post by Divya Sharma and Shawn McCoy from the RDS team) https://aws.amazon.com/blogs/database/hidden-dangers-of-duplicate-key-violations-in-postgresql-and-how-to-avoid-them/ One, Two, Skip a Few... (blog post by Pete Hamilton from Incident) https://incident.io/blog/one-two-skip-a-few ~~~What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!~~~Postgres FM is produced by:Michael Christofides, founder of pgMustardNikolay Samokhvalov, founder of Postgres.aiWith special thanks to:Jessie Draws for the elephant artwork
 Transcript
 Discussion  (0)
    
                                         Hello and welcome to PostgreSQL FM, a weekly show about all things PostgreSQL.
                                         
                                         I am Michael, founder of PgMustard.
                                         
                                         This is my co-host Nikolay, founder of PostgreSQL.
                                         
                                         Hey Nikolay.
                                         
                                         Hi Michael.
                                         
                                         How are you?
                                         
                                         Good.
                                         
                                         How are you?
                                         
    
                                         Very good.
                                         
                                         Well, yeah, me too.
                                         
                                         Thank you for asking.
                                         
                                         Thank you for asking.
                                         
                                         You like how are you doing?
                                         
                                         I'm here too.
                                         
                                         If you just want to keep chat among yourselves, I'll just wait here.
                                         
                                         Luke, we have a guest today.
                                         
    
                                         Yeah.
                                         
                                         Hi.
                                         
                                         Yeah.
                                         
                                         We are delighted to have a guest today.
                                         
                                         And it is the one and only Haki Benita, technical lead and database enthusiast, who also writes an excellent blog over at hakibanita.com that we've mentioned many times on the podcast before.
                                         
                                         He also gives popular talks and training sessions.
                                         
                                         Thank you so much for joining us.
                                         
                                         Thank you for having me.
                                         
    
                                         Big fan, very excited.
                                         
                                         The blog is great.
                                         
                                         If someone didn't read it yet, it's a must read for, I think.
                                         
                                         And long reads, right?
                                         
                                         It's long reads, but not very frequent.
                                         
                                         I guess it's not possible to make them frequent, right?
                                         
                                         Yeah, it's a lot of work.
                                         
                                         Funny story is we agreed that our bot will have your blog post in our knowledge base.
                                         
    
                                         Yeah, that's right.
                                         
                                         And thank you for that.
                                         
                                         And a few days ago, I told the team,
                                         
                                         this part of our knowledge base was not updated since January, since alpha release.
                                         
                                         And now we're approaching like beta release.
                                         
                                         And they said, no problem, we'll update and pay attention to this block, I said.
                                         
                                         And then they came and said, we have updated, only one article added since January.
                                         
                                         Okay, okay.
                                         
    
                                         But it makes sense because long reads require a lot of time.
                                         
                                         I know it very well.
                                         
                                         Yeah, it takes a very long time to produce these types of articles. Yeah, I know. I've been
                                         
                                         writing for a very, very long time and I've been very surprised about this article. I'll be honest
                                         
                                         with you, after you write for some time, you start to develop a sense of which ones are going to do
                                         
                                         well and be very accepted and which ones are going to do well and be very accepted,
                                         
                                         and which ones are going to be just one for the archives.
                                         
                                         So when I wrote stuff like comparing pandas to SQL,
                                         
    
                                         I knew that it's going to be a hit because people like pandas.
                                         
                                         And if you say pandas is slow, then people are going to hate you.
                                         
                                         And when I wrote this, me and Micah wrote something about
                                         
                                         hash indexes a few years back. And I think it's a top result for Postgres hash index on just about
                                         
                                         every search engine. So that one, you know, you release that one, you know that it's going to do
                                         
                                         well. But then I released this one. I said, man, it's so long, such a niche problem. No chance
                                         
                                         anybody's actually going to read all of that.
                                         
                                         And I was very surprised by the amount of responses
                                         
    
                                         that I received to this article.
                                         
                                         And I think the most surprising comment
                                         
                                         that I've received on this article
                                         
                                         is that a lot of Postgres people didn't know about Merge.
                                         
                                         Now, I came from Oracle, so I knew all about merge. And actually when I came to Postgres, I wrote a lot of ETL processes and I really wanted merge and I didn't have merge. So I had to learn about insert on conflict. So a lot of people told me this is the first time I heard about merge. Such a cool command. Because it's new. It's still new. It was added to 15, right?
                                         
                                         15. So it's super new.
                                         
                                         It also to be honest, insert on conflict is like what you
                                         
                                         want 99% of the cases. And while merge is nice, I think that on
                                         
    
                                         conflict is you seem pissed. pissed. Why did I say?
                                         
                                         Well, it was you describing your article, why it's bad, right?
                                         
                                         Let's talk about why it's bad.
                                         
                                         These gaps and sequences like bloat created.
                                         
                                         Yeah.
                                         
                                         This is not what people want, right?
                                         
                                         I know.
                                         
                                         Let's step back.
                                         
    
                                         Like our, Michael likes us to pronounce the topic, right?
                                         
                                         Topic is...
                                         
                                         I have to tell you something.
                                         
                                         I have to tell something.
                                         
                                         It's funny, okay?
                                         
                                         I have to tell something.
                                         
                                         So when we talked, I talked with Michael about this, doing this show, and I asked him, okay, let's do it.
                                         
                                         Can you give me like a rough outline of how it's going to go?
                                         
    
                                         And he told me, I can tell you how it's going to go. But to be
                                         
                                         honest, if Nikolai is going to be on the show, then I can
                                         
                                         promise anything.
                                         
                                         Nikolay Spilkaevich I can't promise anything.
                                         
                                         Well, I see you are the same type of person, actually.
                                         
                                         You
                                         
                                         I don't know, you delivered like on the first five minutes.
                                         
                                         Don't follow the agenda, right? So you said that you
                                         
    
                                         were surprised
                                         
                                         this topic will be popular,
                                         
                                         but how come it's super popular?
                                         
                                         It's like in computer science,
                                         
                                         particularly in databases,
                                         
                                         find or create,
                                         
                                         this is like a super common pattern, right?
                                         
                                         And the fact that in Postgres,
                                         
    
                                         these solutions have so many pros and cons,
                                         
                                         it's super interesting
                                         
                                         because everyone needs it, actually.
                                         
                                         Yeah.
                                         
                                         You build something,
                                         
                                         you want this, right? Yeah. I'm going to tell you a secret. After I published this article,
                                         
                                         I actually discovered that there are still situations where you get very unpredictable
                                         
                                         results. And I've done some experimenting in the past two weeks based on comments that I received.
                                         
    
                                         And I haven't cracked it yet.
                                         
                                         Yeah.
                                         
                                         With the lock, we're complicated.
                                         
                                         With what?
                                         
                                         And I'll even tell you another secret.
                                         
                                         There's a very, in my opinion, an unexpected difference between the way merge and insert on conflict behave under different circumstances.
                                         
                                         But, you know, we promised Michael that we described the topic
                                         
                                         before we actually dive into the details.
                                         
    
                                         Right.
                                         
                                         Let's name the topic.
                                         
                                         Yeah.
                                         
                                         Well, I was interested in your thoughts on this, actually,
                                         
                                         because I feel like you deliberately called your blog post get or create,
                                         
                                         whereas from the database side of it i've always thought of it
                                         
                                         as like insert or select and i think nicolai's called it that but in the past in like a how-to
                                         
                                         guide and things so we'll link up the blog post and we'll link up nicolai's how to and there's
                                         
    
                                         like a few other things i think one of the things you're referencing about how it's become even more
                                         
                                         complicated in the last few weeks you link me to a great answer by Erwin Brandstetter on Stack
                                         
                                         Overflow that discusses this as well.
                                         
                                         But it's a surprise.
                                         
                                         It sounds really simple, right?
                                         
                                         Select or insert, get or create.
                                         
                                         It seems such a simple piece of functionality and in small, low traffic systems.
                                         
                                         It can be like, you can, like any newbie developer could come up with a solution to
                                         
    
                                         this problem it's just gets complicated when you have to factor in concurrency and mvcc
                                         
                                         implementations and things so yeah i i loved your post i found it really interesting how complicated
                                         
                                         this could get but yeah maybe we should start simple and it'd be great to hear a little bit
                                         
                                         about it you both said it's quite a common thing that you come, like you have to implement it quite often.
                                         
                                         I haven't had to that often.
                                         
                                         Like I've got a couple of occasions, but not, maybe not as often as you do.
                                         
                                         So I'd be interested to hear like where you're coming across this.
                                         
                                         Okay.
                                         
    
                                         So my use case was very similar to what I actually implemented in the article because I had this organization table and we had members.
                                         
                                         And we wanted to have users be able to tag members
                                         
                                         with their own tags.
                                         
                                         And we wanted to encourage reuse of tags.
                                         
                                         Otherwise, they don't make much sense.
                                         
                                         So we've set up this very simple HTML form.
                                         
                                         Now, HTML forms are not SPAs.
                                         
                                         They're not very complicated.
                                         
    
                                         You can't send JSON.
                                         
                                         You just send a list of names,
                                         
                                         like from a radio selection
                                         
                                         or like an autocomplete, whatever.
                                         
                                         So at the end of the day,
                                         
                                         the server receives a list of strings
                                         
                                         and he wants to set these tags on the member.
                                         
                                         Now, if the tag exists,
                                         
    
                                         you want to reuse the tag.
                                         
                                         And if the tag does not exist, you want to create a new tag and get the ID.
                                         
                                         Because the next step would be get all the IDs and put them in the table associating tags with members, right?
                                         
                                         So that's the first part where you have a list of tag names and you want to get or create tags, this is where I came to this idea.
                                         
                                         Now, the thing that surprised me is that, now, this is not a very high traffic part of the
                                         
                                         application. I could have done the brute force approach, would have been just fine. But, you
                                         
                                         know, I wrote some unit testing and one of the tests was, let's see what happens when you just have an existing tag.
                                         
                                         You want to make sure that it's reused.
                                         
    
                                         So I used insert on conflict do nothing with returning star.
                                         
                                         So I expected that if I insert, for example, two tags and one of them already exists, I expected to get two
                                         
                                         tags in return.
                                         
                                         But in fact, what I got was just one tag.
                                         
                                         So this was very strange to me.
                                         
                                         So at this point, I started investigating and starting to explore why this is happening.
                                         
                                         And in fact, the first thing that I thought about was, let's do a meaningless update.
                                         
                                         Like instead of doing on conflict do nothing, I did on conflict set ID equals exclude ID.
                                         
    
                                         Like, let's fool the database into thinking that this tag was modified.
                                         
                                         So I get that in return.
                                         
                                         But it really bugged me because it's a very ugly solution. Why
                                         
                                         would I want to update something for no reason just to get it back? So this is where all the
                                         
                                         different scenarios started to uncover. And as I tested farther and farther, I came to the conclusion
                                         
                                         like, hell man, why is this so complicated? I mean, this is what database and applications do.
                                         
                                         This should be so simple.
                                         
                                         Why is this so complicated?
                                         
    
                                         And then I started digging.
                                         
                                         And one of the places that I eventually arrived was Django.
                                         
                                         Django is a web framework and there's an implementation of get or create.
                                         
                                         And what Django is doing,
                                         
                                         they try to fetch the record.
                                         
                                         If they find it, they return it.
                                         
                                         If they don't find it,
                                         
                                         they try to insert, right?
                                         
    
                                         But then, and that's like the funny part,
                                         
                                         the non-intuitive part,
                                         
                                         then they actually handle an integrity,
                                         
                                         unique constraint violation, and then they actually handle an integrity, unique constraint violation,
                                         
                                         and then they try to get it again.
                                         
                                         This is the select, insert, select.
                                         
                                         But then it gets even more complicated because if you have a unique constraint violation
                                         
                                         inside a transaction, it abhors the transaction, right?
                                         
    
                                         You need to wrap the second insert in another transaction.
                                         
                                         Or use sub transactions. I think Django by default uses sub transactions, which is a very bad idea, actually, but we will talk about it later.
                                         
                                         The reason that he uses sub transactions, and I know you're a very big fan of sub transactions, you use them all the time, you encourage everybody to use sub transactions. You wrote a very long blog post about why they are absolutely great, and you should use them all the time. You encourage everybody to use up transactions. You wrote a very long blog post about why they are absolutely great and you should use
                                         
                                         them all the time.
                                         
                                         But the reason that you do that in Django, the reason that they're doing that is because
                                         
                                         if you happen to call this function inside a transaction of your own, and if you reach
                                         
                                         a unique constraint violation, it gets your transaction aborted.
                                         
                                         So that's a problem. So the only way that Django can make sure that they don't get your transaction
                                         
    
                                         aborted is to execute this part in a sub-transaction of its own. Now, there's also another very
                                         
                                         interesting thing that happens here. And this is also something that I mentioned in the article.
                                         
                                         Python encouraged asking for forgiveness.
                                         
                                         So in Python, the item says that you should try something and then handle all the possible
                                         
                                         exceptions.
                                         
                                         So trying to insert a record and then handle the unique constraint violation is actually
                                         
                                         consistent with how you're expected to do things in Python.
                                         
                                         But it kind of conflicts with the way Postgres handles the situation because in Postgres,
                                         
    
                                         when you have an exception inside a transaction, it aborts the transaction, which is not a
                                         
                                         very good thing.
                                         
                                         So the approaches between how you would do things in Postgres and how you would do things in Python kind of clashed in this case.
                                         
                                         I thought it was very interesting.
                                         
                                         So, you know, I explored different approaches, like what would be the equivalent of asking for forgiveness in Postgres?
                                         
                                         What would be the equivalent of looking before you leave, check all the preconditions in advance?
                                         
                                         So, yeah, it turned out to be
                                         
                                         way more complicated than I anticipated. Yeah, that's interesting. And you explored
                                         
    
                                         several properties, right? Bloat, concurrency, constraint, and idempotency, right?
                                         
                                         Idempotency, yeah.
                                         
                                         So, I'm very curious. I
                                         
                                         saw some parts of benchmarks, but you decided not to include
                                         
                                         the performance to the resulting table. Yeah, when you compared
                                         
                                         methods, why so? Performance also interesting, right?
                                         
                                         Alex Ferrari- Performance is interesting. And I saw I saw
                                         
                                         some articles that do this from a performance point of view. In my situation, performance was not a concern.
                                         
    
                                         I was more focused on getting the functional attributes that I wanted,
                                         
                                         like idempotency, for example, was a lot more important to me than make it fast.
                                         
                                         Because at the end of the day, you don't update lots of tags.
                                         
                                         You probably set no more than five. So performance was not a concern. I did want to
                                         
                                         focus on the functional attributes. Also, we've got
                                         
                                         pretty long. So at some point, you need to let something go,
                                         
                                         you know?
                                         
                                         Sanyam Bhutaniyaraman Right. Yeah. Let's maybe talk about
                                         
    
                                         your favorite method right now. And for future for future, I
                                         
                                         guess it's merge, right? Becausegres 17 it will have returning clause actually i didn't know and when i read about
                                         
                                         it in your blog post i immediately thought oh this is a candidate to be one of my favorite
                                         
                                         additions to postgres 17 i somehow overlooked it so it's great before Before, like, Merge was added to Postgres 15.
                                         
                                         It was being developed more than 10 years, I think.
                                         
                                         It was a huge story.
                                         
                                         But it lacks returning in 15 and 16.
                                         
                                         Now, in 17,
                                         
    
                                         to be released soon,
                                         
                                         it's receiving returning.
                                         
                                         And it looks like
                                         
                                         all your checkboxes are green, right?
                                         
                                         I mean...
                                         
                                         Yeah. It's the favorite the favorite. You would choose this if you
                                         
                                         already were on Postgres 17.
                                         
                                         But before 17, what's your favorite method?
                                         
    
                                         Well, insert on conflict, do nothing with a union is currently
                                         
                                         my way to go. And I expected to remain
                                         
                                         my go-to approach,
                                         
                                         mostly because of habit,
                                         
                                         and also because I experimented with the new merge
                                         
                                         under different circumstances in Postgres 17.
                                         
                                         And it turns out that merge can actually fail
                                         
                                         with unique constraint violation.
                                         
    
                                         So I found it unexpected.
                                         
                                         I don't know because I don't know much about how these two are different
                                         
                                         in the underlying implementation, but I'm guessing that they are not implemented
                                         
                                         the same way. And we talk about it commuted level.
                                         
                                         Because I think it also matters.
                                         
                                         So after I published this article, a reader reached out to me and he said,
                                         
                                         I really liked your post and everything is very comprehensive, blah, blah, blah.
                                         
                                         But there is one race condition that can still cause inconsistent results.
                                         
    
                                         So basically, if you have two sessions inserting the exact same tag at the exact same time,
                                         
                                         then you can end up with a situation where using insert on conflict, you would not get
                                         
                                         item potency, meaning the first statement would return the tag, but the second would return nothing.
                                         
                                         And the reason for that is when you start the first transaction,
                                         
                                         you insert the tag A. So tag
                                         
                                         A does not exist. And insert
                                         
                                         the tag, right? And then
                                         
                                         you get that in return. but then you don't commit.
                                         
    
                                         Now another session starts, the transaction begins, and now you do the exact same thing.
                                         
                                         You do insert tag A on conflict, do nothing, returning star, and then you select from the
                                         
                                         table and select from the returning. What's happening now is interesting
                                         
                                         because right now the row is locked so the second session hangs. So now you commit the first session
                                         
                                         so the second session at this point is going to return nothing and the reason is, insert on conflict, lock, it encountered a unique constraint violation, so the row was not inserted.
                                         
                                         But then when you try to select from the table, it found nothing because the snapshot is older than the other transaction.
                                         
                                         So this is a scenario where you get,
                                         
                                         you don't get item potency.
                                         
    
                                         You expect to get the same output in both cases,
                                         
                                         but you don't.
                                         
                                         And the more interesting thing is if you do the exact same exercise with merge,
                                         
                                         you get different results.
                                         
                                         What do you get?
                                         
                                         Can you guess?
                                         
                                         I found it to be surprising.
                                         
                                         If you do the
                                         
    
                                         exact same experiment with merge, you're going to get unique
                                         
                                         constraint violation. Even though you can say in the merge
                                         
                                         clause, you can say, on conflict, do nothing, when match
                                         
                                         do nothing, you are still going to get unique constraint
                                         
                                         violation. So this raises a question of whether merge is really or I would
                                         
                                         I wouldn't want to say something like is it safe in concurrent highly concurrent deloads but
                                         
                                         it doesn't do what it promised. Better understand how merge handles these types of situations you know.
                                         
                                         Yeah it shouldn't produce unique constraint violation in read committed, but it does.
                                         
    
                                         Yeah, okay.
                                         
                                         But it should be easy to check based on what you say, like if you just do pgbench multiple
                                         
                                         sessions in parallel doing the same thing, right?
                                         
                                         It's very easily reproducible.
                                         
                                         You just need two terminal windows and you can immediately replicate it.
                                         
                                         The thing I found to be, I would say,
                                         
                                         disturbing, yeah, and I'm using air quotes for those of you just listening, is that when you do
                                         
                                         merge and you set on match do nothing, you don't expect to get unique constraint violation.
                                         
    
                                         Like if I wrote this statement in my code, I would not handle any constraint violation. Okay.
                                         
                                         Because if I'm inserting into a row and I know that it can raise, I would handle the
                                         
                                         situation and do whatever I need to do.
                                         
                                         But I would not expect to get integrity error when I explicitly handle the case where there
                                         
                                         is a match.
                                         
                                         So I found this to be surprising.
                                         
                                         So to answer your previous question,
                                         
                                         what would I use now?
                                         
    
                                         And that's a tough choice between you don't get the expected result
                                         
                                         to you get an exception.
                                         
                                         You know, when I'm thinking about right now,
                                         
                                         I think that it kind of makes sense
                                         
                                         to get the exception, right?
                                         
                                         Yeah, well, I guess we're moving
                                         
                                         from the end of article
                                         
                                         backwards, right?
                                         
    
                                         So I'm not sure the problem you described,
                                         
                                         is it discussed somewhere?
                                         
                                         Do you know, like, this behavior,
                                         
                                         was it discussed in mailing lists or somewhere, no?
                                         
                                         I don't know.
                                         
                                         I'm guessing that it might have, but I haven't seen any.
                                         
                                         And also, as you said, it's kind of new in Postgres,
                                         
                                         so it's possible that some of the people that worked on it
                                         
    
                                         didn't have enough time to actually document it.
                                         
                                         I wouldn't say document, but maybe write these types of articles on it,
                                         
                                         analyzing how it behaves under different circumstances.
                                         
                                         Yeah, I'm curious.
                                         
                                         I think this should be documented.
                                         
                                         Maybe it's actually documented.
                                         
                                         We should check.
                                         
                                         But it's super interesting.
                                         
    
                                         I never used merchant production, never, never ever.
                                         
                                         But, yeah, so,
                                         
                                         well, honestly,
                                         
                                         most production systems are still
                                         
                                         behind a lot
                                         
                                         in terms of major version.
                                         
                                         But moving backwards,
                                         
                                         insert on conflict
                                         
    
                                         do nothing or do update. This is
                                         
                                         interesting because, so, we
                                         
                                         said our goal is
                                         
                                         find or create or
                                         
                                         get or create.
                                         
                                         But when you say
                                         
                                         let's also update, it's like
                                         
                                         third layer of logic.
                                         
    
                                         Find or create
                                         
                                         but if you found, also
                                         
                                         update. It's more complicated.
                                         
                                         I like it, but it's already a different
                                         
                                         task.
                                         
                                         When I wrote lots of ETL, I don't know, 15 years ago in Oracle, I used merge all the time. It was like a third hand. Every time I needed to do anything, I would use merge because
                                         
                                         when you do ETL processes, you're basically taking data from one place and you want to sync it to
                                         
                                         some kind of target. So you use merge all the time.
                                         
    
                                         In Postgres, I kind of got used to not using it.
                                         
                                         And also, some may also claim that the whole ETL,
                                         
                                         the way that we do ETL now is kind of different
                                         
                                         than the way we used to do ETL 15 years ago.
                                         
                                         A lot of it is the same, but some of it is still kind of different.
                                         
                                         Yeah, I think it's interesting.
                                         
                                         And hopefully when more people adopt merge, I know that it was
                                         
                                         a very desirable feature, right?
                                         
    
                                         A lot of people wanted that.
                                         
                                         Right.
                                         
                                         People were very excited when it came out.
                                         
                                         It was all in the, what I anticipate, what I expect most N version,
                                         
                                         you know, the merge command.
                                         
                                         So hopefully we're going to see some people doing interesting stuff with merge.
                                         
                                         Yeah, and I was surprised to see, so there is difference between insert and conflict do nothing
                                         
                                         and do update in the table you provided in the end of the article. And update there is a like red cross in the column of bloat meaning that it
                                         
    
                                         has bloat issues obviously update okay but also you have a green checkbox for insert on config
                                         
                                         do nothing the thing is that when we created our bot we also use insert on config do nothing
                                         
                                         for knowledge base for example if your article gets inserted,
                                         
                                         it searches if this URL is known already, right?
                                         
                                         And if it's known already, do nothing.
                                         
                                         We decided not to apply update yet.
                                         
                                         So I was surprised to see you concluded
                                         
                                         that there is no bloat issues
                                         
    
                                         because we had a lot of activities, a lot of data to insert
                                         
                                         and it suffers from bloat.
                                         
                                         We have gaps.
                                         
                                         So if you have
                                         
                                         collision,
                                         
                                         right, so you inserted something,
                                         
                                         you try to insert and do nothing,
                                         
                                         it actually generates that
                                         
    
                                         topple, I think. No?
                                         
                                         No, I think that, first of all...
                                         
                                         Gaps for sure. I just checked.
                                         
                                         Yeah, gaps for sure.
                                         
                                         Yeah, gaps for sure.
                                         
                                         And, okay, I know what you're talking about.
                                         
                                         You're talking about the issue of bloat, right?
                                         
                                         Bloat, yeah. So this is also a very interesting topic,
                                         
    
                                         which I briefly mentioned,
                                         
                                         but I think that this is also something
                                         
                                         that some people commented about,
                                         
                                         that they weren't aware of the
                                         
                                         fact that it causes bloat. So apparently in Postgres, when you have unique constraint violation,
                                         
                                         it can cause bloat. Because the way it works is that Postgres tries to, it basically inserts
                                         
                                         the row into the table, and then it checks that there are no unique constraint violations.
                                         
                                         And if there is a unique constraint violation, this row is marked as dead, which causes bloat. So this is what's happening
                                         
    
                                         if you rely, heavily rely on catching unique constraint violation. However, however, however,
                                         
                                         and this is also in the original RDS article, which is how I found out about it. It was also
                                         
                                         very unexpected that unique constraint violation would cause bloat.
                                         
                                         But according to the article and according to the benchmarks that I've done,
                                         
                                         when you use insert on conflict, it uses a different mechanism
                                         
                                         that can check in advance for collisions, which prevents bloat.
                                         
                                         So I'm pretty confident that insert on conflict do nothing cause bloat,
                                         
                                         which is a big plus for do nothing. And also, you know, we talked before about the difference
                                         
    
                                         between how you do things in Python, like asking for forgiveness, versus look, look before you
                                         
                                         leap. So in Python, the pattern that is encouraged, basically try to
                                         
                                         insert something and then handle the exception, can end up causing lots of bloat, which is a problem.
                                         
                                         So if you have a process with lots of collision where you actually rely on unique constraint
                                         
                                         violation, this is something that you need to think about. Right. So yeah, I double checked.
                                         
                                         You're right. No bloat. But sequence gaps are happening.
                                         
                                         Yeah, sequence gaps.
                                         
                                         You know, there was a period where I like to ask in job interviews, how would you implement gapless sequences?
                                         
    
                                         Have you ever talked about this?
                                         
                                         Evil question.
                                         
                                         Evil question.
                                         
                                         Yeah, it has many depths to go into, right?
                                         
                                         Deeper, deeper, deeper, yeah.
                                         
                                         There's a company that a few of my friends
                                         
                                         and former colleagues work at called Incident.io.
                                         
                                         They do incident management within things like Slack and Teams.
                                         
    
                                         And they had this issue where they blogged about it,
                                         
                                         but they wanted gapless sequences for their incident numbers they didn't the customers
                                         
                                         got confused by the idea of
                                         
                                         the incident number jumping by like
                                         
                                         five when they'd only had you know
                                         
                                         they've had one incident then a second incident and a third incident
                                         
                                         suddenly they're on incident six what happened
                                         
                                         so there's a really nice blog
                                         
    
                                         post there
                                         
                                         explaining how they were
                                         
                                         yeah yeah but I'm curious
                                         
                                         why are you bothered by sequence I'm curious why like why
                                         
                                         are you
                                         
                                         bothered by
                                         
                                         sequence gaps
                                         
                                         Nikolai
                                         
    
                                         like why
                                         
                                         is that an
                                         
                                         issue
                                         
                                         well it's
                                         
                                         not a big
                                         
                                         issue
                                         
                                         it's just
                                         
                                         it might be
                                         
    
                                         an issue
                                         
                                         for some
                                         
                                         cases but
                                         
                                         actually I
                                         
                                         don't care
                                         
                                         it gave
                                         
                                         me false
                                         
                                         impression
                                         
    
                                         that there
                                         
                                         is bloat
                                         
                                         that's why
                                         
                                         but now I
                                         
                                         double checked
                                         
                                         I kept
                                         
                                         absolutely right
                                         
                                         no bloat
                                         
    
                                         and for
                                         
                                         so in certain conflict do nothing probably is the winner in Postgres 16 or older.
                                         
                                         I've got a question.
                                         
                                         I've got a question here because you mentioned the kind of trade-off between the issues in the highly concurrent cases
                                         
                                         or like the potential issue of insert on conflict, do nothing returning null.
                                         
                                         If, if you happen to have this, uh, case with the two concurrent sessions in
                                         
                                         setting the exact same tag at the same time, wouldn't insert on conflict,
                                         
                                         do update avoid that issue at the cost of some bloke?
                                         
    
                                         Well, that's a different issue.
                                         
                                         The issue I describe is that when you have two transactions trying to insert a new tag
                                         
                                         at the same time, the same new tag at the same time, then the second on conflict would
                                         
                                         not do anything and it would also not return anything.
                                         
                                         This is the unexpected part because one of my requirements is that the
                                         
                                         operation would be idempotent.
                                         
                                         So if I give it tag A, I expect to get an id return.
                                         
                                         So in this case, there is a scenario where I provide it with a list of N tags
                                         
    
                                         and I get less than N tags in return.
                                         
                                         Henry Suryawirawan, But if they both updated, imagine if both concurrent sessions were doing insert and conflict update, they both get to the update stage.
                                         
                                         You get two updates, but you still get back all the tags that you wanted to insert.
                                         
                                         First of all, in my scenario, I don't update.
                                         
                                         This is a very, a lot of people got this very confused.
                                         
                                         This is why I added a comment and some people made like a very, let lot of people got this very confused. This is why I added a comment.
                                         
                                         And some people made like a very, let's call them funny comments about it.
                                         
                                         Okay.
                                         
    
                                         But there's a difference between upcert where you want to update or insert,
                                         
                                         which is also a very, very popular pattern.
                                         
                                         In this case, by the way, it's simpler, because if you actually update the
                                         
                                         row, then it's going to get returned by returning star returning. Yeah, so that's like the easy
                                         
                                         case. In my case, I don't want to touch the row, but I still want to get it back. This is why get
                                         
                                         or create is like more difficult variation variation of absurd if you wait.
                                         
                                         Okay.
                                         
                                         So, uh, there, there's a tricky part.
                                         
    
                                         So apparently there's no good solution, by the way, the block, the post by
                                         
                                         Irwin on stock overflow lands on a brute force solution where you basically loop
                                         
                                         until you get what you expect.
                                         
                                         Uh, I don't know
                                         
                                         if I would have if I maybe this is the way to go.
                                         
                                         Alex Ferrari- loop where in different language or in
                                         
                                         PLPGSQL?
                                         
                                         Sanyam Bhutani- He ended up writing a function where you
                                         
    
                                         essentially loop until you stop getting unique constraint
                                         
                                         violations. Yeah.
                                         
                                         Alex Ferrari- But you need the sub transactions. So
                                         
                                         this is no go for me. Right? This doesn't scale. I mean, this
                                         
                                         Nikolai, you're a man with a mission.
                                         
                                         Well, it's no joke. I will have clients literally last week, I
                                         
                                         spent like half one hour and people went and and switched off
                                         
                                         it in Django. Because they because it hurts. It hurts constantly
                                         
    
                                         people. Like, sub-trans SLRU
                                         
                                         wait event. It happens
                                         
                                         all the time. So, the fact that Django
                                         
                                         by default uses them and people
                                         
                                         don't realize and then come to us
                                         
                                         with sub-trans SLRU, it's good for my
                                         
                                         income, right? But
                                         
                                         it's bad in general,
                                         
    
                                         right? By the way,
                                         
                                         I wanted to highlight one thing.
                                         
                                         If you do insert on conflict do nothing,
                                         
                                         returning star, it won't return anything
                                         
                                         if row already exists.
                                         
                                         I just wanted to highlight that you have a trick.
                                         
                                         You use CTE, right?
                                         
                                         And then select for such case, right?
                                         
    
                                         Yeah.
                                         
                                         Additionally, with union, right?
                                         
                                         Yeah, that's right.
                                         
                                         Union or what?
                                         
                                         Yeah, this is the
                                         
                                         only way that you can actually communicate between parts of a query. Otherwise, they all see the same
                                         
                                         snapshot and you don't get this is a trick also not straightforward at all. Yeah, like some people
                                         
                                         can move away from insert on conflict do nothing just because of that. Yeah. Right. Yeah. One of many cool tricks in that book.
                                         
    
                                         I went through this process where I try to figure out, wait, I just inserted
                                         
                                         this row, why when I select outside the CTE, I don't give it to me.
                                         
                                         Yeah.
                                         
                                         Yeah.
                                         
                                         I understand there's some cost to upset, but given the complexity we're running
                                         
                                         into here, is it, would you both see it as acceptable to pay that, that extra cost, the kind of heavier?
                                         
                                         I know you, I know you mentioned the kind of double hit of the table being annoying, but like those updates, even though they'd be setting the tag name to the exact same tag name, potentially over and over and over again.
                                         
                                         I just see the, the guarantees that provides is so attractive personally.
                                         
    
                                         I'll be honest with you.
                                         
                                         I understand what you're saying and you're probably correct.
                                         
                                         This is like the practical approach, but I would not do it.
                                         
                                         It just bugs me.
                                         
                                         I would not do it.
                                         
                                         I would not do a meaningless update just to say, I would just do two queries.
                                         
                                         I would insert and then I would select separately.
                                         
                                         It's like insert, rollback, insert, rollback. You end up having a lot of bloat depending on the concrete situation.
                                         
    
                                         So, sub-transactions and bloat, huge limitations of performance here, right? I know. I think this is a case where you kind of
                                         
                                         understand the limitation
                                         
                                         and restrictions of the database
                                         
                                         and you kind of end up
                                         
                                         solving it at the application level.
                                         
                                         Now,
                                         
                                         you mentioned at the beginning, I know that
                                         
                                         I, well, get
                                         
    
                                         or create is
                                         
                                         useful, but
                                         
                                         I haven't had a chance to implement it as much.
                                         
                                         So, yeah, for the rare occasion when I actually need to do get or create,
                                         
                                         I would just do it at the application level.
                                         
                                         Because as it turns out, it's very complicated at the database level.
                                         
                                         Right. In application, it becomes simpler.
                                         
                                         You probably don't deal with,
                                         
    
                                         probably don't deal
                                         
                                         with sub-transactions
                                         
                                         and bloat,
                                         
                                         but latency,
                                         
                                         like,
                                         
                                         route trip times
                                         
                                         between attempts
                                         
                                         to do one step,
                                         
    
                                         another step,
                                         
                                         it increases chances
                                         
                                         of failures,
                                         
                                         right?
                                         
                                         I mean,
                                         
                                         collisions and so on.
                                         
                                         Yeah,
                                         
                                         but you know,
                                         
    
                                         if my main business
                                         
                                         is to get or create,
                                         
                                         I would come up
                                         
                                         with a very innovative solution, put a lot of engineering work into it.
                                         
                                         But if it's just updating tags for members in an organization, I would go.
                                         
                                         Yeah, but I understand what you're saying.
                                         
                                         I spent a lot of time doing unnecessary optimizations on, you know, weird places in the code just for fun.
                                         
                                         I do it all the time.
                                         
    
                                         Yeah, like 90% of my blog posts is inspired by these strange endeavors
                                         
                                         where I try to optimize things.
                                         
                                         So yeah, this is interesting.
                                         
                                         And I think that this very simple problem
                                         
                                         surfaced a lot of issues.
                                         
                                         I also learned a lot from writing this.
                                         
                                         It got me interested in speculative insertion, the mechanism used by insert on conflict.
                                         
                                         It, I think, brightened my understanding of how the interaction between different queries and common table expressions work within the same query. of things that I'm now aware of them. So at least I'm better equipped to debug issues I might have, and don't even know about.
                                         
    
                                         Yeah.
                                         
                                         Yeah.
                                         
                                         I really liked it as a journey.
                                         
                                         I, I would encourage people to read this, even if they don't have this problem, just
                                         
                                         as a, almost as just to watch somebody go through this problem.
                                         
                                         Uh, and you know, I feel like you've included things as well.
                                         
                                         Like there's one section that I took a wrong turn as well.
                                         
                                         Like that's so helpful to share those things with people
                                         
    
                                         because otherwise you can read some of these blog posts
                                         
                                         and they just sound like the person knows,
                                         
                                         like just got it right straight away.
                                         
                                         And it's quite frustrating as somebody that quite often goes down long term
                                         
                                         for me reading those.
                                         
                                         So I appreciated that. I just, I wish you had also some warning about sub transactions anyway
                                         
                                         you know what let me just change the banner at the top of the website instead of the about
                                         
                                         page i would just place like this huge warning they don't have sub-transactions. Let me put it right here. If you see PLPG-SQL with begin,
                                         
    
                                         exception, when, or something.
                                         
                                         So anyway, nested, begin, and blocks.
                                         
                                         This is sub-transaction.
                                         
                                         Yeah.
                                         
                                         But in this case, it's warranted
                                         
                                         because otherwise it cancels any calling transaction.
                                         
                                         So that would be the responsible thing to do in this case.
                                         
                                         But you know what? Maybe
                                         
    
                                         it needs a comment saying this
                                         
                                         is a good sub-transaction.
                                         
                                         It's not a good sub-transaction because
                                         
                                         if you have high... This is
                                         
                                         the least worst type of sub-transaction.
                                         
                                         I cannot agree with you
                                         
                                         here because you talk about
                                         
                                         you explore
                                         
    
                                         collisions and concurrency issues.
                                         
                                         Yeah.
                                         
                                         It means you think that there will be a lot of sessions issuing this query.
                                         
                                         It means eventually the project will bump into this wall.
                                         
                                         Yeah.
                                         
                                         Right?
                                         
                                         Good sub-transactions is, for example, DDL, which happens not often in one session.
                                         
                                         This is good sub-transaction. And even there it can become bad. But here I cannot agree because we should consider multiple sessions fighting and this is
                                         
    
                                         where things can be wrong. What you're saying reinforces what I'm saying that in this
                                         
                                         situation, because it's so complicated in the database. I would just elevate that to the
                                         
                                         application level and try to find a solution there to avoid all the complexity. But it was a nice
                                         
                                         thought exercise. Just to try a regular transaction, maybe, right? It should not be
                                         
                                         such complex, right? I know. I was surprised that it was so complicated. Why is it so? What would you
                                         
                                         both like? Is it that you would like merge to work as expected? What is the nice solution on
                                         
                                         the database side for this? Well, the dream syntax would be for select do nothing returning star to
                                         
                                         just return everything. That would be like the dream syntax syntax i think part of the problem is that
                                         
    
                                         returning star is quite like not super popular thing which was added later to insert and update
                                         
                                         and delete it was not originally there so it's not supernatural thing right also once you've
                                         
                                         defined that behavior it's so difficult to like you can't change the behavior for existing applications. No, you can't.
                                         
                                         And I can even give you an example because you usually use insert on conflict with, you
                                         
                                         know, there is no merge command in Postgres.
                                         
                                         So if you want to sync data, you use insert on conflict and then when matched do nothing.
                                         
                                         Okay.
                                         
                                         And more often what you want to do is you want to
                                         
    
                                         avoid the unnecessary update so you do when matched do update where something actually changed
                                         
                                         okay and then some rows ends up not being affected so at this point you do returning star and then
                                         
                                         usually what i like to do in my ETL processes is I count the
                                         
                                         rows so I can log how many rows were affected. Right? So now the question is, if I expect to get
                                         
                                         the number of affected rows, we go for a circle here. I know. So if I'm just expecting the affected rows, then if I'm going to get rows
                                         
                                         that were not affected, also very strange, right?
                                         
                                         Right.
                                         
                                         Yeah.
                                         
    
                                         So what's the conclusion?
                                         
                                         Come on, guys.
                                         
                                         I mean, what's the conclusion?
                                         
                                         Just do it in the application?
                                         
                                         Don't go there.
                                         
                                         The main conclusion is don't go there.
                                         
                                         Don't think about it.
                                         
                                         Yeah. If error happens,
                                         
    
                                         it only
                                         
                                         confirms that application developers
                                         
                                         should implement proper
                                         
                                         retry logic.
                                         
                                         If transaction fails,
                                         
                                         code should be ready to retry it.
                                         
                                         That's it. Universal
                                         
                                         solution to everything.
                                         
    
                                         We should rename this to
                                         
                                         just do it in the application FM.
                                         
                                         Well.
                                         
                                         It's complicated FM.
                                         
                                         It's complicated.
                                         
                                         No, it is complicated.
                                         
                                         Jokes aside, I don't know why it's so complicated.
                                         
                                         But it is.
                                         
    
                                         Turns out concurrency is hard.
                                         
                                         Yeah.
                                         
                                         You know, I remember one of the things
                                         
                                         that I remember is that when I read this RDS article
                                         
                                         about unique constraint violation causing bloat, I was shocked.
                                         
                                         I was shocked because coming from Python, I'm like encouraged to trigger exceptions.
                                         
                                         This is how I do things.
                                         
                                         I fail and then I adjust.
                                         
    
                                         So that was very strange for me. I have this
                                         
                                         entire lecture about how to handle concurrency
                                         
                                         with a short URL system and everything. There's a problem of how
                                         
                                         you generate short random IDs. How do you do it?
                                         
                                         You need to, if you know a short URL system, you need to generate keys.
                                         
                                         Very small keys.
                                         
                                         So the space is limited.
                                         
                                         So you want to generate random unique keys.
                                         
    
                                         So how do you do it?
                                         
                                         So you end up in a solution where you try to insert and then you loop until you don't get unique constraint violations anymore.
                                         
                                         Right? constrained violations anymore. Right. And now all I'm thinking is I may have caused lots of
                                         
                                         bloat without intention.
                                         
                                         Maybe you have a better solution.
                                         
                                         By the way, it's very similar to the question about
                                         
                                         the gapless sequences.
                                         
                                         Yeah.
                                         
    
                                         I get that bloat is bad, right?
                                         
                                         Like I understand, I see lots of slow queries that have been
                                         
                                         of course like caused by bloat.
                                         
                                         But once you're aware of it, it's like something that can be managed.
                                         
                                         And you've written a great blog post about how to manage it hacky as well that I'll include.
                                         
                                         But it feels to me like it might be one of those pains worth paying sometimes when you're aware.
                                         
                                         Like every single Postgres table is going to have some bloat in it in a natural like working life.
                                         
                                         You're not going to get it down to zero even when you do some of these maintenance tasks so we're always managing some amount it's just like what's a
                                         
    
                                         what's a healthy amount and can can we keep it within those boundaries it's going to be more
                                         
                                         painful with certain workflows like maybe once we're getting near the end of that like once we
                                         
                                         get past maybe halfway full of that space we're starting like an average of two tries per unique URL might start to get
                                         
                                         annoying. So maybe that's the point where you start
                                         
                                         expanding the space by an extra digit
                                         
                                         or something. I imagine
                                         
                                         once you're aware of the problems, you can design a
                                         
                                         solution that isn't that bad and
                                         
    
                                         then manage the bloat. Well, it all boils
                                         
                                         down to how many collisions do you actually expect
                                         
                                         to get? Yeah. I think
                                         
                                         the fact that
                                         
                                         rollback
                                         
                                         inserts
                                         
                                         cause
                                         
                                         bloat
                                         
    
                                         makes me
                                         
                                         think that
                                         
                                         only undo
                                         
                                         people can
                                         
                                         save us
                                         
                                         someday.
                                         
                                         We had a
                                         
                                         quote last
                                         
    
                                         week.
                                         
                                         The new
                                         
                                         term undo
                                         
                                         people because
                                         
                                         indeed if you
                                         
                                         think about it.
                                         
                                         We should
                                         
                                         credit Melanie.
                                         
    
                                         So this
                                         
                                         different approach would place new rows
                                         
                                         with the same position in the page, right?
                                         
                                         And if it committed, it's there.
                                         
                                         If not, like...
                                         
                                         I mean, the different approach would not cause bloat
                                         
                                         if inserts is rolled back, right?
                                         
                                         So this is just the design
                                         
    
                                         of Postgres MVCC, we should
                                         
                                         blame. Right?
                                         
                                         Yeah.
                                         
                                         It's very unexpected, for sure.
                                         
                                         Yeah. Well, at least we can run
                                         
                                         very long queries without worrying
                                         
                                         about undo segments
                                         
                                         running out.
                                         
    
                                         I've been there.
                                         
                                         And it's great to have a former
                                         
                                         occupant understanding the
                                         
                                         downsides of under too, right?
                                         
                                         Haki, any
                                         
                                         last things that we should have mentioned
                                         
                                         that we didn't? No, I think that
                                         
                                         we zigzagged across this article
                                         
    
                                         very, very nicely.
                                         
                                         Nikolai definitely delivered on his promise.
                                         
                                         Yeah, well,
                                         
                                         are you planning part two with benchmarks?
                                         
                                         I don't know.
                                         
                                         I still don't care much about performance in this case,
                                         
                                         but I think that I might add an update
                                         
                                         on some of the findings about how merge and insert
                                         
    
                                         can end up eventually causing unique constraint violation
                                         
                                         and non-idempotent results.
                                         
                                         I hope that you implemented Upsert
                                         
                                         so you can identify where blog post is updated
                                         
                                         so your AI can answer this correctly.
                                         
                                         Oh, this is a good goal, actually.
                                         
                                         We don't have that.
                                         
                                         Yeah. Yeah.
                                         
    
                                         Brilliant.
                                         
                                         Misconceptions about blog posts.
                                         
                                         They never update.
                                         
                                         Challenge accepted.
                                         
                                         Challenge accepted.
                                         
                                         Yeah, I just solved your get or create problem.
                                         
                                         You can just update all the time.
                                         
                                         It's not that easy because we have many preprocessing stuff
                                         
    
                                         for really long articles as yours.
                                         
                                         Because, you know know the LLM
                                         
                                         usually has some limitation. OpenAI
                                         
                                         has like 8k
                                         
                                         tokens as input.
                                         
                                         I'm sure your article exceeds it
                                         
                                         a lot.
                                         
                                         I'll try to write shorter ones.
                                         
    
                                         Yeah.
                                         
                                         Probably you should have two versions
                                         
                                         for people who
                                         
                                         have issues with long reads.
                                         
                                         Yeah.
                                         
                                         I do actually.
                                         
                                         You like blog posts in parts?
                                         
                                         I like short versions till there, but with some details.
                                         
    
                                         I'm a big fan of long blog posts, Hakan.
                                         
                                         I know there's a lot of others out there too.
                                         
                                         So thank you for continuing to write for us.
                                         
                                         Yeah.
                                         
                                         Yeah.
                                         
                                         It's something that I'm going to cite for sure.
                                         
                                         Yeah.
                                         
                                         Links will be used to this article.
                                         
    
                                         Yeah.
                                         
                                         Thank you for this.
                                         
                                         I need an idea for the next one.
                                         
                                         So if you have one, ping me.
                                         
                                         I have a working title in my mind.
                                         
                                         It's called How to Take a Dump Like a Pro.
                                         
                                         I couldn't resist the title.
                                         
                                         That's a huge topic, actually.
                                         
    
                                         Yeah, I've actually had some experience with it lately
                                         
                                         and I think that it's interesting
                                         
                                         to see how you can optimize PG dump
                                         
                                         with the different compression algorithms,
                                         
                                         how they affect the CPU.
                                         
                                         And when dumping in parallel is actually not useful at all.
                                         
                                         On partition tables?
                                         
                                         No, if you have one big table.
                                         
    
                                         Right. It's possible, but you need to engineer yourself with
                                         
                                         snapshots and repeatable retransactions.
                                         
                                         So that's common. And we use ranges of IDs.
                                         
                                         This is what PRDB, for example,
                                         
                                         does for logical replication initialization.
                                         
                                         So it's possible, for sure.
                                         
                                         Yeah, but I do have another consideration
                                         
                                         that you didn't think about.
                                         
    
                                         Because one of my restrictions
                                         
                                         was that I wanted to stream the output directly to S3 and not
                                         
                                         to the local file system. Imagine that you run this in a Lambda. In this case, using parallel
                                         
                                         is problematic. But using a single file, you can just stream it directly and skip the file system. And also if you want to pass the dump through some
                                         
                                         kind of encryption, then also doing things in parallel makes it a bit difficult. So yeah,
                                         
                                         different restrictions, different solution.
                                         
                                         When you do this, like you find some like lacking feature, do you have sometimes idea
                                         
                                         to implement
                                         
    
                                         something?
                                         
                                         Implement? To become a hacker.
                                         
                                         To become a hacker.
                                         
                                         What, like contribute to
                                         
                                         Postgres?
                                         
                                         Because if you, well, dump lacks
                                         
                                         a lot of stuff all the time.
                                         
                                         But it's super complicated to add some things
                                         
    
                                         there, but definitely it lacks a lot
                                         
                                         of stuff.
                                         
                                         I'm curious if you had such feeling
                                         
                                         like idea like, oh, I'm going to try to implement this.
                                         
                                         No, I I looked at the Postgres code many, many times. I think that the documentation in the source itself is absolutely
                                         
                                         excellent. I remember when we worked on this hash indexes
                                         
                                         article, we looked at the actual source file. There's lots of very, very useful information.
                                         
                                         By the way, this information was a lot more useful than anything I can find online,
                                         
    
                                         including the documentation itself. So it was absolutely excellent. And I think that the code
                                         
                                         is very well written. I'll be honest with you.
                                         
                                         I don't think I'm capable enough to contribute to Postgres itself.
                                         
                                         But, you know, I do from time to time think about my contribution, my type of contribution
                                         
                                         to the Postgres community.
                                         
                                         And I know that the most straightforward way to contribute to Postgres is to contribute
                                         
                                         code. But I feel like I contribute in my way by advocating
                                         
                                         for Postgres and educating people how to use Postgres correctly. Because I believe that if
                                         
    
                                         people would have good systems, they'll be happy with Postgres. They'll continue using it. They'll
                                         
                                         recommend it to their friends. And this is how
                                         
                                         you build a community. So I think that I'm more capable in telling stories, educate people about
                                         
                                         Postgres, working on ETL processes and applications than I do working on, you know, the tooling,
                                         
                                         the internal stuff. There are people much smarter than me that can do this stuff. I'll write about them.
                                         
                                         Well, yeah, Postgres documentation lacks how-to recipes,
                                         
                                         and this is what you do.
                                         
                                         You describe some recipes and pros and cons.
                                         
    
                                         This is an excellent thing.
                                         
                                         Yeah, war stories.
                                         
                                         You absolutely do all those things, and we really appreciate it, Haki.
                                         
                                         Thank you so much for coming on as well.
                                         
                                         It's been a pleasure having you here.
                                         
                                         Okay, thank you, Nikolai, Michael.
                                         
                                         It was a pleasure.
                                         
                                         This is the first time for me in a podcast.
                                         
    
                                         Michael Weiszner But you did great.
                                         
                                         Haki Shabnamurthy Thank you so much.
                                         
                                         Bye.
                                         
                                         Nikolai Weiszner Thank you.
                                         
                                         Haki Shabnamurthy Take care.
                                         
                                         Haki Shabnamurthy Thanks.
                                         
