Postgres FM - Queues in Postgres

Episode Date: April 21, 2023

Nikolay and Michael discuss queues in Postgres — the pros and cons vs dedicated queuing tools, and some tips for scaling.

A couple of apologies-in-advance: near the end, we incorrectly say "idempotent" when we meant "stateless", and also 50 TPS instead of 500 TPS. We also had a couple of audio issues, sorry!

Here are links to a few things we mentioned:

- Recent discussion on Hacker News
- PgQ
- What is SKIP LOCKED (blog post by Craig Ringer)
- autovacuum
- Postgres queues (blog post by Brandur)
- pg_repack
- Our episode on partitioning
- Nikolay's Twitter poll
- Subtransactions Considered Harmful (blog post by Nikolay)

~~~

What did you like or not like? What should we discuss next time? Let us know on social media, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by:

- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:

- Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.ai. Hey Nikolay, what are we talking about today? Hi Michael, last time we talked about read-only workloads and the case when we have data not changing for significant periods, and how we can adjust everything for that kind of workload. Let's talk about a different case. To extend this discussion, let's consider the case when we change things very quickly. Particularly, let's discuss the idea of using Postgres for a queuing or messaging system. We have some tasks, we need to process them, mark them as processed and so on. So instead of using
Starting point is 00:00:46 things like RabbitMQ, Kafka, Sidekiq or something like that, let's consider having a similar thing right inside Postgres. Yeah, I hadn't thought about it being nearly the opposite of last week's episode, but yeah, there was a popular discussion on Hacker News very recently on this. It's not only popular, it's one of the most popular topics related to databases in general. If you check how many times it comes up: every experienced backend engineer, I think, has implemented this on their own. A queue in a relational database. Well, it feels to me a little bit like the MySQL versus Postgres thing, right? It's like people comparing,
Starting point is 00:01:32 people kind of having strong opinions one way or the other. Or stored procedures, for instance. Yeah, exactly. It's one of those things where there are good arguments on both sides. Should we even use a relational database as a queue? If so, why, and when? And if not, it doesn't tend to be about which of the various queuing-specific
Starting point is 00:01:53 services to use; it seems to be much more of a holy war between should you use a dedicated queuing tool, or can you get away with putting it in a relational database. Yeah, there are two big camps here. But I would say you cannot stop people from using ORMs or from using stored procedures. And in my opinion, a queue in Postgres is absolutely a nice and good idea if it's implemented correctly and if it's ready to scale, to grow, to handle the performance aspects. Yeah. Do you want to talk about why we'd use it versus a dedicated queuing tool? Yeah, let's talk about why.
Starting point is 00:02:31 When we develop something, at some point we need it. We need some queuing mechanism or message delivery system, message bus. There are differences in terminology here, but it doesn't matter. We have some signal, we need to propagate the signal. And we need to process it asynchronously. Asynchronously is the key word here. Asynchronous processing. And latency should be small.
Starting point is 00:02:55 Throughput should be large; it should be good. And that's it, right? And in some cases... Maybe like exactly-once delivery? Well, yeah. There might be multiple consumers. There might be cases when we want retries,
Starting point is 00:03:11 when we want a reliable delivery of any message and so on. Obviously, many requirements that are set. But the question is, should we start using special tool separate to our relational database such as Postgres? Or we can use Postgres itself, for example.
Starting point is 00:03:31 And you cannot stop people from using Postgres, first of all. I observed, like, I observed it, first of all, I did it myself. And eventually, quite quickly, I switched to PGQ from Sky Tools. It was very good in terms of characteristics, but it required some maintenance.
Starting point is 00:03:50 And you cannot use it on managed Postgres unless it's directly supported because it requires additional worker next to Postgres. You cannot use it on RDS, for example, right? But I liked the idea to be inside database. Due to a number of reasons. We will return to them. Then in my career, like consulting career, a lot of consulting practice and also startup career. So I have two things to notice. One, quite often you come to some database trying to help them work with Postgres and see hotspots. And sometimes, quite often, one or a few hotspots is some table
Starting point is 00:04:27 which is having queue-like workloads. And it experiences a lot of bloat, sudden degradation on performance, and they ask you to help. This is a very, very common request for any consulting DBA with experience, which is a sign that people try to keep it in relational database and have problems. And this is
Starting point is 00:04:46 of course a cons item, right? Because if you do it in a straightforward way, not thinking in advance, you will end up with this hotspot in terms of performance issues. And then, on the other hand, every few weeks on Hacker News and other
Starting point is 00:05:02 places, we see a discussion, so many discussions, like new tool to implement Q in Postgres, or we replaced RabbitMQ just with Postgres. This tool, that tool, for Ruby, for Python, anything, like so many tools. I even remember how I helped like eight years ago or so, I helped, there is a small library called delayed jobs, Ruby guys should know it, it's quite old, I found my clients
Starting point is 00:05:31 using it and complaining like it doesn't handle our just check into the code, check to pgstat statements and understand like two simple actions and it can scale 100x, i even opened pull request in their github but they i think they still didn't merge it saying we need support to
Starting point is 00:05:54 support my sql that's why this like i don't know but i had a lot of thumbs up from people who found how to improve performance of delayed jobs library. And we will talk about how to make it right. But back to the reasons question. Why inside database? Obviously, you don't want to manage yet another system sometimes, right? And then one more thing, it's ACID principles, right? We want... Transactions, yeah.
Starting point is 00:06:21 Exactly. We want automacy. And this is probably one of the brightest reasons here. And we want, of course, backups, like HA characteristics and so on. Crash recovery. Like many, many things like isolation, like many things. And Postgres covers in a very good way many, many requirements we might have. For example, if you work with some system, if you have monolith,
Starting point is 00:06:45 or it's a microservice, and you need something internally, because we need to distinguish situations when we need queuing system inside something, or we need it as a mechanism to communicate between two services, for example, right? But if it's inside, you have transactions, you rely on transactions, of course, you don't want RabbitMQ, for example, or Kafka to have inconsistent data, right? For example, we commit a transaction, but then we started to write event to our external additional tool. Somehow we didn't accept writes, inconsistency, or even worse, we had a transaction. Inside the transaction, we wrote to Kafka, and then we had a rollback. Now we have an event which should not be there.
Starting point is 00:07:30 Again, inconsistency. Very not nice situation. But you know what? People sold this. There are microservice patterns. It's called transactional outbox. Let me explain. It's a very simple idea.
Starting point is 00:07:42 I'm quite sure I implemented it myself several times before people gave it a nice name. Those of our listeners who are backend engineers probably know it, this transactional outbox: you just write the event to a special table, this outbox table. You write a row, you commit, everything is good, right? And then later, consumers poll this message and deliver it to your messaging system outside of Postgres. It can be, again, Kafka, RabbitMQ, Sidekiq, many systems, right? And mark it as already delivered, right?
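For illustration, here is a minimal sketch of the transactional outbox pattern as described, with hypothetical table and column names:

```sql
-- Hypothetical outbox table: the event is written in the same
-- transaction as the business change, so both commit or roll back
-- together.
CREATE TABLE outbox (
    id         bigserial PRIMARY KEY,
    topic      text        NOT NULL,
    payload    jsonb       NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
INSERT INTO outbox (topic, payload)
VALUES ('payments', '{"account_id": 1, "amount": 100}');
COMMIT;

-- A relay process polls a batch, publishes it to the external
-- system (Kafka, RabbitMQ, ...), and deletes only the rows that
-- were confirmed as delivered (at-least-once semantics).
SELECT id, topic, payload FROM outbox ORDER BY id LIMIT 100;
-- :delivered_ids is a bind parameter supplied by the relay
DELETE FROM outbox WHERE id = ANY (:delivered_ids);
```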
Starting point is 00:08:27 The delivery can be done in batches, asynchronously; you can speed it up in various ways. But in this case, you have guaranteed delivery and an eventually consistent state, which for microservices is quite good. It's a very simple idea. And it works. I do think you're right, I do think the ACID properties of Postgres, you know, the strengths of Postgres there, really are the main reason that
Starting point is 00:08:54 it should be chosen but i think the main reason it is chosen in general is actually the other one i think it's the simplicity angle or not having multiple, like people are familiar with it already, not having to learn a new system, not having to do backups of another system, not having to do ops on another system. It doesn't crash as often as a rabbit and Q does, right? Yeah.
Starting point is 00:09:16 Well, people, I think quite a lot of the time I'm reading these blog posts and at least in the comments, people are suggesting, you know, it isn't that that tool is necessarily bad it's that there's a lot that the people writing the post didn't know about one feature in it or didn't know how to overcome a certain
Starting point is 00:09:33 issue or they were holding the tool wrong so it's it's not necessarily that you can't do those things with those tools it's more that you have to learn them you have to get good at them and it's it's that takes experience and time and why do that if you've already got a perfectly capable tool, even if it's not the perfect tool, perfectly capable one in your back pocket that you already know? Right. Yeah. So the reasons are obvious. And I myself, I work with all systems in both camps. Sometimes there is a mix, actually. But I'm definitely, if I need to choose, I belong to the camp which says it's perfectly okay idea to have Q right inside Postgres. But you need to take care of a few things, just a few things, but I'm tempted to ask you it now. Do you think there's a scale at which you would stop saying that you should use Postgres for queuing?
Starting point is 00:10:30 Or is it as scalable as other solutions? Definitely. We consider a regular Postgres approach with a single primary node. So there is definitely the ceiling when we saturate our CPU or disk with writes. And this write workload cannot be offloaded to standby nodes, so we need multiple writer nodes or multiple primary nodes. In this case, you need to either scale using Postgres in one way or another, probably sharding or something else.
Starting point is 00:11:03 But microservices approach minimizes this need. It postpones it very well. Because if you split vertically to services, you need one of the services not to be able to be split further, and you need it to be so big that you already need sharding. So you need either to scale somehow to have multiple primary nodes, or you need to ready to give up and say, okay, enough. Inside Postgres, we need to offload it. But this moment will be so far from the beginning. If you have, for example, modern Intel,
Starting point is 00:11:38 or AMD, Epic third generation, Intel is like 128 vCPUcpus and any big cloud provider can provide it easily or these epic processors 244 vcpus you can scale your rights really really well you can have dozens of thousands of rights if they are quite well tuned and you're okay per second i mean yeah i was just gonna check for yeah makes it of course per day according to some of the hacker news it's so like you can you can handle billions of writes per day on one note wow yeah it's it's it's definitely possible but you will need to take care of several things i said two actually i have three already items usually i always said like okay yet another article is popping up on Hacker News top.
Starting point is 00:12:28 Like we had it last week once again, right? And my regular checklist for such articles or new tools doesn't use for updates keep locked. And second, that was a huge change. That was a huge improvement, wasn't it? I will explain why no. I'm about to exclude it from my list. Let's discuss it. Let's discuss it. But normally, it's in the list. But in my opinion, it's overrated by blood, you know,
Starting point is 00:12:56 like there are there is a spiral growth, right? When you have experience, you think, wow, this is super important. Then you have more experience, you think, well, it's overrated. It's not that important. There are more important things. So, okay, for updates, keep locked. Let's keep it in the list for a while. Maybe we'll remove it in the end of our discussion. Second is how people address debt topple and bloat issues. Yeah, huge. That's the one big downside, I'd say, of the Postgres. So yeah, obviously locking is important and we should talk about how to avoid...
Starting point is 00:13:35 Yeah, bloat seems to me like the one that not many people talk about and not many of the tools solve for very well. If people don't talk about bloat when talking about queue-like workloads, they do poor job. I mean, in Postgres. This is poor quality material
Starting point is 00:13:51 because this is actually number one problem when you grow. Because how you handle already processed rows, dead apples, like we have high churn system in this case, right? We have something which expires very quickly. We process it. We need to get rid of it.
Starting point is 00:14:08 And Postgres MVCC is very well known to have issues in this area. Datapoll issues or bloat issues as a consequence of datapoll issues. And how you handle it defines your limits, basically. And when you, defines how fast you will start having performance issues in your system. Okay. Number three item is your queries and how efficient they are, how many buffers they touch each time. Right? So if you check explain analyze buffers, we want as low buffer numbers as possible.
Starting point is 00:14:46 Great. Hint, index only scan, right? Yeah. Obviously. I'm going to suggest we talk about bloat first, because I feel like the locking one's going to get a little bit subtle. And I think it'd be nice to have had the bloat discussion already, because I think that's going to play into it and then maybe do the query performance thing as like a tuning thing last what do you think yeah well blow it is the number one question for
Starting point is 00:15:09 me these days at least like if you asked me five years ago i would probably start talking about you see i started with skip locked first yeah but it's already quite advertised so everyone knows about skip lock we will return to it and discuss why it's not number one anymore for me. But why bloat is number one? It's because this is where you will have this hotspot issue. This is exactly when performance will start to degrade very quickly. And, for example, if you... The question is, check your pgstat user tables and check tuple stats.
Starting point is 00:15:43 How many inserts, deletes, updates do you have? Of course, a lot of inserts: we have incoming events. But then what do we do with these tuples? Do we update them, or just delete them, or both? And what's the ratio? Looking at this ratio, we understand our pattern, right? And each update physically consists of a delete and an insert. We know it; I never tire of repeating this.
Starting point is 00:16:13 If you run UPDATE my_table SET id = id WHERE id = 1, you will have a new tuple, even if logically you didn't change anything. It produces a new tuple, marking the old one as dead first. Yeah. And worse, it may update indexes as well. Well, yes, it depends. HOT updates are a little bit smarter. Since we didn't actually change the value, it's probable this update
Starting point is 00:16:47 will become hot but it depends on also if we have space in this same page of even if that's an index column that's cool if it does that i didn't realize it did it's okay id column is is primary key probably right it's our surrogate primary key but but we didn't change value. I believe, I'm not 100% confident here, worth checking. Yeah, let's check. But I believe since we actually didn't change the value in this idle update, right? In this case,
Starting point is 00:17:16 hot update might happen. But my main point is that updates in general, we have the right amplification problem as well as just the new tuple in the... Yeah, I might be wrong and hot update might not happen even if we don't change. But even if I'm right, in some cases, we still won't have hot update. For example, if there is no more space in the same page, we need to go to another page. In this case, definitely, we will have right amplification. All indexes that the table has will need to receive these rights and all these rights will
Starting point is 00:17:49 go to wall it will go to backups it will go to replication it's like multi amplification of rights actually all right application that's why the name right and it's it's a lot of overhead so why i'm talking about this because in the queue approach what do we do with our record we probably market like status equals processed or changing from right like or deleted like soft delete like we don't actually we row exists but it's marked as deleted right or like from pending to retry to to process something we have some status or something right or we can just directly delete it right also yeah but updates will produce deleted dead tuples and delete will produce dead tuples and if we go
Starting point is 00:18:38 very fast and we want to go very fast we will end up accumulating a lot of dead tuples and then the question will auto vacuum be able to catch up, will AutoVacuum be able to catch up? With default settings, it won't be able to catch up, right? We will accumulate a lot of dead apples, and then AutoVacuum will delete them, convert them to bloat. AutoVacuum is a converter of dead apples to bloat, right? In most cases. I'm joking, right?
Starting point is 00:19:03 But in most cases, it's so but but in most cases it's so because we already created new pages and if auto vacuum deletes that tuples not in the end of table it cannot truncate this page it will keep maybe whole page will be empty sitting there just contributing to bloat that's it i think this is slightly counterintuitive though because i think some of the auto vacuum settings for example are scale based so for example because we're dealing with a queue table of only probably of only unprocessed rows or i'm assuming you're deleting them you might think that even if it's a bit bloated it's not a big table still but if we're talking about the case where we keep them around or a case where we have a bit of a runaway queue for a while or you know in the case of an issue this can quickly happen and crucially is not undone so each time we get an accumulation
Starting point is 00:19:59 we'll accumulate more in future unless we have a strategy for over time getting rid of that bloat right well yes also don't don't forget that some long transaction might happen or on standby reporting to via hosted by feedback being on and preventing autowacom from deleting the tuples it says i found the tuples but i cannot delete them because probably some transaction still needs it and it's a global problem. So it keeps them and then, again, converts them to bloat. And this is exactly a hot spot I mentioned. This looks like we have 1,000 live rows, but the table size is 10 gigabytes.
Starting point is 00:20:38 You say, what? 10 gigabytes for 1,000 rows. What's happening here? Yeah. You shared with me that there's a really good blog post by Brander about this specific issue. They argue at the end that that's a really good reason to not have this in your main database and have maybe a set foot. That's my interpretation of the argument. Good post with wrong conclusions.
Starting point is 00:20:59 Oh, interesting. Great. Okay. So what can we do instead? We can do like insane, but quite reliable way. You just put vacuum full to cron. Why not? No. It takes a second. It blocks everyone, but inside the second, you have fresh state. Well, I guess it is a queue and therefore what's the worst that happens? Things are
Starting point is 00:21:20 delayed a little bit, but you can't insert into it. Yeah. I must admit i never did it for for serious clients but i consider it as a valid idea in some cases if you need to like stop this fire stop this hot spot from happening just put it as a very fast and like quick and dirty mitigation approach it will help this like a latency spike of one second for for all those who work with this table it's not a big issue compared to constant pain they feel from degradation related to bloat and dead tuples. By the way, dead tuples themselves can be a problem. For example, when you delete in batches, you do everything right, but you have index-only scan. Then you start noticing that index-only scan, even if it deals with the same number of rows, somehow deals with more and more buffers.
Starting point is 00:22:10 Why? Because it scans through all those data tables all the time. And solution to that, by the way, also, like, probably vacuum is one thing, but probably in some cases solution is to – I like independent queries. I'm a big fan of them. Independent queries, they're like find me next thousand of rows to process, process and delete. Or just delete. Sometimes
Starting point is 00:22:31 we just need to delete. Clean up job or something. Like find next thousand, delete. I don't care. I have a reliable way to order RAM. I have indexes. All good. Index only scan. But somehow then more and degrades, degrades over time, and you realize autovacuum cannot catch up, and you just keep scanning
Starting point is 00:22:51 through all those data tuples, and the solution is to introduce context and move away from idempotency, just remember last ID, for example, or something, timestamp, and start from there, you skip all those that tuples i saw yeah the idempotency things have been interesting one i saw somebody discussing using keys and having and then maybe they called it some kind of jitter to make sure that if if for example everyone had skipped so maybe we'll get onto this with the skip locks if for example workers had skipped ahead and there was a job left behind that number or behind that date or like an old job. So you might have to sometimes forget, like one of your workers maybe forgets its latest key every now and again.
Starting point is 00:23:40 Exactly. To go back and fetch some old. Exactly. So I think there's a little bit of extra engineering needed if you go that route. Well, yeah, but at first run, it will scan through all the tuples and then it will keep remembering the position, for example.
Starting point is 00:23:56 It depends. I'm still a big fan of idempotent queries. I like them, I do. But sometimes we need to... We have trade-off here, right? But we didn't discuss the main solution to bloat. Before we do, quickly, Vacuumful, what happens with indexes?
Starting point is 00:24:14 Rebuilt. They're rebuilt, okay. Because it's a small amount of data. We can run pgRepack against table and all its indexes. It will be the same. It will be better, of course. And people do this, actually. But vacuum-full also works.
Starting point is 00:24:29 But there's no point because we have a better solution that you're about to say. Well, the packing is better than vacuum-full, of course, because it will lead to much very brief period of exclusive lock. Right? So very, very brief.
Starting point is 00:24:41 And it's very graceful and so on. Vacuum-full is a big hammer, of course. Like a boom, like, and we are good againful and so on vacuum full is a big hammer of course like a boom like and we are good again so pg repack is a valid solution for those who want to get rid of problem and continue working as nothing happened but if you want to design the system to scale for really good tps transactions per second and with low latency and handle a lot of events per second in a reliable way inside Postgres, you need partitioning, right? You need partitioning and a good lesson can be learned from PGQ from Skype, as I mentioned. It existed almost 20 years ago. Yeah, this is super cool. And I hadn't even really thought this through, but they can't have been using for update skip locked for it
Starting point is 00:25:27 because it didn't exist then. Skip locked was only introduced in 9.5. Well, advisory locks probably already existed. I'm not sure. I don't remember. But we didn't start talking about how to improve throughput. So let's return to skip locked slightly later. I'm talking about bloat and dead
Starting point is 00:25:45 tuple issues right and they had partitioning exactly declarative partitioning didn't exist that time so they used inheritance based partitioning three partitions and rotation right so we use one partition we maintaining another one and one third one is like on idle and maintaining means like we just truncate yeah that's it truncate is fast i was trying to work out why there are three is it that we have one that new rows are going into currently one where we're trying to finish the final jobs in it just you know while we're switching between partitions and then the third one that is always going to be i don't remember i remember i used it and i actually
Starting point is 00:26:25 applied similar approach in different systems because it's very smart approach yeah skype had a requirement to handle billion users so they designed very good system here and it's still alive by the way it lacks good documentation probably but I asked people about queuing Postgres and I was super surprised. I'm checking numbers right now. 105 votes on Twitter. I asked on LinkedIn as well. And 7.6%, so maybe it's like eight people around there, said I'm a PGQ user. So big respect.
Starting point is 00:27:02 PGQ is great still. You need to cook it, course right but it's enterprise level solution if you manage to work with pgq you can work with very large volumes of events per second low latencies you're in good hands but nowadays we have declarative partitioning and for example timescale also helps automate it. And I wonder if there are around people who implemented a good queue system based on timescale. If there are such people around, please let us know. I would like to learn because I have similar situations in my area as well.
Starting point is 00:27:41 So partitioning gives you ability, gives you benefits. We discussed benefits from partitioning, but here most benefits are you can divide and conquer, right? So you have partitions which are already fully processed, you can just drop them. Get rid of them. And that's important because
Starting point is 00:27:57 whatever bloat they've accumulated during the time that they during the time that they're exactly that's all now gone. Right. So instead of fighting with Bloat, you just forget about it, get rid of it, and that's it. But of course, you need to design it in a way
Starting point is 00:28:12 that it's not like we work with all partitions all the time. So you need to have some partitions which are already fully processed. So you need to consider them as done, and you just drop them. Or you can implement some rotation with truncate so you truncate actually recreates file and you have fresh state again like no zero rows indexes are empty you start from scratch zero bloat so this is ultimate solution to bloat just drop get rid of it fully inside table.
Starting point is 00:28:46 And partitions is a physical table, so we can do it. And also, of course, it gives you data locality for a lot of benefits. So you can keep old partitions for quite long, even if they accumulated some bloat and so on. New partitions don't see them and they don't suffer from issues all partitions have yeah but the main benefit is very very simple and easy bloat management right right cool so right now skip four updates keep locked yes very cool feature but it's needed when you when one worker is not enough right in some cases it's enough That's a really good point. Lots of cases. It's related to third question.
Starting point is 00:29:27 Let's talk about, I'm trying to downvote the importance of skip locked and I do it by purpose. Let's discuss query performance. We've already touched this, a lot of dead tuple accumulated case, right? But in general, the advice is try to have index-only scans. Here, auto-vacuum should be tuned again
Starting point is 00:29:49 because we want heap fetches to be as low as possible, ideally zero all the time. In practice, it's usually non-zero, but we don't want like index-only scan. It's good when heap fetches is close to zero because in this case, Postgres executor doesn't need to consult heap. good when it's like when hip fetches is close to zero because in this case postgres
Starting point is 00:30:07 executor doesn't need to consult hip if it does need to consult hip
Starting point is 00:30:11 it's similar to index scan already right so degraded index only scan
Starting point is 00:30:17 is index scan right so to have real index only scan we need to keep
Starting point is 00:30:23 autovacuum quite aggressive to keep visibility maps up to date, to know which pages are marked all visible. So if they are marked all visible, index-only scan can avoid consulting to heap and deal only with index. And to achieve that, sometimes we need to use covered indexes, so include additional columns. It depends. Yeah, or even in later versions of Postgres, I don't think there's that much benefit of the includes.
Starting point is 00:30:54 I think if anything, there might even be benefit of not using it because of the deduplication. Oh, yeah. Advice is to consider Postgres 14 or 15. Yeah. 14 is great. 13 was Yeah. 14 is great. 13 was good. 13 is great already. B3 deduplication is great.
Starting point is 00:31:11 It helps with bloat as well. I think it only deduplicates on ordered columns, though, not on includes, for example. There are several things there. Okay, never mind. It's a minor point, but index-only scans. So is this for the update query? Like which queries?
Starting point is 00:31:28 For selects. For queries like find next item to process. Or multiple items. So these kind of queries. And of course, for updates or deletes, if we deal with updates, we want to try to make them hot. Hip-only tuple updates. To do that, sometimes I see people consider reducing fill factor even for tables,
Starting point is 00:31:53 which is 100 by default, so they reduce it for like 50. So let's have room in each. We have artificial bloat initially, like 50%. I don't mind it, yeah. But our updates are faster. If we set status equals processed, most likely it goes to the same page. No index amplification is good.
Starting point is 00:32:15 And you probably then can't or don't want to index your status column, for example. Right, right. Well, yeah, it depends. But yeah, we need to be careful. Sometimes we try to be careful. Sometimes we try to avoid additional index because we know we will be updating for status, for example, right? Yeah.
Starting point is 00:32:31 And we know like, okay, how many rows with these parameters will have status, like multiple statuses? Maybe no, let's not have status in indexes, for example. And in this case, but because if you have status in indexes, in any of indexes, even if in where clause in partial index, and then you update status to something, of course, this update cannot be hot anymore. So that's it. So it's full-fledged update having all right amplification. Right. amplification. Right? So, speaking of query performance related to our workload, we need to, like, we probably
Starting point is 00:33:06 can divide it into two parts. First, the task of purely select how to find next row to process
Starting point is 00:33:14 or batch. Right? Index only scan should be there. And second, write itself. It can be update,
Starting point is 00:33:22 it can be delete. Well, insert also interesting, but i hope insert is quite fast usually maybe also worth checking but usually it's like number three item probably so update it's good to have hot update delete what can we do with delete i don't know with delete we just mark it as a process as a dead tuple that's it so sometimes also people like to have returning to yeah to
Starting point is 00:33:48 send the result to application code also handy sometimes people like to have multiple stages and write cte combining maybe updates delete something, in single query. And this reduces round-trip time. So in one query, you can do multiple things all together in single query from application code. But overall, you can reach, like, it's good if your selects are much less than one millisecond, even if you have a lot of rows. Our queue table usually doesn't have a lot of rows, right? Especially if it's
Starting point is 00:34:25 partitioned and so on. And for write operations, one or a few milliseconds is a good performance, usually, right? If it's more than 10 milliseconds, I would doubt it's good, right? Well, especially because we're talking about updating only a single row in general. Well, yeah, yeah, yeah. So it feels like we need to talk about the... It's time to talk about skip lock. okay now we see for example our right operations take two milliseconds it means that we cannot process when when we have single worker latency and throughput are very well connected right it's like you have a second, two milliseconds, 50 operations per second. This is our throughput.
Starting point is 00:35:08 If it's not enough, okay, let's have multiple workers. We might have multiple workers due to some different reasons sometimes, right? Because just one worker does this, another worker does that. I don't know. But if it's for for throughput we must understand that they might conflict they will conflict eventually right and if one like we cannot update same row at the same time twice right so if one is updating until the very end of transaction commit or rollback exclusive log will be on this row, right? And another worker will be just waiting on it. It can be seen in wait events analysis very well.
Starting point is 00:35:53 In this case, we can use for update no wait or for update skip log. No wait will just fail. Like you fail fast and retry, startup approach, right? So, or skip log. Give me next row or a few rows. Yeah. Batching also interesting topic, by the way. But let's, we don't have time for it, but yeah.
Starting point is 00:36:15 So if you handle that apples and bloat very well, if you have tuned queries, and you know that your queries will be always quite fast, you probably don't need multiple workers. But of course, if you need multiple workers, go ahead and use skip locked. But remember, if you, for example, use Django or something, sometimes sub-transactions might pop up. And for updates, skip locked. if you have subtransactions, you might have issues on replicas. So go and read my article about subtransactions.
Starting point is 00:36:51 There are dangers there. So regularly select for update is okay. But combined with subtransactions, it's not fun. Good point. But in general, it's a good thing. By the way, someone on twitter mentioned also the problem when uh the executor needs to scan over a lot of rows which are already locked it's a similar problem as we discussed with dead tuples right so we have a lot of logs rows already locked
Starting point is 00:37:16 and trying to find unlocked ones we have like degraded performance as well right but i think it's kind of edge case, maybe. I think so. You'd have to be doing things in batches and maybe have lots of workers before I could see that becoming a big issue. But yeah, definitely can see how it would happen at scale. Yeah.
Starting point is 00:37:38 So that's it. And if you have multiple workers, you definitely can achieve many, many thousands of transactions per second on good hardware, of course, and maybe dozens of thousands of transactions per second. That should be enough for most systems already these days, right? 99.99%, I think. Well, there are cases when we need more, but this is already good enough. So summary is bloat is our
Starting point is 00:38:05 main enemy here, data plan bloat and partitioning. Consider partitioning, yeah. With proper rotation or partitioning strategy, right? Yeah. Second is check, explain, analyze buffers and
Starting point is 00:38:21 have good plans for your queries. And finally for updates, keep locked. And if you see some tool which addresses all of these things, surprise, only PGQ does it. And I'm not even sure they use the latter. So many attempts to implement it, reimplement it. And when I say, guys, what about bloat? To do item. So if you see, if you find the tool which addresses all these three things, I don't care about language.
Starting point is 00:38:52 Please, tell me. I will advertise it, yes, because I see dozens of attempts to implement queue in Postgres, but these guys fail in these three areas. They sometimes implement keep locked, for example. It's very well advertised, as I mentioned, but they fail to do the bloat part,
Starting point is 00:39:11 or some indexes are not well organized. Wonderful. Well, thanks so much, Nicolai. Thanks, everyone, for listening. I do hope we helped some queue builders, queue in Postgres builders, because we need better tools. Or maybe some people can resurrect PGQ in terms of, I don't know, like provided as a service. Why not? Some managed Postgres providers. It would be good if some managed Postgres,
Starting point is 00:39:35 I know they'd have to install something else, but I think it would be popular. If it's just one click. The demon, right? Anyway, thanks everybody. Thanks Nikolai. See you next week. Bye-bye.
Starting point is 00:39:46 Bye.
