Postgres FM - Slow count

Episode Date: September 13, 2024

Nikolay and Michael discuss why counting can be slow in Postgres, and what the options are for counting things quickly at scale. Here are some links to things they mentioned:

Aggregate functions (docs) https://www.postgresql.org/docs/current/functions-aggregate.html
PostgREST https://github.com/PostgREST/postgrest
Get rid of count by default in PostgREST https://github.com/PostgREST/postgrest/issues/273
Faster PostgreSQL Counting (by Joe Nelson on the Citus blog) https://www.citusdata.com/blog/2016/10/12/count-performance
Our episode on Index-Only Scans https://postgres.fm/episodes/index-only-scans
Postgres HyperLogLog https://github.com/citusdata/postgresql-hll
Our episode on Row estimates https://postgres.fm/episodes/row-estimates
Our episode about dangers of NULLs https://postgres.fm/episodes/nulls-the-good-the-bad-the-ugly-and-the-unknown
Aggregate expressions, including FILTER https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
Spread writes for counter cache (tip from Tobias Petry) https://x.com/tobias_petry/status/1475870220422107137
pg_ivm extension (Incremental View Maintenance) https://github.com/sraoss/pg_ivm
pg_duckdb announcement https://motherduck.com/blog/pg_duckdb-postgresql-extension-for-duckdb-motherduck
Our episode on Queues in Postgres https://postgres.fm/episodes/queues-in-postgres
Our episode on Real-time analytics https://postgres.fm/episodes/real-time-analytics
ClickHouse acquired PeerDB https://clickhouse.com/blog/clickhouse-acquires-peerdb-to-boost-real-time-analytics-with-postgres-cdc-integration
Timescale Continuous Aggregates https://www.timescale.com/blog/materialized-views-the-timescale-way
Timescale editions https://docs.timescale.com/about/latest/timescaledb-editions
Loose indexscan https://wiki.postgresql.org/wiki/Loose_indexscan

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM, a podcast about Postgres, as you can guess. My name is Nikolai from Postgres.ai, and as usual, my co-host, not guest, co-host is Michael from pgMustard. Hi, Michael. Hello, Nikolai. So you chose a topic, and it is... Slow counts, or why is count slow and what can we do about it? How do I make count slow?
Starting point is 00:00:26 You just count. So yeah, let's discuss. I don't know. For me, it's a super old topic. And I'm not sure I will use some fresh knowledge here. I have a lot of fresh knowledge about Postgres. And I share constantly. But this is something which is super old. There are improvements we can discuss in like index-only scan.
Starting point is 00:00:49 We talked about them recently a lot, but I guess my role will be to use my knowledge from, you know, like 15 years ago, mostly. And just before we started this recording, I shared with you how I a little bit helped Postgres to get rid of count, right? Because sometimes you don't need it. So I'm going to stay in that area 10 years ago. And if we have something new, I will be just putting more to-do items to my to-do list, right? To explore and that's it. Maybe you know some things better than me.
Starting point is 00:01:26 Let's see. Yeah, well, the Postgres link is great because the resource I still like to point people to on this topic, the best thing I've ever found on it, is a blog post by the Postgres author, Joe Nelson, on the Citus blog. So I'll link that up for people that I think explains it in a lot of depth, but in 2016 so quite a long time ago in you know eight years at the point of recording and i was kind of surprised how many things have changed but also how little has changed in a way like there are optimizations a lot of things that were kind of little gotchas back then aren't true anymore.
Starting point is 00:02:07 There are a bunch of ways that it's faster now by default. But the essence of it is still true. Like because of the way Postgres does MVCC, when we do an application. Because it's row store. Yeah. Because it's row store. Because it's row store, yeah. Basically, because all the things that make it great as a transactional database count against it, and I pun not intended, when it comes to aggregating across a lot of data. And count is, I think, at least in my experience, that's the most common aggregation people want to do in typical use cases. Yeah, that's a good point.
Starting point is 00:02:50 Since Postgres is a row store, it's better. Imagine you have a table of users, for example, as many SaaS systems or e-commerce systems do, like customers, users. Since Postgres is mostly like OTP, a row store, OTP-focused database system, it stores a tuple, a row. If it's not too wide, it stores it in one place, in one page, basically, right? As we discussed many times, a page is eight kilobytes by default in most cases. And
Starting point is 00:03:21 very roughly, if a row doesn't exceed two kilobytes, it will be in the same page, and that's it. It was not the mechanism called Toast won't be involved. If it exceeds it, it will be shrinked to pieces and stored in an additional table called Toast table. So imagine we have a user's table, and then users can log in. So we need to retrieve information about this user. And it's good because all information about a single user is stored in the same page, right?
Starting point is 00:03:54 If it's narrow, right, as I said. It means that it's quick to retrieve. It's just one buffer hit or read. Yes, extremely efficient for like, if we wanted to show the profile of that user, if we wanted their name, their age, you know, loads of information about them, single page read to return that.
Starting point is 00:04:14 Incredible. Maybe like a couple, once you look them up in the index, then the hit page. So it's very, it's very, are you optimized basically? Efficient. IO optimized, basically. Efficient? Efficient, exactly.
Starting point is 00:04:31 But if it comes to counting or calculating some, like sum or average, minimum, maximum, everything, aggregates, it's not good because usually aggregates deal with only a single column or a few columns, and we don't need all the other columns this table has. If we have a row store, as Postgres is, in this case, it means that we need to do much more I.O., many more buffer operations, hits and reads, to get information to calculate aggregates for a single column. For example, count them.
Starting point is 00:05:10 And it means if you want, for example, very, very, very common ask in startups, in social media, in SaaS, in e-commerce, let's just show how many new users we have every day. We need to count daily. And this means, imagine if we have only a few records in an eight kilobyte page, few users records, few users tuples, because we have many columns, right? It means one buffer operation will allow us to count only like four users, six users, and that's it. Compared to column store where each column is stored in separate files, and if this file is about ID, it's only ID, right? If it's about creation time, it's only creation time, date time, right? In this case, counting is much more efficient because one page can bring us many more users and we can count them much better, right? This is quite straightforward and very simple, trivial comparison of ColumnStore and RowStore and why in ColumnStore aggregates and counts specifically are much faster than in RowStore just because of IO, right?
Starting point is 00:06:37 Yes. I think because column stores are used for these aggregation and analytics use cases so much, I think they also tend to have more optimizations around these aggregations as well than at least the old school transactional processing databases have. I think lines started getting blurred a little bit with some of the extensions, like some of the newer extensions in the postgres world that also have some of these optimizations but yeah i think that's that's a great starting point as some of postgres's strengths working against it i think another one that is not just row store versus column store is because of how postgres does visibility i think there's if you look at um some other transactional process databases because they don't do like the the same um marking pages as not because they do undo logs basically they can also cheat a little bit for certain counts like if you want to count all of the users, which if you have some specific use cases,
Starting point is 00:07:49 like let's count all of the records in this table, they cheat and do some of the solutions that we're going to discuss. They put new row version in the same place as old row version. It means that compactness, if we have already compactness of storage, it's not lost. Compared to Postgres, where new version comes to new place, maybe different page, in many cases, it's a different page. Maybe it's a place which we're taken by another tuple, which became dead and then vacuumed, deleted by a vacuum, right, or auto-vacuum.
Starting point is 00:08:28 It means that we might end up having very scattered storage for our tuples, especially if we need to count only specific tuples. We have the where clause in our select count. It means that we need to, or we might have index scan for this, but it's kind of random access, right? We need to count many tuples which are stored in random places, and we don't use partitioning. Our table is one terabyte in size.
Starting point is 00:08:57 It's super bad case. And H update randomizes the storage, right? New version is stored in random locations, and it becomes worse and worse and worse. It's very scattered and we lose compactness. This is how MVCC in Postgres works, unfortunately. We oftentimes say it's more like rollback focused than commit focused because in case of rollback, new version becomes basically dead eventually, right?
Starting point is 00:09:26 And cleaned up. And old version remains in the same place and we don't lose compactness. If we have commit, we move rows all the time. We move tuples, physical versions of rows all the time and we lose compactness eventually, right? And this is bad for count bad for other aggregates as well yeah so i think we've i think that's like a good overall summary of why count is slow like or at least why it scales with yeah or what i think while it's it's not bad on tiny data sets it's fine right it's you probably you probably won't notice it too much
Starting point is 00:10:05 but as data scales even in the blog post i was reading they only did a data set of a million rows and already even like the fastest optimizations at the time could only get an accurate count in about 100 200 milliseconds something like that so we're already talking about like 10x what you would consider a fast query so even at a million records which is not many in many use cases we're talking about a potentially a meaningful length of time so yeah i think should we move on to what we can do let me share my this history of this blog post i i might somehow provoke that blog post john nelson wrote a little bit and i'm not taking full responsibility for that blog post definitely but i remember he
Starting point is 00:10:52 published it in 2016 right and i started using it all the time like i shared like if you want to understand count just read that blog post of course right now I would say it's missing buffers in the explain plans for sure. Because as we just discussed, I.O. is reason number one of why things are slow, right? We should not talk about timing only. We should talk about buffers as well all the time. But back one year earlier in 2015, I started looking at Postgres, which right now is one of the main building blocks Superbase has. And I was impressed, like, great, we can ditch Django and Rails applications completely in simple, at least in simple cases, and do everything on server side
Starting point is 00:11:42 and front end. This is actually the same time React became popular. I think it started in 2013, 14, and so on. So it skyrocketed at the time in 2015. And a lot of business logic went to front end, of course. Also React Native. Great. So we need to get rid of middleware, keep data-oriented business logic and database, keep UI-oriented business logic on client side.
Starting point is 00:12:09 And this is great. So I found Postgres and I quickly realized I came from already quite heavily loaded systems. I remember, for example, first time I saw Integer 4 primary key exhausted. It was in my own project in 2008. So I already realized some data volumes and how they work in Postgres. And I quickly found that in Postgres they had already some
Starting point is 00:12:33 pagination, very good. But on each get request, you say, I want only 25 items on my page, but they also show you how many overall items you have for this request with all your filters and so on and i thought that's that's not good we i know postgres has slow count right if i every time they counted so it's written in haskell so i started digging inside
Starting point is 00:13:00 source code and i found indeed it runs count all the time so I helped them to get rid of it by default so I remember I created issue let's get rid of count by default and they quickly agreed and and removed it so I'm happy I helped a little bit to realize that count should not be used by default and also at the same time john nelson the the original creator of postgres started to dig into this topic and found several alternative ways how to make count fast including approximate method right i i like this method it doesn't work all the time it works very well if you have auto vacuum well tuned so it runs auto analyze frequently right because you need fresh statistics and it also works if you don't play too much with filtering so for example if
Starting point is 00:13:52 you have users table and you know to display overall count of rows in this table you can use just approximate approach which like in few, you just run explain without analyze, you just run explain, select star from users, right? And the planner will tell you how many rows it expects. And this is already approximate number you can use. This is all trick I first learned maybe in 2004 or five when I started with posgus we used it i think yeah i think this trick comes into its own for things that are harder to find out from the statistics or from other methods i think like for example if you have dynamic filters that users can set or if you basically if you have if you need to be able to set arbitrary where clauses or you want to be able to join relations and get estimates then, this is an extremely powerful estimation tool because you're basically using Postgres' planner to do it.
Starting point is 00:14:57 But you need to make sure it's up to date most of the time. That's not even enough though because we've had plenty of episodes where we've talked about times where the planner's estimations can be way off. Yeah, just have two columns, two filters, and don't have multi-column statistics. That's it. Exactly.
Starting point is 00:15:16 It's super easy way. Yeah, or in two different tables. Yeah, drawing. So, yeah. We've talked about the the limitations of that before but in the in the real world i've seen very few i mean it's probably a limitation of my experience rather than because i've seen how popular these estimation tools are i've seen very few cases where people would rather have an estimated count than no count at all in lots
Starting point is 00:15:42 of user-facing applications i've seen people want an exact count or they don't want to show it and that there's i cannot agree i cannot agree for example where do you see it well for example i had this we have a social network for example and we just want to display the total number of registered users and the total number of i don't know like groups created or anything created posts comments on the front page and uh it's just overall number uh definitely this approach will work there very well and we round it we don't show well next idea let's round it, but also show some, if we know current number and one week ago number, so we can understand the velocity, the speed of growth, and we can also pretend it's growing right now.
Starting point is 00:16:34 It's like fake, but this is what social media do all the time. But this is kind of my point that you wouldn't use the actual, so the number, when it's an estimate, could go down. Like even if it's like a number of, like in that case, you wouldn't want to show a number this week that was lower than the number last week. Right. You're right. So there are, you wouldn't, you still wouldn't use the exact, like,
Starting point is 00:16:59 so to be more specific, there's some really cool algorithms. There's a really popular extension called Hyper really popular extension called Postgres Hyperlog Log. Count Distinct. Yes, so it's for estimating count distinct. You're right. So more specific than just count. But it's available on pretty much every cloud provider, which makes me think it must be pretty popular.
Starting point is 00:17:25 It must have been requested on a bunch of things. Not necessarily. Things work in life differently. It might be just, you know, we don't know how popular it is overall. It might be just cool to have it at some point. You know, sometimes people have high high expectations and also in case of managed postgres services is in any areas of business any segments of market if some big player added it
Starting point is 00:17:55 others also added this is or even if like let's say only one big customer needs it or wants it if they're big enough maybe you add it yeah just also might happen yeah so yeah good point but i do think this whole category of estimated counts is super interesting and that feels like one big solution that you can say look your count is slow one option is estimating and i think it's really interesting intellectually, and I'd love to hear from people that do use it in production for real things. But very often I see people say, okay, that's cool, but what are the other options?
Starting point is 00:18:35 What else can I do? Let's finish with when to apply this proximate approach. I think if you need count of rows in one table, it's easy, like no filtering, it's definitely an option. But you need to make sure auto vacuum is configured to trigger, like auto vacuum analyze scale factor is configured properly, so statistics is up to date most of the time, I would say. But also, if you have filters, these filters should be like single column, no joins. In this case, it's good, right? In some cases, it can be not good as well. For example, if you need daily, it might be off completely. Or if your project already is old, you have many years of data, daily might be also challenging because we know by default Postgres keeps only statistics for 100 buckets. Oh, interesting.
Starting point is 00:19:31 Statistics. Yeah, I see where you're coming from. But we can raise it. We can raise it for specific tables, even specific column. And this can help as well in this case. If we know we do daily or monthly counts, we can change this setting and have more precise statistics and still use approximate approach. Make sense?
Starting point is 00:19:59 Yeah. I think it might even be simpler. It might be that most people don't end up going this route because they actually find they can live with the trade-offs of the other approaches. It feels like a last resort estimation. If you've got a slow count, maybe you're thinking, well, there are ways of making it faster,
Starting point is 00:20:21 but they come with certain trade-offs. So maybe we should discuss those first and then come back at the end to say if you can't live with those trade-offs yeah then you have to them and you still want to display something or show something for me it's vice versa actually account is super approximate count is super easy to implement i would consider it first it does it's like would it work well enough in our case or we exclude it completely? That's it. Because it's super easy to implement. But let's move on.
Starting point is 00:20:49 We already started talking about HyperLogLog, right? This is for distinct count and this extension is indeed available almost everywhere. So if you have count distinct, why not consider it, right? Because it might be very good, fast, and so on. And Joe Nelson in this old article explains it, right? So what I would do, I would consider it in this case. I would do some benchmarks with current data volumes, but also 10x, for example, predicting future growth with focus on IO.
Starting point is 00:21:28 I definitely would look at buffers numbers in the plans and understand how like what about scalability here if we scale 2x 5x 10x will still io numbers be fine and leading to acceptable latencies or no this is the question right but this is we're still in estimation right yeah but it's more like it's smarter already right yeah true smarter than just relying on postgres statistics right yeah true and next like i know the article also just like it's interesting that we discuss the article which is nine years old. Or how many? Nine, right? Well, I actually, I did, as part of my prep for this, I did run, conveniently they provided code snippets for each step.
Starting point is 00:22:16 And I did a few changes, but it was very, very easy for me to run this on Postgres 17, actually, just because I was doing some testing. What changed? It was very, very easy for me to run this on Postgres 17, actually, just because I was doing some testing. What changed? Well, so the first thing I noticed was his plans didn't even have parallel. Like, bear in mind, we're talking about doing a count over a million rows. These days, you just expect to see the first query plan have a gather and… Three workers by default, basically. Yeah, exactly.
Starting point is 00:22:43 Exactly. And I didn't. And i was like wait so his blog post was pre-postgres having parallel workers and that was it means roughly it should be by default three times faster because aggregation like append how it's called it's it's easy in this case it's just sum of numbers that's it right yeah count should parallelize really well yeah i because i don't know his hardware i could like and i didn't go back and like try and get a really old version of postgres i didn't i didn't actually actually i probably could have just tested by turning off parallelization but yeah probably about three times faster good this is a good point yeah yeah that was the first thing i noticed and then there were a couple like there were a couple of other differences in the nitty-gritty but not nothing like fundamental nothing like too big so most of
Starting point is 00:23:32 the points still stood really well even though as i said it was like a lot's changed but also not that much has changed so yeah another one was in the past you had to do a workaround for... You needed to use a sub-query trick for count distinct to get an index-only scan. So instead of being able to do count distinct on column and get an index-only scan, you had to do a select within that. So you had to do select distinct to get the index-only scan. Don't need to anymore.
Starting point is 00:24:02 That was quite cool to see. Yeah, that's a good point as well. But yeah, like a bunch of the, so actually index only scans, we should probably talk about that while we're here. Yeah, we just had the episode about it a month ago. So it's very related to, ideal case for Postgres in its row store situation,
Starting point is 00:24:23 it's index only scan with low heap fetches, ideally zero, which means, again, that leads to the need to configure to vacuum to make it work more frequently and move on faster. In this case, if it's on the index-only scan, this is the best we can do with Postgres, parallel index-only scan maybe, as you you said and this is good I mean this is the best thing you can see in plans index only scan with low heap fetches the only better situation would mean you need to start denormalization caching and so on and it's like I would consider it yes heavy solution to the problem, right? Because this would give you a few buffer operations
Starting point is 00:25:12 to retrieve your count. If you have fully denormalized, maybe for example, materialized view, I would not recommend using materialized view in complex projects at all, default materialized view. But consider we have it already and we have proper index on it finding proper count would be just finding maybe one row that's it one value it's like primary key lookup right or maybe just index lookup for to find one record that's it it's super easy but to achieve that first of all it requires effort
Starting point is 00:25:47 you need probably triggers or a synchronous trigger somehow and second of all if it's not materialized to you you need to have something probably with desire to support partial refresh because default materialized to you supports only full refresh which is not efficient in most cases and can lead to bloat itself bloat inside materialized view so i've seen people implement versions of this and use off the shelf things that are kind of well in postgres extensions for this as well so this is actually So that's the approach I've seen most commonly used for this because it doesn't scale with volume. I think it's really attractive because even an index-only scan,
Starting point is 00:26:33 if you double the data size, you're going to roughly double the time it takes to do the count. So you're going to keep bumping into this problem down the line. Well, yeah, you're going to keep bumping into this problem down the line whereas well yeah you're right if you double yeah if even with the index only scan we still need to if we for example if we forget about structure completely talk about only about leaf nodes if we have two times more values to store we need two times more pages right two times more leaf nodes and uh it translates to two times more iu operations and of course and even if it's not two times it's still like uh it still degrades
Starting point is 00:27:16 over time and not better than these counter caches not better than two times it can be even worse but uh not better because if you need uh to to keep references to two times more tuples, you do need more space in leaf nodes, right? Two times more leaf nodes, at least, maybe more actually, if we have a balancing situation and, again, not compact storage of tuples and so on. Yeah. So on the index-only scan only scan actually i think it's worth
Starting point is 00:27:46 before we move on from that we're thinking about some because i think that goes back to the beginning of the conversation you talked about row store versus column store if we have an index only scan on only a single relatively small like let's say a primary key for example that we can use to count it's ordered as well so it can help us with count and count distinct. It's almost column store-like in a way, because if we only have to search the index, we're searching like a single column effectively. So we're almost getting that column store benefit
Starting point is 00:28:20 without some of the aggregation tricks that some of these columns do. Leaf nodes are double linked in both directions. So we avoid traversing of whole tree. We just locate the beginning of the first value, and then we just go scan leaves only. And that's great. I mean, you're right. If it's index only scan, we don't care about other columns.
Starting point is 00:28:44 But interesting point that we need to make sure we select count like select count star will probably choose some index for most likely primary key right but if you count some column which allows nulls like we had a very long ago we had episode about dangers of nulls and And this is one of the dangers because count, if it's null, it doesn't count. Count doesn't count nulls, right? This thing might be surprising in some cases, even for experienced folks. I surprise myself with this once per couple of years, at least. I see this so often in kind of philosophical, like people talking from a computer science perspective, people asking like, what's faster, count star or count one or count ID or thing. But I've never in practice seen that's it. In many cases, I count with additional filtering.
Starting point is 00:29:46 You know this. It's already not new. It's like 20 years old, but you know filter extension. Like you say, count star. This is our overall count. And then instead of doing these very heavy structures in SQL, like case when, blah, blah, when else, I don't like them. So instead of that, you just say, count, star or column filter, and then in parentheses where
Starting point is 00:30:13 and this gives you opportunity to count many things in one go, right? You just have one scan. Yeah, yeah, yeah. So it's filtering on the fly if it suits your filter conditions then it will increment your additional counter so you have multiple results in one select multiple numbers returned right that's that's great thing and in there sometimes i use colon understanding that mouse won't be counted in this approach. In this approach. Makes sense. For example, I want to understand how many people fill this value, again, users table.
Starting point is 00:30:53 And we have, for example, Facebook ID. It's null if Facebook was not connected. In this case, you say count star means overall number users registered daily for example and then you say count facebook id counting only those people who connected their facebook accounts make sense right yeah in this case you don't need even don't need filter so it's kind of just maybe syntax sugar right yeah i have seen an argument that star is misleading because it it doesn't mean what it means in other contexts and maybe we'd have been better off if it was like count and then parentheses but nothing in the parentheses would have i actually agree with this
Starting point is 00:31:34 because if we if you propagate this logic about nulls we and we should star should mean none of columns are null and this is always so all of the columns are null. And this is always so. All of the columns are not null. Yeah. So let's talk about this dynamoized approach. I think this is most powerful, but requires effort. Well, it depends. I think there's some really simple approaches that have some big trade-offs.
Starting point is 00:32:01 And then there's some more complicated approaches that have fewer performance trade-offs, then there's some more complicated approaches that have like fewer performance trade-offs but they were more complex to set up like for example i think adding a trigger that like let's say you you know what you want to count and you're able to maybe you're doing a bunch of reads but not many rights to this table and it's already quite big so you're okay to pay a little bit of overhead on each right in order to maintain just literally a counter many times but you know what will happen right under load we okay yeah if you have especially if you have foreign key as i remember there are issues because multi-exact id and contention. And imagine you have count.
Starting point is 00:32:45 You know this popular anti-pattern many people do in the beginning of their career when they try to develop some accounting part of their systems. Debit and credit. Yeah, yeah. And they have common account or something, like counting all the money flow. And this requires updates to single row in some table. Any transaction, financial transaction, triggers this update synchronously.
Starting point is 00:33:16 And this becomes quickly a hotspot, obviously, because if you have two SQL transactions, already SQL transactions, doing some financial transactions, by triggering, they try to update a single row, they compete. Like, I mean, this one will be blocked until another finishes with commit or rollback, right? And this doesn't scale at all.
Starting point is 00:33:39 Well, I've seen solutions to that. Like, I've seen somebody, I think it might have even been Tobias Petri, it might have been one of his Twitter tips. All you do is you add more. If you split... A band-only table, right? No, no. So you have, say, 10, keep it small, counts.
Starting point is 00:33:59 And each write updates one of those, but you don't know which one. At random, it updates one of them. And then we want to count everything. You have to count all 10 values. We get batches, basically, right? So, yeah, it removes the hot, like it spreads the heat. Yeah, this is one approach. Second approach, if you, for example, have some, well, it's not easy, I would say.
Starting point is 00:34:22 And batching is already a good thing, but it's also like, okay, you basically reduce contention, but you don't eliminate it. Full elimination would require a synchronous trigger. All I meant is it's quite simple at low scale. It only gets complex at higher scales. Yeah, I also did it. Let's increment by 10 all the time or by 100 in jumps, but it becomes already not real-time.
Starting point is 00:34:49 It's okay. But ideally, in heavily loaded systems, it should be asynchronous. So normal writes don't trigger immediately. I mean, they don't lead to increments at all, and they just register event, something happened, right? Or they just are there, and we will retrieve them by SELECTs, and that's it. Yeah, synchronously, we do processing with batches as well. The favorite thing I've seen for this, other than maybe some proprietary solutions, is simple roll-up tables.
Starting point is 00:35:23 So maintaining a count of older stuff in bigger batches. So you might do yearly count, and then a monthly count, then a daily count, and then anything since that day, you go through all of the data. But it means that over time, you can roll things up into bigger aggregates. And I think that, to to me scales really well yes but like so the main approach like naive simple approach let's do synchronous updates and we will immediately
Starting point is 00:35:55 have hotspot it's bad then okay asynchronous approach there are two two like sub approaches. One is pull, like every minute we pull, look, oh, do we have some new data? If we have it in one batch, we issue update every minute. It's okay approach actually. And reduction of it, it's actually regular materialized view and you just refresh it fully from time to time. Again, it's super simple relatively. And it lacks partial updates. This is probably very annoying. So materialized view is a very rough approach, regular materialized view. If it's PG-IVM or how it's called, it should be better. I haven't tested it myself.
Starting point is 00:36:42 And I needed to confess. I had big plans to test hydra right uh this new thing uh with dark db right pg dark debuts yeah this pgm i have so many things on my plate to test for maybe already not months but years but unfortunately i must confess confess customers with managed postgres with their own needs occupying our time lately most of the time because we like try to to do things which don't require extensions rds cloud sql and others don't have i i wanted to mention PGQ. PGQ is present in Cloud SQL, right? It's interesting. So Cloud SQL
Starting point is 00:37:29 users can use it and benefit from it because it's very old and a very well battle tested solution from Skype. So imagine if we have some things we can register in PGQ, it will take care of scalability because it has partitioning inside
Starting point is 00:37:45 and so on and then we just uh it's like push events push events and then consumers of those events they increment count but also maybe in batches and so on so like yeah so what if you know just one update to increment by some number right and this improves things a lot and makes the system more resilient because legs will be lower. It can be not PgQ, but something like Kafka or I don't know, Sidekick or what you have,
Starting point is 00:38:16 like Celery, if you're a Python guy, right? But with this asynchronous nature, it's really good and nobody likes actually delays like one hour because you start with one minute you say okay it's acceptable but system grows and then you say okay we need to reduce frequency to like once per five minutes now but it then grows again once per 10 minutes now but if you have asynchronous push-like through asynchronous
Starting point is 00:38:46 message queues or something like event processing system, in this case, you can build good asynchronous kind of trigger, right? I wish Postgres had something internally and every managed service provider supports it. So all our customers already had something. But right now, unfortunately, it requires effort and you need to build with things you use sometimes outside postgres like sidekick or kafka or something yeah i think we've mentioned pretty much everything from like vanilla postgres without extensions at the moment but there are a bunch of other interesting projects. I agree. I think the DuckDB stuff, especially that cross-company initiative, seems super interesting.
Starting point is 00:39:30 But there's at least a couple of projects at the moment pushing some of these analytical queries to a column-oriented file. But we need to keep in mind that this also brings additional challenges. Again, this means you need denormalization because you have original table as source of truth and you need to bring data to extra table or something and maintain this relationship all the time so it's consistent and not lagging too much, right?
Starting point is 00:40:01 Yeah, but I think the optimizations they add might then be worth it in a lot of cases. Like the column store and then the vectorization. There are so many benefits if you're willing to pay that upfront cost. You can also do more like, I would say it's already traditional architecture when you have analytical system. In addition to Postgres, it can be, I don't know, it can be even ClickHouse. It can be Vertica if it's old or these days Snowflake. And we know
Starting point is 00:40:32 PeerDB was recently acquired by ClickHouse. ClickHouse. And it means, imagine we have our OTP storage with users table and then through logical replication it goes to click house for example and there you can do any counts with good speed right any analytical you can move analytical
Starting point is 00:40:54 workloads which are maybe even user facing to there right it's it's also worth considering but it moves us outside of postgres, and I don't like it personally. To stay kind of within Postgres, I think the work the Citus team did was super interesting because I think they were largely used for these large analytical workloads where they used the fact they had sharded to multiple nodes to let you parallelize some of these queries. So it gained from some of that extra parallelization.
Starting point is 00:41:30 And then the other company doing some interesting stuff in this space, I think it's Timescale, who've added some optimizations, added some functions. But I think... Continuous aggregates is a great thing, yeah. Yes, and continuous aggregates means that they've done a lot of that heavy lifting. If you have access to Timescale, if you're self-hosted or on their cloud, then you don't have to do all the complicated stuff.
Starting point is 00:41:56 And if you have not only access to Timescale, but to proper Timescale, because when some company says we support Timescale, but it's only, how is it called, like community edition? Apache. Yeah, I get confused. I think they actually, community edition is the one that you can't use. It's the Apache 2 one that you can use on other clouds. Yeah, some clouds support timescale, but only this reduced option.
Starting point is 00:42:22 Yeah. But full-fledged timescale is a great thing to have. Definitely compression plus continuous aggregates. We have it in a couple of places. It's so great. And this is it. It's flavor of Postgres, which I do like, but it's not available if you're already deep inside RDS or so.
Starting point is 00:42:42 You need to consider migration and so on. Timescale cloud or self-hosted. Exactly. And then the last thing to bring us back to the topic we talked about last week is they have an implementation of loose index scan for Count Distinct. You can use your own implementation of loose index scan.
Starting point is 00:42:59 Like in Wiki, right? With the cursive. Yeah. So that was pretty much everything I had. Yeah. So I think we covered basics. If you want to deal with billions of rows,
Starting point is 00:43:13 with hundreds of thousands of transactions per second, it's not an easy topic, actually. Thanks so much, Nikolai. Catch you next week. Yeah. Bye. Thank you.
