Postgres FM - Partitioning by ULID

Episode Date: December 22, 2023

Nikolay and Michael discuss partitioning by ULID — revisiting some of the old UUID vs bigint key debate in light of some new UUID specs, and how these can be used for partitioning (by time).

Here are some links to things they mentioned:

- Nikolay's episode on UUID (for primary keys): https://postgres.fm/episodes/uuid
- Postgres TV hacking session with Andrey and Kirk: https://www.youtube.com/watch?v=YPq_hiOE-N8
- UUIDv7 patch: https://commitfest.postgresql.org/43/4388/
- Use ULID Idempotency Keys (tip 6 in this blog post from Shopify): https://shopify.engineering/building-resilient-payment-systems
- Nikolay's howto post on UUID v7 and partitioning with TimescaleDB: https://gitlab.com/postgres-ai/postgresql-consulting/postgres-howtos/-/blob/main/0065_uuid_v7_and_partitioning_timescaledb.md
- Hussein Nasser's YouTube video about the above: https://www.youtube.com/watch?v=f53-Iw_5ucA
- UUID proposal that includes UUIDv7 and UUIDv8 (work in progress): https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/history/
- James Blackwood-Sewell's advice on a more elegant solution using TimescaleDB: https://twitter.com/jamessewell/status/1730125437903450129
- ULIDs and Stripe IDs (section of the Identity Crisis blog post by Brandur): https://brandur.org/nanoglyphs/026-ids#ulids
- Crunchy Bridge changed their default random_page_cost to 1.1 🙌: https://twitter.com/brandur/status/1720477470116422028
- UUIDs vs serials for keys (newer post from Christophe Pettus): https://thebuild.com/blog/2023/02/16/uuids-vs-serials-for-keys/

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is brought to you by:

- Nikolay Samokhvalov, founder of Postgres.ai
- Michael Christofides, founder of pgMustard

With special thanks to:

- Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello and welcome to Postgres FM, a weekly show about all things PostgreSQL. I am Michael, founder of pgMustard. This is my co-host Nikolay, founder of Postgres.AI. Hey Nikolay, what are we talking about today? Let's continue the discussion about UUIDs. I chose this topic because I recently implemented something myself, and also because some of my customers raised this again. So I think many people still have questions.
Starting point is 00:00:29 I covered it before in a solo episode about UUIDs, and that was quite light. But obviously I have fresh things to discuss, and you also have something to discuss. So let's return to this topic and be more practical this time, and think about really, really large data sets and how to work with them more efficiently. That means partitioning. UUID plus partitioning.
Starting point is 00:00:58 Yeah. So that's what you looked into. Yeah. Well, let's talk about only partitioning. We are not sharding experts, so let's stick to partitioning. But I would say, actually, I think it's important to mention
Starting point is 00:01:13 that one of the big benefits of UUIDs in general is that they suit a setup where you're generating IDs in multiple places. So it is a natural fit for this topic. So basically you ditch sequences.
Starting point is 00:01:33 I think some of the advancements we're seeing in UUIDs, some of the new implementations, some of the new standards, mean that we don't have the drawbacks that typically come up in discussions of big serial versus uuid or ulid as as they're often called if they're sortable so yeah you were looking into this never ever integer for primary keys never yeah hopefully people know that now you were looking into it for partitioning recently then so what what was the use case so use case was our chatbot a new ai chatbot we develop and we needed to have a storage
Starting point is 00:02:14 for all communication which happens in the public channel, because we have a paid version and a free version, and the free version obviously is recorded and shareable. So basically we created a chats table, and I originally used bigint primary keys. And then I was shamed by my team members, saying, come on, we need to use Timescale there, first of all, and also numbers in URLs don't look great. We need some better IDs there. Is this because they would be guessable? Is it a security thing or something else? Good question.
Starting point is 00:02:56 I have a long history with this. I remember in my first social networks, we tried to hide the actual number of posts and users we had. But then a competitor launched, and they didn't care about it. And then I also stopped caring about it. It's okay to have some numbers in URLs and share actual numbers with everyone.
Starting point is 00:03:20 But here I just think maybe it will be a shared data set in the future, so I don't want to have sequences. And also, maybe I don't want to share actual numbers. I don't know, it's one of those things where you're not fully sure. I'm kind of okay with sharing numbers. So, for example, we generate 50 to 100 messages per day with this bot. People are using it. We have 72 people who joined the program. Many more are waiting on the waitlist.
Starting point is 00:03:57 It's not a lot at all. But obviously, it will explode soon. I think so. And maybe I don't want competitors to see actual numbers. I'm, I don't know, 50/50 here, because I've had experience with both cases: sharing numbers and hiding numbers. And we used two numbers to get not actually random, but kind-of-random IDs: you take a sequence value and multiply it by a very large integer, and then take a modulo with another integer, and these two integers should be mutually prime.
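The coprime-multiplier trick Nikolay describes can be sketched in a few lines. This is a hypothetical illustration, not their production code: the constants are made up, and in Postgres you would do the same arithmetic in a SQL function over a sequence value.

```python
# Sketch of hiding sequential IDs behind kind-of-random public IDs.
# MODULUS and MULTIPLIER are illustrative values, not from the episode;
# the only requirement is that they are mutually prime (gcd == 1).
MODULUS = 2_147_483_647        # a large prime, 2^31 - 1
MULTIPLIER = 1_580_030_173     # any value coprime with MODULUS

# Because the two numbers are coprime, multiplication mod MODULUS is a
# bijection on [0, MODULUS): every internal ID maps to a unique public
# ID, so there are no collisions, yet the sequence looks scrambled.
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)

def obfuscate(internal_id: int) -> int:
    return (internal_id * MULTIPLIER) % MODULUS

def deobfuscate(public_id: int) -> int:
    return (public_id * INVERSE) % MODULUS
```

Consecutive internal IDs come out widely scattered, so growth rates are no longer visible in URLs, while the mapping stays reversible for anyone who knows both constants.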
Starting point is 00:04:34 In this case, you have a rotation without collisions. You get kind-of-random numbers, but they are not actually random, and nobody knows your actual ID. So this is how you can hide the growth rates and the actual numbers behind your business metrics, like how many posts or comments or anything you have in your social network,
Starting point is 00:05:00 for example. Again, it's not my main reason. I just think, OK, we probably need to use UUID. And then the team said, we also need to partition it, we already use TimescaleDB, let's just use it. And then I think, okay, obviously I want my indexes to behave well. If I use regular UUID version 4, for example, which Postgres currently supports, I know we will have performance issues, because values go to very different, random places in the B-tree index. Not purely random, but any insert disturbs some arbitrary part of the index. And this is not good for performance, especially if you want to show the last 25
Starting point is 00:05:46 items, for example, ordered by creation time. You will deal with a lot more buffers than you should, and a lot more write-ahead logging, right? Because you write full-page images
Starting point is 00:06:02 for different parts of the index each time instead of... Well, B-trees are very optimized for adding things to the end, to the right side continuously, like timestamps. And UUIDs don't fit that pattern at all. But that takes us on to, I guess,
Starting point is 00:06:18 some of the newer... Right. And since we also had a session on the Postgres TV YouTube channel with Kirk and Andrey, I think it was Kirk who brought up this topic: let's implement ULID in Postgres. Using Andrey's hands, we usually use Andrey's hands because he's a true hacker, in those sessions. And he implemented some patch quite quickly, sent it to hackers,
Starting point is 00:06:46 and quite soon we realized there is already an ongoing proposal for an RFC change to support versions 7 and 8 of UUID. And Andrey renamed the patch from ULID to UUID version 7. And then it was blocked by the idea: let's wait until the standard is finalized and fully approved. Right? You read this thread.
Starting point is 00:07:18 There's also the idea: let's just avoid the version 7 naming completely. Well, I thought that was a really neat solution. So ultimately what we want, well, what you want, what I would like, what I think lots of people would like, is a function within Postgres to generate values that can be put in a column of type uuid, but that, when we generate new ones based on a timestamp, are sorted, increasing over time. So that comes with loads of benefits for updating B-trees, and some of the other benefits that you mentioned already. Now, there are some new definitions of
Starting point is 00:08:02 new specifications for UUIDs coming down the pipe that would suit those perfectly. But somebody made a really good point, I think it was Andres Freund from Microsoft, in one of the first replies Andrey got: we could just name the function along the lines of what it does, rather than uuid_v7. I can't remember the exact name, but they came up with something very sensible, I'm sure. And yeah, that seemed really smart to me, because then even if the spec changes for v7, it doesn't matter.
Starting point is 00:08:33 We can implement those later, or have different functions for those, but this one could still be useful to users earlier, before the spec is finalized, before everything else gets agreed. So I liked that as a suggestion, but I think it either got lost, or people are very happy waiting another year.
Starting point is 00:08:49 Well, it's just because it looks like the RFC is making good progress, and it should be finalized soon. It's still not finalized, I checked a couple of days ago. But let's explain once again what we want. We want to take the good things from both worlds. Growing integers, right?
Starting point is 00:09:11 Which, if you order them... If you have, for example, the regular approach, you have id and created_at. created_at, a timestamptz, is 8 bytes, and the ID is 8 bytes; even if you use a 4-byte integer, you will still effectively use 8 bytes because of alignment padding, we know it. So if you just insert, and created_at is just now(), and you insert one row per transaction, or you use clock_timestamp() instead of now(), in which case you can insert many rows, then the order of IDs and of created_at values will be the same. You can order by id desc limit 25,
Starting point is 00:09:56 or you can order by created_at desc limit 25. Same. And this is good. It's a good benefit, and it's also good for performance because of data locality: rows created within the same second probably go to only one or two pages. It's packed, not distributed sparsely.
Starting point is 00:10:21 And UUID is good because it doesn't require any sequences. You can generate it on thousands of nodes. That's the name: UUID, universally unique identifier. But they go to different places in the B-tree, and you cannot order by them. That is the main problem, in version 4 of the specification. Yeah, I'm talking about the current situation.
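The contrast being drawn here, sequential keys agreeing with insertion time versus random UUIDs scattering it, is easy to demonstrate. A toy sketch, not Postgres itself, but the B-tree effect follows from the same ordering:

```python
import uuid

# Rows with sequential IDs: sorting by id reproduces creation order,
# so ORDER BY id and ORDER BY created_at return the same rows.
rows = [(i, 1_700_000_000 + i) for i in range(1000)]  # (id, created_at)
assert sorted(rows, key=lambda r: r[0]) == sorted(rows, key=lambda r: r[1])

# Rows keyed by random UUIDv4: the key carries no time information,
# so sorting by it scrambles creation order. In a B-tree, this is what
# sends every insert to an arbitrary page instead of the rightmost one.
v4_rows = [(uuid.uuid4(), 1_700_000_000 + i) for i in range(1000)]
assert sorted(v4_rows, key=lambda r: r[0]) != sorted(v4_rows, key=lambda r: r[1])
```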
Starting point is 00:10:47 And also with the RFC, it's still not finalized, so it's also the current situation if you don't use drafts. And this is a problem. And we have a good blog post from Shopify. It's MySQL, but it's very good at explaining the economic benefits of switching to a different, newer type of UUID. And also Hussein Nasser, sorry if I pronounce his name wrong, has a YouTube video explaining this Shopify case very well for backend engineers, and why a better version of UUID is really better. He explains it very, very well. His
Starting point is 00:11:25 explanation is... Anyone can understand this now, right? Maybe better than I just explained, right? Let's link it up. I actually pulled out a quote from that Shopify blog post. It said, in one high-throughput system, they saw a 50%
Starting point is 00:11:42 decrease in insert statement duration by switching from UUID v4 to ULID for some of these pieces. And you can order by them. If you want 25 latest posts, you need to order by, right? Otherwise...
Starting point is 00:11:56 Even if you don't need that, there could be significant benefits because of the B-trees. Let's slowly move to partitioning. If you don't have an orderable ID and you rely on a created_at column, you can have an index on created_at and order using an index scan on that. Okay, it's okay. Not a perfect situation, because you will have many more buffers in the plan, right? Because you fetch from random places. But if you want to partition such a table,
Starting point is 00:12:29 and in the URL you have a UUID, right? How do you find which partition to deal with? You could do hash partitioning. Well, yeah, yeah, yeah. But I want to partition by time, I forgot to say. I want to partition by time, because older data is used less often. And maybe you want to phase it out over time. Maybe you don't even...
Starting point is 00:12:51 You might want to eventually drop it... Exactly. Or move it to a different tablespace. Not a super popular solution, of course. Also, there are some ideas about being able to move it to object storage, which is currently only supported in Timescale Cloud. I think one day it should be possible with regular vanilla Postgres. I think I read about another provider doing it recently too.
Starting point is 00:13:16 Interesting. Send me a link, I'm very interested in learning about it. I think moving older time-based partitions, old data, like two years old, to object storage, eventually moving towards bottomless Postgres, right? It would be great. But back to our question. In the URL, you have blah, blah, blah.
Starting point is 00:13:35 In our case, postgres.ai slash chats slash some UUID. And we need to understand which partition to deal with. In our case, it's TimescaleDB. How do we tell TimescaleDB which partition it is? If it's a regular UUID, I think it's a nightmare. Well, I saw the update and
Starting point is 00:13:54 the reply you got from James Blackwood-Sewell at Timescale. A couple of guys from the Timescale company replied. I really appreciate it, but it was already about this sortable UUID version 7. This is already solved. I have this recipe in my Postgres Marathon series.
Starting point is 00:14:14 It's recorded. And I implemented it my own way, which was not efficient, and they provided very good advice on how to do it much more elegantly. And I have it covered in my small article. So this ULID, it's similar to UUID version 7 or 8. ULID... actually, it should probably be UULSID, because it's universally unique, lexicographically sortable. Yeah. But I admire
Starting point is 00:14:49 the people who came up with ULID. It's nice and simple and unique. I guess we are about to abandon this abbreviation altogether and just use UUID version 7 or 8, because of the standard. Maybe, or maybe in the future we'll just say UUID
Starting point is 00:15:05 because there might not be that many benefits to using the random ones. Yeah, sure, you get extra uniqueness, you get more bits assigned to the uniqueness part, but we already have so many. I was looking it up for version 7, and I think the spec currently says 48 bits for the timestamp. So if you're thinking about it in terms of bits and bytes,
Starting point is 00:15:27 16 bytes, yeah. Yeah, so 16 bytes, or 128 bits. 48 of those are reserved for the timestamp component, and there are a few bits to specify the version, then 74 bits for the randomness. Now, in a UUID v4, you get almost all 128 bits for randomness. So there is less randomness possible in these ones, by definition. There may be collisions, right?
Starting point is 00:15:53 Or just a slightly higher chance of them, right? We're still talking about... especially in your case, the chance of collisions is just basically zero, which is why these are useful. Yeah, even if we have like 100 messages per second, we will be fine. Yeah, exactly. And we can generate them on any server, basically.
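The layout they just walked through, 48 bits of Unix-epoch milliseconds, a few version/variant bits, 74 random bits, can be sketched like this. This follows the draft RFC as I understand it and is an illustration, not the eventual Postgres function:

```python
import os
import time
import uuid

def uuid_v7() -> uuid.UUID:
    """Sketch of a draft-RFC UUIDv7: 48-bit millisecond timestamp,
    4 version bits, 12 random bits, 2 variant bits, 62 random bits."""
    ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF        # 12 bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
    value = (ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                    # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                   # RFC 4122 variant bits
    value |= rand_b
    return uuid.UUID(int=value)

def collision_probability(ids_per_ms: int, random_bits: int = 74) -> float:
    """Birthday-bound estimate: a collision requires the same
    millisecond AND the same 74 random bits."""
    return ids_per_ms * (ids_per_ms - 1) / 2 ** (random_bits + 1)
```

Even at 1,000 IDs per millisecond (a million inserts per second), `collision_probability(1000)` is on the order of 1e-17 per millisecond, which is why "basically zero" is a fair summary.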
Starting point is 00:16:16 Well, how clocks are set will be a question, of course. But yeah, this needs to be UTC, right? So, version 7 and version 8: they differ only in precision, right?
Starting point is 00:16:36 That was my understanding. The idea is: let's take a timestamp, let's generate a regular UUID, or something like that, and then produce a new UUID, the first bytes of which correspond to the timestamp. So they will be sortable, right? You can order by them, and with very high confidence you can say when the value was created. And actually, the interesting thing... So the
Starting point is 00:17:07 Postgres side is on pause, right, waiting for the RFC to be finalized. I mean, I actually started to doubt that Postgres 17 will get it. This idea to change the name and just provide something not depending on the RFC is good, but I guess the consensus, maybe not consensus, but what I saw in the pgsql-hackers mailing list, is that people decided to wait
Starting point is 00:17:34 until the RFC. And it's not fully clear... I checked the status, it's still waiting for some reviews, and it's not fully clear when it will be finalized, so I guess Postgres 17 will not have it. My experience with these things is, it depends. If a couple of people get excited about this, if somebody ends up needing it... sorry, if
Starting point is 00:17:57 somebody puts a patch together that can generally be agreed on, and somebody else is willing to review it, it could get in. But it depends on a couple of people at least having the energy. Well, I just remember Peter Eisentraut's comment that we should probably wait for the standard, and Peter Eisentraut is very close to some standards, you know, including the SQL standard. So it's a very important comment. My impression is that we're just waiting for the RFC. But the good news is that you can generate it in your application code yourself. There are many, many implementations of ULID or UUID version 7; even if the RFC is not yet finalized, they're already called version 7. You can do it, and you can find a lot of stuff on GitHub.
Starting point is 00:18:45 But you can also generate it using PL/pgSQL. And additionally, Daniel Vérité, I hope I pronounce it right, sorry guys if I pronounce names wrong, showed a proposal for how to generate it using PL/pgSQL. And then, looking at the code, I said, we don't actually need PL/pgSQL here, we can use a regular SQL function. And he answered, yes, just scroll down, in the same GitHub discussion. It was a GitHub gist.
Starting point is 00:19:20 I see a SQL function, it's quite simple, and I just started to use it. So I used that function. I just generate UUID version 7 using a small SQL function. That's it. But then the very big question: do we really need a created_at timestamp?
Starting point is 00:19:42 If we already have the timestamp present in this UUID value, which is 16 bytes, and we don't need very good precision, maybe we can extract the timestamp back. And the answer is yes. I saw Andrey's comment in pgsql-hackers saying that the authors of the RFC proposal don't encourage extraction of the timestamp from UUID version 7 values.
Starting point is 00:20:04 But for partitioning, we actually need it. And we also need it if we want to just drop our created_at column, because why keep created_at when we have it right here? I'm not sure we do need it for partitioning. Like, maybe in Timescale, the way you define a hypertable... but, for example, if I set up a range partition using a ULID, I could tell it the ranges. Or maybe in the context of my particular recipe with TimescaleDB. And TimescaleDB, of course, would like to have a timestamp. And I created some recipe which looked quite weird, but it worked.
Starting point is 00:20:54 But then a couple of Timescale guys provided good advice on how to do it much better: just using the time_partitioning_func option when you create a TimescaleDB hypertable. And that's it. And this function is our function which converts UUID version 7 to a timestamp. And I strongly believe this is a very helpful function to have. So we need to have it. Even if the RFC authors think we don't need it,
Starting point is 00:21:24 I think we need it. I hear you. We can have ranges. Yes. But having this function is super helpful, because I can reconstruct the timestamp, even with less than perfect precision. A timestamptz is 8 bytes.
Starting point is 00:21:41 UUID version 7 is 16 bytes, and we know not everything in those 16 bytes is related to the timestamp. You said how many bits, I don't remember. More than half is random, yeah. Yes, so we lose precision. But sometimes I don't need milliseconds. That's a good point.
Starting point is 00:22:02 I'm looking at the v7 spec. I suspect v8 allows more precision. Yeah, if you need better precision, use v8 and get better precision, paying some extra storage and memory cost? Well, I don't think so. I think it's still the UUID format, right? So I don't think you pay extra for that precision. But I think you get less randomness. Ah, less randomness. Well, interesting. Fewer bits for the random part.
Starting point is 00:22:34 Well, I need to look at it then. Yes, I only played with v7 and I decided to use it, but maybe I should look at v8 and understand the trade-offs. Yeah, well, good to understand at least. Yeah, yeah, interesting. So anyway, you can take UUID v7 right now, generate it in your own application or using this
Starting point is 00:22:53 SQL function, quite simple, and then join this with Timescale and have partitioning provided by TimescaleDB, which is fully automatic, very good, and that's great, right? And it's sortable and it's quite efficient. Yes, it's 16 bytes versus 8, but for the timestamp we also had 8 bytes, and I decided not to create created_at at all. Yeah, you said, do we still need created_at, and then
Starting point is 00:23:20 you answered yes. But I thought you meant the other way around. So you mean now we don't need created_at at all? Yeah, we can extract it, and that's it. I don't need super precision in my case. It's just some messages, and I'm fine with even second-level precision. So yeah, it's a good question which precision I get when extracting from this UUID v7 value.
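What such a conversion function has to do is small: the creation time sits in the top 48 bits, so extracting it is a shift and a division. A Python sketch of the same logic the SQL function in the recipe would implement (hypothetical code, shown for the bit arithmetic; precision is milliseconds at best):

```python
import datetime
import uuid

def uuid_v7_to_timestamptz(u: uuid.UUID) -> datetime.datetime:
    """Recover creation time from a v7-style UUID: the top 48 bits are
    Unix milliseconds; the remaining 80 bits are version/random data."""
    ms = u.int >> 80
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)

# Round trip: build a value with a known millisecond timestamp and
# check we get it back, at millisecond (not microsecond) precision.
known_ms = 1_703_203_200_000  # 2023-12-22 00:00:00 UTC
u = uuid.UUID(int=(known_ms << 80) | (0x7 << 76) | (0b10 << 62) | 12345)
print(uuid_v7_to_timestamptz(u))  # 2023-12-22 00:00:00+00:00
```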
Starting point is 00:23:49 Well, there's an argument, if you don't care about the spec, that you could invent your own version that only goes down to second precision and get even more randomness. You could play with that a little bit. And adjust the function and feed it to Timescale and so on. This recipe already shows all the internals, right? It's pure SQL. So we don't need to...
Starting point is 00:24:08 We can use it in any place, in any Postgres flavor: RDS, Aurora, anywhere, right now. So that's great. Of course, on RDS you don't have TimescaleDB, right? But you have the new sharding Aurora,
Starting point is 00:24:22 just released like a month ago or so. So yeah, I don't know. I'm excited to see us shifting from regular numbers to these IDs, actually. Yeah. So it's good to see some numbers. Let's provide some links to blog posts for those folks who want to explore the overhead. Well, I think there's an interesting post by Brandur.
Starting point is 00:24:48 Exactly, this is one of them, yeah. Nice. But they also mentioned a couple of downsides of ULIDs which I hadn't considered, and that I think are worth mentioning. Well, let's talk about them quickly. One is that there's naturally less randomness, because we're taking up some of it with timestamps. We've talked about that one already.
Starting point is 00:25:08 But the second one is that you could end up producing a lopsided index. If you have deleted data, we won't reuse that space with sequential IDs, but in other UUID implementations, because they're more random, you would reuse that space. So we've got kind of a different bloat problem. Except if we partition, right? If we're partitioning and we've got index maintenance, if we're dropping our partitions over time, we'll naturally get rid of that. But I thought it was a really good point that there are some downsides to this always-increasing idea as well. And they mentioned at the end of the blog post that they expected to go into the investigation very much on the side of using bigserial or bigints everywhere, and actually they ended up thinking these ULIDs, or, as they called them, performance-aware or intelligent UUIDs, which I thought was quite
Starting point is 00:26:06 a nice phrase, were their favorite approach now. I'm very sorry, you said at the end of the blog post, and I see pictures of a walk. I don't know how it's related at all. No, no.
Starting point is 00:26:22 A bit above that. I got distracted. First of all, very good title of the blog post: Identity Crisis. So, ID crisis. But yeah,
Starting point is 00:26:35 these pictures of the San Francisco Bay Area, and a walk down to Pacifica. Yeah, I know.
Starting point is 00:26:40 Brandur already heard one of our episodes and highlighted my phrase on Twitter, when I was saying, very good blog post, but completely wrong. Something like this. Not completely, right. Okay. And I was right, actually.
Starting point is 00:26:58 Speaking from experience. And now I must say, you should not put very good pictures of the 101 highway there, because it's super distracting. Yeah, I actually remember that quote. I think the words you used were something along the lines of, great post, wrong conclusion. Ah, yes, exactly. Right, right. And then, yeah, I think there must be a second one as well, because Crunchy Data, credit to them, listened to the one where we were talking about random_page_cost and how it was still 4, but they did some benchmarking and reduced it,
Starting point is 00:27:34 I think to 1.1, which is a much more SSD-friendly number. So maybe they listened to it. And I also learned from that section that 1.0 is probably not better than 1.1, which is interesting, because they had numbers. Yeah, they had some numbers proving that 1.1 is better than 1.0.
Starting point is 00:27:58 So this is super interesting. Next time I touch this area, I will pay attention to their numbers. Nice. Yeah, that's good. Were there any other posts or things that you've seen that you wanted to draw people's attention to? I don't know. There are many materials about it. I don't remember particular ones.
Starting point is 00:28:16 Let's just put something in the show notes. It's an interesting topic, I think very important for maybe everyone. Yeah. There's an older one by Christophe Pettus that I thought was quite good, but then a more recent one by him as well, suggesting that we think about it in two steps.
Starting point is 00:28:36 The age-old UUID versus serial question. They suggested thinking, firstly: should our keys be random, or should they be sequential? That's a very good point. Do you want them to be guessable? Do you want to be able to infer... Actually, we didn't talk about this, but if you're using ULIDs and someone can see their ULIDs, they also get a little bit of extra information: they can tell when the ID was created. Is that a problem for you or not? These are the questions you need to ask yourself. So that's number one: should they be random or should they be sequential?
Starting point is 00:29:07 Do you want to hide it, right? Yeah. And then the second one is: should they be 64 bits, or should they be larger than that? And that's a second, separate question. But I think, Christophe, I want them to be shorter, not larger. Okay. Sorry, bits. Did I say bytes? I don't remember.
Starting point is 00:29:34 Anyway, the point is how much randomness, how much entropy do you need? Collision risk, basically. Yeah, but these two questions are separate questions. And Christophe made a really good point that often they get conflated in these arguments. People who argue for UUIDs are arguing along one of these questions, and people who argue for bigints are completely ignoring that question and going down completely different questions. So I think asking both of those questions is important before picking your ID. We could handle collisions maybe, I don't know,
Starting point is 00:30:07 like maybe with INSERT ... ON CONFLICT, just adjusting the last bytes slightly on conflict, right? Yeah, well, I'm not sure, I'm just thinking out loud. But yeah, in terms of likelihood of collisions, I actually haven't done the math on how unlikely they are. But I think it's a pretty minuscule likelihood,
Starting point is 00:30:30 at least in most of the definitions. It depends on the volumes you're trying to ingest into your database. Yeah, of course. But anyway, I think next time we build a new system, we need to think: should we use these int8 surrogate keys at all, or maybe it's time for UUID version 7 or 8, with partitioning. Actually, my response to my team was,
Starting point is 00:30:56 partitioning? Come on, we know how to handle a billion rows easily in one physical table. Let's just grow it until a billion. And they said, no, no, no, let's do it. It's so easy. Let's just do it in a normal way.
Starting point is 00:31:10 Okay, okay. We have partitioning, we have UUID version 7. A modern approach, you know. Last question from me: what chunk interval did you go for? Or what partition size? Oh, it's a good question.
Starting point is 00:31:25 I actually need to check, because I remember for development, for experimentation, I used one-hour chunks, very small, like tiny. I saw even smaller in your test setup. Well, maybe a minute, just for testing. But eventually I think it will be, I don't know, days or weeks. But it will be smaller chunks. TimescaleDB is good with small chunks.
Starting point is 00:31:46 A lot of small chunks. Oh, really? Not months. Well, maybe months, actually. It depends. It's a good question, I need to reconsider it. This I need to finalize, but we can adjust it over time as well, I think. Yeah, I wouldn't be surprised if you start off
Starting point is 00:32:02 much longer and then go smaller and smaller. Makes sense. It depends. If you want to test in advance how it works with a big number of partitions, chunks of a hypertable, you probably want to go smaller and find problems earlier. That's for when you want to go deeper, right? But if you're building a normal system, of course, it makes sense.
Starting point is 00:32:26 I think the rule of thumb, and this is not about Timescale, the rule of thumb outside of Timescale is: partition all tables which exceed 100 gigabytes, or maybe actually 10 gigabytes. And this rule of thumb, so to speak, was raised by several folks I listen to very closely. Their words matter to me a lot. So why 100 gigabytes, right?
Starting point is 00:32:56 Why 100 gigabytes? It's just some empirical rule: for example, Alexander Kukushkin said, based on Zalando experience, just 100 gigabytes. But then I realized I can build some theoretical basis and find a better threshold. So the theoretical basis is how many transaction-ID-
Starting point is 00:33:16 consuming transactions per second you have. For example, 100, right? So how fast you move consuming seeds, right? And then how long does it take for Rotovacuum to process a single table? And also how long does it take to create an index on the largest tables? Because when you create index, you hold a spin horizon. Yeah. you hold a spin horizon. If it takes many hours, you have, for example,
Starting point is 00:33:48 five hours to build an index on a five-terabyte table. For example, it's very arbitrary numbers. During which, AutoVacuum cannot delete data tuples from any table in your cluster, in your database. It means that if you have a lot of seed growth during
Starting point is 00:34:04 this, you insert a lot of seed growth during this, you insert a lot of... So you can start comparing these numbers and understand, okay, I need not to go more than like 100,000 seeds to be spent during index creation. And if you split your physical table to smaller physical tables, partitions or chunks in sense of timescale, index creation becomes faster.
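The back-of-envelope arithmetic described here can be sketched roughly like this; the numbers (100 XID-consuming transactions per second, a five-hour index build, a 100,000-XID budget) are the illustrative figures from the conversation, not recommendations:

```python
# Rough sketch of the XID-budget reasoning: while CREATE INDEX runs, the
# xmin horizon is held back, so autovacuum cannot clean up dead tuples,
# while XIDs keep being consumed cluster-wide.

def xids_burned_during_index_build(xid_per_sec: float, build_hours: float) -> int:
    """XIDs consumed cluster-wide while one index build holds the horizon."""
    return int(xid_per_sec * build_hours * 3600)

def max_build_hours(xid_per_sec: float, xid_budget: int) -> float:
    """Longest acceptable index build if we cap XIDs burned at xid_budget."""
    return xid_budget / (xid_per_sec * 3600)

# Figures from the episode: ~100 XID-consuming transactions per second,
# a five-hour index build on a five-terabyte table.
print(xids_burned_during_index_build(100, 5))      # → 1800000

# A budget of 100,000 XIDs per index build implies the build must finish
# in roughly 17 minutes -- hence partitions small enough to index quickly.
print(round(max_build_hours(100, 100_000) * 60))   # → 17
```

Under these assumptions, a five-hour build holds the horizon while about 1.8 million XIDs are burned, which is one way to derive a partition-size threshold rather than taking 100 GB purely on faith.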
Starting point is 00:34:30 And autovacuum is blocked for a smaller period of time. But a couple of other things I've heard being used for this: one is the size of the cache, like how many chunks do you want to be in memory. Like, if you,
Starting point is 00:34:47 if your memory is smaller than your most recent chunk size, that might not be optimal for performance; you might be better off with smaller chunks, so the more recent ones are more likely to be fully in memory. I understand that. And then another one. But it's very rough reasoning, I would say. Yes, but it's the other direction, right? One of them is encouraging you to do it earlier and earlier, and this one is... actually, no, that one is in the same direction, isn't it? Earlier is better.
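The cache-sizing heuristic just mentioned can be sketched as follows; the ingest rate, index overhead, and memory figures are made-up assumptions for illustration only:

```python
# Sketch of the "most recent chunk should fit in memory" heuristic:
# pick a chunk interval so the newest chunk, plus its indexes, fits in RAM.

def max_chunk_hours(ingest_gb_per_day: float, memory_gb: float,
                    index_overhead: float = 0.5) -> float:
    """Longest chunk interval (hours) whose data plus indexes fit in memory.

    index_overhead is an assumed fraction of extra space used by indexes.
    """
    gb_per_hour = ingest_gb_per_day * (1 + index_overhead) / 24
    return memory_gb / gb_per_hour

# E.g. ingesting 64 GB/day with 50% index overhead and 32 GB of memory:
print(max_chunk_hours(64, 32))   # → 8.0, i.e. roughly eight-hour chunks
```

This is only one input into the decision; as discussed next, going too small creates its own problems when queries touch many chunks at once.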
Starting point is 00:35:21 But you can go too far. Like, you could have too many. Let's say you want to regularly query the last day, but that's going to involve querying 24 partitions. We've already talked in the past about problems when you query too many relations at the same time. Yes, planning time and execution time and lock manager contention. A lot of stuff happens if you have a lot of partitions, and especially if each partition has a lot of indexes. So, yeah, many, many things to...
Starting point is 00:35:49 There's a trade-off here, obviously, but Timescale is very good with a large number of partitions. Okay. Yeah, so we can go with daily partitions. It's okay. And we have cases, my colleagues have observed cases, where we have dozens of terabytes of data with daily partitions, very small, maybe not even daily, I don't remember the details, maybe some eight-hour partitions, ingesting a lot per second.
Starting point is 00:36:18 Did they change the... Remember when we talked about the lock manager issues, when you don't have partition pruning and lots of indexes on each partition? Presumably that hard limit is still there in Timescale as well. 16. Yeah. So you say it's really good with lots of partitions,
Starting point is 00:36:37 but if you're querying too many of them at the same time, like if we're doing monthly reports against 31, or however many days, we're going to bump into those limits, right? Definitely. But a monthly report is just a single query. The problem is when you have thousands of such queries per second; then they start competing, and lock manager lightweight lock contention happens. If it's something infrequent... So the second question to check is how many QPS you have. Yeah, yeah. So if it's like dashboards or something. If it's monthly reports, you'll probably be fine, but...
Starting point is 00:37:14 Nobody says we cannot have more than 15 indexes on a table, reaching 16, this fast-path locking threshold. It's okay to exceed it. The only problem is when you need to exceed it a thousand times per second or more. Cool. Right. Any last words on this one? No, I think it was good.
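A rough sketch of the fast-path locking arithmetic behind this part of the discussion; 16 is the per-backend fast-path slot count mentioned in the episode, and the partition and index counts are illustrative:

```python
# Each Postgres backend has 16 fast-path lock slots; a query that locks
# more relations than that spills into the shared lock manager, which can
# become a contention point at thousands of queries per second.

FAST_PATH_SLOTS = 16  # per-backend fast-path slot count discussed above

def relations_locked(partitions_scanned: int, indexes_per_partition: int) -> int:
    """Relations a query locks: each partition plus each of its indexes."""
    return partitions_scanned * (1 + indexes_per_partition)

# A monthly report over daily chunks with no partition pruning:
print(relations_locked(31, 5))   # → 186, far beyond the 16 slots

# The same query pruned to a single day's chunk:
print(relations_locked(1, 5))    # → 6, fits in the fast path
```

As the hosts note, exceeding the threshold occasionally (a monthly report) is fine; exceeding it a thousand times per second is where contention shows up.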
Starting point is 00:37:40 A lot of things to understand and play with. And I encourage folks to look at these new versions of UUID, if they haven't yet. Does it matter where we generate it, in the application or in the database? I think the only risk is time zones. Clocks can be off. Yeah, true. If they're off by seconds, then it will be going to different database pages. Yeah, it will be elsewhere. And also, order will be broken.
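A minimal sketch of why clock skew matters for UUIDv7 ordering, assuming the draft-RFC bit layout (48-bit Unix-millisecond timestamp in the most significant bits); `uuid7_at` is a hypothetical helper for illustration, not a production generator:

```python
import os
import uuid

def uuid7_at(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-style value from an explicit millisecond timestamp."""
    rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
    rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit timestamp, most significant
    value |= 0x7 << 76                         # version field = 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC 4122 variant bits
    value |= rand_b
    return uuid.UUID(int=value)

# Generated in timestamp order, values sort by time:
a = uuid7_at(1_700_000_000_000)
b = uuid7_at(1_700_000_000_001)
assert a < b  # later timestamp -> larger UUID, appends toward the "right" pages

# But if one writer's clock runs behind (the concern raised above),
# a newer row gets a smaller UUID and lands among older rows and pages:
skewed = uuid7_at(1_700_000_000_000 - 5_000)  # clock five seconds behind
assert skewed < a
```

This is why generating the value in one place (e.g. the database) sidesteps cross-machine clock skew, at the cost of the application not knowing the ID before the insert.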
Starting point is 00:38:11 Yeah, so I guess it matters a bit. Yeah, I prefer generating it on the Postgres database side, but it's an interesting question, the pros and cons of database versus application side. Anyway, I think Postgres eventually will have
Starting point is 00:38:26 UUID version 7. Not sure about 17, but it should be in 18 at least; only a couple of years of waiting, but we don't need to wait. And we know now how to use both this new type of UUID and partitioning, with TimescaleDB or any other. Good? Nice one. Well, thank you, Nikolai, and catch you next week. Thank you, bye.
