Postgres FM - UUID

Episode Date: June 23, 2023

Lonely Nikolay discusses the performance aspects of using UUID for primary keys.

Here are links to a few things I mentioned:

"postgresql" posts on HN, most popular last week (Algolia search...): https://hn.algolia.com/?dateRange=pastWeek&page=0&prefix=true&query=postgresql&sort=byPopularity&type=story
Unexpected downsides of UUID keys in PostgreSQL (a post by Ants Aasma, Cybertec): https://www.cybertec-postgresql.com/en/unexpected-downsides-of-uuid-keys-in-postgresql/
HN discussion of that post: https://news.ycombinator.com/item?id=36429986
Additional math by me (converting to bytes): https://twitter.com/samokhvalov/status/1671962111092850689
Updated RFC4122 (proposal): https://github.com/ietf-wg-uuidrev/rfc4122bis
Status of that proposal: https://datatracker.ietf.org/doc/draft-ietf-uuidrev-rfc4122bis/history/
Patch UUID v7 (commitfest record): https://commitfest.postgresql.org/43/4388/
Postgres hacking with Andrey and Kirk: https://www.youtube.com/watch?v=YPq_hiOE-N8 (where that patch was developed)

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

If you would like to share this episode, here's a good link (and thank you!)

~~~

Postgres FM is brought to you by:
Nikolay Samokhvalov, founder of Postgres.ai
Michael Christofides, founder of pgMustard

With special thanks to:
Jessie Draws for the amazing artwork

Transcript
Starting point is 00:00:00 Hello, welcome to PostgresFM, episode number 51. Today I'm alone, and it's the first time I'm doing it online. I mean, I did many things online on PostgresTV — if you follow just the PostgresFM podcast, I encourage you to check out the YouTube channel PostgresTV as well, since we publish the podcast episodes there regularly, weekly, and this is actually week number 51. It's insane.
Starting point is 00:00:29 Next week we will have an anniversary. But we also have other sessions: sometimes we develop something, or we invite guests, and so on. But, well, 51 is insane. It feels insane to have one episode each week, and this is a great pace. Unfortunately, my co-host Michael is on vacation right now, and it's very hard for me to choose a topic
Starting point is 00:00:56 and to coordinate everything. For me, it's easier to do it online because I don't need to redo it and so on. But I apologize in advance for some "um" and "uh" sounds, because usually Michael edits this very well. I don't know how it will sound on the podcast, but I will do my best. So, choosing a topic today was hard, as I said. I went to Hacker News.
Starting point is 00:01:25 I usually use Algolia search — I will provide the link below. So I'm using Algolia search, and I checked just "postgres" or "postgresql"; unfortunately, this search doesn't treat these words as synonyms. And this week was actually hot in terms of Postgres topics on Hacker News. Many topics made it to the front page, but the most popular one
Starting point is 00:01:57 was about threads versus processes. Maybe you know: Heikki Linnakangas, after PGCon, started this discussion in the pgsql-hackers mailing list, and the topic became very popular. More than 100 emails in the mailing list itself, and on Hacker News more than 300, maybe 400, comments. Of course, the depth of discussion is very different in the two places, but anyway, it's interesting to hear a lot of opinions.
Starting point is 00:02:34 But I'm not going to discuss it today — maybe another time. The number two topic was about upgrades: major upgrades for Postgres using logical replication, involving logical replication to achieve zero or near-zero downtime. This topic is very interesting, and I have a lot of things to say and to share, but maybe later,
Starting point is 00:03:02 because there is some ongoing work which is not done yet. Maybe later this year we will share something that will be very interesting. So I chose the third topic. The third topic was related to a blog post on the Cybertec blog by Ants Aasma, and it was about UUIDs. It's quite a well-known problem with traditional UUIDs, and it's related to performance. But this post was, as usual for the Cybertec blog, quite concrete in terms of practical examples and numbers. So it's worth discussing.
Starting point is 00:03:47 But let me start with some history, from some distance. When I co-founded my first social network and chose Postgres for the first time — I actually started on MySQL, but it took me just one week to realize that I was not going to stay on MySQL, because it was a terrible time, when MySQL was version 3 and so on, not following what we had learned at university. So I chose Postgres and converted very quickly. It was 2005 or so, very long ago. And we decided that we needed to hide real numbers from our competitors. You know, when you decide to use a surrogate key,
Starting point is 00:04:41 you probably let your users see the current number. For example, you assign it sequentially — back then we didn't have identity columns, we had only the serial or bigserial data types, sequences, right? And when a user registers, a number is assigned, and the user can guess how many other users have already registered. If this is your competitor, they will probably start checking you every week, and they will see your growth rate — and not only for users, but also for blog posts, comments, everything. And we decided that we wanted to hide it.
Starting point is 00:05:32 And an interesting trick was chosen. It was not my idea, but I implemented it quickly using simple defaults and sequences. We wanted some numbers that look random, but are not really random. If you take two mutually prime numbers, multiply the sequence value by one of them, and then apply a modulo operation with the other one, you get something that looks like random numbers, but it will not
Starting point is 00:06:10 be really random, and it will be extremely hard — probably even impossible — to understand what is happening without knowing these two mutually prime numbers. So it will look like random numbers. Of course, one of these two mutually prime numbers should be a very big number: you get a ring, a circle, and you're walking along this circle. It worked great, and we used this approach for all our surrogate primary keys — something like the sketch below. So it was a single primary server; then we dealt with various scalability issues using Slony, then Londiste, and so on.
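A minimal sketch of that masking trick (the constants here are made up for illustration; any pair of mutually prime numbers with a big modulus works):

    -- Mask sequential IDs with two coprime constants: 1000000007 is prime,
    -- and 999999937 is not a multiple of it, so n -> (n * 999999937) % 1000000007
    -- maps the first ~10^9 sequence values to distinct, random-looking IDs.
    CREATE SEQUENCE users_id_seq;

    CREATE TABLE users (
      id bigint PRIMARY KEY
        DEFAULT (nextval('users_id_seq') * 999999937) % 1000000007,
      nickname text
    );

Outsiders see IDs jumping around the "ring" unpredictably, while internally it's still an ordinary sequence.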
Starting point is 00:07:07 But in the second social network I co-founded, a few years later, we used the same approach, and we ended up having a really nasty problem for the first time — it was 2008, I remember it. We hit the problem you get when you reach 2.1 billion records with an integer4 primary key: at some point, when your table is growing very fast and the value reaches 2.1 billion, Postgres cannot insert anymore, and you have a very big problem to solve. To solve it synchronously, you need some downtime — many hours of downtime, unfortunately. Since then, I have learned that you should never use integer4 primary keys.
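The hard limit described here, in two lines — int4 tops out at 2^31 - 1 = 2,147,483,647, roughly 2.1 billion:

    SELECT 2147483647::int4;   -- 2^31 - 1, the largest value that fits
    SELECT 2147483648::int4;   -- ERROR: integer out of range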
Starting point is 00:08:11 But what's interesting here: we basically used a kind of random distribution for the numbers, and at that time we didn't realize that it's not good in terms of performance. Why is it not good? Because in social networks and social media, most of the time users ask you to select the most recent data from your database: for example, provide the last 25 or 100 comments for this post, or show the freshest interesting posts, and so on. And if you use random numbers — or random-looking but not really random numbers — you end up using different pages for each insert, in terms of the heap. And not only the heap, but also indexes: if you use an index (by default, it's a B-tree), you end up inserting into different subtrees of the B-tree. Sometimes that's good, but not always — not if you have just a single node and your usage pattern is that fresh data is needed more often than old data.
Starting point is 00:09:18 In this case, it's not optimal, and we didn't realize it at that time. Then we also saw that our competitors didn't care anymore about hiding numbers, and at some point I decided the same. In my third social network, and since then, I don't care about masking real numbers — I just used integer8 primary keys, and all was good. But now we live in a world where we use many, many nodes. For example, just yesterday I needed to compare the behavior of two major Postgres versions: I needed to dump some samples and plans, analyze plan behavior — EXPLAIN (ANALYZE, BUFFERS) — on two nodes, and then compare.
Starting point is 00:10:16 And obviously, if I had used just regular integer8 values assigned sequentially, I would have had collisions when merging data from the two nodes, because, for example, ID 1 is used on both servers, so we cannot merge this data. It's solvable, but it's not interesting. So I just used UUID. You install an
Starting point is 00:10:40 extension and start using UUIDs, and you know collisions are extremely rare. So it's all good — something like the snippet below. But the regular approach, the so-called traditional UUID — version 4 — behaves like a randomly distributed value. In this case, you know collisions are very unlikely, and it's good to use as a global value.
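For reference, a minimal sketch: gen_random_uuid() returns a version-4 (random) UUID and has been built into Postgres since version 13; on older versions, the pgcrypto or uuid-ossp extensions provide equivalents.

    -- CREATE EXTENSION pgcrypto;  -- only needed before Postgres 13
    CREATE TABLE events (
      id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      payload text
    );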
Starting point is 00:11:08 But since it's randomly distributed, it's going to produce a bad state of the B-tree: when you again need to check your latest results, you will use a lot of buffers. This particular post on the Cybertec blog discusses this problem, but it goes in an interesting direction which probably was not explored by others before: it discusses the performance of aggregates — of count —
Starting point is 00:11:48 and index-only scans. With index-only scans, we utilize the visibility map. And of course, in this post the author performed a VACUUM ANALYZE first, so the visibility map was fresh — no heap fetches involved, only work with the index. So: SELECT count(...), and you check the count of some fresh records. The author compared three cases: integer8; UUID version 4, the traditional UUID most people use — in Postgres, this is what you will use if you decide to use UUID, provided by Postgres itself, by an extension basically; and the third option, a new kind of UUID, so-called version
Starting point is 00:12:38 7, which is not a standard yet, but there is a work-in-progress standard being reviewed right now. This UUID version 7 has interesting properties: collisions are also extremely unlikely, but values are generated in a kind of sequential order. A timestamp is involved under the hood, so if you order the generated values, the order will correlate with time. So it keeps the good property that you can use it globally.
Starting point is 00:13:24 You can have many primary servers generating this value; collisions will never happen in your lifetime, but you can order by this value, and the result will be the same as if you ordered by a created_at column. So it's similar to regular surrogate integer8 primary keys — see the small illustration below.
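A tiny illustration of that ordering property (the posts table, with a v7-filled id column, is hypothetical):

    -- With UUID v7, the first 48 bits are a millisecond timestamp, so
    -- ordering by the key matches insertion order (at ms resolution):
    SELECT id, created_at
    FROM posts
    ORDER BY id DESC
    LIMIT 25;   -- same rows as ORDER BY created_at DESC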
Starting point is 00:14:08 An interesting result: you create indexes, of course B-tree indexes, for all three options, and the first interesting thing — I hadn't paid attention to it before, but it's interesting — is insert time itself. That's already where the traditional version-4 UUID is losing. It's not losing by much, but it's already losing in terms of insert time. And then if you check SELECT
Starting point is 00:14:24 count, this is the most interesting part. The timing for version 4 was much worse, even though the state was warmed: caches were warm, so everything was buffer hits. And then the author checks the explain plans to understand why it's slower — why version 4 is slower.
Starting point is 00:14:45 As for integer8 and UUID version 7, they were quite close. Why so? There, I expected to see it, and I saw it: the BUFFERS option used in the EXPLAIN ANALYZE plan analysis. We have discussed it many, many times, and I will repeat it many more times: BUFFERS should always be used when we analyze plans. It should be the default, but unfortunately,
Starting point is 00:15:15 there were two attempts to make it the default during the last few years. Both attempts failed, and both hackers decided not to continue. I think everyone agrees it should be the default, but somehow it's complicated by the automated tests Postgres has, and so on — it's hard to implement. But if someone wants a goal to achieve, making the BUFFERS option the default in EXPLAIN ANALYZE is a great one. So when we check buffers — surprise — we see that for integer8 and UUID version 7, it's almost the same in terms of buffers, which explains the similar timing. But for UUID version 4,
Starting point is 00:16:10 it's almost two orders of magnitude more buffer hits. And then the author explains why: it's related to the visibility map. Postgres needs to check many more pages, many more buffers, because we don't have data locality anymore — inside the visibility map as well — and we keep checking many, many buffers. So this is super interesting, and it means that sequential generation of surrogate keys — either integers or complex values like UUIDs — is better not only for fetching fresh data, but also for aggregates and counting. Postgres has a slow count; it's always a problem, and many people complain.
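The habit being advocated here, for anyone following along (the records table is hypothetical; the point is the Buffers line in the output):

    -- Always ask for buffer numbers, not just timing:
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM records;
    -- In the post, int8 and UUID v7 showed similar "Buffers: shared hit"
    -- values, while UUID v4 showed almost two orders of magnitude more.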
Starting point is 00:17:00 Counting was improved with the improvement of index-only scans: if you have autovacuum tuned, index-only scans perform quite well and counting is faster. But still, this is an interesting example. So the bad recipe is: forget about autovacuum tuning and use UUID version 4 — and you will end up with a very slow count. And the good advice is: don't use version-4 UUIDs; use version 7, which is not official yet, but you can write a simple function for that. It's no big deal — you can find many examples on the internet, like the sketch below — or you can even generate it on the application side.
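One such sketch that circulates on the internet (this variant assumes gen_random_uuid(), built in since Postgres 13, or pgcrypto before that): it overwrites the first 48 bits of a random UUID with the current Unix timestamp in milliseconds and flips the version nibble from 4 to 7, matching the draft's layout.

    CREATE FUNCTION uuid_generate_v7() RETURNS uuid AS $$
      SELECT encode(
        set_bit(
          set_bit(
            -- bytes 1-6: current Unix time in milliseconds (big-endian)
            overlay(uuid_send(gen_random_uuid())
                    placing substring(int8send((extract(epoch FROM clock_timestamp()) * 1000)::bigint) FROM 3)
                    FROM 1 FOR 6),
            52, 1),   -- these two bits turn the version nibble
          53, 1),     -- from 0100 (v4) into 0111 (v7)
        'hex')::uuid;
    $$ LANGUAGE sql VOLATILE;

It can then be used like any other default: id uuid PRIMARY KEY DEFAULT uuid_generate_v7().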
Starting point is 00:17:43 And then, also keep your visibility map up to date. For that, you need to tune autovacuum, because autovacuum updates the visibility map. The visibility map remembers which pages have all tuples visible. A page holds many tuples, and an all-visible page means there are no transactions that would need some tuple in it that is probably dead — all tuples are visible to all transactions. So if you make a recent DELETE or UPDATE, you produce a dead tuple, and autovacuum needs to clean it; but also, in the visibility map, the all-visible flag will be set to false, meaning that
Starting point is 00:18:35 in this page we probably have some tuples which are visible to some transactions and not visible to some others. But if you keep the autovacuum settings quite aggressive, autovacuum processes tables often; in this case, you will see only a few heap fetches in the plan for an index-only scan, and count will be faster — a tuning sketch follows below. But you also need sequential ID allocation, so UUID version 7 is looking very good.
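A sketch of making autovacuum more aggressive for one table, so the visibility map stays fresh (the table name and thresholds are illustrative, not recommendations):

    -- Vacuum after ~1% of rows change instead of the default 20%, so more
    -- pages stay all-visible and index-only scans avoid heap fetches.
    ALTER TABLE records SET (
      autovacuum_vacuum_scale_factor        = 0.01,
      autovacuum_analyze_scale_factor       = 0.01,
      autovacuum_vacuum_insert_scale_factor = 0.01  -- Postgres 13+
    );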
Starting point is 00:19:21 The funny thing is that a few months ago on Postgres TV, we had a session with Andrei Borodin and Kirk Volokh, and it was Kirk's idea that we work on an implementation — first it was called ULID, but then we learned that there is a standard proposal which describes UUID version 7. So Andrey implemented it and sent it to the hackers list. You can find the session on the Postgres TV
Starting point is 00:19:39 YouTube channel. And we got good feedback that it's indeed a feature which should be in Postgres. But Peter Eisentraut noted that the standard is not finalized — it's not approved yet —
Starting point is 00:19:56 and before approval, it's not good to implement it; we should wait until approval. And then one of the authors of the new version of the standard popped up in the hackers mailing list, responding that it's great that we're doing this, and updated us with some details. So it was good work. Anyway, I've checked the state of the standard, and on June 10 this proposal changed its state,
Starting point is 00:20:25 and now it's waiting for a new review. It looks like, probably later this year, this standard update — this RFC change, I don't remember the number — can be approved, and
Starting point is 00:20:41 if it's approved, I think it will not be a big deal to finalize the patch, and in this case it will probably make its way into Postgres 17. If not — okay, Postgres 18.
Starting point is 00:20:56 It's not a huge patch, so once the standard is approved, I hope Postgres will have it. That's probably it. Again, it was hard for me to be alone; I hope next week I won't be alone. If I am, it will be the anniversary episode —
Starting point is 00:21:15 it's interesting, we should do something special. If you have some questions, maybe we should have a Q&A session — an "ask me anything" session. So send me anything, on Twitter, on YouTube, anywhere, and we will probably work on those questions. And in two weeks, Michael will return, and it will be a normal podcast. Sorry if I was not smooth and said "like" a lot — it's because there's no editing this time. Okay, thank you so much.
Starting point is 00:21:48 Please share this episode on your social networks and in working groups and so on. And let me know what you want next week. Bye-bye.
