Postgres FM - Compression

Episode Date: July 26, 2024

Nikolay and Michael discuss compression in Postgres — what's available natively, newer algorithms in recent versions, and several extensions with compression features. Here are some links to things they mentioned:

- wal_compression: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-COMPRESSION
- Our episode on WAL and checkpoint tuning: https://postgres.fm/episodes/wal-and-checkpoint-tuning
- Synthetic wal_compression and streaming replication WAL size test: https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/issues/11
- default_toast_compression: https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DEFAULT-TOAST-COMPRESSION
- ZFS: https://en.wikipedia.org/wiki/ZFS
- Our episode on TOAST: https://postgres.fm/episodes/toast
- On compression of everything in Postgres (talk by Andrey Borodin): https://learn.microsoft.com/en-us/shows/cituscon-an-event-for-postgres-2023/on-compression-of-everything-in-postgres-citus-con-2023
- cstore_fdw: https://citusdata.github.io/cstore_fdw/
- Using Hydra columnar: https://columnar.docs.hydra.so/concepts/using-hydra-columnar
- About compression in Timescale: https://docs.timescale.com/use-timescale/latest/compression/about-compression/
- pg_tier: https://github.com/tembo-io/pg_tier
- pgBackRest: https://pgbackrest.org/
- WAL-G: https://github.com/wal-g/wal-g
- pg_dump: https://www.postgresql.org/docs/current/app-pgdump.html
- pg_dump compression specifications in PostgreSQL 16 (article by Pavlo Golub from Cybertec): https://www.cybertec-postgresql.com/en/pg_dump-compression-specifications-postgresql-16/
- Our episode on pgvector (with Jonathan Katz): https://postgres.fm/episodes/pgvector

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
- Michael Christofides, founder of pgMustard
- Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to Jessie Draws for the elephant artwork.

Transcript
Starting point is 00:00:00 Hello, hello, this is Postgres FM, episode number 107. My name is Nikolay, founder of Postgres.AI, and as usual, my co-host is Michael, founder of pgMustard. Hi, Michael. Hello, Nikolay. Hello. So, you chose the second most boring topic, in my opinion, after security. Tell us what it is. Yeah, I can blame our listeners for this one. We had a
Starting point is 00:00:27 Blame someone else, right? Yeah, exactly. That's my favorite methodology in troubleshooting incidents: no, blameless culture, right? So we had a great listener suggestion to talk about compression, and I guess it's kind of surprising we haven't covered it yet. We've covered various topics around this, but they were specifically in the context of a couple of extensions that offer compression options. I thought it was a good topic to cover generally; we can talk about all the different things where compression is available in Postgres. Kind of a broad but shallow topic, maybe, this time. Yeah. We will talk about physical, lower-level compression, not the modern way to compress data where you tell some LLM: I have a lot of data in this table, in this column, it's a huge chunk of text, let's just summarize it and drop the
Starting point is 00:01:27 detailed texts and so on right it's it's it's actually my favorite way to compress well that's a good point actually i guess we i i assumed automatically we were talking about what would be called lossless compression yeah so, so like with images. This is definitely lossy. Yeah, this will lose some details for sure. And it will use some details that don't exist. Not only is it lossy, it's also glossy, I guess. Does it stand for lossy lying machines? Is that what they stand for? Yeah, yeah. for lossy lying machines is that what they stand for yeah yeah so we talk about lossless
Starting point is 00:02:07 lower-level, transparent compression, when we just enable something and data suddenly takes less disk space, or less space when we send it over a network, sending something in transit, so to speak. And we are able to decompress it without any losses. It's all fully automatic, and users just use it, enabling some features probably. This is our focus today, right? Yeah, exactly.
Starting point is 00:02:41 And I think that's a good transition into kind of the two main benefits. And I think they're obviously related, but I think they are somewhat different and probably in some ways trade off against each other a little bit. if we have very repetitive data or data that compresses really well we could maybe spend a lot less money and less resources by compressing it but so storage is obviously a big one but there's also the performance side of it if we can if it can take up less space it might be depending on where our bottlenecks are it might be that overall the the cost or speed uh degradation of compressing and uncompressing or decompressing the other side is still faster if we've had to you know only transport a lot less data or do calculations on a lot less data that kind of thing let's maybe start with things we have for many years in Postgres, right?
Starting point is 00:03:48 And then discuss some new stuff and then discuss, like, compare what's beneficial, what's less beneficial. We have, first of all, compression at wall level for full page rates, full page inserts, right? FPIs. And full page inserts need to fix the problem of difference between the size of
Starting point is 00:04:13 buffers, the pages in memory, 8 kilobytes in most cases, and the size of block and file system, very often 4 kilobytes, x4 for example, right? And to avoid partial writes in the case of feathers, Postgres, after each checkpoint, if the buffer was changed for the first time
Starting point is 00:04:36 until the next checkpoint, Postgres doesn't write only the change itself. It writes the whole buffer. And if by default wall compression is not enabled, it means the whole buffer 8KB is written as is to wall, consumes 8KB, right? But if we enable wall compression, this page is compressed. And in my opinion, in most cases, if we talk about significant load, we should consider enabling wall compression. It can be beneficial, especially if you have short distance between checkpoints, because in this case, you have more frequent full page writes happening. Of course,
Starting point is 00:05:21 if you increase distance, then maybe the same page is becoming dirty it means writes happen inside it. Many times, multiple times and only for the first change it will be full page write. Subsequent changes will be written only like deltas, right? Only tuple which was changed. But if you have quite frequent checkpoints enabling wall compression, you can significantly reduce the size of wall written and this has very good positive consequences such as less data to write to backups while archiving. Archive command will archive less bytes, fewer maybe like gigabytes per hour, for example. And second, replication as well. Wall is smaller,
Starting point is 00:06:18 so replication has less to transmit over network. Of course, compression needs CPU cycles and decompression needs CPU cycles. And I saw on Twitter, some people mentioned they had problems when they enabled wall compression. But in my cases, I observed, we always decided to switch it on and CPU overhead was worth it. so there is trade-off here CPU
Starting point is 00:06:48 versus I/O, and we always chose in favor of less I/O. And with the synthetic test I showed you just before we started this recording, we had a simple pgbench experiment with max_wal_size of 4 gigabytes and checkpoint_timeout of 15 minutes. We saw WAL reduction, I think, on regular pgbench workloads, which include writes, of course: inserts, updates,
Starting point is 00:07:18 deletes. We saw three times less wall created, generated, or written, it's the same, right, when we enable wall compression, which is a huge benefit, three times less wall. But if you had the checkpoint tuning, as we discussed in some of our previous episodes, if you had it, and distance between checkpoints is quite large, especially if you have Patroni and modern Postgres, which in case of failover or switchover doesn't require restart of all nodes, you can afford bigger distance between checkpoints
Starting point is 00:07:56 and have maximal size set to, I don't know, like 100 gigabytes. In this case, it's unlikely the benefit will be so huge, not 3x. Of course, it also depends a lot on the data nature. If it's very repetitive, it's easier to compress. That's pretty true across the board for compression, isn't it? On the wall front, one thing i think we probably didn't talk about when we talked about a wall and checkpoint tuning is on the compression side as of postgres 15 which i guess is pretty new so you might not have have this kind of in the wild experience yet but we do have more options now so in the past we could only turn wall compression on
Starting point is 00:08:42 or off and now we have instead on, we have three different options. We have the previous option, which is the PGLZ algorithm. But we also have LZ4 and ZStandard options now. And for the people that complained about the CPU side of things, one option might be LZ4 i believe would be less cpu intensive it might not be faster and more lightweight in terms of cpu consumption but the compression ratio is worse well it's not that different to pglz actually it's yeah but compared to some other modern compression algorithms it it tends to lose
Starting point is 00:09:26 but compared to pglz you don't lose much i think it is slightly worse on average obviously depends a lot on the data but i think it's a pretty good option if you have that issue and if you're if you're testing turning this on on a modern version of postgres worth trying at least whether you know depending on your constraints, trying those different new options. Yeah, that's a great point. And two ideas here I have in mind. First of all, I remember these options,
Starting point is 00:09:56 ZStandard and LZ4, they require compile flags to be turned on. I guess, as you rightfully said, I don't have a lot of... It's fresh things, so I don't have rich production experience. But these flags, I guess they are turned on in the official apt packages, right? If you just install it on fresh Ubuntu. I hope it's so, but worth checking. This option should be turned on. So Postgres should be compiled with support of these two algorithms. And second thing, I think it's worth new research.
Starting point is 00:10:35 Maybe with our bot, it should be quite easy. And we should just research maybe with different workloads and maybe with different Maxwell size. As I said, it's very important in this case, the distance between checkpoints. And just with regular PgBench, we could check all available algorithms on fresh Postgres versions. And maybe draw some charts, right? Plot some charts. So, yeah, we just like hands, but bot is definitely ready for that.
Starting point is 00:11:04 So I think it's a great direction. I wish I had more hands. By the way, I wanted to say we are looking for maybe part-time database engineers, people who want some fun with this kind of work, like research. We usually do it public, so it's kind of interesting for community as well. So if some people listening to this podcast want to participate We usually do it public, so it's kind of interesting for community as well. So if some people listening to this podcast want to participate and work part-time with us, with Postgres AI, definitely would be interested in discussing. So maybe we will have more hands and do this benchmark, for example.
Starting point is 00:11:42 What's the best way for them to get in touch with you? Email, email or Twitter or LinkedIn, but email nick at Postgres AI, always a good way to contact me. On the packaging front, I don't know, I haven database, and recently upgraded to Postgres 16, finally, and was pleasantly surprised that I was able to turn on while compression with LZ4 and Toast. I was able to switch our default Toast compression algorithm to LZ4 as well, which is really cool. 16, you said? I'm on 16, but I think those are available
Starting point is 00:12:22 as of version 15 for Wall and version 14 for toast. That's great. So what benefits? Do you remember some numbers? So I didn't do extensive benchmarking, but the thing we use it for most is we have saved plans. So explain plans compress really well. It's a lot of repeat. Large values.
Starting point is 00:12:51 Yeah, manually. Yeah, yeah, yeah. It makes sense. It's in JSON, right? Well, both. Like text plans compress well too, but JSON plans compress extremely well. But obviously JSON plans are bigger in the first
Starting point is 00:13:06 place so there's not yeah so yeah compresses really well but the main thinking was we don't mind spending a bit more on storage for those if if the speed of retrieving them will be quicker so people save a plan they might share it with somebody on their team and that person needs to load it obviously it's like we're talking small amounts of time but if if the storage wasn't that different and the speed was faster i was happy to make the switch and yeah it turned out the storage storage was slightly worse on being on average for the plans i tested with the lz4 but the retrieval speed was faster. So I bit the bullet and did it. The cool thing is, the thing I didn't realize, I thought it would be a complex migration,
Starting point is 00:13:50 but like, what do you do? Like, I thought you might have to change existing plans or existing data, but you don't. If you change the setting, it applies to new data. Right. Yeah, actually, this is a good point we didn't mention. So we're shifting to discussion of storage, compression, and post, right? But it's a good point we forgot to mention about wall compression.
Starting point is 00:14:15 It doesn't require restart. So you can switch it on, switch it off, and all new writes will happen according to the new setting. Of course, you need to send the SIG hub signal or select PG reload conf. So configuration change is applied without any restart, which is great. And it also means it's like easier to try.
Starting point is 00:14:38 If you have monitoring, if you're prepared to roll back, it's easier to try. And if things go wrong, can return and yeah so toast compression is interesting and so you again like sorry i'm i'm not paying attention to details today somehow you you chose this standard right i actually chose lz4 oh okay okay i was more interested in the speed of retrieval and then speed compression and speed of retrieval than I was for total size on disk. Yeah. Meanwhile, I asked our bot to double check how apt packages, official PGDG packages are created.
Starting point is 00:15:23 Of course, these two options are there. So if you install Postgres on Ubuntu, you can try various compression algorithms. Well, just to double check. Okay, so toast. What else? We don't have a good ability to compress table data, right? By default in Postgres, we don't have it.
Starting point is 00:15:46 And it's actually not a simple topic in RowStore. In RowStore, we have, yeah, so tuple by tuple is stored and it can go in different pages. So like if we compress it, we can transparently compress it switching to ZFS, for example. And we saw like benefits in terms of disk space, like 10% to 30%, depending on data nature. But ZFS brings new challenges for administration, for sure. It's still good to shave off 30%.
Starting point is 00:16:21 And this is what you get by default if you install DbLab because it's on ZFS so if you have for example terabyte size database in DbLab it will look like 700-800 gigabytes only it's much better but yeah so but in Postgres itself there are no good compression options for
Starting point is 00:16:40 heap except for single large like if you have one column that's large, there's good, like I think Toast is very good, but not for like smaller values. Oh, actually, yeah, big values are compressed in heap, right? Now I remember before going to Toast,
Starting point is 00:16:59 like Postgres tries to squeeze, like to fit them in size like two kilobytes and have for roughly four tuples per page roughly like it's
Starting point is 00:17:11 like kind of things I remember if once it's compressed it's under one kilobyte or something like that they put it in line on the page in line
Starting point is 00:17:19 yeah it just means that if you have values that are multiple kilobytes and compress well even the default pglz will give you quite a lot of compression out of the box transparently in Postgres. It's just if you have lots of repeating small data, like time series data that could compress as a whole very well, the very store model. But this compression is applied to a single tuple, single row, basically, version, right?
Starting point is 00:17:50 So single row version. It means that we only have one timestamp, for example, one, like, I don't know, like temperature and so on. Several different columns, maybe a couple of timestamps, but different nature of timestamps. For example, created at, and I don't know, like registered at something, different timestamp. and postgres tries to compress the tuple and toast like we didn't cover and there's no goal to cover it deeply we had another episode on it
Starting point is 00:18:15 right so postgres tries to compress if it doesn't fit it shrinks it already and chunks are stored in separate uh-called TOS table, which is actually also a regular table, which is kind of invisible to user, but you can inspect it if you want as well. And then the decompression and reconstruction of tuple is occurring when it's needed. But I'm trying to say there is no like for heap itself, there are no like rich capabilities to control compression. And even if we had them, expected benefits would be not high compared to analytical column
Starting point is 00:18:57 store databases where, for example, we have all temperatures or all timestamps stored in separate file. Like only this column for all rows is stored here. And they all are temperatures, and maybe these temperatures are coming from, for example, if they're coming from a single source, they don't jump. I don't remember the word in English, sorry, but the changes are not acute, right?
Starting point is 00:19:33 I mean, if they change, for example, if temperature is increasing, probably it will be increasing for some time. It means that compression could be done using various algorithms, for example, applying deltas and storing only deltas, right? And we will probably discuss timescale DB soon because this is pointing in the direction of timescale DB and what it provides. So for column store, you can compress like 10x, which is not possible for row store.
Starting point is 00:19:58 Yeah, or more even. But I think you're missing... The multiple is interesting, right? But you're missing the the multiples interesting right but you but you've missed the fact that like let's say we had a row that is let's say our tuples are like a megabyte but 99.9 of that is a single json column and we have 10 other tiny little integer columns we get way more than 10x compression ratio just with Toast, you know, as if that block is as well. Yeah.
Starting point is 00:20:29 So as you know, probably I, like when I was very young, I participated a little bit in XML function and data type development in Postgres. And I remember that time XML compression was the thing, big thing. And I even remember hardware accelerators companies were selling to compress XML on the fly transparently. It's crazy. It's because a lot of structural pieces of the values can be compressed well in XML for sure, like less than JSON, but still also like all those
Starting point is 00:21:07 parentheses, braces, like quotes and so on. Like, yeah. Or maybe it's also some, if it's JSON, not JSONB, a lot of white spaces. Right? Sure, yeah.
Starting point is 00:21:23 Yeah. So if you have json and uh yeah so so in some cases compression and toast like tuning it it's interesting and maybe also we need benchmarks so maybe for json as well right it's good yeah it would be interesting but what what i mean is more like the the compression ratio is a lot is is varied obviously extremely dependent on the data but i think yeah i'm very curious what you have for example if you know that some old values are not compressed and some new values are compressed you check how much storage is occupied by, I don't know, by like 1000 rows or so. It's worth checking.
Starting point is 00:22:09 Well, you can't, it's like impossible to store large values in Postgres without, as you know, you can turn compression off, but Toast, you can turn compression off with Toast, but the default is on. So I have never tried it without so i don't have that to compare but just give you an idea we didn't used to compress plans in so we only store plans in local storage by default i can browse the local storage and that's an extremely limited resource like in some i think in firefox it's like five or ten megabytes depending on the version so that's not very much when you're talking large query plans. Obviously, it's a lot for most people's query plans,
Starting point is 00:22:49 but some people will come up with query plans that are like dozens of megabytes. Varalina data types accept up to one gigabyte, right? But as I remember, it's better not to go beyond a couple of hundred megabytes because the performance will become terrible. Well, just to give you an idea, though, quickly, these plans were compressing from megabytes down to kilobytes. It was more than 95%, I think, in some cases. That's cool, but still, it's a compression of only one tuple.
Starting point is 00:23:24 Actually, one value value not tuple value here yeah yeah but uh it feels like it would be it would make sense to have some compression that may be at page level as well and so on this is what the zfs provides transparently if you put postgres on top of it right so compressing I think, even if it's still a row store, maybe having these abilities. I remember Peter Zeitz talked a lot about new features in MySQL many years ago.
Starting point is 00:23:54 I think MySQL has more settings in this area for compression of storage. Have you heard or remember something from it? We have mostly questions today right i didn't know i didn't know that okay they are they seem to be a bit further ahead on the storage engine side of things don't they like they've had multiple storage engines for quite a lot longer than well like we don't have but yeah so it wouldn't surprise me if they were ahead on that front. But yeah, you mentioned timescale.
Starting point is 00:24:27 It feels like we've had a history of a few extensions that have offered the option for columnar storage. Before going there, let's just touch a little bit on the options that Postgres could have, and there are some discussions. What else could be compressed inside for example temporary files temporary files yes well we should also probably talk about things that we actually haven't mentioned that like backup files pg dump files yeah i wanted to do it but look look, before going to backups or dumps, backups and dumps, it's
Starting point is 00:25:06 already going outside of Postgres. But imagine we have Postgres, we run some queries, we discussed wall, which is the first thing we should discuss because data is written there first before it's written in data files or pages and so on. So we discussed wall. We discussed the storage itself. Toast. This is what we have, and that's it, right? Okay. Not at page
Starting point is 00:25:30 level, but it can be achieved by file system, but that's it. Indexes, I have no idea. Maybe deduplication, which was done in Postgres 13, 14, by Peter Gagan, Anastasiy Lubennikov, maybe this can be considered as compression deduplication which was done in postgres 13 14 but by peter gagan anastasia lubenikova maybe
Starting point is 00:25:46 this can be considered as compression actually right some that's a good check yeah native native compression you know like pieces of compression because like it requires less less space right it's kind of why not so but it's optimization but. And last time we discussed, what was the topic last week? Out of disk. Out of disk, right? And remember we mentioned, I named it exotic, and some listener commented on YouTube. By the way, thank you all those who write something on YouTube
Starting point is 00:26:21 or on Twitter or LinkedIn. It's very good to receive feedback. So they mentioned that temporary files is not that exotic. And if you run huge, heavy queries, so they had it, you can be out of disk space because temporary files were huge and consumed a lot of disk. And compression of them makes total sense to me right what do you think well we don't have it right i was watching a really good video that you sent me a link to by andre borodin right
Starting point is 00:26:54 right and that was going to yes and he i think uh he meant i watched it last night and he mentioned that in order to be i think there was a problem with durability. Forget about problems. This is like, I'm just, that doesn't make sense. He started doing the work and it's more complex than it might sound. But yeah, it makes sense, but it's complicated. Full disclaimer, I just chatted with andre this morning and this is what he told me and i'm just shamelessly using his ideas so yeah it's worth
Starting point is 00:27:34 compressing pages as i mentioned and it's worth compression compressing temporary files andre says it's just lack of time and hands but but it's worth implementing and proposing patch. So, yeah, it's a great idea. I hope he will find time to work on this, and other people probably will find time to move Postgres forward in this area. And in this case, imagine if temporary files are compressed. In this case, I would be more right saying it's exotic to be out of disk space
Starting point is 00:28:03 when you have temporary files occupying be out of disk space when you have temporary files occupying a lot of disk space, right? I think it makes extra sense because the type of things that are generating temporary files, like sorts and hashes, it's probably a lot of similar values. Well, I'm guessing there would be a fair amount of repetition which would suit it would naturally suit compression as well so i'm wondering if there's like a not only would it just take up less space and obviously there's benefits there but i'm wondering if there might be performance benefits too yeah that's a good point actually when we talk about saving disk space sometimes it's also about performance if disk is slow it's better to consume cycles of cpu for compressing decompressing
Starting point is 00:28:59 and things become much faster i agree this is a good point and but it's not always so it depends exactly it depends in general i think i saw somewhere there somewhere, the general point is that databases are I.O. hungry. And as you very well know, we talk buffers, buffers all the time. It's just confirming it's all about I.O. usually. But I also saw CPU 100% and we lack CPU especially. For example, we are on Graviton or ARM, and we only have up to 64 vCPUs, and that's it. These days, it's not a lot, 64, right? It's like a moderate number.
Starting point is 00:29:35 It's a lot for some people, yeah. Right, right. But it's not 360. It's not 800 like you showed me on AWS for Intel scalable Xeon. So if it's 64, a lot of clients, in this case, CPU becomes a very valuable resource, right? Of course. So in this case, probably we might be preferring to spend more disk IO cycles. Okay.
Starting point is 00:30:08 What else can be compressed in Postgres? We talked about, I mean, ideas which are not currently implemented. Temporary files, page-level compression, maybe protocol level, which has pros and cons. Maybe let's not go there, right? What else? Anything else in memory that could be? Oh, interesting. Maybe. I don't know.
Starting point is 00:30:30 But I think if we, for example, shifting maybe to some additional projects, I think we will talk about Hydra and TimescaleDB. If, for example, we have StorageEngine and we have some tables defined as column store,
Starting point is 00:30:47 this is what Hydra provides. In this case, it makes total sense to compress those tables. Or, for example, if we mentioned last time talking about disk space, or we have row store, but we have partitioning and some old partitions we consider as having archived data. And this data we want probably compressed, and maybe we want to store it not on the regular disk, but on object storage. And we discussed PGT, Tembo developed. So kind of bottomless Postgres, but we want all data to be compressed and rarely used, right?
Starting point is 00:31:25 Yeah, I saw three different approaches from different extensions. Like the Hydra one you mentioned, I think, came from Citus data originally. C-Store FDW, I think, was the original. It's AGPL from Citus inherited by Hydra, right? Yeah, so I think that is very much like from the start of a table's life, It's AGPL from Cytos inherited by Hydra. Yeah. So I think that is very much like from the start of a table's life, you choose whether it should be row store oriented or column store oriented. That's one approach that makes a lot of sense.
Starting point is 00:31:58 The timescale approach seems to be… And column store, of course, we won't probably compress quite a lot because the ratio usually is good. storage benefits, but the main aim seemed to be so that we can make analytical queries faster. So it's that performance angle that seemed to be the driving force of why should we want it to be column-store in the first place for performance of analytical queries that tend to be aggregates over a single column. And if we've got column-oriented data that's compressed... Yeah, it's massive read. A lot of I.O. Of course, we want to reduce this I.O.
Starting point is 00:32:47 and then already deal with it in CPU. But also nature of data is like a lot of similar-looking values and compression. And I think once it's organized by columns, you can also start to store much easier like the metadata of like min max values and do some like shortcuts i think on that data as well so i think there's some like cool tricks as well now but but there's then i think there's i think there's two other approaches that i've seen one is the time scale approach which seems to be on older partitions, like on older data,
Starting point is 00:33:26 everything's row store at the beginning. And then after a certain point, you set a policy that it gets converted to column store later once it's unlikely to change. I remember differently. I remember it's always row store, but compression works in two dimensions. Like, for example, I'm not sure if they converted to column store. Maybe I'm wrong, but what I remember is still row store, but with understanding of vertical dimension, so to speak.
Starting point is 00:33:57 And, for example, storing deltas instead of row values and applying compression. Well, I mean, it's still in postgres it's still in postgres so it's still row store under the hood but it's column or like it's organized by column like if you look the postgres docs are really good on this um i'll share a link sorry not the postgres the timescale docs i like understanding here apparently but there's a third approach as well i think that so the timescale approach I think is more optimized for space. I think the compression is more
Starting point is 00:34:28 on the, let's make older partitions take up less space because you have so much data. Like 20, yeah, some of the numbers they mention, like 95% compression. We have impressive numbers observed, yeah.
Starting point is 00:34:45 It's really good. But yeah, the idea is that with time-series data you could end up with hundreds of terabytes, and they have themselves. So I think
Starting point is 00:34:54 it's the kind of case where you could actually save a lot. Now, obviously they also benefit from the performance on analytical queries, and they have some cool features there, but it feels like the way they've implemented it was primarily for those storage benefits. And then the third one, I think, has popped up relatively recently in the grand scheme of things: this idea of, as you mentioned, tiering, or moving to a file format like Parquet. So, exporting data out of Postgres into a compressed format on object storage
Starting point is 00:35:28 that's normally column-oriented, so that you can get these fast analytical queries, and they take up a lot less storage space. There are some big limitations for data types, I suspect, right? Only a limited set of data types is supported for those kinds of things. Well, and how do updates and deletes work? I don't actually know all of the details. Yeah, so there are definitely
Starting point is 00:35:52 limitations and differences between these, and we're not going to be able to describe them all here, obviously, time-wise. But I found it really fascinating that there are these three quite different approaches. What do you think? Right. Yeah, well, it's super interesting to observe the progress here. Probably, again, for me this episode is raising more questions than answers. I think after this episode I will be planning more experiments to study the benefits.
Starting point is 00:36:27 And I guess we always need to take into account several metrics: compression ratio, speed of compression, speed of decompression, right? Maybe CPU overhead itself, if it matters for us, how much CPU we consumed. So, yeah, TimescaleDB and Hydra are interesting in this area. And I still remember my big impression from reading the TimescaleDB details in their blog post on how they implement compression. I think we forgot, because of me, we forgot to talk about compression of dumps. pg_dump has compression options.
Starting point is 00:37:12 And also compression of WAL files as a whole, which I think Postgres doesn't provide, but both pgBackRest and WAL-G, I think the most popular backup tools, they both do it, right? Because if you have a 16-megabyte file and you compress it, you get, like, three times less, for example, as I remember, like five megabytes maybe, or so, maybe even less in some cases.
Starting point is 00:37:42 Again, it depends on the data. In this case, it's much better in terms of storage costs and transfer speed and so on. But this will consume some CPU cycles, and usually the archive_command is working on the primary. This is the key here, usually, to avoid the risk of longer delays, this archiving lag.
Starting point is 00:38:05 Because we need WAL, it's part of our DR strategy, the disaster recovery strategy. So we want to archive WAL as fast as possible, and this means we do it on the primary. And if we compress whole WAL files, we need some CPU. And if a lot of WAL is generated, probably we will need multiple workers. And then you already see more than 100% of a single CPU, meaning multiple cores are busy.
Starting point is 00:38:34 Like 200% out of our 360 vCPUs: we allow two cores to be used to compress and archive WAL. These are just some random numbers in my head, but definitely, if we talk about compression of whole WAL files by WAL-G or pgBackRest, we need to keep this in mind: the most valuable CPU resource, which is on the primary.
Starting point is 00:39:07 We need to think about capacity here. And decompression is usually not a big deal, especially if we can fetch WAL files using multiple workers from object storage like S3 and decompress them. Usually it's not a big deal, but it's still worth remembering. So usually we have compression there. Unfortunately, Postgres right now doesn't do it officially. So here I'm talking about, again, third-party tools, which are very common and popular: WAL-G, pgBackRest.
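As a rough illustration of the setup being discussed, whole-WAL compression lives in the backup tool's configuration, not in Postgres itself. A minimal pgBackRest-style sketch might look like this (the stanza name and paths are made up; parameter names follow pgBackRest's documentation, so verify against the current docs):

```ini
# postgresql.conf: hand each finished 16 MB WAL segment to pgBackRest
archive_mode = on
archive_command = 'pgbackrest --stanza=main archive-push %p'

# pgbackrest.conf: compress archived WAL with zstd, two parallel processes
[main]
pg1-path=/var/lib/postgresql/16/main
repo1-path=/var/lib/pgbackrest
compress-type=zst
compress-level=3
process-max=2
```

`process-max` here corresponds to the "multiple workers" point above: it buys archiving throughput at the cost of extra CPU on the primary.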
Starting point is 00:39:36 Others as well, I think they also compress. And pg_dump. pg_dump is official, very official. pg_dump, pg_restore. They support compression for the custom or directory format, or both, right? I always mix them up. Yeah, and this is one of the oldest. Like, this has been there forever, right?
Starting point is 00:39:56 Right, compression was there forever, but what about options in terms of algorithms? Oh, yeah. Zstandard and LZ4. So I think that's relatively new. And it's worth, again, experimenting, benchmarking, and studying. I think there are articles from Cybertec, right? Let's check.
Starting point is 00:40:15 Yeah, there are articles comparing different types of compression and decompression. So, ratio... I think you can even control the level there: if you want to spend more time and CPU capacity to achieve a somewhat better compression ratio, it's possible. Yeah, I remember some surprising results from those articles. I don't remember the details, so let's attach them. But definitely we want to study this as well. Again, mostly questions today, not answers,
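For context, on PostgreSQL 16 and later the compression specification being discussed looks roughly like this (database name and paths are placeholders; see the pg_dump documentation for the exact set of supported methods and levels):

```shell
# Directory format, 4 parallel workers, compressed with zstd at level 9
pg_dump --format=directory --jobs=4 --compress=zstd:9 \
        --file=/backups/mydb.dump mydb

# lz4 trades some compression ratio for speed
pg_dump --format=custom --compress=lz4 --file=/backups/mydb.dmp mydb

# Parallel restore from the directory-format dump
pg_restore --jobs=4 --dbname=mydb_copy /backups/mydb.dump
```

Note that `--jobs` for the dump itself only works with the directory format, which is part of why the two formats get mixed up in conversation.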
Starting point is 00:40:49 not exact recipes. But the only thing I don't like in pg_dump and pg_restore is that you cannot use parallelization and compression on the fly. What if... say this is a one-time operation.
Starting point is 00:41:06 I want to just logically migrate from one database to another database, and I don't want to send a lot of bytes over the network. I want to compress and decompress and use multiple workers because, for example, these servers are not loaded right now. So I just need to migrate out of RDS
Starting point is 00:41:24 to self-managed Postgres, because I feel a sufficient level of confidence, or I work with great guys who know how to do it, and RDS is not needed for me anymore, and so on. So in this case, unfortunately,
Starting point is 00:41:39 you need to first save it to disk and then restore from there. Yeah. So this is, I think, a big missing feature of Postgres. And Dimitri Fontaine created pgcopydb, I think. But I quickly checked before our recording, and I didn't see anything related to compression. It talks only about parallelization: let's have, like, 16 workers, or, I don't know, four workers to speed up the process, which in my head is a quite
Starting point is 00:42:13 relatively simple idea. We just create a repeatable read transaction, keep it open, export its snapshot, create other repeatable read transactions, and use the same snapshot, so all of them are synchronized, and you can even read one huge partitioned table in chunks. It will be a fully consistent read using multiple workers. But no compression options, I couldn't find any. Maybe it's not an easy idea to implement. Maybe it's a good idea. So what I would like to have is just pg_dump and pg_restore supporting both parallelization and compression without the need to store an intermediate file or directory.
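The snapshot-synchronization trick described here is standard Postgres functionality (it is what pg_dump's parallel mode uses internally); a minimal sketch, with the snapshot ID invented for illustration:

```sql
-- Worker 1: open a transaction and publish its snapshot
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();  -- returns an ID such as '00000003-0000001B-1'

-- Workers 2..N: adopt the exported snapshot before their first query
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- All workers now see exactly the same consistent state of the database,
-- so each one can read its own chunk of a huge table in parallel.
```

The exporting transaction has to stay open for as long as the other workers need to import the snapshot.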
Starting point is 00:43:02 It would be great. Makes sense. Yeah, I needed it yesterday, copying a huge... not huge, not huge at all, like 10 or so gigabytes, a tiny table with big vectors. I mean, a lot of vectors. By the way, let's come full circle and at the end mention that, although we joked
Starting point is 00:43:24 about lossy compression using LLMs' underlying tech: when you have a lot of dimensions and huge vector values, it means TOAST is definitely involved. It's interesting how it's compressed there. In this case, it's a simple idea. Sometimes
Starting point is 00:43:39 reducing the number of dimensions is kind of lossy, but it's also kind of compression, right? And OpenAI talks about it. It's also an interesting area: how to do it without losing a lot of quality.
Starting point is 00:43:56 On vectors, though, I think Jonathan Katz made a really good point on the episode we did for Postgres.fm: because the vector data, at least the embeddings that come back from models like OpenAI's, is mostly random-looking numbers, it doesn't compress well. So there's actually, I think... I'm not sure if he's done it yet or is planning to do it, but there's work to look into whether it would actually be beneficial to
Starting point is 00:44:26 turn compression off for TOAST of vector data, because you're not getting much compression, so there's no point paying the overhead of compressing and decompressing each point. So I thought that was a super interesting use case for turning it off. Yeah, interesting. I would like to explore it myself. It's a very interesting area. So I think we discussed maybe five different directions for benchmarking. I would like to conduct these benchmarks.
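One standard way to get the effect being discussed, keeping large values out of line in TOAST while skipping compression entirely, is the per-column storage mode (the table and column names here are hypothetical):

```sql
-- EXTERNAL = out-of-line TOAST storage without compression;
-- the default EXTENDED both compresses and stores out of line.
ALTER TABLE items ALTER COLUMN embedding SET STORAGE EXTERNAL;
```

This only affects newly written values, so whether it helps for existing embedding data is exactly the kind of thing worth benchmarking.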
Starting point is 00:44:57 Cool, let us know how you get on with all five by next week, right? Yeah, well, yeah. So thank you. It was interesting. Many questions. There you go. Not boring after all.
Starting point is 00:45:12 Thanks so much, Nikolay. Catch you next week. Thank you. Bye.
