Postgres FM - Append-only tables
Episode Date: November 15, 2024

Nikolay and Michael discuss append-only tables in Postgres — what they are, some unique challenges they bring, and some options for compressing / removing / offloading the data eventually.

Here are some links to things they mentioned:

- Append-only https://en.wikipedia.org/wiki/Append-only
- Our episode on BRIN indexes https://postgres.fm/episodes/brin-indexes
- Tips to Improve Insert Performance https://www.timescale.com/blog/13-tips-to-improve-postgresql-insert-performance/
- Our episode on WAL and checkpoint tuning https://postgres.fm/episodes/wal-and-checkpoint-tuning
- autovacuum_vacuum_insert_scale_factor https://www.postgresql.org/docs/current/runtime-config-autovacuum.html#GUC-AUTOVACUUM-VACUUM-INSERT-SCALE-FACTOR
- Our episode about compression https://postgres.fm/episodes/compression
- New project announced recently called BemiDB https://x.com/samokhvalov/status/1854726389603484145
- pg_tier https://github.com/tembo-io/pg_tier

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
- Michael Christofides, founder of pgMustard
- Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
- Jessie Draws for the elephant artwork
Transcript
Hello, hello, this is Postgres FM. I'm Nikolay, Postgres.AI, and my co-host is Michael, pgMustard. Hi, Michael.
Hello, Nikolai.
And guess what we are going to discuss today?
I'll guess. Is it append-only tables?
Exactly.
Ah, I got it right.
I was surprised to hear that we haven't discussed it in the past.
I'm sure we touched it many times, right?
Yeah, for sure it's come up in episodes, but me too.
When I saw it in our listener-suggested topics,
I did have a few searches on our site
just to make sure we hadn't talked about it already as a whole episode,
and yeah, agreed.
But it's not something I see all the time. Like, it is relatively common to have an events table, but even then I mostly see append-mostly tables rather than append-only — so, oh yeah, we need to distinguish.
Yeah, I was thinking we are going to discuss both.
Do you feel we should include append-mostly tables in the episode?
No?
Well, I actually, I don't think this is super complex. I think it's quite nice because almost by definition,
if we do accept that it's append only,
we've got quite a narrow scope
and there's only a few things to consider.
Maybe it gets a bit more complicated with append-mostly, but yeah, we can definitely cover that today. I think it's still not that complicated.
Yeah. Where do we start? Like, definition?
Go for it.
Yeah, why not? Well, if we just insert, that's append-only, that's it. Insert and select, I would say. No updates, no deletes. Select is allowed, right?
Yeah, insert and select — these are the only two things we allow ourselves from DML, data manipulation language, and that's it.
Yeah, we just select and insert.
This is append-only.
If we have occasional deletes and updates, it's append-mostly, or insert-mostly — I don't know how to say it.
Yeah, I like that.
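As an aside, that "insert and select only" definition can even be enforced at the privilege level. A minimal sketch, assuming a hypothetical table `events` and an application role `app_writer` (both names made up for illustration):

```sql
-- Hypothetical example: make "append-only" explicit via privileges.
CREATE TABLE events (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload    jsonb
);

-- The application role may add and read rows, but never change them.
GRANT INSERT, SELECT ON events TO app_writer;
REVOKE UPDATE, DELETE, TRUNCATE ON events FROM app_writer;
```

This doesn't stop a superuser, of course, but it documents the intent and catches accidental updates from the application.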
And why do we care about this particular use case?
Because it has characteristics, right?
It usually has...
If it's append-only, for example,
we don't need to worry about those tables as much anymore.
No bloat, right?
It's good.
And usually we talk about huge volumes of data.
And at some point we think,
oh, we need to compress it,
we need to offload it maybe to cheaper storage, or just
clean up, because this old data is not needed anymore. In raw form, sometimes it's aggregated,
and in raw form, we can just remove it from the database. Yeah, or we just, I don't know, need bottomless storage. And usually we want inserts to happen very fast,
because volume is huge,
so we need to make sure performance of inserts is good.
Did I miss anything?
No, I think those things aren't necessarily always true
for append-only tables, but they correlate.
A lot of the use cases
for very, very fast-growing data,
and by definition, append-only means
it's never going to decrease in size.
It's only going to keep getting larger and larger.
Unless you clean it up.
Well, there are specific cases.
For example, imagine you...
We discussed many times
the topic of slow count.
And if you can allow, like, an asynchronous calculation of count,
maybe it's like materialized or something, I don't know.
So idea is instead of updating the count somewhere on each insert in the original table,
you can aggregate operations in intermediate table and then it's append only.
So you register events in some table and then you process chunk and reflect this in count in final storage.
And then you can delete it or better drop this partition or truncate or something.
In this case, it's append only, but it grows, grows, and then the size drops.
It happens also, right?
Yeah, I think dropping partitions
definitely pushes the definition of append-only,
but it's the thing that makes sense to do
in a lot of cases at huge scale.
But yeah, is it still append-only if we're dropping partitions?
Yes, this is how we should do it.
No, I know, but do you see what I mean? You mentioned deleting data.
Well, again, it's append-only: we drop a partition, we never delete, we never update. It's append-only. But if we don't need, say, last year's data — we already processed it somehow, made all the calculations we need — we can get rid of the raw data. We just drop the partition. It's the best we can do, instead of cleaning up somehow using deletes.
I think we need to discuss it, because I
did it many times and participated in huge projects in very large companies. The idea let's offload all archive data. It was e-commerce. Old orders let's offload it to cheaper storage for
a longer term storage and then we need to delete it in the original place, original database, and it was not partitioned.
Deletes, it was a project for a couple of months.
Because downtime is not acceptable.
It costs a lot of dollars.
E-commerce guys know very well.
They can calculate it each second of downtime, how much it costs
to company.
So if I had partition table there, it would be magic.
And it's append-only.
That particular table was not append-only, right?
But it can happen with append-only.
For example, we have audit log.
Some actions are stored in some append-only table, but we have a policy to store only two years of data. Then I would prefer to drop the partition with old data.
That's it.
So cleaning up is a very important topic for append-only tables.
This is what I was trying to say.
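The drop-a-partition cleanup described here might look like this — a sketch with illustrative table and partition names:

```sql
-- Range-partitioned audit log, one partition per month.
CREATE TABLE audit_log (
    logged_at timestamptz NOT NULL,
    action    text NOT NULL
) PARTITION BY RANGE (logged_at);

CREATE TABLE audit_log_2022_01 PARTITION OF audit_log
    FOR VALUES FROM ('2022-01-01') TO ('2022-02-01');

-- Retention: instead of DELETE, detach and drop the old partition.
-- DETACH ... CONCURRENTLY (Postgres 14+) avoids holding a strong lock,
-- but cannot run inside a transaction block.
ALTER TABLE audit_log DETACH PARTITION audit_log_2022_01 CONCURRENTLY;
DROP TABLE audit_log_2022_01;
```

Dropping a partition is a near-instant metadata operation and creates no dead tuples, which is exactly why it beats bulk DELETE at scale.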
Yeah, I completely agree.
I think there are other benefits
to partitioning with append-only, or append-mostly as well — because if we do have the occasional update or delete, partitioning helps with that as well, right?
So let's zoom back out, maybe. We've got inserts and selects.
So we do have to, we might have to,
if we're talking about a very, very high volume,
we might have to worry about insert performance
and select performance.
We can also have, you know, sorry for interrupting,
we can also have copy.
Yeah, sure.
In both ways, right?
And I guess that's about to come up if we're talking about...
Also, copying out the whole table, you know — like, reading everything.
Yeah, but it's kind of select.
So, yeah, inserts or selects, no updates, deletes, and so on, yeah.
So, use cases, you wanted to discuss use cases, right?
Or even, I was actually thinking of diving straight to performance.
I think there's a few things that we don't have to worry about,
a few things that we can then optimize for.
If we're having to insert extremely high volumes,
which sometimes these use cases do lend themselves towards.
I think IoT, for example — Internet of Things. If sensors are sending information and we're logging every second across thousands or tens of thousands of sensors, that can end up being a lot of data. So inserting can be a bottleneck, and you might make design decisions for those tables that you wouldn't make if you had a different type of table or different type of data.
So there's that side of things.
But then there's also the read side of things.
I think, you know, and I think those things maybe sometimes play off against each other.
So the fact that we've got append-only means we have some benefits — index-only scans, for example, become even better. I know you often talk about always trying to get index-only scans, but in a table where the data is often changing, that can be a losing battle. It's a battle that's not always worth fighting, or it's maybe not always worth including as many columns in the index, for example. There are different trade-offs for append-only versus...
Let's unwrap everything here. You mentioned so many things in just a minute, right? So first, let's talk about the performance of inserts. I would say the ideal situation is we don't have indexes, and we don't have triggers — including foreign key triggers, because a foreign key in Postgres is internally implemented via a system trigger, and this trigger is going to slow down inserts, especially if you need to insert a lot of rows. Even just a few foreign keys can multiply the duration of a massive insert.
So, ideally, we should get rid of foreign keys and keep as few indexes as possible for this particular case.
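As an illustration of the "as few indexes and triggers as possible" idea, a stripped-down ingest table might look like this. Names are made up, and whether you can really go without a primary key depends on your requirements:

```sql
-- Lean table for high-volume ingestion: no foreign keys, no triggers,
-- and (in the extreme case discussed here) no indexes at all.
CREATE TABLE sensor_readings (
    sensor_id  int NOT NULL,          -- no REFERENCES: no FK trigger per row
    read_at    timestamptz NOT NULL,
    value      double precision
);

-- Multi-row INSERT and COPY both amortize per-statement overhead.
INSERT INTO sensor_readings VALUES
    (1, now(), 20.5),
    (2, now(), 21.1);

-- COPY is usually the fastest bulk path (server-side file shown here;
-- psql's \copy streams from the client instead).
COPY sensor_readings FROM '/tmp/readings.csv' WITH (FORMAT csv);
```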
I remember in some cases I decided to go without primary keys, you know, breaking relational
model and so on.
There's no relational model in Postgres, or in any relational database which implements the SQL data model — which has NULL and breaks the relational model completely anyway.
But this is a different topic, side note.
Anyway, so it's not good to be without primary keys, but sometimes you think,
oh, I just need to dump this to some table reliably. So we have ACID, so Postgres guarantees,
it's stored, it's saved, it's replicated, it's backed up. But even one index, sometimes you think, oh, it slows me down.
And I remember I decided to leave without primary key.
And it was a weird case, but it was just some archive,
maybe just for audit purposes.
And I decided to use BRIN at that time.
BRIN is actually a good idea to consider if we have append-only, because of the physical layout: rows don't move. If we have a row, it's a tuple, it's saved in some block, and it stays there, right? So this is exactly when BRIN indexes work well. And we had an episode — one of our first episodes, I remember — it was BRIN indexes. BRIN is Block Range Index, right?
Yeah.
Yeah. So it's very lightweight. It speeds up select performance — not as much as other indexes, especially B-tree, but it's still good, right?
Or we might consider hash indexes also, right?
Because they might be more lightweight than B-tree sometimes, for some workloads, right?
Yeah. But when it comes to append-only, I think you make a really good point: each index we have slows down the inserts. So the fewer the better — possibly none, if, let's say, it's a table we're never reading from,
or it's an audit log that we only ever have to read from extremely rarely.
We might consider one or even zero indexes on that.
Maybe not an audit log, because maybe that's not one
you would actually be writing an insane volume to.
But I read a Timescale post. Timescale, they have to worry about this kind of thing — they've designed for these kinds of time-series workloads — and they've written a good blog post on optimizing inserts. They list all the same things as you, and go further: as well as foreign key constraints, basically other constraints can add overhead too. So, for example —
Of course. It checks the unique constraint.
Yeah, check constraints — but unique constraints —
Yeah. An index, an additional check.
Whoa.
Yeah. So: deciding, for each constraint, whether you really need it, or what value it's adding — whether having it makes sense.
That being said,
I must say like in most cases,
I prefer having primary key.
Because it's like the center of consistency,
of data consistency, right?
So it's good to have.
But it depends, right?
It's good that you mentioned Timescale — I think, yeah, we will return to Timescale. My question to you — a tricky question, but I think you already know. I must admit, when we started the podcast two years ago, I didn't realize it fully; now I realize it much better. So: we have an index. What operations does it slow down? You said it slows down inserts — this is for sure. Does it slow down updates? Well, yes, right? And there's a mechanism, HOT update, which deals with it in a limited number of cases.
Does it slow down deletes? Well, maybe no, because during a delete, the index is not updated.
Postgres only updates xmax, as we discussed a couple of times.
Does it slow down selects?
What do you think?
So we've talked about how having a lot of them can.
Yeah, yeah.
It slows down selects, especially if we have a lot of them
and high frequency of selects, and this is about planning time.
And lock manager locks during planning, all indexes are locked.
It's some overhead in very heavily loaded systems to keep in mind.
Yeah, but in general, I would minimize the number of indexes and try not to use foreign keys. With foreign keys, in many cases we can imagine they exist — have maybe routine checks that referential integrity is fine, right? — but drop them intentionally, because in this case we want good insert performance.
And as usual, I would like to remind you that when I say all this,
in many cases when I deal with new system,
I have some like these principles, but I never trust myself.
I always check again.
Checking should be considered, too — sometimes you spend time there, right? But it's worth doing experiments.
Yeah. Well, and I would say we're talking about extremely high volumes here. If you can, I would much rather normally have a primary key, have some foreign keys if they make sense, and have a unique key if I need it — and then test: can I get better disks if I need to? Are there other ways I can cope with higher write performance instead of...
Perform checkpoint tuning, if you expect huge volumes to hit the storage.
Yeah, so maybe pay for it in other ways. It's only at the absolute...
Have more memory, bigger buffer pool.
Exactly.
Make sure backends don't write all the time.
It depends, right?
So CheckPointer is not crazy.
It's not too frequent.
And so, yeah, there's a lot of stuff here.
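The checkpoint and memory tuning mentioned above usually lands in postgresql.conf. A sketch — the values are placeholders to show which knobs are involved, not recommendations:

```
# postgresql.conf sketch (illustrative values only — size to your hardware)
shared_buffers = 16GB                # bigger buffer pool
max_wal_size = 64GB                  # fewer, less disruptive checkpoints
checkpoint_timeout = 30min           # allow more time between checkpoints
checkpoint_completion_target = 0.9   # spread checkpoint I/O over the interval
```

The general idea: under heavy insert volume, frequent checkpoints cause full-page writes and I/O spikes, so spacing checkpoints out (at the cost of longer crash recovery) smooths the write path.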
And if we think about SELECT now, what's the number one problem usually?
I think, like — I'm still wondering how come we lived so many years until, I think, Postgres 12 — or whenever autovacuum_vacuum_insert_scale_factor was added. I think Darafei initiated it.
Darafei, yeah. Version 13 — I looked it up.
Yeah, version 13. Okay, it's very recent compared to my experience with Postgres. So strange. What does it add? Originally, Postgres autovacuum — which also maintains Postgres statistics, which is important for good query performance, including selects, right? — was triggered only after, say, 10 or 20% of rows are changed, by default. There is some formula — not very complex — but roughly, after 10 or 20% of rows changed, where "changed" means deleted or updated, it triggers. But not after inserts. And only in Postgres 13 a specific parameter was added — I think by default it's also 10 or 20% — which tells autovacuum to run and process a table after 10 or 20% of rows were added.
Yeah, I looked this up,
and it's like there's three jobs, right, of AutoVacuum.
There's roughly removing dead tuples.
Four jobs, actually.
Freezing and analyze statistics. Removing dead tuples, freezing, and analyze statistics.
Removing the tuples, maintaining visibility maps.
Maintaining visibility maps, of course, yeah.
Four goals.
Maybe actually more, but these four come to mind quickly.
Yeah, and if you're only doing inserts, you don't need the removing dead tuples job.
But that isn't the only thing vacuum's doing. So this then enables the visibility map and the freezing to happen.
Well, regardless of inserts, freezing will happen — well, we can insert into a different table.
Yeah, okay, good point.
And autovacuum will see that xmin — or xmax, or both; xmin, right? — is
Yeah.
In the past, we have risk of wraparound, so it's time to freeze this table.
Yeah.
We can have zero operations — like, the table can be left unchanged for a long time, but at some point autovacuum decides, okay, it's time to freeze.
But you're right, the visibility map would never — that would not get updated.
Visibility map is huge.
You mentioned index-only scans, the performance of aggregates counts, right?
So we do want to keep it up to date.
I think default is not enough, as usual, with AutoVacuum.
We must tune it.
And even cloud providers, their defaults are not enough.
We must tune it and go down to one percent or smaller, and make sure autovacuum maintains statistics and visibility maps more often, so the performance of selects, including index-only scans, is good.
Another reason to partition as well, so you can keep those up to date. Yeah, that makes sense.
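Lowering the insert-driven vacuum threshold can be done globally or per table. A sketch of the per-table form, using a hypothetical table name:

```sql
-- Default (Postgres 13+): insert-driven vacuum triggers after roughly
--   autovacuum_vacuum_insert_threshold (1000)
--   + autovacuum_vacuum_insert_scale_factor (0.2) * reltuples
-- inserted rows. Lowering the scale factor keeps the visibility map
-- fresher, which helps index-only scans:
ALTER TABLE sensor_readings SET (
    autovacuum_vacuum_insert_scale_factor = 0.01,  -- 1% instead of 20%
    autovacuum_vacuum_insert_threshold = 10000
);
```

The per-table form avoids adding vacuum pressure on small tables that don't need it.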
I was going to say, it's 20%.
So it is quite high still, as you say.
Would you ever switch to...
I cannot imagine any OLTP system, any website, any mobile app,
which would be okay with Postgres AutoVacuum defaults.
This, like,
ah!
That's it. I don't know why they are so. They are so for
what? We have
so many beautiful websites working,
huge systems
working with Postgres.
It's so cool to see that
Postgres handles
such big workloads — but these defaults...
Well, and the strange thing is, this one, for example: if we did reduce it to one percent, it would add overhead on small systems. Sure, if you've only got 100 rows, it runs vacuum every row, you know. But who's running a small system that can't handle a vacuum of a hundred-row table every row?
Also, with append-only specifically, when some page is already processed — marked all-visible, all-frozen, or whatever — vacuum just skips it.
Yeah, so it wouldn't even be much. There were many optimizations in this area, to not do work which can be skipped.
So it's doing a good job skipping, and it's many years already.
So I think, like, I never saw any system — and I saw maybe hundreds of them already, different sizes, websites, OLTP, right? — where we decided: oh, you know what, we need to increase the scale factor. I don't remember it at all. We can throttle it if we like, we can balance work among many workers, and so on. But deciding "let's make autovacuum's work less frequent" — zero cases I had. Maybe my experience is not enough; maybe one day I will see such a system.
I've not seen one either.
Enough rage about defaults — my usual fun I have with Postgres. Let's talk about partitioning, maybe, right? Why do we want it?
I see several ideas here. And TimescaleDB is definitely, for append-only data, a good thing to have, in many senses. But unfortunately, it's not available in managed offerings, except their own, Timescale Cloud, right? And some others — but those others usually choose the Apache 2.0 version, which doesn't have compression, right?
It doesn't have a lot of their good features, yeah.
Yeah, so partitioning is good.
Again, there's some empirical rule — many people say it, not only I: let's consider partitioning if a table exceeds 100 gigabytes, or has chances to exceed 100 gigabytes. Partitioning adds complexity; it's not as well automated as in, say, Oracle, but it's a very important tool to consider. And there are many factors here. First, for example, you might say: okay, I have a partition where I insert, and then many partitions which are, like, my archive. And as we decided, we want a very low number of indexes in the main partition which is receiving inserts — and of constraints like foreign keys and so on. But there is no such problem in the archive partitions, right? We might have more indexes there, and constraints, and so on. This is one thing.
Second thing: autovacuum.
If occasional deletes or updates are happening, the block which contains the row — that tuple, basically — is marked not all-visible anymore, and not all-frozen anymore, so vacuum needs to process it. And it's good to have data locality: archive data is in some partitions and fresh data is in particular partitions, so autovacuum is focusing on fresh data in fresh partitions. It reduces the number of blocks it needs to deal with, right? Because old data is rarely touched, autovacuum visits it very rarely, right?
This is another reason.
Cleanup is another reason as well, right?
I think cleanup's the biggest reason.
Maybe, yeah.
I think maintenance,
partitioning helps so much with maintenance.
It does have other benefits for sure,
but it helps so much with maintenance
that I can't help but feel like that's the biggest one.
And I actually have started to say — I think I must have stolen this from somebody else, because it's too clever for me — that partitioning based on how you want to eventually delete data makes sense. So if you want to eventually delete old data by time, partition by time. If it's a B2B SaaS and you eventually want to delete data based on a customer quitting the service, you probably want to partition based on customer.
Or both — two levels of partitioning also.
Yeah, exactly. But that being, like, the guiding principle for how you partition, because it makes that deletion or dropping so easy.
What will you do with data, right?
And as I said, I participated in projects
where delete was a big issue.
And of course, with partitioning,
it's very different and it's good.
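The "partition by how you'll delete" idea, with the two levels mentioned above, could be sketched like this (hypothetical schema):

```sql
-- Level 1: by tenant (dropped when a customer leaves).
-- Level 2: by time (dropped when retention expires).
CREATE TABLE usage_events (
    tenant_id  int NOT NULL,
    created_at timestamptz NOT NULL,
    event      text
) PARTITION BY LIST (tenant_id);

-- Sub-partitioned partition for one tenant:
CREATE TABLE usage_events_t42 PARTITION OF usage_events
    FOR VALUES IN (42)
    PARTITION BY RANGE (created_at);

CREATE TABLE usage_events_t42_2024 PARTITION OF usage_events_t42
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
```

Then "customer 42 left" is `DROP TABLE usage_events_t42 CASCADE`, and "retention expired" is dropping one year's sub-partition — no DELETE in either case.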
Deletes can be a problem.
Postgres deletes — like, if you have a terabyte or 10-terabyte table and you need to delete 20% of it, it's a big headache, because you need to make sure vacuum will be okay, that autovacuum will catch up all the time; you need, again, to pay attention to the checkpointer; and you need to find a way to delete so that performance doesn't degrade. This was my problem. So I created beautiful queries, but they degraded over time because of dead tuple accumulation, and bloat accumulation as well. So I needed to adjust them, and so on. So there are many problems with deletes, and it takes time to delete millions of rows. If you rush it, you can put the system down, or have degradation of performance.
well yeah
And it can really affect even your select performance. So, you mentioned BRIN — that's probably the one where it gets, or at least used to get, affected the most. With the default way of creating a BRIN index, if you have a row inserted way back into an old block — if you don't have partitioning, if it goes miles away and you get some really scattered data — BRIN performance can end up effectively looking like sequential scans.
So all indexes degrade. B-tree degrades very quickly if you perform deletes and updates,
and you need to rebuild it.
And rebuilding is better with partitioning, because the smaller the partitions are, the faster rebuilding is, and the xmin horizon is not held for long, right? So autovacuum is not affected across the whole database, right?
Yeah.
So it's better.
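Rebuilding a bloated B-tree without blocking writes, per partition, might look like this — the index and table names are hypothetical:

```sql
-- REINDEX ... CONCURRENTLY (Postgres 12+) rebuilds without taking an
-- exclusive lock; with partitioning, each partition's index is small,
-- so each rebuild finishes quickly and holds back the xmin horizon briefly.
REINDEX INDEX CONCURRENTLY audit_log_2024_01_logged_at_idx;

-- Or rebuild all indexes on one partition:
REINDEX TABLE CONCURRENTLY audit_log_2024_01;
```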
Building and rebuilding indexes, vacuum itself — maintenance tasks are good if you have smaller physical tables, or partitions. It's great, right?
Yes. On the BRIN thing, I'll link up the old episode we did, but minmax-multi, I think, makes a big difference — it handles loads of outliers, so I do think that's easier now. And if you are able to keep on top of autovacuum, I guess the B-tree stuff doesn't degrade that quickly. So I feel like these things aren't as big problems anymore. But yeah, often in these cases, if you're dealing with high volume — like many, many thousands of queries per second, just extreme volume — anything you can do to help on the performance front will be helpful.
Yeah. And as usual, when we touch partitioning: the state of caches — the buffer pool, for example. If you have archive data which is touched rarely, those blocks are evicted from the buffer pool, and cache efficiency might grow — the hit ratio might be better.
But yeah,
I agree with you.
So partitioning is good in many
senses. It comes with
price of overhead and maintenance as well.
But it's worth having it.
But imagine, like, all
this said, we moved slightly
from append-only to append-mostly,
right? But let's move back to append-only.
Imagine we have many partitions where data is not changed, archive.
Index is created, all frozen, all visible.
It's a beautiful state of data, right?
So all index-only scans are working well.
And that's it.
Maintenance is not needed.
Auto-vacuum is not needed there, and so on.
However, what if we have 100 terabytes of data, and this is a heavily loaded cluster — we have many replicas. The data is not changed, which is good. It's evicted from the buffer pool, but we still need to keep it on the main storage. And at some point we think: oh, we pay a big price, because this data is replicated; it increases the volume of backups — full backups, if we consider them, right? So this legacy — it's a lot. And if we have, for example, five replicas and one primary,
we need six times to store the same data,
and nobody is reading it.
Like, people read it occasionally.
At some point, you think it's not efficient.
And you think: I would rather store it somewhere else — not on the main disks, not on the fast SSDs or, I don't know, NVMes I have, or cloud storage, which is also expensive, right? So this leads to two ideas. The first idea is: it would be good to compress it, right? Again, the full version of TimescaleDB is doing a great job here, and their blog posts about compression are great. I especially like one, I remember — the first big one — which explained the algorithms: basically a kind of columnar compression within rows, although we still have row storage. It's great. And I think the second topic here, which opens up naturally, is what — I know Aurora now offers it, right, and Neon and Timescale as well, in cloud only — the bottomless approach, where old partitions are offloaded to S3 or object storage: GCS on Google Cloud, or Blob Storage on Azure — how it's called, I don't remember.
And now even Hetzner has it. They just recently released, which is big news.
I like it because I like their prices.
And I worked with them since, I think, 2006 or so in a few companies.
When you bootstrap and you have a small startup,
Hetzner is like number one in terms of budgets and the hardware they can offer.
So they just recently released S3-compatible object storage, right?
So we can have normal backups and so on.
But what to do with old partitions?
It's a natural way of thinking.
We don't want to keep them on these expensive disks we have,
having multiple copies of that.
So offloading it somehow implicitly,
transparently, in the background, to S3 — or S3-compatible MinIO or something, if you have self-managed Postgres —
if you have self-managed Postgres,
it would be great, right?
And so we have it in Timescale Cloud.
I think Neon also does it, right?
I don't know.
Bottomless, bottomless.
Like Neon, they store data on S3 originally anyway.
So idea is we want to have petabyte size cluster,
but don't pay for lots of disks and headache it comes with.
And for append only, it's very natural to have, to decide, okay, we want
to store data forever, not to clean up, but we want cheap storage here. So S3 is a good
idea to consider. And it has tiers also, right? It can be slow to retrieve, but it's okay
because it's rare, right?
Well, and it depends what you mean by slow. Like, I think there can be performance advantages. When some of this data is fresh, we might want to retrieve it row by row — if you're looking at some audit logs, you might want to look at some recent ones, and you might want all the information about them. But if you're looking at data from two years ago, there's probably a higher chance that you're looking at it in aggregate: you know, on average, how many audits of this type were we having in 2022 versus 2023? And I think, actually, the types of queries that happen on older data tend to be these aggregate ones, which often perform better once the data is column-store compressed. You know, these file formats often suit that kind of query.
Yeah — Postgres lacks good compression; it lacks it, yeah. And TimescaleDB compression — I have seen how good it is. It can be 20 times smaller. And indeed, they even support data changes for compressed data, which is great, right?
which is great, right?
I have seen a project or two come about lately,
I think open sourcing some of this stuff,
or at least putting it under the Postgres license.
Is it PG Parquet that allows you to?
Yeah, but it's different.
It's for analytics.
And actually for analytics,
we also might want to consider append-only tables, obviously.
But there is a new wave of this,
and I know many people, companies look at it.
pg_duckdb — no, DuckDB.
Yeah, DuckDB.
Right.
And the idea, let's marry it with Postgres.
And there are a few projects. I looked at a few recently; one of them was just released maybe last week, and, I remember, they use a logical copy from the original data tables — regular tables — to tables which are stored, I think, in Parquet format on object storage, and then DuckDB is used as the processing engine for analytics. But I remember, I think, Alvaro commented on Twitter that he's not going to consider it until it works with, basically, CDC — logical replication or something — because right now it's only a full refresh. Like, it's not serious yet, but they will do it, I think. Also, I think the Neon guys looked at DuckDB, and
I saw some activities and Hydra, right? They also looked at DuckDB.
Yeah, but I understand that most of the marketing at the moment is around analytics use cases,
but I don't see why it couldn't work for append-only...
Data types.
Might be true.
I looked at a couple of extensions, because I have a couple of customers with such a need — to offload old data. And Parquet, this format, supports only a limited mapping of data types; it might be tricky if you have some complex data types, as I remember. And when I looked at some extensions, it didn't work well. And I think right now I have plans to look at Tembo's extension, which is called pg_tier, for tiered storage.
Right.
And the idea is that with this extension we can have old partitions on object storage. It's a great idea — so if it works, it's great. I just haven't looked at it yet. If somebody from Tembo is watching or listening to us, please let us know how this project is doing already. Is it production or beta stage? I'm very curious. And is it worth trying? What are the limitations, for example? Maybe we should actually have a separate episode, because
It's great.
It's open source, unlike what other companies do.
So kudos to Tembo for making this open source extension.
And is it transparent?
Like, do I have to...
Because I think some of the DuckDB stuff,
I would have to be... I'm not sure if I have to write Because I think some of the DuckDB stuff, I would have to be...
I'm not sure if I have to write...
I don't know.
It might be semi-transparent.
For example, if you need to make some transitions
with old partitions, I'm okay with that.
And I can even create new partitions in background
and move all the data from one old partition
to this kind of new old partition,
which already has different storage, this is doable. So it's not a big deal as I see it.
I think what Timescale Cloud has, for example, is transparent, and they have something interesting, as I remember, in a very rough form. They have something with the planner too, and some algorithms to decide when to bring this data into caches and so on. So it's interesting.
But the idea is, we want partitioning to evict blocks with old data from memory, that's one thing. But then we think we want to evict them from our disks too, because disks are also expensive, right? Let's evict it. And this is an alternative idea to cleanup and to compression. Or compression goes well with offloading as well, I don't know. The Parquet format is definitely good with compression, compressing data as column storage, right? So if it's time series, it's good.
So there are interesting new directions of development in the Postgres ecosystem here. And I think we mentioned a few projects, both commercial and open source, which is great. So if someone wants to store a petabyte of data, or is preparing for it, and doesn't want to delete everything, I think there are ideas to consider.
Well, we didn't mention that with partitioning you can also have some kind of foreign data wrapper and store it on a very cheap, also Postgres, cluster.
It can be Postgres, it can be not Postgres as well, right?
But for example, we can consider a cheap Postgres cluster with very slow disks, HDD, for a good price. And... I inspected the foreign data wrapper, postgres_fdw, code in terms of how it works, and basically, you need 2PC, right? You need two-phase commit to have a reliable commit, because it's a distributed system already. But without 2PC, there are risks of inconsistency, for example.
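For reference, the 2PC mechanism being discussed is exposed in Postgres as PREPARE TRANSACTION / COMMIT PREPARED, and postgres_fdw does not use it by default, which is exactly the inconsistency risk mentioned. A minimal sketch, with hypothetical table and transaction names:

```sql
-- Requires max_prepared_transactions > 0 on each participating server.

-- Phase 1, run on every server involved in the distributed write:
BEGIN;
INSERT INTO events (id, created_at) VALUES (1, now());  -- hypothetical table
PREPARE TRANSACTION 'dist_txn_42';  -- persisted to disk; survives a crash

-- Phase 2, once every server has prepared successfully, a coordinator
-- finalizes everywhere:
COMMIT PREPARED 'dist_txn_42';

-- If any server failed to prepare, undo on those that did:
-- ROLLBACK PREPARED 'dist_txn_42';
```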
But not if it's append-only; you're never going to change that data by the time you're pushing it to a different server.
Right. I remember I inspected the code and found some kinds of edge cases, maybe even corner cases. But for inserts it's also possible: you wrote to one place, you're trying to write to another place, this one is already committed, here it's not committed. But I remember the code was quite smart about reducing the probability of such cases, though it's not 100%. As I remember, that was like five years ago, so I haven't revisited it since.
Tell me if I'm wrong, but you're saying new partitions would be on the local Postgres, and it would be the old ones that we would move to the second?
Yeah.
If we don't have inserting transactions which deal with multiple partitions, there is no problem at all. And the old ones we can move to foreign tables, with postgres_fdw, to a different cluster. So you're right.
No inserts happening, no problem.
Cool.
Select only.
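The setup just described, with old partitions living on a second, cheaper cluster, can be sketched with postgres_fdw. All names here are hypothetical, the local `events` table is assumed to be range-partitioned, and the remote table must already exist on the archive cluster:

```sql
-- Sketch: an old partition whose data physically lives on a cheap
-- remote Postgres cluster ("archive_host" etc. are hypothetical).
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER archive FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'archive_host', dbname 'archive', port '5432');

CREATE USER MAPPING FOR CURRENT_USER SERVER archive
  OPTIONS (user 'archive_user', password 'secret');

-- Attach a foreign table as a partition of the local "events" table;
-- "events_2022" must already exist on the remote cluster.
CREATE FOREIGN TABLE events_2022
  PARTITION OF events
  FOR VALUES FROM ('2022-01-01') TO ('2023-01-01')
  SERVER archive;

-- Local reads now pull old rows from the remote cluster transparently:
SELECT count(*) FROM events;
```

As discussed, this stays simple only because the old partitions are select-only; no insert ever has to span both clusters.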
I never considered using FDW for partitioning. Or, like, you know, for the old partitions.
It's natural.
It's natural.
This is how many folks were going to have clustered Postgres, right?
I mean, or sharding.
Kind of sharded, yeah.
Sharding, yeah.
But this path has issues.
Yeah.
It's a different story.
Here be dragons, is it?
Or what's that old phrase?
I don't know.
Okay, we discussed many things. I feel we might be missing something, right? But for a good overview, I think it's enough, a general overview of what's happening with append-only tables. And yeah, so I think it's an interesting topic, actually. Many people need it.
Many companies store logs and so on in Postgres.
And I'm looking forward to the future
where Postgres at some point will have better compression
and a bottomless feature.
And as I like to say, imagine if TimescaleDB were under the PostgreSQL license.
Okay.
This, I think, should happen at some point.
I know there are people who don't like this idea, but they developed such good stuff that Postgres would benefit from it. I know some people would not be happy with these words, I know, but it just... yeah, it feels natural to have these features in Postgres itself.
Some of them, for sure. But I do think some of it's tricky, like the bottomless stuff, for example. Where would it go? Like, what...
There is an extension for it already, right? Yeah, okay. S3 compatibility is standard. Even Google Cloud, GCS, it's also S3-compatible. Everyone does it. So I think... no, I don't see why not here. I see it, but it's manageable.
Fair. The biggest question is business-wise, I guess.
So TimescaleDB.
I understand.
License, yeah.
I understand business-wise it's not going to happen in the near future, but it could be so great to have good compression and bottomless in Postgres itself, even if it's an extension.
I'm okay with extensions. I don't like extensions in general, because I like, you know, all people to have some features, but in this case it's okay to have extensions.
Cool. Extensions are great sometimes, yeah.
Good. Okay, have a good week. I know we have more than 100 suggestions in our doc.
Yeah.
We read them. Keep posting. This was one of the suggestions, right?
Yeah.
So you chose it.
Thank you.
Yeah, good. Thank you to our audience for your patience in reaching this point. Okay,
see you next time.
Absolutely. Take care. See you next week. Bye.
Bye-bye.