Postgres FM - When not to use Postgres
Episode Date: September 5, 2025

Nik and Michael discuss when not to use Postgres — specifically use cases where it still makes sense to store data in another system.

Here are some links to things they mentioned:

Just Use Postgres (blog post by Ethan McCue) https://mccue.dev/pages/8-16-24-just-use-postgres
Just Use Postgres for Everything (blog post by Stephan Schmidt) https://www.amazingcto.com/postgres-for-everything
Real-time analytics episode https://postgres.fm/episodes/real-time-analytics
Crunchy Data Joins Snowflake https://www.crunchydata.com/blog/crunchy-data-joins-snowflake
Two sizes fit most: PostgreSQL and ClickHouse (blog post by Sid Sijbrandij) https://about.gitlab.com/blog/two-sizes-fit-most-postgresql-and-clickhouse
pg_duckdb episode https://postgres.fm/episodes/pg_duckdb
Cloudberry https://github.com/apache/cloudberry
Time-series considerations episode https://postgres.fm/episodes/time-series-considerations
Queues in Postgres episode https://postgres.fm/episodes/queues-in-postgres
Large Objects https://www.postgresql.org/docs/current/largeobjects.html
PGlite https://pglite.dev
ParadeDB https://www.paradedb.com
ZomboDB https://github.com/zombodb/zombodb
turbopuffer https://turbopuffer.com
HNSW vs. DiskANN (blog post by Haziqa Sajid) https://www.tigerdata.com/learn/hnsw-vs-diskann
SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search (paper) https://www.microsoft.com/en-us/research/wp-content/uploads/2021/11/SPANN_finalversion1.pdf
Amazon S3 Vectors https://aws.amazon.com/s3/features/vectors
Iterative Index Scans added to pgvector in 0.8.0 https://github.com/pgvector/pgvector/issues/678
S3 FDW from Supabase https://github.com/supabase/wrappers/tree/main/wrappers/src/fdw/s3_fdw

~~~

What did you like or not like? What should we discuss next time?
Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With credit to:
Jessie Draws for the elephant artwork
Transcript
Hello, hello, this is Postgres FM. As usual, I'm Nik, Postgres.ai, and as usual, my co-host is Michael, pgMustard. Hi, Michael. How are you?
Hello, Nick. I'm good. How are you?
Very good. So the topic you chose is to talk about going beyond Postgres, when we should avoid using Postgres, right?
Yeah, you put a shout out on a few social networks asking people what kind of questions they'd like us to answer, and we had lots of good suggestions, as we've had for many years now. One of them was particularly good, and I thought it was worth a whole episode, which was, yeah, when not to use Postgres. I think there's a growing trend, or a few popular blog posts, of people saying you should consider Postgres for most workloads these days, and I think it is still an interesting topic to discuss. Are there cases where it doesn't make sense? If so, what are those, and when does it make sense not to use Postgres? And I was interested in your take on some of these as well.
Yeah. Well, the classic example is analytics, of course. Yeah.
Do you want to list a few and then discuss them in a bit more detail, or do you just want to...
Yeah, let's create a frame for this first, right? So: analytics, embedded databases.
Yeah. So analytics, embedded databases, and I think storing large objects; there are some cases where it makes sense.
Exactly, especially larger ones, like videos, like very large objects, 100%.
And just let's agree: in every area we can discuss pros and cons of using Postgres, because some people will definitely have the opinion that it's not an excuse to avoid Postgres.
Let me add, then...
With this in mind, let me add a topic like ML data sets and pipelines.
Yeah.
Right?
Yeah.
Machine learning and big data too, yeah.
I think anything where there are specialized databases, like search, vector databases.
Vectors, exactly.
Let's talk about vectors separately.
It's worth it.
And then one more, which I think... well, actually, I had two more on my list. One was potentially controversial. I wondered if there's a case, if you're at extreme OLTP, write-heavy, very, very write-heavy, and, let's say, you've got institutional experience with Vitess, I think sticking with that for the moment makes a lot of sense.
So, like, when not to use Postgres: I wondered, if a new project came along, about starting with Vitess while we get these sharded Postgres, like OLTP-sharding Postgres solutions, up and running. I think at the moment, maybe it still makes sense to not use Postgres there.
Let's add, then, time series.
Yeah.
Let's discuss this area.
Time series.
Yeah.
And also data that can be compressed really well. This topic is close to analytics.
True.
Yeah.
And then I had one more.
One more, but I guess it's kind of what I said just now.
I think if you or your organization have tons of experience with another database,
the argument for using Postgres for your next project is weaker.
I'm not sure it's like when not to use Postgres.
I think I could think of lots of counter-examples.
Well, this is an orthogonal discussion. You can say: if we already have a huge contract with Oracle, already signed for the next five years, it's not wise to start using Postgres, since that money is already spent for us, right? There are many such reasons, right?
Should we stick to technical ones then?
Yeah, yeah. Like areas, like types of usage.
Queue-like workloads, I would add.
Yeah, interesting.
Yeah, yeah, yeah.
The last one is like kind of Kafka territory.
Or there are others, of course.
Yeah.
All right, should we start with analytics then?
I feel like that.
I know we did a whole episode.
Kind of a whole episode.
Yeah, so a row store is not good for analytics, and select count(*) will always be slow. You need denormalization or estimates. Estimates will be... not slow, but too rough and too wrong sometimes. Yeah, it sucks.
Yeah, but I think
there's a scale, like I think we
talked about this before, but there's a scale
up to which you'll be fine on
Postgres. You could achieve
better performance elsewhere. But a lot of systems now are hybrid, right?
Like, they have to be transactional, but they have to provide some analytics dashboard for
users or something, like, but they still want real-time data.
They still want transactional processing, maybe 90 to 95, maybe even 99% of the workload
transactional, and only, like, there's a few analytical queries from time to time.
I still think those make a ton of sense on Postgres.
Counts, everyone needs them, right? You know, pure OLTP applications, they need to show counts or understand pagination, and almost all have counts. If you think about social media, you need to show how many likes, comments, reposts, and so on. And to implement it purely in Postgres in a good way, denormalization would be needed. Yeah, and also there are so many rakes lying around this: you can easily step on them and have a hotspot. Like, you know, this classic hotspot when an accounting system is tracking balances, and let's say we have a whole balance and all transactions update a single row. So this is where it can be an issue.
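One common mitigation for the hotspot described here, sketched with made-up table and column names: split the hot counter into several shard rows, so concurrent writers rarely collide on the same tuple.

```sql
-- Sharded counter: updates spread over 8 rows instead of 1.
CREATE TABLE like_counts (
    post_id bigint NOT NULL,
    shard   int    NOT NULL,          -- 0..7, chosen per update
    cnt     bigint NOT NULL DEFAULT 0,
    PRIMARY KEY (post_id, shard)
);

-- Increment: touch a random shard.
INSERT INTO like_counts (post_id, shard, cnt)
VALUES (42, floor(random() * 8)::int, 1)
ON CONFLICT (post_id, shard)
DO UPDATE SET cnt = like_counts.cnt + 1;

-- Read: sum the shards.
SELECT sum(cnt) AS likes FROM like_counts WHERE post_id = 42;
```

The trade-off is that reads now aggregate several rows, so this fits counters that are written far more often than they are read exactly.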
But at the same time, there are many attempts to do better, and some attempts led to companies being acquired, right? I mean, Crunchy. There are many new waves aiming to solve this problem: better analytics for Postgres.
I see two big trends right now. One trend is how Sid, founder of GitLab, recently had a post saying that Postgres and ClickHouse are a great couple of database systems. I don't remember the exact title of that post, but the idea is that they go together very well.
Yeah.
We also had Sai, founder of PeerDB, which was acquired by ClickHouse, and I met with him last week. And we talked about, again, basically the same idea: that ClickHouse and Postgres are great together. And this is one direction.
Another direction is saying ClickHouse is very different. Not even... maintaining it is absolutely different, but also using it requires a different mindset and skills. So it's better to choose, for example, DuckDB. Keep... yeah, so do everything inside Postgres, but in a smarter way. This is what multiple companies worked on recently, and one of them, Crunchy, was acquired by Snowflake. And we did an episode on pg_duckdb as well, so a slightly different approach on that.
But yeah, the Crunchy one's interesting, because all the queries go through Postgres, but a lot of the data is stored in, like, Iceberg or some other file format.
Yeah, exactly. Getting the columnar side.
Yeah.
So, yeah, this definitely feels like one of those ones. There's also a third, by the way, there's a third option, which is these massively parallel databases.
Well, you spoke to Sai last week; I was at a UK event this week, and there was a presentation from the successors of the Greenplum project, the kind of open-source successors, which is called Cloudberry. It looks like really interesting work. But that's another way of doing some analytics from within Postgres, kind of.
Yeah, and from previous experience, from the past, I remember cases when Postgres and Greenplum were combined in one project, and it was great. It was some bank, even quite a big bank. But somehow I stopped looking at Greenplum for quite a long time already, I don't know.
There are also, of course, commercial databases. I remember Vertica. There is Snowflake, super popular; it's like a major player in this area. By the way, I would distinguish two areas of analytics. One is internal needs for a company: we need to understand how the business is doing, a lot of stuff, we need a lot of reports. And another need is: we need to show our users some counts, like I said, on social media. So two big areas, I think, also.
Yeah, good point.
In the first case, users are internal, and in the second case, users are external.
I'm pretty sure there are a lot of mixed cases, additional cases.
But I personally like these two directions.
Of course, there are others.
There is also Redshift on AWS.
Yeah.
Also originally based on Postgres, yeah.
Yeah, yeah.
So there are many options here.
So yeah, is the short version that at sufficient scale it probably doesn't make sense to be using Postgres for analytics? But that level is quite high.
Yeah.
But also I see cases when companies go to Snowflake, then try to escape it.
Come back.
Yeah.
Okay.
So going to Snowflake, it's like going to Oracle, in my opinion.
You mean like in terms of financial?
In terms of vendor lock-in?
Yeah.
Because it's just a purely commercial offering.
There are, of course, many tempting things there, features, performance.
Yeah, yeah.
Integrations.
Nice product to use as well, yeah, developer-friendly.
Yeah, well, users love it.
I agree.
But if we try to remain more on the open-source side, with less vendor lock-in, then it should be excluded.
Even ClickHouse... well, ClickHouse is open source itself.
Yep.
Right.
You mentioned the time series being quite close to this.
I feel like we should jump to that next.
What do you reckon?
Well, TimescaleDB is great, but it's also kind of vendor lock-in, because it's not open.
Yeah, yeah. So, because of their license, other cloud providers can't provide Timescale as a service easily, or at least not the version with lots of nice features.
Yeah. In Timescale Cloud, I had a recent case where we saw limitations again, very badly, like CREATE DATABASE doesn't work,
and moreover a lack of observability tooling. Like, again, I keep promoting this on the podcast: if guys who build platforms listen to us, you must add pg_wait_sampling. Unless you are RDS, okay, but even in the case of RDS, we talked about this, it's great to have it in a SQL context and be able to combine wait event analysis with regular pg_stat_statements analysis, and pg_stat_kcache, an additional very good observability point. Because I had the case when guys just compared everything... so, worse performance; we worked closely with Timescale, but in the case of RDS you see Performance Insights and understand where we wait, right? In the case of Timescale, only rare collection of samples from pg_stat_activity is possible. It's sometimes good enough, but it's quite a rough tool to analyze performance. So yes, such things are lacking.
And unfortunately, more and more I come to the conclusion that when I recommend Timescale to customers, it contradicts the idea that they want to stay on a managed service.
Yeah, yeah, because they're down to a single choice.
Yeah, yeah. That being said, Timescale Cloud even offered me, like, some bounty if I convince someone to go to them. And this is great, I love loyalty, but I need to be fair. Some pieces, big pieces, are missing, unfortunately.
Yep.
And again, Postgres, even without Timescale, can be used for time-series workloads up to a certain point. We're just talking about very high scale, right, where all the features like compression, like continuous aggregates, like automatic partitioning...
Straight to the point.
Straight to the point.
Yeah.
Oh, by the way, for time series, yes, ClickHouse is also still a good option, and there is also VictoriaMetrics, right?
Well, and I learned just yesterday that even Cloudberry has incrementally updated materialized views. I need to look into it, but that's quite cool.
And maybe that would be the good thing about this. Wouldn't it be great to have in Postgres something like incrementally updated materialized views, where you just define the scope and also refresh concurrently?
We should do a whole episode.
I think there are several projects that have started to look into incrementally updated materialized views, and I think they're more complicated than I'd... It's like one of those topics: the more you learn about it, the harder you realize it is.
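For context, what core Postgres offers today is full recomputation only; CONCURRENTLY avoids blocking readers but is not incremental. A sketch, with made-up names:

```sql
CREATE MATERIALIZED VIEW daily_likes AS
    SELECT post_id, count(*) AS likes
    FROM likes
    GROUP BY post_id;

-- CONCURRENTLY requires a unique index and lets readers keep querying,
-- but the whole view is still recomputed on each refresh.
CREATE UNIQUE INDEX ON daily_likes (post_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_likes;
```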
Right now we're in a position where most, not everyone, but most of our customers are on managed Postgres, so it's really hard for me to look at extensions which are not available on RDS, Cloud SQL, and others.
I understand.
I'm just thinking, I think it's worth learning from the extensions as to what would be needed in core. Like, what did they try? What was difficult about that? And it's not just extensions, right? There were whole companies that have been built on this premise. What is it called? Materialize, or... what was the thing?
Yeah, yeah, yeah.
I haven't heard from them for a few years. I'm curious what's happening there.
Lack of autonomous transactions would be an issue, right?
Yeah.
Or a queue-like tool inside Postgres, so asynchronous updates would be propagated through a queue. If everyone had PgQ, like Cloud SQL has, from Skype, developed 20 years ago, then implementing incrementally, asynchronously updated materialized views would be easy.
Well, yeah, async and sync... anyway, this is an interesting topic.
Yeah, yeah, yeah, yeah.
And we already, basically, we just touched queue-like workloads. It's still hard. Bloat is an issue, right?
We discussed it, I think.
I think, well, we've discussed that there are solutions, right? I actually think queues is one of the ones that I was going to fight you hardest on. Like, I think there were ways to do it badly within Postgres. And again, at extreme scale, I think it wouldn't be smart to put it in, especially not in a naive way.
Skype had extreme scale.
Yeah, well, yeah, okay.
20 years ago, one billion users was the target. One billion users.
Good point. So maybe, actually, of all the ones that we added to potentially be on the list, that would be one where I think, if you manage it well, like PgQ at Skype did with partitions, actually that's not an excuse to not use Postgres, if that's the title.
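The usual "manage it well" pattern for queues in plain Postgres is FOR UPDATE SKIP LOCKED, so many workers can pull jobs without blocking each other. A sketch, with made-up names:

```sql
CREATE TABLE jobs (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    done_at timestamptz
);

-- Each worker claims one pending job; rows locked by other
-- workers are skipped, so workers never wait on each other.
WITH next_job AS (
    SELECT id
    FROM jobs
    WHERE done_at IS NULL
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET done_at = now()
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;
```

The bloat concern mentioned earlier comes from the high churn of completed rows, which is why production designs often add partitioning so old jobs can be dropped instead of vacuumed.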
Yeah, the benefits are serious. The problem is, when I recommend queue-like workloads inside Postgres, I just say: you need to understand the whole complexity and hidden issues; you need a high level of understanding of what's happening to have it. Yeah, but if you have it, it will be great. It's just not a small project, unfortunately, usually.
Good point. And this recent case with NOTIFY as well,
because it's also sometimes used in such workloads. Yeah. As a reminder: NOTIFY takes an exclusive lock on the database, serializing all NOTIFYs, which makes it basically not scalable.
Yeah. Yeah. All right.
Anyway, like, what's the answer? Like, if you needed to create a project and you would need to think about analytics, like: okay, we will have terabytes of data very soon, fast growing. What do we choose for analytics? What do we choose for queue-like workloads, for time series? What are the choices you would make?
I think it does depend a lot. Like you already said with analytics: are we talking about a bank that is doing nightly loads of data and only cares about internal reporting, or are we talking about a user-facing web app, or like a social media app, that has to do counts and various aggregations that are user-facing?
Well, you need both. You need to think about both. And what architecture choice would you make in the beginning?
Yeah.
For a target of terabytes of data in one year, for example, 10 terabytes in one year, what would you do? Would you choose to stay in Postgres?
I'm a big fan of simplicity.
I think I would stick with Postgres
for as long, like, until it was painful.
Okay.
And re-engineer then, yeah.
Yeah.
And I know that would be painful, but that would be my preferred option.
I think then I am tempted... I think it's quite new at the moment, but I am tempted by the work that Crunchy started on moving data out to Iceberg format and still querying it from Postgres. Like, I like that I can keep queries going from the same place.
But not possible on Cloud SQL or RDS, right? Not yet, right?
But I think it's quite early. Like, if I was starting a project today, I would hope that they'd caught up by then, and if not, then ClickHouse, or whatever PeerDB is called now within ClickHouse: having that go out to an analytics system like ClickHouse makes a load of sense to me.
What do you think?
Yeah, I would choose self-managed Postgres, 100%, and I would use TimescaleDB, full-fledged.
Got it. Okay.
Yeah, and then I would consider the DuckDB path as well, additionally, at some point. And queue workloads I would engineer properly and squeeze as much as Postgres can give.
And what else did we touch?
So again, all of these are actually sticking with Postgres, and then just at some point in the future you're going to have to think about sharding, if you get to that point.
Only one reason would make me do it. So I wouldn't go... in these three areas we just discussed, I wouldn't go away from Postgres. Although I understand very well the customers we have and why they say we need ClickHouse or something. In my company, I would add ClickHouse only if there is a strong team which needs it and this is their choice, and I delegate, and then I'm not involved in this decision. But while this choice is still mine, I would stick to Postgres and just make it work better. Yeah, and it can scale until IPO. Yeah, I saw it several times with several companies. So, yeah.
Cool, makes sense. So I wonder if this next one's going to be the first one where we both would say don't use Postgres, which is storing large objects, like large files.
Well, definitely. Yeah. The last time I tried to store a picture inside
Postgres was probably
2006 or 2007, when I was just exploring, you know, like: oh, this is working, okay. But no. I even don't know what will happen, you know. Like, this piece of Postgres I touch super rarely, you know. Yeah.
I think the one exception is text-based stuff, stuff that you might want to query. But even then, you probably want to be doing... PDFs, but, you know, it's like some representation of the PDF, not the actual PDF, but the text extracted from it, you know.
Or it can be Markdown, and then we have Pandoc or something which converts it both to HTML and PDF. This is what we do with our checkups. Originally, it's in Markdown.
Yeah. And by the way, another possible exception, which I think is almost not worth discussing: if you only need to store five pictures, like, maybe, you know. But I just don't see many cases like that.
Yeah, yeah. Still cool. All right.
Still, yeah. We recently implemented attachments in our own system, for pictures and various, like, archives of logs or something, which customers upload, or we upload PDFs as well sometimes. And of course we store them in GCS, and in a secure manner, not in Postgres, 100%.
No, yeah. I even don't think there... yeah, it's just an exercise for those who probably don't have other tasks.
Yeah. All right. Another one that I think is maybe in between these two, like advanced search: I still think there are search use cases where people that I respect and trust would still choose Elasticsearch over Postgres, even...
By the way, sorry for interrupting. Sorry for interrupting. I just realized you talked about blobs.
Yeah, but we also can...
So, and this can be an issue at upgrades, I've heard, right? Major upgrades. I don't see them at all. So, I mean, you can try to store them in Postgres, but then you have some operational issues additionally. Because not only do I not see them often, people who design some procedures and tools also don't see them often, and it's kind of exotic to have blobs.
Yeah. So when you say major upgrades, are we talking about, like, the speed of the initial sync, or are we talking about, like, pg_dump?
If we go the dump-restore route, then actually it's all having to get dumped out.
Yeah, yeah. I just remember some notes about it I saw, maybe on RDS, specifically about large objects. Maybe I'm wrong, actually.
I just remember that I'm actually using it, not for super large objects, but several kilobytes, like we store query plans in JSON format or text format.
Text-format ones don't tend to get that massive, but JSON-format ones can be hundreds of... well, we've actually seen a couple that were tens of megabytes, I think one or two that are in the hundreds of megabytes.
We need to amend this part of the episode. You are talking about varlena types.
Yeah.
There is a special thing, the large object facility, a special chapter in the docs.
Yes, sorry, that's a different thing, yeah.
Yeah. Varlena, everyone uses: JSONs, large texts, everyone. Even byte arrays.
Okay, you mean there's a specific issue with the thing called large objects?
I cannot say I don't touch large JSONs. Of course, I touch them a lot. We have them a lot. And, yeah, we talked about how they TOAST in Postgres.
Yeah, yeah.
Large JSONs, large XML sometimes, right? Texts, of course, large texts, everything. For example, our RAG system for our AI system has really large texts: chunks of source code, or mailing list discussions, kilobytes.
And you put all of those in Postgres.
Of course, because we need to parse them, and also full-text search and vectors. Everything is in Postgres right now.
Well, you could store the vectors in Postgres without storing the text in Postgres, but the full-text search makes a lot of sense.
Yeah, I understand you, but we do everything in Postgres, even vectorization. Maybe it doesn't scale well if you need to deal with billions of vectors, but for millions it's fine.
Yeah, makes sense.
So what I was talking about is, like, the lo_create function, these things.
Yeah, I've not used that. You're saying you don't see it.
Yeah, yeah, this is what I don't see.
Yeah. Hopefully everyone got the memo and no one's using it.
Yeah, hello from bytea. So yeah, I don't use those, and again, the last time I touched that was so long ago. Actual blobs: lo_put, lo_get, these functions. Cool.
I have no idea, and I suspect something will be broken when you start using them, some operations like upgrade. Maybe... well, not broken, but you will need to take care of them, like some side effects, like tablespaces can have, you know. Tablespaces are also not used often; in a cloud context we don't use tablespaces often, but tablespaces might be a headache when you do some migrations, move your database from place to place, or upgrade and so on. Yeah.
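For reference, the large object facility being discussed looks like this; the OID shown is illustrative (lo_create returns a real one):

```sql
SELECT lo_create(0);                     -- create; returns a new large-object OID
SELECT lo_put(16400, 0, '\xdeadbeef');   -- write bytes at offset 0
SELECT lo_get(16400);                    -- read the whole object back
SELECT lo_unlink(16400);                 -- delete it; dropping a referencing row does NOT
                                         -- remove the object, one of the operational traps
```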
Yeah, okay. Yeah, good one. What else have we got? We haven't talked about embedded databases yet, on the kind of tiny scale of things.
Yeah, I'm not an expert in embedded databases. I've heard SQLite is good.
Yeah, yeah. In this category we do now have PGlite, which looks like a very interesting project.
Yeah, but I think at the moment, unless I was doing some syncing with Postgres,
like, if I had a really good reason, I'd probably still default to SQLite, or SQLite, however their community pronounces it. But actually, I was going to include in this topic, like, even browser local storage, for example. If you're wanting to do stuff client-side, in like a browser app or web app, it still makes sense to use the local storage there, IndexedDB or whatever. So there are, like, a few embedded cases where I don't think it makes sense to use Postgres, or if you're going to try, maybe PGlite.
And can you remind me about PGlite? What does it do? And is it related somehow to WebAssembly? I think yes, right?
I think it must be, right? But I don't know. I don't know enough about it. I'm checking: it's a complete WASM build of Postgres that's under 3 megabytes gzipped.
That's impressive.
Yes, it's a cool project.
But I guess there's an argument to say it's not actually Postgres.
Like it talks and behaves like Postgres,
but it's kind of its own thing.
Well, we can talk about many Postgres variants like this, including Aurora and so on. Some of them more Postgres, some of them less.
Yeah.
But if the topic is, like, when not to use Postgres... yeah, I guess Aurora. I don't know if I count Aurora as that or not. Anyway.
I'm not sure I understand what you mean.
If the only solution was: use Aurora. Let's say there was one of these cases where it turns out Aurora was the absolute best fit, and way better than native Postgres. I think I would count that as when not to use Postgres, because it's kind of Postgres-compatible, like Cockroach, or any of these kind-of-compatible ones, or YugabyteDB.
Yeah, I'd say YugabyteDB, yeah, you're right, it's kind of a scale. It's hard to, like, draw a line where it is and isn't.
It's a spectrum.
It's a spectrum. Yeah, yeah.
Cool.
Okay.
Cool.
Okay.
But, yeah, it feels like that's an easy one, right? If you've got little devices or, you know, little sensors.
Yeah, the default choice is SQLite already. And I like the idea of PGlite, and I know Supabase used it for the database.build project, which I like a lot, with, like, merging this with AI. And right in the browser you can create pet projects and maybe, like, explore. It's a very creative tool to think about, to, like, bootstrap some new projects and how they could look.
It has the ER diagram, and you can iterate with AI. It's great. And there PGlite works really well. And I'm sure they already created this ability to deploy, to sync what you build to real Postgres, in Supabase or somewhere, right?
Well, I think that was the main aim of the company behind PGlite, called, like, ElectricSQL or something: replication.
Yeah, exactly. So the whole premise was local-first, if you've heard of local-first development. So, yeah, the idea is that apps like Linear, the task management tool, are lightning fast because they do everything locally and then sync.
Very thick client. Very thick.
Yes. Basically like Git: when you type git clone and execute it, it's basically a whole repository. It can live on your machine.
In a distributed fashion, right? And it has to handle merges.
Yeah. It's great.
I'm curious if we could explore better branching in that area, because we're already very close to implementing synchronization between two DBLab Engines. Yeah, but it's a different story.
Yeah.
I like the idea. So it might be a foundation... PGlite, I mean, might be a foundation for more apps which will live in the browser but then be synchronized with real Postgres, right?
Yeah, or even desktop apps. It doesn't have to be browser-based... well, I guess the desktop apps are built on top, using Electron, right?
Oh, yeah. Good point, good point.
And then if you don't have an internet connection, you still can work. Yeah, like offline mode. That's great.
I like the idea, actually. I like the idea. So you have a Postgres mirror. It reminds me of multi-master replication. This is complexity, all the same problems, like with merging and conflicts.
Yeah.
But at the same time, recent Postgres has this ability to create a subscription and avoid loops of replication.
Yeah, true.
So origin is something you can set. I want to replicate only the data which doesn't have an origin, which means it was born here. Local origin, basically, but it means no origin there. Somehow the terminology is a little bit strange, as usual in Postgres, right? But it's a great ability to break the loops, the infinite loops.
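The option being described (available since Postgres 16) looks like this; the connection string and names are hypothetical:

```sql
-- Only rows with no replication origin, i.e. "born" on the publisher,
-- are replicated, so two nodes subscribing to each other don't loop.
CREATE SUBSCRIPTION sub_from_node_b
    CONNECTION 'host=node-b dbname=app'
    PUBLICATION pub_all
    WITH (origin = none);
```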
Yeah, it fixes one of the problems, but it doesn't, yeah...
It doesn't fix conflicts.
Yeah, exactly. And, like, last-write-wins type things, yeah.
But if you need to have a very good, like, server-side application and so on, you choose Postgres, but then you have these very thick clients and you need to choose a database for them, and you choose SQLite. Then you need to synchronize between them somehow. It may be even worse.
Even harder.
I think that's the different data types and models.
Yeah. Yeah. Cool.
Cool.
What about the specialized workloads, like, well, vectors? And I was going to bring up search as well. I think search is slightly easier.
Let's make sure.
I don't actually have... I haven't written an app or been involved in an application that is heavily reliant on, like, very advanced search features. But the people I speak to that have swear by how good Elasticsearch is.
And this is also what I see. I touch Elastic only usually when working with some logs: application logs, Postgres logs, through Kibana. So ELK, or however this stack is called. But I also see many customers use Elastic and like it, and they shift from full-text search in Postgres to there. Okay, their choice. And I know the limitations of Postgres native full-text search.
Yeah.
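For reference, the Postgres-native full-text search being compared here is a tsvector expression plus a GIN index; a minimal sketch with made-up names:

```sql
CREATE TABLE docs (id bigserial PRIMARY KEY, body text);

-- Expression index; queries must use the same expression to hit it.
CREATE INDEX ON docs USING gin (to_tsvector('english', body));

SELECT id
FROM docs
WHERE to_tsvector('english', body)
      @@ plainto_tsquery('english', 'postgres search');
```

The ParadeDB benchmark criticism that follows is about exactly this: comparing against Postgres without such an index is not a fair baseline.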
I'm also... I don't understand ParadeDB, and I haven't seen the benchmarks. The benchmark I saw was only in the beginning, when they didn't create an index on tsvector.
They made a really interesting hire recently, I saw. Do you remember ZomboDB?
Yes, yes.
Do you remember? So, yeah... because there is money. Yeah, but where are the benchmarks? I don't understand what's there, because I don't see benchmarks. I tried
recently, because their CEO and founder, Philippe, approached me.
Nice.
And maybe asking, like... yeah, not maybe, asking, I guess, to spread the word. But I cannot spread the word if I don't see numbers.
Yeah, yeah.
If it's a performance company, show numbers. I might be missing something, because, like...
Well, the other product in this space for Postgres was an extension called ZomboDB, which kept an Elasticsearch index maintained, but with the data coming from Postgres originally. So I thought that was a really fascinating way of having both. A bit like when we talked about analytics: having the interface be from Postgres, but the actual query being run on something that isn't Postgres. So that was fascinating. And it was the founder of that ZomboDB that
recently joined parade. So that seems interesting as like this this this whole story seems
interesting. I don't understand it because I cannot find numbers at the same time I see everyone
mentions them, a lot of blog posts, a lot of GitHub stars, a lot of like a lot of noise, but there
are the numbers and benchmarks. So they removed it after initial ones. Yeah, well it would be so
that if you were to try and stay within Postgres, they seem like.
like the obvious thing to try, but I still see people choosing Elasticsearch,
and I'm not sure why.
Yeah, yeah.
Please, if someone listening to us can share benchmarks showing how ParadeDB behaves under load, with some number of rows, some number of queries, latencies, ideally buffers used, I would appreciate it. Because I'm still stuck on one question: what is this?
Cool. What about vectors?
Vectors... I have a picture for you that I saw yesterday near my house. These guys, definitely... yeah, those watching on YouTube, please check this out. So these guys are definitely experts in vector storage.
That's Nikolay joking. It's a removal company called Vector Moving and Storage.
I saw the storage part as well. I thought it was funny, yeah.
So we have turbopuffer, right?
Well, again, not Postgres, right?
Not Postgres at all, and not open source at all.
And not free at all.
And the data is in S3, and a new type of index is being used there.
I already forgot the name.
But, yeah, so not HNSW.
Oh, interesting, yeah.
Yeah, yeah.
I don't know.
So HNSW doesn't scale beyond a few million rows.
There is DiskANN, and we had the Timescale, now TigerData, guys, who developed an advanced version of that. In my perception, I don't see it scaling to a billion rows at all. And turbopuffer says they scale to billions of rows, but as I understand, it's a multi-tenant approach, so it's not one single set of a billion vectors.
I also don't understand that, but I see some development. PlanetScale, for MySQL, implemented the same index. And this development started, I think, at Microsoft, and maybe in China, actually: Microsoft in China, that is what I saw. And interestingly, they chose Postgres for prototyping. So this area is worth additional research. I started it and didn't have time, unfortunately. But it's a very interesting direction,
what's happening with vector search, because I think Postgres is losing right now.
Well, you say losing...
I think it's losing, 100%.
Well, bear with me. I think there are a lot of use cases that don't need the scale you're talking about, and a lot of those are fine on Postgres with pgvector. But you're probably talking about the ones that then succeed, do really well, and scale. They hit a limit relatively quickly, like within the first couple of years?
It's really hard to maintain a huge HNSW index, and latency-wise it's not good.
turbopuffer: I'm not fully sold on the idea of storing everything on S3.
Speaking of S3: a few weeks ago, AWS released S3 Vectors, and this might become mainstream. So S3 itself now supports vector indexes. Have you heard about this?
No.
I think this might become mainstream, but if a big change doesn't happen in the Postgres ecosystem, it will be like the case with full-text search and Elastic: it will be worse. How is it called... an alarmist, that's what I am today, right?
Well, this is the point of the episode, right?
It's like, it's almost by design that we're talking about the weaknesses.
I was feeling so good saying, like, I would choose Postgres for this, this, and this; I can rely on it. But here, since we have 1.1 or 1.2 million vectors in our RAG system for Postgres knowledge...
Is that mostly because of the mailing list?
Yeah.
Mailing lists, I think, 70%. But we also have a lot of other pieces: source code of various versions, and not only Postgres, but PgBouncer and so on, and documentation. It's a lot of stuff. Also blog posts.
And I feel not well thinking about how to add more.
And we are going to add more.
We're going to do 10x at some point.
Of course, we will check what TigerData has, but at the same time, I'm feeling not well in terms of...
What's the main issue? Latency? Is it query latency?
Okay, yeah: latency, index size, and index build time, all these things.
Interesting.
And the ability to have an additional filter, which with HNSW still lags, right?
Yeah, maybe. I do remember seeing an update on the pgvector repo, but I can't remember; I feel like they had something to address this, but I can't remember what.
I haven't touched this topic for several months; I might already be lagging in terms of updates. It's a very hard topic, of course, and very young as well, right?
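As an aside for readers: the pgvector update being alluded to here is most likely the iterative index scans added in 0.8.0 (linked in the show notes), which conceptually keep pulling candidates in distance order until enough rows survive a filter. A toy in-memory sketch of that idea, with names of our own invention and a sorted list standing in for the real index's ordered stream:

```python
import math

def iterative_filtered_search(query, rows, predicate, k, batch=4):
    """Toy sketch of an iterative index scan: pull candidates in
    distance order batch by batch, filter them, and keep scanning
    until k matches are found or the stream is exhausted."""
    # A real ANN index yields this ordering lazily; sorting here
    # just models the ordered candidate stream.
    ordered = sorted(rows, key=lambda r: math.dist(query, r["vec"]))
    matches, scanned = [], 0
    while scanned < len(ordered) and len(matches) < k:
        for row in ordered[scanned:scanned + batch]:
            if predicate(row) and len(matches) < k:
                matches.append(row)
        scanned += batch  # widen the scan instead of giving up early
    return matches
```

The point of iterating: a plain top-k scan with post-filtering can return fewer than k rows when the filter is selective, while widening the scan keeps going until the quota is met.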
Yeah, and not something I'm experiencing again.
It's more something I'm observing.
So you're definitely way ahead of me on this.
Yeah.
So I know companies: we have several customers who are on Postgres, but they chose turbopuffer additionally.
And Linear, you mentioned, for example.
Cursor and Linear, they also chose turbopuffer. Notion chose turbopuffer to store vectors.
I'm just checking the website.
They have some cool customers on this list, yeah.
Yeah.
And several more companies, which are not mentioned here, are also our customers in terms of Postgres consulting. And I was super surprised to see that something like a massive migration of vectors is happening: some moving company called turbopuffer helped them move their vectors to S3.
But yeah, it's interesting, and they use a different index, which is a younger idea. It's based on clustering and centroids, so ANN is implemented differently, not graph-like as in HNSW: basically, quickly figure out which cluster centroids are closest to our vector, and then work with those clusters. Quite a simple idea, actually, but I guess there are many complexities in the implementation.
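The centroid idea described here can be sketched in a few lines of Python. This is a hypothetical toy (a crude k-means plus cluster probing), not the actual index used by turbopuffer or SPANN: cluster the vectors once, then at query time scan only the few clusters whose centroids are nearest.

```python
import math
import random

def kmeans(vectors, k, iters=10, seed=0):
    """Crude k-means: returns (centroids, clusters) for a list of tuples."""
    rnd = random.Random(seed)
    centroids = rnd.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: math.dist(v, centroids[i]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its cluster.
        centroids = [
            tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

def ann_search(query, centroids, clusters, nprobe=2):
    """Probe only the nprobe clusters with the closest centroids,
    then brute-force search within those clusters."""
    order = sorted(range(len(centroids)),
                   key=lambda i: math.dist(query, centroids[i]))
    candidates = [v for i in order[:nprobe] for v in clusters[i]]
    return min(candidates, key=lambda v: math.dist(query, v))
```

With nprobe equal to the number of clusters this degenerates to exact search; a smaller nprobe trades recall for speed, which is the whole game in this family of indexes.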
Yeah, well, and it's cheap, right, being on S3?
Oh yes, and slow. But turbopuffer, I guess, has an additional layer that caches on regular disks closer to the database. So there is a caching layer, of course. Yeah, but it's much cheaper.
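That caching layer can be thought of as a read-through cache in front of slow object storage. A minimal sketch under assumptions of ours (the fetch_slow callable stands in for an S3-style round trip; nothing here is turbopuffer's actual design):

```python
from collections import OrderedDict

class ReadThroughCache:
    """Tiny LRU read-through cache: serve hot blocks from fast local
    storage, fall back to the slow tier (e.g. object storage) on miss."""
    def __init__(self, fetch_slow, capacity=128):
        self.fetch_slow = fetch_slow   # callable: key -> value (the slow tier)
        self.capacity = capacity
        self.hot = OrderedDict()       # the fast tier, in LRU order
        self.misses = 0

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        self.misses += 1
        value = self.fetch_slow(key)   # slow path: the S3-style round trip
        self.hot[key] = value
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)  # evict the least recently used
        return value
```

The economics follow from the hit rate: if most reads land in the hot tier, you pay S3 prices for capacity while paying S3 latency only on misses.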
Much cheaper, yeah. Actually, this is another area: if you have hundreds of terabytes of data, tiering of storage in Postgres is still not a fully solved problem, right? Unless you shard, and it's super explicit to shard and keep everything on disks. Especially if there is some archive data which you touch very rarely: I would prefer to have it on S3. Timescale Cloud, now TigerData, solved this in their solution. We also had an attempt to solve it from Tembo, which is not a Postgres company anymore.
Yeah, right. pg_tier, I think it was called. But this, I think, should be more and more needed over time.
Well, and it's a side effect of, like, the Crunchy Data approach of putting things in Iceberg. That also solves the problem, right? You can archive the data from Postgres at that point. So it's a similar solution, isn't it? So I guess these days, I would explore S3 Vectors at this point, if I needed to. Maybe I will, actually.
Well, you are going to need to, it sounds like.
Well, yeah. Postgres.ai infrastructure is mostly on Cloud SQL... no, not Cloud SQL, Google Cloud. That was wrong: Google Cloud, so one level up. Yeah. But S3 is AWS. So, but it's really interesting: it should be cheap, should be interesting to explore. Yeah, and it's a big challenge, a big challenge to the Postgres ecosystem. Or maybe an opportunity, if somebody creates a foreign data wrapper. Also, yeah, actually, why not? It would be a good project, by the way, right? There is a foreign data wrapper to S3 already, right? I think so; I think Supabase has one. I don't know, I'll check. It should just be extended to have vector functions, in my opinion. Okay, enough. It's kind of brainstorm mode already.
Thank you so much.
See you next week.
Thanks and catch you next week.
Yeah, bye-bye.