Latent Space: The AI Engineer Podcast - ⚡️The Rise and Fall of the Vector DB Category
Episode Date: May 1, 2025

Note from your hosts: we were off this week for ICLR and RSA! This week we’re bringing you one of the top episodes from our lightning podcast series, the shorter-format, YouTube-only side podcast we do for breaking news and faster turnaround. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB

The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching these high-dimensional vectors at scale. This gap gave rise to the vector database category, with companies like Pinecone leading the charge in 2022-2023 by defining specialized infrastructure for vector operations.

The category saw explosive growth following ChatGPT’s launch in late 2022, as developers rushed to build AI applications using Retrieval-Augmented Generation (RAG). This surge was partly driven by a widespread misconception that embedding-based similarity search was the only viable method for retrieving context for LLMs.

The resulting “vector database gold rush” saw massive investment and attention directed toward vector search infrastructure, even though traditional information retrieval techniques remained equally valuable for many RAG applications.

Full Video Episode

Timestamps
00:00 Introduction to Trondheim and Background
03:03 The Rise and Fall of Vector Databases
06:08 Convergence of Search Technologies
09:04 Embeddings and Their Importance
12:03 Building Effective Search Systems
15:00 RAG Applications and Recommendations
17:55 The Role of Knowledge Graphs
20:49 Future of Embedding Models and Innovations

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Okay, hi. So this is another lightning pod with Jo Kristian Bergum.
Did I get it right? You're over in Norway.
I'm over in Trondheim, Norway, in the center of Norway, yes.
What should people know about Trondheim?
It's a small city. It's easy to get around. There's a great technical university here.
The climate sucks a little bit, but it's easy to get things done in the winter.
Yeah, I've never been over. I've been to Øredev, I think, which is over near you guys.
But yeah, we're here to talk about, just generally, your hot takes on RAG, search, vector databases, all that stuff.
I think you've taken to publishing a lot more recently on X, and that's gone really well.
So I'll just kind of go into the main thing that everybody knows you for, which is your piece on vector databases, the rise and fall of vector databases.
So maybe give us the background of why you felt compelled to write this.
Yeah, first of all, I think I had to go a little bit back, right?
So I have a long background in search and working on infrastructure for search.
Like I've been in search, working on search systems for 20 years, at Yahoo
and also at the company Fast Search & Transfer here in Trondheim, Norway.
And also working on embeddings, neural search, all of those things, right?
leading up until ChatGPT, the ChatGPT moment, like November 2022.
And then there was some kind of cookbook, I think, from OpenAI where they said,
okay, this is how you can connect ChatGPT with your data, and here's embeddings.
And I think then a lot of developers, right, got into: this is how we can build search.
This is how we can do RAG.
And I think there was like this unnatural connection between retrieval
in RAG and vector embeddings, that it had to be vector embeddings.
By the way, I have a small role in that.
I actually was the one who wrote the Chroma example in the OpenAI
Cookbook.
You did?
Okay.
I was an angel investor in Chroma before they became a vector database.
And then I was just helping out.
I'm actually a huge fan of Jeff and Anton from Chroma.
I mean, I think Anton left, but I think they've done a great job at promoting retrieval
for AI and infrastructure and they did a lot of great things. So I really enjoy talking to them on X.
Anyway, and then we had the whole vector database thing. I think Pinecone was one of the pioneers
framing it as a new infrastructure category. If you need to work on embeddings, you have to
use a vector database. And naturally then, if you want to do anything in AI, then you need to have
a vector database. And that was my primary motivation for writing that piece and looking at a little
bit back, you know, what happened and where we are now and how I see it. And yeah, so that was
the pure, pure motivation. Okay. And the general thesis, I guess, if you want to just sort of
recap that like, you know, I think it's a very fast rise and fall. Like Pinecone was a dominant
player for a long, long time. And, you know, I don't know my exact sources because there's a lot of
rumors going back and forth, but apparently they went up to like a hundred million
ARR very, very quickly, to raise big rounds. And then suddenly a lot of people started
leaving, like it suddenly went from cool to uncool very quickly, and I don't understand why.
I don't understand that either. And I think also they repositioned a little bit going back
to their core messaging. If you go to their website now, it looks more developer-focused.
It's not the memory for AI. It's not like enterprise-ish. It's more towards developers now.
I think that they were trying to go back to their original roots.
I think that's a good thing.
But also, of course, there's been a lot of competition in this space, a lot of new companies.
One of the upcoming stars is turbopuffer, kind of same SaaS model, a little bit different pricing,
and they really talk to developers.
And I'm not saying that the companies are dying, right?
I'm just saying that the separate infrastructure category is dying, right?
Because you have vector search capabilities in almost any DB technology nowadays, right?
And you have it also in more traditional search engines like Elasticsearch, Solr, Vespa.
So I think there's like convergence on features on both parts.
And then you have things like pgvector in Postgres.
A lot of people get confused.
Okay, I have already a DB.
It has vector search.
Why do I need another DB?
Like a vector DB?
So the whole database concept.
So I think those companies, I mean, there are lots of great technology here.
Don't get me wrong on that.
But I don't say that the companies are dying,
but I'm saying that the category is dying.
So there's this distinction.
And I think a lot of people overlooked that and came at me and said that,
you know, because they had some kind of hate around some of these companies.
And they said, yeah, you know, go fuck Pinecone and whatnot, right?
But I actually say that the category is dying.
And I actually want to call these new companies what they are, which is search engines.
And I want to go back to the natural.
I think that's a more natural abstraction for connecting AI with knowledge and all the arguments for doing rag.
I think the natural concept there is search.
And I think one of the insights I have comes from, I use Windsurf a lot.
I love Windsurf.
Their, like, Cascade mode.
And if you ask it, like, what are the tools you have available?
It lists like 17, 18 tools, like edit files.
But there's also like things like search codebase, search the web, grep.
And these are like search abstractions, right?
And I love that idea where you like just connect the reasoning model with these tools that are essentially search tools.
And that can help the agent or the LLM to actually formulate the query, you know, should I do a grep.
Or should I do more of a semantic search,
or should I do more a keyword search,
or should I just search the web?
So I think that's more like the natural abstraction,
instead of jumping into vectors and how you represent,
that is more of like a detail of how you implement search.
Yeah.
It's interesting that we fixated a lot on vector,
like dense embeddings and all that.
And I think now we're sort of broadening out.
I would also mention that Chroma,
I think from the start, has always said they're kind of going after information retrieval
and not so much, you know, the narrow sense of RAG.
Yeah, I think like broadly this is the consensus that, you know, like the category was,
was never really going to be lasting for that long.
It was just, there was just a brief period of time. One of my favorite early tweets
in AI, you know, in this post-ChatGPT phase, was that I
summed up all of the fundraising that happened in vector databases,
and it was something like $230 million, all put into all the vector databases,
and that was more than the entire lifespan fundraising of MongoDB.
Right.
So like basically they cannot all win because they've already taken more money than
supports a, you know, one of the de facto winner companies in NoSQL.
Yeah.
Interesting.
Yeah, I think also on MongoDB, right, they brought a new category in NoSQL, right?
And but nowadays, all the other database players have also caught up, right?
So now even MongoDB has relational SQL, right?
So there's always like this convergence, but MongoDB kind of sticks.
But I don't think that for Pinecone, which was originally leading that movement, it will
like stick in the same way.
It's too narrow.
It's too narrow.
Yeah.
But I would like to say one more thing about embedding.
So people like, okay, Joe, but, you know, embeddings is really important.
And I also think that embeddings is really important, right?
Because you can represent more data than ever before, right?
Multimodal, whatnot.
Run into a neural network, get an embedding representation,
and then you can move this embedding representation around the vector space
and adjust to your domain or whatever you're doing.
So it's really important.
But what happened was that it
went mainstream, right? It went from these big tech companies like Google, Yahoo, Facebook,
all of them, you know, that had been working on embeddings for a long time for a lot of different tasks,
right? But post-ChatGPT, when we got the embedding APIs from OpenAI, it suddenly became
mainstream. Like every developer would start using embeddings, right, and what to do with them
and similarity search and so forth. So I'm not against embeddings, right? Embeddings are here to
stay. It's just that it's not only about similarity searches in this kind of embedding
space. But, and then I think more people actually realize that, you know, you actually need something
more to it than just an embedding and a cosine similarity to do search well, like things like
freshness or authority and all the other signals that really play a role in web search.
And I remember one of the OpenAI guys wrote, like,
you can embed the whole web and then you can build the next generation web search.
And I thought like, okay, just looking at semantic similarity,
that's not going to play out too well.
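To make the point about signals beyond cosine similarity concrete, here is a minimal Python sketch. The weights, the 30-day half-life, the `authority` field, and the two toy documents are all assumptions chosen for illustration; they do not come from the conversation or from any particular search engine.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def freshness(age_days, half_life_days=30.0):
    """Exponential decay: a document loses half its freshness every half-life."""
    return 0.5 ** (age_days / half_life_days)

def blended_score(query_vec, doc, w_sim=0.6, w_fresh=0.2, w_auth=0.2):
    """Hypothetical linear blend of similarity, freshness and authority."""
    return (w_sim * cosine(query_vec, doc["vec"])
            + w_fresh * freshness(doc["age_days"])
            + w_auth * doc["authority"])

# Toy documents: a stale near-duplicate versus a fresh, authoritative page.
docs = [
    {"id": "stale_copy", "vec": [1.0, 0.0], "age_days": 900, "authority": 0.1},
    {"id": "fresh_source", "vec": [0.9, 0.1], "age_days": 2, "authority": 0.9},
]
query = [1.0, 0.0]

by_similarity = max(docs, key=lambda d: cosine(query, d["vec"]))["id"]
by_blend = max(docs, key=lambda d: blended_score(query, d))["id"]
print(by_similarity, by_blend)
```

With these toy numbers, pure similarity prefers the stale copy while the blended score prefers the fresh source. Real systems usually learn such weights from relevance data instead of hand-picking them.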
So, yeah, I mean, they're trying to sell you their model, right?
So what they're going to do is say those very hypey things.
Yeah, I mean, the way that I put it is always you're always going to want to do a hybrid query.
You always want to add metadata and like, you know, do all that stuff.
I think my question to you is maybe a very ageless question, which is, should they all be the same system?
Right?
Your search system, like, Elasticsearch is typically like you duplicate your, whatever your main storage or record is.
And then you have that search index that is basically almost a complete duplicate.
Like you just copy over the documents.
Do you believe in that?
Do you think there's a convergence here?
This is a fantastic question.
I think for a lot of use cases, right?
And if you're already using some database like Postgres, right, it has this great extension, pgvector, right?
And I know that I tweeted things about pgvector that were true at the start, around the limitations of pgvector.
But there was a rally around pgvector, like adding new algorithms, introducing actually two algorithms, both IVF and HNSW, adding halfvec, adding binary vectors.
Actually, what you can see is that pgvector is doing more in the capabilities of vector search
than some of the real vector database players, right?
So if you're only looking at like vector search capabilities and you already have your
data in Postgres and you're operating at a reasonable scale, I think it's fair to use Postgres.
Sorry?
I think it's fair to use Postgres or use one database if you're like operating at, you're not operating
at a really large scale and you do some vector search related workloads and you also use
a database for other types of workload, then it might make sense to just keep the data there.
But if you're actually building something that really depends on search quality and your
business depends on it, yeah, then definitely I think that you should consider, you know,
actually using a real kind of retrieval or search engine to represent the data there, right?
Yeah.
And is the search system,
how closely entwined are RecSys and search in your mind?
Yeah, and that's the thing with embeddings, right?
I think with embeddings and embedding-based retrieval,
because embedding-based retrieval has been used for a long time in recommender systems,
like large-scale recommender systems like, you know, TikTok or Yahoo News or things like that.
Apparently TikTok published their RecSys recently,
which is kind of interesting.
Yeah, it's like a cascade of,
there's always in this system that operates at a really large scale,
there's always a cascade of different stages
where you first have to retrieve over the candidate pool,
and it's typically using embedding-based retrieval.
And then you have layers of re-ranking layers
that finally you end up with 100 candidates or something like that
that you actually present to the user.
So I think definitely there's convergence
and that embedding-based retrieval
is also now more common for
search systems.
So there's convergence there on how it's actually solved
on the technology spectrum.
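The cascade described above, cheap embedding-based retrieval over the full candidate pool followed by costlier re-ranking of the survivors, can be sketched in a few lines of Python. The `expensive_score` function, its weights, and the toy items are hypothetical stand-ins for a learned ranking model and a real catalog.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def retrieve(query_vec, items, k):
    """Stage 1: cheap embedding-based retrieval over the whole candidate pool."""
    return sorted(items, key=lambda it: dot(query_vec, it["vec"]), reverse=True)[:k]

def rerank(candidates, score_fn, k):
    """Later stage: a costlier scorer applied only to the few survivors."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]

def expensive_score(item):
    # Hypothetical stand-in for a learned ranker; the weights are made up.
    return 0.7 * item["predicted_click"] + 0.3 * item["recency"]

items = [
    {"id": "a", "vec": [0.9, 0.1], "predicted_click": 0.20, "recency": 0.1},
    {"id": "b", "vec": [0.8, 0.3], "predicted_click": 0.90, "recency": 0.8},
    {"id": "c", "vec": [0.1, 0.9], "predicted_click": 0.95, "recency": 0.9},
    {"id": "d", "vec": [0.7, 0.2], "predicted_click": 0.50, "recency": 0.5},
]
user_vec = [1.0, 0.0]

candidates = retrieve(user_vec, items, k=3)        # full pool -> 3 candidates
final = rerank(candidates, expensive_score, k=2)   # 3 candidates -> 2 shown
candidate_ids = [it["id"] for it in candidates]
final_ids = [it["id"] for it in final]
print(candidate_ids, final_ids)
```

Item `c` would score highest under the expensive scorer, but it never reaches that stage because the first stage filtered it out. That is the usual trade-off in a cascade: the retrieval stage caps what later stages can recover.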
Yeah.
Any other thoughts on like,
I guess the confusion for a lot of folks
who are newer to this, right?
They understand now that you cannot just have
embeddings only and cosine similarity only.
It's just the sequencing of like, what should I do first?
What should I do second?
What should I do third?
Everyone says like, you know, re-ranking is like super important,
but that, you know, it adds like maybe like three to four percent to your results.
And maybe that's like the lowest hanging fruit.
So I'm always trying to figure out like what should I recommend to people, right?
Like that they should start with.
Like, you know, Postgres or MongoDB as their transactional, you know,
store. Then they could split it out to maybe use Elasticsearch or Vespa. I don't know if
that would be the recommendation there. Redis, I think, is also trying to push themselves there very,
very hard. And then you add the RecSys. Is that a good sequence? I think it's really hard to come
up with general recommendations, like without knowing what you're doing. But if you're like
looking to build a RAG application, like, I think most people are interested in
something related to RAG, right?
When you have some data and you have to transform your data, I think first,
I think it's Hamel that always talks about look at your data, or, you know, everyone's talking
about look at the data.
So first of all, you know, how do you get your data in a cleaned-up way?
Is it like PDFs or whatnot?
There's things there.
I think actually that a very strong baseline is the classical BM25, like algorithm that's
been around for 30 years, right?
it's keyword matching, but it offers a very useful baseline for a lot of different search
use cases, because it gives you that baseline, right?
Then you can start looking at using an off-the-shelf embedding model as well,
and all of the engines, more or less, most of the engines have some kind of hybrid search
capabilities, so start to play with that.
And then if you can afford it, both from a latency perspective and the cost perspective,
you can look at adding like a re-ranking layer on top of that.
How you stitch that together depends on, you know, your framework of choice.
But I think most of you can stitch this together, like with multiple different
APIs depending on your budget, I guess.
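The sequence above (BM25 baseline first, then hybrid with embeddings) can be sketched in plain Python. This is a toy under stated assumptions: whitespace tokenization, the common `k1=1.2`, `b=0.75` defaults, one widely used non-negative IDF variant, and reciprocal rank fusion (RRF) as the hybrid method, which the speakers do not name. The vector ranking is hard-coded as a hypothetical output of some embedding model.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BM25:
    """Minimal in-memory BM25 scorer over a small list of documents."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.n = len(docs)
        self.avg_len = sum(self.lens) / self.n
        # Document frequency: in how many documents each term appears.
        self.df = Counter(term for tf in self.tfs for term in tf)

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, i):
        tf = self.tfs[i]
        norm = 1 - self.b + self.b * self.lens[i] / self.avg_len
        return sum(self.idf(t) * tf[t] * (self.k1 + 1) / (tf[t] + self.k1 * norm)
                   for t in tokenize(query) if t in tf)

    def rank(self, query):
        return sorted(range(self.n), key=lambda i: self.score(query, i), reverse=True)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each input ranking."""
    scores = Counter()
    for ranking in rankings:
        for pos, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + pos)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

docs = [
    "pgvector adds vector search to postgres",
    "bm25 is a classic keyword ranking function",
    "hybrid search combines keyword ranking and vector search",
]
bm25 = BM25(docs)
keyword_ranking = bm25.rank("keyword ranking")
vector_ranking = [2, 0, 1]  # hypothetical output of an embedding-based search
fused = rrf([keyword_ranking, vector_ranking])
print(keyword_ranking, fused)
```

RRF only needs rank positions, so it avoids having to calibrate BM25 scores against cosine similarities. A re-ranking layer, if latency and cost allow, would then take the top of `fused` as its input.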
Yeah.
You know, I always tend to recommend people to do this offline as much as possible, like
batch online and whatever.
Most people don't need fully online systems.
Yeah.
And that's a friction point because I've been used to working on kind of constrained online systems,
like at a pretty significant scale.
And when there's always like everything is online, needs to be a low latency.
And then I have problems adjusting to, you know, when you want to do things at a much lower scale.
So I'll give you an example, like calling out to some kind of embedding API to get JSON floats.
You know, it's not something,
you know, you want to do if you're running at thousands of QPS.
You don't want to add that dependency.
You want to have something local, something that is faster.
So I've always been like, okay, I'm going to call out to this endpoint.
It's going to take 300 milliseconds to get this large float.
It's something that I like, oh, shrug.
But now I'm shifting towards more, you know, it's easy.
It's an API-based service.
You don't have to think about it.
It's just there.
So it's much easier to build from, right, to have something
that is API-based.
So I'm trying to embrace that.
I see, I see.
No, so when I say offline, I mean more like not in the critical path,
like a batch systems because, yeah, and it's interesting.
I don't know if you've looked at Postgres ML for running the models alongside
of the database.
Are you bullish on that kind of stuff?
No, I'm not.
I'm sorry.
I'm not.
I think this is, yeah, we've also seen other players that try
to move a lot of the logic into the database:
agentic, embedding inference and whatnot, LLMs.
I think the right direction is to keep infrastructure
a little bit separate from that,
because they're different scaling properties.
I think people can stitch those two things together
instead of trying to do everything with one single platform.
So no, I'm not bullish on that.
Because I don't believe in the developer experience
of writing like these huge SQL statements
for transforming,
data from this and then embedding it and then writing it back and expressing this in the database.
Like, what does this do to my database? Is it like calling out? What's going on? I tend to want
to have more control over cost and performance and what's going on than just writing some
really large SQL to execute. Yeah, it's interesting. I think there's this constant tension between
what lives in the database versus what is an external system. I don't think it's clear-cut,
like, you know, the classic,
like the cron service,
which we have in Supabase.
Okay, so cool.
Like, any other, like, hot takes or, you know,
what are the biggest criticisms that you got after you published this?
You know, like, you know, what do you agree with?
What do you, what do you disagree with, you know, just?
Yeah, I think one of the things that people pointed out,
if something goes semi-viral,
after a few days, you discover that there's a lot of replies
that you didn't see and you're like, okay.
But I think one of the things that stood out,
was that people said that Jo is saying that RAG is dead because vector database infrastructure
is dead, right? And I think that was a misunderstanding as well. And I think that comes from
people making the connection between RAG and vector databases, like, it's so strong. So when I'm saying
that the vector database infrastructure category is dead, it's like, okay, RAG is dead. And I think that
RAG is definitely not dead, right? Augmenting AI with retrieval or search
is still going to be relevant,
and I think it's going to be relevant for a very long time, right?
So that was one of the things.
I mean, so, you know, now we have a 10 million token model, longer context,
and you have the same cycle repeat.
Every time, every time.
I'm just like, you know, for me, you know,
I put out this, like, cryptic tweet.
I was like, you know, this is,
Llama 4 is going to reignite the long context versus RAG debate,
but it will actually resolve the debate
but not in the way that you want
this is too cryptic to me
no it's just like
there's like five other guys
saying, you know, long context kills
RAG, like RIP RAG, and I'm just like, guys,
you're idiots, or, like,
you're engagement farming, basically. Like, a lot,
most likely, you'd believe,
like, they know what they're doing
and they're just saying nonsense
just to have fun,
and, like, people shouldn't take them seriously.
Yeah. But also there's nuance to this, right? So I've seen people do RAG when there's no need to do RAG,
meaning that, you know, if you have one PDF, like with visual information and things, and you want
to chat with that, that case is probably like that, if you don't have like high QPS and things
like that. So I think there's nuances around this. I had a call with someone that
had like 300 articles, and I said, you know, this will just fit into the context window of one of these Gemini models.
You don't have to have a vector database for this case.
And they were so surprised when I said this, can you really do that?
Yeah.
But it's also, look at it.
It's like just, we had like 4K context window, right?
And now 10 million, right?
So, and that's fast, right?
And then people are still running their initial demos from early January,
2023, right?
So where you were dealing with 4K or 8K, right?
So some parts of it is still not relevant now because we have longer context windows.
But I think retrieval, of course, it's going to be there for a long time.
One example I love to bring up is like one of these small toy data sets from
TREC COVID is like 170,000 documents.
and it's already 36 million tokens.
And you're not going to load all of that, you know, for a single query.
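The arithmetic behind these two examples is easy to check. The corpus figures (170,000 documents, 36 million tokens) and the 10 million token window are the ones quoted above; the 1,500 tokens per article and the 1 million token window for the 300-article case are assumptions for illustration only.

```python
corpus_docs = 170_000         # TREC-COVID size as quoted in the conversation
corpus_tokens = 36_000_000    # token count as quoted in the conversation
context_window = 10_000_000   # the "10 million" context mentioned above

avg_tokens_per_doc = corpus_tokens / corpus_docs   # roughly 212 tokens per document
windows_needed = corpus_tokens / context_window    # 3.6 full windows per query

def fits_in_context(n_docs, tokens_per_doc, window):
    """True if the whole collection fits into a single prompt."""
    return n_docs * tokens_per_doc <= window

# Assumed figures: 300 articles of about 1,500 tokens, 1M-token window.
small_case_fits = fits_in_context(300, 1_500, 1_000_000)
# The full corpus against the 10M window, using the average document length.
corpus_fits = fits_in_context(corpus_docs, avg_tokens_per_doc, context_window)
print(round(avg_tokens_per_doc), windows_needed, small_case_fits, corpus_fits)
```

Under those assumptions the 300 articles fit with room to spare, while even this small benchmark corpus overflows a 10 million token window several times over, before considering cost or latency per query.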
Yeah, awesome.
Do you have a take on knowledge graphs and GraphRAG?
I think that GraphRAG, well, I have a lot of takes around it.
I think that one issue, I mean, a graph database is a database that kind of solves one particular problem.
And it does it well: to traverse the edges in the graph and, you know, random access and jump across.
But the core issue is actually to build the knowledge graph, right?
The entities, the relationships.
So if you say graph databases or graph rag is going to kill vector rag and all that discussion,
I think the first issue is to actually build the knowledge graph in the first place, right?
And if you use a search engine or a dedicated graph DB to actually speed up and accelerate the searches,
okay, fine, but I think people like, okay, if I'm going to do graph rag, then I need a graph database.
And I hate that connection between doing something and then connecting it to some specific
technology. And I think a lot of people do that, right? You jump from some concept into some
technology. You can also do graph exploration with a search engine, right? You can, so you don't
need a specific technology to do it. And can graph rag be better than vector rag?
Yeah, for sure, in some cases, it might make sense or hybrid or whatnot.
But I think people get caught up in some specific technology all the time.
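As a small illustration of graph exploration without a dedicated graph database, the sketch below stores entity triplets in a plain Python list and walks them breadth-first. The triplets, relation names, and the `inverse:` prefix are toy choices made for this example; in practice the triplets could live in any store that supports lookup by entity, including a search engine index.

```python
from collections import defaultdict, deque

# Toy (subject, relation, object) triplets.
triplets = [
    ("Pinecone", "is_a", "vector database"),
    ("pgvector", "extends", "Postgres"),
    ("pgvector", "implements", "HNSW"),
    ("Vespa", "is_a", "search engine"),
    ("Vespa", "implements", "HNSW"),
]

def build_index(triplets):
    """Adjacency lists keyed by entity, storing edges in both directions."""
    index = defaultdict(list)
    for subj, rel, obj in triplets:
        index[subj].append((rel, obj))
        index[obj].append((f"inverse:{rel}", subj))
    return index

def neighbors_within(index, start, max_hops):
    """Breadth-first search: map each reachable entity to its hop distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for _, other in index[node]:
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    seen.pop(start)
    return seen

index = build_index(triplets)
related = neighbors_within(index, "pgvector", max_hops=2)
print(related)
```

The traversal itself is a few lines; as noted in the conversation, the hard part is producing good triplets in the first place.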
Yeah, but I think that's okay.
But I'm still trying to validate the presence of knowledge graphs in LLM applications
because obviously with LLMs, it is much easier, better to create these entity triplets and all that.
So theoretically, it should be better.
Yeah, yeah.
In the past, knowledge graph has been a dirty word.
But now maybe it's not.
Maybe, maybe, maybe, I think with LLMs, you can do a lot more things around data
generation, you know, in general.
So generating those triplets is a bottleneck, right?
It's been a bottleneck.
But now we have LLMs.
So I agree, you know, now it could be easier to actually build what matters, right, which is
those triplets.
Yeah.
Okay.
Awesome.
Any other opportunities that you find?
I know that you mentioned Jina.
I think they're a prominent European startup in, you know,
RAG. And then I think over here, Voyage just got acquired by
Nvidia. You know, anything on the embedding side, like, do we need
a lot better embedding models? Do we, this is what we have from the big labs,
good enough? Oh, I hope to see, I mean, Voyage was really leading the pack on
doing domain specific embedding models, like legal PDFs. And what I want to see
is more embedding models in that direction,
where you essentially represent this PDF as an embedding
or multiple embeddings for legal domain or finance or health.
I hope to see that grow
so that you can have a better starting point
than just those text models.
And I've been a huge believer in using visual language models
as a backbone for embedding models
where you essentially take a screenshot of a page.
You don't have to go through OCR, so you then get a much richer representation.
You don't have to go to these complex processing pipelines.
So I hope to see more innovation.
I'm not sure if it's going to happen because I think it's a difficult business model to be in
because you have to have an API-based service and you have to do batching
and you have to make up for the compute.
And then, you know, are people willing to pay for it?
And I think maybe that's why Voyage got acquired.
I think also Jina is doing a lot
of great things in this space now, especially in European languages.
But I think every company is trying to move up in the value ladder, right?
They want to move into enterprise search or move into a different direction.
So, yeah, but I do hope that we will see more and better, like general embedding models.
Yeah, yeah.
I mean, I'm sure, I think the voyage guys are very happy because it seems like they got acquired
for a lot.
Yeah.
Okay.
So, okay, cool.
Anything else before we wrap, any calls to action?
any, you know, parting rants on the topics of the day?
No, I would love.
I mean, if you want to connect with me, you know, for the audience, you can find me on
X under the handle jobergum there.
So I love a shout-out on X.
So I hang out there quite often.
Yeah.
Yeah, I mean, it's where the AI community is, you know.
Although I've been, I'm always trying to like grow on LinkedIn or YouTube.
I mean, you know, there's a lot more people there, you know.
Twitter is sort of like this, like, echo chamber.
Yeah, but it's not the same.
I mean, X, us.
I mean, we wouldn't have this meeting me and you without X there, right?
So it's a great place for really high signal to noise.
And I think the AI community there is really great.
Yeah, awesome.
Well, thank you.
Thank you so much for having this.
It's been awesome.
