Latent Space: The AI Engineer Podcast - ⚡️The Rise and Fall of the Vector DB Category

Episode Date: May 1, 2025

Note from your hosts: we were off this week for ICLR and RSA! This week we’re bringing you one of the top episodes from our lightning podcast series, the shorter format, YouTube-only side podcast we do for breaking news and faster turnaround. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB

The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching these high-dimensional vectors at scale. This gap gave rise to the vector database category, with companies like Pinecone leading the charge in 2022-2023 by defining specialized infrastructure for vector operations.

The category saw explosive growth following ChatGPT’s launch in late 2022, as developers rushed to build AI applications using Retrieval-Augmented Generation (RAG). This surge was partly driven by a widespread misconception that embedding-based similarity search was the only viable method for retrieving context for LLMs.

The resulting “vector database gold rush” saw massive investment and attention directed toward vector search infrastructure, even though traditional information retrieval techniques remained equally valuable for many RAG applications.

Timestamps
00:00 Introduction to Trondheim and Background
03:03 The Rise and Fall of Vector Databases
06:08 Convergence of Search Technologies
09:04 Embeddings and Their Importance
12:03 Building Effective Search Systems
15:00 RAG Applications and Recommendations
17:55 The Role of Knowledge Graphs
20:49 Future of Embedding Models and Innovations

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe

Transcript
Starting point is 00:00:02 Okay, hi. So this is another lightning pod with Jo Kristian Bergum. Did I get it right? You're over in Norway. I'm over in Norway, in Trondheim, in the center of Norway, yes. What should people know about Trondheim? It's a small city. It's easy to get around. There's a great technical university here. The climate sucks a little bit, but it's easy to get things done in the winter. Yeah, I've never been over. I've been to Øredev, I think, which is over near you guys. But yeah, we're here to talk about, just generally, your hot takes on RAG, search, vector databases, all that stuff.
Starting point is 00:00:39 I think you've taken to publishing a lot more recently on X, and that's gone really well. So I'll just kind of go into the main thing that everybody knows you for, which is your piece on vector databases, the rise and fall of vector databases. So maybe give us the background of why you felt compelled to write this. Yeah, first of all, I think I have to go a little bit back, right? So I have a long background in search and working on infrastructure for search. Like, I've been in search, working on search systems, for 20 years, at Yahoo and also at the company Fast Search & Transfer here in Trondheim, Norway. And also working on embeddings, neural search, all of those things, right?
Starting point is 00:01:24 leading up until ChatGPT, the ChatGPT moment, like November 2022. And then there was some kind of cookbook, I think, from OpenAI where they said, okay, this is how you can connect ChatGPT with your data, and here's embeddings. And I think then a lot of developers, right, got into: this is how we can build search. This is how we can do RAG. And I think there was like this unnatural connection between retrieval in RAG and the idea that it had to be vector embeddings. By the way, I have a small role in that.
Starting point is 00:01:59 I actually was the one who wrote the Chroma example in the OpenAI cookbook. You did? Okay. I was an angel investor in Chroma before they became a vector database. And then I was just helping out. I'm actually a huge fan of Jeff and Anton from Chroma. I mean, I think Anton left, but I think they've done a great job at promoting retrieval
Starting point is 00:02:22 for AI and infrastructure and they did a lot of great things. So I really enjoy talking to them on X. Anyway, and then we had the whole vector database thing. I think Pinecone was one of the pioneers framing it as a new infrastructure category. If you need to work on embeddings, you have to use a vector database. And naturally then, if you want to do anything in AI, then you need to have a vector database. And that was my primary motivation for writing that piece and looking a little bit back, you know, at what happened and where we are now and how I see it. And yeah, so that was the pure, pure motivation. Okay. And the general thesis, I guess, if you want to just sort of recap that, like, you know, I think it's a very fast rise and fall. Like Pinecone was a dominant
Starting point is 00:03:13 player for a long, long time. And, you know, I don't know my exact sources because there's a lot of rumors going back and forth, but apparently they went up to like a hundred million ARR very, very quickly, to raise a big round. And then suddenly a lot of people started leaving, like they suddenly went from cool to uncool very quickly, and I don't understand why. I don't understand that either. And I think also they repositioned a little bit, going back to their core messaging. If you go to their website now, it looks more developer-focused. It's not the memory for AI. It's not like enterprise-ish. It's more towards developers now. I think that they were trying to go back to their original roots.
Starting point is 00:03:53 I think that's a good thing. But also, of course, there's been a lot of competition in this space, a lot of new companies. One of the upcoming stars is turbopuffer, kind of the same SaaS model, a little bit different pricing, and they really talk to developers. And I'm not saying that the companies are dying, right? I'm just saying that the separate infrastructure category is dying, right? Because you have vector search capabilities in almost any DB technology nowadays, right? And you have it also in more traditional search engines like Elasticsearch, Solr, Vespa.
Starting point is 00:04:33 So I think there's like convergence on features on both parts. And then you have things like pgvector in Postgres. A lot of people get confused. Okay, I already have a DB. It has vector search. Why do I need another DB? Like a vector DB? So the whole database concept.
Starting point is 00:04:52 So I think those companies, I mean, there is lots of great technology here. Don't get me wrong on that. I'm not saying that the companies are dying, but I'm saying that the category is dying. So there's this distinction. And I think a lot of people like overlooked that and like came at me and said that, you know, because they had some kind of hate around some of these companies. And they said, yeah, you know, go fuck Pinecone and whatnot, right?
Starting point is 00:05:17 But I'm actually saying that the category is dying. And I actually want to call these new companies search engines. And I want to go back to the natural. I think that's a more natural abstraction for connecting AI with knowledge and all the arguments for doing RAG. I think the natural concept there is search. And I think one of the insights I have is from, I use Windsurf a lot. I love Windsurf, their, like, Cascade mode.
Starting point is 00:05:42 And if you ask it, like, what are the tools you have available? It lists like 17, 18 tools, like edit files. But there are also things like search codebase, search the web, grep. And these are like search abstractions, right? And I love that idea where you just connect the reasoning model with these tools that are essentially search tools. And that can help the agent or the LLM to actually formulate the query, you know, should I do a grep, or should I do more of a semantic search, or should I do more of a keyword search,
Starting point is 00:06:15 or should I just search the web? So I think that's more like the natural abstraction, instead of jumping into vectors and how you represent things; that is more of a detail of how you implement search. Yeah. It's interesting that we fixated a lot on vectors, like dense embeddings and all that. And I think now we're sort of broadening out.
Starting point is 00:06:39 I would also mention that Chroma, I think, from the start has always said they're kind of going after information retrieval and not so much, you know, the narrow sense of RAG. Yeah, I think like broadly this is the consensus that, you know, like the category was never really going to be lasting for that long. There was just a brief period of time. One of my favorite early tweets in AI, you know, in this post-ChatGPT phase was, like, I summed up all of the fundraising that happened in vector databases for
Starting point is 00:07:14 and it was something like $230 million and all put into all the vector databases and that was more than the entire lifespan fundraising of MongoDB. Right. So like basically they cannot all win because they've already taken more money than supports a, you know, one of the de facto winner companies in NoSQL. Yeah. Interesting. Yeah, I think also on MongoDB, right, they brought a new category in NoSQL, right?
Starting point is 00:07:48 But nowadays, all the other database players have also caught up, right? So now even MongoDB has relational SQL, right? So there's always like this convergence, but MongoDB kind of sticks. But I don't think that for Pinecone, which was originally leading that movement, it will stick in the same way. It's too narrow. It's too narrow. Yeah.
Starting point is 00:08:11 But I would like to say one more thing about embeddings. So people are like, okay, Jo, but, you know, embeddings are really important. And I also think that embeddings are really important, right? Because you can represent more data than ever before, right? Multimodal, whatnot. Run it through a neural network, get an embedding representation, and then you can move this embedding representation around the vector space and adjust it to your domain or whatever you're doing.
Starting point is 00:08:35 So it's really important. But what happened was that it went mainstream, right? These big tech companies like Google, Yahoo, Facebook, all of them, you know, have been working on embeddings for a long time for a lot of different tasks, right? But post-ChatGPT, when we got the embedding APIs from OpenAI, it suddenly became mainstream. Like every developer would start using embeddings, right, and asking what to do with them, and similarity search and so forth. So I'm not against embeddings, right? Embeddings are here to stay. It's just that it's not only about similarity searches in this kind of embedding
Starting point is 00:09:14 space. And then I think more people actually realize that, you know, you actually need something more than just an embedding and a cosine similarity to do search well, like things like freshness or authority and all of the other signals that really play a role in web search. And I remember one of the OpenAI guys wrote, like, you can embed the whole web and then you can build the next generation web search. And I thought, like, okay, just looking at semantic similarity, that's not going to play out too well. So, yeah, I mean, they're trying to sell you their model, right?
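To make the point about extra ranking signals concrete, here is a minimal sketch that is not from the episode. It blends cosine similarity with a freshness decay and an authority score. The weights, the 30-day half-life, the field names (`vec`, `published_day`, `authority`), and the toy documents are all assumptions chosen for illustration, not values any speaker recommended.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def blended_score(query_vec, doc, now_days,
                  w_sim=0.6, w_fresh=0.2, w_auth=0.2, half_life_days=30.0):
    """Weighted sum of similarity, freshness, and authority (hypothetical weights)."""
    sim = cosine(query_vec, doc["vec"])
    age = max(0.0, now_days - doc["published_day"])
    freshness = 0.5 ** (age / half_life_days)  # halves every half_life_days
    return w_sim * sim + w_fresh * freshness + w_auth * doc["authority"]

# Toy corpus: one stale, low-authority exact match and one fresh, trusted near match.
docs = [
    {"id": "old_exact", "vec": [1.0, 0.0], "published_day": 0.0, "authority": 0.1},
    {"id": "fresh_close", "vec": [0.9, 0.1], "published_day": 360.0, "authority": 0.9},
]
query = [1.0, 0.0]

by_cosine = sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)
by_blend = sorted(docs, key=lambda d: blended_score(query, d, now_days=365.0), reverse=True)

print([d["id"] for d in by_cosine])
print([d["id"] for d in by_blend])
```

With these made-up numbers, pure cosine similarity puts the stale exact match first, while the blended score prefers the fresh, higher-authority document. Production systems usually learn such weights rather than hand-pick them.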
Starting point is 00:09:51 So what they're going to do is say those very hypey things. Yeah, I mean, the way that I put it is you're always going to want to do a hybrid query. You always want to add metadata and, like, you know, do all that stuff. I think my question to you is maybe a very ageless question, which is, should they all be the same system? Right? Your search system, like, Elasticsearch is typically where you duplicate whatever your main storage or record is. And then you have that search index that is basically almost a complete duplicate. Like you just copy over the documents.
Starting point is 00:10:24 Do you believe in that? Do you think there's a convergence here? This is a fantastic question. I think for a lot of use cases, right? And if you're already using some database like Postgres, right, it has this great extension, pgvector, right? And I know that I tweeted things about pgvector that were true at the start, around the limitations of pgvector. But there was a rally around pgvector, like adding new algorithms, introducing actually two algorithms, both IVF and HNSW, adding halfvec, adding binary vectors. Actually, what you can see is pgvector doing more in the capabilities of vector search
Starting point is 00:11:08 than some of the real vector database players, right? So if you're only looking at like vector search capabilities and you already have your data in Postgres and you're operating at a reasonable scale, I think it's fair to use Postgres. Sorry? I think it's fair to use Postgres or use one database if you're like operating at, you're not operating at a really large scale and you do some vector search related workloads and you also use a database for other types of workload, then it might make sense to just keep the data there. But if you're actually building something that really depends on search quality and your
Starting point is 00:11:50 business depends on it, yeah, then definitely I think that you should consider, you know, actually using a real kind of retrieval or search engine to represent the data there, right? Yeah. And is the search system, how closely entwined are RecSys and search in your mind? Yeah, and that's the thing with embeddings, right? Embeddings, I think, with embeddings and embedding-based retrieval, because embedding-based retrieval has been used for a long time in recommender systems,
Starting point is 00:12:19 like large-scale recommender systems like, you know, TikTok or Yahoo News or things like that. Apparently TikTok published their RecSys recently, which is kind of interesting. Yeah, it's like a cascade. In these systems that operate at a really large scale, there's always a cascade of different stages where you first have to retrieve over the candidate pool, and it's typically using embedding-based retrieval.
Starting point is 00:12:50 And then you have layers of re-ranking so that finally you end up with 100 candidates or something like that that you actually present to the user. So I think definitely there's convergence, and embedding-based retrieval is also now more common for search systems. So there's convergence there on how it's actually solved
Starting point is 00:13:13 on the technology spectrum. Yeah. Any other thoughts on, like, I guess the confusion for a lot of folks who are newer to this, right? They understand now that you cannot just have embeddings only and cosine similarity only. It's just the sequencing of, like, what should I do first?
Starting point is 00:13:38 What should I do second? What should I do third? Everyone says, like, you know, re-ranking is super important, but, you know, it adds maybe like three to four percent to your results. And maybe that's like the lowest hanging fruit. So I'm always trying to figure out what I should recommend to people, right? Like what they should start with. Like, you know, a Postgres or MongoDB as their transactional backend
Starting point is 00:14:05 store. Then they could split it out to maybe use Elasticsearch or Vespa. I don't know if that would be the recommendation there. Redis, I think, is also trying to push themselves there very, very hard. And then you add the RecSys. Is that a good sequence? I think it's really hard to come up with general recommendations, like without knowing what you're doing. But if you're looking to build a RAG application, like, I think most people are interested in something related to RAG, right? When you have some data and you have to transform your data. And first, I think, is it Hamel that always talks about look at your data? Or, you know, everyone's talking
Starting point is 00:14:44 about look at the data. So first of all, you know, how to get your data in a cleaned up way. It is like PDFs or whatnot to do that. There's things there. I think actually that a very strong baseline is the classical BM25, like algorithm that's been around for 30 years, right? it's keyword matching, but it offers a very useful baseline for a lot of different search use cases, because it gives you that baseline, right?
Starting point is 00:15:14 Then you can start looking at using an off-the-shelf embedding model to also embed the data. And all of the engines, more or less, most of the engines have some kind of hybrid search capabilities, so start to play with that. And then if you can afford it, both from a latency perspective and the cost perspective, you can look at adding like a re-ranking layer on top of that. How you stitch that together depends on, you know, your framework of choice. But I think most of you can stitch this together, like with multiple different APIs, depending on your budget, I guess.
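The sequence described here (BM25 baseline first, then an embedding ranking, then hybrid fusion) can be sketched roughly as follows. This is an illustration under stated assumptions, not the speakers' code: the BM25 variant is a common textbook form with assumed parameters `k1=1.5` and `b=0.75`, the fusion step uses reciprocal rank fusion (one of several ways engines combine rankings), and `vector_ranking` is a hard-coded stand-in for what an embedding model might return. A re-ranking layer would then re-score the top of `fused`.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a textbook BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def rrf(rankings, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over all input rankings."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))]

docs = [
    "postgres pgvector extension adds vector search",
    "bm25 is a classic keyword ranking function",
    "rerankers reorder a short candidate list",
]
scores = bm25_scores("bm25 keyword ranking", docs)
keyword_ranking = sorted(range(len(docs)), key=lambda i: -scores[i])

# Hypothetical output of an embedding-based ranker, hard-coded for illustration.
vector_ranking = [2, 1, 0]

fused = rrf([keyword_ranking, vector_ranking])
print(keyword_ranking, fused)
```

Keeping the keyword ranking around as a baseline also gives you something to measure the embedding and re-ranking stages against before paying their latency and cost.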
Starting point is 00:15:51 Yeah. You know, I always tend to recommend people to do this offline as much as possible, like batch online and whatever. Most people don't need fully online systems. Yeah. And that's a friction point because I've been used to working on kind of constrained online systems, like at a pretty significant scale, where everything is always online and needs to be low latency.
Starting point is 00:16:14 And then I have problems adjusting to, you know, when you want to do things at a much lower scale. So I'll give you an example, like calling out to some kind of embedding API to get JSON floats. You know, it's not something you want to do if you're running at thousands of QPS. You don't want to add that dependency. You want to have something local, something that is faster. So I've always been like, okay, I'm going to call out to this endpoint. It's going to take 300 milliseconds to get this large float.
Starting point is 00:16:46 It's something that I like, oh, shrug. But now I'm shifting towards more, you know, it's easy. It's an API-based service. You don't have to think about it. It's just there. So it's much easier to build from, right, to have something that's API-based. So I'm trying to embrace that.
Starting point is 00:17:05 I see, I see. No, so when I say offline, I mean more like not in the critical path, like batch systems because, yeah, and it's interesting. I don't know if you've looked at PostgresML for running the models alongside the database. Are you bullish on that kind of stuff? No, I'm not. I'm sorry.
Starting point is 00:17:25 I'm not. I think this is, yeah, we've also seen other players that try to move a lot of the logic into the database, agentic stuff, embedding inference and whatnot, LLMs. I think the right direction is to keep infrastructure a little bit separate from that, because there are different scaling properties. I think people can stitch those two things together
Starting point is 00:17:46 instead of trying to do everything with one single platform. So no, I'm not bullish on that. Because I don't believe in the developer experience of writing, like, these huge SQL statements for transforming data from this and then embedding it and then writing it back and expressing this in the database. Like, what does this do to my database? Is it like calling out? What's going on? I tend to want to have more control over cost and performance and what's going on than just writing some
Starting point is 00:18:18 really large SQL to execute. Yeah, it's interesting. I think there's this constant tension between what lives in the database versus what is an external system. I don't think it's clear cut. Like, you know, a classic is the cron service, which we have in Supabase. Okay, so cool. Like, any other, like, hot takes or, you know, what are the biggest criticisms that you got after you published this?
Starting point is 00:18:41 You know, like, you know, what do you agree with? What do you disagree with, you know, just? Yeah, I think one of the things that people pointed out, if something goes semi-viral, after a few days, you discover that there's a lot of replies that you didn't see and you're like, okay. But I think one of the things that stood out was that people said that Jo is saying that RAG is dead because vector database infrastructure
Starting point is 00:19:05 is dead, right? And I think that was a misunderstanding as well. And I think that comes from people making the connection between RAG and vector databases, like, it's so strong. So when I'm saying that the vector database infrastructure category is dead, they're like, okay, RAG is dead. And I think that RAG is definitely not dead, right? Augmenting AI with retrieval or search is still going to be relevant, and I think it's going to be relevant for a very long time, right? So that was one of the things. I mean, so that, you know, now we have a 10-million-token model, longer context,
Starting point is 00:19:45 and you have the same cycle repeat. Every time, every time. I'm just like, you know, for me, you know, I put out this, like, cryptic tweet. I was like, you know, Llama 4 is going to reignite the long context versus RAG debate, but it will actually resolve the debate, but not in the way that you want.
Starting point is 00:20:02 This is too cryptic to me. No, it's just like there's like five other guys saying, like, you know, long context kills RAG, like, RIP RAG. And I'm just like, guys, you're idiots, or, like, you're engagement farming, basically. Like, most likely, you believe
Starting point is 00:20:20 like they know what they're doing and they're just saying nonsense just to have fun, and, like, people don't take them seriously. Yeah. But also there's nuance to this, right? So I've seen people do RAG when there's no need to do RAG, meaning that, you know, if you have one PDF, like with visual information and things, and you want to chat with that, definitely that case is probably like that, if you don't have like high QPS and things like that. So I think there's nuances around this. I had a call with someone that
Starting point is 00:20:56 had like 300 articles, and I said, you know, this will just fit into the context window of one of these Gemini models. You don't have to have a vector database for this case. And they were so surprised when I said this, can you really do that? Yeah. But it's also, look at it. It's like just, we had like 4K context window, right? And now 10 million, right? So, and that's fast, right?
Starting point is 00:21:21 And then people are still running their initial demos from early January 2023, right? So where you were dealing with 4K or 8K, right? So some parts of it is still not relevant now because we have longer context windows. But I think retrieval, of course, it's going to be there for a long time. One example I love to bring up is like one of these small toy data sets, TREC-COVID, which is like 170,000 documents,
Starting point is 00:21:53 and it's already 36 million tokens. And you're not going to load all of that, you know, for a single query. Yeah, awesome. Do you have a take on knowledge graphs and graph RAG? I think that graph RAG, well, I have a lot of takes around it. I think that one issue, I mean, a graph database is a database that kind of solves one particular problem. And it does it well: to traverse the edges in the graph and, you know, random access and jump across. But the core issue is actually to build the knowledge graph, right? The entities, the relationships.
Starting point is 00:22:26 So if you say graph databases or graph RAG is going to kill vector RAG and all that discussion, I think the first issue is to actually build the knowledge graph in the first place, right? And if you use a search engine or a dedicated graph DB to actually speed up and accelerate the searches, okay, fine, but I think people are like, okay, if I'm going to do graph RAG, then I need a graph database. And I hate that connection between doing something and then connecting it to some specific technology. And I think a lot of people do that, right? You jump from some concept into some technology. You can also do graph exploration with a search engine, right? So you don't need a specific technology to do it. And can graph RAG be better than vector RAG?
Starting point is 00:23:15 Yeah, for sure, in some cases, it might make sense or hybrid or whatnot. But I think people get caught up in some specific technology all the time. Yeah, but I think that's okay. But I'm still trying to validate the presence of knowledge graphs in LLM applications because obviously with LLMs, it is much easier, better to create these entity triplets and all that. So theoretically, it should be better. Yeah, yeah. In the past, knowledge graph has been a dirty word.
Starting point is 00:23:45 But now maybe it's not. Maybe, maybe, maybe. I think with LLMs, you can do a lot more things around data generation, you know, in general. So generating those triplets is a bottleneck, right? It's been a bottleneck. But now we have LLMs. So I agree, you know, now it could be easier to actually build what matters, right, which is those triplets.
Starting point is 00:24:07 Yeah. Okay. Awesome. Any other opportunities that you find? I know that you mentioned Jina. I think they're a prominent European startup in, you know, RAG. And then I think over here, Voyage just got acquired by Nvidia. You know, anything on the embedding side, like, do we need
Starting point is 00:24:25 a lot better embedding models? Is what we have from the big labs good enough? Oh, I hope to see, I mean, Voyage was really leading the pack on doing domain-specific embedding models, like legal PDFs. And what I want to see is more embedding models in that direction, where you essentially represent this PDF as an embedding or multiple embeddings for the legal domain or finance or health. I hope to see that grow so that you can have a better starting point
Starting point is 00:25:01 than just those text models. And I've been a huge believer in using visual language models as a backbone for embedding models, where you essentially take a screenshot of a page. You don't have to go through OCR, so you then get a much richer representation. You don't have to go through these complex processing pipelines. So I hope to see more innovation. I'm not sure if it's going to happen because I think it's a difficult business model to be in
Starting point is 00:25:28 because you have to have an API-based service and you have to do batching and you have to make up for the compute. And then, you know, are people willing to pay for it? And I think maybe that's why Voyage got acquired. I think also Jina is doing a lot of great things in this space now, especially in European languages. But I think every company is trying to move up the value ladder, right? They want to move into enterprise search or move in a different direction.
Starting point is 00:25:55 So, yeah, but I do hope that we will see more and better, like general embedding models. Yeah, yeah. I mean, I'm sure, I think the Voyage guys are very happy because it seems like they got acquired for a lot. Yeah. Okay. So, okay, cool. Anything else before we wrap, any calls to action,
Starting point is 00:26:13 any, you know, parting rants on the topics of the day? No, I would love, I mean, if you want to connect with me, you know, for the audience, you can find me on X under the handle jobergum there. So I'd love a shout-out on X. I hang out there quite some time. Yeah. Yeah, I mean, it's where the AI community is, you know.
Starting point is 00:26:33 Although I'm always trying to, like, grow on LinkedIn or YouTube. I mean, you know, there's a lot more people there, you know. Twitter is sort of like this, like, echo chamber. Yeah, but it's not the same. I mean, X, I mean, we wouldn't have this meeting, me and you, without X there, right? So it's a great place for really high signal to noise. And I think the AI community there is really great.
Starting point is 00:26:57 Yeah, awesome. Well, thank you. Thank you so much for having this. It's been awesome.
