Software Huddle - Vector Databases with Bob van Luijt

Episode Date: June 11, 2024

Today we have Bob van Luijt, the CEO and founder of Weaviate on the show. Bob talks about building AI native applications and what that means, the role a vector database will play in the future of AI applications, and how Weaviate works under the hood. We also get into why a specialized vector database is needed versus using vectors as a feature within conventional databases. Bob van Luijt: https://www.linkedin.com/in/bobvanluijt/ Sean on X: https://x.com/seanfalconer Software Huddle ⤵︎ X: https://twitter.com/SoftwareHuddle Substack: https://softwarehuddle.substack.com/

Transcript
Starting point is 00:00:00 What is it that makes Weaviate special in comparison to some of these other things that are on the market? The big difference, especially for Weaviate, is what we like to call a focus on AI native. We do a lot of education, right? So we share with the outside world, like this is how you build X, Y, and Z. And we spend a lot of effort on that to educate people how to build AI native applications. That is directly inspired by how fashion brands position new products in the market. So there's a direct correlation between fashion and building infrastructure businesses. And then what types of searches are vector databases great at?
Starting point is 00:00:41 And then what are the limitations? Where are they not the right solution, essentially? A pure vector search, so with nothing else, a pure vector search is kind of like casting a net in the sea. Hey everyone, Sean here. And today we have Bob van Luijt, the CEO and founder of Weaviate on the show. I thought this was a really interesting conversation.
Starting point is 00:01:04 You know, Bob talks about building AI native applications and what that means, the role of vector database will play in the future of AI applications and how Weeviate actually works under the hood. We also get into why a specialized vector database is needed versus using vectors as a feature within a conventional database.
Starting point is 00:01:21 I love talking to Bob and hopefully we'll have him back down the road. We barely scratched the surface. And if you enjoy this conversation and the we'll have him back down the road. We barely scratched the surface. And if you enjoy this conversation and the show, please leave us a positive rating and review. And if you ever have questions or suggestions, please hit me or Alex up on social media. All right, let's get you over to the interview with Bob. Bob, welcome to Software Huddle. Thanks for having me, Sean. Great to be here. Yeah, thanks so much for doing this. So I was digging into your background a little bit in preparation for this.
Starting point is 00:01:47 And, you know, I had heard that you kind of you started coding as a kid, but then it looks like you later went on to actually study music in college. So I was kind of curious about like, what was the original sort of goal or dream with pursuing something in music? Oh, the question assumes that there was a goal and and there wasn't so the um the yeah and this is sometimes when i when i talk you know to especially to younger people it's like when i was like young and as in like uh i mean like 15 right that kind of age like 14 15 you know until my 20s that was just a lot of stuff that I liked
Starting point is 00:02:29 doing right so and that was just a you know I'm born with a couple of just a handful of gifts I guess and one of them was music because it turned out that I understood how that worked so when I auditioned to study music, I was
Starting point is 00:02:45 accepted at the conservatory. And simultaneously, I was writing software, and I kind of figured out how to do that on my own, and also around that age. And I also like building businesses around that. To be clear, we're not talking about huge businesses. It was
Starting point is 00:03:04 all very small. It was just a, you know, a young kid, you know, trying to figure stuff out online. And all these things kind of came together, but there was never a, hey, I am very interested in, you know, there's still, I'm still in art, you know, certain types of literature, certain types of, you know, contemporary art, music, software, those kind of things. I like that kind of stuff. I mean, I was the other day, I was like, I was browsing through Hacker News. And then I see like, hey, there was like a whole article about the MIDI protocol right so between synthesizers and software and I go hey that's cool I love that stuff so it's like I just like these kind of things and if you distill it I like to make stuff right so I like to make and now when I'm a bit older 38 now that kind of morphed into you know software business building those kind of things but the the original
Starting point is 00:04:05 um yeah i guess part of my personality that this original thing of like just wanting to build stuff and explore stuff that is unchanged that is still the same so there's no there was no goal there was just like hey this is super exciting i'm i want to be part of this i want to i want to fix and what makes me very proud is like I as part of that I studied in because I'm Dutch so I studied in the Netherlands
Starting point is 00:04:29 but then I studied for a period of time in Boston and there's like now musicians that I literally hear on the radio or that kind of stuff
Starting point is 00:04:37 that were in school with me there so it's like I'm super proud of that so it's just it's just you know riding the wave of life.
Starting point is 00:04:45 And then that just takes you in certain directions. No, absolutely. That's really cool. And then what was your instrument? So I started with bass guitar. So because it's something you could study. And I was very early on interested in jazz. So I like very improvised music, you'd say so that's i started
Starting point is 00:05:06 that and then later when you do that like a lot when you study a lot then always you um you always end up everybody ends up who studies bass guitar with the same thing and that those are the the the cello suites of bach so and then at some point you do all these things on your bass guitar that's you know that was like tremendous fun right so like a lot of jazz and classical music when i was older some some composition contemporary music which was wonderful i had a at a at a blast was if i could go back in a time machine i would do it again yeah i would buy more bitcoin though yeah yeah if you get a time machine there's probably a bunch of stuff that we both do. A little Bitcoin, maybe invest in NVIDIA 10 years ago or something like that.
Starting point is 00:05:51 That kind of stuff. I would change that, but for the rest, I wouldn't change anything. Yeah. How did you go from that to founding WeD you know, interested in the vector database space. Um, so when I was done studying, um, I, I wrote software. Actually,
Starting point is 00:06:10 I was actually doing art projects that I funded myself by writing software. And, um, in a, in a journey on like, you know, just software consult, just freelance,
Starting point is 00:06:19 you know, I'm small, small company. And I ended up at a publisher and, um, that, um that I was working on something completely different related to e-commerce, but back in 2015,
Starting point is 00:06:29 they were looking at building new products with machine learning. So that was when I was introduced to Glove. Glove, it's still from Stanford, it's still on GitHub, where you could download the CSV file with individual boards and then the embeddings, the vector embeddings related to these words.
Starting point is 00:06:47 So that was my introduction to that. And I started to play around with that myself. And I had this little idea that I was like, hey, wait a second. If it's in space, then I can take words that might be in a paragraph, calculate a centroid, and store that information. Now, bear in mind, it's very important to know this is all pre-transformers that didn't exist yet. words that might be in a paragraph, calculate a centroid, and store that information. Now, bear in mind, it's very important to know this is all pre-transformers
Starting point is 00:07:08 that didn't exist yet. But I was like, hey, this is super awesome. So I started to play around with that. I started to present it to people. Actually, my first use case was knowledge graphs in vector space. That was for a long time. People didn't get it.
Starting point is 00:07:25 I was like, I remember the first, I even organized my own meetups that again, I funded myself to travel by writing software. And I was like, my first meetup in New York, there was like exactly zero people showed up. Right. And, and now, and luckily now it's different, but back then that was the case. And, and then Transformers came on the scene. I met my co-founder.
Starting point is 00:07:45 We figured out, hey, there's room in the, we believe that there's room in the market to build a database specifically for machine learning. And then the whole AI wave took off. And we're just riding that wave. And my co-founder, HN, turned into CTO. So really focusing with the team on the core database. And I became interested in the business side of that.
Starting point is 00:08:10 So I really doubled down on the business side. Like, how do you build a new infrastructure company in a new emerging market? So that's my focus right now. Yeah. And then was there some, you know, going back to that, you know, sort of early introduction to embeddings from Stanford, was there also some inspiration or, you know, fascination with semantic web? Like that was also kind of that era of, you know, people were talking about RDF, triples and all this sort of stuff. And that being sort of the future of how we understand the relationship between objects and concepts. Yeah, so i've
Starting point is 00:08:45 absolutely went through my um semantic web period right and the um and uh so the first use case that i would and and bear in mind i'm i'm using the word weviate now but it it's by no means was what it is today but the uh my first use case for weefy was an iot related use case because um i was working on a um it was like a smart building project where you had like different types of elevators right from different vendors from different oems and the problem was that the data that they were sending in or through apis or old-fashioned csv files the definitions of things in these um in these in these files were between these apis they meant the same thing but they were not expressed in a similar way so we created yes we created an ontology to describe what was in there
Starting point is 00:09:40 but the problem was we couldn't make these relations and then i was like hey wait a second what if we use the embeddings to determine the relation in the data set and so there's absolutely an linked data origin story and if you go on youtube you even find a video on the google cloud a youtube channel from maybe four years ago or something, where I referred to WeaveYate as a knowledge graph. And because the term vector database didn't exist. And what we were doing was storing data objects, JSON data objects, with a vector embedding representing that data object. That is unchanged. That is still what it is today. But it started to focus more and more
Starting point is 00:10:28 on the actual vector embedding rather than the data object. Long story short, yes, there's absolutely an original story in the semantic web. And what's interesting is that we're going full circle because now everybody talks about vector embeddings from knowledge graphs and and i can proudly say well you know i have some videos and material from
Starting point is 00:10:50 that from like four years ago so it's nice to see they go full circle yeah i mean i feel like that those waves and technology is always like what what is old is new again it's like fashion essentially like things coming in in and out out of fashion when it comes to like programming languages, frameworks, all these types of things. Even if you look at neural networks, neural networks, there was the sort of AI winter phase where no one was doing any research on neural networks outside of like Canada and a few other places. And now, of course, it's like the only thing that people are focused on. So these things like admin flow you know if i can double click even on that so it's like a in the way that we built business right now around the database i'm getting a lot of inspiration from the fashion industry actually because um um uh one thing that that people might be listening to this who might be familiar with
Starting point is 00:11:42 vp8 or not maybe they look it up after listening to this, but they'll see that we do a lot of education, right? So we share with the outside world, like this is how you build X, Y, and Z. And we spend a lot of effort on that to educate people how to build AI-native applications. That is directly inspired through how fashion brands position new products in the market. So there's a direct correlation between fashion
Starting point is 00:12:08 and building infrastructure businesses. It's probably a necessary step too when you're building essentially like a new category of infrastructure. Like a lot of people, now I think more and more people are aware of like what a vector database, or at least like have heard the term
Starting point is 00:12:24 over the last year and a half. before that i'm sure you know most people couldn't didn't know what a vector was you know i learned like a vector database and it's growing but there's still a need essentially to educate the market and i think the good thing is with the explosion of things like chat gpt and everything that's happening with transformers and LLMs, it is creating this sort of market force where people are really curious and they want to know what is going on and, and you know, learn more and more about it. I agree. Yeah. A hundred percent. In terms of Weave, like you were, you know, pretty early to the space,
Starting point is 00:13:00 but now it's, you know, quickly's quickly becoming like a crowded space. I'm sure that was maybe unexpected from where you started, but there's like over, I think like 60 plus databases now that offer some level of vector support from the like specialized vector databases to things like Postgres and MongoDB that have extended to support vectors. Like what is it that makes Weviate special
Starting point is 00:13:22 in comparison to some of these other things that are on the market? Sure. So the first thing is I think that's a good thing that this is happening because a vector embedding is a data type, the vector embedding itself. So it's like an integer, a string, a floating point, whatever. It's an array
Starting point is 00:13:40 of floating points. And so what we start to see is that the majority of the of that number right let's let's say you mentioned 60 i i lost count so but so let's let's work with let's take that as a working assumption right 60. so the majority of these 60 um they use the effect from betting as a um as a feature great i mean in weviate you can make a graph connection if you want right that doesn't make with the graph database it's just a feature. Great. I mean, in Weavey8, you can make a graph connection if you want, right? That doesn't make Weavey8 a graph database. It's just a feature that we have, but for certain use cases, it helps people to do whatever we want to do. The big difference, especially for Weavey8,
Starting point is 00:14:15 is what we like to call a focus on AI native. And how I always explain that is very simple. You have two types of applications. So first of all, I believe AI is going to be in everything, right? And now you have two types of applications. Application number one is the application where you just, as I like to call it, you sprinkle some AI on your application. So a little bit of vector search, maybe generative stuff. Great. But if I would take the AI out of that application,
Starting point is 00:14:43 it's still there. It just misses a feature, right? The other hand, you have applications that if would take the AI out of that application, it's still there. It just misses a feature, right? The other hand, you have applications that if you take the AI out of the application, it's just gone. It doesn't exist anymore. That's what we call an AI-native application. And that's what we focus on when we review it.
Starting point is 00:14:57 And how do we do that? At the heart, we have the database. And the database, of course, has its core defector index. That's where it's important. But everything we built around that hybrid search functionality, this multi-tenancy functionality, these different types of vector embeddings and recommendation engines, the education,
Starting point is 00:15:17 all stuff that we do, and so on and so forth, is giving people the tools to work with, to build AI native applications. And a metaphor I often use, like think about it like GitLab. So when GitLab started, it was just Git, right? It was just version control. Today, GitLab is there for your whole, for your DevOps needs, right? If you're a bank and you there for your DevOps needs, right?
Starting point is 00:15:45 If you're a bank and you have big DevOps needs, GitLab is your go-to solution. That's what we're doing for AI. So the metaphors where they started with Git, we started with factor search, but we're building this whole ecosystem around it to build AI-native apps. So that's the big difference between just having it as a feature versus having it as
Starting point is 00:16:10 a core element of the ecosystem that you're building. And what's an example of some AI native apps? I feel like a lot of the stuff that we're doing with AI at the moment feels a little like bolted on. We're kind of like shoehorning co-pilots into everything. And some of those, it's like early days of internet
Starting point is 00:16:33 or early days of mobile where like people are just kind of like experimenting and trying to figure out like what's going to work. And it took a little while before you really had truly sort of native internet experiences
Starting point is 00:16:42 or native cloud is another example. Like I remember when I spoke to Bob Moogly, the ex-CEO of Snowflake, like one of the things I asked him was, was there pressure in the early days to do what they were doing on-prem? And was that a hard decision to essentially stay cloud native? And he said it wasn't a hard decision because they didn't know how to do this outside of the cloud. So that was a very good example of something that couldn't even exist pre-cloud. So what are some things that you're seeing that couldn't exist without AI being the core part of the application? Sure.
Starting point is 00:17:17 I think the easiest way to explain that is by example. Let's say that you have a um a customer engagement system right so you know support tickets something like that an example of just adding um ai as a feature might be that you get a support ticket in you vectorize it and um and you try to predict what labels should be attached to the, you know, so what department the question should go to or label or severity, those kind of things. That's a beautiful vector surgery case. And that's a feature.
Starting point is 00:17:56 It's a great feature. If you would remove it, then, you know, you lose a feature. But yeah, that's, you know, that's what it is. Now, if we now take the same system, but we turn it into an AI-native solution, still the request comes in, still it gets vectorized. But now actually in harmony with the generative model, you start to create what we call generative feedback loops, where actually the model knows how to query the database,
Starting point is 00:18:25 requests more information from the database, and writes a response to the person asking the question. Now, if you would take that out of the system, your system just doesn't work anymore. So it's the actual models, that's AI-native, it's the actual models looking in the database to find relative information, formulating an answer,
Starting point is 00:18:43 and responding almost near real-time. I mean, it's like we can't do real-time because the models aren't there yet, but let's say near real-time, respond to the customer with an answer. That's the difference, right? So in the first one, it's just a feature. And the second one is truly AI native. If I may add one thing, for the first one, you're probably not going to build a business around it, right? So it's a feature. You're like, hey, we have like a customer engagement tool and we're going to add this feature. The second one, that's
Starting point is 00:19:11 a business, right? They can say, hey, we can now use the vector database like we did in combination with the models and just create a completely new application around it that automatically responds based on customer requests or automatically generates a knowledge base and so on and so forth. So that's the
Starting point is 00:19:28 big difference. And with this idea of generative feedback loops, like, do you see that as sort of the like, natural evolution beyond, like, where we are with RAG today? Because, like, RAG is, you know, it works, people are creating applications with it, but in a lot of ways, it feels kind of, like,
Starting point is 00:19:44 hacky. You know, you're kind you're kind of like stuffing things into the context to give additional information to the prompt. And then it takes a lot of sort of like testing and iterating to get to a place where you're, you're, you're giving the right amount of context and not too much context and all this sort of stuff. But if there was a little bit, uh, I don't know, like hacky, I think is like the best word I can describe it with. Do you see this as sort of like the next days of that? No, no. So I agree.
Starting point is 00:20:11 First of all, I agree with that. And the second thing is there are actually two separate things. So if we separate them up, the generative feedback loop is just a concept of directly integrating or giving the generative model access to the database. That's the concept of the generative feedback loop. That is still something you can do in a hacky way, basically. But that's the concept. When it comes to RAC itself, and if we look at the original paper of RAC, right, the thing is like, can we do retrieval directly from a database based on vector embeddings while we're generating something? Because it's retrieval augmented generation, right? So it's in the abbreviation.
Starting point is 00:21:00 And we're doing a lot of work there too. So what you refer to as the hacky way, which again, I agree with, is that we get some data from our database, we shove it in the context window, and then we keep our fingers crossed and we hope for the best, right? That's very primitive, right? So what's actually happening here,
Starting point is 00:21:21 and this is something that I'm a big believer in, is that the, and this might sound like a sidetrack, but I'll promise I'll bring it back to your question, is that when the context windows started to increase, right, so now we have a million, and so, you know, soon we're going to probably have 10, 100, billion, whatever, right? So it's like,
Starting point is 00:21:40 we're going to get very close to an infinite context window. And one of the things that people start to say, oh, did the context window kill the vector database? And I think the answer is no, it doesn't. It's actually, I believe, the vector database becomes the context window. Because if you look what a context window does, is that it takes a word. Let's stick to LLMs, right? So let's put multimodality aside for a second.
Starting point is 00:22:12 So it takes a word, turns it into a token, gets a vector embedding for the token, creates a matrix, does a prediction, applies the transformer mechanism, does a prediction, turns it back into a token, turns it back into a word. That's very primitive. So what kind of technology is really good at doing that? Well, vector database. So this is what we abbreviate weaving together. What we mean with weaving the model and the database together is that we mean that the database becomes the context window. And the moment that the model becomes the context window,
Starting point is 00:22:48 you solve that primitive problem. So that is a, and in the slipstream of that, you solve latency issues and all those kinds of things. So it's operational issues because just storing a flat, how are you going to do that?
Starting point is 00:23:05 Sort of flat file or something of like a million tokens? I mean, that's kind of weird, right? So it just makes so much sense to do that more from the perspective of the database. And this is another example of something where it's not a feature. It's just core elements of the database, which is way more AI native, right? So the models and the database, or in this case,
Starting point is 00:23:26 they just start to slowly merge together. And then outside of RAG, what are the sort of main use cases for vector database today? Yeah, so that depends a little bit how you look at it, right? Because if you use the term use case, right? So it's like you can zoom in, you can zoom out, right? So the highest level, the use cases are in the order of that people have been exploring them is vector search, hybrid search, rank, chance feedback loops. And what do I mean with that, right?
Starting point is 00:24:03 So I mean that the first one I mentioned, that's the most that people have in production right now, right? So if you look at our customers, that's the most. And the next thing is hybrid search, right? Because they started to explore it a little bit later. And now we see the first bigger rec use cases. And then you have to the gens feedback loops. And what comes after that our use cases around our graph use cases.
Starting point is 00:24:27 And I don't mean graph as in traditional graph, but in graph neural networks, basically, where rather than creating the actual relations, you predict the relations, right? But that is stuff that's really new. So nobody has that in production yet. It's really new. So that's the difference. And then if you zoom in, if? So, but that is stuff that's really new. So nobody has that in production yet. It's really new. So that's the difference.
Starting point is 00:24:46 And then if you zoom in, if you say like, so what type of use cases do we see? So we see everything where it's like, it started all with the LLMs. So the LLMs are,
Starting point is 00:24:56 90% is text-based right now, right? And that is around stuff that is text-heavy. So we see a lot of legal use cases, a lot a lot of legal use cases, a lot of e-commerce use cases, a lot of ticket systems, tier systems, confluences,
Starting point is 00:25:14 and so on and so forth. That we see a lot. And people now start to bring the first agents into production. But that's more RAG related. So those kind of use cases we see a lot. And from an industry perspective, it's really all over, but especially where it's top-heavy on having a lot of unstructured data.
Starting point is 00:25:37 And then what types of searches are vector databases great at? And then what are the limitations like where are they not at the right solution essentially i am a pure vector search so with nothing else a pure vector search is kind of like casting a net in the sea right so you go like okay i want to i want to i want to catch a fish so you cast a net into the sea and surely you're gonna catch fish but um if you're looking for some type of fish right so the way that the stuff is organized in the net um uh is um that's not there right so it's like it doesn't rank anything it doesn't filter anything it none of these things and then there's's the problem that if you search for something
Starting point is 00:26:26 where the language you're using, and again, for the sake of argument, I'm going to stick with language models. The language you're looking for was not in the trained model, not part of the trained model. Then it's like a fish just slipping through the net. What's an example of language that was not in the trained model? That can be product IDs, right?
Starting point is 00:26:48 That can be names of people, those kind of things, right? So now the good news is that you first want to, you know, there are methods to close the holes in the net, and that's something you can do through hybrid search. So hybrid search, that mixes the best of both worlds together. In briefs, you can do that out of the box, right? And just like literally three lines of code. And so what it does, if I say like,
Starting point is 00:27:15 okay, what was the outcome of a ticket that was raised with ID 123 ABC? Then hybrid search will do very, sorry, the traditional search will do very well on ABC 123. And then the vector search will do very well on the whole sentence. So now we've solved that problem.
Starting point is 00:27:35 So the answer probably sits in the net. And now we need to start to do the re-ranking and the filtering. And you now see model providers that, so for example, Cohere has like a re-ranker model. At Weavey8, we do a lot of work to make that part of the database itself so that it can train based on the data that you have.
Starting point is 00:27:57 But what makes vector search very different from traditional forms of search is that it's more like casting casting a net and then somehow you need to organize stuff as efficiently as you can in the net that's something we do with like re-rankers and filters and hybrid search and those kind of things not sure if there's an answer to your question but no no that's great so i mean basically it's great at uh you know figuring out that two objects are related in some fashion and essentially the way it's doing that is it's it's great at figuring out that two objects are related in some fashion. And essentially, the way it's doing that is it's going to have these objects represented in high-dimensional space
Starting point is 00:28:30 and going to use some sort of similarity metric to determine they're close in space in some fashion. But it may be not as good at an ID lookup that a traditional database is fantastic at, and it's really what they're designed for so the key is like can you bring these two worlds together and serve the needs of something that is an index look up uh exact match while also being able to take advantage of these kind of fuzzy sort of similarity type of search searches that you also want to be able to do exactly that is that is correct and um and and so actually in fact the database is of the flavor search engine, right? Because storing a vector embedding is easy. You can do that.
Starting point is 00:29:12 That's just an array of floating points. The tricky part sits in the index and the index that's used to search and retrieve, right? So, and doing that as efficient and as optimized as possible, that's the trick, that's the core mode of the vector database. Yeah, so I want to get into the specifics around how WeGate is doing their vector indexing. So there's all kinds of different ways, essentially, you know, indexing, embeddings,
Starting point is 00:29:42 there's, you know, cluster-based techniques, there's tree-based index and stuff like that. So what is actually going on in order to generate the index behind the scenes for vectors represented in Weeby? Yeah, sure. So I think to answer the question, I think for the audience, it might be, who do not know, it might be helpful to quickly say something about what the problem is, what problem are we solving, right?
Starting point is 00:30:04 So if you think of vector embedding right and you can i mean you can think of a tiny vector embedding it's like three dimensions so you can imagine that it's in a in a three-dimensional space is that we want to know what's similar to our query so let's say that i have 10 objects in my three-dimensional space, and I have a query, right? So that's one three-dimensional representation. The way to figure out what's closest is that you take the query, three-dimensional object, and you compare it with the first one.
Starting point is 00:30:38 I say, what's the distance? And it gives you a distance. You look at the second one, you say, what's the distance? And the third one, the fourth one, fifth one, and so on and so forth. And then when you hit 10, you can just re-rank them and reorganize them based on distance. Like, what's the closest?
Starting point is 00:30:53 What's the furthest away? The problem with that is that that grows in a linear fashion. So if you now have 10 data objects, but like 10 million or 100 million, and you don million, or billion, and you don't have like three dimensions, but you might have like a thousand dimensions, then all of a sudden that takes a long time to actually, it is not real time anymore.
Starting point is 00:31:19 So these algorithms were invented, and the abbreviation is ANN and stands for the approximate nearest neighbor. But basically, we can come up with algorithms where you can quickly find nearest neighbors and it comes at a cost. And the cost that it comes at is approximation. So that's why, for example, when you look at these anand benchmarks they always talk about time versus approximation so the the more likely you want the algorithm to be to share the nearest neighbors the slower it becomes right and so this index like the way that you build it at the core of the database, the WeaveJet is built on an algorithm called HNSW. And the reason it's built on HNSW is because with HNSW, you can build full CRUD support, create, read, update, delete. That's something you want in a database because there are also vector algorithms that, for example, do not
Starting point is 00:32:25 support update or delete. So then you create the index. But if you want to change something, you have to recreate it again. That takes a lot of time. So that's why it's in HNSW. And then you need to figure out how do we sharp them? How do we scale them? And that is the trick
Starting point is 00:32:41 of the vector database. So that if you run, if you have a large use case with like a billion data objects with a billion vector embeddings attached to that, how do you safely and securely and reliably scale that for your use case? And that's what sits at the heart of the issue
Starting point is 00:33:01 and that's what's happening under the hood. And we use open source so people can see that. And nowadays we even have like the heart of the initiative. And that's what's happening under the hood. And we use open source so people can see that. And nowadays we even have like multiple types of these indices so that you can have multi-tenants and you can offload stuff and put it in memory or keep it on disk. And that call comes off trade-offs. But now, of course, price and those kinds of things also start to play a role. So we're pretty sophisticated in the things you can do
Starting point is 00:33:24 and how you can scale your factor indices. At what point, you know, you started off by sort of describing the brute force approach of, you know, we have 10 vectors and three-dimensional space, like we can just compare everything and figure out what's closest or maybe the five closest or whatever we need to do. At what point do you need essentially to introduce a vector index? Like how many vectors does it get where this like a brute force is essentially unscalable? So, I mean, so there are two answers to that question, right?
Starting point is 00:33:55 So you have a use case and whatever the audience can think of a use case, right? So whatever use case is, you might have a time limit in mind, right? So let's say that you have an e-commerce use case, then you might say, well, I need to be at least within, I don't know, 15 milliseconds, I need to be able to present something to my customer. Then it's easy math. So you can just say, well, I i got this number of data objects similarity scores brute force take me xyz so and you would be surprised how quickly you're at 50 milliseconds
Starting point is 00:34:31 so then you quickly need a vector index but the second part of the answer is more like when do you need a vector database and you need the database to keep it reliable because even if you go like you know i just have a couple of data objects, you know, I'm just going to write a little Python script, you know, that just, you know, does similarity search, nothing fancy. What if that server goes down?
Starting point is 00:34:56 You know, so a database also comes with all the operational stuff of making sure that it stays reliably available. So the question is not only when do you need the index, but also when do you need to make the database.
Starting point is 00:35:12 And then you just come to the conclusion very quickly that you're like, you know what, I'll just throw it in the database. Because if I am going to write this myself, and I want to run this in production, I need to also take care of the, you know, if shit hits the fan, I need to take care of that too. And I don't want to run this in production, I need to also take care of the, if shit hits the fan, I need to take care of that too.
Starting point is 00:35:27 And I don't want that. Yeah. Yeah, I guess this is a similar math that you would do with conventional databases. Like if you don't have much data, you could store it in a flat file and do a brute force lookup or something like that. And then eventually you're going to reach a point where you're like, oh, I actually need to order these things.
Starting point is 00:35:41 So then you're building your own indices that you need to maintain. And then at some point you're like, well, what actually need to order these things. So then you're building your own indices that you need to maintain. And then at some point, you're like, well, what happens if the file disappears? Or I corrupt the file or, like you said, the server goes down or something like that. And then do I want to... At that point, you're like, well, now I'm actually building a database and
Starting point is 00:35:57 a managed service on top of that. Is that something that makes sense for me to do? Exactly. There's this little joke in the industry where we say, you know, building a database takes 10 years. So the first weekend, it just takes a couple of beers and a couple of friends to come up with the API, and then the
Starting point is 00:36:13 last nine years, 11 months, and three weeks, you're actually building the database. So it's a... Building a database is hard, and the reason it's hard is because it needs to be reliable, and building reliable technology at scale, it's a it's a building there is a heart and that's and the reason it's hard because it needs to be
Starting point is 00:36:26 reliable and building reliable technology at scale it's just complex it's hard
Starting point is 00:36:31 to do yeah in in terms of sharding like how does the sharding
Starting point is 00:36:37 work for a vector database typically with a conventional database when you're you're doing sharding
Starting point is 00:36:42 you're you're picking essentially a you know sharding index for a particular you know table based on a column maybe it's your user id or something like that and then we're going to split with a conventional database when you're doing sharding, you're picking essentially a sharding index for a particular table based on a column. Maybe it's your user ID or something like that, and then we're going to split
Starting point is 00:36:49 based on the ordering of user IDs across different servers. How does that work in the world of vectors? That's an excellent question. It's actually a bonus question because it's like there's a double win in this question.
Starting point is 00:37:05 Because the answer to this question is the reason why factor databases exist in the first place. And because, as I mentioned, if it's just an index, why not add that to a library? Like, you know, where we have stuff like Lucene and those kind of things. Why not add it there? And that has to do with that the under the hood the we have a lot of content on this on the if you really want to go in depth on this then we have a lot of context and content around this on the on the website too but the the problem is this if you create a um a chart of a um a factory index at some point you need to merge them together
Starting point is 00:37:45 because otherwise the graph just keeps jumping back and forth and that's just tremendously slow. That's unusable. Some sharding mechanisms use a lot of tiny shards, right? So that they optimize for a lot of tiny ones. And for a vector database, you actually want to have a couple of big ones. So the sharding mechanism is around the bigger shards
Starting point is 00:38:05 that you want to merge together. That's very different than, for example, how the sharding mechanism within Lucene works. Long story short, it's an architectural difference. It's a completely new mechanism, how it's sharded. The rest,
Starting point is 00:38:22 so there's also in ReefShade, for example, inverted index and that kind of stuff, that is traditional. Tractor index itself is similar. And that's the reason why not only us, but also a couple of other people started to build Tractor database.
Starting point is 00:38:37 People were like, hey, besides the developer experience of interacting with the database and being different, also just the core, the architecture is different, right? How we built these kind of databases. And everybody who uses a traditional database for Vectra embeddings will run into this over time when they scale up.
Starting point is 00:39:00 It's just in the nature, and it's not because the existing indices are bad. No, they're amazing. It's just they never designed to it's not because the existing indices are bad no they're amazing it's just they never designed to deal with vector methods so that's the that is the why these databases exist and then in terms of like similarity computation between two vectors like there's different approaches there as well just like there's different approaches to vector indices like there's you know euclidean, Manhattan, cosine similarity, dot products, all these types of things that have been around for a long time in the machine learning space.
Starting point is 00:39:31 So how is that done from a developer experience? Is that something that I'm making a choice and I have to decide how I'm going to compare my vectors? Or is that something that's kind of like abstracted away and I don't really need to think about? Sometimes. So there are two answers to that question so the first part of the answer is that it depends on the person or the team that created the model right so the if you look at how so if you think about a space right so um um you somehow need to organize the data in that vector space.
Starting point is 00:40:09 And that can be done on a plane, that can be done in a sphere, that can be done on all kinds of different geometrical systems. It's super exciting. It gets very complex quickly too. The reason why these researchers are experimenting with that is because you want to figure out how can we come compress compress as much information in a dense space as as possible and um that dictates the um the distance metric that you that you need to use right so um if you're using a factor database and you're getting very weird results, you're probably using the wrong search of the algorithm, right? On the other end of the spectrum, the second part of the question
Starting point is 00:40:58 is that there are also compression algorithms, right? So a simple example is binary compression and the fact that a binary quantization works is still fascinating because it's such a simple concept what it basically does is that if you look at effect embedding it's a floating point that's a positive or negative number so the first one could be 1.2225 and the second one could be minus three dot whatever and what a binary quantization does it's just like if it's positive a floating point we make it a one if it's a negative one you make it a zero and what you basically do is that you then search the binary space so i believe um that I believe that the distance metric
Starting point is 00:41:45 you use for that is, I think it's Manhattan distance. I don't know exactly, but let's say it's one of those metrics you use. You retrieve a couple of them and then you do a traditional vector search
Starting point is 00:41:56 over these embeddings. And it turns actually out that if you do binary plus a little bit of brute force, you're actually faster than just doing it also in the vector space. So now it's a distance metric that you as the user define.
Starting point is 00:42:16 That said, we, as from a developer experience perspective, we try to do a lot of work for you there. So we, of course, course as a user you can choose whatever you want to use but we also try to predict and know what kind of distant metrics are needed for
Starting point is 00:42:34 you to search through the vector space that you're using and you might even be storing data from different vector spaces in the same database and you just want to do a query, right? So you just want to have like a four-line query and not be too concerned about that kind of stuff. So that's from a developer experience perspective, we take a lot of work off your plate there as well. Yeah, I mean, I just want to know, like, are these things related or based on this query?
Starting point is 00:43:02 What are the five things that are most similar within the database or something like that? The binary use case is interesting. I mean, it makes a lot of sense in terms of you're going to be able to compress the, rather than using a photo point number, suddenly you can essentially take multiple bits and stuff them into like a 64-bit long or something like that. And then you can take advantage of bit operators on those in order to do the fast lookup. And then that essentially compresses that, I guess, limits the search space that where you need to do the true vector similarity. So you're using that essentially as a technique to compress, or I don't want to overuse compress, but essentially create like a bounding box around the number of vectors
Starting point is 00:43:45 that you need to do the full vector similarity search against. Yes, and that is correct. And actually what you're saying now, actually something else pops into my mind as well, because inside, for example, Wavage also have traditional indices. But because the database is built from the ground up from scratch, we also could put a lot of innovations in the traditional indices. So like the inverted index is completely built based on bitmaps.
Starting point is 00:44:15 So that's a similar system, but then for traditional index. So you can benefit from all these kinds of innovations under one roof in one database. And it's exciting. It's very... What the core database team is building and the research team is very sophisticated. And then the trick is, as you mentioned, how do we take that obstruction away for the end user?
Starting point is 00:44:42 That is just a couple of lines of code and all that niceness comes out of the box. Yeah. I mean, I think it's a really fun and fascinating time to be involved in technology, especially if you're working in the AI space, because there's just so much going on there. There's boundless things to learn and try
Starting point is 00:44:58 and experiment with. I agree. And it's a... If somebody's listening to this podcast and considering starting something, the time is now, it's like, don't wait. Start building now because it's such an exciting time. People are excited. People are inventing new stuff.
Starting point is 00:45:20 Regardless if that's on the low level in the models or in the vector databases or on the other end of the spectrum when it comes to the application layer, so much interesting stuff is happening. It's like, this is the time to contribute to that and to create a new product or service that can ride the AI wave. It's very exciting. Yeah, absolutely. So we're getting close to time, but I want to start to... We have some quickfire questions for you that we kind of wrap
Starting point is 00:45:51 things up with all our guests, so we'll go quick here. So the first one, if you can master one skill that you don't have right now, what would it be? Thinking. What wastes the most time in your day? Reading boring news websites if you can invest in one company that's not your company who would it be and uh we already
Starting point is 00:46:13 acknowledged that we missed the bitcoin bandwagon um i would um invest in in cpu companies i think cpus are going to play a tremendous important role in model inference in the not-too-distant future. Yeah. What tool or technology could you not live without? Oh, this is going to be such a boring answer. Oh, my glasses. Which person influenced you the most in your career? I would not be doing what I do right now, thanks to the many,
Starting point is 00:46:49 many people that have advised me and helped me to get where we are right now. So I just, I can't single one person out. It's just, there are too many. No problem. And then the last one, five years from now, will we have more people writing code day to day or less? More. Awesome. Well, Bob, thanks so much for being here. I really enjoyed this. I feel like we barely scratched the surface here. Well, hopefully maybe we can have you back down the road. I'd love to talk to you about, you know, building an open source business and all kinds of other things as well. We'd love to come back.
Starting point is 00:47:12 Thank you so much for having me. I love the conversation. This is great and I hope people enjoy it. All right. Thanks. Cheers. Thank you.
