Software Huddle - What is a Vector Database with Yujian Tang

Episode Date: March 26, 2024

Today's guest is Yujian Tang from Zilliz, one of the big players in the vector database market. This is the first episode in a series of episodes we're doing on vectors and vector databases. We start with the basics: what is a vector? What are vector embeddings? How does vector search work? And why the heck do I even need a vector database? RAG models for customizing LLMs is where vector databases are getting a lot of their use. On the surface, it seems pretty simple, but in reality, there's a lot of tinkering that goes into taking RAG to production. Yujian explains some of the tripwires that you might run into and how to think through those problems. We think you're going to really enjoy this episode.

Timestamps
02:08 Introduction
03:16 What is a Vector?
07:01 How does Vector Search work?
14:08 Why need a Vector database?
15:11 Use Cases
17:37 What is RAG?
20:34 RAG vs fine-tuning
29:51 Measuring Performance
32:32 Is RAG here to stay?
35:43 Milvus
37:17 History of Milvus
47:44 Rapid Fire

X: https://twitter.com/yujian_tang https://twitter.com/seanfalconer

Transcript
Starting point is 00:00:00 What is a vector? So vectors, in their most simple form, are just a list of numbers. And so there's many different types of vectors. There's two primary types when we talk about vector search. So one is dense vectors. So these are vectors that typically have a float value. And then there are sparse vectors, which are vectors that may have a lot of zeros. And so these are typically binary vectors.
Starting point is 00:00:29 Why do I actually need a vector database to work with vectors? Why can't I just put my vectors in MongoDB or some sort of traditional database? The reason why you would use a vector database that's purposely built is just because it's built to be able to have the architecture that will let you scale it up. If you could invest in one company that's not the company you work for, who would it be? Hugging Face. Hugging Face, all right. Anything else you'd like to share?
Starting point is 00:00:59 If you're interested in doing hackathons and you're in Seattle, hit me up. Hey, everyone. Welcome back to another episode of Software Huddle. My guest today is Eugene Tang from Zillow, one of the big players in the vector database market. This is the first episode in a series of episodes I'm doing on vectors and vector databases. Eugene and I start with the basics, what is a vector? What are vector embeddings? How does vector search work? And why the heck do I even need a vector database? RAG models for customizing alarms is where vector databases are getting a lot of their use.
Starting point is 00:01:29 On the surface, it seems pretty simple, but in reality, there's a lot of tinkering that goes into taking RAG to production. Eugene explains some of the tripwires that you might run into and how to think through those problems. I think you're going to really enjoy this one and hopefully the series.
Starting point is 00:01:44 And if you do, please leave us a positive rating or review, subscribe to the show, and feel free to hit me and Alex up on Twitter or LinkedIn. All right, let's get you over to the episode. Eugene, welcome to Software Huddle. Thanks, Sean. I'm glad to be here. Yeah, I'm excited to have you here. So we're hoping to do a whole series of episodes focused on vector databases, vector search right now. It's basically all about vectors. So I'm glad that we're kicking off our vector journey with you. But before we dive too deep into vectors, let's start with you.
Starting point is 00:02:14 Who are you? What do you do? Yeah, so my name is Yujin Tang. I am a developer advocate at Zillis. Zillis is a vector database company. Prior to this, I worked at IBM, Amazon, and published some papers to IEEE Big Data. Awesome. And then were you at all knowledgeable about vector databases and vectors before joining Zillis,
Starting point is 00:02:43 or was this something that was a completely new frontier for you? I had never heard of vector databases. I had heard of like feature stores and that kind of like data store. But I have a background in machine learning. So vectors were very familiar to me. I was like, ah, yes, okay, I know what these are. And after my first couple conversations
Starting point is 00:03:10 during the interview process, I was like, oh, okay, I understand what's going on here. Okay, great. Well, I think that's a good place to start. Let's start with the basics around a vector. What is a vector and what are vector embeddings and why do you need any of this for machine learning and AI? Yeah, yeah. So vectors in their most simple form are just a list of numbers.
Starting point is 00:03:33 That's really like all you need to know about vectors is list of numbers. And so there's many different types of vectors, right? So there's two primary types that we think about when we talk about vector search. So one is dense vectors. So these are vectors that typically have a float value. You know, these are basically real numbers. And then there are sparse vectors, which are vectors that may have a lot of zeros. And so these are typically binary vectors. These are typically just zeros and ones. Examples of algorithms that produce the sparse vectors include TF-IDF, which is a very popular natural language processing kind of algorithm, Splayed, and BM25.
Starting point is 00:04:25 And then the dense vectors are produced from machine learning models, actually. So what the dense vectors represent the semantic meaning of some type of input. And so the way that you get this is you have your input and you feed it into a model that has been trained on that type of input. And at the end, instead of having the model do some sort of prediction or classification or something, you cut off the last layer and you just take the output from the second to last layer. And that's your vector embedding. And that contains all of what the model has learned about the input in the form of numbers. Okay. And then you mentioned TF-IDF, so term frequency, inverse document frequency, which is something that I used back in my ML days, which is quite some time ago.
Starting point is 00:05:14 Is that actually still widely used in a popular method? Not really. At least I don't really hear about it used a lot, but it is one way that you can get like these like sparse vectors right because then you can see like oh like how how often is this like popping up compared to like other words and how many documents there are yeah absolutely you get basically like a term that doesn't you know come across in very many documents you're you're going to end up with a lot of this very long vectors with not necessarily a lot of numbers.
Starting point is 00:05:47 And then in terms of embeddings, how do they actually preserve semantic information about the data and its relationship to other similar types of inputs? Yeah. So the embedding, it works kind of like this. So you have a machine learning model, right? And from the beginning, your machine learning model is just a bunch of random weights. And as you train it, it starts to learn the patterns of the input data. And then that, the vector, I guess, the output of that second to last layer
Starting point is 00:06:24 creates a high dimensional latent space that learns what the relative patterns in the data that you've given it look like. So, for example, if we talk about image data, maybe you're feeding it a bunch of pictures of different cats and dogs and I don't know, like turtles or something like that. And it's like learning that, oh, there's these like animals there. And so that is kind of just how it preserves it. It's just like this output knows that there's this type of animal there. And that's how it's encoded into this, this machine learning model. And then that's how we can decode it, basically. And then how does vector search work once you've essentially created these representations of these real-world objects? And how is that different than maybe conventional search? Yeah, yeah, yeah.
Starting point is 00:07:13 So conventional search that we have right now is like, let's say you're working with databases, typically what you're doing is some sort of like, okay, find me all the things that have like this ID value that also have these attribute values. And that's like a basically like you're doing key to key matching. It's very much like you need a direct match. And so vector search is all about finding the nearest neighbors, because it is very, you pretty much don't get the same vector embeddings ever. And unless you're embedding the same thing. And so vector search is all about taking these two long lists of numbers and doing that like compute to find like what is the distance between these vectors. And so vector databases like Milvus are kind of built and optimized to be able to effectively and efficiently do this kind of compute. And there's many ways to
Starting point is 00:08:14 compare these vectors. So there's L2, which is basically physical distance in space. It's like if you have a triangle, you can think of the the hypotenuse. There's cosine, which like if you think of vectors as lines pointing in space, cosine is like the angle between them. And then there's IP, which if you think of vectors as lines pointing in space, IP is the projection of one vector onto another. So then you can think of if one of the vectors was a hypotenuse, the other was the leg of a triangle, it's the other leg of that right triangle. So those are like different ways to measure vector distances. And interestingly, if you transform all of these on normalized vectors, they all give you the same rank order. They basically all come out to the same thing.
Starting point is 00:09:04 So all the nearest, like your top k K will pretty much always be the same. And let's see. Oh, okay. So then this is how you compare vectors. But when you get to like large scales of a large number of vectors, you're going to want to do this thing called indexing. Well, most people who work with SQL databases are probably also familiar with indexing. This is a different type of indexing. So this indexing is creating essentially a map of the vector space that you are using. And there are a few indexes that are very popular.
Starting point is 00:09:43 Milvus has 11. I think we can touch on three here. So one would be IVF, which is inverted file index. This is your most intuitive type of vector index. This is essentially doing a clustering, a K-means clustering. Like, let's say like, oh, I think there's 128 different categories in this vector data, then I'm going to do 128 different clusters. Something kind of like that. And basically, the way that works at query time is you only know the centroids initially,
Starting point is 00:10:18 and you find the closest centroids, and then you dig in and you find the closest vectors. And then there's HNSW. So this is Hierarchical Navigable Small Worlds, which is a mouthful. And basically what this is, is this is a graph index and it's like a layered graph index. So as you insert your vectors, you get a uniform random variable
Starting point is 00:10:42 and that variable will tell you what layer it gets inserted up to. And you get to determine what that is. And then the third one that would be interesting to talk about would be scan, which is a, well, it's called scalable nearest neighbors, which is kind of an interesting name. But basically, it quantizes the vectors, and you only search the quantized space, and then you search the the actual space. So it's kind of like IVF. And by quantized the vectors, what does that mean? Ah, yes. So for example, let's say you have the set of real numbers from 0 to 10. A quantization of that would be like the integers from 0 to 10.
Starting point is 00:11:33 So you would bucket all of 0 to 0.5 into 0 and 0.5 to 1.5 into 1 and so on and so on. So quantization is just that kind of bucketing process. And then that presumably helps with compute because then you're dealing with integers rather than real numbers? Yes. And it also just makes it so that you have like, oh yes. So there's, there's like the flow 64 to like the int eight or the int 16 kind of like reduction. Um, but it also just makes it so that you have like a smaller possible vector space to, to initially search. Right. I see. And then going back to the different ways of actually comparing vectors, are there pros and cons to using those different approaches like, you know, cosine versus a
Starting point is 00:12:10 projection or something like that? Like how, how are those choices made? Are people using a combination of those things? How does all that sort of stuff work? Um, so that would depend on the type of data that you have, the type of data that you are working on. But the way that I kind of think about it is inner product IP, the projection, is actually the most computationally inexpensive. So I kind of just like that because, you know, it's nice to have that. But L2 is a very, very popular one.
Starting point is 00:12:43 And L2 measures what I would call like semantic, like distance in semantic meaning. So maybe, it's really tough to kind of like give like really good examples of this, but cosine measures difference of orientation in semantic meaning. And cosine is much more commonly used in natural language processing. It's kind of like the example that I use in one of my blogs is apples and oranges and how you can actually compare apples and oranges and how far apart they are in space. And maybe you can say that apple pie is closer to apples than it is to oranges.
Starting point is 00:13:29 Yeah, I guess it depends on what you're trying to achieve and what the context is. Like if you are thinking about purely like fruit, then apples are maybe closer to oranges. But if you're thinking about apple as a pure ingredient, then the composition of an apple pie actually has apple in it, whereas the composition of an orange does not or something like that. Yeah, yeah, exactly. So yes, that's a really good point. All of it also does come back to the actual latent space that your vectors embed,
Starting point is 00:14:00 because you can't compare things that don't exist in that latent space. And then bringing this all together back to a vector database, why do I actually need a vector database to work with vectors? Why can't I just put my vectors in MongoDB or some sort of traditional database? Yeah, so you can. You can definitely put vectors into any database. Vectors are just a data type. The reason why you would use a vector database that's purposely built is just because it's built to be able to have the architecture that will let you scale it up. And it has a purpose-built architecture to work with these kinds of vectors. And, you know, traditional databases aren't designed to work like that traditional databases are designed to match these key value pairs. And so they would also have to
Starting point is 00:15:00 add like extra layers on top of that to even achieve anywhere near the similar type of performance, just based on the hardware type that you would regularly need. Okay. And then what are the use cases for a vector database? I think they've become very popular because of retrieval augmented generation or RAG, which we'll get into. But outside of RAG, are there use cases of vector database? Yes.
Starting point is 00:15:30 So before RAG, so Zillow got started in 2017. And prior to RAG, in the early 2020s, the main thing that we saw people use factory databases for, that we saw people use Mildus for basically is product recommendations. So products are these multi kind of like, like unstructured things, entities, right? So there's like product descriptions and there's like pictures and all these different things. And so people want to be able to compare,
Starting point is 00:16:12 like not just like, oh, like what is the product tag, but also like what's in the description or what are in the images. And so that's kind of the example of, that's like probably the most prominent example of production usage of vector databases. Another one that is kind of interesting is that people use vector databases for AI drug discovery. And that is a different use case than the others, because unlike, let's say, RAG or product recommendation, you are not using a lot of search all the time.
Starting point is 00:16:52 And what those people do actually is they insert a ton of data and then they run search just a few times a year. So these are some of the different use cases. And you can see that these have different, the way that people use the database is also kind of different. And so we also think about how do you balance this out. OK. And then let's talk about RAG, where I think is probably what really brought the idea of the vector database to the forefront where, you know, everybody kind of knows in some capacity of the concept of vector database, which I think like two years ago, I think wouldn't be the case. It was a little bit more niche. So what is RAG and then kind of like how does the vector database come into play when we're building something like a RAG model? Yeah. So I think vector databases are still surprisingly unknown, even given the popularity
Starting point is 00:17:51 of RAG. At my talks, I often ask people who knows what vector databases are, and still most people are like, I don't know what you're talking about. Okay, maybe I'm overestimating the hype cycle for vector databases. I think it's because we work in this AI space, right? So the people that we know probably know this kind of stuff. But RAG stands for Retrieval Augmented Generation, and it is exactly what it sounds like. It is when you use data that you retrieve to augment generative AI and what it generates. And so basically the way RAG works is you have some sort of pre-trained LLM,
Starting point is 00:18:32 preferably a very powerful one such as, you know, GPT-4 or mixed stroll or LLAMA2 or something like that. And then you basically want to interface with the LLM but the LLM doesn't have access to your data. And so the way that you get your data to feed into the LLM is you put your data, you vectorize your data using an embedding model and then you put those embeddings into a vector database and you have the vector database kind of sit on top of
Starting point is 00:19:04 or in between like the LLM queries. And so what happens is then the user comes, they ask the LLM a question, they interface with the LLM, they say, hey, blah, blah, blah, query. And then the LLM transforms that query into whatever is needed to make something semantically similar for it. And then it goes into the vector database and says, hey, tell me about this. And it pulls that data back up, and it uses that into the context, into the prompt again.
Starting point is 00:19:32 And it basically says, answer the question now that we know this context. And then it gives you a human readable response. And so that is the RAG process. And then why do we need the RAG process versus just relying on the foundation model? We basically use RAG and vector databases to inject data. You can't really expect large language models or foundation models to keep up with all of the data. And you don't want them to have your private data. And so that's when you would do something like this. Yeah. So essentially a lot of times the foundation model is basically fixed at a certain epoch,
Starting point is 00:20:12 and then you can use RAG to augment it so that you can use something that's maybe more real-time or has happened more recently to get additional context. And then as well as domain specific stuff so that I might be able to disambiguate certain acronyms that are you know relate to the type of query that i'm putting in or whatever it is i need to perform is that right yes yeah yeah and how does this compare to something like fine tuning which is another way of sort of adjusting the foundation model yeah uh so fine tuning and rag have two different kind of um cases, I would say. So RAG is more for when you just have your data and you want the model some more or train some piece of the model or some layer of the model or some set of layers of the model or whatever on your data. And then you can kind of expect the model has learned a little bit about your data.
Starting point is 00:21:20 So the thing with fine-tuning is that unless you are a very large corporation that has access to a lot of GPUs and a lot of money and a lot of time, you are unlikely to be able to inject enough factual data via fine tuning to get the factual data responses back that you want. But what it does do is it does allow you to kind of inject something like a little bit of like context into the, into the foundational model. So some techniques, for example, just fine tune, like, you know,
Starting point is 00:22:03 like the last few layers, right? So then you're basically injecting some sort of context into the model. And you can use these together. For example, you can have a model that perhaps acts like, talks like Taylor Swift and knows everything about machine learning. Okay. And then, so I feel like in principle, when you describe rag,
Starting point is 00:22:28 like it sounds fairly simplistic. Like I, you know, I put in a prompt, I, you know, vectorize it. I run it against my vector database.
Starting point is 00:22:35 I pull back related documents and then I add that as the context and, you know, magic basically happens. But what I know that is much more more complicated than that and there's a lot of like fiddling to actually get these systems to work so like what are some of the like things that make this difficult like where do what are the problems or landlines that people end up stepping on and have to navigate when they're actually building like a rag model for like something that's not just demo where they're actually doing this for something like production? Yeah. So number one is data pre-processing is pretty important. So for text-based RAG,
Starting point is 00:23:15 you basically need to ensure that the way that you chunk your text up, that is like, you know, kind of, I guess, decide how many characters you want to have in one chunk. When you chunk your text up, that's very important. It has to maintain context as well as have enough semantic meaning to it to make sense. So that's one thing that's important is like chunk size. And then another thing is like chunk overlap. So for example, sometimes you will want your chunks to overlap by some amount in order to perhaps context. So for example, if you have something that is like a Q&A, or maybe you have a, you know, a customer service
Starting point is 00:24:16 chat transcript, and you're like, oh, well, you know, like the customer is complaining about this, and it's like very, very in length. And then the sales rep is giving this kind of advice or the blah, blah, blah. And so you have something that's very, very in length. Then you'll want something perhaps like a special character splitter. So something that can look at what the characters are and say like, hey, actually, this is one semantically sound chunk and let's cut it off. So that is number one. And then number two is getting an embeddings model.
Starting point is 00:24:54 So you have to pick the right embeddings model and you probably, when you are putting something into production, you're going to want an embeddings model that is customized because there are generalized ones, but it's very unlikely that that is what you need. It is fine if you're just building a chatbot, I guess, but you're probably going to want it to have some context of your data. So there's embeddings models. And then beyond that, there are the way that you want to save your data or the way that you want to store your data. So metadata, so vector databases can store metadata. So there's
Starting point is 00:25:32 two types of there's two entries that has to go into each entry. So one is ID, and the other is vector embedding. And the rest of it is what we call metadata. So you can store metadata. And there's also a couple of interesting techniques that people use for this, including like storing the vector embedding for a sentence, but then actually storing the text for the larger paragraph. So this lets you pull all the context when you're finding similar vectors. And then people also do the other way around where you store the vector embedding for the entire paragraph. And then you just store the sentence so that even when you're pulling specific or sorry, so you can get like specific pieces of text when you're pulling something that has like ways or techniques to kind of get started with building this kind of rack stuff.
Starting point is 00:26:27 And then getting it into production is always hard because a lot of companies now can't use open AI. So you can't just like drop in an API key. So you got to like run it on your own. You got to like get some sort of like foundational model, maybe an open source one and run it on your own. You got to get some sort of foundational model, maybe an open source one, and run it on your own hardware. And then for essentially the chunking and figure, and sort of this dance that you have to do around the LLM token limits, is that something that a vector database helps you with? Or is that something that you just have to use some additional tooling or build something to figure out what that is going to be
Starting point is 00:27:08 for the particular problem that you're trying to solve. Yeah, so chunking is a pre-processing step to getting the vector. So vector databases kind of sit downstream from that. So I would say like first step is like you chunk up your text and then you get the vector and then you put it into a vector database. So vector databases don't help with that. So I would say like first step is like chunk up your text and then you get the vector and then you put it into a vector database. So vector databases don't help with that.
Starting point is 00:27:31 It's something that you kind of have to like figure out. So you can use tools to do this like lane chain and Lama index all offer ways that you can do this. And then the way, at least like current methods that I've seen for checking, like how good your chunking is, is really just put it into a basic rag app and like do some observability, use some sort of like tool,
Starting point is 00:27:59 like, um, you know, like arise, uh, Phoenix or like true lens, uh, true lens or whatever. Like there are many like tools out there Arise, Phoenix, or like TrueLens, Trera, TrueLens, or whatever.
Starting point is 00:28:05 Like, there are many, like, tools out there that people have built to do observability for RAG apps. Yeah, like, once you've actually created, you've done your chunking, you've created your embeddings, and let's say things are working reasonably well, but then as you're actually observing real users using the system, you realize you need to make some adjustments. Like how do you actually go back and make those adjustments without like
Starting point is 00:28:30 basically blowing everything away and starting brand new? Yeah. So the answer to that would be you basically do. You can take the user's data and you can, yeah, I mean, like you basically would almost have to. If you want to change the embedding space, you would basically have to say like, okay, well, we're going to retrain the embeddings model. Here we have new data that shows that our initial hypotheses was incorrect. And here's our new data and here's how it's actually going to work. And now we're going to have to re-embed everything. So you really don't want to have to go through that process. You want to be right about how users are going to interact with your ag app.
Starting point is 00:29:17 Yeah. Okay. And then the metadata portion, can you explain what exactly the metadata is? Is that just the text that's associated with the embedding? Is that what it's for? Metadata can be anything you want. So it can be the text. You can also add maybe the author of the text, the publication that the text is from, which paragraph it is, the section header, the date that it was published, all of these different attributes you can add as metadata. Okay.
Starting point is 00:29:52 And then how do you actually measure performance and what are the strategies for improving performance? Because essentially, inference is already an expensive process, and it's one of the hard parts, especially with the open source models, whether you have enough like hardware to answer the question in a reasonable amount of time for whatever the application is.
Starting point is 00:30:13 But now you're adding an additional step as well where you're doing the search of a vector database and pulling back that additional context to add into the prompt. Yeah, so we have, so A, you can run Milvus like on your own and kind of like see how that is. But in terms of optimization,
Starting point is 00:30:33 what you want to look for is you want to look for usage. And we have some built-in optimization that will kind of like do this for you. So for example, I don't know if this is in 2.3, this might be in 2.4. We have auto scaling.
Starting point is 00:30:49 So that will like detect like how, you know, how your usage is. And, you know, if you need to spin up more nodes. So Movis has this concept of nodes. So there are three different, I guess, areas of concern when you're doing search, and that would be the query. So how do you actually, you know, actually retrieving your data, the data ingestion part, getting the data into the database, and then the indexing piece, which is creating the
Starting point is 00:31:17 way that you retrieve your data. And so based on, you know, what it is you're doing, you can scale these different nodes up, up and down. And then the other thing is storage optimization, right? So Milvus stores data in 512 megabyte segments. You can change the segment size, but by default, we have 512 megabytes. And we also index over these segments. So what happens when you delete? So when you start deleting data, the segments start losing size. And that means that the indexes start losing efficiency. And so at a certain point when the segments have reduced to a certain size, Milvus will also do a cleanup where it will take segments and combine them again and re-index them again to be more
Starting point is 00:32:07 efficient, basically. And then the way that we kind of get around having to do things like re-indexing if you add a lot of data, which is a big problem if you are using like a mono-index, basically, is that we store these data in these segments, right? So because we build index over these smaller segments, we don't have to worry about re-indexing as we add more data into Milvus. So in the context of RAG, do you think it's here to stay,
Starting point is 00:32:36 or are there other strategies that are coming out of industry or research that are likely to replace the RAG model? I don't think anything's going to replace RAG in the upcoming, let's say, oh, three to five years. RAG will definitely continue to evolve. For example, last year, we saw a ton of people building text RAG. Everybody's building RAG on text.
Starting point is 00:33:01 Next step, we're going to be building multimodal RAG. And then soon, we're going to be building multimodal rag and then soon we're going to be building you know uh maybe auto rag i don't know like you know these different things that will kind of like build around rag it's kind of like uh you know like chatbots right like in 20 2010 or whatever 2012 2013 like these chatbots became popular on websites and they've been there ever since and rag is basically like oh guess what we're going to replace these chatbots became popular on websites and they've been there ever since. And RAG is basically like, oh, guess what? We're going to replace these chatbots now. That's the primary use that I've seen for RAG.
Starting point is 00:33:35 Yeah. I mean, I think the kind of baseline use case for LLMs and for RAGs is chatbots. But at some point, we're going to evolve beyond the chatbot. That's kind of like the hello world version of what you can do with generative AI, essentially. Yes, it has to. And then in terms of the vector database, is this something that if I'm using the vector database, and when it comes to thinking through like the indexing options, like how things are configured,
Starting point is 00:34:07 is this something that I'm basically responsible for like setting up and, and sort of like twiddling these things and trying to optimize for my particular use case? Or are a lot of this stuff like figured out basically by the service for whatever is going to, you know, for the most part, like serve my needs.
Starting point is 00:34:25 So Zillis will do the auto-indexing and whatever for you. Milvus will not. Milvus says, hey, you're using this open source software. You must know what you're doing. So you should set this up as it would work according to your needs. And the reason it kind of also does this is because it is very likely that your needs are going to be different from most other users
Starting point is 00:34:50 or from at least many other users. So Milvus has this kind of approach of offering flexibility, right? Milvus is open source and it is a general use unstructured data platform. And so that's why we wanna cater to many use cases and offer this kind of flexibility in tailoring the way that you want to build your indexes
Starting point is 00:35:13 and do your searches. We even have the ability to tune your consistency for your collection, your individual collections in Milvus and for when you search, right? Because when you're working with a distributed system, so Milvus is a distributed database. When you're working with a distributed system, you're going to have replicas and different instances and things like that.
Starting point is 00:35:35 And so we can even have your search and your write, your read and your write data consistency be different. Yeah, so we haven't really sort of broken down like Milvus versus Zillis. Like, is Milvus, Milvus is the open source project. And then Zillis, is that essentially the managed service for Milvus? That's correct. Yes, Milvus is the open source project. Zillis is basically managed Milvus with some pizzazz on it. So for example, it automates a lot of things.
Starting point is 00:36:06 We've added this thing called Zillows Cloud Pipelines recently, where essentially we do the embeddings for you. We are using an open source model. You can click through and see which one. And then what else does Zillows have? Zillows has some other hardware optimizations. So our cloud team also has a pretty strong hardware background. We did the NVIDIA Raft GPU integration.
Starting point is 00:36:33 So there's this kind of hardware accelerated kind of stuff on there. Zillow's cloud is usually a version behind the Milvus release, just for stability reasons, by a couple weeks and then usually it catches up. So yeah, that's kind of the difference between Milvus and Zillis. Otherwise, they interact pretty much the same. You essentially have a host and a port when you're hosting Milvus locally.
Starting point is 00:37:05 And if you want, you can actually make that into a URI. And then with Zillis, you have a URI, which is the host and the port, and a token to access the server. What's the history of Milvus? When did that project start? Does that predate the 2017 start of Zillis? No, it doesn't. So this is an interesting piece about Zillow
Starting point is 00:37:28 because many open source companies do have that, right? Many open source companies, the open source project predates the forming of the company. So Zillow started in 2017 and Milvus was created in 2018. And Milvus was open source in 2019 and then officially added to the Linux AI Foundation, donated to the Linux AI Foundation in 2020. And then in 2022, we released Milvus 2.
Starting point is 00:37:57 So the idea behind Milvus, so Charles was the CEO founder. He was at Oracle Cloud. So he was building, you know, databases. He's been building databases. So he knows about data, right? And so he wanted to go build his own company. And he was like, well, I'm going to build something that I think is going to be really important over the next, you know, 10, 20 years that isn't a classical database.
Starting point is 00:38:29 And so he was like, I think this vector data kind of stuff is going to be important. And so this was like back in like 2016 or something like that. So he left Oracle and he was like, well, I'm building this thing. And so he went back to China to go do this, partially because of data privacy laws are much more permissive in China than they are here. So you can get a lot more data to do to essentially test out your scale. He was like, OK, so one thing I think is going to be really important with this vector data is that it's going to be operating at large scale. So let's go prove that. And so basically that's how Millvis worked. And in 2021, we had a customer come to us
Starting point is 00:39:10 and basically be like, hey, we have 5 trillion vectors. And yeah. And so we were like, oh, okay. Well, we can do like 5 billion right now. Let's see how we move to 5 trillion. I mean, how did you guys do that? How'd you go from 5 billion to 5 trillion?
Starting point is 00:39:30 Like what are the scale challenges? We can't support 5 trillion yet. This is like a, there's just, we just, it is at some point like, you know, your systems get too big, right? But the challenge for moving into the billion category, the reason why we shifted Milvus from Milvus 1 to Milvus 2 is that Milvus 1 was built...
Starting point is 00:39:56 So Milvus 2 is built as a distributed system, and it's built this way because of the scale problem. Because if you were built as a single instance server, you know, then you're going to run into hardware limitations. And so we saw that this would be an issue. And so we're like, oh, well, then, you know, the only answer is to scale horizontally. And so that's why Milvus 2 is built that way. And that's kind of how we get around that scaling into the billions issue. Yeah. So how does the distributed system work and what new problems does that introduce?
Starting point is 00:40:32 Yeah. So you can basically turn other things. You can build your own distributed system of other vector databases as well, if you would like. The challenge there is one of the big challenges would be this data consistency challenge. So you have these instances. You have these replicas. How consistent does your data need to be? And it will depend on your use case.
Starting point is 00:40:58 And the way we handle that is we have these shards, and we have hashes on your data that tells it what shard is going to write it and where it's going to write it to and things like that so you know you got to come up with these systems of like how to do that and then we have four levels of consistency so we have a strong consistency which basically says like hey like we got to make sure all the writes are done before we do any reads uh and then we have bounded consistency, which says like, after a certain amount of time, all reads are propagated to all replicas. All writes are propagated to all replicas.
Starting point is 00:41:33 And then we have session consistency, which is just saying like, in this instance, in this connection, all reads come after writes. And then there's eventual consistency, which just says, yeah, it'll get done eventually. For people who are using Milvus for RAG models, presumably those are mostly heavy read operations. How much are they really inserting? I mean, you mentioned the drug discovery use case, which was not a RAG use case, but there they're doing a lot of heavy insertion.
Starting point is 00:42:10 It's an only search a handful of times a year. But I would think that for most people, it's not the bulk of essentially the operations against the database or search. Yeah. So you're right. I think for RAG, it's mostly people doing read, which is part of the reason why Milvus has this separation of these query nodes, data nodes, and index nodes. So if you're doing a lot of read, we'll just spend a bunch of query nodes. Whereas if you're doing a lot of writes, you've got to spend a bunch of data nodes. Is this the fact that this works sort of built around being a distributed system?
Starting point is 00:42:47 Is that sort of the unique sauce of Milvus? Or are there other things that make it unique from other vector databases that are out there? Yeah, so that's one of the unique pieces of Milvus, right? Milvus is, as far as I know, the only distributed system by True Database. And other things that make it unique are like, for example, the segments, right? So what we do with that is we, at search time, you can have much, much faster, much more efficient search by essentially performing
Starting point is 00:43:25 something that is a near constant time operation, like searching this particular segment is going to remain the same cost, no matter how many parallel searches you run on that, up to a certain number, of course. And just doing that, and then adding like an extra aggregation on it so that's how we do search and that's how we're able to do like very fast um like millisecond level vector search across a large amount of data um another thing that is really interesting at least to me i think is really interesting about the way that milvis works uh and also is um makes it more effective and more efficient at scale is the way the filtering works. So you can filter on your metadata. You can say, I only want to find
Starting point is 00:44:10 things that are, I don't know, like text that starts with the word the, or that is longer than 500 characters, or date is published after today, or not after today, before yesterday, something like that. You can do this filtering, and the way that it works is a milvus goes and it goes through all of the data and it looks at that attribute and it basically creates a bit mask that goes like, you know, if the attribute matches what you're looking for, then it gets a one, otherwise it gets a zero. And so when you do this kind of pre-filtering, this gives you a linear time addition. But it
Starting point is 00:44:51 also means that the amount of data that you actually have to search becomes a lot smaller if you're doing something that filters through a high amount of data. So these are kind of like some of the pieces that make Milvus unique, the way that we filter the data, the way that we do the data segmentation for search and the distributed system with separation of concerns. What are some of the big technical challenges that need to be solved? Is it really just like the scale? Like, how do we get the $5 trillion? And also, how many businesses really have $5 trillion? I know you mentioned the one,
Starting point is 00:45:29 but how common is it? I would say that I would be unsurprised if many Fortune 500 companies had less than a trillion vectors that they could use. Not that they are using, but they could use if they wanted to. Mainly because there's just so much data that's sitting around unused, and there's probably more data sitting around unused than we're actually using. At least that's what everybody predicts. Yeah, that makes sense. I mean, I actually talked, I was in an event back in November, I think. It was a data event.
Starting point is 00:46:05 And I talked to the CDO of a public company. And he was fairly new at this company. And one of the things he mentioned was that he was trying to solve was they're sitting on like a mountain of essentially unstructured data that's like encrypted in an S3 bucket. And they want to do something with that, but they have no way of essentially unlocking the power of that data. Yep. Yep. So there's a lot of data and scale is definitely going to be one of the big problems. I don't think it's going to be the only problem. I'm sure that there's going to be some other hardware type limitations.
Starting point is 00:46:38 I actually think this is something that kind of applies to foundation models like AI in general. There's going to be some compute limit. And I think it's actually going to be hardware restricted. At least it seems that way at the moment. So I think that's of really implementing and using the data, not just the technology, but actually using the data, it's like education, right?
Starting point is 00:47:13 People have to know, okay, you have this data. How can you use it? If you don't know how to use it, then you're not going to. Yeah, that goes back to some of the things we were saying earlier where the concept of a vector database is still very new. Like, a lot of people don't know what that is. So they might not even be aware of, hey, there's actually this, like, technology that I can use to help me solve some of these problems that I don't have a solution to right now. Yeah.
Starting point is 00:47:40 It's the new category problem, essentially. Yes. Yes. All right. Well, as we start to kind of wrap things up, I. Yes. Yes. All right. Well, as we start to kind of wrap things up, I have some quickfire questions for you. So don't spend too much time thinking about these. Just first thing pops into your mind. You ready?
Starting point is 00:47:55 All right. I love these things. All right. So if you could master one skill you don't have right now, what would it be? I thought I was going to really love this one, but I didn't have this one in mind at all. Sales. Sales. What wastes the most time in mind at all. Sales. Sales. What wastes the most time in your day?
Starting point is 00:48:08 Scrolling social media. If you could invest in one company that's not the company you work for, who would it be? Hugging Face. Hugging Face, all right. And now, actually, Google's a big investor in Hugging Face at this point. It was recently announced. What tool or technology can you not live without? Python.
Starting point is 00:48:24 What person influenced you the most in your career? Matthew McConaughey. I got to dig into that. Why? So I watched a graduation speech that he gave. And one of the things that he talks about in the speech is at some point when he was younger, when he was like 15 or something, someone who was important to him came to him in his life and was like, who's your hero? And so he was like, well, it's me 10 years from now. And so 10 years later, he sees this person again and she goes up and says, well, are you a hero now?
Starting point is 00:49:02 And he says, no, because my hero now is me 10 years from now. And the idea that he kind of like, you know, proposes behind this is that your hero should always be someone who is ahead of you that you can't catch. And for me, the way that this has translated into not just my career, but my life in general is like, it's given me this kind of like mindset of, you know, how do I get better at the things that I'm not good at? And how do I define like, what are the things that I want to get better at? And so this has also been incredibly helpful for me in my career, because it lets me notice like things like, oh, like, here's something that I
Starting point is 00:49:39 can tell that someone's doing a lot better than I am. How do I incorporate that into my image of how I'm going to be better at this? Awesome. All right. Last question. What's your probability that AI equals a doom for the human race? Doom for the human race? I would say zero. It depends on what you mean by doom. I am a big proponent of the singularity. I think it would be really interesting. Definitely. All right. Well, anything else you'd like to share? Anything else I'd like to share? For the audience, if you're interested in doing hackathons and you're in Seattle, hit me up. And how can people learn more and how can they follow you?
Starting point is 00:50:24 Oh, yes. So you can find me on and how can they follow you? Oh, yes. So you can find me on LinkedIn. That's where I'm the most active. I'm pretty much responsible on there all the time. Y-U-J-I-A-N-T-A-N-G. Awesome. Eugene, thanks so much for being here. I really enjoyed this and hopefully we'll have you back down the road.
Starting point is 00:50:39 Yeah, great to be on here. It was an awesome chat, Sean. All right, cheers.
