Software Huddle - What is a Vector Database with Yujian Tang
Episode Date: March 26, 2024. Today's guest is Yujian Tang from Zilliz, one of the big players in the vector database market. This is the first episode in a series of episodes we’re doing on vectors and vector databases. We start with the basics, what is a vector? What are vector embeddings? How does vector search work? And why the heck do I even need a vector database? RAG models for customizing LLMs is where vector databases are getting a lot of their use. On the surface, it seems pretty simple, but in reality, there's a lot of tinkering that goes into taking RAG to production. Yujian explains some of the tripwires that you might run into and how to think through those problems. We think you're going to really enjoy this episode. Timestamps 02:08 Introduction 03:16 What is a Vector? 07:01 How does Vector Search work? 14:08 Why need a Vector database? 15:11 Use Cases 17:37 What is RAG? 20:34 RAG vs fine-tuning 29:51 Measuring Performance 32:32 Is RAG here to stay? 35:43 Milvus 37:17 History of Milvus 47:44 Rapid Fire X https://twitter.com/yujian_tang https://twitter.com/seanfalconer
Transcript
What is a vector?
So vectors, in their most simple form, are just a list of numbers.
And so there's many different types of vectors.
There's two primary types when we talk about vector search.
So one is dense vectors.
So these are vectors that typically have a float value.
And then there are sparse vectors, which are vectors that may have a lot of zeros.
And so these are typically binary vectors.
Why do I actually need a vector database to work with vectors?
Why can't I just put my vectors in MongoDB or some sort of traditional database?
The reason why you would use a purpose-built vector database is just because its architecture is built to let you scale it up.
If you could invest in one company that's not the company you work for, who would it be?
Hugging Face.
Hugging Face, all right.
Anything else you'd like to share?
If you're interested in doing hackathons and you're in Seattle, hit me up.
Hey, everyone.
Welcome back to another episode of Software Huddle. My guest today is Yujian Tang
from Zilliz, one of the big players in the vector database market. This is the first episode in a
series of episodes I'm doing on vectors and vector databases. Yujian and I start with the basics,
what is a vector? What are vector embeddings? How does vector search work? And why the heck do I
even need a vector database?
RAG models for customizing LLMs is where vector databases are getting a lot of their use.
On the surface, it seems pretty simple,
but in reality, there's a lot of tinkering
that goes into taking RAG to production.
Yujian explains some of the tripwires
that you might run into
and how to think through those problems.
I think you're going to really enjoy this one
and hopefully the series.
And if you do, please leave us a positive rating or review, subscribe to the show, and feel
free to hit me and Alex up on Twitter or LinkedIn. All right, let's get you over to the episode.
Yujian, welcome to Software Huddle. Thanks, Sean. I'm glad to be here.
Yeah, I'm excited to have you here. So we're hoping to do a whole series of episodes focused
on vector databases, vector search right now.
It's basically all about vectors.
So I'm glad that we're kicking off our vector journey with you.
But before we dive too deep into vectors, let's start with you.
Who are you? What do you do?
Yeah, so my name is Yujian Tang.
I am a developer advocate at Zilliz.
Zilliz is a vector database company.
Prior to this, I worked at IBM, Amazon,
and published some papers at IEEE Big Data.
Awesome. And then were you at all knowledgeable
about vector databases and vectors before joining Zilliz,
or was this something that was a completely new frontier for you?
I had never heard of vector databases.
I had heard of like feature stores
and that kind of like data store.
But I have a background in machine learning.
So vectors were very familiar to me.
I was like, ah, yes, okay, I know what these are.
And after my first couple conversations
during the interview process, I was like,
oh, okay, I understand what's going on here.
Okay, great.
Well, I think that's a good place to start.
Let's start with the basics around a vector.
What is a vector and what are vector embeddings
and why do you need any of this for machine learning and AI?
Yeah, yeah. So vectors in their most simple form are just a list of numbers.
That's really all you need to know about vectors: they're lists of numbers.
And so there's many different types of vectors, right?
So there's two primary types that we think about when we talk about vector search.
So one is dense vectors.
So these are vectors that typically have a float value. You know, these are basically real numbers.
And then there are sparse vectors, which are vectors that may have a lot of zeros. And so
these are typically binary vectors. These are typically just zeros and ones.
Examples of algorithms that produce the sparse vectors include TF-IDF, which is a very popular natural language processing kind of algorithm, SPLADE, and BM25.
And then the dense vectors are produced from machine learning models, actually.
So dense vectors represent the semantic meaning of some type of input. And the way that you get this is you have your input and you feed it into a model that has been trained on that type of input. And at the end, instead of having the model do some sort of prediction or
classification or something, you cut off the last layer and you just take the output from
the second to last layer. And that's your vector embedding. And that contains all of what the model
has learned about the input in the form of numbers.
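To make that concrete, here's a minimal PyTorch sketch of the "cut off the last layer" idea; the toy architecture and sizes below are purely illustrative, and real embedding models do the equivalent at much larger scale:

```python
# A toy classifier: the second-to-last layer's output serves as the embedding.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),   # second-to-last layer -> embedding
    nn.Linear(64, 10),               # prediction head, cut off below
)

embedder = model[:-1]                # drop the final prediction layer

x = torch.randn(1, 32)               # stand-in for a preprocessed input
embedding = embedder(x)              # a dense float vector of shape (1, 64)
print(embedding.shape)
```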
Okay. And then you mentioned TF-IDF, so term frequency, inverse document frequency,
which is something that I used back in my ML days,
which is quite some time ago.
Is that actually still widely used in a popular method?
Not really.
At least I don't really hear about it used a lot,
but it is one way that you can get these sparse vectors, right? Because then you can see how often a term is popping up compared to other words, and how many documents there are.
Yeah, absolutely. If you have a term that doesn't come across in very many documents, you're going to end up with these very long vectors without necessarily a lot of nonzero numbers.
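As a quick illustration of those sparse vectors, here's TF-IDF with scikit-learn (the tiny corpus is made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "vectors are just a list of numbers",
    "sparse vectors have a lot of zeros",
    "dense vectors come from machine learning models",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)  # scipy sparse matrix: docs x vocabulary

# Terms appearing in few documents get high weight; common terms get low weight.
print(tfidf.shape)
print(vectorizer.get_feature_names_out())
```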
And then in terms of embeddings, how do they actually preserve semantic
information about the data and its relationship
to other similar types of inputs?
Yeah. So the embedding,
it works kind of like this. So you have a machine learning model, right?
And from the beginning, your machine learning model is just a bunch of random weights.
And as you train it, it starts to learn the patterns of the input data.
And then the output of that second-to-last layer
creates a high-dimensional latent space that captures the relative patterns in the data that you've given it.
So, for example, if we talk about image data, maybe you're feeding it a bunch of pictures of different cats and dogs and I don't know, like turtles or something like that.
And it's like learning that, oh, there's these like animals there.
And so that is kind of just how it preserves it. It's just like this output knows that there's this type
of animal there. And that's how it's encoded into the machine learning model. And then that's
how we can decode it, basically. And then how does vector search work once you've essentially created these representations of these real-world objects?
And how is that different than maybe conventional search?
Yeah, yeah, yeah.
So conventional search that we have right now is like, let's say you're working with databases, typically what you're doing is some sort of like, okay, find me all the things that have like this ID value that also have these attribute values.
And that's basically key-to-key matching: you very much need a direct match. Vector search, on the other hand, is all about finding the nearest neighbors, because you pretty much never get the same vector embedding twice unless you're embedding the same thing. And so vector search is all about taking these two
long lists of numbers and doing that like compute to find like what is the distance between these
vectors. And so vector databases like Milvus are kind of built and
optimized to be able to effectively and efficiently do this kind of compute. And there's many ways to
compare these vectors. So there's L2, which is basically physical distance in space. It's like
if you have a right triangle, you can think of the hypotenuse. There's cosine, which, if you think of vectors as lines pointing in space, is like the
angle between them. And then there's IP, which if you think of vectors as lines pointing in space,
IP is the projection of one vector onto another. So then you can think of if one of the vectors
was a hypotenuse, the other was
the leg of a triangle, it's the other leg of that right triangle. So those are like different ways
to measure vector distances. And interestingly, if you transform all of these on normalized vectors,
they all give you the same rank order. They basically all come out to the same thing.
So your top-K nearest neighbors will pretty much always be the same.
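To illustrate the three measures and that rank-order point, here's a small numpy sketch with random, purely illustrative vectors:

```python
# L2 (Euclidean distance), IP (inner product), and cosine similarity.
# On unit-normalized vectors, all three produce the same nearest-neighbor ranking.
import numpy as np

def l2(a, b):     return np.linalg.norm(a - b)
def ip(a, b):     return float(a @ b)
def cosine(a, b): return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query = rng.normal(size=8)
corpus = rng.normal(size=(5, 8))

# Normalize everything to unit length.
query /= np.linalg.norm(query)
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

by_l2  = np.argsort([l2(query, v) for v in corpus])      # ascending distance
by_ip  = np.argsort([-ip(query, v) for v in corpus])     # descending similarity
by_cos = np.argsort([-cosine(query, v) for v in corpus])
print(by_l2, by_ip, by_cos)  # identical orderings on normalized vectors
```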
So that's how you compare vectors. But when you get to a large
number of vectors, you're going to want to do this thing called indexing. Well, most people who work with SQL databases
are probably also familiar with indexing.
This is a different type of indexing.
So this indexing is creating essentially a map
of the vector space that you are using.
And there are a few indexes that are very popular.
Milvus has 11.
I think we can touch on three here.
So one would be IVF, which is inverted file index.
This is your most intuitive type of vector index.
This is essentially doing a clustering, a K-means clustering.
Like, let's say like, oh, I think there's 128 different categories in this
vector data, then I'm going to do 128 different clusters. Something kind of like that. And
basically, the way that works at query time is you only know the centroids initially,
and you find the closest centroids, and then you dig in and you find the closest vectors.
And then there's HNSW.
So this is Hierarchical Navigable Small Worlds,
which is a mouthful.
And basically what this is, is this is a graph index
and it's like a layered graph index.
So as you insert your vectors,
you get a uniform random variable
and that variable will tell you what layer
it gets inserted up to.
And you get to tune the parameter that controls that.
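As a hedged sketch of that layer assignment, following the scheme in the HNSW paper (the mL normalization factor is the tunable parameter mentioned above):

```python
import math
import random

def hnsw_top_layer(m_l: float = 1.0) -> int:
    u = 1.0 - random.random()           # uniform random variable in (0, 1]
    return int(-math.log(u) * m_l)      # floor(-ln(U) * mL)

levels = [hnsw_top_layer() for _ in range(10_000)]
# Most vectors only reach layer 0; exponentially fewer reach each higher layer.
print(max(levels), sum(1 for lvl in levels if lvl == 0) / len(levels))
```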
And then the third one that would be interesting to talk about is ScaNN, which is called scalable nearest neighbors,
which is kind of an interesting name. But basically,
it quantizes the vectors, and you only search the quantized space first, and then you search the actual space. So it's kind of like IVF. And by quantizing the vectors, what does that mean?
Ah, yes. So for example, let's say you have the set of real numbers from 0 to 10.
A quantization of that would be like the integers from 0 to 10.
So you would bucket all of 0 to 0.5 into 0 and 0.5 to 1.5 into 1 and so on and so on. So quantization is just that kind of
bucketing process. And then that presumably helps with
compute because then you're dealing with integers rather than real numbers?
Yes. So there's the float64 to int8 or int16 kind of reduction. But it also just makes it so that you have a smaller possible vector space to initially search.
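To tie the indexing and quantization pieces together, here's a hedged pymilvus sketch. IVF_SQ8 is Milvus's IVF index with 8-bit scalar quantization; parameter names follow the pymilvus 2.x API, the collection name is made up, and a connection is assumed:

```python
from pymilvus import Collection

# Assumes connections.connect(...) has already been called
# and that "my_collection" exists with an "embedding" vector field.
collection = Collection("my_collection")

collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "IVF_SQ8",        # IVF clustering + int8 scalar quantization
        "metric_type": "L2",            # or "IP" / "COSINE"
        "params": {"nlist": 128},       # number of k-means centroids
    },
)
```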
Right. I see. And then going back to the different ways of actually comparing vectors,
are there pros and cons to using those different approaches like, you know, cosine versus a
projection or something like that? Like how, how are those choices made? Are people using
a combination of those things? How does all that sort of stuff work? Um, so that would depend on
the type of data that you have, the type of data that you are working on.
But the way that I kind of think about it is inner product IP, the projection,
is actually the most computationally inexpensive.
So I kind of just like that because, you know,
it's nice to have that.
But L2 is a very, very popular one.
And L2 measures what I would call distance in semantic meaning. It's really tough to give really good examples of this, but cosine measures difference of orientation in semantic meaning. And cosine is much more commonly used in natural language processing.
It's kind of like
the example that I use in one of my blogs is apples
and oranges and how you can actually compare apples and oranges
and how far apart they are in space. And maybe you can say that apple
pie is closer to apples than it is to oranges.
Yeah, I guess it depends on what you're trying to achieve and what the context is.
Like if you are thinking about purely like fruit, then apples are maybe closer to oranges.
But if you're thinking about apple as a pure ingredient,
then the composition of an apple pie actually has apple in it,
whereas the composition of an orange does not or something like that.
Yeah, yeah, exactly.
So yes, that's a really good point.
All of it also does come back to the actual latent space that your vectors embed,
because you can't compare things that don't exist in that latent space.
And then bringing this all together back to a vector database, why do I actually need a
vector database to work with vectors? Why can't I just put my vectors in MongoDB or some sort of
traditional database? Yeah, so you can. You can definitely put vectors into any database. Vectors
are just a data type. The reason why you would use a vector database that's purposely built is
just because it's built to have the architecture that will let you scale up, a purpose-built architecture to work with these kinds of vectors. And, you know, traditional databases aren't designed to work like that; they're designed to match these key-value pairs. And so they would also have to add extra layers on top of that to even achieve anywhere near a similar type of performance,
just based on the hardware type that you would regularly need.
Okay. And then what are the use cases for a vector database?
I think they've become very popular because of retrieval augmented generation or RAG,
which we'll get into.
But outside of RAG,
are there use cases of vector database?
Yes.
So before RAG,
so Zilliz got started in 2017.
And prior to RAG,
in the early 2020s,
the main thing that we saw people use vector databases for, that we saw people use Milvus for, is basically product recommendations. So products are these kind of multimodal, unstructured entities, right? So there's product descriptions and there's pictures and all these different things.
And so people want to be able to compare,
like not just like, oh, like what is the product tag,
but also like what's in the description
or what are in the images.
And so that's kind of the example of,
that's like probably the most prominent example
of production usage of vector databases.
Another one that is kind of interesting is that people use vector databases for AI drug discovery.
And that is a different use case than the others, because unlike, let's say, RAG or product recommendation, you are not using a lot of search all the time.
And what those people do actually is they insert a ton of data and then they run search just a few times a year. So these are some of the different use cases. And you can see that these have different, the way
that people use the database is also kind of different.
And so we also think about how do you balance this out.
OK.
And then let's talk about RAG, where I think
is probably what really brought the idea of the vector database to the forefront, where everybody now knows the concept of a vector database in some capacity, which I think two years ago wouldn't have been the case.
It was a little bit more niche.
So what is RAG and then kind of like how does the vector database come into play when we're building something like a RAG model? Yeah. So I think vector databases are still surprisingly unknown, even given the popularity
of RAG. At my talks, I often ask people who knows what vector databases are, and still most people
are like, I don't know what you're talking about. Okay, maybe I'm overestimating the hype cycle for
vector databases. I think it's because we work in this AI space, right?
So the people that we know probably know this kind of stuff.
But RAG stands for Retrieval Augmented Generation,
and it is exactly what it sounds like. It is when you use data that you retrieve
to augment generative AI and what it generates.
And so basically the way RAG works is you have some sort of pre-trained LLM,
preferably a very powerful one such as, you know, GPT-4 or Mixtral or
Llama 2 or something like that. And then you basically want to interface with the LLM
but the LLM doesn't have access to your data.
And so the way that you get your data to feed into the LLM
is you put your data, you vectorize your data
using an embedding model
and then you put those embeddings into a vector database
and you have the vector database kind of sit on top of
or in between like the LLM queries.
And so what happens is then the user comes,
they ask the LLM a question, they interface with the LLM,
they say, hey, blah, blah, blah, query.
And then the LLM transforms that query into whatever is needed
to make something semantically similar for it.
And then it goes into the vector database and says, hey, tell me about this.
And it pulls that data back up, and it uses that into the context, into the prompt again.
And it basically says, answer the question now that we know this context.
And then it gives you a human readable response.
And so that is the RAG process.
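As a minimal sketch of that flow, assuming a Milvus collection that stores chunk text alongside the embeddings; embed() and llm_complete() are hypothetical stand-ins for whatever embedding model and LLM you actually use:

```python
from pymilvus import Collection

def answer(question: str, collection: Collection) -> str:
    query_vec = embed(question)                  # hypothetical embedding call
    hits = collection.search(
        data=[query_vec],
        anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 16}},
        limit=3,
        output_fields=["text"],                  # pull stored chunk text back
    )
    # Stuff the retrieved chunks into the prompt as context.
    context = "\n".join(hit.entity.get("text") for hit in hits[0])
    prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)                  # hypothetical LLM call
```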
And then why do we need the RAG process versus just relying on the foundation model?
We basically use RAG and vector databases to inject data.
You can't really expect large language models or foundation models to keep up with all of the data.
And you don't want them to have your private data.
And so that's when you would do something like this. Yeah. So essentially a lot of times the foundation model is basically fixed at a certain epoch,
and then you can use RAG to augment it so that you can use something that's maybe more real-time
or has happened more recently to get additional context, as well as domain-specific stuff, so that it might be able to disambiguate certain acronyms that relate to the type of query that I'm putting in, or whatever it is I need to perform. Is that right?
Yes, yeah.
And how does this compare to something like fine-tuning, which is another way of sort of adjusting the foundation model?
Yeah, so fine-tuning and RAG have two different kinds of use cases, I would say. So RAG is more for when you just have your data and you want the model to use it, whereas fine-tuning is when you train some more, or train some piece of the model, some layer of the model, or some set of layers of the model, on your data.
And then you can kind of expect the model has learned a little bit about your data.
So the thing with fine-tuning is that unless you are a very large corporation with access to a lot of GPUs, a lot of money, and a lot of time, you are unlikely to be able to inject enough factual data via fine-tuning to get the factual responses back that you want. But what it does do is allow you to inject a little bit of context into the foundational model.
So some techniques, for example, just fine-tune the last few layers, right? So then you're basically injecting some sort of context into the model.
And you can use these together.
For example, you can have a model that perhaps acts like, talks like Taylor Swift
and knows everything about machine learning.
Okay.
And then, so I feel like in principle, when you describe RAG, it sounds fairly simplistic. I put in a prompt, I vectorize it, I run it against my vector database, I pull back related documents, and then I add that as the context, and, you know, magic basically happens.
But I know that it's much more complicated than that, and there's a lot of fiddling to actually get these systems to work. So what are some of the things that make this difficult? What are the problems or landmines that people end up stepping on and have to navigate when they're actually building a RAG model for something that's not just a demo, where they're actually doing this for production?
Yeah. So number one, data pre-processing is pretty important. So for text-based RAG,
you basically need to ensure that the way you chunk your text up, that is, how many characters you decide to have in one chunk, makes sense. Each chunk has to maintain context as well as have enough semantic meaning. So one thing that's important is chunk size. And then another
thing is like chunk overlap. So for example, sometimes you will want your chunks to overlap
by some amount in order to preserve context. So for example,
if you have something that is like a Q&A, or maybe you have a, you know, a customer service
chat transcript, and you're like, oh, well, the customer is complaining about this, and it's varying in length. And then the sales rep is giving this kind of advice, blah, blah, blah. And so you have something that's varying in length.
Then you'll want something perhaps like a special character splitter.
So something that can look at what the characters are and say like, hey, actually, this is one semantically sound chunk
and let's cut it off.
So that is number one.
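Here's a minimal sketch of those two knobs, chunk size and chunk overlap, using naive fixed-size splitting; real pipelines usually use smarter splitters like the special-character approach just described:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap          # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# Each chunk repeats the last `overlap` characters of the previous one,
# so context isn't lost at the boundaries.
print(len(chunk_text("a" * 1200, chunk_size=500, overlap=50)))
```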
And then number two is getting an embeddings model.
So you have to pick the right embeddings model
and you probably, when you are putting something into production,
you're going to want an embeddings model that is customized
because there are generalized ones, but it's very unlikely that that is what you need.
It is fine if you're just building a chatbot, I guess, but you're probably going to want it to
have some context of your data. So there's embeddings models. And then beyond that, there
is the way that you want to store your data. So, metadata: vector databases can store metadata. There are two fields that have to go into each entry. One is the ID, and the other is the vector embedding. The rest of it is what we call metadata. So you can store metadata. And
there's also a couple of interesting techniques that
people use for this, including like storing the vector embedding for a sentence, but then actually
storing the text for the larger paragraph. So this lets you pull all the context when you're
finding similar vectors. And then people also do the other way around where you store the
vector embedding for the entire paragraph.
And then you just store the sentence, so you can get specific pieces of text back when you're pulling matches. So those are some ways or techniques to get started with building this kind of RAG stuff.
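As a hedged sketch, here's what a pymilvus collection schema for that pattern might look like; the field names, dimensions, and connection details are illustrative:

```python
from pymilvus import (
    connections, CollectionSchema, FieldSchema, DataType, Collection,
)

connections.connect("default", host="localhost", port="19530")  # placeholder address

fields = [
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=768),
    FieldSchema("text", DataType.VARCHAR, max_length=4096),     # chunk/paragraph text
    FieldSchema("author", DataType.VARCHAR, max_length=256),    # metadata
    FieldSchema("published", DataType.VARCHAR, max_length=64),  # metadata
]

schema = CollectionSchema(fields, description="RAG chunks with metadata")
collection = Collection("documents", schema)
```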
And then getting it into production is always hard because a lot of companies now can't use OpenAI.
So you can't just like drop in an API key.
So you've got to run it on your own. You've got to get some sort of foundational model, maybe an open source one, and run it on your own hardware.
And then for essentially the chunking, and sort of this dance that you have to do around the LLM token limits, is that something that a
vector database helps you with? Or is that something that you just have to use some additional tooling or build something to figure out
what that is going to be
for the particular problem that you're trying to solve.
Yeah, so chunking is a pre-processing step
to getting the vector.
So vector databases kind of sit downstream from that.
So I would say the first step is you chunk up your text, then you get the vector, and then you put it into a vector database. So vector databases don't help with that.
It's something that you kind of have to figure out. You can use tools to do this; LangChain and LlamaIndex all offer ways that you can do it. And then at least the current methods that I've seen for checking how good your chunking is, is really to just put it into a basic RAG app and do some observability, using some sort of tool like Arize Phoenix or TruEra's TruLens. There are many tools out there that people have built to do observability for RAG apps.
Yeah, like, once you've actually created,
you've done your chunking, you've created your embeddings,
and let's say things are working reasonably well,
but then as you're actually observing real users
using the system, you realize you need to
make some adjustments. Like how do you actually go back and make those adjustments without like
basically blowing everything away and starting brand new? Yeah. So the answer to that would be
you basically do. You can take the user's data and, yeah, I mean, you basically would almost have to.
If you want to change the embedding space, you would basically have to say like, okay, well, we're going to retrain the embeddings model.
Here we have new data that shows that our initial hypothesis was incorrect.
And here's our new data and here's how it's actually going to work.
And now we're going to have to re-embed everything.
So you really don't want to have to go through that process.
You want to be right about how users are going to interact with your RAG app.
Yeah.
Okay.
And then the metadata portion, can you explain what exactly the metadata is? Is that just the text that's
associated with the embedding? Is that what it's for?
Metadata can be anything you want. So it can be the text. You can also add maybe the author of
the text, the publication that the text is from, which paragraph it is, the section header,
the date that it was published, all of these different attributes you can add as metadata.
Okay.
And then how do you actually measure performance
and what are the strategies for improving performance?
Because essentially, inference is already an expensive process,
and it's one of the hard parts,
especially with the open source models,
whether you have enough like hardware
to answer the question in a reasonable amount of time
for whatever the application is.
But now you're adding an additional step as well
where you're doing the search of a vector database
and pulling back that additional context
to add into the prompt.
Yeah, so we have,
so A, you can run Milvus like on your own
and kind of like see how that is.
But in terms of optimization,
what you want to look for
is you want to look for usage.
And we have some built-in optimization
that will kind of like do this for you.
So for example,
I don't know if this is in 2.3,
this might be in 2.4.
We have auto scaling.
So that will detect what your usage is like and whether you need to spin up more nodes.
So Milvus has this concept of nodes.
So there are three different,
I guess, areas of concern when you're doing search: the query piece, which is actually retrieving your data; the data
ingestion part, getting the data into the database, and then the indexing piece, which is creating the
way that you retrieve your data. And so based on what it is you're doing, you can scale these different nodes up and down.
And then the other thing is storage optimization, right?
So Milvus stores data in 512 megabyte segments.
You can change the segment size, but by default, we have 512 megabytes.
And we also index over these segments. So what
happens when you delete? So when you start deleting data, the segments start losing size.
And that means that the indexes start losing efficiency. And so at a certain point when the
segments have reduced to a certain size, Milvus will also do a cleanup where it will take segments and combine them again and re-index them again to be more
efficient, basically. And then the way that we kind of get around having to do things like
re-indexing if you add a lot of data, which is a big problem if you are using like a mono-index,
basically, is that we store these data in these segments, right?
So because we build index over these smaller segments,
we don't have to worry about re-indexing
as we add more data into Milvus.
So in the context of RAG,
do you think it's here to stay,
or are there other strategies that are coming out
of industry or research that are likely
to replace the RAG model?
I don't think anything's going to replace RAG in the upcoming, let's say, oh, three
to five years.
RAG will definitely continue to evolve.
For example, last year, we saw a ton of people building text RAG.
Everybody's building RAG on text.
Next step, we're going to be building multimodal RAG.
And then soon, we're going to be building, you know, maybe auto-RAG, I don't know, these different things that will kind of build around RAG. It's kind of like chatbots, right? In 2010 or whatever, 2012, 2013, these chatbots became popular on websites and they've been there ever since.
And RAG is basically like, oh, guess what?
We're going to replace these chatbots now.
That's the primary use that I've seen for RAG.
Yeah.
I mean, I think the kind of baseline use case for LLMs and for RAG is chatbots.
But at some point, we're going to evolve beyond the chatbot. That's
kind of like the hello world version of what you can do with generative AI, essentially.
Yes, it has to.
And then in terms of the vector database, when it comes to thinking through the indexing options and how things are configured, is this something that I'm basically responsible for setting up and twiddling and trying to optimize for my particular use case? Or is a lot of this stuff figured out by the service in a way that will, for the most part, serve my needs?
So Zilliz will do the auto-indexing and whatever for you.
Milvus will not.
Milvus says, hey, you're using this open source software.
You must know what you're doing.
So you should set this up as it would work according to your needs.
And the reason it kind of also does this is because it is very likely
that your needs are going to be different
from most other users
or from at least many other users.
So Milvus has this kind of approach
of offering flexibility, right?
Milvus is open source
and it is a general use unstructured data platform.
And so that's why we wanna cater to many use cases
and offer this kind of flexibility
in tailoring the way that you want to build your indexes
and do your searches.
We even have the ability to tune your consistency
for your collection, your individual collections in Milvus
and for when you search, right?
Because when you're working with a distributed system,
so Milvus is a distributed database.
When you're working with a distributed system,
you're going to have replicas and different instances and things like that.
And so we can even have your search and your write,
your read and your write data consistency be different.
Yeah, so we haven't really sort of broken down like Milvus versus
Zilliz. Milvus is the open source project. And
then Zilliz, is that essentially the managed service for Milvus?
That's correct. Yes, Milvus is the open source project. Zilliz is basically managed
Milvus with some pizzazz on it. So for example,
it automates a lot of things.
We've added this thing called Zilliz Cloud Pipelines
recently, where essentially we do the embeddings for you.
We are using an open source model.
You can click through and see which one.
And then what else does Zilliz have?
Zilliz has some other hardware optimizations.
So our cloud team also has a pretty strong hardware background.
We did the NVIDIA RAFT GPU integration.
So there's this kind of hardware accelerated kind of stuff on there.
Zilliz Cloud is usually a version behind the Milvus release,
just for stability reasons, by a couple weeks and then usually it catches up.
So yeah, that's kind of the difference
between Milvus and Zilliz.
Otherwise, they interact pretty much the same.
You essentially have a host and a port
when you're hosting Milvus locally.
And if you want, you can actually make that into a URI.
And then with Zilliz, you have a URI, which is the host and the port,
and a token to access the server.
What's the history of Milvus?
When did that project start?
Does that predate the 2017 start of Zilliz?
No, it doesn't.
So this is an interesting piece about Zilliz,
because many open source companies do have that, right?
Many open source companies,
the open source project predates the forming of the company.
So Zilliz started in 2017 and Milvus was created in 2018.
And Milvus was open-sourced in 2019
and then officially donated to the LF AI & Data Foundation in 2020.
And then in 2022, we released Milvus 2.
So the idea behind Milvus,
so Charles was the CEO founder.
He was at Oracle Cloud.
So he was building, you know, databases.
He's been building databases.
So he knows about data, right?
And so he wanted to go build his own company.
And he was like, well, I'm going to build something that I think is going to be really important over the next, you know, 10, 20 years that isn't a classical database.
And so he was like, I think this vector data kind of stuff is going to be important.
And so this was like back in like 2016 or something like that.
So he left Oracle and he was like, well, I'm building this thing.
And so he went back to China to go do this, partially because of data privacy laws are much more permissive in China than they are here.
So you can get a lot more data to do to essentially test out your scale.
He was like, OK, so one thing I think is going to be really important with this vector data is that it's going to be operating at large scale.
So let's go prove that. And so basically that's how Milvus came about.
And in 2021, we had a customer come to us
and basically be like,
hey, we have 5 trillion vectors.
And yeah.
And so we were like, oh, okay.
Well, we can do like 5 billion right now.
Let's see how we move to 5 trillion.
I mean, how did you guys do that?
How'd you go from 5 billion to 5 trillion?
Like what are the scale challenges?
We can't support 5 trillion yet.
There's just a point at which, you know, your systems get too big, right?
But the challenge for moving into the billion category,
the reason why we shifted Milvus from Milvus 1 to Milvus 2
is that Milvus 1 was built...
So Milvus 2 is built as a distributed system,
and it's built this way because of the scale problem.
Because if you were built as a single instance server, you know, then you're
going to run into hardware limitations. And so we saw that this would be an issue. And so we're like,
oh, well, then, you know, the only answer is to scale horizontally. And so that's why
Milvus 2 is built that way. And that's kind of how we get around that scaling into the billions issue.
Yeah.
So how does the distributed system work and what new problems does that introduce?
Yeah.
So you can basically build your own distributed system out of other vector databases as well, if you would like.
The challenge there is one of the big challenges would be this data consistency challenge.
So you have these instances.
You have these replicas.
How consistent does your data need to be?
And it will depend on your use case.
And the way we handle that is we have these shards,
and we have hashes on your data that
tells it what shard is going to write it
and where it's going to write it to, and things like that. So you've got to come up with these systems for how to do that. And then we have four levels of consistency. So we have strong consistency, which basically says, hey, we've got to make sure all the writes are done before we do any reads. And then we have bounded consistency, which says that after a certain amount of time, all writes are propagated to all replicas.
And then we have session consistency, which is just saying like, in this instance, in this
connection, all reads come after writes. And then there's eventual consistency,
which just says, yeah, it'll get done eventually.
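As a hedged sketch, pymilvus lets you pick one of those levels per search; the accepted strings may vary by client version, and the collection name and query vector here are placeholders:

```python
from pymilvus import Collection

# Assumes connections.connect(...) has already been called.
collection = Collection("documents")   # placeholder collection name
query_vec = [0.0] * 768                # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=5,
    consistency_level="Strong",        # or "Bounded", "Session", "Eventually"
)
```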
For people who are using Milvus for RAG models,
presumably those are mostly heavy read operations.
How much are they really inserting?
I mean, you mentioned the drug
discovery use case, which was not a RAG use case, but there they're doing a lot of heavy insertion.
And they only search a handful of times a year. But I would think that for most people,
the bulk of essentially the operations against the database is search.
Yeah. So you're right. I think for RAG, it's mostly people doing read, which is part of the reason
why Milvus has this separation of these
query nodes, data nodes, and index nodes. So if you're doing a lot of reads, we'll just spin up a bunch of query nodes. Whereas if you're doing a lot of writes, you've got to spin up a bunch of data nodes.
Is the fact that this works as a distributed system sort of the unique sauce of Milvus?
Or are there other things that make it unique from other vector databases that are out there?
Yeah, so that's one of the unique pieces of Milvus, right?
Milvus is, as far as I know, the only true distributed vector database.
And other things that make it unique are like, for example, the segments, right?
So what we do with those is, at search time, you can have much faster, much more efficient search, because searching one particular segment is a near constant-time operation: it's going to cost about the same no matter how many parallel searches you run on it, up to a certain number, of course. We do that and then add an extra aggregation step on top. That's how we do search, and that's how we're able to do very fast, millisecond-level vector search across a large amount of data.
Another thing that I think is really interesting about the way that Milvus works, and that also makes it more effective and more efficient at scale, is the
way the filtering works. So you can filter on your metadata. You can say, I only want to find
things that are, I don't know, like text that starts with the word the, or that is longer than
500 characters, or the date it was published is before yesterday, something like that.
You can do this filtering, and the way that it works is
Milvus goes through all of the data
and it looks at that attribute and it basically creates a bit mask
that goes like, you know, if the attribute matches what you're looking for,
then it gets a one, otherwise it gets a zero.
And so when you do this kind of pre-filtering, this gives you a linear time addition. But it
also means that the amount of data that you actually have to search becomes a lot smaller
if you're doing something that filters through a high amount of data. So these are kind of like
some of the pieces that make Milvus unique,
the way that we filter the data, the way that we do the data segmentation for search and
the distributed system with separation of concerns.
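As a hedged sketch of that metadata pre-filtering in pymilvus, the boolean expression narrows the candidate set, via the bitmask described above, before the vector comparison happens; field names and the collection are illustrative:

```python
from pymilvus import Collection

# Assumes connections.connect(...) has already been called.
collection = Collection("documents")   # placeholder collection name
query_vec = [0.0] * 768                # placeholder query embedding

results = collection.search(
    data=[query_vec],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=5,
    expr='text like "The%"',           # e.g. only chunks starting with "The"
    output_fields=["text"],
)
```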
What are some of the big technical challenges that need to be solved? Is it really just like
the scale? Like, how do we get to the 5 trillion vectors?
And also, how many businesses really have 5 trillion vectors? I know you mentioned the one,
but how common is it? I would say that I would be unsurprised if many Fortune 500 companies
had no less than a trillion vectors that they could use. Not that they are using, but that they could use if they wanted to.
Mainly because there's just so much data that's sitting around unused,
and there's probably more data sitting around unused than we're actually using.
At least that's what everybody predicts.
Yeah, that makes sense.
I mean, I actually talked, I was in an event back in November, I think.
It was a data event.
And I talked to the CDO of a public company.
And he was fairly new at this company.
And one of the things he mentioned he was trying to solve was that they're sitting on a mountain of essentially unstructured data that's encrypted in an S3 bucket.
And they want to do something with that, but they have no way of essentially unlocking the power of that data.
Yep. Yep. So there's a lot of data
and scale is definitely going to be one of the big problems. I don't think it's going to be the only problem.
I'm sure that there's going to be some
other hardware type limitations.
I actually think this is something that kind of applies to foundation models
like AI in general.
There's going to be some compute limit.
And I think it's actually going to be hardware restricted.
At least it seems that way at the moment.
So I think the other part of really implementing and using the data, not just the technology but actually using the data, is education, right?
People have to know, okay, you have this data.
How can you use it?
If you don't know how to use it, then you're not going to.
Yeah, that goes back to some of the things we were saying earlier
where the concept of a vector database is still very new.
Like, a lot of people don't know what that is.
So they might not even be aware of, hey, there's actually this, like, technology that I can use to help me solve some of these problems that I don't have a solution to right now.
Yeah.
It's the new category problem, essentially.
Yes.
Yes.
All right. Well, as we start to kind of wrap things up, I have some quickfire questions for you.
So don't spend too much time thinking about these.
Just first thing pops into your mind.
You ready?
All right.
I love these things.
All right.
So if you could master one skill you don't have right now, what would it be?
I thought I was going to really love this one, but I didn't have this one in mind at all.
Sales.
Sales.
What wastes the most time in your day?
Scrolling social media.
If you could invest in one company that's not the company you work for, who would it be?
Hugging Face.
Hugging Face, all right.
And now, actually, Google's a big investor in Hugging Face at this point.
It was recently announced.
What tool or technology can you not live without?
Python.
What person influenced you the most in your career? Matthew
McConaughey. I got to dig into that.
Why?
So I watched a graduation
speech that he gave. And one of the things that he
talks about in the speech is at some point when he was younger, when he was like 15 or something, someone who was important to him came to him in his life and was like, who's your hero?
And so he was like, well, it's me 10 years from now.
And so 10 years later, he sees this person again and she goes up and says, well, are you a hero now?
And he says, no, because my hero now is me 10 years from
now. And the idea that he kind of like, you know, proposes behind this is that your hero should
always be someone who is ahead of you that you can't catch. And for me, the way that this has
translated into not just my career, but my life in general is like, it's given me this kind of
like mindset of,
you know, how do I get better at the things that I'm not good at? And how do I define like,
what are the things that I want to get better at? And so this has also been incredibly helpful for
me in my career, because it lets me notice like things like, oh, like, here's something that I
can tell that someone's doing a lot better than I am. How do I incorporate that into my image of how I'm
going to be better at this? Awesome. All right. Last question.
What's your probability that AI equals a doom for the human race?
Doom for the human race? I would say zero. It depends on what you mean by doom. I am a big
proponent of the singularity. I think it would be really interesting. Definitely. All right. Well, anything else you'd like to share?
Anything else I'd like to share? For the audience,
if you're interested in doing hackathons and you're in Seattle, hit me up.
And how can people learn more and how can they follow you?
Oh, yes.
So you can find me on LinkedIn.
That's where I'm the most active.
I'm pretty much responsive on there all the time.
Y-U-J-I-A-N-T-A-N-G.
Awesome.
Yujian, thanks so much for being here.
I really enjoyed this and hopefully we'll have you back down the road.
Yeah, great to be on here.
It was an awesome chat, Sean.
All right, cheers.