Software Huddle - Vector Databases with Bob van Luijt
Episode Date: June 11, 2024

Today we have Bob van Luijt, the CEO and founder of Weaviate, on the show. Bob talks about building AI-native applications and what that means, the role a vector database will play in the future of AI applications, and how Weaviate works under the hood. We also get into why a specialized vector database is needed versus using vectors as a feature within conventional databases.

Bob van Luijt: https://www.linkedin.com/in/bobvanluijt/
Sean on X: https://x.com/seanfalconer

Software Huddle ⤵︎
X: https://twitter.com/SoftwareHuddle
Substack: https://softwarehuddle.substack.com/
Transcript
What is it that makes Weaviate special in comparison to some of these other things that are on the market?
The big difference, especially for Weaviate, is what we like to call a focus on AI native.
We do a lot of education, right?
So we share with the outside world, like this is how you build X, Y, and Z.
And we spend a lot of effort on that to educate people on how to build AI-native applications.
That is directly inspired by how fashion brands position new products in the market.
So there's a direct correlation between fashion and building infrastructure businesses.
And then what types of searches are vector databases great at?
And then what are the limitations?
Where are they not the right solution, essentially?
A pure vector search, so with nothing else,
a pure vector search is kind of like casting a net in the sea.
Hey everyone, Sean here.
And today we have Bob van Luijt, the CEO and founder of Weaviate, on the show.
I thought this was a really interesting conversation.
You know, Bob talks about building AI-native applications and what that means, the role a vector database will play in the future of AI applications, and how Weaviate actually works under the hood.
We also get into why a specialized vector database
is needed versus using vectors as a feature
within a conventional database.
I love talking to Bob, and hopefully we'll have him back down the road. We barely scratched the surface. And if you enjoy this conversation and the show, please leave us a positive rating and review. And if you ever have questions or suggestions, please hit me or Alex up on social media. All right, let's get you over to the interview with Bob.

Bob, welcome to Software Huddle.
Thanks for having me, Sean. Great to be here.
Yeah, thanks so much for doing this. So I was digging into your background a little bit in preparation for this.
And, you know, I had heard that you started coding as a kid, but then it looks like you later went on to actually study music in college. So I was kind of curious: what was the original sort of goal or dream with pursuing something in music?
Oh, the question assumes that there was a goal, and there wasn't. And this is sometimes, when I talk, you know, especially to younger people: when I was young, as in, like, 15, right, that kind of age, 14, 15, you know, until my 20s, there was just a lot of stuff that I liked doing, right? And, you know, I was born with a couple of, just a handful of gifts, I guess, and one of them was music, because it turned out that I understood how that worked. So when I auditioned to study music, I was accepted at the conservatory. And simultaneously, I was writing software, and I kind of figured out how to do that on my own, also around that age. And I also liked building businesses around that. To be clear, we're not talking about huge businesses. It was all very small. It was just, you know, a young kid, you know, trying to figure stuff out online.

And all these things kind of came together, but there was never a, hey, I am very interested in, you know... I'm still into art, you know, certain types of literature, certain types of, you know, contemporary art, music, software, those kinds of things. I like that kind of stuff. I mean, the other day I was browsing through Hacker News, and then I see, hey, there was a whole article about the MIDI protocol, right, between synthesizers and software, and I go, hey, that's cool, I love that stuff. So it's like, I just like these kinds of things, and if you distill it, I like to make stuff, right? So I like to make, and now that I'm a bit older, 38 now, that kind of morphed into, you know, software, business building, those kinds of things. But the original, I guess, part of my personality, this original thing of just wanting to build stuff and explore stuff, that is unchanged. That is still the same. So there was no goal. There was just, like, hey, this is super exciting, I want to be part of this.

And what makes me very proud is, as part of that, I studied, because I'm Dutch, so I studied in the Netherlands, but then I studied for a period of time in Boston. And there are now musicians that I literally hear on the radio, or that kind of stuff, that were in school with me there. So I'm super proud of that. So it's just, you know, riding the wave of life. And then that just takes you in certain directions.
No, absolutely.
That's really cool.
And then what was your instrument?
So I started with bass guitar.
Because it's something you could study. And I was very early on interested in jazz. So I like very improvised music, you'd say. So I started with that, and then later, when you do that a lot, when you study a lot, you always end up, everybody who studies bass guitar ends up, with the same thing, and those are the cello suites of Bach. So at some point you do all these things on your bass guitar, and that was, you know, tremendous fun, right? So, like, a lot of jazz, and classical music when I was older, some composition, contemporary music, which was wonderful. I had a blast. If I could go back in a time machine, I would do it again. Yeah, I would buy more Bitcoin, though.
Yeah, yeah, if you get a time machine, there's probably a bunch of stuff that we'd both do.
A little Bitcoin, maybe invest in NVIDIA 10 years ago or something like that.
That kind of stuff.
I would change that, but for the rest, I wouldn't change anything.
Yeah.
How did you go from that to founding Weaviate and, you know, getting interested in the vector database space?

So when I was done studying, I wrote software. Actually, I was doing art projects that I funded myself by writing software. And in a journey of, like, you know, just software consulting, just freelance, you know, a small, small company, I ended up at a publisher. I was working on something completely different there, related to e-commerce, but back in 2015 they were looking at building new products with machine learning. So that was when I was introduced to GloVe. GloVe, it's still from Stanford, it's still on GitHub, where you could download the CSV file with individual words and then the embeddings, the vector embeddings, related to these words.
So that was my introduction to that.
And I started to play around with that myself.
And I had this little idea that I was like, hey, wait a second.
If it's in space, then I can take words that might be in a paragraph,
calculate a centroid, and store that information.
Now, bear in mind, it's very important to know this is all pre-transformers; those didn't exist yet.
But I was like, hey, this is super awesome.
So I started to play around with that.
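As a minimal sketch of that centroid idea (assuming a GloVe-style text file where each line is a word followed by its floats; the file name here is just an example):

```python
# Sketch: embed a paragraph as the centroid of its word vectors,
# pre-transformer style. File name and format are assumptions based
# on how GloVe files are typically distributed.
import numpy as np

def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

vectors = load_glove("glove.6B.100d.txt")  # hypothetical local copy of GloVe

paragraph = "the quick brown fox jumps over the lazy dog"
words = [w for w in paragraph.split() if w in vectors]

# The centroid (mean) of the word vectors is one crude embedding for
# the whole paragraph, which can then be stored and compared.
centroid = np.mean([vectors[w] for w in words], axis=0)
print(centroid.shape)
```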
I started to present it to people.
Actually, my first use case
was knowledge graphs in vector space.
That was for a long time.
People didn't get it.
I remember, I even organized my own meetups that, again, I funded myself to travel to by writing software. And at my first meetup in New York, exactly zero people showed up, right? Luckily, now it's different, but back then that was the case.
And then transformers came on the scene.
I met my co-founder.
We figured out, hey, we believe that there's room in the market to build a database specifically for machine learning.
And then the whole AI wave took off.
And we're just riding that wave.
And my co-founder, Etienne, became CTO.
So really focusing with the team on the core database.
And I became interested in the business side of that.
So I really doubled down on the business side.
Like, how do you build a new infrastructure company in a new emerging market?
So that's my focus right now.
Yeah.
And then, going back to that sort of early introduction to embeddings from Stanford, was there also some inspiration or, you know, fascination with the Semantic Web?
Like that was also kind of that era of, you know, people were talking about RDF, triples and all this sort of stuff.
And that being sort of the future of how we understand the relationship between objects and concepts.
Yeah, so I absolutely went through my Semantic Web period, right? And so the first use case, and bear in mind, I'm using the word Weaviate now, but it was by no means what it is today, but my first use case for Weaviate was an IoT-related use case. Because I was working on a smart building project, where you had, like, different types of elevators, right, from different vendors, from different OEMs. And the problem was that the data they were sending in, through APIs or old-fashioned CSV files, the definitions of things in these files, between these APIs, they meant the same thing, but they were not expressed in a similar way. So yes, we created an ontology to describe what was in there, but the problem was we couldn't make these relations. And then I was like, hey, wait a second, what if we use the embeddings to determine the relations in the data set? And so there's absolutely a linked data origin story. And if you go on YouTube, you can even find a video on the Google Cloud YouTube channel, from maybe four years ago or something, where I referred to Weaviate as a knowledge graph. Because the term vector database didn't exist. And what we were doing was storing data objects, JSON data objects, with a vector embedding representing that data object.
That is unchanged.
That is still what it is today.
But it started to focus more and more
on the actual vector embedding
rather than the data object.
Long story short, yes, there's absolutely an origin story in the Semantic Web. And what's interesting is that we're going full circle, because now everybody talks about vector embeddings from knowledge graphs, and I can proudly say, well, you know, I have some videos and material about that from, like, four years ago. So it's nice to see it go full circle.
Yeah, I mean, I feel like those waves in technology, it's always like, what is old is new again. It's like fashion, essentially, like, things coming in and out of fashion when it comes to, like, programming languages, frameworks, all these types of things.
Even if you look at neural networks: there was the sort of AI winter phase where no one was doing any research on neural networks outside of, like, Canada and a few other places.
And now, of course, it's like the only thing that people are focused on.
So these things ebb and flow. You know, if I can double-click even on that: in the way that we build the business right now around the database, I'm getting a lot of inspiration from the fashion industry, actually. Because, one thing, people might be listening to this who might be familiar with Weaviate or not, maybe they look it up after listening to this, but they'll see that we do a lot of education, right? So we share with the outside world, like, this is how you build X, Y, and Z. And we spend a lot of effort on that to educate people on how to build AI-native applications. That is directly inspired by how fashion brands position new products in the market.
So there's a direct correlation between fashion
and building infrastructure businesses.
It's probably a necessary step too
when you're building essentially
like a new category of infrastructure.
Like, a lot of people, now I think more and more people, are aware of what a vector database is, or at least have heard the term over the last year and a half. Before that, I'm sure, you know, most people didn't know what a vector was, let alone a vector database. And it's growing, but there's still a need, essentially, to educate the market. And I think the good thing is, with the explosion of things like ChatGPT and everything that's happening with transformers and LLMs, it's creating this sort of market force where people are really curious, and they want to know what is going on and, you know, learn more and more about it.
I agree. Yeah. A hundred percent.
In terms of Weaviate, like, you were, you know, pretty early to the space, but now it's, you know, quickly becoming a crowded space.
I'm sure that was maybe unexpected from where you started,
but there's like over, I think like 60 plus databases now
that offer some level of vector support
from the like specialized vector databases
to things like Postgres and MongoDB
that have extended to support vectors.
Like, what is it that makes Weaviate special
in comparison to some of these other things that are on the market?
Sure. So the first thing
is I think that's a good thing that this is happening
because a vector
embedding is a data type, the vector
embedding itself. So it's like an
integer, a
string, a floating point, whatever. It's an array
of floating points.
And so what we start to see is that the majority of that number, right, let's say, you mentioned 60, I lost count, so let's take that as a working assumption, 60: the majority of these 60 use the vector embedding as a feature. Great. I mean, in Weaviate you can make a graph connection if you want, right? That doesn't make Weaviate a graph database. It's just a feature that we have, but for certain use cases, it helps people to do whatever they want to do. The big difference, especially for Weaviate, is what we like to call a focus on AI native. And how I always explain that is very simple.
You have two types of applications. So first of all, I believe AI is going to be in everything, right?
And now you have two types of applications.
Application number one is the application where you just,
as I like to call it, you sprinkle some AI on your application.
So a little bit of vector search, maybe generative stuff.
Great.
But if I would take the AI out of that application,
it's still there.
It just misses a feature, right? On the other hand, you have applications that, if you take the AI out of the application,
it's just gone.
It doesn't exist anymore.
That's what we call an AI-native application.
And that's what we focus on with Weaviate.
And how do we do that?
At the heart, we have the database.
And the database, of course,
has, at its core, the vector index. That's what's important. But everything we've built around that, the hybrid search functionality, the multi-tenancy functionality, these different types of vector embeddings and recommendation engines, the education, all the stuff that we do, and so on and so forth, is giving people the tools to build AI-native applications.
And a metaphor I often use,
like think about it like GitLab.
So when GitLab started, it was just Git, right?
It was just version control.
Today, GitLab is there for your whole, for your DevOps needs, right?
If you're a bank and you have big DevOps needs,
GitLab is your go-to solution.
That's what we're doing for AI.
So the metaphor is: where they started with Git, we started with vector search,
but we're building this whole ecosystem around it
to build AI-native apps.
So that's the big difference between just having it as a feature versus having it as
a core element of the ecosystem that you're building.
And what's an example of some AI native apps?
I feel like a lot of the stuff that we're doing with AI at the moment
feels a little like bolted on.
We're kind of like shoehorning
co-pilots into everything.
And some of those,
it's like early days of internet
or early days of mobile
where like people are just
kind of like experimenting
and trying to figure out
like what's going to work.
And it took a little while
before you really had truly
sort of native internet experiences
or native cloud is another example. Like I remember
when I spoke to Bob Muglia, the ex-CEO of Snowflake, one of the things I asked him
was, was there pressure in the early days to do what they were doing on-prem? And was that a hard
decision to essentially stay cloud native? And he said it wasn't a hard decision because they didn't
know how to do this outside of the cloud. So that was a very good example of something that couldn't even exist pre-cloud.
So what are some things that you're seeing
that couldn't exist without AI being the core part of the application?
Sure.
I think the easiest way to explain that is by example.
Let's say that you have a customer engagement system, right? So, you know, support tickets, something like that. An example of just adding AI as a feature might be that you get a support ticket in, you vectorize it, and you try to predict what labels should be attached to it, you know, so what department the question should go to, or label, or severity, those kinds of things. That's a beautiful vector search use case.
And that's a feature.
It's a great feature.
If you would remove it, then, you know, you lose a feature.
But yeah, that's, you know, that's what it is.
Now, if we now take the same system, but we turn it into an AI-native solution,
still the request comes in, still it gets vectorized.
But now actually in harmony with the generative model,
you start to create what we call generative feedback loops,
where actually the model knows how to query the database,
requests more information from the database,
and writes a response to the person asking the question.
Now, if you would take that out of the system,
your system just doesn't work anymore.
So it's the actual models, that's AI-native,
it's the actual models looking in the database
to find relative information,
formulating an answer,
and responding almost near real-time.
I mean, it's like we can't do real time because the models aren't there yet, but let's say near real time: respond to the customer with an answer. That's the difference, right? So in the first one, it's just a feature. And the second one is truly AI-native. If I may add one thing: for the first one, you're probably not going to build a business around it, right? So it's a feature.
You're like, hey, we have, like, a customer engagement tool and we're going to add this feature. The second one, that's a business, right? You can say, hey, we can now use the vector database, like we did, in combination with the models, and create a completely new application around it that automatically responds to customer requests, or automatically generates a knowledge base, and so on and so forth. So that's the big difference.
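As a rough sketch of such a generative feedback loop (all three helpers here are hypothetical placeholders, not Weaviate's API):

```python
# Sketch of a generative feedback loop: the model queries the database,
# drafts an answer from what it retrieves, and writes the result back.
# search_tickets, chat, and store_answer are hypothetical placeholders.

def search_tickets(query: str, limit: int = 5) -> list[str]:
    """Vector-search the ticket database for the most similar past tickets."""
    ...

def chat(prompt: str) -> str:
    """Call a generative model and return its completion."""
    ...

def store_answer(ticket: str, answer: str) -> None:
    """Write the generated answer back, so future searches can retrieve it."""
    ...

def handle_ticket(new_ticket: str) -> str:
    related = search_tickets(new_ticket)   # model-driven retrieval
    answer = chat(
        f"Customer ticket: {new_ticket}\n"
        f"Similar resolved tickets: {related}\n"
        "Write a reply to the customer."
    )
    store_answer(new_ticket, answer)       # closes the loop
    return answer
```

Remove the retrieval and generation steps and, as Bob says, the system simply doesn't exist anymore.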
And with this idea of generative feedback loops, do you see that as sort of the natural evolution beyond, like, where we are with RAG today? Because, like, RAG, you know, it works, people are creating applications with it, but in a lot of ways it feels kind of, like, hacky. You know, you're kind of stuffing things into the context to give additional information to the prompt. And then it takes a lot of testing and iterating to get to a place where you're giving the right amount of context and not too much context, and all this sort of stuff. It's a little bit, I don't know, hacky, I think, is the best word I can describe it with. Do you see this as sort of the next phase of that?
No, no.
So I agree.
First of all, I agree with that. And the second thing is there are actually two separate things.
So if we separate them out, the generative feedback loop is just the concept of directly integrating, or giving the generative model access to, the database.
That's the concept of the generative feedback loop.
That is still something you can do in a hacky way, basically.
But that's the concept.
When it comes to RAG itself, and if we look at the original RAG paper, right, the thing is: can we do retrieval directly from a database, based on vector embeddings, while we're generating something? Because it's retrieval-augmented generation, right? So it's in the abbreviation. And we're doing a lot of work there too. So what you refer to as the hacky way,
which again, I agree with,
is that we get some data from our database,
we shove it in the context window,
and then we keep our fingers crossed
and we hope for the best, right?
That's very primitive, right?
So what's actually happening here,
and this is something that I'm a big believer in,
and this might sound like a sidetrack, but I promise I'll bring it back to your question, is that when the context windows started to increase, right, so now we have a million, and, you know, soon we're probably going to have 10, 100, a billion, whatever, right? So it's like, we're going to get very close to an infinite context window. And one of the things that people started to say is, oh, did the context window kill the vector database? And I think the answer is no, it doesn't. Actually, I believe the vector database becomes the context window.
Because if you look at what a context window does: it takes a word. Let's stick to LLMs, right? So let's put multimodality aside for a second. So it takes a word, turns it into a token, gets a vector embedding for the token, creates a matrix, applies the transformer mechanism, does a prediction, turns it back into a token, turns it back into a word. That's very primitive. So what kind of technology is really good at doing that? Well, a vector database. So this is what we at Weaviate mean by weaving together: what we mean with weaving the model and the database together is that the database becomes the context window. And the moment that the database becomes the context window, you solve that primitive problem.
And in the slipstream of that, you solve latency issues and all those kinds of things, so, operational issues. Because just storing a flat file, how are you going to do that? Some sort of flat file or something of, like, a million tokens? I mean, that's kind of weird, right?
So it just makes so much sense to do that more
from the perspective of the database.
And this is another example of something where it's not a feature.
It's just core elements of the database,
which is way more AI native, right?
So the models and the database, in this case, just start to slowly merge together.
And then outside of RAG,
what are the sort of main use cases for vector databases today?
Yeah, so that depends a little bit on how you look at it, right? Because if you use the term use case, right, you can zoom in, you can zoom out. So at the highest level, the use cases, in the order that people have been exploring them, are vector search, hybrid search, RAG, and generative feedback loops. And what do I mean with that? The first one I mentioned, that's what most people have in production right now. So if you look at our customers, that's the most. And the next thing is hybrid search, right? Because they started to explore that a little bit later. And now we see the first bigger RAG use cases. And then you have the generative feedback loops. And what comes after that are graph use cases. And I don't mean graph as in a traditional graph, but graph neural networks, basically, where rather than creating the actual relations, you predict the relations, right? But that is stuff that's really new, so nobody has that in production yet.
So that's the difference.
And then if you zoom in, if you say, like, what type of use cases do we see? Well, it started all with the LLMs. So the LLMs are 90% text-based right now, right? And that is around stuff that is text-heavy. So we see a lot of legal use cases, a lot of e-commerce use cases, a lot of ticket systems, tier systems, Confluences, and so on and so forth. That we see a lot.
And people now start to bring
the first agents into production.
But that's more RAG related.
So those kind of use cases we see a lot.
And from an industry perspective, it's really all over,
but especially where it's top-heavy on having a lot of unstructured data.
And then what types of searches are vector databases great at?
And then what are the limitations? Like, where are they not the right solution, essentially?
A pure vector search, so with nothing else, a pure vector search is kind of like casting a net in the sea, right? So you go, like, okay, I want to catch a fish. So you cast a net into the sea, and surely you're going to catch fish. But if you're looking for some type of fish, right, the way that the stuff is organized in the net, that's not there, right? So it's like, it doesn't rank anything, it doesn't filter anything, none of these things. And then there's the problem that, if you search for something where the language you're using, and again, for the sake of argument, I'm going to stick with language models, the language you're looking for was not in the trained model, not part of the trained model, then it's like a fish just slipping through the net.
What's an example of language that was not in the trained model?
That can be product IDs, right?
That can be names of people, those kind of things, right?
So now, the good news is that there are methods to close the holes in the net, and that's something you can do through hybrid search. So hybrid search, that mixes the best of both worlds together. In Weaviate, you can do that out of the box, right? In just, like, literally three lines of code. And so what it does: if I say, like, okay, what was the outcome of a ticket that was raised with ID 123 ABC? Then hybrid search will do very, sorry, the traditional search will do very well on ABC 123.
And then the vector search will do very well
on the whole sentence.
So now we've solved that problem.
So the answer probably sits in the net.
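As a sketch of what that can look like with the Weaviate Python client (v4-style API; the collection name and local setup are assumptions):

```python
# Sketch: hybrid search combining keyword (BM25) and vector scoring.
# "SupportTicket" is a hypothetical collection; assumes a running
# local Weaviate instance.
import weaviate

client = weaviate.connect_to_local()
tickets = client.collections.get("SupportTicket")

# alpha balances the two signals: 0 = pure keyword, 1 = pure vector.
results = tickets.query.hybrid(
    query="What was the outcome of the ticket raised with ID 123 ABC?",
    alpha=0.5,
    limit=5,
)
for obj in results.objects:
    print(obj.properties)

client.close()
```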
And now we need to start to do the re-ranking
and the filtering.
And you now see model providers there: so, for example, Cohere has, like, a re-ranker model. At Weaviate, we do a lot of work to make that part of the database itself, so that it can train based on the data that you have.
But what makes vector search very different from traditional forms of search is that it's more like casting a net, and then somehow you need to organize stuff as efficiently as you can in the net. That's something we do with, like, re-rankers and filters and hybrid search and those kinds of things. Not sure if that answers your question, but...
No, no, that's great. So, I mean, basically it's great at figuring out that two objects are related in some fashion.
And essentially, the way it's doing that is it's going to have these objects represented in high-dimensional space
and going to use some sort of similarity metric to determine they're close in space in some fashion.
But it may not be as good at an ID lookup, which a traditional database is fantastic at, and is really what they're designed for. So the key is: can you bring these two worlds together, and serve the needs of something that is an index lookup, an exact match, while also being able to take advantage of these kind of fuzzy, similarity-type searches that you also want to be able to do?
Exactly. That is correct. And so, actually, in fact, the vector database is of the flavor of a search engine, right?
Because storing a vector embedding is easy.
You can do that.
That's just an array of floating points.
The tricky part sits in the index, the index that's used to search and retrieve, right? And doing that as efficiently and as optimized as possible, that's the trick. That's the core moat of the vector database.
Yeah, so I want to get into the specifics around how Weaviate is doing its vector indexing. So there are all kinds of different ways, essentially, of indexing embeddings: there are, you know, cluster-based techniques, there are tree-based indexes, and stuff like that. So what is actually going on behind the scenes to generate the index for vectors represented in Weaviate?
Yeah, sure.
So to answer the question, I think for the audience who do not know, it might be helpful to quickly say something about what the problem is, what problem we are solving, right?
So if you think of a vector embedding, right, and you can think of a tiny vector embedding, like three dimensions, so you can imagine that it sits in a three-dimensional space: what we want to know is what's similar to our query. So let's say that I have 10 objects in my three-dimensional space, and I have a query, right? So that's one three-dimensional representation. The way to figure out what's closest is that you take the query, the three-dimensional object, and you compare it with the first one and say, what's the distance? And it gives you a distance. You look at the second one, you say, what's the distance? And the third one, the fourth one, the fifth one, and so on and so forth. And then when you hit 10, you can just re-rank them and reorganize them based on distance: like, what's the closest? What's the furthest away? The problem with that is that it grows in a linear fashion. So if you now have not 10 data objects but, like, 10 million, or 100 million, or a billion, and you don't have three dimensions but maybe a thousand dimensions, then all of a sudden that takes a long time. It is not real time anymore.
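The brute-force scan described above fits in a few lines of NumPy; this is purely illustrative, with random data:

```python
# Brute force: compare the query against every stored vector, then sort.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10, 3))   # 10 objects in a 3-dimensional space
query = rng.normal(size=3)

# One distance per stored vector: cost is O(n * d), linear in both.
norms = np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
distances = 1.0 - (vectors @ query) / norms   # cosine distance

# Re-rank by distance, closest first. Instant at n = 10; at a billion
# thousand-dimensional vectors, this exact scan stops being real time.
ranked = np.argsort(distances)
print(ranked, distances[ranked])
```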
So these algorithms were invented, and the abbreviation is ANN, which stands for approximate nearest neighbor. Basically, we can come up with algorithms where you can quickly find nearest neighbors, and it comes at a cost. And the cost that it comes at is approximation. So that's why, for example, when you look at these ANN benchmarks, they always talk about time versus approximation: the more likely you want the algorithm to be to return the true nearest neighbors, the slower it becomes, right? And so this index, the way that you build it at the core of the database: Weaviate is built on an algorithm called HNSW. And the reason it's built on HNSW is because with HNSW you can have full CRUD support: create, read, update, delete. That's something you want in a database, because there are also vector algorithms that, for example, do not support update or delete. So then you create the index, but if you want to change something, you have to recreate it again, and that takes a lot of time. So that's why it's built on HNSW.
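As a standalone illustration of an HNSW index with that kind of mutability, here is a small sketch using the hnswlib library (a separate open-source implementation, not Weaviate's own):

```python
# HNSW via hnswlib: build once, then query and delete in place,
# without rebuilding the whole index.
import hnswlib
import numpy as np

dim = 128
data = np.float32(np.random.random((10_000, dim)))

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=20_000, ef_construction=200, M=16)
index.add_items(data, ids=np.arange(10_000))     # create

labels, distances = index.knn_query(data[:1], k=5)  # read: approximate 5-NN

index.mark_deleted(7)   # delete: queries now skip this element, no rebuild
```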
And then you need to figure out: how do we shard that? How do we scale it? And that is the trick of the vector database. So if you have a large use case, with, like, a billion data objects with a billion vector embeddings attached to them, how do you safely and securely and reliably scale that for your use case? That's what sits at the heart of the database, and that's what's happening under the hood. And we're open source, so people can see that. And nowadays we even have multiple types of these indices, so that you can have multi-tenancy, and you can offload stuff and put it in memory or keep it on disk. And that all comes with trade-offs. But now, of course, price and those kinds of things also start to play a role. So we're pretty sophisticated in the things you can do and how you can scale your vector indices.
At what point, you know, you started off by describing the brute-force approach of, we have 10 vectors in three-dimensional space, we can just compare everything and figure out what's closest, or maybe the five closest, or whatever we need to do. At what point do you essentially need to introduce a vector index? Like, how many vectors does it take before brute force is essentially unscalable?
So, I mean, there are two answers to that question, right? So you have a use case, whatever use case the audience can think of, right? So whatever the use case is, you might have a time limit in mind. Let's say that you have an e-commerce use case; then you might say, well, I need to be able to present something to my customer within, I don't know, 15 milliseconds. Then it's easy math. So you can just say, well, I've got this number of data objects, similarity scores by brute force take me X, Y, Z. And you would be surprised how quickly you're at 50 milliseconds. So then you quickly need a vector index.
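As a back-of-envelope version of that easy math (every number here is an assumption):

```python
# Back-of-envelope only; all numbers are assumptions.
n, d = 10_000_000, 1_000           # 10M objects, 1000-dim embeddings
flops = 2 * n * d                  # ~2 ops per dimension per comparison
throughput = 1e10                  # ~10 GFLOP/s for one CPU core, assumed
print(flops / throughput * 1000)   # ~2000 ms per brute-force query
```

Even with generous hardware assumptions, an exact scan at that scale lands far past a 15-millisecond budget, which is exactly when the index becomes necessary.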
But the second part of the answer is more like: when do you need a vector database? And you need the database to keep it reliable. Because even if you go, like, you know, I just have a couple of data objects, I'm just going to write a little Python script, you know, that just does similarity search, nothing fancy: what if that server goes down? You know, so a database also comes with all the operational stuff of making sure that it stays reliably available. So the question is not only when do you need the index, but also when do you need the database. And then you just come to the conclusion very quickly that you're like, you know what, I'll just throw it in the database. Because if I am going to write this myself, and I want to run this in production, I need to also take care of the, you know, if shit hits the fan, I need to take care of that too. And I don't want that.
Yeah.
Yeah, I guess it's similar math to what you would do with conventional databases. Like, if you don't have much data, you could store it in a flat file and do a brute-force lookup or something like that. And then eventually you're going to reach a point where you're like, oh, I actually need to order these things. So then you're building your own indices that you need to maintain. And then at some point you're like, well, what happens if the file disappears? Or I corrupt the file, or, like you said, the server goes down or something like that. And at that point you're like, well, now I'm actually building a database, and a managed service on top of that. Is that something that makes sense for me to do?
Exactly. There's this little joke in the industry where we say, you know, building a database takes 10 years. So the first weekend, it just takes a couple of beers and a couple of friends to come up with the API, and then the last nine years, eleven months, and three weeks, you're actually building the database. Building a database is hard, and the reason it's hard is because it needs to be reliable, and building reliable technology at scale is just complex. It's hard to do.
Yeah. In terms of sharding, how does the sharding work for a vector database? Typically, with a conventional database, when you're doing sharding, you're picking, essentially, a, you know, sharding key for a particular table based on a column. Maybe it's your user ID or something like that, and then you're going to split based on the ordering of user IDs across different servers. How does that work in the world of vectors?
That's an excellent question. It's actually a bonus question, because there's a double win in this question: the answer to this question is the reason why vector databases exist in the first place. Because, as I mentioned, if it's just an index, why not add that to a library, like, you know, where we have stuff like Lucene and those kinds of things? Why not add it there? And that has to do with the fact that, under the hood (we have a lot of content on this; if you really want to go in depth on this, we have a lot of content around this on the website too), the problem is this: if you create a shard of a vector index, at some point you need to merge them together, because otherwise the graph just keeps jumping back and forth, and that's just tremendously slow. That's unusable.
Some sharding mechanisms use a lot of tiny shards, right?
So that they optimize for a lot of tiny ones.
And for a vector database, you actually want to have
a couple of big ones.
So the sharding mechanism is around the bigger shards
that you want to merge together.
That's very different from, for example, how the sharding mechanism within Lucene works. Long story short, it's an architectural difference. It's a completely new mechanism, how it's sharded. The rest, so there's also in Weaviate, for example, an inverted index and that kind of stuff, that is traditional. The vector index itself is similar. And that's the reason why not only us, but also a couple of other people, started to build vector databases.
People were like, hey, besides the developer experience of interacting with the database being different, also just the core, the architecture, is different, right? How we built these kinds of databases. And everybody who uses a traditional database for vector embeddings will run into this over time when they scale up. It's just in the nature of it. And it's not because the existing indices are bad. No, they're amazing. It's just that they were never designed to deal with vector embeddings. So that is why these databases exist.
And then, in terms of similarity computation between two vectors, there are different approaches there as well, just like there are different approaches to vector indices. Like, there's, you know, Euclidean distance, Manhattan, cosine similarity, dot product, all these types of things that have been around for a long time in the machine learning space. So how is that done from a developer experience perspective? Is that something where I'm making a choice and I have to decide how I'm going to compare my vectors? Or is that something that's kind of abstracted away and I don't really need to think about it?
Sometimes.
So there are two answers to that question. The first part of the answer is that it depends on the person or the team that created the model, right? So if you think about a vector space, right, you somehow need to organize the data in that vector space. And that can be done on a plane, that can be done on a sphere, that can be done on all kinds of different geometrical systems. It's super exciting; it gets very complex quickly, too. The reason why these researchers are experimenting with that is because you want to figure out how you can compress as much information into a dense space as possible. And that dictates the distance metric that you need to use, right? So if you're using a vector database and you're getting very weird results, you're probably using the wrong distance metric.
On the other end of the spectrum, the second part of the question is that there are also compression algorithms, right? So a simple example is binary quantization, and the fact that binary quantization works is still fascinating, because it's such a simple concept. What it basically does is this: if you look at a vector embedding, it's floating points, positive or negative numbers. So the first one could be 1.2225, and the second one could be minus three point whatever. And what binary quantization does is just: if it's a positive floating point, you make it a one; if it's a negative one, you make it a zero. And what you basically do is then search the binary space. I believe the distance metric you use for that is, I think, Manhattan distance; I don't know exactly, but let's say it's one of those metrics. You retrieve a couple of them, and then you do a traditional vector search over those embeddings. And it actually turns out that if you do binary plus a little bit of brute force, you're actually faster than just doing it all in the vector space.
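A minimal sketch of that two-stage trick: quantize to bits, shortlist in the binary space, then rescore the shortlist with the full float vectors (random data, illustrative only):

```python
# Binary quantization: positive -> 1, negative -> 0; search bits first,
# then brute-force only the survivors with the full float vectors.
import numpy as np

rng = np.random.default_rng(1)
vectors = rng.normal(size=(100_000, 256)).astype(np.float32)
query = rng.normal(size=256).astype(np.float32)

bits = vectors > 0    # one bit per dimension
qbits = query > 0

# Distance in the binary space: count disagreeing bits (Hamming, which
# equals Manhattan distance on 0/1 vectors).
hamming = np.count_nonzero(bits != qbits, axis=1)
candidates = np.argsort(hamming)[:100]   # coarse shortlist

# Rescore the shortlist with full-precision cosine similarity.
subset = vectors[candidates]
cosine = (subset @ query) / (
    np.linalg.norm(subset, axis=1) * np.linalg.norm(query)
)
top10 = candidates[np.argsort(-cosine)[:10]]
print(top10)
```

In practice, the bits are packed into machine words and compared with XOR and popcount, which is what makes the shortlist step so cheap.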
So now that's a distance metric that you as the user define. That said, from a developer experience perspective, we try to do a lot of work for you there. So, of course, as a user you can choose whatever you want to use, but we also try to predict and know what kind of distance metrics are needed for you to search through the vector space that you're using. And you might even be storing data from different vector spaces in the same database, and you just want to do a query, right? So you just want to have, like, a four-line query and not be too concerned about that kind of stuff. So, from a developer experience perspective, we take a lot of work off your plate there as well.
Yeah, I mean, I just want to know, like, are these things related? Or, based on this query, what are the five things that are most similar within the database, or something like that? The binary use case is interesting. I mean, it makes a lot of sense in terms of, rather than using a floating-point number, suddenly you can essentially take multiple bits and stuff them into, like, a 64-bit long or something like that. And then you can take advantage of bit operators on those to do the fast lookup. And that essentially compresses, or I don't want to overuse "compress," but it essentially creates, like, a bounding box around the number of vectors that you need to do the full vector similarity search against.
Yes, and that is correct. And actually, what you're saying now, something else pops into my mind as well. Because inside Weaviate, for example, we also have traditional indices. But because the database is built from the ground up, from scratch, we could also put a lot of innovations into the traditional indices. So, like, the inverted index is completely built based on bitmaps. So that's a similar system, but then for a traditional index. So you can benefit from all these kinds of innovations under one roof, in one database. And it's exciting. What the core database team and the research team are building is very sophisticated. And then the trick is, as you mentioned, how do we take that abstraction away for the end user, so that it's just a couple of lines of code and all that niceness comes out of the box?
Yeah. I mean, I think it's
a really fun and fascinating
time to be involved in technology,
especially if you're working in the AI space, because there's just so much
going on there. There are boundless things to learn and try and experiment with.
I agree. And if somebody's listening to this podcast and considering starting something, the time is now. It's like, don't wait. Start building now, because it's such an exciting time. People are excited. People are inventing new stuff. Regardless of whether that's at the low level, in the models or in the vector databases, or on the other end of the spectrum, when it comes to the application layer, so much interesting stuff is happening. It's like, this is the time to contribute to that and to create a new product or service that can ride the AI wave.
It's very exciting.
Yeah, absolutely.
So we're getting close to time, but I want
to start to... We have
some quickfire questions for you that we kind of wrap
things up with all our guests, so we'll go
quick here. So
the first one, if you can master one skill
that you don't have right now, what would it be?
Thinking.
What wastes the most time in your day?
Reading boring news websites.
If you could invest in one company that's not your company, who would it be?
We already acknowledged that we missed the Bitcoin bandwagon. I would invest in CPU companies. I think CPUs are going to play a tremendously important role in model inference in the not-too-distant future.
Yeah.
What tool or technology could you not live without?
Oh, this is going to be such a boring answer.
Oh, my glasses.
Which person influenced you the most in your career?
I would not be doing what I do right now if it weren't for the many, many people that have advised me and helped me get to where we are right now. So I just can't single one person out. There are too many.
No problem. And then the last one: five years from now, will we have more people writing code day to day, or less?
More.
Awesome.
Well, Bob, thanks so much for being here. I really enjoyed this. I feel like we barely scratched the surface here, so hopefully maybe we can have you back down the road.
I'd love to talk to you about, you know,
building an open source business
and all kinds of other things as well.
We'd love to come back.
Thank you so much for having me.
I love the conversation.
This is great and I hope people enjoy it.
All right.
Thanks.
Cheers.
Thank you.