The Data Stack Show - 157: From Search Engine to Answer Engine Using Grounded Generative AI, Featuring Amr Awadallah of Vectara
Episode Date: September 27, 2023
Highlights from this week's conversation include:
Amr's extensive background in data (3:23)
The evolution of neural networks (9:21)
The role of supervised learning in AI (11:17)
Explaining Vectara (13:07)
Papers that laid the foundation for AI (15:02)
Contextualized translation and personalization (20:07)
Ease of use and answer-based search (25:01)
AI and potential liabilities (35:54)
Minimizing difficulties in large language models (36:43)
The process of extracting documents in multidimensional space (44:47)
Summarization process (46:33)
The danger of humans misusing technology (54:59)
Final thoughts and takeaways (57:12)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show.
Kostas, maybe this is a trend, but Amr, who has worked in data science and AI for multiple decades and who started a company that is doing LLM stuff is going to be on the show.
We had Brendan Short on recently who is also doing similar things with LLMs.
But this is fascinating.
We're getting to talk to multiple people who are building companies entirely based on technology that hit the news six months, a year ago
in terms of being a hot topic.
So I'm very excited.
Amr has a really interesting past.
He worked on data science stuff at Yahoo
in product development, founded Cloudera.
And so he saw AI at the enterprise level
for over a decade, and then
spent time at Google, and now has founded a company
that's doing AI stuff, which is pretty fascinating. So this isn't going to surprise you or any of our
listeners, but I want to hear the narrative arc of that, right? Like what data science AI stuff
was he doing at Yahoo in the early 2000s?
We tend to sort of call that data science and the new stuff AI,
but in reality, it's all part of one bucket.
And I want to hear his perspective on that.
And ultimately, I want to know why he founded Vectara
because he has so much context for the problem.
It's really interesting to me why he founded a company that's sort of enabling LLM technology
like inside of companies.
Yeah, a hundred percent.
I think we have an amazing opportunity here to learn from, I don't know, like a very unique
experience, right?
By the way, what I think is very interesting
and it's similar to what we were saying about
also with, about Brendan is that, again, we have
someone here who has
a very long journey, has seen many different
phases of like the maturity of the industry.
We are talking from
Yahoo.
Before that, I mean, people who listen to the episode will hear him talk
about his PhD and what he did there with VMs.
And then Cloudera, public company.
And after that, VP at Google, we are talking about, okay, like someone who has an experience in some very influential, let's say, companies out there.
At least like from the companies that came out of like Silicon Valley, right?
So having someone who has done a journey like this and now is starting again something from zero with LLMs, I think it's a unique
opportunity to learn both about what it means to build value out there, but also why he's
doing it with LLMs when he could practically do anything, right?
Sure.
So why in this space?
Why AI?
Why is it important?
And I think we are going to have very interesting conversations about many things, many concerns around this new technology.
And as we said already, I think we have the right person to talk about that stuff.
All right.
Well, let's dig in.
Let's do it.
Amr, welcome to the Data Stack Show.
We're so excited to have you.
Good to be here.
Well, you have a long history in data.
Can you just give us the brief overview of where you started?
And then after that, I'd love to dig into Vectara and hear about what you're doing today.
Sure. I was doing my PhD at Stanford and out of Stanford, I started my first company.
It was a company called Apptivia, which was doing comparison shopping, online comparison shopping.
And it was acquired by Yahoo a year later. We were just five people who became part of Yahoo Shopping.
So it became part of the back end of Yahoo Shopping.
I worked on that at Yahoo for about four years.
Lots of data, tons of data to process all of the crawls of product information across the web,
the specs, the images, the prices, et cetera.
And then within Yahoo, I shifted my career to be focused more on data analytics and data science for products and how can we design better products.
And I ran a team called Product Intelligence Engineering and got to see the birth of Hadoop.
And it solved a number of key problems for me in terms of scale, speed, efficiency, flexibility, agility.
And that's when I said, oh, if this works for me at Yahoo,
this will work for many others.
And I left Yahoo in 2008, teamed up with my co-founders
from Facebook, Google, and Oracle, and started Cloudera.
Spent 11 years with Cloudera.
I was one of the founders and the chief technology officer.
Cloudera went public in 2017.
It was taken private again about two years
ago. After Cloudera, I joined Google Cloud, where I was Vice President of Developer Relations
for a number of products within the Google Cloud portfolio, including AI and data products,
but others as well. And then after Google Cloud, I started Vectara. Very cool.
Well, of course, Vectara is based on LLM technology, which we'll talk about.
So of course, I had to ask ChatGPT about you before the show started.
And the very first line says that you're a pivotal figure in technology, which we have
now verified on the Data Stack Show.
And so that is true.
Which, ironically, we will talk about truth relative to LLMs in the show.
That's a topic that I definitely want to dig into.
Obviously, that's a true statement.
The statement you repeated from GPT, obviously, that is a true one.
Yes, yes, exactly. We know that's true.
Meaning me being a pivotal figure
in the data space.
Yes, exactly.
Yes.
The Data Stack Show
is where you go to verify
ChatGPT, you know.
It's sort of this...
It's confirmed this way.
Yeah.
This is where you find out
if hallucinations are real or not.
Exactly.
Tell us about Vectara, though.
So what does the product do?
What problem are you solving?
Give us an overview.
Yes, so I'll give the shorter version now,
and hopefully during the show, as we get going with each other,
we can expand on that.
But essentially, Vectara is a Gen AI platform.
So our goal is to enable companies to be able to integrate Gen AI capabilities
in their products with proper security,
safety, reliability around these implementations
and ease of use.
How to do that without having to go and research,
oh, which large language model should I use?
Which retrieval augmented generation strategy should I use?
Which encoder should I
use? Which vector database should I use? No, we have a very simple API. On one end, you put in
your data. On the other end, you issue your prompts and get back amazing results. So that's
what the Vectara Gen AI platform is about. Yep. Love it. Okay. Now I want to, I actually want to rewind. I want to dig into Vectara, but I want to rewind, actually, I mean, in many ways, certainly we can discuss, you know,
sort of the semantics of LLMs and AI, right?
But it's all based in data science.
You were using data science to build better products, right?
At Cloudera, you, I'm sure, got exposure to, you know,
just a huge footprint of the enterprise
who were enabling data science, AI, ML, call it whatever you want to call it, right?
Yes.
And then, of course, at Google, I'm sure you saw the same thing
working on the Google Cloud platform.
So I have two questions.
The first one is, what are the main conclusions that you've drawn?
Obviously, you started a company that is providing this as a service.
But can you just give us sort of a narrative arc of what you've seen AI,
sort of the storyline of AI over the last decade or so?
Yes.
So there's multiple storylines, actually.
So it's hard to go over all of them right
now, but I'll try to pick the key ones. First, we have to always remind ourselves that AI,
artificial intelligence, is a generic term. It's an umbrella term under which many technologies
are lumped. Like data science is lumped under that, machine learning is lumped under that,
large language models are lumped under that. Neural networks and deep neural networks.
So it's really an umbrella term, right?
So within the AI space itself, there is the field of neural networks and whether we can
build these self-teaching networks that can learn from us and expand our knowledge beyond
what we know and solve new problems without us teaching it explicitly how to solve these
new problems. Now, that had a very hot time frame around the 1970s and early 1980s.
But then an ice age took place where that technology just wasn't working.
All this neural network stuff stopped and everybody stopped working on neural networks.
It just wasn't paying off.
And the reason why was twofold.
First, we did not have enough compute power to make these things work.
And second, we did not have enough data to feed these things.
We were still going through the digitalization of our knowledge overall.
So it died.
It died off.
But then, going forward to the late 2000s, we now had enough data and we had enough compute to make that work.
And that's when neural networks really came back by force.
And that's where we are today with the outputs of these large language models.
So that's one arc.
That's a very key arc of the revival of neural networks.
That is not about AI itself.
It's about the neural networks and how they came back
and were able to do that.
Now, AI was always very important through the years.
We had the statistical nature of AI
given a number of decisions that humans have labeled.
For example, let's pick fraud,
how we label a transaction as fraud versus not fraud.
Initially, humans did that.
There were humans looking at every transaction
that Visa is doing
and marking this Visa transaction
looks like fraud
and this Visa transaction looks like...
Mass manpower.
Yeah.
And that worked fine
when we only had five credit card users in the world.
Right.
But they very quickly figured out that,
no, at this scale, we have to automate that.
So statistically now,
let's study how these humans
have been labeling things as fraud and not fraud
and then leverage machine learning,
which is learning from these statistical labels
that humans have been placing,
to be able to do that at scale.
And that was the previous wave of AI, if I might say.
That's how most AI algorithms work.
It's called supervised learning
and that's where we're giving it,
we're solving the problem for it.
It's just taking our solution and scaling it
up to run at larger amounts.
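To make that supervised-learning pattern concrete, here is a minimal sketch in Python with scikit-learn. The feature values and labels are made up for illustration; a real fraud system would use far richer features and far more labeled history.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy features for past transactions: [amount_usd, is_new_country, hour_of_day].
X = np.array([
    [12.50, 0, 14],
    [8100.00, 1, 3],
    [45.00, 0, 19],
    [9900.00, 1, 2],
])
# Labels that humans assigned by reviewing each transaction: 1 = fraud, 0 = not fraud.
y = np.array([0, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# A new transaction is now scored automatically instead of by a human reviewer.
new_transaction = [[7500.00, 1, 4]]
print(model.predict(new_transaction))        # predicted label
print(model.predict_proba(new_transaction))  # estimated probabilities
```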
The cool thing
about this new wave now with large language
models and deep neural networks
is it's able to
solve new problems that
it hasn't seen before, right?
We're showing it some of our content.
And by the way, in an unsupervised fashion,
just giving it tons of pictures,
tons of images, tons of text that we have written,
tons of articles.
And then by consuming these articles,
it's now able to do new things
that we haven't really exposed it to.
And that's why
we're all flabbergasted and impressed and excited about the possibilities that this new movement can
produce. Tell us about Vectara. What's the overview? So you have all this history doing
stuff at scale at big companies, and you decided to start Vectara. So what is Vectara? What do you do?
What problem do you solve? Yeah. So hopefully during the conversation,
I'll be able to tell you more about Vectara overall and what we do, but I'll give you the
shorter version of it right now, since we're at the start of the conversation. Vectara is a Gen
AI platform. So what I mean by that is we enable companies to embed Gen AI capabilities in their
products. And we make it very easy for them to do that and very safe and secure and maintaining
privacy around it as well. So we have a very simple API that allows them to upload their data
on one end, and then another API that allows them to issue prompts or questions against that data.
And then these responses get activated right away without them having to worry about, oh,
which vector database I am going to pick, which large language model, which encoder
technique, which segmentation technique.
We just automate all of that.
So ease of use is very core to what we do.
And then we enable you to be up and running with Gen AI in your apps in seconds.
That's really what Vectara is about.
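As a rough illustration of that two-call workflow, here is a hypothetical sketch in Python. The host, endpoints, parameters, and response fields are placeholders invented for illustration; they are not Vectara's actual API.

```python
import requests

BASE_URL = "https://genai-platform.example.com"  # hypothetical host
HEADERS = {"x-api-key": "YOUR_API_KEY"}          # hypothetical auth header

# 1) Upload your data on one end.
with open("support_articles.pdf", "rb") as f:
    requests.post(f"{BASE_URL}/upload", headers=HEADERS,
                  files={"file": f}, data={"corpus": "product-docs"})

# 2) Issue a prompt or question against that data on the other end.
response = requests.post(f"{BASE_URL}/query", headers=HEADERS,
                         json={"corpus": "product-docs",
                               "query": "How do I rotate my API keys?"})
print(response.json().get("answer"))
```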
Yeah. Love it. Oh man, Costas. I'm jealous about the questions that you're going to ask,
but I will get through mine. So Amr, I don't want to oversimplify this, but I don't want to
assume that all of our listeners have studied the different disciplines within AI or even LLMs on a deep level.
And so I'm going to ask you a kind of a dumb question.
But if you think back to the work that you were doing at Yahoo
and the types of use cases that you're enabling at Vectara,
is the difference sort of data and compute as inputs?
Because it's not like you were doing primitive, you know,
it wasn't like the stuff you're doing at Yahoo, I'm assuming. It wasn't like that was, you know, the dark ages,
right? I mean, you were doing some advanced stuff and doing some really cool stuff.
But is it just compute and data? Or, you know, what are the, what are sort of the big shifts
that are, is there net new, I guess? No, of course there's net new. I mean,
it's compute and data and new algorithms
that came into existence
that allowed us to do what we're doing today.
So there are three very seminal papers,
very seminal and very important in research,
that, adding them with deep neural networks
on top now of the availability of compute and data,
allow us to do the amazing things
that we are doing today.
The first one was in 2013.
That paper is called the Word Vectors paper.
And the Word Vectors paper was about how can we take words from the languages that we speak,
English, Spanish, Chinese, you name it,
and then map these words from a word space, which humans understand,
into a vector space, which is a numerical pointer in a multidimensional space.
But do it in a way that is very intelligent so that words that have similar meanings, like queen and king.
Queen and king have the same meaning, but one of them is male and the other one is female.
But it's the same meaning at the end of the day.
So we want to map them in that word vector space to be close to each other.
And that was the first innovation.
That innovation was very significant, by the way.
It was really the seed that started everything else.
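A toy sketch of that word-vector idea: the three-dimensional vectors below are made up for illustration (real embeddings are learned from text and have hundreds of dimensions), but the point is the same, that words with similar meanings, like king and queen, end up pointing in similar directions.

```python
import numpy as np

# Made-up vectors purely for illustration.
vectors = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.7, 0.2]),
    "eggs":  np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["king"], vectors["queen"]))  # high: related meanings
print(cosine_similarity(vectors["king"], vectors["eggs"]))   # low: unrelated meanings
```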
And then after that came in 2017, the Transformers paper.
And the Transformers paper was a very efficient algorithm that allows us to leverage deep neural networks to take into account the sequence of words.
Because we know the sequence of words affect the meaning.
If I say Eric killed a bull,
it has a very different meaning than a bull killed Eric.
Yeah, yeah.
And it can happen in different situations, etc., etc.
So the sequence of words are very important
and transformers allow us to embed neural networks
with the ability to capture sequence very efficiently.
That was the genius of transformers.
That was 2017.
And then in 2018 came the BERT paper, also by Google.
And the BERT paper was about how can we do unsupervised learning at scale by taking lots and lots of text, any amount of text that we want,
and then ask these neural networks to fill in the blanks, right?
So I would say,
Eric went to buy eggs from blank,
and then I leave a blank,
and it has to fill in the blank.
The blank is going to be in the supermarket.
That's where Eric would buy the eggs from.
And as you solve more and more of these puzzles,
then the neural network starts to learn
not only the vocabulary and the grammar, it started to learn the pragmatics of our language
as well. So it started to learn that Eric buys eggs at the supermarket, but Eric cooks eggs at
home. Even though "Eric cooks eggs at the supermarket" is grammatically correct,
it's not pragmatic. It's not pragmatic.
It's not something the language allows.
So it started to comprehend and understand.
The neural network started to comprehend and understand.
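Here is a minimal sketch of that fill-in-the-blank objective using an off-the-shelf masked language model. It assumes the Hugging Face transformers package is installed, and bert-base-uncased is just one common public checkpoint; completions like "store" or "market" are typical but not guaranteed.

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model has to fill in the blank, the same puzzle described above.
for candidate in fill_mask("Eric went to buy eggs from the [MASK].", top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```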
And these three papers were the foundation.
Once we took that now and started to scale it to bigger amounts of data,
we started getting what's called emergent behaviors.
We started getting new behaviors out of these neural networks
that we did not pre-code or
pre-determine that will happen.
But because we started scaling them with these proper fundamentals, they start to exhibit
these amazing patterns to the level now where we have these amazing things like the large
language models that can solve puzzles and explain jokes and translate or rewrite the
passage in the style of Shakespeare.
So that was it.
So, to answer your question briefly, Eric, and sorry for the long answer:
It wasn't just data and compute.
It was data and compute combined with some amazing new software techniques and algorithms that happened over the last 10 years.
Yeah, so helpful.
Yeah, no, that was kind of the intention of asking a little bit of a dumb question, just because I know you have such a deep knowledge of
this and have studied it. And so I wanted to draw some of those things out. Two more questions,
and then I want to hand it over to Costas. That's generally a lie. So maybe it's three more
questions. The first one, and I want to draw on some of the academic work that you just talked about.
I recently read an article that popped up on Hacker News about various translations of Dostoevsky's Brothers Karamazov, discussing how translations have been pretty divergent, actually,
because it's highly dependent on an individual's exposure to a wider set of Russian literature.
And so we don't want to dive into the humanities too deeply on the Data Stack Show.
But what you just described is really interesting to me because
it stands to reason that you could actually leverage some of those fundamentals to produce a
pretty highly accurate contextualized translation that relies on essentially the full documented
history of Russian literature as a corpus of work.
Is that possible?
Absolutely.
Absolutely.
Yes, you can.
Both angles are possible, meaning the angle of generalizing this so it reflects the full
richness of the style and the culture and political biases and the historical narratives
in the answer.
Or you can also personalize it to the extreme where you can say, I, Eric, I believe
my thought process, my beliefs, my culture, my upbringing are way more aligned with this
style.
Only give me that.
Yes.
Don't give me the rest.
So we get to the point where we can personalize messages for every single person exactly in
the way that they expect it. Yeah. And that's one of the beautiful byproducts of this technology.
Yeah. It's exciting and scary, right? Like if I can ask for the Dostoevsky that I want,
it's a little bit scary. Okay. Well, let's save that because I think we should discuss that maybe
towards the end of the show, but let's get really practical. And I want to really dig in on Vectara as I hand the mic off to Costas. And so what I would love to do is to hear about
something you worked on at Yahoo and how you would use Vectara to do that differently. How would you have built a product better with Vectara back in
your Yahoo days if Vectara was available? And so maybe can you frame it in terms of,
you know, we had this problem and we maybe approach it in this way, but if I had Vectara,
what I would have done is this. Maybe that'll help us understand a very practical use case
for how a practitioner
would leverage these APIs
and how a company can actually
sort of operationalize Vectara.
It's a very interesting question
to put it that way
because at Yahoo,
in my last four years,
I was working on search,
how to make Yahoo Search better.
Oh, wow. No way.
Yeah.
And frankly,
the approach Vectara takes to search
is the right approach because it's not about search engines anymore. It's about answer engines.
And that's what we see from ChatGPT. ChatGPT, why do people love ChatGPT so much? Besides the
fact that it can help them write their homework, it's the fact that it can answer the question.
When I ask, explain to me thermodynamics, it doesn't give me, here are 10 links of thermodynamics,
go read them so you can figure it out.
No, it'll tell you, would you like it to explain to you at the level of a five-year-old?
Yes.
And it gives you the five-year-old explanation right there in the answer.
You don't have to go click on anything.
You are getting the answer.
So what we are very focused on at Vectara with our first products, we'll have
many products more aligned with
moving into the direction that I refer to as
action engines in the future. So take action
on my app as well, verbally.
Today we are focused on, can I give you the
right answer to your question?
But how can you ground that answer in your
own results now as an organization?
So if you are a specific company, we're not doing
this for consumers, by the way, for
the web.
You.com is a great search engine that's trying to do that for the web.
Of course, Google is trying to do that today with Bard as well.
And Bing is doing that with Bing Chat.
We are focused instead on organizations.
So if you're an organization that has a lot of knowledge, say, for example, you are an
investment firm,
and that investment firm has lots of analyst reports from the analysts at J.P. Morgan and
Goldman Sachs. It has lots of investment mails that they wrote, who are they going to invest in
and who they are not, depending on their decision criteria. They have all of the PowerPoint slides
from the entrepreneurs pitching them, and they're storing that all in this very smart
system. And they're coming later and asking the question, oh, there is an entrepreneur now
pitching me on blockchain from Greece. Should I listen to them or not? And the system, I see
Kostas nodding his head, so he sees this. So the system will respond back and say, instead of giving you
a list of the previous memos,
like go read these previous memos, it will tell you, no, this is why you should invest.
This is why you should not.
The pros and cons, depending on your historical decision framing and your historical knowledge that you have in these documents.
And then if we get that answer right over and over again, then the next step is now
me as a knowledge worker, meaning an investor working with the system, I would say, can you just put this for me into an email I can send to our investment
committee for why we should pass on this deal or why we should look further at this deal?
So that's kind of the trajectory.
I hope this makes it concrete to you what we do.
We are helping you take your knowledge and activate that knowledge as a resource that
is no longer you sifting through massive amounts of
documents to come up with a PRD or with an analyst report or with a pharma researcher document for
why a new drug has adverse effects. No, we're giving you the response right there, increasing
your efficiency 10x while doing that. So this is really what this motion is about.
Yeah, 100%. And it really sounds like it's
sort of not only sort of a B2B play where you can enable sort of a knowledge worker, but also,
let's say, B2C, where you have a corpus of knowledge, and you're exposing that same
knowledge to an end user who can also leverage that, right? I mean, I'm thinking about your typical,
let's take Docs as an example, right? If you have expansive products like Stripe, say,
extensive Docs, amazing Docs, their team has done an awesome job. But the search is really
primitive, right? It hasn't changed in 15 years, like the doc search. I mean, maybe the matching algorithms have gotten better,
but imagine a world
where they could provide sort of
answer-based...
Exactly. Yeah, exactly.
And I mean, RudderStack, we are big
users of RudderStack at Vectara.
Amazing product. Thank you for building that amazing
product. Same thing, like RudderStack
has lots of documentation,
lots of knowledge base
articles from issues that your customers are facing as they're developing with your framework.
Imagine now having an engine like Vectara, sorry for pitching you on using us, that
indexes all of that content.
And when a developer now has a question, it's not going to point them, here's the document
you have to go read to figure out how to do this.
We'll tell them, here's the answer. Here are the steps you need to go through. Step number one, do
this; step number two, do that; step number three, do this, and you're done. And if they have a follow-up
question, they can say, oh, tell me more about step number three, and it will tell them more,
almost like they're speaking to a live customer support rep. So customer support use cases are
actually the number one use case for us. That's what most of our customers are using us for.
and i predict that five years from now,
call centers or having customer support reps
is not going to be required anymore.
This technology,
not just from Vectara,
but from other companies building this capability,
will completely replace them with large language models.
To your point, though, on B2C,
I cannot mention their name,
but we are working with a very large social media company
that has user-generated content about topics, many different types of topics that you can
write about.
And they're evaluating our platform right now exactly to do what you just said.
Because when people are searching, they're very frustrated, especially with keyword search.
When you have user-generated content, people say the same thing in many ways.
They misspell it.
So you never find the right answer when using legacy search techniques. And then they're evaluating our system in a B2B2C concept
where now you can ask the question and you're getting back not the posts that everybody else
wrote. You're getting back the digestion of the summary of the wisdom of the knowledge of all of
these posts and the response that you're seeing. And if you want to click through and go deeper,
then you can do that. Absolutely right.
Yeah, I love it.
And certainly Vectara
is going to recommend
to any investor
who's interested in a data startup
to scan the episodes
for the Data Stack Show
because if they haven't been
on the Data Stack Show,
that's a pretty...
If they haven't been on there,
they're suspects.
I don't know if you want to put your LP's money there.
Absolutely.
Absolutely.
There's another thing I want to double click on, which is why Vectara now, and what
am I doing differently for Vectara as compared to Cloudera, which was my previous iteration.
Yeah.
Yeah.
Yeah.
I'd love that.
So Cloudera, Cloudera is a successful company.
We did an IPO.
We got acquired for $5.3 billion
from the public markets.
By all means, making $2 billion,
almost $2 billion in revenue
every year right now.
Very successful.
But I have no shame in saying
I was always super jealous of Snowflake.
Snowflake is another company in our space that was able to achieve 10 times the valuation,
the ultimate valuation that we were able to achieve at Cloudera.
And when I studied it carefully, Cloudera had a very powerful platform, extremely powerful,
extremely complex, can do many things, machine learning, data science, ETL, storage, compute,
high-performance computing, simulations, Monte Carlo.
It can do many things.
It was so freaking hard to use, right?
Only the rocket scientist engineers could use it and get something useful done with
it.
Snowflake, on the other hand, came in and said, we're going to attack an existing problem,
which is databases.
Google has an amazing database called BigQuery.
Amazon has an amazing database called Redshift and extensions of that. Microsoft has, of course, SQL Server, which was
number one in the world, and they have Cosmos DB. Yet they were still able to come in and disrupt
that entire market by doing one thing, but doing that one thing really well, which is what? Ease of
use. They nailed the ease of use. They said, we're going to have a very simple API.
You upload your data on one end.
On the other end, you run your SQL queries.
You don't have to worry about partitions, indexing, primary keys, rebalancing.
We'll take care of automating all of that for you so you get amazing results.
And that formula worked.
That formula worked.
There's no question about it. So one of the key lessons I learned from my experience with Cloudera, which I'm
focused on at Vectara, is how can we do the same thing for large language models and Gen AI at
large? How can I provide developers with a super simple API? They plug in the data on one end,
they issue their prompts on the other end with an API and get amazing responses in return without having to be worried about, oh, which vector database, which encoder,
which neural network, which language model, which language, like everything just gets
load balanced and scaled and secured and kept private automatically for them.
So that's kind of the main distinction or the main lesson that I learned from my previous
journey that I'm applying in my current journey.
And hopefully it proves to be the correct lesson to be following.
I will agree with you.
I'm super happy that you brought up Snowflake actually, because you mentioned like a couple
of things about like the decisions that they made that I think like they are very important.
And they are also like a very good, let's say, introduction to what, you know, created all this success with systems like ChatGPT, right?
The ease of use for technology, I think, and it doesn't matter who the user is, by the way, I would argue.
And I'd love to hear your opinion on that because with your role as VP of DevRel in Google,
you are much more experienced than me on how developers work.
But even with hardcore developers,
how easy it is to use and remove obstacles
and allow them to go and do the work that they have to do,
it's super, super important.
This will never change.
And this is like related with
like the more general, like let's
say topic of like the human-computer
interaction, which I would like to chat
about that, but a little bit later
because I have a question that
I want to ask you for quite a while
now while we're talking with Eric.
So you mentioned
that we have seen search engines,
which, by the way, I know that most people right now who are using Google don't
realize, but one of the greatest successes of Google Search, in my opinion, is how easy it is to
use it, right? It's just a text box. You just put something there and you get results.
Replicating that at scale is extremely hard.
Similarly with Apple and all that stuff. But we still have search engines.
They don't give answers, as you put it very well.
Now, there is also something
important here, though, that I get the data as the user, make decisions, and I am liable
for these decisions, right? The machine isn't liable, because the machine is just serving
data, right? Now, we've seen what misinformation can do, right? Especially with social media and all these things.
So information is a very powerful tool.
So what happens when we move away from that
and now the machine actually gives answers, right?
So there's the balance of who is liable here.
It's not that clear anymore.
We're not used to it.
We don't know how to deal with that.
What happens, for example, inside the company,
the customers that you're working with,
I ask something and the machine hallucinates
and talks about an imaginary PRD that doesn't exist.
And then I go and talk about it in a presentation.
Right?
I mean, exaggerating.
That happened to a lawyer.
I think there is a lawyer.
There was something in the news a couple of weeks ago
about a lawyer where ChatGPT made up a number of cases
as part of his defense.
And the judge was like, none of this exists.
Like, nobody ever did that.
A hundred percent.
A hundred percent.
I mean, I'm exaggerating a little bit,
but I think it is an important topic.
And I think it's something that if we want to move fast
and harness the value of like this amazing technology,
it's something that we humans,
because we have to figure this out.
Like we have to figure it out.
So what's your opinion on that?
And what's the tooling that companies like Vectara can bring out there to help in doing
that?
Yes.
Yes.
Excellent question.
And there's another question you had there, which is about the developers out there and
ease of use, and how I see developers in the world, given my role at Google.
So first, answering that question
on automation of decisions.
We have been automating decisions already
and doing it.
This is not new.
Like, again, I give the example
of credit card fraud.
Credit card transactions are being scored
and automated in real time,
and they have been improving significantly
over the last 20 years.
20 years ago,
if you were to travel anywhere
and use your credit card in a new
country, it would get blocked right away. Because they just had a rule saying, if not in-country,
block transaction. It was as simple as that. And now it's a lot more dynamic and figures out,
oh, you just bought a ticket with your credit card, you probably are traveling.
So these systems are getting smarter over time. Google Maps, for me, Google Maps is one of the
best decision-making tools that we have ever used at scale.
We follow its directions.
It tells us to go right, we go right.
It tells us to go left, we go left.
We are literally doing what the app is telling us to do.
If it ends up sending us to the wrong address, do people file a lawsuit against Google?
Hey, you're liable for sending me to the wrong place?
Not yet, I guess.
I haven't seen anything like that.
That said, now, when you apply that to autonomous vehicles,
which I have been using Cruise and Waymo in San Francisco,
you can use those in San Francisco now,
it's a very big question mark for me.
Like, who's liable now when the car gets into an accident?
Is it the car manufacturer that made the car?
In the case of Waymo, it's Porsche.
Is it the AI driving the car?
Is it me, the person who is renting the car or buying the car to do something for them?
It's actually a very big question.
I don't know the answer to that.
That's actually going to be very interesting to figure out how we're going to figure out
the liability of the mistakes.
But what I know for sure we need to do is we need to minimize the mistakes, right?
We need to minimize the mistakes because the more we minimize them,
the more likely we will use them in the correct way.
Again, credit card transactions
being a perfect example of that.
So if we continue to have
large language models that hallucinate,
it would be very hard to use them in a business context.
I have to first start by saying
that hallucination is a feature, not a bug.
Like hallucination is something that is useful.
We use it for some things.
When you're creating a new Midjourney picture and it comes out as a newly hallucinated picture that nobody has ever seen before,
or you create a new movie script or a new poem by describing what you want, that's hallucination.
That's literally it making up something new that nobody has seen before. That's useful.
But when you're asking it to write a product plan for me or write a legal draft for me,
it cannot tell us any facts that don't exist.
So the solution to the problem, which is not perfectly solved yet, I have to be clear about that, but we are minimizing it as much as we can, the approach that we're adopting
at Vectara is called grounded generation.
So grounded generation is about how can I generate that response to the prompts that
you provided, but grounded in the facts, right?
So when you're saying, write this new PRD based on these features that we had in the
past, and based on this design of our product, and based on the manuals of our documentation,
then it's only constraining that output to what these documents are providing.
It's not going and trying to come up with a new output based on its statistical model
that it has in its neural network brain.
And that's how we minimize hallucinations in these systems.
And that's the approach that we are taking at Vectara.
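A minimal sketch of that grounded-generation idea: the retrieved facts are put into the prompt, and the model is told to answer only from them. The facts and wording here are invented for illustration, and the actual retrieval and LLM calls are left out.

```python
def build_grounded_prompt(question, facts):
    # Number the facts so the summarizer can cite them and stay within them.
    numbered = "\n".join(f"[{i + 1}] {fact}" for i, fact in enumerate(facts))
    return (
        "Answer the question using ONLY the facts below. "
        "If the facts do not contain the answer, say you don't know.\n\n"
        f"Facts:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )

facts = [
    "Memo 2021-07: we passed on three blockchain pitches because revenue was unclear.",
    "Memo 2022-03: we only invest when there is a working product and paying users.",
]
print(build_grounded_prompt("Should we take the meeting with the blockchain founder?", facts))
```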
That approach has other benefits, by the way.
By separating the knowledge in the large language model,
which is what it learned about the world,
and the knowledge graph or the knowledge base
of the content of your organization or your business,
you're now allowing it to be real-time as well.
Because now new data can be added to that knowledge graph.
And as a new prompt comes in,
the large language model does not need to be retrained. We don't have
to go and retrain it and refi-tune it.
It can pick up that response and start giving you an
answer right away. So that's another very key
benefit that approach
provides and brings to you. It's also
a lot cheaper to do it this way than
to fine-tune a model from scratch.
And last but not least, it solves a problem
called model pollution,
where you might be afraid that the large language
model is being trained on your data.
And now if you're using that large language model from another vendor,
you might be afraid, oh, what if that vendor jumps into my space?
So, for example, let's say RudderStack wants to put all of their documentation
and even source code inside of a large language model.
Oh, what prevents them now from replicating RudderStack?
So you need to have this balance between the large language model is not being trained
on the data, it's simply serving the data back and interpreting the data.
So these are some of the problems that this kind of approach of grounded generation solves, which,
by the way, is not unique to us.
Like grounded generation is a technique that many companies do.
I think we are the best at doing it.
It's also sometimes referred to as retrieval augmented generation. Now, can I address your other question about developers?
Yes, please. Yes. So I have a layman way I used to explain this. There's two types of developers
in this world and they're both important types and they both spend money and we should, as vendors,
we should be building for both of them. The type i call them the ikea developers like ikea the furniture company and the other
type i call the home depot developers so what's the difference here the home depot developers
they are very descriptive in nature so descriptive is the more technical term so what that means is
they like to describe to me what I can do.
Don't tell me what to do.
Describe to me what I can do.
Give me all the Lego blocks.
Give me all the pieces and let me figure it out.
So that's like Home Depot. I'm going to go to Home Depot.
I'm going to buy the planks of wood.
I'm going to buy the hammer, the nails, and the saw.
And I'm going to make a nice desk myself because I love making desks.
It's something I enjoy doing.
Right?
So that's the first type of developer.
Most Silicon Valley type developers, they tend to be in that category.
They tend to be, give me the building blocks and let me figure it out.
That's why Cloudera actually was doing very well in the Silicon Valley.
Because Cloudera was a very complex platform that had all of these Lego blocks.
The IKEA developer, which prefers something like a Snowflake or a Vectara, they are like,
don't let me figure out how to do it. I don't know how to do it. Tell me how to do it. So they
are prescriptive in nature. They're like, give me the recipe. The recipe has step number one,
do this. Step number two, do that. Step number three, do this. And then at the end, I have an
amazing solution that is working.
So that's IKEA.
When you buy this desk inside of a box and it has steps, the recipe, the prescription,
you follow the prescription and you end up with an amazing desk.
So that's exactly the model that Snowflake followed.
And it's the model that Vectara is following.
I think both are important.
I think you can make good money from both approaches.
I might be wrong, but I think there are more developers in the IKEA category
that just want to get their job done.
Give me a nice and easy-to-use API that gets my job done, versus the other category of give me all the building blocks and let me figure it out myself.
Yeah, yeah.
I agree.
I mean, we can talk a lot about that actually.
Maybe we should have like an episode to just chat about the different archetypes of developers
and how they are affected, let's say, or how they affect this generative AI revolution
that is happening right now.
But we need to focus on a few other things today. So, okay, let's talk a little bit more about the technology behind Vectara, right?
So, what does it take to build a platform that serves generative AI?
It's something that, I mean, okay, we've been hearing about like SaaS
platforms for a very long time.
I think like pretty much like every engineer who has been around for a while,
they have even unconsciously, let's say like a mental model of how a system
like this would look like, right?
But when it comes to generative AI, I have a feeling, correct me if I'm wrong, but I think things probably are a little bit different.
And platforms might be a little bit different.
So how do you architect a system like Vectara with the goal to not be like Cloudera, right?
Yeah.
Yes. So actually,
for Vectara to not be like Cloudera,
meaning to be easy to use
versus being powerful and complex,
that's more a product approach, actually,
than a technology approach per se.
It's like the products
and how we're designing the API,
how you're choosing what to expose to end users
and what not to expose.
I think that's more what comes from that.
But from the technological design point of view,
it's actually very similar.
Like we're still using Kubernetes
as our underlying fabric for deploying stuff.
We need to be able to provision servers
that have GPUs in them.
So that may be a little bit of a difference
from the previous wave
where we don't really care about GPUs that much.
So being conscious of how GPUs are being utilized is key.
But the Vectara pipeline,
if you look at it,
I can
define it in five components.
We have a document
parser that knows how to parse documents
coming in and extract
the text, but then
tokenize that text. So a very
key thing to using these large language models is something called
tokenization and how we generate these tokens for different languages,
English, French, Japanese, et cetera, et cetera.
So that's the first component that we have.
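A small sketch of that tokenization step, assuming the Hugging Face transformers package is installed; the multilingual checkpoint named here is just one public example, not Vectara's own tokenizer.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

# The same idea in two languages: raw text split into the tokens a model actually consumes.
print(tokenizer.tokenize("Vectara turns documents into answers."))
print(tokenizer.tokenize("Vectara transforme les documents en réponses."))
```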
The second component is a neural network encoder.
And the neural network encoder is the technology
that knows how to go from human language space,
meaning English, French, Japanese, Chinese,
into computer language space, the lingua
franca space, that is a symbolic vector space that understands all languages. And that's a very
critical component. It's actually one of our secret sauce components. My co-founder Amin, he
was on the team that built one of the very first multilingual encoders at Google Research. That is the space in which most of the operations take place.
You will hear that space called an embedding space.
You will hear the term embedding.
That's what it means.
In reality, if you're a mathematician, it's a vector.
It's a multidimensional vector in a space that can have many dimensions,
thousands of dimensions.
So it's not that three-dimensional vector.
And then every concept or every meaning is a pointer in that space,
in that multidimensional space.
And by the way, our name is Vectara because of that, right?
Because of the underlying theme of how these things work is the vectors.
So that's the second module, okay?
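A sketch of what such an encoder does, using the sentence-transformers package and a public multilingual model as a stand-in (not Vectara's own encoder): two sentences with the same meaning in different languages land close together in the vector space.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english, french = model.encode([
    "Where can I buy eggs?",
    "Où puis-je acheter des œufs ?",
])

# Cosine similarity close to 1.0 means the two vectors point in the same direction.
similarity = np.dot(english, french) / (np.linalg.norm(english) * np.linalg.norm(french))
print(similarity)
```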
So the first module is the document extraction.
The second module is the encoding into a vector space.
The third module is a vector database.
So you need to have a vector database that knows how to store these vectors. And then when new
vectors come in for a question that somebody is asking, it finds the vectors that are closest in
proximity pointing in the same direction, which means they have the same meaning. It's really as
simple as that. But to make it work at scale is, of course, very hard. So we do have a vector
database that we built at the heart of our platform.
We leveraged an open source library from Facebook called the FAISS library.
Then we added extensibility for it to be scalable, to be multi-tenant, to balance in-memory and on-disk for the economics of doing this,
and to be able to fine-tune the speed with which it's doing the matching depending on the use case.
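A minimal sketch of that matching step using the FAISS library mentioned above; the vectors are random stand-ins for real embeddings, and the dimensionality is illustrative.

```python
import faiss
import numpy as np

dim = 768                                # illustrative embedding size
doc_vectors = np.random.rand(10000, dim).astype("float32")
faiss.normalize_L2(doc_vectors)          # so inner product behaves like cosine similarity

index = faiss.IndexFlatIP(dim)           # exact inner-product index
index.add(doc_vectors)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)

scores, ids = index.search(query, 10)    # the 10 stored vectors closest to the query
print(ids[0], scores[0])
```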
So that's the middle component that does the matching. And then you have
another neural network that's called a cross-attentional re-ranker. And sorry
for getting too technical here. But that neural network is very essential because that neural
network picks the output of the vector database, which are, here are the 10 facts
that are most relevant to the prompt or the question you're asking. You need to
re-rank these facts so that the most relevant fact is number one,
based on meaning and understanding again.
So that cross-attentional neural network,
it's a BERT-like model.
It re-ranks the facts that have been retrieved
from the vector database.
So the most relevant one to the prompt or question
is the first one, and then the second one,
and the third one.
And that's very important
because the last step is the neural network, which is the large
language model that's doing the summarization.
And these summarization systems, they pay more attention to things earlier in the context
window, meaning what you're giving to them in the prompt versus later in the context
window.
Actually, they pay attention to the beginning and the end, and the middle can sometimes
be glossed over.
So it's very important to rank these things appropriately.
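A sketch of that re-ranking step using a public cross-encoder model from the sentence-transformers package, which plays a similar role to the BERT-like re-ranker described here (it is not Vectara's own model): it scores each question-fact pair and puts the most relevant fact first.

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

question = "Why did we pass on previous blockchain deals?"
facts = [
    "The 2021 memo cites unclear revenue as the reason we passed on three blockchain pitches.",
    "Our travel reimbursement policy was updated in March.",
    "We require a working product and paying users before investing.",
]

# Score each (question, fact) pair, then sort so the most relevant fact comes first.
scores = reranker.predict([(question, fact) for fact in facts])
ranked = [fact for _, fact in sorted(zip(scores, facts), reverse=True)]
print(ranked[0])
```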
And then, so the last step in the pipeline
is to pass this to the summarizer:
here are the input facts
that we found, in the right order.
Please now summarize this into a cohesive
generated output that I can give to my end users
as a function of the prompt or the task
or the question that just came in.
So that's literally our pipeline beginning to end.
You can go and build this pipeline.
You can use LangChain.
You can use Pinecone as a vector database.
You can use a BERT model from Hugging Face.
You can use GPT for the summarization.
You can use Ada from OpenAI for the embedding.
And you can build it all yourself.
So that's what the Tinkerer developer,
that's what the Home Depot developer would do.
They would go do that.
Or you can use Vectara.
You don't need to know about any of that stuff.
You have one API with the data,
the other API with the prompt, and it just works.
Yeah, no, that was awesome.
One question.
You mentioned the term facts.
What is a fact?
How do you define a fact in the context of Vectara?
Excellent question. So a fact, the layman way of saying it, is a search result,
right? It's what are the search results that are most relevant to the task at hand,
the task I'm trying to do right now. But I prefer calling them facts because it goes back to this
grounded generation because the end output
is not the 10 results, really.
We don't want to show the 10 results to the end user.
We want to show them the answer.
So that's why we are saying these 10 results are really the facts, the underlying facts
in which the answer is being grounded as you're providing it back to the end user.
And by the way, that's tunable.
You can say sometimes I only want the top five results. You can say I want the top 10 results. Today it's static; in the future it will keep
going down the facts until there's a relevance threshold you hit, where this next fact now is
not going to add a lot more informational content to the response. Now this is a bit technical,
but the reason why you want to do that is that large language models are expensive to run, actually.
They're very expensive to run, the large language models.
So the more tokens you give them in the inputs, the more cost you pay as you're running them.
So you want to minimize how many facts or how many words am I giving in this last stage?
So that's why we are researching dynamically tuning how many facts it will be providing to the summarizer.
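A rough sketch of that dynamic selection idea: keep taking facts in relevance order until the next one falls below a relevance threshold or the token budget for the summarizer runs out. The threshold, budget, and word-count token estimate are all illustrative.

```python
def select_facts(ranked_facts, scores, min_score=0.5, token_budget=2000):
    selected, tokens_used = [], 0
    for fact, score in zip(ranked_facts, scores):
        cost = len(fact.split())  # crude stand-in for a real token count
        if score < min_score or tokens_used + cost > token_budget:
            break                 # the next fact adds little value or costs too much
        selected.append(fact)
        tokens_used += cost
    return selected

facts = ["The 2021 memo cites unclear revenue.", "The travel policy changed in March."]
print(select_facts(facts, [0.91, 0.12]))  # only the relevant fact survives the cutoff
```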
And apologies for getting too technical here.
I don't know. Please get as technical as you want. Both us and the audience enjoy that.
Okay. We are close to the end here and I want to give the microphone back to
Eric. But before I do that, one question about just to make the grounding process that you described, together with what you said about the facts, a little bit more clear to the audience out there.
So how is grounding actually implemented through all these processes, like this pipeline that you described?
And how much does the user need to be aware of the process itself?
How transparent is the grounding process?
Yes, excellent question.
So to give a layman analogy,
it's like you have two humans working together.
There is one human who is very good at speaking English
and giving you a perfect response to a question, but they
have no knowledge. They don't have the knowledge. They don't know what are the right answers to a
given question. Like for example, should I invest in this entrepreneur pitching me on blockchain
from Greece? So that's the question. They have no idea, but if they're given the right facts, they can say them back in a very good way, right?
So that's one human.
And then the other human is the human that knows everything.
They literally read all the documents you have ever got.
They read all of the articles.
They read everything about this topic.
So that when the question comes, it goes to that first human first.
That human is not good at writing answers.
They don't have good English writing skills, but they have very good comprehension skills,
right?
They're really good at matching concepts to each other.
And they can do them across languages.
They can do them for English, Spanish, German, Chinese, all at the same time.
So the question comes to them, to that first human.
The first human looks at the knowledge base they have.
And they have photographic memory, by the way.
That's another key thing.
That human remembers everything exactly.
It's not compressing the knowledge down like a large language model does.
So that human now gets the question.
They leverage this photographic memory to come up with these are the most relevant facts to be able to answer that question about whether you should invest in this entrepreneur pitching you a blockchain from Greece.
I don't know how to compile them into a balanced response.
Here you go, second human.
You're very good at prose.
You're very good at writing amazing English
or you're very good at writing amazing Chinese,
depending on the end user and who's asking the question
or Greek or Italian or whatever.
And then they take these facts,
they compile them now into a response
that they give back to the end user.
So by doing these two things together, we are now preventing this second user from making up stuff, right?
This second creative user, which is very good at writing, we're telling it, when you are generating that response to this question, don't try to rely on your memory because your memory is a compressed memory.
The way that language models work is they compress all of our human knowledge
into a very small footprint, right?
Like petabytes and petabytes of data gets literally compressed into a hundred
gigabytes.
Like that's really what the large language models are doing.
So by definition, they are lossy, meaning they will not have the full knowledge
base in their head at all times.
So by telling them, only give a response as a function of these facts I just
provided to you, that's how you
significantly reduce the probability that they
will make something up. It's not zero
still, though. They can still make something up.
We are thinking of adding another
stage, and sorry for
going for too long in this response,
where we do like newspapers.
So newspapers, when they have
reporters writing articles,
they always have another step before the article goes out
called fact-checking, the fact-checking step, right?
That's where they extract the key facts in that article
and they send it to a fact-checker
that double-checks that these facts are all true
and not something that the reporter hallucinated
as they were writing the article.
We don't do that yet today,
but we'll probably add something like that in the future as well
to help further minimize
the probability of a bad fact.
Doing that right,
getting hallucination to be zero,
is required for us to move
from answer engines to action engines.
We cannot move to action engines
without solving that problem.
Because as you said earlier,
now you open yourself up to liability.
If it hallucinates the wrong response and tells you, yes, go invest in that entrepreneur.
All of your previous documents say you should invest in that guy from Greece, and then it turns out to be a bad investment.
Yeah, 100%.
Okay, that was fascinating.
I'll give the microphone back to Eric now.
All right. Well, we are at the buzzer, as we say on the show. But
just one final question for you. If we sort of zoom out, I mean, these are really critical
issues. I love the concept of fact checking. But Amr, when you think about AI, what keeps you up
at night in terms of the risks that we need to be aware of in general?
You know, I think that, you know, as I think about Vectara and sort of having an API that
makes all of the knowledge at RudderStack available, like, I mean, that's a no-brainer win. I mean,
I think this is going to really change the way that our customers could interface with our knowledge base.
But it's larger than that.
It's more than just an API.
And you've seen this over several decades.
So what keeps you up at night?
What are you worried about with AI?
Yes.
So first, it's not what you hear from some folks saying Skynet and looming doom and AI is going to take over the world.
That's not what keeps me up at night.
What keeps me up at night is humans.
Humans misuse technology.
Like, it's not the technology itself.
Humans misuse technology.
So in the same way, when Henry Ford and all the amazing people that helped make the car, their goal was to make a device that helps us get from point A to point B
very efficiently.
Their goal was not to create a device that kills 50,000 people per year,
which is,
I think,
how many people die from car accidents in the US alone.
It's not their goal.
That was not their goal.
Their goal was not for somebody to take a car and go purposely run over people
and kill them with it, or drive it while they're drunk and then kill an entire
family.
That was not their goal.
We, humans, we are the ones who make mistakes like that,
or purposely use technology in a dangerous way.
That's what keeps me up at night.
What keeps me up at night is somebody using large language models
to control us in a negative way.
Because as I said, with these large language models,
I can rehearse how to say things to you, Eric,
to always buy the product I care about.
And you wouldn't know if you're buying the product
because it's truly good for you
or because the messaging you got was so convincing
and hence you ended up buying it.
So that's kind of the concern I have.
It's like Cambridge Analytica.
Remember Cambridge Analytica?
Oh, sure.
The Trump election?
Take that now and multiply it by a million, right?
So it's Cambridge Analytica analyzing our profiles, each one of us,
but now not just coming up with segments of messages, like,
we're going to send these messages to this group because they play on the fears of these people.
We're going to make a message just for Eric, because we know exactly what Eric cares about.
We're going to nail it.
And he's going to vote in our direction because of getting that message.
That is one of my key concerns, actually,
for how this technology is going to evolve.
And my answer to it is that we need to have the antivirus.
If I had time to work on another startup,
I would go build that startup.
Like, in the same way we have viruses for computers,
and viruses are illegal.
And when we catch people building viruses,
they go to prison,
but they still happen day in and day out
because humans are bad.
Some humans are bad.
Not all humans are bad.
And they do these things.
We have the antivirus now
that can catch and stop these things.
We need something like that here, so that
when you are hearing a message
or when you're seeing an ad
or when you're having a conversation
with Amr telling you why Vectara
is the best product for your company,
a small light would show up and say, this is manipulation taking place right now.
You're being manipulated 80% of the time and only 20% is true goodness for you.
Actually, I don't know what the answer to the problem is, but when you ask me the question,
what keeps you up at night, that's what keeps me up at night.
Yeah, sure.
Well, maybe we need to go back to Greek virtue, right? I mean, the heroes in Greek mythology sort of operated according to a core set
of values, right? And so perhaps it comes back to the humans on the receiving end as well,
who, you know, have a value system that can interpret that. So, yeah.
But the problem is we have different value systems.
So this is definitely a topic for a whole episode.
We can talk about this for...
Because I have a value system
that might not be exactly your value system.
It's not exactly the value system of somebody else.
How do you do that properly?
It's really hard.
It's really hard.
It's very difficult.
Well, what a wonderful topic, Amr.
Let's have you back on and then we can just have an open discussion on that topic, which will be great.
We don't have to get into the tech as much.
But thank you for sharing your story.
Vectara seems amazing.
Congratulations.
And this has been an amazing show, so thanks for giving us some of your time.
My pleasure.
It's been awesome to be on the show with both of you.
Kostas, wow.
I'm trying to process the span
or the expanse maybe of that conversation
with Amr from Vectara.
I mean, what a heavy hitter, right?
Like started a company, sold it to Yahoo,
did data science stuff at Yahoo,
founded Cloudera, went to Google,
and then started a company that's doing stuff on LLMs.
I mean, heavy hitter might be an understatement
for someone who has that track record.
But what an approachable guy,
first of all, right? I mean, just conversational, very helpful. I think one of the things that
was really helpful for me was him breaking down the academic, sort of the technical academic advancements that have enabled modern AI as it sort of manifests in LLMs, right?
Like when, you know, people like to talk about, they just, you know, it's like they scraped the whole internet, right?
And like Microsoft, you know, just provided unlimited compute power.
He really walked through the academic timeline, starting in 2013, of the major breakthroughs and algorithms that enabled what we are experiencing today.
That was really helpful.
To me, at least. I hope for our listeners. But I think the other thing was that, you know, I guess if I had to describe one of my big takeaways is that Amr
really seems like a steward of technology, if that makes sense, right? He's not blindly forging ahead
and he's thinking about what is being built.
Right.
And I think that actually is expressed in what his company Vectara provides, which is
just starting out.
It's an API, right?
It's an endpoint.
But I think that's reflective of his approach to how we wield these technologies.
So I don't know, lots to think about.
What do you think?
Well, I want to emphasize only one thing. I'll leave the rest for our audience to go and listen to the episode.
But I want to share with everyone that we spent about an hour with someone who came to the United States
from Egypt in 1995, did a PhD at Stanford, started a company that got acquired by Yahoo.
After that, started Cloudera, a company that went public, right? Was the CTO there, one of the founders.
VP of developer relations at Google.
And today, he's starting something again from zero, right?
And the most important thing, which, I don't know,
for me at least was like probably the most unique
and exciting thing
of this whole conversation was his energy and his excitement.
Thinking that you have a person who has done all these things
and after doing all these things has this energy and this excitement
about starting something new, I don't know.
I find it something very unique, something that you
don't often find with people in
technology.
And it's, I don't know.
Just for that, I would suggest
to anyone to
watch this episode.
And they will be surprised at the things
that they will learn from Amr.
Yeah, I agree.
Someone who has done all that,
if they start something new,
you probably should pay attention to it because they're not going to make a light bet.
All right.
Well, definitely listen to this one.
Really fascinating conversation.
We get into the details of LLMs on a technical level.
How do you actually go to market with that?
But a great history as well. If you have not subscribed, definitely subscribe. You can get
The Data Stack Show wherever you get your podcasts. Tell a friend and we'll catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite
podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds,
at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by
RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.