Disseminate: The Computer Science Research Podcast - Vikramank Singh | Panda: Performance Debugging for Databases using LLM Agents | #47
Episode Date: March 4, 2024
In this episode, Vikramank Singh introduces the Panda framework, aimed at refining Large Language Models' (LLMs) capability to address database performance issues. Vikramank elaborates on Panda's four components—Grounding, Verification, Affordance, and Feedback—illustrating how they collaborate to contextualize LLM responses and deliver actionable recommendations. By bridging the divide between technical knowledge and practical troubleshooting needs, Panda has the potential to revolutionize database debugging practices, offering a promising avenue for more effective and efficient resolution of performance challenges in database systems. Tune in to learn more!
Links: CIDR'24 Paper | Vikramank's LinkedIn
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate, a computer science research podcast. I'm your host, Jack Wardby.
Quick reminder that if you do enjoy the show, please do consider supporting us through Buy Me A Coffee.
It really helps us to keep making the show.
Today, I'm joined by Vikramank Singh, who will be telling us everything we need to know about Panda,
performance debugging for databases using LLM agents.
Vikramank is on the AWS Redshift team, where he's an applied scientist.
Welcome to the show, Vikramank.
Hey, thanks. Thanks for having me. Good to be here.
Fantastic. The pleasure is all ours.
So can you tell us a little bit more about yourself and, yeah,
how you became interested in database management research?
Sure. So, yeah, as you said, I'm currently working
as an applied scientist
here at Amazon.
I'm part of this team
called Redshift,
which is a data warehousing
service in AWS.
And I joined AWS
probably three years ago.
And that's my introduction
to databases.
So I actually had no background
in databases as such.
So I did my undergrad in computer science,
spent some time working at Facebook in computer vision,
machine learning, software engineer,
moved to Berkeley to do my master's.
And when I did my master's,
my research was mainly around reinforcement learning
or control or decision-making systems.
And then I moved to AWS.
And I joined a team which was primarily working on databases.
So all the problem statements that we used to solve
were related to databases, machine learning.
But I think we sort of picture or pose ourselves
as people who work on ML for systems.
So we use machine learning to solve problems in the systems domain.
So yeah, that's my introduction. That's how I got introduced to problems in databases.
I initially started working with this team called RDS,
the Relational Database Services in AWS, and started working on some very interesting
problems for RDS customers for about two years. And then I moved to Redshift and
that's where I am.
Awesome, that's fantastic. So you finally found your way to databases, to the holy grail,
right? So you finally made your way, that's awesome stuff. So today we're going to be talking about LLMs and I know they're all the rage at the moment. So can you maybe kind of
start off for the listener, giving us some sort of background on kind of what LLMs are and kind of,
we can talk a little bit about performance debugging as well
and why that's important and necessarily hard in databases.
Sure, yeah.
So again, I'm no expert in LLMs
and I will probably not dive too deep into what LLMs are
and so on, but just to give a very high level stuff.
The way I see it, LLMs are, I feel,
what we call generative models.
When I say generative models,
again, not to dive too deep into the technical stuff,
they learn the distribution of the data they're trained on.
And the generation process
is just sampling from that distribution,
and sampling in some meaningful way.
And when I say a meaningful way,
in this case, it's an autoregressive way,
where what you generate at T plus one
probably depends on the past.
It can be the independent generation as well,
where whatever you generate at T plus one
has no relation to what you generate at T,
but that's not autoregressive.
So why we are interested in LLMs nowadays
is because LLMs have been primarily used for language,
because they're language models. And language, by default, tends to be auto-regressive, or have
a sequential behavior to it. So what you see at time t plus one is probably related to time t,
which is related to time t minus one, t minus two, and so on. So I kind of look at them as
auto-regressive generative models. That's what LLMs are.
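To make that autoregressive idea concrete, here's a minimal, hypothetical sketch (not anything from Panda or the paper) of what sampling token by token looks like, using a toy bigram table in place of a real neural language model:

```python
import random

# Toy bigram "language model": the next token's distribution depends only on the
# previous token. A real LLM conditions on the whole history with a neural network,
# but the autoregressive sampling loop looks the same.
BIGRAMS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"query": 0.5, "database": 0.5},
    "a": {"query": 0.7, "table": 0.3},
    "query": {"<end>": 1.0},
    "database": {"<end>": 1.0},
    "table": {"<end>": 1.0},
}

def sample_next(prev_token: str) -> str:
    """Sample token t+1 from a distribution conditioned on the past (here, just token t)."""
    options = BIGRAMS[prev_token]
    return random.choices(list(options.keys()), weights=list(options.values()))[0]

def generate() -> list[str]:
    tokens = ["<start>"]
    while tokens[-1] != "<end>":
        tokens.append(sample_next(tokens[-1]))  # each step conditions on what came before
    return tokens[1:-1]

print(generate())  # e.g. ['the', 'query']
```

The point is just that each generated token is drawn from a distribution conditioned on what came before it.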
And the reason they've been so fascinating is because of the use cases they've been applied to.
So earlier we used to train these models on a small corpus of data.
But now that we have the right amount of resources in terms of compute and energy, we can train them on probably a large chunk of the internet.
And then it turns out they can do some pretty interesting stuff. So that's a very high-level, very superficial way of explaining what LLMs are.
No, that's fantastic, because I've messed around with ChatGPT and it's fantastic, and I find myself using it more and more every day for various different tasks.
I don't tell my mum, but for like writing the message in her birthday card,
I found it very useful for that.
So it's giving me inspiration anyway for me to then fine tune, shall we say.
But yeah, so that was a fantastic description of kind of what they are.
And that was really, really insightful.
So I guess kind of with that then,
so we've kind of got these LLMs and they're super useful.
How can we then apply that to performance debugging
in databases?
And kind of, yeah, tell us more about
the performance debugging angle of your work.
Yeah, I mean, that's a question
that I've been asked before as well.
So like, when you think of LLMs,
databases are not the first use case
that you think of, right?
So what's the overlap between databases,
and more specifically debugging databases, and LLMs?
And so let's talk about debugging databases first.
And again, as I said, before joining AWS,
I had no clue about databases.
I mean, I knew what databases are,
but I didn't have a deeper understanding of how they work
and how people actually use them in production systems.
So when I started working at RDS,
I saw a lot of customers, how they use their databases.
And when I say customers, these are people,
let's say they are database engineers
in their own respective teams, in their own companies.
These are database DevOps people, a lot of
development engineers who rely on the database. Maybe they maintain, let's say, tens of
databases for their company. And how do they make sure that the database is performing
at the level that they want? So there are various tools out there that these people try to use
to monitor the health of the database.
Now, again, the term health of the database is still not very well studied or defined.
So let's say the health of a database is something you can think of as, the rate at which a query executes on average continues to remain the same, or the P90 or P95, whatever.
So there are different metrics that you can use to measure
the health of database.
It can be the number of connections over time.
It can be query latency over time.
It can be number of active sessions over time.
These are different definitions
of the health of a database.
So you can pick any one of them.
And there are tons of telemetry data.
So this is a time series data.
So at every time point,
you can measure the
average latency of all your queries. You can measure it on Monday, Tuesday, Wednesday, and so on.
So this is a time series data. And usually in production, these database engineers, they monitor
tens to hundreds of these telemetry data. And this is usually what starts as a database debugging
process.
So when you want to understand how your database is doing,
you look at the dashboard, which has tons of telemetry data.
And for each metric, you have some sort of threshold over time that you have set.
You'd feel like, okay, if my average latency is below this,
I don't care.
Things are good.
And if my average latency goes above the threshold,
then something bad is going on.
So that's where it starts usually.
And the impact of this is gigantic.
So these dashboards that are being monitored
can have a huge impact on the business.
Because let's think about what that query might be.
let's say you are some e-commerce company
and some customer is trying to search for a product
on their website and the page is taking too long to load.
So the customer doesn't know what's going on in the front end,
but on the back end, probably some query is taking too long
to read your entire table or something like that is happening.
So your query latency shot up, the page is not loading
and the customer moves on to, let's say, a rival website.
So it has like significant impact on the company's revenue
and business model as well.
So yeah, the database debugging is important.
So these database engineers,
they constantly monitor these telemetry data.
And whenever they think something wrong is going on,
some of their thresholds have been crossed.
Then they look into why something is wrong.
So they look into each telemetry data,
understand why things are going wrong.
And for that, they usually go about reading
a lot of documentation.
So now when we talk about documentation, that's where the natural language comes in. There are a bunch of
open-source docs, and there are a bunch of really good handcrafted docs
created by AWS, if it's an AWS database. And that's usually where they start.
They start reading about why, on this specific
database engine, a query might be running slower. What could be the possible reasons? What are
some actions they can take? How can they fix it? Is there a parameter that they need to
tune? Is there a query that they need to tune? And things like that. So it's a combination of
looking at this telemetry data and reading the relevant troubleshooting documents that I generally call the
debugging process. And the problem with this process is, A, the telemetry is large,
so there are hundreds of telemetry data that you need to monitor, which is difficult for a human
being. And B, the documentation is vast. So it's not just one document per metric, there will be
hundreds of documents. And finding the right document and identifying the right solution from that document is not trivial. So the combination of these two is very
complex, and on top of it, this needs to be done in real time and on a continuous basis. So they need
to monitor day in, day out, constantly doing this over and over again. So the way I see LLMs
helping them is to understand this large corpus of natural language data, which is the troubleshooting documents, and then try to help them identify the right answers.
And we'll talk later about how you can combine telemetry data and troubleshooting documents together using LLMs.
But this is one place where I feel LLMs can be brought into the picture and help
speed up the debugging process
for databases.
Awesome stuff. So just to recap there, on one side we've got these thousands of metrics that are kind of hard for us to have all in our head at one time.
And then you've got these massive amounts of documentation,
and then all of a sudden you've got a customer screaming
at you because their query is not fast enough and they're going to lose a sale to a rival company.
Exactly.
Okay, yeah, I can understand the motivation here now very clearly as to why this DevOps engineer would want someone else to help them solve this problem quickly.
So I guess that's a nice sort of segue into Panda.
And yeah, so give us the high-level elevator pitch for Panda then.
How is this going to make my life better as a DevOps engineer or DBA,
someone running one of these managed database systems?
Sure, sure.
Interesting.
So let me think of what's the elevator pitch for Panda.
So, as I explained earlier,
that's exactly the problem Panda is designed to solve.
So when we thought of Panda, the goal of Panda is to answer this very specific question.
And the question that we try to answer is: what are some of the essential building blocks that we need in order to safely deploy any black-box large language model for debugging databases in production? And the goal for Panda
is to make sure it generates accurate,
verifiable, actionable, and useful recommendations.
So sure, you can put anything inside a language model
that can generate some stuff that does make sense.
But is it useful?
Is it verifiable?
Is it accurate?
These are the questions that we want to answer with Panda.
And that's like why we want to build Panda
and what exactly Panda is.
Panda is a service or is a framework
that combines the power of telemetry and documentation
using language models.
That's what Panda is.
So it combines information.
It's able to extract information from telemetry data, and it's able to extract information from documentation. And then it's able
to combine those two pieces of information together to generate recommendations. That's what
Panda is.
So yeah, before we go any further, why the name Panda?
Yeah, yeah.
I'm sure I didn't spend too much time thinking about the name.
And when I wrote this long sentence, sure I didn't spend too much time thinking about the name. And when I wrote this long sentence, I didn't have an acronym for it.
So I just wrote what this thing does.
And what it does was it was performance-stable for databases using language models,
so large language model agents.
And when I started looking at this large sentence, I was thinking of one word description.
I just picked random stuff from the sentence and made this word called panda and i felt like panda is a word that usually people
are aware of in the computer science domain because of this and that library in python
uh so it felt it came on naturally to me so i just picked that name yeah i like that
Yeah, no, I like that as well, because I guess when I heard Panda I thought of the kind
of cuddly thing, like something that's going to help you.
And I'm struggling to debug this problem.
I need to go and cuddle my Panda and he's going to help.
I don't know.
I think it was something like that as well.
Anyway.
Awesome.
Yeah, it's been interesting.
Yeah.
Yeah.
That's kind of what I was thinking about it.
But anyway.
Cool.
So you listed the four properties there of kind of what you want this
LLM sort of debugging database agent to have.
So taking those design goals and the thing you wanted to build,
how did you go about it? What's the architecture of Panda?
How did you sort of go about realizing this sort of goal?
Sure. Yeah.
So, yeah, the way we started thinking about this solution was: we had one thing clear. From our
domain experts, we know what database engineers usually think about, and what the
process looks like when, let's say, a database engineer starts debugging a problem.
So we try to write that process down.
That's what our starting point was.
So what did they do?
So, okay, we looked at, okay,
they look at a bunch of telemetry data.
That seems important.
Once they start looking at telemetry data,
they have some sort of knowledge about what each metric means and how different metrics connect with each other.
So that piece of information is relevant.
Second, third, what do they do after that?
So let's say they identified five out of 100 metrics that do seem anomalous.
What do they do after that?
So then you realize what they do is for each of those metrics, they look at their documentations.
So they have some predefined set of rich documentation,
and they start looking at it.
So that's the third thing we found.
So our model needs to do that too.
Then when they start going through each of those documentations,
they try to find things like what are some usual cases
where these metrics become anomalous. What are some
usual problems? What are some usual root causes for this problem? Once they identify that, they
try to find what are some fixes for those problems, which are usually in the same documentation. And
once they find those fixes, then they try to think about which of these are feasible given the time span
that I have,
given the condition that my database is in right now.
Is it feasible for me to, let's say, restart the database and tune a parameter, like change
the value of a parameter?
Or maybe it's not possible for me because of the production data.
I can't switch the database off and restart it again.
Okay, so next solution, next solution, things like that.
And they finally apply this fix on the database.
So there are multiple steps. We followed the database engineer in how they try to fix the
problem, and then tried to replicate that exactly with a language model. So that was the design
thinking phase of how the system should eventually look.
To sum it up at a very high level, there are four key design components. The first is what
we call grounding. So we needed a module in Panda, which we call the grounding
component or grounding module. Now, what this grounding module does: we know
that the telemetry is very important. You can
pick any language model
like GPT or
LLaMA or any language model
and ask it a question about database
and natural language and it will generate a bunch of
recommendations. So it's not that
we are training a language
model, that's not the goal.
But the goal is to ground the language model
more by giving it the right context
of the database.
So how do we do that?
And what we found is that
the database engineers,
they use telemetry to do that.
So how do we connect?
And language models
are not trained on telemetry data.
They're not trained
on these large metrics.
So how do you combine
the telemetry data
with language models
such that the language models can now understand the context of what the database is experiencing right now to generate answers?
That's what we call grounding.
So we needed one component called grounding.
The second component we called was verification. So we believe the system should be able to verify the generated answers using
some relevant sources and produce citations along with it. So the end user can eventually
verify what the output is, where the output is coming from. Now, this is easier for humans
because when database engineers try to fix the problem, they read these documents,
and they know exactly the source of the documentation. They may not know
the exact person who wrote it, but they know the organization where
the document is coming from, so there is some sort of trust behind the truthfulness of the documentation.
So you want to build that into the system. The third is what we call affordance. What we believe
is that if the recommendation that Panda provided is actually applied,
the system should be able to estimate and inform the user about the consequence.
This is also very important.
So if let's say Panda says increase your number of CPUs from 16 to 32, what's the consequence
of that action?
The consequence could be increasing cost.
The consequence could be your query latency could go down.
So we want the user to know what will happen,
what's the counterfactual,
or what will happen if they do this,
if they do that,
before applying that recommendation.
And fourth, and the last component, is feedback.
So we believe the system should be able
to accept feedback from the user
and improve over time.
So if Panda is running on a database, there's a database engineer that says,
okay, you suggested this last time, I applied it, and it didn't fix my issue.
So you want the system to take that into account the next time it generates an answer.
So these are like the four key principles, grounding, verification,
affordance, and feedback, that kind of build up Panda.
Awesome, yeah.
So grounding, verification, affordance, feedback,
kind of just running through them one by one then
in a little bit more depth.
So on the grounding thing, because when you were speaking, you were talking about how these LLMs are great for large corpuses of natural language.
But then we want to combine this with these metrics.
And I was thinking about the time series, like, hang on a minute here, how do you resolve the fact that these things expect one type of data, and then you've got this massive amount of a different type of data that's really important?
So how did you go about bringing those two things together in the grounding?
Yeah, yeah, exactly. And that's what I think is one of the most interesting contributions of Panda.
And this area is, again, very, very active in terms of research.
You'll find new interesting papers and ideas coming up every other day where people are trying to experiment with numbers, experiment with math and LLMs. So I forgot the name of the exact paper,
but there's some recent paper that came from,
if I'm not wrong, Google or Stanford,
somewhere I forgot the name.
But they showed that you can actually
input the raw telemetry data in the prompt,
and it will generate statistical answers about the metrics.
If I give it, let's say, a month's worth of data,
with timestamps and some values,
and ask, let's say, what was my average query latency on Monday?
It's actually able to infer what Monday means from the data,
extract the numbers, and sum it up and give you the answers.
So there's some very interesting emergent behaviors coming from LLM,
which proves that these are not just
random word generators.
They're learning something much more interesting
and fundamental underneath them.
And that's a whole different debate people are having: are these language models actually
learning a model of the world?
Are they learning something more sophisticated underneath that we don't know about yet? We
feel like they're just generating the next word, but they're not just generating the next word; how
they generate the next word is very important. That's a whole different story. But yeah, coming
back to this telemetry: there's some very interesting work on how you can combine telemetry with text, with language models.
And what we did in our case,
let me talk about that,
is what we did is
we took all the telemetry data.
So, you have a database.
For that database,
when a customer asks a question,
what we can do is we have simple tools
that can go and extract the telemetry data
from the database. We don't need
LLMs to do that. In the paper, we have a very complex architecture of the entire
framework and there's one small component inside what we call grounding mechanism.
Inside that grounding mechanism, there is one small component called feature extractors.
For those feature extractors, there's no LLM involved yet. What they do
is, for a given database, they connect to the database itself and extract all the relevant
telemetry metrics. And when I say relevant, they basically extract all the telemetry
metrics. So for example, there could be database parameters, there could be configuration knobs,
SQL statistics, all the active sessions, all the database counters, and everything for the last seven days.
So they extract all this telemetry data.
Now, again, before involving language models,
what we do is we run these statistical algorithms
on top of the metrics.
So what we want is, we went back to the human.
So we looked at how humans look at these numbers,
look at this telemetry data.
These humans, although they don't look at everything,
they look at metrics that behave anomalously.
They look at metrics which behave differently than usual.
So identifying when a metric is anomalous
is not something we want LLM to do.
It can be done very well with simple statistical algorithms
like outlier detections or change point detection
or segmentation algorithms and so on.
So we did exactly that.
So once we extract this telemetry data,
we run some statistical algorithms to identify
which metrics are anomalous,
which are different from the usual behavior,
and how are they different?
Are they spiked up?
Are they spiked down?
Is there a level shift?
Are there point anomalies,
segment anomalies, and so on. So we extract
all these features using statistical algorithms, and then we have a separate module that converts
these features into a one-line summary. And that's where LLMs come into the picture. So we give
the metric name, we give the definition of this metric, and we give the type of anomaly the metric
has, and ask the LLM to generate a one-line summary of this.
We would say something like the metric called,
pick some metric, number of rows.
So a metric called number of rows for the SQL query
has a spike in its last seven days
and with a value of, let's say, 120,
whereas on average, the value seen for this metric
on this database is 50. So that's like a one-line
summary. And we repeat
this for all the metrics, and this gives
us like a small prompt of what
the anomalous metrics are,
how are they anomalous, and why are they anomalous.
And we put this prompt together, and then
we combine it with the rest of the prompt that we generate.
That's where, that's how
we convert the telemetry.
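As a rough illustration of the grounding flow described here — plain statistical anomaly detection first, a one-line natural-language summary second — here is a hedged Python sketch; the z-score rule, function names, and metric names are made up for the example and are not Panda's actual implementation:

```python
import statistics

def detect_spike(values: list[float], threshold: float = 3.0):
    """Flag the latest point if it is a simple z-score outlier versus the history.
    Panda is described as using outlier/change-point/segmentation algorithms; this
    stands in for that step."""
    history, latest = values[:-1], values[-1]
    mean, stdev = statistics.mean(history), statistics.pstdev(history)
    if stdev > 0 and abs(latest - mean) / stdev > threshold:
        return {"type": "spike", "value": latest, "baseline": round(mean, 1)}
    return None

def summarize_anomaly(metric: str, definition: str, anomaly: dict) -> str:
    """Build the prompt fragment; in Panda an LLM turns metric name + definition +
    anomaly type into a fluent one-liner."""
    return (
        f"Metric '{metric}' ({definition}) shows a {anomaly['type']} over the last 7 days: "
        f"current value {anomaly['value']}, typical value around {anomaly['baseline']}."
    )

# Hypothetical telemetry pulled by the feature extractors (no LLM involved here).
telemetry = {"rows_read_per_query": [50, 48, 52, 51, 49, 50, 120]}
definitions = {"rows_read_per_query": "rows scanned per SQL query"}

context_lines = []
for name, series in telemetry.items():
    anomaly = detect_spike(series)
    if anomaly:
        context_lines.append(summarize_anomaly(name, definitions[name], anomaly))

grounding_context = "\n".join(context_lines)  # prepended to the user's question
print(grounding_context)
```

In Panda, the one-line summaries produced this way are combined with the rest of the prompt, which is what grounds the model in what the database is actually experiencing.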
Ah, so yeah, that makes total sense.
It's kind of a conversion where you go ahead and give it to the LLM in a format that it can handle. Awesome stuff.
Cool, so I guess the next step is the verification. When you were talking about that, there's this idea of the provenance of where the information has come from.
Because I can go on ChatGPT and it can give me some garbage citations for something, right? And it looks really realistic, it looks great, but if I go on Google Scholar, the paper doesn't exist.
So how did you counteract that sort of thing? Because I guess the LLMs you used would be subject to the same sort of thing. So yeah, how did you tackle that?
Exactly. That's exactly the problem with LLMs, right? So if you ask the LLM itself to generate
a citation for the answer, the recommendation that it made, it will definitely give you a citation.
But most of the time, the citation would be just a hypothetical citation or something that's garbage.
And in fact, there's a recent study, we can link the paper in the description,
that showed that only about
51.5% of LLM-generated answers truly support their citations.
And this completely undermines their trustworthiness in the real world.
And this is where these things become questionable
of being useful in production or in the real world.
I mean, they all sound very interesting
and fascinating for research or for demos,
but then we can't reliably use them
on a continuous basis in the real world
if things like these happen.
So this was like a challenging component to us as well.
Now, what's the extreme end?
So the best possible thing you can do
is have a human verify everything.
That's like the golden truth.
And it's possible,
but it's like highly unscalable and very costly, right?
So you can't have a human verify
every single generated answer,
because it's both tiring and not cost-effective, and you can't have as many human experts
in databases to verify everything. So how do we do that? So we did something that was in
between. Inside the verification mechanism, we have two components, two subcomponents.
One component is what we call answer verification, and one is what we call source attribution.
So for answer verification, we use the language model itself.
So what we do is we frame this problem as what, in the literature, is called a natural language inference, NLI, task, where we reuse
the pre-trained LLM to act as a verifier and produce a label of accept, reject, or neutral
given a hypothesis.
And when I say hypothesis, hypothesis is the generated answer.
So what we do is we give the generated answer to the LLM, and then we also give a premise.
Now premise is all the relevant troubleshooting documents.
So what we do is we basically ask the LLM,
if this is the answer and this is the context,
what do you think does the answer
come from this context or not?
And give me an answer in terms of yes, no, or maybe,
or accept, reject, or neutral.
So what we expect the LLM to do
is to look at the context
and try to see if there's an interesting overlap between the answer and the context.
Now, this can be done without LLM as well.
You can just do some very simple textual mapping where you can see how many words are common between the answer and the context.
But the problem with that is that the answer that we are giving to the LLM is actually generated by the LLM.
And LLMs are known to paraphrase or rephrase.
So simple textual mapping
might not be the very best solution
because that would always give you
a very small amount of overlap
between the generated answer and the context
because the answer could be rephrased,
could be paraphrased, and so on.
So the goal of this natural language inference problem
is to make the LLM look at the context,
look at the generated answer,
and tell us if the generated answer
is coming from the context or not.
And again, sure, this process is not optimum.
I mean, it's not the best process
because again, in some sense,
you're asking the student
to correct their own answer paper, right?
So you're saying, okay, you wrote the answer.
Now here is the true answer, correct yourself against it.
And you can cheat.
So there is a possibility of cheating, which we're aware of.
And one way of mitigating it is to make it do this multiple times.
So we try to randomize it.
So instead of asking it only once,
we ask it three times
and try to then take an average
of what the response is.
So it may cheat once,
it may cheat twice,
but we're expecting it to not cheat
N number of times.
And that N is a hyperparameter that we can tune.
You can increase that number of repetitions
as many times as possible.
So yeah, this process is not perfect,
but it's something that kind of works in reality as of now.
And once this answer is verified in some sense,
where the LLM feels that the generated answer
is indeed representing the true context,
then we move to source attribution.
And source attribution is somewhat simpler, because for source attribution, we want to cite the lines that are there in the generated answer.
And again, we kind of follow the same process.
So source attribution can be done in two ways.
One is you can go line by line through the generated answer, give that line to an LLM, and ask which exact paragraph in the context this line is taken from.
And it can give the exact paragraph, and it will cite the paragraph number or page number from the documentation. That's one way.
And the second is exact text overlap. So if you see the exact verbatim sentence taken from the doc, you can just pick the doc number and page number and cite it there.
So we use a combination of both, and that gives you the source attribution, and then we move forward.
So yeah,
these are the two components
in verification.
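A sketch of how those two verification sub-components could be wired together. `call_llm` is a placeholder for whatever chat-completion client is actually used, and the prompt template, majority vote, and word-overlap threshold are illustrative assumptions rather than Panda's real code:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.0) -> str:
    # Placeholder so the sketch runs end to end; a real verifier would query an LLM here.
    return "ACCEPT"

NLI_TEMPLATE = (
    "Premise (troubleshooting docs):\n{premise}\n\n"
    "Hypothesis (generated answer):\n{hypothesis}\n\n"
    "Does the hypothesis follow from the premise? Reply with exactly one of: "
    "ACCEPT, REJECT, NEUTRAL."
)

def verify_answer(answer: str, docs: str, n_trials: int = 3) -> str:
    """Ask the verifier N independent times (fresh context each call) and take a majority
    vote, so a single 'cheating' run does not decide the outcome."""
    votes = Counter(
        call_llm(NLI_TEMPLATE.format(premise=docs, hypothesis=answer), temperature=0.0)
        for _ in range(n_trials)
    )
    return votes.most_common(1)[0][0]

def attribute_sources(answer: str, doc_paragraphs: list[str]) -> dict[str, list[int]]:
    """Cheap exact-overlap attribution: for each line of the answer, cite paragraphs that
    share many words with it. Panda pairs this with asking the LLM which paragraph a
    given line came from."""
    citations = {}
    for line in filter(None, map(str.strip, answer.splitlines())):
        words = set(line.lower().split())
        citations[line] = [
            i for i, para in enumerate(doc_paragraphs)
            if len(words & set(para.lower().split())) >= max(3, len(words) // 2)
        ]
    return citations

docs = "Increase shared_buffers when cache hit ratio is low. Vacuum bloated tables regularly."
answer = "Increase shared_buffers because the cache hit ratio is low."
print(verify_answer(answer, docs), attribute_sources(answer, [docs]))
```

Running the NLI check several times in a fresh context and taking the majority is what keeps one "cheating" run from deciding the outcome, which is exactly the concern discussed next.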
Awesome. So yeah, that whole thing about not trusting it every time is interesting.
And I mean, we'll probably talk about the best N to choose when we talk about the implementation and your evaluation potentially.
But is there not a worry that the more you do it, it gets better at cheating and it cheats more the more you run it? Is that a concern, or is each run sort of independent of the previous run?
Yeah, so we make sure that we don't run this in the same context, in the sense that each time we ask the same question, it's not in the same chat or in the same sequence.
So we wipe its memory essentially every time and say, do it again.
Okay, so it doesn't think, ah, I cheated last time, I'm going to cheat again.
Yeah, right.
Okay, cool.
You can even play with the subtle things. You can even play with the different parameters
in the language model that you can tune.
So for example, you can vary the temperature of the model, the knob that controls whether you want the model to be stricter or more creative.
You can tune these models to not be very creative. In more technical terms, you want the model to generate samples that are very highly likely under the distribution, and not out of distribution.
So if the model is more creative, it can generate things that it is slightly less trained on, but if the model is less creative, it will generate something that is exactly what it is trained on.
So for answer verification, we want the model to be extremely uncreative. You don't want it to think or wander in areas where it's not trained to wander, and that prevents it from cheating.
If you think of it, cheating is a creative mechanism; they're trying to be very creative there, and that's why they're cheating.
So you want to control the level of creativity in these models, and if you can control that, we can somehow push them to not cheat.
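As a small illustration of that point, most hosted LLM APIs expose a sampling-temperature knob (the exact parameter names vary by provider); the values below are purely indicative, not Panda's settings:

```python
# Illustrative only: pin the temperature at (or near) zero for the verification pass
# so the model sticks to the most likely, in-distribution continuation, and allow a
# bit more freedom when drafting the recommendation itself.
verification_params = {
    "temperature": 0.0,  # "least creative": deterministic, sticks to what it was trained on
    "top_p": 1.0,
}
generation_params = {
    "temperature": 0.7,  # some creativity when generating the recommendation text
    "top_p": 0.95,
}
```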
It's funny, this idea of cheating being a very creative endeavour. I mean, you hear these stories, don't you, of people going to extreme lengths to get exam answers and stuff, and you think you may as well just revise for it, because you've put a lot more work into being a good cheat, right?
But anyway, cool. So yeah, affordance, let's talk about affordance. So how do you go about taking a recommendation and turning it into, this is what the consequences would be if you did this in practice?
Because obviously, how do you put a cost on it in terms of the financial impact or the performance impact? This is a big space, and it's often quite hard to know what the consequences of an action will be.
So yeah, how did you go about breaking that down and estimating the impact of something?
Yeah, correct. And this, I would
say, is the most controversial, the most raw component in the system.
Because I would say we've not even thought this component through really well.
We're still at a very early stage. We're still thinking about what this affordance mechanism could encompass.
So like, for example, we don't even know what an affordance means here.
For example,
if the model generates something like
add an index on this table,
that's a recommendation.
How would you go about
estimating the impact
of adding an index
to, let's say, some table,
on query latency?
It's very difficult for,
like forget about LLM doing this,
it's very difficult to even build a statistical model,
a database model to do that.
What's the impact on P50 or P90 of your query latency
if you add an index on this table?
It's very difficult to build that statistical database model.
So it's not clear at all how an LLM can do this.
So for now, the affordance mechanism that we have is extremely heavily guarded
with a bunch of guardrails.
So what we do, we start off with something in the answer generation.
So when the model generates an answer, we force it to be actionable, to generate an
answer that is actionable.
So now what does actionable mean?
By actionable, we mean we want it to generate something
with respect to a parameter that can be tuned.
So it can't say something like tune your number of CPUs.
That's not actionable.
Sure, it is actionable in the sense that you can tune it,
but what to exactly?
So we wanted to generate things like,
since we are telling the system that my current number of CPUs is 16,
I don't want to tell me increase the CPU.
I want you to tell me increase to what?
So increase from 16 to 32,
or increase from 16 to 64.
So those are the kind of answers
that we force the model to generate.
You always want,
you always force the model
to generate something very specific
that can then be converted into a statistical equation or mathematical formula, and then we
can estimate the impact. So now, when I say impact estimation, as of now we always want
to drive the impact estimate towards a specific metric. And right now there are a couple
of metrics that we care about. We care about the average query latency,
and we care about the average number of active sessions
in the system at any point in time.
So we want to estimate the impact of these actions
in terms of these two metrics.
Now, as you say, how do you settle on those two metrics?
What was the decision there of how to actually pick those two out?
Because there's a lot of things, right?
Correct, correct, correct.
Yes, again, the choice is,
I would say still a design choice here.
We designed the system
based on how we have seen
the RDS engineers
or the customers
think about their system.
And usually what we have seen
is people care about the metrics,
like what's the average query latency
and people really care about
what sessions are active and what they are waiting for.
So how many sessions are waiting,
and what exactly are they waiting for?
So that metric seems to be pretty simple,
but pretty impactful when people monitor their performance.
So we picked these two metrics,
but yeah, these can be any metrics.
Now, the impact estimation model
is a simple statistical model.
But think of it as a function which takes the parameters as input and outputs the value of your query latency.
So these are like simple statistical models that are trained on the field data.
Now, there are two ways you can train this model.
These models can be customer-specific, or they can be fleet-level models. And when I say model, you can think
of it as a simple neural model, it can be a regression model, any sort of model, random
forest, decision tree, whatever. So when it's customer-specific, this regression model
is trained only on the data of the customer on which we're running Panda.
So if Panda is being run on your database, it's only trained on your database.
And the goal for that is to make sure that the regression model
is trained on the metrics that's coming out of just your database.
So how is your query latency changing with respect to CPUs?
You just want to model that.
That is customer-specific models.
We can also have fleet-level models where we can,
instead of estimating how query latency changes
with respect to CPU on your own database,
we estimate it across the fleet, across all the customers.
So that model will give you an average estimate.
So like, on average, how is query latency impacted if you increase CPU? Or on average, how is query
latency impacted if you decrease CPU, things like that. So that's what we call fleet
models and customer-specific models. And we have simple models on both these fronts that we try to
plug into Panda and estimate the impact. But again, these are at a very early stage right now, where we don't have impactful models there yet, so we
are still trying to play around with simple regression models and trying to estimate the impact.
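To make the impact-estimation idea concrete, here is a toy sketch of what a customer-specific regression model could look like; the features, training data, and numbers are invented for illustration and are not Panda's actual models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical observations for one database: [num_cpus, buffer_pool_gb] -> avg query latency (ms).
# A fleet-level model would be trained the same way on data pooled across customers.
X = np.array([[8, 16], [16, 16], [16, 32], [32, 32], [32, 64], [64, 64]])
y = np.array([180.0, 120.0, 100.0, 70.0, 60.0, 45.0])

model = LinearRegression().fit(X, y)

def estimate_impact(current: list[float], proposed: list[float]) -> str:
    """Predict the target metric before and after a recommended change and report the delta."""
    before, after = model.predict([current, proposed])
    return (f"Estimated avg query latency: {before:.0f} ms -> {after:.0f} ms "
            f"({after - before:+.0f} ms) if you apply this change.")

# e.g. the recommendation "increase CPUs from 16 to 32" with 32 GB of buffer pool:
print(estimate_impact([16, 32], [32, 32]))
```

The point of the design is just that a specific, parameterised recommendation ("increase from 16 to 32") can be pushed through a trained function to get a concrete estimate on a metric the user cares about, like average query latency or active sessions.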
Nice, nice. Yeah, it's obviously still an early project in terms of its life.
Yeah, exactly.
It's not going to be the finished, polished article yet. No, that's cool. So I guess
there's one more component then, and this is the idea of feedback and getting the
model to improve over time. So how does that component look in the current
iteration of Panda?
Sure. So in fact, this is the very first mechanism that the input sees. So when you query Panda with a question, the question is first sent to the feedback mechanism. And you can think of the feedback mechanism as a simple database that stores questions, answers, and feedback over time. So whenever a final answer is sent back to the customer,
customer is asked to give a feedback
in form of plus one or minus one,
like thumbs up or thumbs down.
And that feedback is again sent and saved in this database.
So if let's say you've been using Panda
for over a week now,
and let's say you have seen about 100 questions,
every one of those questions is stored in this database. And you can think of this database as simple as a three-column, row-and-column database. You have three columns: the first column is your question, the second
column is the answer that Panda generated, and the third column is the feedback, which comes in terms of thumbs up or thumbs down.
Now, if you ask a new question to Panda, Panda will take that question and go back to this database and look for whether this exact question or a similar question has been asked before.
And if it does find a match in the database, then it will look at when that question was asked, because the temporal aspect is important.
Like if I ask something like one month ago, it's probably not relevant anymore.
So we keep that window to one day.
So if we find an exact match or a similar match
between your new question and an existing question
in that one-day window,
you'll probably retrieve the answer
and send it back to the customer
without going through the entire generation process,
which might be a high-latency process.
So that's one use case of feedback, where we can generate answers faster when questions are repetitive.
They can be similar to what has been asked before in the same one-day window,
and if they are, we'll surface the same answer.
And then we will tell customer that this is an answer from the question that you asked
in the past.
If this is not relevant, you can re-ask the question and mention that in the prompt, that
you gave me an answer, I didn't like it,
and generate a new one for me.
And that would bypass the feedback mechanism's answer altogether.
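A rough sketch of what that feedback store could look like; the three columns match the description above, while the similarity measure, threshold, and helper names are assumptions for illustration rather than Panda's real implementation:

```python
import time
from difflib import SequenceMatcher

ONE_DAY = 24 * 60 * 60
feedback_db: list[dict] = []   # each row: question, answer, feedback (+1 / -1 / None), timestamp

def record(question: str, answer: str, feedback: int | None = None) -> None:
    """Append a (question, answer, feedback) row, newest last."""
    feedback_db.append({"question": question, "answer": answer,
                        "feedback": feedback, "ts": time.time()})

def lookup(question: str, min_similarity: float = 0.85) -> str | None:
    """Return a cached answer if a similar, non-downvoted question was asked in the last day."""
    now = time.time()
    for row in reversed(feedback_db):            # most recent first
        if now - row["ts"] > ONE_DAY:
            break                                # anything older is considered stale
        similarity = SequenceMatcher(None, question.lower(), row["question"].lower()).ratio()
        if similarity >= min_similarity and row["feedback"] != -1:
            return row["answer"]                 # skip the full (slower) generation pipeline
    return None

record("Why is my query latency spiking?", "Likely a missing index on orders.customer_id.", +1)
print(lookup("Why is query latency spiking?"))  # reuses the cached answer within the window
```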
Ah, fascinating.
It's kind of like a mini Stack Overflow, essentially,
built into the whole thing as well.
So cool, yeah.
So we've covered off all the components there.
And I mean, there are a lot of moving parts to this system. So can
you tell us about the implementation briefly, maybe, and how do all these things fit together? What
does the actual code look like? And yeah, what's the user interface like as well? Like,
is it simple, is it just a bar that I enter text in? Yeah, tell us a little bit more about the
implementation.
Sure, sure, yeah. So the way we coded Panda up, it was very experimental coding, in the sense that
it started off as a very small project, a very experimental project.
So the code base that we have is not a very gigantic code base.
Not everything is well thought out and written in a way that's all fail-safe.
It's still very experimental, very raw, very early stage right now.
So that's that.
But the interface of the user is kept to be very simple.
So interface is as simple as a chat interface.
So everything is done in the backend.
So once you've triggered Panda, it connects to your database.
So again, in order to connect to the database,
you need to give it the credentials of your database.
That's a separate story.
But once Panda is running, the interface is as simple as a chat interface. There's a question box, and there's your answer from the bot.
So you keep asking questions, and it answers. Each and every answer has its own citations.
And if the answer has an impact estimation, we'll have it at the right point. Because not every answer's impact can be estimated, so the impact estimation won't be there for every answer. If we can find a relevant impact, we add it to the answer at the right point.
The interface is very simple, like a chatbot.
Like a chatbot, yes, an interface we're all kind of familiar with. Yeah, cool. So I mean,
let's talk some results then. So you've evaluated Panda, so we can talk about
how you actually went about evaluating it, and then yeah, let's talk some
results as well. So let's start off with how you went about evaluating it first, then we can talk about the results.
Sure. So I think we
discussed this at some point in our chat today, where the golden rule, the highest standard
we can hold Panda to, is human evaluation, right? So again, just one thing
we need to clarify first: what exactly are we evaluating?
What's the standard we are holding Panda to?
Are we saying Panda is something that can generate answers
better than human?
No.
What are we saying is that Panda can generate answers
better than an existing language model.
That's the bar we want to test Panda on.
And the reason we want to clarify that is because
we don't want to compare the Panda answers to a human answer.
We can't compare Panda to an existing database engineer
and say, okay, this guy can beat this guy.
No.
What we want to do is we want to compare Panda
with an existing language model, which in this case was GPT-4.
And we want to say that since Panda
has all these four components,
affordance, feedback, grounding, and verification,
with all these four components,
Panda can perform better as compared to a vanilla language model.
So that's a goal.
So how we started to set up the experiment is
since we don't have any ground truth label
for any recommendation
we do use humans to evaluate the answers. So what we do is we pick three different humans
with three different levels of database knowledge: a beginner, an intermediate, and an advanced user. And
we show them the answers. We picked 50 prompts, 25 from a Postgres engine, 25 from a MySQL engine,
and all these prompts
were engineered in a way that
we have been seeing
more commonly in how customers
ask questions about their databases.
So customers' questions are usually
not very detailed.
They're always very generic. I mean,
in some sense, that's fair, because
not all users have a good understanding of what's going on in the database.
So they usually ask questions which could be very generic,
very high level, and they expect the system
to infer all the relevant stuff on its own
and generate an answer.
So we intentionally kept the questions
to be very high level and very short
to see how LLM responds to those questions.
So we came up with 50 prompts, 25
for Postgres, 25 for MySQL, and then we generated the answers using Panda and using GPT-4, and we showed
these two answers to these three evaluators and asked them to rank, or score, the two responses on three aspects. And those three aspects
were: first was trust,
second was
understanding, and third was usefulness.
So trust as in, if you read
these two answers, which one do you
generally tend to trust more?
And second is, if you read these two
answers, which one do you understand better?
And third was, if you read these two
answers, which one do you feel is more useful to you?
And they were asked to give a thumbs up or thumbs down
on each of these three aspects.
We average their scores.
And then that's the response table
that we generate in this paper
across these 50 evaluations.
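For what it's worth, the aggregation itself is simple; a toy version with made-up votes (the real numbers are in the CIDR'24 paper) looks like this:

```python
from statistics import mean

# Each evaluator gives a thumbs up (1) or down (0) per prompt and per dimension,
# and the per-dimension scores are averaged. These votes are invented for illustration.
dimensions = ["trust", "understanding", "usefulness"]
votes = {
    "panda": {"trust": [1, 1, 1, 0], "understanding": [1, 1, 0, 1], "usefulness": [1, 1, 1, 1]},
    "gpt-4": {"trust": [0, 0, 1, 0], "understanding": [1, 0, 1, 0], "usefulness": [0, 0, 1, 0]},
}

for system, per_dim in votes.items():
    summary = ", ".join(f"{d}: {mean(per_dim[d]):.0%}" for d in dimensions)
    print(f"{system}: {summary}")
```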
Awesome stuff.
So yeah, I guess, yeah.
What's the big reveal?
Tell us, how did Panda do?
So let's go dimension by dimension. On the dimension of trust,
experts found Panda, more than 90% of the time, to be a better candidate than GPT. And the reason
for that was, I think, is the citation, which we also mentioned in the paper,
that with every response, Panda generates citations, which is usually an indication of
trust where the customer can now go, or the user can now go and look at a citation,
go to that very link, very documentation, and read exactly in detail if they have any questions
around the recommendation. So that kind of generates more trust.
So for more than 90% of times,
Panda was rated higher on the trust.
The interesting thing was understanding.
So when we asked the scorers to
score, based on understanding,
which of the answers they understood more,
the beginner scorer was around 40-60%.
So they rated Panda 60% but GPT 40%.
So there's not that big of a gap.
And the reason for that was that GPT answers, if you look at them,
they're highly verbose.
It will probably generate like 100, 300, 400 word passage for you.
And for expert or intermediate people who know exactly what they're looking for,
this kind of highly verbose answer
is a waste of time for them.
They want something very specific,
very actionable.
But when you look at it from a beginner's lens,
reading their entire paragraph
does make sense for them
because now they kind of understand
the problem much more better
and they can ask follow-up questions.
So that's where we found
that beginner people were in some cases liking the responses from GPT better, because
Panda was being a bit too specific in that sense. Okay, it said tune parameter X. Now what
if they don't know what parameter X is, or they've never heard of
parameter X? So they need some more context, some high-level stuff, and then probably narrowing down to the parameter,
which GPT was able to give.
So GPT-4 never gave the parameter name or the value,
but gave a very good, verbose explanation
of what are the possible problems
that customer could face with respect to those details,
and they found it to be more understandable.
So just real quick, do you think that could
be something you could incorporate as part of the model? The person asking the
question could say, hey, I'm a beginner, or I'm an expert, blah blah blah, and that
would influence the verbosity of the output.
Yeah, exactly, exactly. That's exactly what we thought about in the experiment. So we want to
incorporate that in the system, where
when you instantiate Panda on your database, you also want to assign a role for the user who is
using Panda. So you want to assign the role in the sense like, who are you and what do we expect from
Panda? And Panda would take that into account in generating answers. If your role is a beginner,
it would be more verbose and it would have a lot of introductory
stuff before coming down to the actual actionable details.
So yeah, that was very useful in the experiment as well.
And I think the third dimension was usefulness.
On usefulness, again, we found the intermediate and advanced folks to have at least more than
90% of approval rate for Panda because they found it to be more actionable, very specific
with the exact parameter names, exact
parameter values, and so on, whereas GPT was again highly verbose or very generic.
So yeah, the usefulness dimension was similar to the trust dimension, but the interesting one was
understanding, which had something for us to learn from.
Yeah, for sure. I mean, it feels like there's a clear indication here of the usefulness and the efficacy of Panda going forward. So I guess, where do you go next? What's next on the research end? Obviously, there are things to make it production grade and a lot of polishing to get everything stitched together and working well. But yeah, what are your next steps?
So I think Panda is very far away from production, I would say. It's still taking baby steps. It's still at the place where we are still trying to
falsify hypotheses around language models, so it's still very experimental right now, I would say.
And there are a lot of interesting areas that we can explore, or that we can take Panda into. So for
example, the first thing that we found in our analysis was,
example the first thing that we found in our analysis was Panda is,
so you can chat with Panda,
but the process of debugging is very incremented.
So debugging is never one shot.
It's multi-shot, right?
And the right answer of why a problem has occurred,
the root cause could be multiple. There's not always only one root cause why a problem has occurred, the root cause could be multiple.
There's not always only one root cause for a problem.
So sure, Panda generates something that may be right,
but it's not the only right answer.
What if something else happens?
So let's say you're seeing an increase in your query latency
and Panda said, okay, you have an increase in query latency.
It's because you didn't have an index on this column in the table.
Sure, the customer didn't have an index on that column in the statement. Sure, the customer didn't have an index
on that column in the statement,
but is that the root cause?
Or what if they're just increasing workload at that time?
Or what if it's something related to locking?
So there could be more than one answer to a question.
And we want Panda to be more human-like in that sense.
So instead of jumping to conclusions,
we want Panda to be iterative
and think in all different directions.
So in some sense,
we want the Panda to generate a hypothesis tree
where each branch is one root cause
and then go on and evaluate each branch
and then generate a recommendation
which is not just one single recommendation,
forcing you to take something or believe it, but it should give you more like an analysis of all things possible and let the
customer pick which one they feel is right.
Yeah, that's fascinating, because in practice these
problems can have layers upon layers. And did you encounter cases
where you solve one problem but inadvertently cause another problem
by solving that one problem, if that makes sense?
Like, I don't know, you add the index
and then that caused something else to blow up.
Like, how would you...
Exactly.
Yeah, exactly.
That's exactly the problem that Panda would fail at, right?
So the answer that Panda generates is very confident,
but it has no clue what that could trigger later on
So sure, we have the affordance mechanism, the place where you can estimate the impact
of this. That's exactly the goal of that component: if you recommend
something, we want to know what the impact is. Is it going to trigger something else,
or is it going to increase something else that people don't care about?
So things like that.
At this point, Panda is not
able to solve these kind of problems
where down the line things
could break because of the recommendation that it made
today and we want to fix that
before even thinking of
production.
Before unleashing it in the wild.
Cool. Awesome. Still though mean this feels like it can have a really big impact kind of going forward so yeah i guess kind
of re-elaborate on that a little bit like what impact do you think panda could have in the future
Yeah, sure. The place where I feel Panda would have the most impact is reducing the amount of time
database engineers or DevOps folks
spend on debugging databases.
So like, for example,
from the scenarios that we have seen
in the real world
with RDS customers,
we feel like we've seen the customer
spend about 10, 15, 20 minutes,
even like several hours debugging
one single problem.
And the goal is that if, let's say, a customer on
average spends like 45 minutes debugging a problem,
can the Panda bring it down to
let's say 5 minutes or 10 minutes?
And saving that 15, 20 minutes per half
an hour worth of effort on one single problem
is a significant amount because this multiplies
really fast. So if a customer
spends, let's say, debugging hundreds of
problems a week, you could spend
multiple hours worth of their effort on these problems. And the interesting part there is
we don't want Panda, Panda's goal is not to replace the database engineer. It's to assist them. And that distinction is very important, because once you make that distinction, the bar at which you qualify Panda to be useful significantly lowers. So what I mean is, you don't want Panda to be absolutely accurate or 100% perfect all the time. You want it to be accurate most of the time, because eventually the final action that is supposed to be made is made by the human. You want Panda to be accurate most of the time so that it can help the human narrow down the space and find the right fix immediately. Whereas if we were to say that Panda is to replace the database engineer, then the bar that you will hold it to is extremely
high, and you might never reach that in a few years or whatever. And that would kind of defeat the whole point, because then there are certain things that you can never fix, or at least not fix in the near future, and this kind of work will never be useful or made available to the public.
But I think the fact that we are thinking of it more as an assistant rather than a replacement, that lowers the bar and allows for more creative and aggressive things to be done in this space.
Yeah, it allows it to become practical, right? It allows you to keep the human in the loop, allows it to augment the human, make the human more efficient. And then obviously, once you're using it in practice, you'll start getting insights, you'll learn things about it that maybe you wouldn't have thought of, rather than waiting for this sort of utopian future where we have this perfect sort of human equivalent, right? So yeah, I definitely agree with you on that one. Cool. Yes, I mean, I don't know how long you've been working on this project. You said it's pretty early days still?
It's very early, yeah, it's super early. I started working on this probably a year ago.
Okay, cool, so about a year. Cool. So across that year, what's the most interesting thing you've learned while working on Panda? What's been the biggest surprise?
I think the biggest surprise was how much improvement you can make in... Okay,
before going there, I think one thing that I missed out in our conversation is what the actual LLM Panda is using, because it is using an LLM, right?
Yeah.
And we never talked about what that LLM is.
And I've mentioned this in the paper,
but we use GPT-3.5,
which is an inferior model to GPT-4.
So the goal is to show that
Panda is not a new language model, right?
So the goal of Panda is not to come up with
yet another highly complicated, billion-parameter language model.
The goal of Panda is that if you pick any language model and plug Panda to it,
can you improve from the vanilla model?
Even if I pick, let's say, Claude V2, or if I pick a Llama model and plug Panda into it, can Panda plus Llama beat plain Llama? That's the goal, and that's what we're striving for.
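As a rough way to picture that model-agnostic framing, here is a hedged sketch where the base model is just a prompt-to-text callable, so GPT-3.5, Claude, or Llama could be swapped in behind the same wrapper. The class and function names are invented for illustration and are not the real Panda interface.

```python
# Hypothetical sketch of a model-agnostic wrapper: the base LLM is just a
# callable from prompt text to answer text, so different models can be
# swapped in without changing the surrounding framework.
from typing import Callable

LLM = Callable[[str], str]  # any prompt -> completion function

class DebuggingAssistant:
    def __init__(self, base_model: LLM, context_builder: Callable[[str], str]):
        self.base_model = base_model
        self.context_builder = context_builder  # e.g. turns telemetry into text

    def ask(self, question: str) -> str:
        # Ground the question with extra context before calling the base model,
        # so the comparison is "base model" vs "base model + wrapper".
        grounded_prompt = f"{self.context_builder(question)}\n\nQuestion: {question}"
        return self.base_model(grounded_prompt)

# Usage with a stand-in model (a real deployment would call an actual LLM API):
def toy_model(prompt: str) -> str:
    return f"[model answer to a prompt of {len(prompt)} characters]"

assistant = DebuggingAssistant(toy_model, lambda q: "Relevant telemetry: QPS up 40%")
print(assistant.ask("Why did my query latency increase?"))
```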
And that was really eye-opening to me, because when I started this piece I felt like GPT-4 or GPT-3.5 was already surpassing almost every expectation. But when we started to look, the answers seemed factually correct, but they were not very useful. And that was eye-opening to me, because every time I asked something related to databases, it was able to give me a thousand-word answer; all of it made sense, but none of it was useful. So it was kind of an aha moment where you realize, okay, this is something that can be exploited. Because people were going all bonkers with these LLMs coming around, like, they're super useful, highly productive, and so on. But we felt, once we dove deeper into it, that there's this vast amount of improvement that can be made.
And when we started working on it, just by adding telemetry information, just by adding simple statistical models around it, the kind of improvement in the responses that were made was very, very interesting to us. That's something we didn't expect, that it could be taken that far. But that was something that was interesting.
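To give a feel for what "telemetry plus simple statistical models" could mean in practice, here is a small, purely illustrative sketch that turns a raw metric series into a sentence an LLM prompt could include; the thresholds and phrasing are assumptions, not what Panda actually does.

```python
# Illustrative only: summarise a metric series with basic statistics and a
# crude z-score check, producing text an LLM prompt could include. The
# thresholds and wording are made up; Panda's real grounding is more involved.
import statistics

def telemetry_to_text(metric_name: str, values: list[float]) -> str:
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values) or 1e-9
    latest = values[-1]
    z = (latest - mean) / stdev
    direction = "spiked above" if z > 2 else "dropped below" if z < -2 else "stayed near"
    return (f"{metric_name}: mean={mean:.1f}, latest={latest:.1f}, "
            f"latest value {direction} its recent average (z={z:.1f}).")

print(telemetry_to_text("cpu_utilization_pct", [41, 43, 40, 44, 42, 88]))
```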
Yeah, that's fascinating.
It kind of makes you wonder how many other sorts of applications there can be that take a similar approach, right? Of taking an LLM, taking GPT,
and then kind of putting this sort of wrapper around it
and kind of building on top of it.
Yeah, I'm sure we're going to see loads
of really fascinating sort of products
and applications built on top of it
over the coming years for sure.
That's cool.
And by the way,
it's already started.
Yeah.
So yeah,
just to piggyback on a thought,
there's already a bunch of research on using LLMs as tool managers.
So instead of us making LLMs do everything,
make the LLM act as a manager where it decides,
it doesn't give you an answer,
but it decides what is the right tool to pick to give you the answer.
So in this case, think of it like when we convert telemetry to text,
LLM is not converting the telemetry to text,
but LLM is deciding what is the right algorithm to convert telemetry.
So do I run anomaly detection, do I run outlier detection, do I run change point detection? What is the right algorithm? Because what you care about could be different. Let's say I only care about decreases or declines in the metrics, I don't care about any increase. So that information is relevant, and if somehow the LLM can figure that out, it can query the right tool. So there's a bunch of interesting papers. One interesting paper I read recently is called Toolformer, where language models can teach themselves to use tools. That's also a very interesting area of research going on, where using LLMs as tool managers, or using LLMs as reasoners which can pick the right tool, is very interesting.
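As a hedged sketch of that tool-manager pattern, the routing step can be as simple as asking the model to name one of a few local analysis functions and then running the chosen one in ordinary code. The tool names, the selection prompt, and the stand-in model below are assumptions for illustration, not something from the paper.

```python
# Hypothetical sketch of an LLM-as-tool-manager loop: the model only picks
# which analysis to run on the telemetry; plain Python code actually runs it.
from typing import Callable

def detect_anomalies(series: list[float]) -> str:
    mean = sum(series) / len(series)
    spread = (sum((v - mean) ** 2 for v in series) / len(series)) ** 0.5 or 1e-9
    outliers = [v for v in series if abs(v - mean) > 2 * spread]
    return f"{len(outliers)} point(s) more than 2 standard deviations from the mean"

def detect_change_point(series: list[float]) -> str:
    half = len(series) // 2
    before, after = series[:half], series[half:]
    return f"mean shifted from {sum(before)/len(before):.1f} to {sum(after)/len(after):.1f}"

TOOLS: dict[str, Callable[[list[float]], str]] = {
    "anomaly_detection": detect_anomalies,
    "change_point_detection": detect_change_point,
}

def route(question: str, ask_llm: Callable[[str], str]) -> str:
    """Ask the LLM to name a tool; fall back to anomaly detection if it misnames one."""
    choice = ask_llm(
        f"Pick exactly one tool from {list(TOOLS)} to answer: {question}"
    ).strip()
    return choice if choice in TOOLS else "anomaly_detection"

# Stand-in LLM that always chooses change point detection, for demonstration.
fake_llm = lambda prompt: "change_point_detection"
series = [10.0, 11.0, 9.5, 10.2, 18.0, 19.5, 18.7, 20.1]
tool = route("Did my latency decline or shift recently?", fake_llm)
print(tool, "->", TOOLS[tool](series))
```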
Because if you think of it,
LLMs are trained on this vast knowledge base, which makes them very good, makes them generate good answers on average, because they're trained on large amounts of data. They're not trained on specific stuff, so they're not experts, but they're good on average. And people who are good on average, who are trained on these vast amounts of data, are good leaders. Or not leaders, more in the sense of good managers.
Yeah.
If you think of it,
LLM has seen a lot of things
which makes it a good manager.
So instead of asking it to solve every single fine detail in the problem, you can ask it: since you've seen so much, tell me who is the right person to fix it. And that's what the LLM is better at. Instead of asking it what the right answer is, ask it who is the right person to solve it, and the LLM will give you a better answer there.
Wow, all the managers and sort of exec-level people in the world better start worrying then, if it can become an effective manager, right? But that's true, that point of being sort of average or above average across a broad spectrum of things, they tend to be good managers, good leaders. So that's a really interesting point. Cool. Yeah, so tell me about your creative process, Vikramank. How do you go about thinking about ideas, generating ideas, and then selecting which ones to work on for a long period of time? Yeah, let me inside your brain.
Yeah, it's an interesting one.
I think what has worked for me,
I'm sure there are different strategies people use to test their creative self and to come up with these interesting ideas. I think the one that has worked for me is to try out things and fail fast
instead of thinking of the best idea.
So the whole idea of Panda came up instantly
when I started testing.
When I started trying out ChatGPT, my guess was that it would probably give a much better answer than what it gives, because it's trained on so much data.
So it should be better than that.
But it gave some answers.
The first thing that I thought of was like, why is it not able to give this answer?
And it's probably because it hasn't seen telemetry data.
That's what's missing.
So then I quickly built a demo of what if I'm able to supply telemetry data in some
language form, can we improve it? So I felt like an iterative demoing process kind of helped me a lot. If I had sat down and tried to design the entire architecture one year ago, I wouldn't have gotten anywhere, because I couldn't see a lot of the problems that I saw after demoing stuff. So I feel like a creative, fail-fast approach
kind of helped me reach a complete architecture, at least.
So in terms of the demoing, this was a case of, okay,
was it demoing to sort of the team and sort of like,
oh, here's this, and then you kind of thought afterwards,
this didn't work, actually, or got some feedback,
and thought, oh, that could have been different.
And that's the way you approach it.
That's fascinating.
I've not had that answer before, that's really cool. Everyone answers this question differently, it's brilliant to see how we're all so different. But, you know, that's fascinating. Cool, I think I might try that, see how it goes for me.
Awesome stuff. Of not knowing what I don't know, is that the right phrase?
Yeah, yeah, yeah, exactly.
Exactly. And it's very difficult to sit at t0 and predict t-infinity. So the faster you move towards t-infinity, the more you find new ways of exploring. It's like these four components that we talk about in Panda: it's not that we started thinking of it that way. It's just that, when we built the entire framework component by component, based on the feedback that we got, eventually we realized, okay, this is what one component is, this is what the other component is. It's all working backwards instead of working forward.
Yeah, awesome stuff. Cool. Anyway, it's time for the last word now. So what's the one takeaway you absolutely want the listener to get from this podcast today?
I think I would say the last point that we discussed is probably an interesting one that people can take away from this. Because
again,
if you look at the work, I'm
neither an expert in databases nor
an expert in language models.
And we still came up with
something interesting
in this combined space because of just quickly experimenting and failing on these crazy ideas.
So I think that's one takeaway we can take.
You don't have to be an expert in either of the domains to come up with something useful.
Because what's useful and what's interesting are two very different things,
because something that can be absolutely uninteresting
could be very useful
and something that is super interesting
can be extremely useless.
So I would say, if you want to build something, aim for useful, and eventually you'll realize that what you end up building will become interesting at some point, because interesting is kind of a state that evolves over time.
If you have a useful problem, start building it.
Don't worry about how complex this model is going to be, or how sophisticated a model you can train. Start with something very simple and try to solve it in that space, like Panda does, and eventually it will become interesting. Because I'm sure you can never think of all the problems at the very start.
You'll always encounter problems
that will make things more interesting.
I think that's a brilliant line
to finish it on.
So yeah, thank you so much, Vikram.
It's been an absolute pleasure
to talk to you today.
And if the listener wants to know more about Vikramank's work, we'll put links to everything in the show notes. And yeah, thanks again, Vikramank, it was a fantastic episode, and we'll see you all next time for some more awesome computer science research.
Thank you.