Disseminate: The Computer Science Research Podcast - Suyash Gupta | Chemistry behind Agreement | #24
Episode Date: February 27, 2023
Summary: Agreement protocols have been extensively used by distributed data management systems to provide robustness and high availability. The broad spectrum of design dimensions, applications, and fault models has resulted in different flavours of agreement protocols. This has made it hard to argue their correctness and has unintentionally created a disparity in understanding their design. In this episode, Suyash Gupta tells us about a unified framework that simplifies expressing different agreement protocols. Listen to find out more!
Links: Paper | Website | Twitter
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. I'm your host, Jack Wardby.
Today we have another installment of our CIDR series and I'm delighted to say I'm joined by Suyash Gupta, who will be talking about his paper, Chemistry Behind Agreement.
Suyash is a postdoc at UC Berkeley and he's currently working on designing efficient fault-tolerant systems with a key focus on handling malicious attacks.
Suyash, thanks for joining us on the show.
Yeah, thank you for having me.
I mean, it's great to join here.
This is an exciting platform, and I think it will help the community a lot.
So yeah, I'm doing a postdoc at UC Berkeley, and this is a recent paper which we presented at CIDR, so I'm excited to explain it, and I look forward to any questions.
Awesome. Well, we're happy to have you as well.
So before we get into the paper, can you tell us a little bit more about yourself and how you
became interested in researching databases and distributed systems?
Yes. So I started doing database research at Purdue, where I began my PhD, and then I moved to UC Davis, where I graduated. We were taking a course which looked into designing commit and concurrency control protocols, and that sort of intrigued me: how are these protocols designed? Can we actually do something new? That's where I started to take a deep dive into distributed databases. And interestingly, this work is sort of an extension of those early days, because back then I was looking at how to design the two-phase commit protocol or the three-phase commit protocol, and then I soon moved on to consensus protocols, basically fault-tolerant consensus protocols. So this work sort of takes care of all of those protocols and tries to bring them into a single framework, in some sense.
Amazing. Yeah. I mean, concurrency control, right? That's what got me hooked as well. So
yeah, that's definitely the gateway drug for databases, maybe. I don't know. Yeah.
Cool. So you touched a little bit there on the general idea behind the paper, but can you maybe give us the elevator pitch for this work?
Yes. If I have to put it that way, I'll say that out in the wild there are many types of agreement protocols, which come in several flavors: there are commit protocols, there are protocols which can handle crashes, there are protocols which can handle malicious attacks. But despite this, we believe there's no single framework which can express all of these protocols.
As a result, what we feel is that there is a disparity in understanding and usability of these protocols. And such a disparity kills innovation and adoption. In fact, just at this CIDR, we were listening to some talks and to people discussing among themselves, and a common refrain was that Paxos is hard.
So there is still, people feel, no easy way to understand these protocols.
And that is where our work comes into the picture.
Chemistry Behind Agreement: we tried to design a chemistry-inspired framework, chemical per se. The idea is to design a framework which can express all of these protocols, and the hope is that, using this framework, developers can come up with new protocols and can actually adjust the system based on their requirements.
Yeah, I mean, it's super useful work. I know from personal experience, trying to understand all the different variants of Paxos and the whole spectrum of protocols out there, it's so hard to compare and get your head around each one, sort of teasing out what the primitives are and things.
So, yeah, definitely a really helpful piece of work on that front.
I really like the chemistry angle to it as well. I think that's a really good way of thinking about these things.
So you start off in the paper by defining four atoms. Can you maybe run us through each one, and why you decided on these four being the four atoms?
Yes.
So the idea behind this paper was, as you just said, we're trying to ask: what is chemistry? Chemistry has many pieces: there are atoms, there are elements, and then there are compounds. So the first thing we need to define is the atoms. In chemical terms, an atom is basically the smallest indivisible unit of an element. So similarly, we were trying to think: what are the atoms of different agreement protocols? How do these protocols actually come into being? One thing you realize is that all of these protocols have something which is called a failure model, meaning what type of attacks or faults they can handle. For example, some protocols can simply handle crashes and cannot do anything more, and that is good enough, while some protocols can actually handle Byzantine attacks, or malicious attacks, where the replicas could actually collude, they can lie, or do anything else. So the first thing is that we need to define the failure model. The next atom, according to us, is the quorum.
Basically, if we are designing a protocol, in that protocol there will be multiple parties. What is required from these multiple parties to come together and reach a common decision? That is what the quorum tries to tell you.
For example, in some protocols, the idea is that you need all the parties to come together: in a system with n parties, all n of them should agree. While some protocols will say, hey, I do not need all n of them, I only need a subset. For example, Paxos-style protocols say that if you have n parties, with n equal to 2f plus 1, then I only need f plus 1 of the parties to come together to reach a common decision.
And that is the quorum for them.
Similarly, in the BFT world, the idea has been that a lot of protocols say we need a two-thirds majority, 2f plus 1, to come together and actually reach a common decision.
So that is the quorum for us.
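To make the quorum arithmetic concrete, here is a small sketch in Python. The function names are illustrative shorthand, not anything from the paper; `f` is the number of faulty parties each protocol tolerates.

```python
# Quorum sizes for the three families of protocols discussed above.

def quorum_2pc(n: int) -> int:
    """2PC-style commit: every one of the n parties must vote yes."""
    return n

def quorum_paxos(f: int) -> int:
    """Crash fault tolerance: n = 2f + 1 parties, a majority of f + 1 suffices."""
    return f + 1

def quorum_pbft(f: int) -> int:
    """Byzantine fault tolerance: need a two-thirds majority of 2f + 1."""
    return 2 * f + 1
```

With f = 1, a Paxos-style system of three parties can decide after two agreements, while a PBFT-style system needs three.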
Then, okay, we have defined failure, we have defined quorum, what can we do next? The next thing, we thought, would be topology. Basically, where will these protocols run? How will these protocols run? Generally, in the broad outlook, the protocols can run either in a centralized setup or in a decentralized setup. In a centralized setup, there will be a leader which actually initiates the protocol, and everyone else follows whatever the leader is saying. In a decentralized setup, everyone can work together: they can send messages to each other, they can communicate with each other, and they do not need to wait for the leader to initiate things. Alternately, there are also some protocols which follow a ring design, where the idea is that I will just communicate with two of my neighbors: we assume a ring-like structure, everyone is sitting in a ring, and they are communicating. So these are some of the topologies we see as common.
And finally, the fourth atom we thought of is data distribution: how is the data distributed among these parties? Because we have these parties, and what are these parties actually holding on to? What are they actually executing? One way could be that all the parties are holding exactly the same data, and that is the common replication model. Another scenario could be where every party only has a distinct set of data items, and traditionally this is called sharding. Similarly, there are some models which use a flavor of both, where you have both sharding and replication happening at the same point in time. So this is how we generally define the
atoms in the system. Nice. How did these four things sort of fall out then? I mean, was there
certain other atoms you thought, do we make this an atom or not? Or was it pretty obvious, when you started trying to decompose this and break it down, that these were the four founding, indivisible atoms?
That is again a very good question. The reason we thought about only these four atoms is that we looked at a lot of different protocols, and, as with some of the protocols we'll be discussing later in this podcast, any protocol will follow some of these structures: it will first define how its parties should hold the data, it will of course define what type of attacks it can handle, and it will define how the parties should reach a decision. So these basically ended up being the fundamental constructs in a protocol. Of course, we're not saying that these will be the only four atoms forever. We believe these are the four fundamental atoms, and in the future more atoms can maybe be added: it is possible that some protocols might require some different units which are indivisible and need to be foundational. So yes, they can be added. But according to us, these are the four basics which will be part of almost any protocol.
Nice, that really makes sense. So now that we've got our atoms, in chemistry we need compounds as well, right? You give six of these in the paper. Can we maybe run through them: what are they, and why?
Yes. Now that we've got the atoms, the next thing we need to do, as I just said, is combine these atoms. And in some sense, what we view as elements in this paper are basically the phases of a protocol. We have different agreement protocols: we could have 2PC and 3PC, which are commit protocols; we could have Paxos-style protocols, which can handle crash failures; and then we have PBFT-style protocols, which can handle all the
malicious attacks. So across all of these protocols, what is common? That is what we try to term the elements: basically, the common phases. For example, a very common phase in any protocol will be the proposal. If a protocol is being run, that means someone must have sent a proposal. There must be a leader, there must be some party who has actually said: hey, let's run this proposal, let's complete this transaction, let's execute this transaction. That is what we call one of the key elements. Often this proposal is sent by the leader to all the other parties, or, in decentralized protocols, there could be multiple leaders, and all of them could be sending proposals at the same point in time.
Then, once a leader has actually sent a proposal, the other parties need to take some decision. Generally, the idea is that these parties try to reach an agreement on what to do with the proposal, so they try to vote on the proposal. That is, according to us, the next element: the parties will vote on whether they actually agree with what the primary, or the leader, is saying, or whether they do not agree. In some protocols, the vote might even determine whether they want to abort the transaction or commit the transaction. So the idea of the next element is the vote. Then, once the voting is done, a common decision has to be reached.
of the next element is to vote then once the the voting is done, now a common decision has to be reached.
So the votes, all of these parties, they vote.
They send these votes to the primary if there is a primary-backed protocol.
And now what the primary will do?
It will collect all of these votes.
And once the primary has actually collected all of these votes, it needs to gauge what is the result of this voting.
Has everyone agreed or has everyone disagreed?
So now based on those results, now the protocol needs to move forward.
So if it is possible, some protocols say that just enough votes are once we get sufficient number of votes, we can simply prepare or we can simply commit.
And that is what our next elements are.
Based on the common decisions,
protocols try to decide on the global decision.
Based on this global decision,
either you can just prepare,
move the protocol to the next stage,
or if you want multiple steps, you can also commit.
And similarly, we also have here another notion, which we call decentralization. If the protocol is just being led by the primary, then your prepare and commit are led by the primary: everyone sends to the primary, and the primary sends the response back. But you can have a decentralized version, where everyone can communicate with each other. So again, we have decentralized versions of these phases, and in the paper, if you have a look, we show a symbol to represent this decentralization. We are trying to show the difference between centralized prepare and commit and decentralized prepare and commit: basically, who is leading these next decision-making steps.
Finally, when all the decisions are done, you need to actually execute the transactions. Execution means you need to find the result, and then you need to reply to the client. So that stage we are again representing through an element. And often, when you have a lot of transactions, have completed execution, and have built up a lot of history, you need to do some checkpointing and garbage collection. So we again make an element for that.
And finally, it is possible that if the protocol is led by some leader, then that leader might fail. And if the leader fails, what should you do next? You need to select the next leader. And once you select the next leader, you need to ensure that all the other parties know this is the next leader and that they all have a common state. So we believe leader election could be another element.
So if you see, in essence, all of these elements are trying to show that the different phases of protocols are sort of common to every protocol, and those are exactly the elements: that's what we are trying to show.
Nice. So yeah, just to summarize
the six different elements we have here: proposal, vote, prepare, and commit, with their decentralized variants as well, plus execution, checkpoint, and leader election.
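One way to make this vocabulary concrete is as a small enum; the names below are illustrative shorthand, not the paper's notation, and a protocol then becomes just a sequence of elements.

```python
from enum import Enum

class Element(Enum):
    PROPOSAL = "proposal"
    VOTE = "vote"
    PREPARE = "prepare"              # also has a decentralized variant
    COMMIT = "commit"                # also has a decentralized variant
    EXECUTE = "execute"
    CHECKPOINT = "checkpoint"
    LEADER_ELECTION = "leader-election"

# A single centralized agreement round might then look like:
round_elements = [Element.PROPOSAL, Element.VOTE, Element.COMMIT, Element.EXECUTE]
```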
Great. We've got our elements. Now let's cook up some protocols from these elements. Let's mix them together.
So there's this really nice bit in your paper where, now that we've got our framework, you analyze four of probably the most famous agreement protocols out there. Maybe we can run through each one and talk about how this framework can be used to describe these algorithms.
Yes, I think that's a great thing. So now what we'll try to do is take the four atoms and the six elements and try to design four of the protocols. Basically, we'll try to see how we can design two-phase commit, three-phase commit, Paxos, and PBFT. The reason we chose these four basic protocols is because two of them come from the commit world, one of them comes from Paxos, the crash fault-tolerant world, and another one comes from the malicious fault-tolerant protocols.
So to understand that, we'll start with the 2PC protocol. In the 2PC protocol, the assumption is that there will be multiple parties, and what these parties try to do is reach an agreement on whether a transaction can be committed or has to be aborted.
The whole point here is that generally 2PC runs in a partitioned system, where all the parties are managing their own data. That is why they need to take their own individual decision on whether they can actually order the transaction or not. So in some sense, all of these parties are sending a vote for aborting or committing the transaction. The first phase is also our first element, the proposal: whichever party has access to the transaction says, let's propose, and it sends this proposal to all the other parties. Now, when the other parties get this proposal from the leader, they decide to vote on whether they should commit or abort the transaction. The thing here is that all of these parties are individually making a decision: they look into their own history and see whether they can individually commit or abort the transaction. This is why we represent this type of voting differently from the voting in Paxos and PBFT, which I'll discuss next. Once the primary gets the decision from all the parties, it decides: if everyone has agreed to commit the transaction, let's commit; if one person, or several people, say let's abort the transaction, they'll abort. So the idea is to show that there's a next phase, which is the commit phase, where you make a common decision on what to do with the transaction. Based on this decision, the transactions are executed: whether or not you actually need to execute the transaction, you need to reply to the client, update the history, and write the log. That's what the execution step does. So that is the very simple two-phase commit protocol.
But now that we have done the two-phase commit protocol, let's see what we need to do in the three-phase commit protocol. And this is where things become
interesting. The extension from the two-phase commit protocol to the three-phase commit protocol is quite trivial if we see it from our framework's perspective. Basically, the problem with the two-phase commit protocol was that it could block. Prior work showed that if the primary fails and one of the replicas fails at the same point in time, then it is very easy to create a scenario where all the other replicas are stuck: they are not able to make any progress, and the system comes to a halt. To avoid that, the three-phase commit protocol came along. Its idea was to say: hey, before you actually make any decision, ensure everyone is ready to move forward.
So that's why the three-phase commit protocol has a prepare phase. And this is where another element comes in.
So when we said you can either commit the transaction directly, or you can first prepare and then commit a transaction, that is where the three-phase commit protocol comes in. The proposal is the same: the leader sends a proposal, and everyone votes. After everyone votes, the leader does not make the final global decision yet. It decides whether it can actually move everyone to prepare, or whether it must abort. And once it knows that everyone is ready to move to commit, it finally commits the transaction. So basically, we have added two more elements to the whole framework. In two-phase commit, we had four elements: proposal, vote, commit, execute. Now we have six: proposal, vote, then you prepare, then you again vote, then you commit, and then you finally execute. So that's what the three-phase commit protocol looks like.
Okay, so now we've done the two-phase and three-phase commit protocols. How can we build Paxos from this knowledge? That is what we were trying to think about. What does Paxos do compared to the two-phase and three-phase commit protocols? Well, in the two-phase and three-phase world, all the parties were making individual decisions. In Paxos, there's a notion of a replicated system: all the parties are working on exactly the same data, they have the same data items, and whatever transaction comes, they try to order it exactly the same way. To do that, we need to run a consensus, a Paxos-based consensus. And what we'll see here is that the phases being used are exactly the same as in 2PC.
For example, in Paxos, let's assume a leader. There's an already-selected leader, and that leader has to propose. So the leader proposes a transaction; the same thing was happening in 2PC, where the leader was proposing. And now everyone is voting. But what they're voting on here is whether they can actually agree with the primary or not, so they simply send their agreement to the primary. If the primary receives agreement from a majority of the parties, in this case f plus 1, it can proceed. As we said, the quorum size is different: in 2PC the leader had to listen to everyone, while in Paxos the leader only needs to listen to f plus 1 parties. Once the leader hears from f plus 1 parties, it knows that f plus 1 parties have agreed to follow the decision, so it goes ahead and commits the transaction. And then finally, the leader asks everyone to execute the transaction.
So exactly the same four elements which are used in 2PC are used in Paxos. The only distinction is the way they are used. In 2PC, everyone does a distinct sort of voting: they have their own state, and they might be working on partitioned data, while in Paxos, everyone is working on exactly the same data. And in 2PC, the leader needs responses from everyone, while in Paxos, only f plus 1.
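The comparison just made can be sketched in a few lines of Python: 2PC and Paxos share the same phase structure, and what differs are two atoms, the quorum and the data distribution. The field names are my own shorthand, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Protocol:
    phases: tuple   # the elements, in order
    quorum: str     # the quorum atom
    data: str       # the data-distribution atom

TWO_PC = Protocol(("proposal", "vote", "commit", "execute"),
                  quorum="all n parties", data="partitioned")
PAXOS = Protocol(("proposal", "vote", "commit", "execute"),
                 quorum="f + 1 of n = 2f + 1", data="replicated")

# Identical phases; the protocols differ only in their atoms.
assert TWO_PC.phases == PAXOS.phases
assert (TWO_PC.quorum, TWO_PC.data) != (PAXOS.quorum, PAXOS.data)
```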
And similarly, this allowed us to create PBFT, because, having already made Paxos, what do we need to do in PBFT? In PBFT, the difference is that you can have malicious attacks. And because you can have malicious attacks, just like in 3PC, you have to wait before you reach your decision. In 3PC, we had to first prepare and then commit, and that is exactly what we do in PBFT: we first have to prepare the replicas, and then we commit the transaction. But the key difference between 3PC and PBFT is the same as between 2PC and Paxos. And of course, in PBFT the quorum size is increased: the quorum which was f plus 1 in Paxos is now 2f plus 1 in PBFT. So the phases are exactly the same. This is how we were able to define the four different protocols using just these six elements.
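Putting the four walk-throughs together, the element sequences can be written out as plain tuples. The string names are shorthand of mine, not the paper's symbols; the point is that Paxos reuses 2PC's phase structure and PBFT reuses 3PC's, while the atoms (failure model, quorum size, data model) change between the pairs.

```python
# Element sequences for the four protocols as described above.
SEQ_2PC = ("proposal", "vote", "commit", "execute")
SEQ_3PC = ("proposal", "vote", "prepare", "vote", "commit", "execute")
SEQ_PAXOS = ("proposal", "vote", "commit", "execute")
SEQ_PBFT = ("proposal", "vote", "prepare", "vote", "commit", "execute")

# Same phases, different atoms.
assert SEQ_PAXOS == SEQ_2PC
assert SEQ_PBFT == SEQ_3PC
```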
I think that puts it all into perspective so much more clearly, in my mind, than reading these textbooks over the years. It just seems like a really nice framework, and it all fits together really nicely.
I mean, the thing that was kind of not obvious to me initially was that I was thinking the elements could only be used once, right? You could only use the prepare once. But you can use them multiple times, right? That's, I guess, the secret sauce to it in a way: you can mix and match the order in which you have these elements, which is awesome.
So, building off that, and you hinted at this earlier on as well: this framework could open up the space to thinking about different combinations of these elements that maybe haven't been explored yet. And I know you have a section in your paper where you look at more complex protocols. Maybe we can touch on that, and then you can maybe talk about the potential space you think this framework opens up.
Yes, you're absolutely right. I think that was one of the driving goals for us. We wanted to not restrict ourselves to just these four protocols. The idea was to say that this one framework could actually aggregate all the different protocols. In fact, while I was explaining the different atoms and elements, I also said that there could be centralized and decentralized versions. I think we also show this in the paper: the Paxos I was just describing was leader-based, where the leader was actually driving the protocol, sending the proposal, collecting all the responses, and taking decisions. We also show in the paper that we can simply make a decentralized version where, once the leader has sent the proposal, the leader is no longer in the middle: all the replicas can go ahead and talk among themselves, come to the common decision, and finally execute. Similarly, we can have a decentralized version of PBFT, where all the replicas communicate with each other, reach prepare, and then finally commit.
In the paper, we then go through several other protocols. I'll not discuss all of them, but to give a quick glimpse, we talk about protocols where there can be a notion of speculative consensus. For example, some protocols say you don't need to wait for the decision to be committed: you can take an eager decision. There are protocols like ZZ, whose idea is to say: hey, as soon as the primary sends me a request, a replica will go ahead and execute the request, and then in the future, they will decide whether that request can actually be committed or not. In case it could not be committed, they need to roll it back. We were able to simply express this in our framework, because we just needed two elements: a proposal phase and an execution phase. Of course, if we need to show the failure case, then we have to design a different model.
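The execute-then-decide pattern described here can be illustrated with a toy sketch; this is my own illustration of speculative execution with rollback, not code from any of the protocols mentioned.

```python
class SpeculativeReplica:
    """Apply requests eagerly, keep an undo record, roll back if agreement fails."""

    def __init__(self):
        self.state = {}
        self.undo = {}  # request id -> (key, previous value)

    def execute_speculatively(self, req_id, key, value):
        # Record the old value so the update can be undone later.
        self.undo[req_id] = (key, self.state.get(key))
        self.state[key] = value  # reply to the client before consensus finishes

    def finalize(self, req_id, committed):
        key, old = self.undo.pop(req_id)
        if not committed:  # consensus failed: roll the speculation back
            if old is None:
                self.state.pop(key, None)
            else:
                self.state[key] = old

r = SpeculativeReplica()
r.execute_speculatively(1, "x", 10)
r.finalize(1, committed=False)  # the eager write to "x" is undone
```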
And similarly, we were able to show several other protocols. For example, all of the protocols we've talked about so far assumed that there's only one leader, and that one leader is sending the proposal. But what if there could be multiple leaders? There are several protocols, like the Mencius protocol and the RCC protocol, which say that all the replicas can act as leaders at the same time. That means that in such a protocol, several requests are getting prepared and committed at the same point in time. So in our framework, we asked: if multiple consensuses are happening at the same point in time, how can we represent that? And representing that was very simple for us: we just added a sort of subscript. We took the common phases that everyone will be running and subscripted them by i, where i indicates how many consensuses you want to run in parallel. And then we added a new element, where we said that if all of these consensuses are happening in parallel, then you need to aggregate the responses. These replicas are running multiple consensuses because they have multiple leaders, but ultimately they need to execute these requests, and they have to come up with a common order across all of these consensuses. So we need a phase where this common ordering can be generated. That is how we express parallel protocols and multi-primary protocols.
And similarly, we went to other styles of protocols. What about protocols, or systems, where there's both sharding and replication? Spanner is one example of such a system. We tried to express those systems, showing how you represent sharding, and then you can even do replication on top. As we saw, that was just a change of notation: we needed a few more notations, but the phases and elements still remained the same. The only thing was that, yes, if you have multiple shards, you might require these shards to communicate with each other. For that, we added a new element: inter-shard communication. And that element was interesting, because it also helped us to define a different style of protocol, basically the geo-replication protocols, where you have multiple clusters and each cluster is running its own consensus. They're still working on the same data, and everyone is still replicating the same data, but the clusters now need to communicate. So that is what we were trying to do.
Next question. So far we've been talking about the happy path, right? We haven't really touched on failures too much. How did you go about incorporating failures into your framework?
I think you've asked a very good question. This is a concern for almost every protocol; what is challenging for every protocol is how to actually handle failures. Can we actually design a common path for all the failures? Can we see that even failures have the same sort of elements and atoms as the happy path we were talking about? And actually, that is quite true. If you look at a very basic failure model, it tells us that there has to be some leader who has failed. Because if a non-leader has failed, then basically that non-leader just needs to get up and recover by itself; it can ask others for help. But if the leader fails, then the system has an issue: now that the leader has failed, you need to select a new leader, and you need to ensure that all the replicas are in a common state. And that is what we're trying to do. With leader election, we believe there's the possibility of creating a common set of elements here.
And that's why we designed four different elements. The first is the timeout element: how do you know that the leader has failed? The idea is that every replica is actually waiting for some action to happen, and if that action does not happen for a prolonged period, then that replica will sort of time out, and it needs to take the next step. In our case, the replicas are waiting for the leader to do something, and if the leader does not do a specific task, they time out. Once they time out, they need to do the next thing, and the next thing, generally, is that they announce it. Because if I'm a replica and I've timed out, I need to know how I can make progress, and I cannot make progress by myself. So generally, I need to announce to everyone and say: hey, I've timed out, have you also timed out? Based on that, we can decide the next step. Often, if every replica is announcing to each other, they will come to a common conclusion: they'll find out the leader has actually failed. So now they need to select a new leader: the whole process of leader election.
And leader election is a very famous problem.
I think there has been a lot of papers and distributed systems and databases about how to do
leader election. So this simply fits
in here. So after you have done timeout,
there's announcement, another leader election comes.
And once the leader election has happened,
the idea is that everyone
should reach a common state. So generally the
idea is the leader will ensure everyone
has the common state.
In Paxos-style
protocol, the leader might check everyone's log
and maybe try to set their logs into correct state.
Or in BFT-style protocol, the idea is the leader
will ensure it gives everyone the common state
they can all reach in so that they can, again,
start the consensus.
So this is exactly what we believe are the four elements,
and these four elements can help to define any failure model.
And in the paper, we actually show how these can be used
to represent PBFT's failure model,
because we believe if we're able to represent PBFT's failure model,
we can also represent Paxos's failure model, which is somewhat easier.
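To make the four elements concrete, here is a minimal Python sketch of how a replica might string them together. The class, method names, and the rotating-election and longest-log rules are illustrative assumptions, not the paper's actual notation or Bedrock's implementation.

```python
import time

# Hypothetical sketch of the four failure-model elements discussed above:
# timeout, announce, leader election, and common-state recovery.

class Replica:
    def __init__(self, replica_id, peers, timeout_secs=5.0):
        self.replica_id = replica_id
        self.peers = peers                  # ids of the other replicas
        self.timeout_secs = timeout_secs
        self.last_leader_action = time.monotonic()
        self.timed_out_peers = set()

    # Element 1: timeout -- detect that the leader has stopped making progress.
    def leader_timed_out(self):
        return time.monotonic() - self.last_leader_action > self.timeout_secs

    # Element 2: announce -- tell everyone "I have timed out; have you too?"
    def announce_timeout(self):
        self.timed_out_peers.add(self.replica_id)
        return {"type": "TIMEOUT", "from": self.replica_id}

    def on_timeout_msg(self, msg):
        self.timed_out_peers.add(msg["from"])

    # Element 3: leader election -- here, a trivial rotating scheme.
    def elect_leader(self, view):
        all_ids = sorted(self.peers | {self.replica_id})
        return all_ids[view % len(all_ids)]

    # Element 4: common state -- the new leader brings everyone to one log,
    # e.g. Paxos-style, by adopting the longest known log.
    def sync_state(self, logs):
        return max(logs, key=len)
```

A replica that detects a timeout would announce it, wait for matching announcements, elect the next leader, and let that leader run `sync_state` before consensus resumes.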
Awesome. Yeah, I recommend the listener going and digging into that
and looking into that some more, for sure. And so, given the framework here, how did you then go about evaluating how effective the
framework is and how useful it was? How did you experimentally explore this, and I
guess, what questions were you trying to answer? Yes. So I think that is a good question. So the key challenge for us
while experimenting was to understand how these different protocols fare relative to each other.
First of all, at least in this paper, we are not presenting a new protocol. So people already know
how different protocols are supposed to behave relative to each other.
But one of the key goals of our implementation and evaluation was to highlight that, now that we have a framework which can actually express different protocols and even their failure models, can we similarly have an experimental framework where we can pick and play different protocols? We can have these elements I'm talking about,
like this proposal, vote, prepare, commit and all.
Can we have all of these elements as like modules?
And if we have these elements as modules
and then these atoms are basically
the general characteristics of different protocols,
then we can simply plug and play these modules
and we can create different protocols.
And that is exactly what we tried to do
in the experimental framework here.
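As a rough illustration of the plug-and-play idea, the sketch below treats each phase ("atom") as a named module and each protocol as an ordered list of phases. The phase names and protocol recipes are simplified assumptions for illustration, not the paper's exact decomposition.

```python
# Hypothetical plug-and-play sketch: phases are reusable modules,
# and a protocol is just an ordered composition of them.

PHASES = {
    "propose":  "leader broadcasts a proposal to all replicas",
    "vote":     "replicas reply with their vote",
    "prepare":  "all-to-all exchange to prepare the value",
    "commit":   "all-to-all exchange to commit the value",
    "execute":  "replicas execute the agreed transaction",
}

# Different agreement protocols as compositions of the same modules
# (illustrative recipes only).
PROTOCOLS = {
    "2PC":   ["propose", "vote", "commit", "execute"],
    "Paxos": ["propose", "vote", "commit", "execute"],
    "PBFT":  ["propose", "prepare", "commit", "execute"],
}

def run(protocol):
    """Return the sequence of (phase, description) pairs a protocol runs."""
    return [(phase, PHASES[phase]) for phase in PROTOCOLS[protocol]]
```

Swapping one phase in the recipe, or appending a new one, yields a different protocol without rewriting the shared modules.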
In this paper, we specifically use Bedrock as the framework.
And why did we use Bedrock?
Because in the prior work,
we have used Bedrock to actually test different types of BFT protocols.
So in this framework, what we did is that
we actually leveraged Bedrock
to create even 2PC, 3PC and Paxos.
And what we were trying to show here is that,
just by having these modules,
we were not only able to easily create these protocols,
but even able to achieve the same performance that is expected. So in general, assuming
there are no failures, the expectation is that 2PC will perform
well, because it just has a small number of phases, and all the replicas will achieve high throughput. And
similarly, the assumption is that PBFT would be the most expensive protocol, because it
has three phases of all-to-all communication and the leader has to send messages to all the replicas.
And somewhere in between there will be 3PC, there will be Paxos. And that is
exactly what we were able to yield: the same results. And the whole aim is that, now that we have created
such a framework with these small elements, and that framework is able to express these four
different protocols, we also evaluated PoE, which basically removes one phase from
PBFT, and we showed that PoE actually performs better than PBFT in the experiments.
So the idea is that with these four or five protocols, which we are now able to create,
maybe in the future we can also add more elements,
or maybe we can use these existing elements to create newer protocols.
So this experimental evaluation is sort of like an initial step
to allow developers to create a framework where they can easily mix and match different
protocols and create newer designs. Awesome. Yeah, I have a few questions that kind of follow up from
that. The first is Bedrock. I mean, can you tell us a little bit more about it? Because I'm not familiar
with that system at all. Yes. So Bedrock is actually work with one of the co-authors.
One of the co-authors on this paper is Mohammad Javad Amiri, who is from UPenn. So it is his ongoing work, and the idea of Bedrock is basically trying to understand
how different BFT protocols work in practice. So I think Mohammad has done a good job there. He
tried to implement almost 10 to 15 different BFT protocols.
And what it tries to show is how these protocols perform in practice: what is the throughput,
what is their latency?
It could be in the single cluster,
it could be across a geo cluster,
multiple regions and continents.
So I think that is what he's trying to experiment in Bedrock.
And I think that is why we use Bedrock for this framework.
Sure, yeah, that makes total sense.
I mean, I really liked what you were saying as well
about having these plug-and-play modules
for each element, right?
I mean, I can imagine me just coming along
and writing some config files:
today I want to try out this new protocol
I think could be great,
I want to evaluate the properties of it,
throughput, latency, see how it performs.
And I can just plug and play:
I want to do this round with this element, and stitch it all together. That's really cool. So
how easy was that to implement? To have these sort of independent modules, is that quite
difficult, or did you find things leaking into each other? Actually, that's a good question.
So in this paper, interestingly, we used Bedrock. This was something we
also attempted at ResilientDB while we were
doing this. And in both of these systems, I think what we did was we first started
by creating a simple protocol. Let's say we just took PBFT, for example. And once we designed PBFT,
we saw there are very common phases, the same things which we are talking of as
elements here, right? Like the prepare and commit,
the same things have to be done in there. And these could be simply created as independent
modules. Of course, as you said, they will have some common parameters, and these parameters will
be accessed by each other. So that is the same thing which is going to happen even here. But
we could easily separate out these as functions like a function called this prepare function
as a commit function or a vote function.
And now for these four protocols, for example,
we realized that these functions could be similarly used.
But yes, of course, there were some nitty gritty details.
For example, in 2PC, we realized that the vote
is basically telling whether you want to actually commit
or abort the transaction.
While in PBFT or Paxos, a vote basically means I'm ready to agree with the primary.
So in that case, we had to have some changes.
We had to have one vote for commit protocols and another for the consensus-style protocols.
Similarly, the execution phase, I think, was quite easy for almost all the protocols,
because you're simply executing the transaction. And even the leader election was quite simple.
So some of these phases were very easy to separate out.
Yes, but there were some other phases
which use common global parameters.
And I think there you have to link them together.
So there, I think, the developers have to do some editing.
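The vote distinction described above might look something like this in code; the enum, the keyword parameters, and the message shapes are hypothetical, chosen only to show how one module can carry two semantics.

```python
from enum import Enum

# Illustrative sketch: the "vote" module means different things in
# commit protocols (2PC) versus consensus protocols (Paxos/PBFT).

class VoteKind(Enum):
    COMMIT_ABORT = "commit_abort"   # 2PC: vote decides commit vs abort
    AGREE_LEADER = "agree_leader"   # Paxos/PBFT: "I agree with the primary"

def make_vote(kind, *, can_commit=None, accepts_primary=None):
    if kind is VoteKind.COMMIT_ABORT:
        # 2PC-style: each participant independently votes commit or abort.
        return {"kind": kind.value,
                "decision": "commit" if can_commit else "abort"}
    else:
        # Consensus-style: the vote signals agreement with the leader's proposal.
        return {"kind": kind.value, "agree": bool(accepts_primary)}
```

One shared vote module, parameterized by protocol family, is the kind of small edit the shared framework still needs from a developer.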
Okay, how big was the implementation effort overall? How long did it take?
Okay, so two things. As I said, part of it was we already had Bedrock
with us, so we didn't have to do a lot of implementation. The only implementation we had to do was for 2PC, 3PC, and Paxos.
But I think it did not take much time.
It just took us a matter of, I believe,
Mohammad told me, a couple of days
to implement these few protocols.
And I think that was the beauty about it.
Because generally, if you're going to write a protocol,
it is going to take you a lot of time.
But if you have a plug-and-play system, it just takes a couple of days to create
multiple protocols. I was thinking the answer was going to be in the magnitude of months rather than
days. Wow, that's awesome. Cool. So we've spoken about
how effective and how useful the framework
can be, and you've spoken about the evaluation, but are there any limitations to the framework? What
are the areas where it doesn't maybe fit perfectly? Actually, that's a great question. In
fact, you yourself, while asking me the question, pointed out a limitation. There is
sort of a limitation: the framework, the system right now, is not fully expressive. We need to do much more, I think.
To start with, one of the points you raised was that there are protocols which talk about read
and write sets, where you need to specify the read and write sets earlier in time, like the
deterministic protocols, which say that you need to know the read and write set of the transaction
ahead of time. And I think at this point in time, our
framework does not provide a way to
express such protocols, and we need to think about
how we can do that. And
similarly, right now, for all the
protocols that we have discussed,
we make a partial synchrony
or synchrony assumption.
The idea with leader election here
is that we're assuming the replicas will time out, and when they time out they will announce it,
and then they will go and select a leader, and so forth. But what
if the protocol is asynchronous? There are many asynchronous protocols where there's no
notion of timeouts. So how do we express such protocols in the framework? I think we still need
to work on that. And similarly, our framework right now, for example,
does not talk about things like reconfiguration.
I mean, what if a node wants to join the system?
What if a replica wants to leave the system?
Can we express those type of things in our framework?
That is something, again, we have to think about as a challenge.
And some more protocols, like protocols which have a DAG. Right now we are assuming a single ordering, and interestingly,
most of these protocols assume there's a single ordering. But there have been recent protocols which assume
DAG-based ordering, where there's a partial ordering happening. So
again, we need to think how we can include these types of designs in our framework. The DAG thing is
interesting because, I don't know whether this is related to what I'm going to ask, but how would you capture different
consistency guarantees in this framework, and is that something that you're
thinking about exploring as well? Actually, I'll say that is one of the things which
we are thinking about exploring. I do not have a right answer right now. That is something which has also
hovered around,
because right now we are assuming
just one guarantee.
We are assuming a single ordering across
all the replicas. But I think, yes,
you're right, we might have to think about
eventual consistency, or we might have to think about causal
consistency. Can we actually even talk about
these things in that framework?
It might be quite interesting. We
don't have a right answer right now, but this is something we are actively exploring.
Fantastic. Yeah, plenty of interesting things to be going at. Cool. I guess my next question
is then: maybe as a software developer, or someone working in databases or data management,
how can I leverage the findings from your research,
and what impact do you think
they can have on someone's day-to-day work and life?
Yes, I think, for that, I'll go back
a few years and I'll try to say
why we actually even came up with this idea.
I mean, this was not something
which we had been thinking about for years,
but during my PhD,
we had been working on these different consensus protocols, and that is where we designed this framework
which we call ResilientDB, a system which expresses different BFT protocols,
similar to Bedrock. And the idea of our ResilientDB framework was basically trying to say,
how can we express different protocols? The idea was not exactly plug and play, but just to give a framework which can allow
implementing different protocols. And we found this challenging, because there were speculative
protocols, there were geo-scale protocols, there were sharded replication protocols. We were trying to
handle these challenges. And I think this is the same sort of challenge which even developers have, because one of the people I was working with on this framework,
he is now working at Oracle, and he also said that he met different developers,
and these developers still find it hard to design Paxos, because it's very unclear to people how you can
even do simple sort of replication, or safe replication. And I think the use of this framework
would be that a lot of people easily know about the two-phase commit and three-phase commit protocols,
because these are the basic database protocols which are taught to everyone. And if we can show
that other protocols are quite
similar to these, that protocols like Paxos or PBFT are quite similar to the two-phase and
three-phase protocols, then people can easily reason about them. And this reasoning can actually help
in expressing different protocols, and I think might easily help developers to develop these
protocols and create the fallback cases. Because often I've seen developers saying, hey, PBFT is hard. Why is it hard? Because, oh, the failure case is very hard:
you have to think about sending a lot of messages, or receiving them, or actually generating
the common state. We believe that may not exactly be the case. It might just need a simple way of expressing it
first. And yeah, I think it definitely gives a common language
for people to talk about these things
because I think we all kind of have in our heads,
otherwise, our own way of understanding these protocols, right?
And then often my worldview is not compatible
with somebody else's
and we kind of get confused on terminology and whatnot.
So yeah, definitely kind of giving people a common baseline
to talk about these things on is a massive, massive win.
And I think as well, I mean, you touched on it a little bit there, where you said, and I was getting
this as well: two-phase commit, three-phase commit,
okay, I can
understand these. You go into Paxos, and Paxos sort of has this myth associated with it
being difficult to understand, right? It almost feels impenetrable sometimes.
But when you break it down like this, you just see: oh no, it's just doing a few extra phases, right? And that
kind of demystifies it and sort of takes away the scariness of it. So
yeah, definitely, I agree with everything you're saying. Cool. Yeah, I guess,
when you were working on this, what was maybe the thing that you learned
across working on it that kind of caught you off guard,
and you were like, huh, that's interesting,
I didn't expect to learn that sort of thing?
Yes, I think two things were quite interesting to me
while I was trying to understand
this framework. Specifically,
as I was describing the protocols,
we saw there are some similarities between PBFT, Paxos, 2PC and 3PC. I think that was quite interesting. Especially,
not just similarity in saying that, of course, they're all agreement protocols and
they have similar-style phases; it was that the number of phases were exactly the same, and
the only difference we found was whether
they were actually replicating the transactions, or they were not replicating transactions and
the nodes were taking their own decisions. And that was quite intriguing to us,
because we were like, okay, they're not quite different protocols, they have the
same style of phases. And I think this is exactly what the challenge has been in the community.
And we all have been looking at the protocols
in a different way,
but they're simple extensions of each other.
You just add more phases to Paxos,
you increase the quorum size,
now it's PBFT.
You change the quorum size,
you decide who will be making the decision,
now it's actually 2PC.
And how should I parallelize the protocol?
You just allow replicas to run multiple consensuses at the same point in time.
Now it's a parallel protocol, and you just need to collect and aggregate the state.
So I think this was quite interesting to us.
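One way to sketch these "simple extensions" is to describe each protocol by a few knobs and then diff them. The knob values below are a simplification for illustration (e.g. PBFT's quorum of 2f + 1 out of n = 3f + 1 replicas, where f is the number of tolerated faults), not the paper's notation.

```python
from dataclasses import dataclass

# Hedged sketch: each protocol is a small set of knobs, and changing
# one or two knobs turns it into another protocol.

@dataclass(frozen=True)
class Knobs:
    phases: int          # number of communication phases
    quorum: str          # size of the quorum needed to decide
    decider: str         # who makes the final decision

PROTOCOLS = {
    # 2PC: every participant must vote yes, and the coordinator decides.
    "2PC":   Knobs(phases=2, quorum="n (all)", decider="coordinator"),
    # Paxos: majority quorum, and the leader drives the decision.
    "Paxos": Knobs(phases=2, quorum="n/2 + 1", decider="leader"),
    # PBFT: an extra all-to-all phase and a larger quorum for malicious faults.
    "PBFT":  Knobs(phases=3, quorum="2f + 1",  decider="leader"),
}

def diff(a, b):
    """Which knobs change when turning protocol a into protocol b?"""
    ka, kb = PROTOCOLS[a], PROTOCOLS[b]
    return {field: (getattr(ka, field), getattr(kb, field))
            for field in ("phases", "quorum", "decider")
            if getattr(ka, field) != getattr(kb, field)}
```

Diffing Paxos against PBFT, for instance, surfaces exactly the two knobs mentioned above: one more phase and a bigger quorum.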
And this was something I think we didn't realize: it is quite easy to understand. Even
after working for several years on these protocols, we ourselves did not realize it
until we actually started writing down this framework and realized, yeah, we are
using exactly the same stages, and that is exactly easy to define. Nice, yeah, for sure. I mean, when you
were working on it as well,
when did the idea sort of happen initially?
When did you decide, you know what,
we're going to do this framework?
How long back was that?
So I think, in some sense,
the idea for this specific paper,
I'll say, was just last year.
But the idea in general, to design such a
framework or a way to express protocols, had been with us since, I think, 2018 or 2019. Since
2018, 2019, we wanted a way to express them. And I think the reason why we wanted to express them
is because, like everyone, when I started, I started with just understanding 2PC, 3PC. And then, of course, Paxos has been
difficult for everyone, I think. And then I just switched to
PBFT and tried to understand it. Once I understood PBFT, I think
I went back to Paxos: oh, it's not exactly difficult. It's quite similar
to what PBFT is doing. It just reduces phases, and
you don't require that much communication sometimes.
So I think that is what
was quite interesting for me.
But at that point, I was saying,
if this is the case,
why don't we just simply express
all these protocols in one framework?
And actually, that helped us
to design a few other protocols.
I think subsequently,
while I was understanding these protocols,
we had this GeoBFT paper, which was at VLDB, and we had another paper, RCC, which was at ICDE.
And I talk about these protocols in
this Chemistry Behind Agreement paper also. So the protocols PoE, RCC, and GeoBFT,
all three of the protocols,
they came to us in a matter of
one and a half to two months.
And why did they come to us?
Because once we saw how similar the phases were,
we were easily able to split them.
We saw that, oh, now you can simply parallelize,
and that will give you performance.
And now you can actually replicate the consensus across clusters, and that will give you
another different design, which will help you scale more. So I think since then we were trying
to think, okay, maybe we should come up with a common framework of our own, and finally
it happened. So I thought, yeah, finally we're going to have to write this framework
down. Awesome. Were there any dead ends you hit, though, when you finally came to the point of deciding,
right, let's write this down?
Were there some kind of bumps in the road with it?
Yes, there were.
Actually, there were a lot of bumps in the road.
For example, when we started out, we started out with the four simple protocols.
And I think they were very easy to express. But as we moved forward with further protocols, we realized that we had to come up with different notations.
For example, in chemistry, if you're trying to create a chemical bond, you have different types of arrows, or you have different types of bonding:
for benzene, you have a hexagon structure, or you have other structures. Similarly, we were quite perplexed: how
should we express multi-primary protocols, or how should we express sharding and replication?
In fact, if you are going to express sharding separately, it's fine, but how should
we express them in the same diagram, for a single protocol, and that
too in a concise manner? We cannot just have some comments written on top of a protocol, because
that will not be quite interesting. And I think for that we had to iterate many times,
because I remember we first tried to make some sort of brackets, or we tried to make some sort of figures or square boxes, and it was like, no, that's not concise, that's not enough.
I think that was quite interesting and challenging for us: how to come up with this compact
notation. And in fact, we still believe that for future protocols, as some of the things I was
telling you that we have yet to cover in our framework, like the different deterministic protocols, maybe we still need to come up with new notations and expand the set of
elements, atoms, and these notations to include those protocols. So I think that was quite challenging
for us, and that will still be challenging, I believe. Yeah, I think that's a nice segue into my
next question: what's next on the research agenda? Where do you go from here? So I think all the open problems, we have
started looking at them. Right now, for example, we only looked at a small set of protocols. I
think in this paper we only looked at around 20 to 25 protocols. I
think we want to move beyond that, because just in the BFT world, for example,
there are 50-plus protocols, and there are chaining protocols.
There are protocols which assume the leader election is happening as part of the happy
case, every time.
And then, as I talked about, deterministic protocols are there.
And even in the Paxos world, there are different types of protocols. I think right now, what we're
trying to look at next is how we can express more of these protocols.
How do we have to define the annotations? Maybe we need to come up with a much more
precise set, because I think right now this is just a start, we believe. At the largest,
we want to expand
from these 25 protocols
to 50 or 100 protocols.
And then we can say,
hey, now this is the whole
suite of protocols
that this one language
can actually express.
And maybe beyond that,
we are also trying to think
maybe we can have
sort of like automatic
proof generation
for this protocol.
Like, for example, we're trying to think,
can we actually use things like Datalog to write down these protocols
and prove that, whatever our designs are,
these elements we can simply plug and play there,
and we can show that we can easily design new protocols.
So I think those are some of the goals
which we are trying to actively think about right now. Awesome, yeah,
I'm really looking forward to seeing where this goes. It'd be nice as well to talk, I mean, you
mentioned ResilientDB earlier on, and maybe it'd be a good time to talk about
some of your other research as well that you've done. I mean, you mentioned a few VLDB papers
and an ICDE paper. Tell us what else you've been up to in recent times. Yeah. So ResilientDB was something
which I started during my PhD. I think that is a system which
was sort of created. And I think that also relates to sort of the key
focus of my research. So my research mainly focused about
designing fault-tolerant protocols which can actually handle
malicious attacks.
And I think with ResilientDB, we wanted to do that.
We wanted to basically create a framework
which has the same client-server model,
but now you can actually introduce different protocols
which can handle malicious failures.
And we had several of our protocols there.
We have RCC, we have PoE, we have GeoBFT,
we have RingBFT.
So these are different protocols.
And these protocols target different environments.
Some work in geo-scale environments,
some work in sharded replicated environments,
some assume multi-primaries,
some simply reduce phases.
So that is what we are doing.
But that was something
which I did as part of my PhD.
Right now I'm doing something more, in the sense that
we are trying to look ahead of just consensus,
at what is beyond consensus.
We are trying to look at, for example, serverless edge frameworks.
And we are trying to see, if we have a lot of these serverless functions
and we have these edge devices,
how can these edge devices make use
of these serverless functions,
without having to fully trust
the serverless function?
In the sense, they don't want to store anything
on the serverless hardware or cloud.
And how can they do that in an efficient manner?
And assuming some of the edge devices might be malicious,
or maybe the cloud provider might just fail
in some cases,
in such a scenario, how can we actually provide efficient responses?
That is something which is upcoming in our ICDE 2023 paper.
Then we are also working on something about trusted hardware,
where the assumption here is that you have systems
which have trusted hardware, and we saw that a lot
of these systems use protocols which claim to have more benefits than they actually have.
And this is one of our works which is in process. The idea is to show
this by defining a new suite of protocols, which we call FlexiTrust protocols. And the
idea is that these protocols are more efficient than both categories: the protocols which use trusted
hardware and the protocols which do not. Our idea is that
our protocols are more efficient: they provide a higher throughput, lower latency, and they're more resilient to liveness attacks.
And that is some work
which we are doing in parallel.
And finally, I think one more style of research
which we are trying to look at is:
assuming you have multiple clusters,
and these clusters are running whatever protocol they want,
how can these clusters talk with each other?
And why is this interesting?
Because these clusters could be anything. The clusters could be organizations. For example, you have top organizations
that want to communicate with each other. Now, if they want to talk with each other, then they need some
sort of a trusted person to relay their messages, or some sort of a medium which is specific
to their communication; someone needs to handle and manage that. But if I don't want to have some trusted network,
I want to do this in a distributed manner.
I don't want to trust anyone.
So how can we do that?
And that is what we are trying to look at.
Of course, there is a simple way: you can ask everyone in one system
to talk to everyone in another system,
but then you're creating too many copies of the message.
So we are trying to see if we can do this type of communication
in an efficient, copyless manner. So these are some of the directions which I'm trying to explore.
Fascinating. This is a whole host of really interesting directions. How
do you decide which to pursue, and how do you even come up with these in the first place? What's your
process for doing this? I think that's a very interesting question, in the sense that
I cannot just give credit to myself for coming up with most of this. I think we have a lot
of brainstorming sessions, where a lot of us are just sitting around and thinking
over the ideas. One of the things which we always do, and I'm pretty sure everyone
does, is we try to see what might be relevant to industry or to a wider audience.
For example, in the communication protocol which I was talking about, the idea
was basically that we believe it's a common problem for multiple organizations
to talk with each other, and there have been some solutions in that direction. So we were trying to
see what we can do in that space. We didn't
want to do something with trusted parties. We wanted to do it without trust, and we wanted to also
reduce the communication cost. So that is where it comes from. And then similarly, we were trying to
think, oh, there is a lot of hardware these days which has some sort of trusted components: Intel's SGX is
there, ARM's enclaves are there, Amazon also has a small sort of trusted computing component.
So these types of machines are coming forward, and people are trying to use them for computation.
Can we design consensus around these types of machines?
Of course, people have done this in the past, but we first tried to find some limitations.
We found that there are actually limitations in these designs,
and then we tried to come up with efficient designs. So I think most of the times when we try to look at problems, we're trying to see if that problem is actually
relevant: can it actually be implemented somewhere, can someone actually use it?
And of course, you have to read what other people have done. So I think
that is what gives an idea about what is something more interesting which we can pursue.
I guess if you put your prediction hat on now: what do you think things are
going to look like in sort of 10 years' time? What do you think are the biggest
challenges we're going to be facing that we need to solve over the next 10 years?
Huh, that's a hard question.
It's an impossible question, I'm sorry.
Yes.
I will not deny that.
First of all, there's a lot of research in AI
and people are doing excellent and beautiful work in that direction.
I'm not even an expert to even comment that,
but I believe AI in databases is very interesting.
A lot of people are doing.
But I'll just talk about from the perspective of a person
who's looking at distributed systems and databases.
From my perspective, I believe,
I think achieving that extra throughput and low latency
would still be an interesting problem.
But I think a lot of focus
has recently moved towards designing uh energy efficient systems and in fact uh one of our works
which is also under submissions we are actually trying to design so for example i because i work
in a malicious attack space um there's a whole world which works on protocols which can handle like
blockchain protocols, basically. And we have been looking at those directions. But I think
traditional protocols like search consume a lot of energy. So I think we have moved towards
designing some protocol which can consume less energy. In fact, the product in this trusted
space work, which I was which I was talking about, they're also one of the graphs which we are trying to show is that our designs try to use less energy.
And I think energy will be quite key.
In a recent presentation I was attending from Google engineers and researchers,
they were also saying that companies are driving towards designs which consume less energy and are more efficient.
I think that is something which might be interesting.
Another direction I believe might be interesting
is security and privacy.
Interestingly, some of our works coincide in that direction.
Right now, most database works
don't cater to security in general.
Recent works are moving towards privacy.
We have seen a lot of privacy work:
how to secure the data of your client,
how to ensure the data is not leaked.
That is something very interesting.
But I think we are still not looking at malicious attacks per se.
There have been a lot of works in the systems community,
but the database community still has very few.
I think this will become more relevant
because attacks on databases are quite common.
Data breaches happen very often.
And very recently we just saw
the FAA incident,
where the flight control system
got completely shut down
because of poor replication, in some sense.
Such outages
could just as well be caused by hackers
who easily compromise the data.
So I think security might be quite interesting.
So that's where these BFT protocols
will come into use.
Right now, nobody uses them
saying that they're expensive,
but I believe they will see
much more application in the future.
I guess as well, as energy efficiency becomes more important and those protocols improve, they become
more usable, right? Or more appealing to use, because, I mean, economics, right?
And I guess, yeah, it's time for the last word now. What's the one takeaway you want the listeners
to take away from this work,
from this paper?
So I think
one simple takeaway I would say is
that consensus protocols or agreement
protocols are not hard.
What is hard is the way they have been
expressed and there's a
huge disparity in their design.
And this paper tries to amend those disparities. It tries to promote a single framework which can help express these protocols.
Our hope is that this one framework can help in designing future protocols, that newer protocols can be expressed in it,
and that maybe we can simply keep extending the framework to express other protocols
as well.
So this framework can act as a bridge between different communities: communities
which are using commit protocols, crash fault tolerant protocols, or malicious fault tolerant
protocols.
I think this framework can become a bridge.
And the theorem provers used by one community can be used in another community to express the same protocols. Maybe the designs can become common,
the practices can become common, by using this framework. That is what I believe is useful
about this framework.
Great message to end it on. Yeah, we'll end the podcast there.
Thank you so much for coming on, an absolutely fascinating conversation. If the listeners are
interested in finding out more about your work, we'll put links to all of the relevant materials in the show notes. And yeah,
we'll see you all next time for some more awesome computer science research. Thank you.