Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Guy Zyskind: Enigma – Providing Scalable Privacy-Preservation to Smart Contracts

Episode Date: August 2, 2018

Privacy and scalability are arguably the most challenging issues blockchains face. Scalable privacy-preserving state machines are inherently difficult. While cryptocurrencies like Zcash have proven trustworthy for simple transactions, privacy in smart contract platforms is an entirely different animal. We're joined by Guy Zyskind, CEO of Enigma, a platform for scalable decentralized apps which preserves privacy. The Enigma network treats Ethereum smart contracts as "secret contracts" and can perform computations on encrypted data. Inputs are broken into pieces and distributed to network nodes which perform computations on a segment of the full data. Once returned to the Ethereum chain, data is reassembled and may be decrypted to reveal the result. Leveraging secure multi-party computation and Trusted Execution Environments (TEEs), Enigma prevents a malicious actor from gaining access to the input data and the computation results.

Topics covered in this episode:
- Guy's background and secure computation research at MIT
- The different approaches to privacy preservation in computing
- The different multi-party computation methods and how they work
- Fully homomorphic encryption in the context of MPC
- How Enigma would preserve privacy for a simple application like tallying votes
- How Enigma reads encrypted data from the Ethereum network and leverages Trusted Execution Environments to perform computations
- How developers build smart contracts which use Enigma
- The role of the Enigma token as an incentive mechanism
- How Enigma ensures network availability by penalizing nodes which go offline
- The current state of the project and upcoming milestones

Episode links:
- Enigma Website
- Enigma Protocol Docs
- Enigma White Paper
- Decentralizing Privacy: Using Blockchain to Protect Personal Data
- Guy Zyskind's Personal Website

This episode is hosted by Sébastien Couture and Sunny Aggarwal. Show notes and listening options: epicenter.tv/246

Transcript
Starting point is 00:00:00 This is Epicenter, Episode 246 with guest Guy Zyskind. Hi, welcome to Epicenter, the show which talks about the technologies, projects, and startups driving decentralization and the global blockchain revolution. My name is Sébastien Couture. And my name is Sunny Aggarwal. And today we have on with us Guy Zyskind, who is the CEO of the Enigma project. And we'll be learning about that today. Guy, can you please introduce yourself? Sure. So yes, my name is Guy. I am co-founder and CEO of Enigma. Enigma started as a project back in 2015 when I was a graduate student at MIT. It started as research around privacy-preserving computation, its relation to the blockchain, and particularly how we can use these kinds of privacy-preserving technologies to add privacy to
Starting point is 00:01:25 blockchains as a whole. Cool. So how did you originally get interested in MPC? Like, you know, you wrote this paper back in 2015 while you were at MIT. What originally intrigued you with that problem? So actually, first of all, I got really interested in blockchain technology. First of all, I got interested in Bitcoin. I was fascinated by the idea that, you know, you could basically decentralize a ledger, even though you have untrusted peers.
Starting point is 00:02:00 I thought this was fascinating. There must be a lot more that you can do with it. And then a group of guys, we basically started a project around trying to use blockchain for other kind of applications, especially identity, data privacy, but very quickly realized that there's very strong limitations for blockchain. There's a lot of limitations because blockchain is inherently fully public, and there's so much you can do.
Starting point is 00:02:27 And that's even without discussing the scalability issues. I was really fascinated about how can we make people, users, better tools for them to control their own data and their own privacy. And I was also interested in how can companies, you know, reduce liabilities that they have with storage sensitive data. And that really got me interested into this field of secure multi-party computation and how it relates to the blockchain. I see.
Starting point is 00:02:55 And so what inspired you to take this research, which was more like an academic work, your master's thesis was on like the efficient secure computation? What inspired you to take this academic research and then turn it into a company and like a public blockchain project? Or was it always sort of the intention from the get-go? So the intention from the get-go was to do something that I'm passionate about and that actually can, you know, really change. day-to-day lives for the better. I know this is a cliche, but the goal was really to do something big. Luckily, MIT is really a great place to do those kind of things. And when I wrote the first paper and then circulated the Enigma White Paper, and I was also heavily involved in the MIT DCI and MIT Bitcoin kind of ecosystem, like, I guess all of these things kind of fed each other,
Starting point is 00:03:51 and it very quickly became apparent that the best way, to do this, you know, beyond the initial research was to actually move outside of academia and actually try to do it. Whether it's a company, it's a foundation, it's an organization, it doesn't matter. The goal was and is to create something that a lot of people can use and that would influence a lot of people's lives. Okay, well, let's hop right into it then. I mean, starting with one of the core aspects of Enigma, which is privacy preservation.
Starting point is 00:04:22 And so, you know, if we look back at the history of blockchains and specifically starting with Bitcoin, so transparency, there's always been this sort of delicate balance between transparency and privacy. You know, Bitcoin blockchain is, of course, open and public. And so inherently, the information is available for everyone. But there's this sort of pseudo-ananimity where no one is really aware, technically, in theory, isn't aware of who's behind the transactions. So as time has evolved, so people sort of agreed with that. Others saw it as potentially, you know, vulnerability. And we've seen sort of companies grow into this space that are now analyzing the Bitcoin blockchain and other blockchains to extract certain types of information from it. And of course, you know, now we have Zcash and we have these privacy preserving blockchains. So in this context, can you maybe perhaps describe what are the different types of privacy problems that one can run into while using public blockchains to run smart contracts, execute transactions, financial transactions, and this type of thing.
Starting point is 00:05:34 I think there are two types of privacy that people care about today. And that's very much like how blockchains have evolved in the last fears. So there are like transactional blockchains, right? Bitcoin, all of the so many forks that exist, anything that relates to transfer of money and transfer of value. And for this, we have a lot of privacy solutions like Monero, like Zcash, like so many others that are just popping every day. But there is another dimension to what blockchain and consenseless technology can bring us. And that's what we've seen with smart contracts and with Ethereum. But I believe the problem with smart contracts is actually
Starting point is 00:06:18 much bigger and much harder to solve because, you know, when you're talking about Bitcoin, all you need to hide, which is, again, as we're seeing, really not easy, is, you know, the amounts and the sender and recipient. When you're talking about smart contracts, there's a whole state. There's arbitrary state, arbitrary data, arbitrary computations, and all the nodes in the network see all of that. So to me, focusing on solving that privacy, issue is kind of like the other big thing that we should divert our efforts to. What are some existing technologies that exist in order to do privacy, preserving state? Is there sort of a body of research that exists before the introduction of Bitcoin
Starting point is 00:07:11 and Ethereum and blockchains? Or is this field of computer science new and very much tied? to like this industry? So I'd say this body of resource started around the same time that the body of research around consensus started, maybe even a bit earlier. So both of these are like 30, 40 years old body of researchers. There's a lot of literature. This whole subfield is called secure computation, just like we have consensus, distributed consensus and Byzantine fault tolerance in distributed systems. Both of these have been here for a long while. I think what has changed in the last
Starting point is 00:07:52 decade is that there's a lot of efficiency gains that have been made to both. Many are related to blockchain, at least in the consensus aspect. In secure computation, I'm not sure it's as much related to blockchain as it just is that more researchers are interested in it and the effects of big data have kind of made this a requirement of how to protect. data privacy while computing. Interesting. So at the core, and perhaps correct me if I'm wrong here, but there's three different types of approaches to preserving privacy in a blockchain.
Starting point is 00:08:29 So one is doing secure multi-party computation, and I'll let you explain what that means specifically, but I think just in that term, you know, one can imagine what that entails. Then zero knowledge proofs, which most of our listeners will be familiar with, at least at some level. And the third is trust execution environments, which was also talked about on the show. And so people may be familiar with technologies like Intel SGX and its ability to perform computation in a secure environment. Is this the sort of exhaustive types of approaches to enabling privacy on the blockchain? Are there other methods? And could you perhaps describe more in detail how each of them preserves privacy and what are the advantages and
Starting point is 00:09:13 disadvantages of each solution. Of course. So these solutions, they're not unique to blockchains, but they are very efficient for blockchains. There's a few others like fully homophobic encryption, which is amazingly theoretical, but we don't really know how to do it well enough in practice. And there are other solutions that are more targeted and pinpointed, so it's probably not worth discussing. But of the three that you mentioned, there is, as you said, the secure multi-party computation. Secure multi-party computation is basically the idea that, you know,
Starting point is 00:09:50 you can't trust one computer or like one entity with your data, but if you can have a network of computer, maybe five, maybe 10, maybe 100, as long as you can know that some fraction of them would not collude with the others, then your data will basically remain encrypted at all times and they can still run computations over it as a network. So that's in MPC. I'm happy to go into more detail with that. Trusted execution environments are a similar idea that say there is an enclave, there is a trusted region in the processor
Starting point is 00:10:29 where you can kind of push all data and all code in. Every computation that happens inside that region cannot be accessed from the outside. So that is how you can basically achieve privacy preserving computation using that. And the third one is zero knowledge proofs. So zero knowledge proofs, many don't know this, but it's actually been developed very closely together several decades ago with multi-party computations. Zero knowledge proofs are the idea that if I have some data and I want to run some computation of that data, then I can prove to you without revealing the data that the result of that
Starting point is 00:11:08 computation was indeed correctly computed. And I don't have to reveal anything else. I'd say that, you know, in this context, secure multi-party computation and trusted execution environments are similar. They have very different threat models and deployment models, but they both enable computers and people to run computations over data that they cannot see. If I'm a computer in a trusted execution environment where it's part of an MPC network,
Starting point is 00:11:40 I am not seeing the data that I am actually computing over. Where it comes to zero knowledge proves, by definition, I have to have the access. If I'm the prover that is proving some computation, I need to have access to the data, but I don't have to reveal that data to anyone else. So let's just to make sure I get this right. The difference here is like in Zer Knowledge Proofs, it's I want to prove, I want to do the computation and keep the data private.
Starting point is 00:12:09 So I can have a private input, run the computation myself, and give you the private output, and I never have to give you the input. But if I'm trying to outsource the computation, then, you know, zero knowledge proves don't really help in this situation because whoever's running the computation needs to know about the inputs. and that's where the secure multi-party computation comes in. Is that about right? That is very much right. And that is also true if you want to combine different data from different users, right? Like if you're trying to do it, like if you're trying to do a secure auction, for example, then all the users will need to send someone their beads.
Starting point is 00:12:49 That entity, that person can then prove that they computed the winner correctly and only revealed that. but that person with the zero knowledge proof but that person would still need to see all the beats. So this is kind of like, you know, one of the drawbacks of using something when you use something like Zcash is it's very computationally expensive
Starting point is 00:13:09 for the users of Zcash because if they want any privacy, they sort of have to do these currently very expensive snark computations themselves and there's really no way to outsource that computation without
Starting point is 00:13:25 basically leaking their privacy. And so is this something that like the realm of where the secure multi-party computation, it will like be able to solve that problem? Right. So obviously secure multi-partic computation does has its overhead as well. But in many computations and in many cases, being, you know, doing a multi-party computation is faster than being a prover in a big enough circuit. So and you can definitely outsource it to some other parties and
Starting point is 00:13:55 the users don't have to participate in the computation. There are benefits for users to still participate in the computation, but they don't have to. So yes. Okay, so now let's like, you know, delve in a little bit into the Enigma Project. And so we have these, like, different ways of doing things, of privacy preserving technologies. And, you know, your goal was to create a privacy preserving,
Starting point is 00:14:25 public blockchain network. And so how are you kind of deciding which of these avenues to focus on and go down? That's actually a very good question because there's a lot to do, right? And I still believe very much
Starting point is 00:14:40 that despite the kind of adversarial aspect that a lot of people take in the space and think you know, it's like there can only be one. I very much believe that like we're still building all the building blocks and we are here to collaborate and kind of feed off each other. So it's great to kind of work together and see like how we can help
Starting point is 00:14:59 other projects. So with that said, the way that we started is basically by focusing on what we call the compute layer. So the way we conceptualize, you know, the full Enigma protocol is that there are two big pieces of it. One piece is the verifier and that's the blockchain. The blockchain should only really get proofs of computations, right, and results, and basically store them over time and deal with payments, but it should not do heavy lifting computations. That is also similar to other philosophies from other projects like Truebit, but to be honest, this has already appeared in the 2015 Enigma White Paper. So we shared that belief even before it became so well-known. The other layer is the computer layer. That's basically where nodes can run privacy
Starting point is 00:15:49 preserving computations at higher scale because not every node in the network has to run all computations. And they can do that either using trusted execution environments, which is what we have right now in the first release, or they can do that as a secure multi-party computation protocol, which is what we're going to release later on probably next year. Maybe we should start with then the current implementation that you have now and then go talk into the secure multi-party one later. So, you know, you guys released your first version, like version 1.0 recently and have a public test net using the trusted execution environment system. Could you tell us a little bit about what you have in this like version one and what it really does? It's actually pretty cool because version one, the way we built it, is that it's very easy for developers, for example, in Ethereum to get going.
Starting point is 00:16:55 The way that it's done today, and that's what I mean about, like, we're choosing our balance and we choose what to focus on. So the way that it works right now is that you write just a regular smart contract in Solidity and you deploy that to Ethereum. We have some API that, you know, under the hood registers that as an enigma, secret contract. That's what we call smart contracts that, you know, can deal with private data. And the way that it works is that you can designate specific computations, specific functions, as functions that you want to run in a privacy-presembing way. The way you do that is that users can basically encrypt all of their inputs, and there's some elliptic curve Diffy Helmand happening between a user
Starting point is 00:17:40 and one of the Enigma nodes that, you know, and one of the enclaves in the Enigma nodes, to basically generate an encryption key that the user can use. The user then encrypts inputs to a later computation and pushes that as a transaction to Ethereum to that smart contract state. Then users can actually call a function execution, of the smart contract to happen in Enigma. What then happens is that the bytecode is being loaded
Starting point is 00:18:12 into one of the Enigma nodes. We call these a worker node. This worker nodes pushes that bytecode inside the enclave and pushes the encrypted arguments from the Ethereum blockchain inside the enclave. And then the enclave and only the enclave can decrypt that information because only the enclave has the right information
Starting point is 00:18:33 to derive the key. the crypts the information, runs the computation in an EVM inside of the enclave, and then pushes out the output and commits that back to Ethereum. So you can really do interesting things like you can have a smart contract in Ethereum today, but if you have like, you know, very sensitive functionality is like voting, you know, you don't want the votes to be public or like auctions or like random number generation or many, many other ideas, you can basically. basically offload them to Enigma and then push the result back to your smart contract in
Starting point is 00:19:09 Ethereum. So that's what we have today. So I just want to look at this from an application developer's perspective. So let's say, for example, that I'm building some sort of a voting app, which is a great example because you want your votes to remain private. I build my DAP just as I would, you know, if I was building it in a non-privacy preserving way. Then where I got lost is how does the the Enigma blockchain or network retrieve data from the smart contract. Is the Enigma network parsing your Ethereum blockchain or is the smart contract sending data to Enigma? So all the Enigma workers, they basically have a connection to the Ethereum blockchain, right?
Starting point is 00:19:53 The way that this works is imagine you have a voting application, smart contract. You have to write a smart contract in a way that it can store a blob, like just an array of encrypted votes. What then happens is that every users that want to cast their vote can basically send a transaction like, you know, add vote or like store vote. And that vote itself is encrypted with the key that only the enclave in Enigma can know. There's some more details, but I'm, you know, I'm removing them for simplicity. All those votes, encrypted votes, are just stored in the smart contract state in Ethereum. Ethereum doesn't understand these encrypted votes. This is just like random junk to Ethereum.
Starting point is 00:20:39 But basically, the user can then submit a request to the Ethereum blockchain to run a computation that should run on Enigma. That actually spins out and speeds out an event, right? The node, all Enigma Worker nodes are basically listening to those events coming in. It picks up the computation. It takes the computation from Ethereum. On Ethereum, nothing really happens. It only happens on Enigma.
Starting point is 00:21:08 Enigma takes that computation, runs the logic of that function. What I didn't mention is that the user needs to point to the encrypted arguments that are already stored in the contract. The Enigma node knows how to parse it. So what it does is, it sends the encrypted arguments from the contract. the Ethereum smart contract inside the enclave, it sends, it pushes the bytecode from Ethereum for that function. Let's say the function is, you know, tally votes, pushes that into the enclave. The enclave then decrypts all the votes submitted from all users and just, you know, sums them up, sees like, who's been voted, and then pushes the decrypted result back to
Starting point is 00:21:49 Ethereum. So on the Ethereum side, it's kind of like, you know, you explained that they're writing all their stuff in solidity and their smart contract just has to know how to to accept this in encrypted data and like sort of pass the data out and bring it back in what are they writing on the sgx side so like what uh i know the sgx the bindings currently are mostly in c++ are the developers having to write like c++ code that runs in the sgx or what yeah what are they writing there so that's really the cool part they don't need to to do any of that they just write solidity code. They have like a library. It's kind of like based on Web 3, like an extension that allows you to like handle from the user side, from the DAP side, like all the encryption
Starting point is 00:22:37 and that kind of stuff and like sending transactions that pass encrypted data and all that. But other than that, the developer doesn't need to understand anything about SGX, anything about like C++ code or anything like that. So then the Enigma network just understands the the solidity smart contract and sort of interprets the code there in a way that it will know how to interpret this data
Starting point is 00:23:04 given the function that is meant to execute on the data? Exactly, yes. So essentially what you're doing is you're running an EVM inside of the enclave and so anyone can submit any traditional smart contract
Starting point is 00:23:20 there. One question I have then is, you know, we talk about like the data being able to be encrypted, like for example, you know, the votes are encrypted and we send it and they get decrypted in the enclave. Is it possible to decrypt the contract itself? Like, could I encrypt the EVM bytecode and have that be only decryptable in the SGX? So, you know, maybe there's a case where two companies want to have like a private contract together or something. Is that something that's possible? So it's actually, there's been an interesting discussion in our forum about that.
Starting point is 00:23:56 It's a very small technical change. We don't support it right now, but it's like something that can be done very, very quickly. I'm not sure we would want the first use cases to be that because if you do it that way, yes, it's technically possible. It's very easy to encrypt the bytecode and decrypt it inside the encraib. But at the same time, there's a lot of trust in the person that is basically producing that code,
Starting point is 00:24:21 which you cannot validate the code. not like test it for like bug, security bugs and stuff like that. So while it is possible, it might not be the first applications we want to encourage. Maybe we can talk a little bit about the security model of the SCX. You know, that's often one of the most concerning things for a lot of people. And quite honestly, for me, it's often a little bit concerning as well. So, you know, oftentimes people kind of like laugh about SGX where it's like, oh, why do you need decentralization when you can just, you know, trust Intel. What do you think about, like, you know, how does this SGX trusted hardware security model?
Starting point is 00:25:01 How does it, what do you think it's a relationship with like the decentralization is? I actually, I think this is a very valid point. And we are having a lot of conversation about it on where it makes sense, where it doesn't. And it is also still a work in progress. The way that I view this is that like the entire technology stack, SGX is another very strong part of the toolkit, of the tool chain. which we should not discount. I think there's a lot more work to be done there
Starting point is 00:25:29 and a lot more research. So when I started researching privacy preserving technology is three years ago there was very little in the way of SGX. Actually, one of my advisors is actually the first professor, the first person to write an extensive paper about SGX and about how it works, and it was around, I think, 2014, 2015.
Starting point is 00:25:52 But since then, we've seen a lot more research in that space, and I think that's where it is. It's a very much ongoing research on how to harden SGX, on how to improve SGX, and not just SGX, process execution environments as a whole. And over time, we need to figure out what makes sense and what doesn't. Maybe SGX is okay for integrity, but not for full privacy. Maybe some aspects are okay, but not all of it. So that's very much an open research question to still look at. I think most concerns that we've been getting around SGX is around the ability of like
Starting point is 00:26:30 side channel attacks and data leakage. I think there's less concern right now about integrity issues as far as I've been able together. So I think that's again something to figure out is SGX okay for correctness, like proving stuff, but less okay for privacy. Another aspect to keep in mind is availability. Right, SGX does not give you availability in any way. If you know it goes offline, you still need blockchain to penalize them.
Starting point is 00:27:00 So that's something that should be in any case. In the SGX security model, there's like, for me at least, what I see it as is I would prefer to use an SGX for when I'm running my own computation. So, you know, yes, I know there's like all like Intel stuff with like Spectre or Meltdown, you know, I don't know. know if we can trust the hardware or 100%, but let's assume for a moment I say I'm running an Intel CPU, so therefore, you know, I'll just check, I'll trust my, this, if they are saying that this is more secure, then, you know, I might as well trust that as well. But that's like, you know, let's say I'm running some very, some program that I need to be extremely secure on my own machine. The problem with what the SGX is trying to propose to the world is it says, oh, not only do you
Starting point is 00:27:47 have to trust your SGX in your personal machine, you have to trust the SGX in someone else's machine. And the problem to me really comes down to the fact is, how do you know what's an SGX or not? And currently, the way Intel does it is they have this like singular server, basically, that's like a remote attestation service, which is like Intel gets to say what is a valid SGX. And the problem is if that one service gets hacked, right, anyone can create a, add a fake public key to that list of valid SGXs and pretend to have an SGX when in reality they don't. And do you think this is like a legitimate threat to the entire security of this, like of the SGX side of Enigma where like it is essentially a singular central point of fail? I'm actually much as concerned about that because I know for a fact that Intel has in their roadmap, they actually want to move away from that.
Starting point is 00:28:52 So I think this is very much a temporary thing. I also think trusted execution environments are not limited to SGX, although it is the most widely spread solution right now. So again, I very much think that this is a work in progress to improve these kind of things. Also, there are ways to mitigate right now. There's an interesting thesis about how to reduce the technical trust in the Intel attestation service. I can't remember the name right now, but
Starting point is 00:29:18 there's a thesis about it and it proposes different models. We've actually employed one similar model where it's basically you you do have to still go through Intel for like the first attestation of an
Starting point is 00:29:34 enclave, but then after you do that you generate a key in a sudden enclave and that key is used to sign like messages and like you know operations which is, you know, some improvement. There's still a lot more to do, but I think we're on the right direction. I think it's going to evolve to a place where that's not that much of a big issue. So coming back to the Enigma Network, can you just sort of describe the network topology here?
Starting point is 00:29:59 The way I understand SGX is that you don't necessarily need an enormous amount of nodes. You just need sort of one trusted execution environment that can do the computations. How many nodes exist in the network? How do they interact together? Are there multi-party computations over SGX? Is this sort of thing also possible? It is, but let me, that's like jumping a lot ahead. So the way that it works is, there's the blockchain,
Starting point is 00:30:27 and you can think of the blockchain, is having its own network and its own nodes. These nodes can be the same as Enigma Worker nodes, but for simplicity, let's separate them. So in Enigma, there is a peer-to-peer network of what we call worker nodes. These nodes right now register on chain, right now on the Ethereum blockchain. In the future, it's going to be in our own chain.
Starting point is 00:30:50 And basically register and prove that they have a trusted execution environment enabled, and they put stake so that they can be penalized if they misbehave. And then they can join the network for computation. And the way that it works is that basically for every computation right now, we sample one worker or several workers to do to execute that computation. That's kind of the way in a nutshell, there's a lot of optimizations we're working on. But in a nutshell, that's kind of the way that it's being done right now. A lot of these like concerns around the security model that I was discussing around the SGX,
Starting point is 00:31:31 that's sort of why it is the first version. But really your long-term goal is to delve into the whole secure. multi-party computations, right? Maybe it's a good time to jump into that. Could you explain to us, because I don't think many of our viewers or listeners are very familiar, what is secure multi-party computation? Sure. So before I touch it, I do want to mention that I think what we need Enigma care about the most is to solve the problem of privacy preserving computations at scale, right? And there are different techniques to do. it and we're taking a pragmatic approach. So yes, trust of the execution environments are one
Starting point is 00:32:15 direction to do that, which works right now and allows people to develop right now. And it is going to be a constant research and constant track to improve that. At the same time, there are other technologies right now. My personal favorite and what I think is the most promising, and also because I researched that for several years, is secure multi-party computation. So secure multi-party computation actually says this, and it's a good way to move from trusted execution environment to that. It says the following. It says, in an ideal world, we would have, like, one supercomputer that we can fully trust.
Starting point is 00:32:55 We can outsource computations to that supercomputer, right? And that supercomputer will never leak the data to anyone. It can never be hacked, and it will always compute the correct result, and we can trust it to do that. That is an ideal world, and trusted execution environments try to emulate that in the best way possible with secure hardware.
Starting point is 00:33:19 But in practice, we may not want to trust these, because there are weaknesses, there always are, and it's a constant improvement. But can we come up with a better scheme that is based on just cryptographic assumptions? Multi-party computation says the following. Let's say you don't have one computer, but instead you have several untrusted computers. It's very similar to consensus in the blockchain world.
Starting point is 00:33:45 You can't trust one computer, but you have several untrusted computers. Let's make it simple. If you can say that at most half of those computers would be bad, would be malicious, would be adversarial, then you're fine. You can consider that network, even though you cannot trust the computers individually, as a whole able to achieve those two properties. It could run any computation with provable privacy, which means that data would never leak, not even to the nodes running the computation, and never to the outside, and that all computations would complete successfully. So if you get a result, you know it's the right result. That's basically the theory of MPC.
Starting point is 00:34:29 Could you maybe touch a little bit on the technicals of how it works? From my understanding, there's usually some sort of secret sharing that you do, and you split the data over multiple shards, like you shard the data and you perform computations over those. You're probably the best person to describe it. Give us, give the viewers a brief summary. Sure. So there are two main techniques to do MPC. One of them is Yao's garbled circuits. It's a bit different, so I'm not going to touch it. The other is using secret sharing. Secret sharing is basically this idea that, let's say you have a piece of data, you can split it into N pieces such that any T of them, basically any subset of some size, and that's a parameter that you can define, can be used to reconstruct the data. But if you have less than that subset, then no one can reconstruct the data. It's completely random. It's completely useless. So that's the main building block of MPC. And now let's think of an example like that.
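For readers following along, the secret-sharing building block Guy describes can be sketched in a few lines of Python. This is the simplest all-or-nothing (n-of-n) additive variant over a finite field, purely illustrative and not Enigma's implementation; threshold (t-of-n) schemes like Shamir's use polynomials instead:

```python
import secrets

P = 2**61 - 1  # a Mersenne prime; all arithmetic happens modulo this field

def share(secret, n):
    """Split `secret` into n additive shares; all n are needed to reconstruct."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

shares = share(123456, 10)
assert reconstruct(shares) == 123456   # all 10 shares recover the secret
# Any 9 of them are just uniformly random field elements and reveal nothing.
```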
Starting point is 00:35:45 So let's use the example of a salary, okay? That's usually the example that's given. So let's say you have Alice and Bob. Alice has her salary, Bob has his salary. They want to compute the average, but they don't want each other to know how much each of them is making, and they don't want anyone else to know that either. So what do they do?
Starting point is 00:36:09 They do an MPC protocol. And let's say there's now a network of 10 nodes, okay? They could actually do it between themselves, but let's take the example where there's a network of 10 nodes that they outsource this to. What Alice does is basically secret-share her salary between all of those 10 nodes. That means that every one of those 10 nodes gets one share of the data.
Starting point is 00:36:32 But again, that share is completely useless on its own. And let's say that the way this has been designed is that unless you have all the shares, you cannot see the data. Then Bob does the same thing, secret-shares his salary to all the nodes. And now each node basically has a share of Alice's salary
Starting point is 00:36:52 and a share of Bob's salary. No node can actually see these salaries, but it turns out that they can go through a multi-party computation protocol to compute the average salary. Because this is a very simple use case, really all that each node needs to do is take its own shares of both salaries, add them together, divide by two, and basically run the same computation as if it had the whole data.
Starting point is 00:37:20 And then what they get is: each node gets a share of the result. Now, if you want to decrypt or reconstruct the result, what they have to do is send both Alice and Bob, let's say, all of the 10 shares of that computation that they did. And now Alice and Bob, having all of the shares, can reconstruct the result
Starting point is 00:37:47 and see what the average salary is. And what do you get? No one learned the other's salary. The network itself, and this is even more interesting, the network that you outsourced to did not learn anything. It ran a computation, but it did not learn anything about it. The only way around that is if all of them were malicious and colluded, and that is why you need to put some constraint on the network.
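The full Alice-and-Bob walkthrough can be simulated the same way. Again a purely illustrative sketch (the salary figures are made up; the code is not Enigma's): each node adds its two shares locally, the free linear operation, and only the reconstructed sum is ever revealed:

```python
import secrets

P = 2**61 - 1  # prime field for the shares

def share(x, n):
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((x - sum(s)) % P)
    return s

alice_salary, bob_salary = 90_000, 110_000
n = 10

a_shares = share(alice_salary, n)
b_shares = share(bob_salary, n)

# Each of the 10 nodes adds its two shares locally (a free, linear operation);
# no node ever sees either salary.
sum_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]

# Alice and Bob collect all 10 shares of the (still hidden) sum, reconstruct
# it, and only then learn the average.
total = sum(sum_shares) % P
print(total // 2)  # -> 100000
```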
Starting point is 00:38:08 In this case, the constraint was that at least one node is honest. In the average scenario you talk about, you know, you add the two together, then you divide by two, and that's the result. The classic example that's always used in the MPC world is the whole classical millionaires' problem, right? A very similarly designed problem, except that Alice and Bob are just trying to figure out who has more money, right? The question I have is, so, you know, what I would do is, okay, secret-share both of them,
Starting point is 00:38:41 subtract one from the other, and so if it's positive, one has more, if it's negative, the other has more. But how do I return this result to Alice and Bob without telling them the difference? I just want to tell them whether it's positive or negative. Is this something where now, like, oh, now I have to use some other thing, like maybe a SNARK or something? Maybe the MPC computers don't want to give the end result; they want to show just, I have a SNARK to show you that the end result was negative. Or how does this work exactly? So you don't really need SNARKs or anything for that. Let's take the simple case. So the place where you need, like,
Starting point is 00:39:20 zero knowledge proofs is if and there are different ways to do it is if you want the notes to make sure that they render computation correctly, right? Because if I'm a node, I have a share of the data, I need to, how do I know that one of the nodes didn't, like, switch its own share of the data with some random garbage, like with all zeros, right? That's where, like, proofs come into play. But let's take the simple case, we call it a semi-honest case, where we assume the nodes, we don't want them to learn more information than they have. which is their own share, but they won't go and do like those kind of Byzantine behavior type of things.
Starting point is 00:40:01 So in that case, the computation is just more complicated. So there's a very known theorem, and it's very well known by now, that any computation, any function, any computation that you have, you can basically translate that into a circuit of additions and multiplications, and you can basically compute any functionality. So in this, case, and you can compute that without revealing any data that you don't want to. So in this case, what you would do, yes, you would subtract one from the other. The protocol is the high level goes like that. You subtract one from the other.
Starting point is 00:40:35 Then you need to check: is the result greater than zero or not? That's the next step that the nodes do together, which is more complicated. And then they get one bit as a result. It is either one or zero, true or false. So they don't have to leak more information. That result is also encrypted, in the sense that each node has a share of that bit. It never leaked; nothing more than that leaked to any of the nodes. That's very important.
Starting point is 00:41:02 It gets a bit more complicated, because the way that we work, for security reasons, is in what's known as finite fields. So we don't really work over the integers. So the whole thing of, like, you know, negative, not negative, there's no order in finite fields. So there's another check that you need, to see if the result wrapped around the field or not, which again gets a bit more complicated. But the point is, you can do it, and the nodes can do it, and they can get to the point where the end result is just one secret-shared bit: was, you know, is Alice richer than Bob, okay? And then they can return that. So no SNARKs, no anything, just plain old MPC. I see.
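A small sketch of the wraparound issue mentioned here, run in the clear purely for illustration (the real protocol performs this check on shares via a comparison sub-protocol, so no node ever sees the difference): signed integers mapped into a finite field have no natural order, so "negative" is read by convention as "above the field's midpoint":

```python
P = 2**61 - 1  # prime field

def encode(x):
    """Map a signed integer into the field."""
    return x % P

def wrapped_negative(x):
    """Convention: field elements above the midpoint represent negatives."""
    return x > P // 2

# Alice (90k) minus Bob (110k) wraps around the field:
diff = (encode(90_000) - encode(110_000)) % P
print(wrapped_negative(diff))  # -> True: Bob is richer
```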
Starting point is 00:41:51 Does this process of running an MPC computation work like, you know, I take the data, it gets split, everyone runs this thing in parallel, and then there's one last final merge step? Or is there a lot of communication needed between the computing nodes? So there's a lot of communication needed. That's where basically MPC becomes less
Starting point is 00:42:16 efficient, of course. That is the main trade-off of using MPC compared to something like trusted execution environments. Sure, we've been making a lot of improvements. Like, the academic literature in the last decade has made a lot of improvements. I, in my thesis, have also made some improvements,
Starting point is 00:42:32 and that's a continued process, and at Enigma we're focusing a lot on how to improve it further. But that is the main cost. So in MPC, the way that it works is that linear operations, which are basically addition, multiplying by a public scalar, that kind of stuff, can be done locally.
Starting point is 00:42:55 It can be done without communication, and it's very cheap. It's basically free; we call it free. But whenever you want to do a multiplication, by the way that the protocol works, all nodes in the MPC protocol have to communicate with each other and kind of sync with each other to be able to reach the end result. And every time you have a multiplication, you've got to do that. So that is basically where MPC becomes less efficient, and there are a lot of ways to improve that, but it is an effect of the protocol.
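The linear-versus-multiplication distinction can be made concrete with a small Python simulation, illustrative only and not Enigma's code. Addition of shares is local and free, while multiplication needs a round of interaction, shown here with the preprocessed-triplet trick (often called a Beaver triple) that Guy describes a bit later in the conversation:

```python
import secrets

P = 2**61 - 1  # prime field

def share(x, n):
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((x - sum(s)) % P)
    return s

def reconstruct(s):
    return sum(s) % P

n = 3
x_sh, y_sh = share(6, n), share(7, n)

# Linear operation: each node adds its shares locally, no communication needed.
assert reconstruct([(u + v) % P for u, v in zip(x_sh, y_sh)]) == 13

# Multiplying shares pointwise does NOT yield shares of the product; the
# standard fix is a preprocessed random triplet c = a*b, itself secret-shared.
a, b = secrets.randbelow(P), secrets.randbelow(P)
c = a * b % P
a_sh, b_sh, c_sh = share(a, n), share(b, n), share(c, n)

# Nodes open the masked values d = x - a and e = y - b. These are uniformly
# random, so they leak nothing about x and y -- but opening them is the
# round of communication that every multiplication costs.
d = reconstruct([(xs - s) % P for xs, s in zip(x_sh, a_sh)])
e = reconstruct([(ys - s) % P for ys, s in zip(y_sh, b_sh)])

# Each node's share of x*y: c_i + d*b_i + e*a_i (+ the public d*e, added once).
xy_sh = [(cs + d * bs + e * as_) % P for cs, bs, as_ in zip(c_sh, b_sh, a_sh)]
xy_sh[0] = (xy_sh[0] + d * e) % P
print(reconstruct(xy_sh))  # -> 42
```

The algebra behind the last step: c + d·b + e·a + d·e = ab + (x-a)b + (y-b)a + (x-a)(y-b) = xy, all modulo P.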
Starting point is 00:43:27 So, you know, you secret-share this thing. Everyone has, like, a share of a polynomial with a certain degree. The issue is, when you multiply two shares, you multiply two polynomials of the same degree, so now you have a higher degree, and now you have to somehow communicate with each other, like, okay, how do we reduce the degree of this polynomial again? Is that essentially the reason for this communication?
Starting point is 00:43:52 Yes, that is very much correct. There's actually a more efficient approach that's becoming heavily used, called SPDZ, which we also employ. That approach basically uses another mechanism instead of the degree reduction. So what it basically does is you pre-process
Starting point is 00:44:24 a lot of what's known as multiplication triplets. So you create random values where, you know, C equals B times A. These A and B are completely random, and the resulting C is random as well. And you can batch these offline very efficiently, because it's not related to any data in the computation. And then any time you do a multiplication, you consume one such, what we call a
Starting point is 00:44:47 random triplet, and then you don't have to do a degree reduction. You can actually have a much more efficient protocol, and that's what we use. But yes, what you're saying is correct; that is the original reason why you need to do that. So to get an intuitive understanding, what is the communication complexity like? Are we talking about n squared with a very high constant? Are we talking about n cubed? As I add more nodes participating in the multi-party computation, how does my overhead increase? So usually we like to talk about how many values, in one round, as a
Starting point is 00:45:25 you know, as a factor of the nodes in the network, how many values are being communicated and worked on. So in the original protocol, which is called BGW, which is the degree reduction one, what you need to do is that every node basically needs to take its share
Starting point is 00:45:41 and reshare that, right? And then send every other node in the network one unique value. So you get n squared, because each node needs to send n unique values, so that's n times n, which is n squared. In the triplets protocol, what's interesting is that you only need n values. The reason is that every node just has to broadcast
Starting point is 00:46:08 another one value for all the nodes, all the peers in the protocol, so you only get any unique values. So that's an amazing improvement. Obviously, there's a question here, is broadcast free or not, right? If you assume the broadcast is free, then it's just one value and it's completely linear. So it depends, really depends how you count it. But all of these are dependent on N, right, of the size. So in my thesis and in the original Enigma White Paper,
Starting point is 00:46:37 and we've seen more and more protocols actually take this approach today, but back in 2015 this was super unique, and it is still unique in the context of MPC. We said, let the network grow. What we need first is a sampler, right, that can basically sample a group of MPC nodes for each computation, and then that group is of constant size. The constant there is still high, because for security,
Starting point is 00:47:01 that group might be, you know, you might want a big enough group, but at least the scalability does not depend on how many nodes are in the network. All of this that we've been talking about now is in this security model that you mentioned briefly, the semi-honest model, where I'm kind of assuming that at least some percentage of the nodes, whatever my K-percentages, I guess, whatever I'm getting the data from, are non-B Byzantine. They're running the computation correctly. One of the questions that I have, especially, you know, it kind of, particularly due to my work at Cosmos and something that we're very particularly interested in is, is there a way to do MPC computation with attributability? So I can say that like, oh, if there's a Byzantine node, they're trying to give me part of my computation, can we realize that they're like griefing by like, you know, changing everything to a zero?
Starting point is 00:48:03 or and is there can we actually figure out which of the nodes was the one who tried to grieve everyone else? Yes, you definitely you definitely can and actually it's possible to have all nodes attributable. There are different techniques to do it. And I believe that in the systems that we're building, we really need that. So like one third or half of my thesis actually focused on, you know, kind of formally proving that if you you really want to create a system with like proper liveness and that, you know, continues to make progress, you've got to be able to identify each and every node that, you know, misbehave because you want to be able to penalize them, because if that's not the case,
Starting point is 00:48:50 you get freeloaders. And that's definitely not, not something that we can live with. So the answer is yes, and there are, but there are caveats, right? And this is where we're putting a lot of attention. The coverts are usually around efficiency. So there are two main systems, right, that I can propose at least for the sake of this conversation. There's what is known as detectable MPC. Detectable MPC basically says that even if one node misbehaves, you can detect that. You won't know which node, but you can detect that.
Starting point is 00:49:28 This is great because at least, you know, we will never get. the wrong results, which is very important. So no one can really behave in a Byzantine way, but it's hard to create incentives around it because it's not an attributable fault. It's not the next level of what's called identifiable NPC, which is like attributable NPC. It's the ability to uniquely say who cheated and when.
Starting point is 00:49:52 So it turns out that we can do detectable MPC very efficiently. The nodes basically share some Mac key. And they kind of, as they go through the computation, they also on the side compute over a computation that is multiplied by that Mac key. And at the end of the computation, you basically try to see if the result that you got in the real computation, multiplied by that shared key, evaluates to the Mac. And, I mean, there's really no way to do. The probability of that happening randomly is like, you know, inphytest. small. So that is a great way to detect, but you won't know who cheated. The other way is where other mechanisms like zero knowledge proofs, and this is why I always say zero knowledge
Starting point is 00:50:42 proofs is complementary. It's not like, you know, it does a different thing. Zero knowledge proofs, and especially sacaned zero knowledge proofs, could be very efficient in the sense that let's say we ran a multi-party computation protocol. We detected that there's been a fault, but we don't know how to attribute that to one fault. Okay. Now we can actually as the nodes, okay, produce a proof that you run the computation correctly and in zero knowledge and probably in a second way proof to a blockchain that you did it correctly. If you didn't, the blockchain can then go verify these efficiently and penalize you.
Starting point is 00:51:20 The problem why you probably don't want to do it every time is because zero knowledge process is super expensive. So I am a big believer in taking what's known as an opportunity. optimistic execution route. So do something fast, just make sure that you can detect if something bad, really bad happen. And if that has happened, you can file a dispute. And maybe you can do something like TrueBeat. Maybe you can do something like zero knowledge proof.
Starting point is 00:51:46 That's where I believe this can come in. But you can 100% do these. That's great. That's good news to hear. So there's also one last question on the MPC side of things. There's this concept called, homomorphic encryption. And I've always like, you know, I don't really know too much about it. I've always just heard jokes about how it's like a unicorn of computer science. Can you just
Starting point is 00:52:09 quickly explain very briefly on what this fully homomorphic encryption is? Is it at all related to secure multi-party computation? Is it a subset or is it just something completely different? So fully homomorphic encryption is another technique to basically compute over encrypted data. It is actually the cryptographic equivalent of trusted execution environments. You don't. So in MPC, you do that by not trusting one node by taking a lot of untrusted nodes
Starting point is 00:52:40 and as a whole you can trust them, by, you know, secret sharing and all that. With fully homomorphic encryption, it's basically a type of encryption where you can still run computations over data while it's encrypted. So this would be perfect, right?
Starting point is 00:53:01 We don't even need this. Well, we need decentralization for availability, but we don't need decentralization for the data privacy. Just send it to one computer that one computer would compute over the encrypted data would really not know what it is computing about. So you can see how this is very similar to trusted execution environments, but the guarantee is fully cryptographic. It's not by a secure hardware.
Starting point is 00:53:26 This is what fully home-offeric encryption is. It allows you to fully compute in one node over encrypted data and then return an encrypted result. The problem is that it's very inefficient. It's very inefficient unless you're doing maybe like a lot of additions. But again, when multiplications come into play, it turns out that this becomes like really inefficient. So for general purpose computation, it's just really theoretical.
Starting point is 00:53:53 However, there has been places that, you know, somewhat homomorphic encryptions or like small, small circuits of homeorphic encryptions are being used to day and is useful. And one example is in some MPC protocols. So, for example, when I gave you the idea of how do you generate these random triplets for MPC, these C equals A times B, it turns out that this is just one multiplication and a lot of ideas. so you can actually do that with fullyomorphic encryption. So long story short, it's a different idea. It's inefficient at scale and general purpose, but we're using that in small things. It's actually even using ZK Snarks in some small extent.
Starting point is 00:54:42 A bit earlier we talked about the incentive models, and I'd like to come back to this, specifically talk about the token economics in Enigma. So there is a token, and for the moment, It is an ERC20 token living on Ethereum. You mentioned earlier that the token would at some point live on the Enigma blockchain. So one can assume that an enigma network, a proprietary enigma blockchain will exist in the future. Talk about the role of the token here.
Starting point is 00:55:15 We've already talked about disincentivising bad behavior, but broadly what is the role of the token? The role of the token is really to maintain the security of the network. This is not unique to Enigma. It's very much like most blockchains. It's the idea that tokens incentivize nodes for computations. So if I'm a user, you know, I'm executing a function in a secret contract, then I have to obviously send some payments, some fee, for whatever workers are running that computation.
Starting point is 00:55:50 And the workers are getting incentivized, you know, with Enigma tokens. So very similar to other blockchains, but at the same time, if nodes misbehave in the computation, then we need the ability to penalize them, which is why nodes need to also deposit the stake. Okay, and so can you give us an idea of, like,
Starting point is 00:56:10 how that stake is calculated? Is it based on the complexity of the calculation, or is, like, how does that work? So we're still determining the numbers, but the stake is most likely going to be bigger than, or even much bigger, than whatever, what the biggest computation could realistically be like.
Starting point is 00:56:31 So the way we do this is we define network iterations in like pretty large epochs. We don't have to get into that. So we can imagine how many computations or how many steps can, you know, live inside one epoch, and the stake has got to be larger than that. And what about for users?
Starting point is 00:56:54 Do we have an idea of how the computation price would be calculated and how much can, you know, if we come back to this voting mechanism example and sort of a somewhat simple, simple function like tallying up votes, what is, you know, an accurate representation of what that would cost a user using the data? So right now it's calculated based on, just the same gas model as Ethereum. We are working on adjusting that. One thing we didn't discuss is that we're basically rewriting all of the backend to be based on WebAssembly. We feel that's where the industry is going.
Starting point is 00:57:39 And so the cost model is going to have to change based on that. So we're working on it right now. It's also important for us because we're basically going to have two interpreters. We're going to have an interpreter for WASM inside the trusted execution environment, which is just a normal one, but we're going to have an interpreter that simulates multi-party computation as a WASM interpreter. So a WASAM interpreter that does multi-party computation, and that has a different calculation. The reason in that is that, you know, let's say we said comparing two numbers, right?
Starting point is 00:58:17 comparing two numbers in the trusted execution environment, in Ethereum, it's one-up code, one operation. In NPC is going to be multiple operations and some network communication. So we got to factor that in. Still work in progress, but that's the general idea. Is there an incentive model to incentivize nodes to stay online, basically an availability incentive? there's a penalty for that so the way that it works is that we sample different nodes right for different computations per epoch basically there's like one epoch and epoch could let's say last a day in that epoch a group of nodes can are going to be in charge of computations for this smart contract for this secret contract and we kind of shuffle them around now we So inside that epoch, we have smaller time windows that are known as timeouts. If a node does not respond to a computation request within that timeout, then it can be penalized.
Starting point is 00:59:27 Okay, interesting. So before I wrap up here, can you just give us a brief overview of what are some significant milestones that you've reached recently? Where is the project currently at? And what are some of the plans for the roadmap in the coming months? we released a testnet the first one a few weeks ago so that's actually very exciting we have a lot of projects and partners that we're working with and that are trying the code and continuously giving us feedback what we're seeing from that feedback is that a lot of our a lot of the assumptions that we made
Starting point is 01:00:03 were like in terms of features are wrong and should need to evolve and that is great that is exactly what we wanted. So there are a lot of new features that we're working on right now that we didn't originally expect it to do, but they're like crucial for developers. For example, having an encrypted state. So the way that the network works are right now is that users can encrypt an argument, right? Then you can call a stateless, essentially a stateless function in Enigma, and the result of that set set set foot is then being committed to Ethereum. But people want to be able to actually carry state, encrypted state over and over, that comes up in many applications, you know, for example, a machine learning model, right?
Starting point is 01:00:49 Like, you want to be able to get some small chunk of data in an encrypted form, take a currently encrypted model, and train that together, and then you want to maintain the new encrypted model as it is at time 10. Then at time 11, you want to evolve it a bit and save it to time 11, and you want to continuously keep that encrypted. A lot of use cases require that kind of encrypted state. Another thing that we're seeing is that, you know, EVM is somewhat weak to a lot of the use cases that we want.
Starting point is 01:01:29 So we are working very hard on moving to WebAssembly, and many, many other aspects like that. We have a full list. So that's one thing. The other thing that we're basically, the other two things that we're focusing on in 2018 are moving the state from Ethereum to moving the state to our own network and basically encoding the state in a way that fits Enigma, feeds this idea of encrypted computations and makes it very efficient to that.
Starting point is 01:01:57 That turns on to be a very non-trivial and difficult problem because no other blockchain project has had to keep the idea of encrypted in state and encrypting arguments in mind. And the third thing is actually continuing work on the MPC interpreter and MPC functionality. The goal is in 2019 to enable what we call secret contracts 2.0, which is the ability to write privacy preserving smart contracts with general purpose multipartic computation. This move from the Ethereum blockchain to your own network, is this something that would also allow, users to use Enigma with other smart contract platform, Ethereum Classic, Cardano, or whichever ones that are to emerge in the future.
Starting point is 01:02:47 So that's what we want. The way we see it, we want Enigma to be a fully-contained solution, and that is something that we evolved into. We realized that the only way to create, really make the right design choices, is if we create a full solution and create our own blockchain. but at the same time we want other blockchain, we want to create bridges to other blockchains as well. That's something that we see is really important, but the way we think we should be doing that is basically through partnerships.
Starting point is 01:03:17 I don't think we would have the core capacity to do that. The way is to talk to other projects, we have a partnership with Iron, for example. We've created a bridge to Ethereum because it made sense because a lot of people develop on Ethereum. But going forward, we want that to happen. But that would require collaboration with other projects. Great.
Starting point is 01:03:38 Well, Guy, thanks so much for coming on. It was great to talk to you and learn more about Enigma and how you guys are leveraging trusted execution environments and doing multi-party computations. These are topics that we've covered quite a few times on the show. I mean, we've had Ari Juels on a long, long time ago to talk about the Town Crier model that uses TEEs. And then we also had people from Microsoft talk about Coco. And we've had TrueBit.
Starting point is 01:04:03 So, like, all these different technologies and everything that we've been able to learn so far are sort of coalescing in this one project. So it's very, very encouraging to see that. Thank you very much. It was great to be on the show. And thank you to our listeners for once again tuning in. You can find new episodes of Epicenter every week. We release to YouTube, but also to iTunes, SoundCloud, and whatever podcast platform you happen to listen to your podcasts on. You can also reach us on Twitter at Epicenter BTC. And if you're looking to, you know, support the show and encourage us in what we're doing, you can always leave us an iTunes review.
Starting point is 01:04:48 It helps people find the show and we're always happy to see your reviews. So thank you very much and we look for it to be back next week.
