Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Misha Komarov: =nil; Foundation – The Marketplace for ZK Proof Generation
Episode Date: September 22, 2023

Zero-knowledge proof systems have found tremendous cryptographic utility in scaling blockchains, due to their ability to prove computational integrity succinctly. However, despite recent advancements... in ZKP R&D, their construction still requires special prover circuits. Their complexity is what gatekeeps zero-knowledge technology to a select few astute teams. =nil; Foundation aims to challenge this status quo by providing an alternative through their zkLLVM circuit compiler and ZK proof marketplace. By commoditising the production of custom proofs, =nil; Foundation unlocks an entire new range of applications employing zero-knowledge technology. We were joined by Misha Komarov, co-founder of =nil; Foundation, to discuss the use cases and challenges of building the first marketplace for (outsourced) zero-knowledge proofs.

Topics covered in this episode:
- The vision behind =nil; Foundation
- Use cases for ZK tech
- zkLLVM circuit compiler
- Homomorphic encryption, ZKPs & privacy solutions
- ZKP marketplace
- Why proof markets (currently) run on DBMS
- Marketplace actors
- Proof generators vs. PoW miners
- Infrastructure challenges
- Future roadmap

Episode links:
- Mikhail Komarov on Twitter
- =nil; Foundation on Twitter

This episode is hosted by Brian Fabian Crain. Show notes and listening options: epicenter.tv/514
Transcript
This is Epicenter, Episode 514 with guest Mikhail Komarov.
Welcome to Epicenter, the show which talks about the technologies, projects,
and people driving decentralization and the blockchain revolution.
I'm Brian Crain, and I'm here today speaking with Misha Komarov.
He's the co-founder of =nil; Foundation.
=nil; Foundation is basically working on a marketplace for zero-knowledge proofs.
So we're going to dive into what =nil; Foundation is, zero-knowledge proofs, the zero-knowledge proof market.
I think this is one of the areas where there's been a lot of interest, a lot of buzz about it.
There's a lot of investment in this area.
And yeah, so really excited.
So thanks so much for joining us, Misha.
Yeah.
Thanks for inviting me.
So, you know, I just mentioned, right, like, okay, a lot of like ZK interest happening now.
Now, an interesting thing about =nil; Foundation is that you guys actually started in 2018.
And, you know, it says on the website that the initial focus was on
sort of best practices for database management systems for crypto.
And so I'm just curious if you can talk a little bit about
how did this get started, and what was the original vision for =nil; Foundation,
and, yeah, what is this database management system?
Why is that important?
Okay, so let's go into that. Basically, the reason why =nil; Foundation was started is because prior to that, me and, well, it's like, we were together doing a fork of Steemit, basically, like, a region-dedicated fork of Steemit. And I was kind of the fellow who was responsible for all the technical part in there. And it's like, from my perspective, I was dealing with all the technical issues, with all the data management issues. And I was literally, it's like, I was literally in pain
by the absence of proper data management tools back then.
And it's still kind of absent, to be honest.
Like, people still struggle with, like, access to Ethereum data, and yada, yada, yada.
So I was struggling with the absence of that.
And considering that all the Steemit stuff and all the Golos stuff
was actually kind of a social network and something-something, something,
you obviously were required to have, you know, proper data management.
I mean, like proper one, just like they do in traditional web industry.
So we had no such a thing back then.
and we still have no such a thing.
So in April of 2018, I was like,
hey, guys, let's go do a DBMS, right?
I mean, I don't want anybody to struggle with that.
I don't want to struggle myself.
I don't want anybody to have these issues.
So that's what it was.
Right.
So this would be like, okay, you want to have a database where like you store in there.
I don't know, these are all the users, these are all the posts.
And then like use that information to serve a web application.
Yeah.
Yeah.
Since it was required for such a database to work in untrusted environments,
like, to be basically BFT-compliant, right?
It's like a database for crypto, right?
So we had to think about how to mix these two industries together,
to make it basically work, to make it suitable for hosting BFT
protocols, for hosting BFT applications.
So yeah, that's what it was.
That was the idea: to merge the industries together.
Yeah.
And the idea with these databases was basically... I'm curious if you can
expand a little bit more on this aspect of having the database be trustless.
Did you kind of imagine that the user would have some way of verifying that, you know,
the database functioned in a particular way, and, you know, sort of served the results in the right way?
That's one of the critical components, because if you want,
if you want, like, the DBMS, or, like, the database, to work in untrusted environments,
you've got to be able to verify what's going on in it.
I mean, you can't just like go and access, for example,
somebody's data, like Ethereum's data, or, like, some roll-up's data,
or, like, some other protocol's data, something, something, whatever.
You can't just come in and, you know,
and trust what has been given to you.
Because, I mean, this database could easily just, you know, screw you.
And this can result in something very nasty.
So basically, it is required, and it was required, to make the interaction with this thing
as trustless as possible, because, I mean, like, one,
one more trust point? Come on. We don't want to be that trust point, right? So we do not want to
have that responsibility on our hands. So basically, to make it as trustless as possible, it was
required for people who operate over some data inside this database, or through this database,
like over Ethereum's data, for example, right? It was required to make them capable
of, like, verifying whatever they have done. And for the sake of this, it was required to have,
like, a provable execution environment, right? And so that's basically the desire. And
how to do that, like a provable execution environment?
I mean, you've got to prove somehow what was executed. And the most, it's like
the most convenient, the most suitable thing which we had back then was some, like, modification of
Groth16 and, you know, some rank-1 constraint system based proof systems. So that was the best
fit back then, and we were like, it's not sufficient, it's not good, and we would need much more
than that. So we started working on the cryptography suite, we've embedded proof systems in there,
and once we realized that, okay, it's like, the industry got to that point when there
is enough, like, tech and theoretical research available to make such an
execution environment, we were like, okay. Well, it's like, PLONKish proof systems were introduced,
we got them implemented in that cryptography suite of ours.
Another question arrived.
It's like the second question which arrived is that besides just proving the execution of whatever
was done with the data inside the database, you've got to prove that the data, which was
taken as an input to this database, was basically taken from a right place.
Because otherwise, I mean, how can you be sure that the data that you're operating
over wasn't, you know, just made up out of nowhere?
That it's actually, for example, Ethereum's data, right?
So for the sake of this, we needed state and consensus proofs.
And that's how we got together basically with the Mina Foundation,
with the Solana Foundation guys,
because that was, like, our desire to do state proofs and consensus proofs.
And they were, like, the only ones which had any idea about this back then.
And once we got this, once we got together in 2021,
this collaboration of ours evolved into the birth of ZK bridges.
I mean, like so many projects are building ZK bridges now, right?
But, like, back then, it was like, hey, guys, we need state proofs.
You know how to do them.
We want to learn.
Let's do something together.
So that was basically, like, the start of ZK bridges in general.
So that's like what it was.
So, in the process of doing all that, we were like, yeah,
well, it's quite a lot of circuits.
I mean, like, it's too many circuits, right?
And they're very complicated.
And we don't want to do that, like, manually
anymore, because we've spent, like, a couple of years before that already, like, crafting these
circuits. And we were like, nah, we're not going to do that. I mean, probably somebody else
has this kind of problem. So we need a compiler for that. Let's just do a compiler for that.
So we took LLVM, I mean, like, just a compiler, like, which everybody uses, like, you know,
very solid. And we just took it. We made it provable. So that's how zkLLVM was born. I mean,
because we were, like, sick and tired of building circuits, like, manually. And apparently, the rest of the
market was also sick and tired of doing that. And the proof market is basically, it was born
out of our realization that all of those state proofs, consensus proofs and state transition
proofs we worked on for the sake of making this as transparent and as trustless as possible,
were really heavy. And we were not willing to make, like, you know, anyone generate them themselves.
And we weren't willing to generate them ourselves either. So we were like, okay,
we'll just make a marketplace.
Yeah.
So basically we just slapped the marketplace on top of the same database
we were building.
And we were like, okay, well, we were building a database.
We tried to make it as trustless as, like, you know, as transparent as possible.
So we'll just slap the proof market on top of it.
So this could also become, like, you know, decentralized and, like, distributed, whatever,
for proof generation.
That's what it was.
Cool, cool.
No, I think that was very helpful.
Actually, I think what would be great is to talk a little bit about,
and, you know, I mean, I think for many people this will be familiar, and for other people it's still kind of, like, maybe a little new,
but let's talk about the use cases for ZK tech, right?
Because what you mentioned, right, basically, okay, you want to have like this provable, you know, I have some code, right?
And that's like provable.
So of course, like sort of an obvious thing would be, well, you just use a blockchain for it, right?
put it on the blockchain.
It's not that simple.
Right.
I mean, I guess a bunch of the big downsides of that would be, well, now you have consensus.
So, like, your database is going to be maybe limited by the speed of consensus, right?
And then a scalability and cost would be much worse.
Exactly.
And then maybe some other things might be, yeah, let's say if you want to have, you know, private data, some off-chain thing being done.
Let me expand on that.
It's like, you're absolutely right when you're talking that,
okay, of course you can prove your computation, just putting it inside some protocol and,
you know, just computing that.
But there is a nuance that you can only compute inside such a protocol.
With such a protocol, you can only compute something which doesn't have a lot of like communication complexity.
It's like when you come to communication complexity and when you need to compute something really big,
or something really complex, right?
It's like the problem, the problem with all that,
with all that protocol-based computation
is that you get basically limited
by the communication complexity
in that you've got to basically split
the piece of computation you want to do
into little parts,
deploy them separately,
and then make them communicate with each other.
And this introduces a lot of overhead,
so you get kind of limited.
And that's only really suitable
when this computation is decomposable, right?
There's, it's like, there is, like, a very traditional trick in the DBMS world
when you need to compute something over data.
It's like, currently, it's like what people do currently, right?
Like, for example, in Ethereum, it's like, people just do, people just do basically, like,
synchronized computation, like, over some transactions, over, like, some, I don't know, replication packets,
and there's, like, across all of those nodes, right?
So the traditional trick in the DBMS industry to overcome this is to introduce, like,
dynamic sharding, but that's still not
enough. I mean, it's like, we realized that it's still not enough in case you cannot
decompose the computation you want into little pieces which would not introduce communication
complexity large enough, you know, to kill
the whole efficiency, to kill efficiency at all. So when you have a big chunk of computation,
which cannot be decomposed, it's easier, it's cheaper, it's faster to compute it somewhere else,
and then just put the result of this computation
into some protocol where you can operate with these results
in, like, you know, a decomposed manner,
and basically, like, you know, with small chunks.
So that's what basically, that's what basically it is about.
So we tried to cover, so we basically tried to cover,
we knew that it would be required, like not only us,
the whole industry does this, right?
It's like we knew that it would be required
to be able to cover decomposable computations
which do not introduce a lot of communication
complexity. It's like, you know, for this, basically, that's why we've used kind of sharding
in the DBMS, because it's traditional. We just brought it from there. It's like, okay. And to cover
the piece of computation which is not decomposable, which is not better to be decomposed,
which is just simpler to be computed somewhere else, and then to be proven, and then to be, like,
used on some, I don't know, Ethereum or, like, whatever. Anyway, that's what the combination
of a proof market plus compiler exists for.
It's basically like a marketplace for provable computations
which cannot be decomposed.
That's what it is.
So that's where it makes sense.
That's where it makes sense to use ZK proof systems
for this kind of computation.
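To make the pattern concrete, here is a minimal sketch of the outsource-then-verify flow described above. All names and the stubbed prover/verifier are illustrative, not =nil;'s actual API:

```cpp
#include <cstdint>
#include <iostream>
#include <numeric>
#include <vector>

using Input = std::vector<uint64_t>;
struct Proof { std::vector<uint8_t> bytes; };

// A big, non-decomposable computation (stand-in: a sum; think ML scoring instead).
uint64_t heavy_computation(const Input& in) {
    return std::accumulate(in.begin(), in.end(), uint64_t{0});
}

// Off-chain prover producing a succinct proof of computational integrity (stubbed).
Proof prove(const Input&, uint64_t) { return {}; }

// Cheap on-chain check: verifies the small proof, never re-runs the computation (stubbed).
bool verify_onchain(const Proof&, uint64_t) { return true; }

int main() {
    Input input{1, 2, 3};
    uint64_t result = heavy_computation(input); // 1. run once, off-chain, where it is cheap
    Proof p = prove(input, result);             // 2. prove it was computed correctly
    if (verify_onchain(p, result)) {            // 3. the protocol checks only the proof
        std::cout << "result usable on-chain: " << result << "\n";
    }
}
```

The protocol then operates over the small result in a decomposed manner, while the heavy, non-decomposable part never touches consensus.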
You mentioned like circuits
and you mentioned, you know,
zkLLVM, right?
So now, please tell me if my understanding here is correct, right?
Because let's say you have, like, some computation,
and you say, oh, you want to prove this computation.
And then a circuit basically means, okay, you develop a bunch of, like, equations, no, to then check this particular computation.
So somebody else can then check that.
And then with something like zkLLVM, you then just do that for, you know, any code that's written using that VM,
and then you can just verify anything written on this VM.
Is that kind of right?
It's not really, because, again, it's like,
=nil; is again being weird, right, in here.
So basically what's going on in the industry currently?
It's like, obviously, yeah, there are people which do custom circuits,
which do custom circuits with libraries for each particular application.
Like, I don't know, Scroll did that, right?
For example, they did a very custom, pretty good circuit.
I mean, really good circuit.
For the zkVM they've built over a couple of years.
And that's pretty good.
They used Halo 2.
It works.
It's fine.
And all right.
That's how it was done before.
Then people started realizing that you can actually not craft a custom circuit
for each particular piece of computation,
but do just one circuit of some virtual machine,
and then put the bytecode of the computation you want to prove
as an input to the circuit, and then prove this computation via this, like, you know,
enormous big-ass VM circuit, all right?
The problem, which is like, the problem which these two approaches
introduce, is that, for example, custom circuits, it's very expensive and very, very
troublesome to write them, right, to implement them.
And to implement them properly is even harder.
I mean, people, like, even complain about that.
And that's, you know, that's the valid, that's a valid concern.
It's like, in case, for example, for some roll-up or something, it turns out that the custom circuit they rolled out is under-constrained, or it just, you know, wasn't audited properly, or something, something, somebody forgot something. Like, just please don't forget anything, guys. If somebody forgot something in some circuit and it turned out to be under-constrained, somebody will be able to prove to Ethereum what actually didn't happen on the roll-up. And then we will all get fucked.
So that would be really bad if this happens.
And that's like the problem with custom circuits.
You can't know what's going on inside.
It's really complicated to craft them.
It's really complicated to know what's going on.
It's really hard to make them secure.
The nuance with VM circuits, with just one VM circuit,
is that you introduce an enormous overhead.
So basically the overhead of a VM circuit, of VM-based computation, zkVM-based computation,
compared with, for example, a custom circuit,
is usually, like, at least 10 times, at least 10 times, in terms of cost and time.
So this, again, introduces, like, a lot of overhead.
What is at least 10 times?
It's like, it's at least 10 times
the complexity, in terms of, like, approximation, in terms of circuit size,
in terms of the amount of computation you need to prove.
Like for example, let's say you want to prove A plus B, right?
So if you do a custom circuit for it, it's going to be very simple.
it's actually going to be like, okay, let's just prove A plus B.
You lay it out as an equation and you're like, okay, well, this is arithmetized.
Here goes the arithmetization of it, and we're good.
If you do A plus B, it's like, with a zkVM circuit, you basically prove not just A plus B,
but you prove the execution of the whole CPU plus everything, everything and everything,
like all the bytecode, everything, everything, everything, a shitload of opcodes,
which do, in the end, A plus B.
So that's, you know, just an enormous overhead for doing just A plus B.
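To put the A plus B example in symbols (an illustrative arithmetization, not =nil;'s exact constraint system):

```latex
\text{custom circuit:}\quad w_c = w_a + w_b \quad \text{(a single gate)}

\text{zkVM, for every cycle } i:\quad
pc_{i+1} = pc_i + 1,\qquad
\mathit{op}_i = \mathsf{decode}(\mathsf{mem},\, pc_i),\qquad
\mathsf{st}_{i+1} = \mathsf{step}(\mathsf{st}_i,\, \mathit{op}_i)
```

The zkVM's circuit size tracks the whole fetch/decode/execute trace rather than the single addition, which is where the "at least 10 times" multiple comes from.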
We kind of knew that we weren't going to go with this overhead, so we chose the way, as they say, in the middle,
which is basically about introducing a custom circuit for each particular piece of a program, yes,
but not manually, generating that automatically via a compiler.
So you can still prove mainstream languages, you can still prove, I don't know, like Rust,
Solidity, C++, anything you want.
But it's going to be a custom circuit for each particular part of the computation.
There will be no overhead, and at the same time, you do not craft those circuits manually.
So it's basically, like, something in the middle.
That's what it is.
And how are these custom circuits automatically generated?
Compiler.
That's where our compiler comes to the stage.
That's what it is about.
Okay.
So the compiler is basically like a program that will take some,
something like, okay, A plus B, and then put out a custom circuit.
For it, yeah.
Or, for example, if you want to prove something really big, you want to prove some, I don't know,
like ML model or learning or something, something.
You just put it in there, and there will be a custom circuit generated for it.
So there will be no overhead.
But in the same time, you could use the code, which was already written by, I don't know,
like thousands of people outside of the industry, for example.
Or, I don't know, a game.
If you want to prove a game, let's say you want to prove
that you speedran some game in some particular time.
Or, like, my favorite example:
a Doom speedrun.
So you can actually just put the Doom code inside the compiler,
the compiler does a circuit for it,
and then you brag to your friends, like, hey, I have a proof.
It's like, I can speedrun it in this amount of time.
Beat me.
Here goes a proof of it on Ethereum.
So that's what it is.
That makes it doable.
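For flavor, the input to such a compiler is just annotated mainstream code. A minimal sketch in zkLLVM's C++ frontend, using the [[circuit]] entry-point attribute from =nil;'s public examples (treat the attribute and toolchain details as approximate):

```cpp
#include <cstdint>

// zkLLVM-style entry point: the attribute marks the function whose execution
// is compiled into a provable circuit; the body is ordinary C++.
[[circuit]] uint32_t add(uint32_t a, uint32_t b) {
    return a + b;
}
```

The Doom example is the same mechanism at scale: the game loop is just more C++ fed through the same pipeline.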
And so with this compiler here, I mean, I mean,
I guess there's all kinds of different.
different things that people may want to verify, right?
It could be, like, maybe Solidity code, or it could be code on, I don't know, Solana or, like, some other blockchain, or maybe some code that runs elsewhere.
You mentioned machine learning or something else.
So this compiler, then it can basically take any kind of code and output these custom circuits or like...
Yep.
So any arbitrary programming language you can basically use for that.
it's like, any programming language which is supported by LLVM,
and there are quite a lot of compilers written for LLVM.
There were a lot of compilers written for LLVM over the last 20 years.
So that's pretty much, like, yeah, that's quite a lot of languages.
It's like, just in case, fun fact,
it's like, we have, like, an internal joke that you can actually produce a zkEVM
via zkLLVM.
So you can basically take, for example, an EVM
interpreter, for example, evmone, right?
It's like a classic EVM interpreter, which has been around since, I don't know, 2016 or, like, 2017, right?
You can just take it.
You can compile it to a circuit.
So you would get a zkEVM circuit as an output of the circuit compiler.
So something like this, yeah.
Maybe one thing we can touch on, it's like a little bit of a detour potentially.
I mean, in crypto, many people have been sort of aware of ZK tech for a long time.
But I guess the main project, right, that we've kind of known ZK tech through is Zcash, which, you know, basically said, like, okay, we're going to take, you know, do something like Bitcoin, but, you know, give people basically the ability to make transactions privately.
And then, you know, there's still these proofs so you can kind of know the whole thing is safe and correct.
but you don't know who's sending what to whom.
So the privacy thing was, you know, I think for a long time,
the kind of main way that people thought about ZK.
And yeah, now a lot of it is more around this other use cases, right?
Like roll-ups or like where you basically say, okay, you can run this computation.
We just use proof.
And so it's much more efficient because like maybe you have to run less things on-chain.
But let's touch on privacy briefly.
What are your thoughts on like ZK in privacy?
Is there much activity?
Do you think, like how do you think is going to play out?
Okay.
Yeah.
See, it's like, privacy applications of proof systems are what they were traditionally
supposed to be for, until recent times.
I mean, that's true.
And it's like, to understand if there is a way forward with this,
if there's, like, any, you know, light at the end of the tunnel, right, with privacy and with ZK
for privacy, we've got to understand how this privacy is achieved, for example,
in Zcash, right? So, for Zcash, privacy with ZK is basically achieved as follows.
Because it's like, you have some data which is kind of encrypted and stored inside the
Zcash protocol, the Zcash, like, database, whatever. So what's going on is that basically,
when you want to do, like, a transfer in Zcash, you get the data from there, the encrypted one,
you decrypt it, you post a proof of a successful decryption in there, you do some changes
with this data, then you encrypt it back, you do the proof of a correct encryption, and you post it back
to Zcash. I mean, it's like a high-level, it's a high-level overview of, like, how that works,
right? And what you've noticed in here most probably is that privacy is actually being
preserved, not by
proof system itself, but
by simply the fact that the
data never leaves
the, never leaves
the device of
basically, you know, user.
It's like, the decrypted
data is only available on the user's machine,
and the user is not willing
to disclose that, is not interested in disclosing
that. So that's how the privacy
is achieved. So there's a nuance.
It's like, there's a question.
Can you store some large amount of data
in the database and still be able to get it each time from the protocol to your machine,
decrypt it, do something with it, and then encrypt it, and then post all of these proofs of
correct encryption and decryption?
It's like, will you be able to do that with a large amount of data?
That's a good question.
And I haven't seen anybody willing to download, I don't know, like, petabytes of data to their phone
just to decrypt it, do some small change, and then encrypt it back again and send it back.
It's not, you know, like, very attractive.
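A sketch of the round trip being described, with stubbed stand-ins so it compiles; real Zcash uses notes and nullifiers rather than literal encrypt/decrypt calls:

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

// Stubs standing in for the real steps (comments say what each would actually do).
Bytes fetch_state()            { return {0x2a}; }         // pull encrypted data from the protocol
Bytes decrypt(Bytes ct)        { return ct; }             // real: decrypt with the user's key
Bytes apply_transfer(Bytes pt) { pt[0] += 1; return pt; } // the actual state change
Bytes encrypt(Bytes pt)        { return pt; }             // real: re-encrypt the new state
Bytes prove_transition(const Bytes&, const Bytes&) { return {}; } // real: ZK proof of the round trip

int main() {
    Bytes before = fetch_state();
    Bytes clear  = decrypt(before);       // plaintext exists only on the user's device:
    Bytes next   = apply_transfer(clear); // this locality, not the proof itself,
    Bytes after  = encrypt(next);         // is what provides the privacy
    Bytes proof  = prove_transition(before, after);
    // `after` and `proof` go back to the protocol; the chain never sees `clear`.
    (void)proof;
}
```

The data-size problem is visible in the first line: fetch_state has to deliver everything the user wants to operate on.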
So to defeat this, to defeat this basically, people started thinking about,
okay, we need to process this data right where it is, right inside the database.
You don't have to move it.
So you won't need to download it or, like, do something with it locally.
And that's where, basically, that's where basically, like, fully homomorphic encryption came into play.
And it's still in the lab.
It's still not out there.
It's still not usable.
But once it is, I think in terms of privacy, it will become much more relevant.
than proof system-based mechanisms.
Proof-system-based mechanisms are very good for compression.
I mean, those use cases which we see right now, like ZK bridges,
zkML, roll-ups, I mean, whatever, oracles, anything, anything, right?
Very good for compression.
Not for privacy.
All right.
So that was very interesting.
Let me just sort of, like, rephrase, to see if I understood this correctly.
Basically, like, the issue is that, you know,
in a proof system, for privacy, you know, I would basically need to run it locally, right?
So I'm going to have to download some data from somewhere.
I do whatever I do.
And then I generate this proof on my phone, on my computer,
and then I send this proof back.
And then that's the change.
But of course, the downside here is, well, I have to download this data.
And I don't know, maybe generating the ZK proofs is also computationally intensive?
Sometimes it is, yeah.
Right.
And then with homomorphic encryption,
basically the advantage is that they can have the data,
someone else can have the data,
and they can apply the computation on top of this data,
but everything's encrypted,
so they don't know actually what the data is,
but I don't have to be involved as a user.
Yeah, it's like this is the way.
Yeah, okay.
And so you feel like homomorphic encryption
is the thing that's really going to help
sort of the privacy applications more than the proof systems.
Yep.
It's like, again, proof systems are very good for compression.
For data compression, for serving compression,
I don't think any other thing will beat this.
But in terms of privacy, yeah.
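The contrast can be stated in one line: with fully homomorphic encryption the untrusted host evaluates directly on ciphertexts (a schematic property, eliding noise management and scheme details), whereas the ZK route makes the user download, compute, and prove locally:

```latex
\text{FHE:}\quad \mathsf{Eval}\big(f,\ \mathsf{Enc}_{pk}(x)\big) \;=\; \mathsf{Enc}_{pk}\big(f(x)\big)
\qquad\qquad
\text{ZK:}\quad \text{user fetches } x,\ \text{computes } f(x),\ \text{posts a proof } \pi
```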
Let's talk about the marketplace.
Can you explain how the marketplace works, the ZK marketplace?
And, you know, like who are the different actors that are
participating in this marketplace?
Basically, what is the proof market in terms of, like, you know, product architecture?
So what is that?
It's basically, again, just a marketplace for proof generation, right?
It basically turns all those, like, ZK proofs into a commodity, where you can, like, measure their
value, measure their generation time, measure how much they would cost to be produced, or how
much it would cost to, I don't know, like, speculate with them.
Not that I told you to actually speculate with them.
So anyway, so that's basically what it is.
What are the actors in there?
And what are the actors in there?
Well, let's start with like the most obvious one.
It's like generators.
I mean, this is nothing without proof generators, right?
So these are the fellas with, like, you know, big machines or specialized cards or something else, which are willing to provide their computational facilities for the sake of, you know, applications being able to use them, for the sake of security of theirs,
for example. Sometimes it's not security. Sometimes they just get paid.
It's cash or something. Yeah. So that's
the most critical component of this whole thing.
The nuance is that different participants, different proof generators,
induce basically, like, open competition.
I mean, obviously it induces open competition,
because somebody has better hardware, somebody has worse hardware,
and somebody is more fit to generate, for example,
proofs for roll-up circuits.
Somebody's more fit to generate,
like, kind of, proofs
for zkML circuits or something.
So it creates an open competition.
To coordinate this open competition
and to make sure that it stays fair,
there's basically like
coordinating protocol.
Like a coordinating, like, protocol or application,
right? An application on top of the DBMS thing.
So the second actor,
which is the most obvious one,
is basically the fella
who maintains
maintains this cluster, maintains this protocol, maintains the DBMS.
I mean, because currently that's just what it is.
You've got to store the data, you've got to facilitate the competition, you have, like, a competition,
and you've got to make sure that it stays fair.
So that's what it is.
There is a nuance in how this competition is designed internally, in terms of, like, product
architecture, and what is required for that.
But that's, like, you know, a different topic.
We'll come to that later if you want.
The third participant of this is basically, like, the application.
Well, it's the most obvious participant.
Like applications, I don't know, ZK bridges, ZK oracles, like, roll-ups, I don't know, zkML.
Anyone. ZK games.
So how does it look like for them?
It's basically, in most cases, this is, like, an Ethereum application.
It's like, and a roll-up is also kind of an Ethereum application.
We can say it that way.
So in most cases, this is, like, an Ethereum application
which comes up with a desire, something like, okay, I need composable computations for Ethereum.
I need to be able to just, you know, order some heavy-loaded, big-ass computation to be able to use it in Ethereum.
For example, some application decided that they need the result of an ML model which did scoring
over different Ethereum addresses, for the sake of, you know, figuring out risk parameters for some lending.
All right?
this is a very big chunk of computation.
And the application usually comes up
with something like, okay, I need this chunk of
computation, and the result of this, like, you know,
model, also on Ethereum.
So I'll just go, order it.
Somebody will generate it for me,
and I will just use the result of it
without being concerned that, you know,
somebody tries to screw me up.
So that's basically,
it's like three, it's like three major actors in that.
Maybe you can talk a little bit about the second thing.
So you mentioned, okay,
the maintenance of the...
you know, storing the data.
I mean, because in the end, right, I mean, it's a marketplace, right?
And you want it to be a trustless, a decentralized marketplace.
So is that run on-chain?
Let's put it this way.
That is, that's like, that is being run on top of, on top of a DBMS.
And the DBMS handles some BFT protocol inside of it.
So there is some protocol which facilitates this.
And I got to admit that.
I can admit that, in the current state of it, it's maybe not
decentralized enough.
It's like, we're going to progress the thing.
But anyway, so, yeah, it is being maintained by some protocol.
By, you know... yeah, there is a nuance that to run this marketplace, to run this kind of
thing, and that's not only this marketplace, but actually, like, quite a lot of other different
applications, like, for example, you've got to have some very particular requirements in this
protocol. Because, I mean, what the proof market effectively is, it's a lot of computation over
data, but computation which can be decomposed, right? It's a lot of verifications. And verifications,
I mean, you can run them on Ethereum. But if you do that on Ethereum, like, directly on
Ethereum, you will pay billions of dollars in fees just for the proof market per year. I mean,
nobody wants to do that. If you run this on a roll-up or something, like on a traditional roll-up or something,
you will quickly hit the limit of the gas available.
And the second thing, it's like, just one proof market, just a single proof market,
will be enough to induce congestion at, like, any roll-up existing out there.
So it's like, if you deploy a proof market, for example, to, like, some ZK roll-up or some other roll-up,
it will get congested, like, in just seconds.
That's interesting.
Why is the proof market so complex,
so computationally intensive?
Verifications.
A lot of verifications.
Each proof which is being submitted by a proof generator
has to be checked on the protocol side, so that the proof generator didn't try to screw somebody over.
So the verification has to be done.
And we have these, like, verifiers for EVM.
So yes, I've got to admit that the protocol which runs the proof market is kind of an EVM-based one, all right?
So we've put an EVM inside a database system.
So, you're welcome,
databases now have an EVM inside. Anyway, so you've got to check each proof which is being submitted,
which is being submitted by the proof generator. Because otherwise, an application comes for a
proof and they're like, okay, well, can we be sure that it's good? No one else checked it. So we've got to make
sure that it's good. And each pair, each circuit, each new order induces at least one verification.
And just to understand verifications, I mean, they take quite a lot of gas, quite a lot of computation.
So, yeah, that's intensive.
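To see why "billions of dollars in fees" is plausible, a back-of-the-envelope with purely illustrative numbers (none of these figures are from the episode):

```latex
300{,}000\ \text{gas/verification} \times 30\ \text{gwei/gas} = 0.009\ \text{ETH} \approx \$16
\quad (\text{ETH} \approx \$1{,}800)

\$16 \times 86{,}400\ \text{s/day} \times 365\ \text{days} \approx \$5 \times 10^{8}\ \text{per year at one verification per second}
```

A few verifications per second, which a busy marketplace would easily exceed, already lands in the billions.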
There's also a nuance, if you want.
There's also a nuance, if you want, that each proof, and the input data
which you use for proof generation, takes not just a lot of computation,
but it also takes a lot of storage, in terms of you've got to be able to produce a lot of data.
And once you try to put, like, for example, a bunch of, I don't know, 500-kilobyte proofs on some roll-up, it's going to get congested in seconds,
if you post that many proofs on some roll-up, right?
Or if you post the input for the proof generation, with which the application has come, you know, to the proof generator, this will also get congested, like, you know, in seconds.
For example, the input for the Solana consensus proof. I mean, we did the Solana consensus proof some time ago.
The input for the Solana consensus proof is really big.
I mean, it's really big.
It's actually just, you know, thousands of signatures, thousands of hashes, being produced, like, every 0.2 seconds or something.
That's a lot.
I mean, that's really a lot.
And to process all of that, I mean, it's quite convenient that we started with a DBMS,
because we found ourselves very lucky
that we started with a DBMS.
So that's basically really easy.
We can process it.
Yeah, yeah.
Okay.
And then you mentioned this DBMS.
It still has some sort of BFT thing.
So do you still have basically,
you know,
a bunch of different,
I don't know,
node operators or something similar
that then all run this DBMS?
Well, currently it looks that way.
Yeah.
It's like, not, like, node operators or something,
but, like, DBMS instances.
Yeah, there are different servers, different operators
which do host this.
But, like, again, right now it's just in testing.
And again, it's very, it's very important for us
that this does not accidentally become something, you know,
very standalone, because the majority of anybody
who's interested in this, I mean, like, in the DBMS,
is either accessing the Ethereum data,
or generating proofs, or the proof market for Ethereum applications,
or using the proof market in combination with some other application
dedicated to Ethereum.
So we try to not be standalone, and we figured out the way how to not end up being a
standalone thing.
So yeah.
We mentioned the three actors.
So maybe let's go through the other actors too.
So we mentioned, you know, we talked about the application.
So can you just run through it a little bit?
Let's say I'm maybe one or two examples of, okay, I'm someone who wants to develop an application
that's going to leverage, you know, ZK, and, you know, particularly this marketplace.
How would this work?
Well, basically, first of all, you've got to determine if you would need
proof generation outsourcing.
That's, like, the first thing you've got to determine.
The second thing you've got to determine,
in terms of, like, you being the application developer, is you've got to figure out
if that circuit of yours is, like, you know, really complicated, or if it requires,
like, something huge to be proven, or something, something.
So the typical workflow for this, like, once you decide these two points, the typical workflow is, well,
we propose it that way, all right? So we're kind of, like, protocol and toolchain agnostic, but by
default, we propose it this way. So you just come in, you take the circuit compiler, you craft some
circuit. If you don't need proof generation outsourcing, you just generate locally, and you just
verify it on Ethereum or, like, somewhere else, and you're good.
Basically, here goes your application, just build the logic, all right?
If you need proof generation outsourcing, you will be required to basically post the circuit
to the proof market, or, as we call it, list the circuit.
It's like, it gets listed on the proof market.
Never thought that was going to be in my vocabulary.
But anyway, so you're going to list the circuit on the proof market and say something like,
okay, guys, now I need somebody to provide liquidity, like proving liquidity, like proof liquidity,
all right?
For this particular circuit, for this particular circuit pair, or something like this.
So if you realize that, okay, yes, it needs to be outsourced, somebody comes in, generates
your proof, you get your proof, you take it, you use it anywhere else.
So that's basically what it is.
There's also one more scenario where you can use this. It's just, instead of doing all
of that, for example, on your front end, or, like, in a custom way from your application,
you can basically order the same piece, the same chunk of computation right from the EVM, if, for example,
you're building some EVM application, like a DeFi product or something. You can basically just come to the EVM endpoint of the proof market, say something
like, okay, I need this big chunk of computation for me, at this address, delivered in, I don't
know, like, two hours, or, like, in five minutes, or, like, for this amount of, I don't know,
something, whatever, for this amount of ETH, for example. So you ask them, they're like,
okay, well, I see the order, I'll generate it, this particular statement, here goes,
here goes your proof, and you get the proof, like, right inside your EVM, right inside your
EVM application. So that's, like, the second, that's, like, the second way of using this.
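Putting the two flows together, a hypothetical sketch of the marketplace interaction; the types and calls are illustrative, not the actual proof market API:

```cpp
#include <cstdint>
#include <iostream>
#include <string>

struct CircuitId { uint64_t id; };
struct Order     { CircuitId circuit; std::string input; double max_price_eth; uint32_t deadline_s; };
struct Proof     { std::string bytes; };

// One-time: list the zkLLVM-compiled circuit so provers can supply "proof liquidity" (stubbed).
CircuitId list_circuit(const std::string& compiled_circuit) { return {42}; }

// A generator picks the order up, competing on price and hardware fit (stubbed).
Proof await_fill(const Order&) { return {"proof-bytes"}; }

int main() {
    CircuitId c = list_circuit("state_transition.circuit");
    // Each time a proof is needed: place an order with input, price cap, and deadline...
    Order o{c, "block_witness", /*max_price_eth=*/0.05, /*deadline_s=*/7200};
    Proof p = await_fill(o);
    // ...then use the returned proof wherever it verifies (e.g., an Ethereum contract).
    std::cout << "got proof of " << p.bytes.size() << " bytes\n";
}
```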
Okay, cool, cool. And now, maybe finally, you mentioned generators. So generators basically, right,
they just do a bunch of computation, give back the results, get paid for it.
Sounds a lot like, you know, proof of work, sort of, right?
Or like, is this de facto going to be that you have a lot of the crypto mining farms
that maybe do like GPU mining are then just going to say, okay, now we're also going
to do, you know, proof production for like, let's say this marketplace, other kind of ZK systems?
How do you imagine that this market is going to look like?
Yeah, it's like, first of all, I want to say that this might be similar.
This might seem to be similar to the just traditional proof-of-work thing, but there are still
quite a lot of differences.
It's like, first of all, doing just proof of work is, like, computing hashes over
and over again for the sake of securing, like, some proof-of-work protocol, right?
So that's what it is.
And it's not like you're proving something new each time.
It's not like you're proving something, something which, you know, makes sense.
You just prove hashes.
And these hashes are present in there
just for the sake of computational complexity,
for the sake of you not spamming the cluster.
In case we're talking about, like, for example,
ZK proof generation, yes, it is required
to have quite a lot of computational facilities
for this.
There are folks which are doing, like,
you know, specialized ASICs, specific hardware,
specific hardware for this.
Some of them do programmable
ASICs, some of them do, you know, just slap a thing on a board.
They're like, okay, we're good with it.
And that's something which is similar. But what is different is that what you prove is not, like,
just, you know, meaningless hashes for the sake of computational complexity. It's something.
You prove, you prove basically the sequence of some actions which were done at some
protocol, or outside of the protocol, or by somebody else. If you digitize all your everyday
actions, you can actually create a proof of all your everyday actions. And is it going to be, it's like,
a bit meaningless? I mean, like, a proof that you crossed the road, for example,
or a proof that you, I don't know, something, that you walked, I don't know,
like, 500 miles, right? So it's like, is that meaningless?
Maybe, but it's like, is it meaningless for everybody? Not really. It's definitely not
meaningless for everybody. So this is what is different. So that's the first thing
which is different. The second thing, which is different from just proof-of-work stuff,
is that there the same piece of computation is basically being done over and over again.
Here, the computation each time may be different. And in case we're talking about, like,
in case we're talking about defining computations via custom circuits, there are two ways to handle this.
In case the circuit is the same over and over again, just like with zkVMs, okay?
So you have the same circuit, and you've got to prove the same circuit over and over again, just with different input data.
In this situation it makes sense for people to produce specific ASICs for this, specific hardware for this, which is dedicated to a particular circuit, right?
When we're talking about each piece of computation being represented by a different circuit,
it's like, it doesn't really make sense to produce specific hardware for each of these circuits, because they're not that widespread, and it might be easier to just, you know, compute them on, like, a CPU or GPU or something.
Because these are basically, like, dynamic circuits, and the system has to be more
flexible. So if we're talking about, like, whether this market is going to end up just like, you know, the
mining market, no, it's not going to end up that way, because
it's different by design.
So maybe there will be, like, some specific circuits or some specific
computations where there's just, like, an enormously large demand, and that maybe looks more like
mining, but then in general, because it's much more generalized, there would be a different
dynamic.
Yeah.
Yep.
What are the biggest challenges with this?
Like, what are, yeah, what do you guys find are the biggest, maybe conceptual, technical
challenges that you're facing?
Which challenges do we face right now?
It's like, I've got to admit that one of the biggest, in terms of conceptual challenges, we
face currently is how to make the protocol which powers the proof market, like, you know,
secure and decentralized enough. I mean, how to arrange this protocol in the way so it could
handle the proof market and similar applications. Because, I mean, again, a lot of data, a lot of
computations, we cannot deploy that on a roll-up. We cannot deploy that on Ethereum. It has to be
something different. It has to be, like, something different. And that's, like, one of the,
that's like one of the challenges, the architectural challenges
we're thinking
about right now. That's
in terms of, you know, like, tech stuff.
So we already realized
what a solution which can handle this has to be.
I mean, it has to leverage
those things which come from
the DBMS industry. It has to leverage,
like, it has to leverage, like dynamic
sharding. It has to leverage security
techniques which were developed
within crypto industry. And only
the combination of those
would be able to handle basically what
is required for this and maybe not only this.
There is also the nuance that inside the proof market,
like, the proof market and proof-market-like applications, right,
they have to be able to access Ethereum data,
because the majority of what's going on around proving happens around Ethereum,
and you've got to have data to put that as an input to the verifiers
for each proof which comes from a proof generator.
So we basically have got to be able to access Ethereum's data
from the inside of the protocol, from the inside of the proof market,
just like it was run on Ethereum, but not on Ethereum, because Ethereum cannot handle the load.
So this weird combination, weird combination of two industries, is what bothers us currently,
and it is what, in terms of, like, architecture, we're facing the most challenges about right now.
It's the, you know, how to handle these proof-market-like applications,
how to handle applications
which require a lot of data,
a lot of, like, application stuff,
transparent data access,
how to handle that,
so this could still be aligned with Ethereum.
Cool.
Can you talk a bit about
what is the product roadmap?
As I mentioned already,
there are basically like two big chunks.
It's like the first big chunk
is the combination of the proof market plus compiler.
It's like, in here,
in here,
the thing is about
supporting as many applications as, like, makes sense for them,
it's like, you know, helping as many applications as possible, right?
Basically, that's what it is.
And we target, for example, to introduce the notion of ZK gaming.
I mean, not just, you know, a Space Invaders-like thing, just like it was with Dark Forest, right?
I mean, Dark Forest is good.
I mean, I love it.
Space Invaders is good too.
But we want to introduce a more widespread,
it's like, a more interesting notion of ZK gaming.
It's like, we want to be able to prove 3D games,
we want to be able to prove, like, I don't know, something really weird,
what happens, for example, in a real action game, like, on Ethereum.
So that's like, that's like the thing.
Another thing is that on our side, not, like, entirely on our side, there's a zkML
extension basically coming for the compiler, which would allow proving ML models
to Ethereum.
And when I'm talking about proving a model to Ethereum,
I'm not talking about proving something, like,
trivial, or something similar to, I don't know,
you know, just a classifier or something.
I'm talking about proving whatever you want, basically,
because you can produce circuits big enough.
So that's the thing.
And the guys which are doing this, as an example again,
they target to prove GPT-2 to Ethereum.
Is it useful?
I don't know.
Is it fun?
Yes, it is. So that's what it is. That's the roadmap. That's the roadmap regarding the circuit
compiler plus proof market. There's also a thing in the roadmap that we want to be able to
produce a zkEVM via zkLLVM, because we think that's going to be, like, an important milestone for
the Ethereum community, because it's about the security of zkEVM circuits. We do not want anybody
to get hurt by under-constrained circuits.
So we try to basically compile the EVM with zkLLVM.
So we could say something like, okay, guys, here goes a zkEVM which is easily auditable,
which is, like, you know, secure, which was not written manually, but was generated
automatically.
And the compiler is kind of, you know, proven.
So it's good.
It's fine.
And it definitely is not under-constrained, and nobody, nobody's going to lose, you know,
nobody's going to lose anything because of that.
So that's, like, the part around the compiler plus proof market.
Now, the part around the proof market plus DBMS thing.
Well, again, it's like, as I already mentioned,
we've got to get to that point when we'll be able to make the protocol
which powers the market accessible, to be set up by the developers,
for people to be able to see that, okay, well, mixing the two industries makes sense,
that, okay, it's like, there's, like, you know,
there's enough, there's enough scalability,
there's, like, you know, a lot of data which can be handled,
there's a lot of computation which can be handled,
and you can, like, leverage, I don't know, transparent access to Ethereum data.
And currently we're trying to make that accessible by everybody,
by everybody, by the developers, not just us.
Because we've already been running this in, like, beta mode, in private, for, I don't know,
seven, eight months.
And I mean, we want to make it public.
We want to make it public for some, for some,
for somebody to say that, okay, it was worthwhile.
It's good.
It works.
So what is the timeline here?
Like when do you guys go to like a main net and how do you see like if things go well?
How do you see the projects sort of developing over the next two years?
Look, it's like, just to avoid confusion,
it's like, we have no such thing as, we have no such thing as mainnet in terms of, like, you know,
a DBMS or something, because it's...
It's a DBMS, right?
So I don't know if there is, like, such a thing as a mainnet in that.
So the second thing is, the second thing is, in terms of when the beta, when the beta mode ends,
like, we target the beta mode to end, like, before, before the middle of November,
before DevConnect Istanbul.
So we target, by the event, to have all of our verifiers, having all of our, like,
having all of our endpoints and everything, everything, like, in production,
deployed on Ethereum, like, on mainnet Ethereum.
Right.
So if we're talking about, like, production or something for the protocol, for the DBMS,
there will still be quite a significant time span when we're going to be testing this,
just like in the public beta.
And it's going to run for some time.
And so, you know, it's still, it's still going
to be tested in public.
So, well, we have plans.
We have plans.
But it's like, they're too ambitious for me to tell you about them.
Thanks so much for joining, Misha.
I think this is, like, a really cool area of ZK.
And I think it's going to be super fascinating as we see this getting integrated in different
crypto applications.
Yeah.
And so, yeah, thanks so much for coming on.
I'm excited to see how =nil; Foundation and the ZK marketplace are going to play out,
and, you know, how in general the impact of ZK on crypto is going to be.
Thanks for inviting me.
It was nice.
Cool.
Well, thanks so much for listening, for tuning in.
If you want to support the podcast, make sure to leave us an iTunes review.
And we look forward to being back next week.
Thank you for joining us on this week's episode.
We release new episodes every week.
You can find and subscribe to the show on iTunes, Spotify, YouTube, SoundCloud,
or wherever you listen to podcasts.
And if you have a Google Home or Alexa device, you can tell it to listen to the latest episode of the Epicenter podcast.
Go to epicenter.tv slash subscribe for a full list of places where you can watch and listen.
And while you're there, be sure to sign up for the newsletter, so you get new episodes in your inbox as they're released.
If you want to interact with us, guests, or other podcast listeners, you can follow us on Twitter.
And please leave us a review on iTunes.
It helps people find the show, and we're always happy to read them.
So thanks so much, and we look forward to being back next week.
