Bankless - 119 - Dive into Danksharding | Vitalik, Dankrad, Protolambda, Moderated by Tim Beiko
Episode Date: May 23, 2022This might be the dankest livestream we’ve done to date. Vitalik Buterin, Dankrad Feist, and Protolambda join Tim Beiko to discuss all things Danksharding and Proto-Danksharding. Sharding what? Don'...t worry. By the end of this conversation, you’ll know how sharding has evolved over time, why danksharding is possible, how proto-danksharding will get deployed, and so much more. ------ 📣 ALCHEMIX | Get a self-repaying loan today! https://bankless.cc/Alchemix ------ 🚀 SUBSCRIBE TO NEWSLETTER: https://newsletter.banklesshq.com/ 🎙️ SUBSCRIBE TO PODCAST: http://podcast.banklesshq.com/ ------ BANKLESS SPONSOR TOOLS: ⚖️ ARBITRUM | SCALED ETHEREUM https://bankless.cc/Arbitrum ❎ ACROSS | BRIDGE TO LAYER 2 https://bankless.cc/Across 🏦 ALTO IRA | TAX-FREE CRYPTO https://bankless.cc/AltoIRA 👻 AAVE V3 | LEND & BORROW CRYPTO https://bankless.cc/aave ⚡️ LIDO | LIQUID ETH STAKING https://bankless.cc/lido 🔐 LEDGER | NANO S PLUS WALLET https://bankless.cc/Ledger ------ Topics Covered: 0:00 Intro & Danksharding Explained 7:00 Panelist’s Backgrounds 8:34 How Sharding Has Evolved 15:56 Why Danksharding is Possible 23:40 Proto-Danksharding 27:15 Data Availability 35:40 Data Availability Sampling 48:00 Cryptography for Noncryptographers 56:10 How Does All of This Get Deployed 1:00:05 The Big Open Questions in Sharding Land 1:05:30 Closing Thoughts ------ Resources: Vitalik https://twitter.com/VitalikButerin Dankrad https://twitter.com/dankrad Protolambda https://twitter.com/protolambda Tim Beiko https://twitter.com/TimBeiko Proto-Danksharding https://www.eip4844.com/ ----- Not financial or tax advice. This channel is strictly educational and is not investment advice or a solicitation to buy or sell any assets or to make any financial decisions. This video is not tax advice. Talk to your accountant. Do your own research. Disclosure. From time-to-time I may add links in this newsletter to products I use. I may receive commission if you make a purchase through one of these links. Additionally, the Bankless writers hold crypto assets. See our investment disclosures here: https://newsletter.banklesshq.com/p/bankless-disclosures
Transcript
Hey, Bankless Nation, welcome to our special live stream. This is going to be a panel edition.
I think, David, this is going to be our dankest panel, maybe the dankest episode we have ever recorded because we are talking about dank sharding today.
And I got to be honest, going to this episode, I'm not entirely sure what it means.
Like, what is danksharding? But we'll ask some of the participants this.
Maybe, David, you could describe the setup of this panel and who's on it and how we're going to handle this episode.
Yeah, of course. You guys all know.
Tim Beiko. He's the guy on the bottom of the screen. Tim manages the All Core Devs call and is coordinating the lead into the merge and beyond. And we've had Tim Beiko on before. He led a very technical EIP-1559 panel and asked the questions that Ryan and I are just not smart enough to ask. So this is one of those panels. We're going to get as technical as possible. We have three fantastic panelists who are behind the scenes. We've got Vitalik, Dankrad, and Proto, who are the minds behind danksharding and sharding in general and other Ethereum-related technologies.
And Tim is going to be able to ask questions that are the technical questions, the smart questions.
But before we get there, Tim, we just want to cover some high-level stuff.
What is danksharding?
How did it get its name?
And just like, what does it mean for users?
Right.
Well, first, yeah, thanks for having me on, guys.
So danksharding and proto-danksharding, which we'll also get into, are iterations on the sharding design for Ethereum. And we'll spend the bulk of this panel discussing what they are, what the tradeoffs are, and whatnot.
But at a high level, sharding is a way for Ethereum to have more data pass through the network.
And because layer two solutions like ZK rollups and optimistic rollups, they produce a ton of data.
If we have a way for them to post that to the Ethereum network more cheaply, it immediately reduces how much they need to charge users for transactions.
So all of these kind of flavors of sharding all have kind of the same end goal, which is to create a cheap place for layer two solutions to post data on Ethereum.
And the impact of that is that the transaction fees that end users end up paying on layer twos are lowered by a lot.
So this is all about getting transaction fees down, particularly on layer twos, gas fees down on layer twos.
And Tim, is this related?
You mentioned proto-danksharding and danksharding.
You guys will talk about all that in the panel.
But is this related to EIP 4844?
Because that's another EIP we've heard a lot about.
Yes.
So EIP 4844 is proto-danksharding, basically.
And one way to think about it is, proto-danksharding is like maybe the first step we get towards sharding, and then danksharding is the simplification of what we had on the previous roadmap.
So like the more prefixes we have, it's like the sooner we get them.
Yeah.
All right.
This has been, like you said, an iterative cycle, an iterative process to get to where we are today.
And danksharding comes from Dankrad, who's on the panel.
And proto-danksharding comes from Protolambda, who's also on the panel.
So these guys' names have been baked into the name of this EIP itself.
Tim, is this the new EIP-1559?
Is EIP-4844 going to be the new EIP that we focus on going into the future?
So there's a lot of stuff we're working on right now.
you know, it's definitely one of the big ones.
And I think for end users, it's one of the most impactful ones
because it directly affects the gas price they pay.
Hopefully it's much less contentious than 1559 was.
But yeah, it's definitely an EIP you're going to be hearing more about over the next few months.
And just one last question before we hand it over to the panelists
and we bring the panelists online.
There's been a bunch of conversations about, you know,
when EIP-1559 came along, people were like,
is this going to reduce gas fees?
And the answer was no.
And then people are like, the Ethereum merge, is that going to reduce gas fees?
And the answer is no.
This EIP reduces gas fees.
This reduces transaction fees, not for the layer one, but for the layer two, correct?
Yes, on layer twos, yes, that's correct.
And basically, 4844 is a way for us to get some of the reductions of sharding quicker.
And then the full danksharding rollout gives us even more reductions.
But because 4844, aka proto-danksharding, is simpler to implement, we can just get that first.
Fantastic. All right, I think that is all of my questions. And I think that's all of Ryan's questions. So with that, I'm going to ask the panelists to come in from the shadows and turn on their cameras. And me and Ryan are going to actually duck out of here. Excuse me, we're going to dank out of here and let Tim take over this stream. Guys, welcome to the panel. And Tim, thank you for doing this. And then absolutely just take it away.
If you're trying to grow and preserve your crypto wealth, optimizing your taxes is just as lucrative as trying to find the next hidden gem. Alto IRA can help you invest in crypto in tax-advantaged ways to help you preserve your hard-earned money. An Alto Crypto IRA lets you invest in more than 150 coins and tokens with all the same tax advantages of an IRA. They make it easy to fund your alternative IRA or crypto IRA via your 401k or by contributing directly from your bank account. There are no setup or account fees, and it's all you need to do to invest in crypto tax-free. Let me repeat that again. You can invest in crypto tax-free. Diversify like the pros and trade without tax headaches. Open an Alto Crypto IRA to invest in crypto tax-free. Just go to altoira.com slash bankless. That's A-L-T-O-I-R-A.com slash bankless and start investing in crypto today.
The era of proof-of-stake is upon us,
and Lido is bringing proof-of-stake to everyone.
Lido is a decentralized staking protocol
that allows users to stake their proof-of-stake assets
using Lido's distributed network of nodes.
Don't choose between staking your assets
or using them as collateral in DeFi.
With Lido, you can have both.
Using Lido, you can stake any amount of your ETH to the Lido validating network and receive stETH in return.
stETH can be traded, used as collateral for lending and borrowing, or leveraged on your favorite DeFi protocols, all this without giving up your ETH to centralized staking services
or exchanges.
Lido now supports Terra, Solana, Kusama, and Polygon staking.
Whatever your preferred proof of stake asset is, Lido is here to take away the complexities
of staking while enabling you to get liquidity on your stake.
If you want to stake your ETH, Terra, SOL, or MATIC and get liquidity on your stake, go to lido.fi to get started.
That's L-I-D-O dot F-I to get started.
The Layer 2 era is upon us.
Ethereum's Layer 2 ecosystem is growing every day,
and we need bridges to be fast and efficient
in order to live a Layer 2 life.
Across is the fastest, cheapest, and most secure cross-chain bridge.
With Across, you don't have to worry about the long wait times
or high fees to get your assets to the chain of your choice.
Assets are bridged and available for use almost instantaneously.
Across bridges are powered by Uma's optimistic Oracle
to securely transfer tokens from Layer 2 back to Ethereum.
A token proposal is being deliberated as we speak in the Across forum, where community members will decide on the token distribution.
You can have your part of Across's story by joining the Discord and becoming a co-founder and helping to design the fair launch of Across.
If you want to bridge your assets quickly and securely, go to across.to to bridge your assets between Ethereum, Optimism, Arbitrum, or Boba networks.
Okay, sweet. It's just us now.
So I guess, yeah, before we get into it, do you each want to just take a minute and kind of talk about what you work on and who you are? Yeah, Vitalik, we can start with you.
Yeah, so hi, I'm Vitalik. I'm the co-founder of Bitcoin Magazine, and I write a blog. I contribute to specs once in a while.
Dankrad?
Yep, hi, I'm Dankrad. I'm an Ethereum researcher since 2019, and I'm working on, among other things, sharding. What else have we worked on? Proof of custody, statelessness, yeah, some projects on the roadmap of Ethereum.
Nice. And Proto?
Hey, hello. I'm Protolambda. I used to work at the Ethereum Foundation on research there. Now I do the same thing at Optimism. I helped with sharding earlier on, and now I'm contributing back to layer one while working at Optimism on layer two.
Sweet. Okay, so just to kind of get into sharding generally, over the past couple of years what sharding means for Ethereum has changed a lot. And I think the biggest one is this shift from full execution sharding to only data sharding. Vitalik, do you want to give us just an overview of how that shift happened in the research roadmap and why we've landed on just doing data sharding?
Yeah.
So I think there's been this ongoing simplification of the sharding roadmap that started really
sometime in 2016.
So for people who have been in the Ethereum ecosystem for a long time, you might remember
some of the scaling docs that we published back in 2015, back in early 2016.
some of the blog post thinking that came out in 2014.
And the stuff that was there in those earlier periods was in a lot of ways really complicated, right?
Like there were these ideas around hypercubes and hub-and-spoke chains and in-protocol supported cross-shard transactions that would be routed between like one corner of a hypercube to another corner, where the protocol would help them from like one shard to another to a third to a fourth.
There was even thinking about super quadratic sharding,
which is basically saying,
like instead of just having shards,
you have shards on top of shards
and potentially like an infinite hierarchy of shards inside of shards inside of shards.
So actually the sort of stuff that the Telegram TON project ended up incorporating into their paper,
though I guess that never really ended up coming close to going live,
unfortunately. But that was the kind of thinking that we had back in 2015 and 2016. And I think after
that, the progression has just been this big slow process of, I guess, increasing pragmatism, increasing appreciation of how complicated it is to develop and actually bring to production just about anything. Like what feels like 10 lines of code actually becomes like hundreds of lines of code once you add all of the complexities that clients inevitably have to have and deal with
how some particular thing
interacts with the syncing process,
how some particular thing interacts with the
fork choice, how
some particular thing interacts with the
database and the need
to store it in optimized formats and that
sort of stuff.
So the process
of simplification, I think
the big first step was definitely
the decision to not
bother with anything beyond quadratic sharding and just say we're doing quadratic sharding, right?
So not bothering with ever doing any kind of shards on top of shards and just saying, we have the
beacon chain, we have shards, shard headers are connected to the beacon chain, and that's it.
Like, that's the only layer of sharding that ends up actually happening.
So that was the first step.
Then the second step was the move from this concept of like chains that have regular commitment blocks in the beacon chain.
I think there was another word for them.
I forget what that word is.
Crosslinks was the word, right?
You have shard chains that crosslink into the beacon chain.
So moving from that to a system where you just have every shard block directly get included in the beacon chain.
So that was the second simplification.
I forget exactly when that happened.
I think that might have been around 2019 or 2020 or so.
And the big benefit of that simplification is that it meant that we didn't have to worry about shard chains anymore.
Then after that, there was the idea that we're going to do data sharding first.
This was when I started talking about the roll-up-centric roadmap, right?
Basically, instead of shard blocks actually containing transactions that would be executed at the Ethereum layer,
these shard blocks would just contain big blobs of data, and it would be the responsibility of
Ler2 roll-up protocols to use that data space in order to create secure and more scalable
experiences for their users, right? So Ethereum, the system would provide non-scalable computation and
scalable data, and what a roll-up does is it converts scalable data and non-scalable computation
into a scalable computation.
So we have a somewhat more performant layer one that has extra data space,
and then we combine that with this layer two ecosystem, and the layer two ecosystem ends up like really bringing the scalability to life.
So that's the rollup-centric roadmap.
And at the beginning, I think the rollup-centric roadmap was phrased in this ambiguous way where it basically said, well, look, with the rollup-centric roadmap, realistically, like, data sharding is the obvious prelude to full sharding anyway, right?
Like if we're going to implement full sharding with EVMs on all the shards,
it's just an obvious first step to have data shards first.
But it turns out that data shards are actually really good for roll-ups already.
And so we might as well run with that.
We might as well realize the roll-ups are our best hope for short-term scalability.
And just take that direction and try to make the best of it.
And that still leaves open the door for adding EVM execution shards in the future,
but it basically says, well, actually, you know, Ethereum will be fine even if we end up never actually completely doing that, right?
So that's the rollup-centric roadmap.
It's another simplification, basically saying we don't have to bother with execution on shards.
And then that also allowed some other simplifications, like it made it even more possible for shards to not bother with fork choice rules, for example.
And then the next simplification after that is danksharding, which basically said that there is actually this merged proposal mechanism where there's only one
proposer that chooses all of the shard blocks on all of the shards that appear within a particular
beacon block.
And that simplifies things massively in a whole bunch of ways.
So it basically means we don't have to deal with like the whole shard proposer bureaucracy, which simplifies that. It simplifies the complexity a huge amount.
It simplifies some of the economic properties a huge amount.
It basically makes the system feel much more similar to like just what a non-scalable chain would look like, except it's just more scalable, right?
And that extra scalability happens in the background.
And then proto-danksharding finally, like that's not a simplification.
That's more a step on the way to full-dank sharding that gets us maybe half the benefits of sharding.
But at a point that's maybe like halfway along the timeline to actually getting full sharding out there.
so we actually get some of the benefits sooner.
So that was the general progression, like basically more complexity to less complexity, more of Ethereum trying to do everything to less of Ethereum trying to do everything, and more willingness to work with layer two protocols and those two things together,
and that's where we are now.
Yeah, thanks.
That was comprehensive.
Dankrad, talking about danksharding, can you kind of walk us through how,
so, like, this idea that it's okay to assume that the block proposers or the block builders have to track all of the shards, because of this separation between proposers and builders that we've seen kind of emerge over the past few years, especially with the rise of MEV.
So, yeah, just talk us through, like, why is it possible to do something like danksharding and not sacrifice the decentralization properties of the chain?
Right.
Yeah.
So I guess, I mean, maybe like if we go a bit into the history of MEV, or maybe think about how it has been recently.
So it started with maybe some mining pools doing some stuff to exploit MEV and, like, yeah, to get some more than just the transaction fees.
And over the course of time, this has become more and more professionalized.
Like nowadays, most of them work with some other entity,
for example, Flashbots, that sells them complete bundles of,
or maybe selling is the wrong word here,
that buys complete slots for bundles to be included in blocks
that exploit a certain amount of MEV.
And then the block producers, which are currently the miners,
they just get the payment for that.
They get, like, so yeah.
So those searchers are now like lots of independent entities that try to find the best MEV.
And mining pools now don't have to bother with this anymore.
And basically it turns out that if we want to properly decentralize those, which is something we really want to do, like, so right now there's like maybe a few mining pools, like tens of them or so.
And so like the way it works right now is that basically FlashBots has a business relationship with each of them.
Like they basically have a trusted relationship.
And if like one of the two sides did something naughty, like for example, the miner could start looking into the strategies and instead of just executing them, they could like exploit the strategies themselves.
So for example, if you have an arbitrage transaction, then like you can always.
often do much worse things with the individual trades if you want to exploit them.
Or like they could try to figure out what the strategies are by what they are being sent and so on.
So this doesn't scale to a system where instead of like a few tens of mining pools,
we have probably tens of thousands or so of individual validators because you can't
have like a trust relationship with each of them.
That's not going to work.
So the only way to translate this world of MEV
into the future of proof stake is by having some form
of proposal builder separation.
The way proposer builder separation works
is that instead of having traditionally,
like I guess like five years ago,
we all thought of it as the same thing,
if you propose a block, you build that block, right?
And with a builder separation, that's not true anymore.
What we do instead is someone, a builder, builds a block,
and the proposer just proposes, oh, yeah, I'm going to propose that block that this guy built.
So we separated the two roles.
And in that way, we can have a professional role of block building,
which is this role that will extract MEV or work with all the searchers
and so on, and we can have the proposer,
and that is just a normal validator.
And the good, nice thing is proposing is extremely simple and cheap,
because it's basically just selecting the highest bid
and saying, yep, you get to build the block.
And whereas building is a complex process
where you have to manage lots of searchers
and they have to trust your system
that their strategies won't get exploited and so on.
So that is more suited for a complex,
and more capitalized entity.
And so it's good that not everyone has to do it.
And basically this is the, I guess,
this is how we're seeing the future of block building on Ethereum.
There's not really currently a viable alternative to this world.
And what we also know is that building these scalable systems, especially also building this massive data availability system,
becomes a lot easier once you assume that there is someone who can handle these massive amounts of data.
So once you put that entity into the system and say, well,
there's someone who can compute this encoding, who can distribute all this data and so on,
and so on, then many things become a lot simpler.
And so in the past, I guess we didn't really think about these designs because we were like,
well, we want Ethereum to be extremely decentralized.
And now with this proposal building, builder separation coming into the design space due to
MEV, it's also become available to think about for other things.
And basically this is how we, or like I first thought at the end of last year, well, let's use this, let's use this idea where you have these entities that can
handle, for example, large amounts of data. It's not really a problem if you are like running
some large machines anyway. It's not like an absolutely insane amount. It's not data center
kind of amount. It's just like large machines with a good internet connection kind of amount.
And exploit these entities and let them basically do this building, and that allows us to get to a much simpler and more efficient sharding design.
Got it.
And so am I right in thinking because it's basically very hard to build an optimal block
that becomes a specialized industry, but once you do have a block that is,
it's very easy to verify its validity, right?
So finding the exact right block to build is hard.
And so you need tons of machines to do that.
But once you have found it, then anyone,
can verify it. It's almost analogous to proof of work in that way.
For like, yeah.
Yeah, yeah, that's absolutely correct.
Like basically verifying, I mean, that's the crux of data availability checks, this idea that there is a way to check that this amount of data is available that needs much less work than actually downloading all the data.
So there's this asymmetry where, like, someone somewhere needs to do this work of encoding the data and distributing the data, but then verifying it is much easier.
Okay, and to get us all the way to proto-danksharding now, so recently the three of you have written an EIP, EIP-4844, that's now colloquially known as proto-danksharding, which helps kind of lay the foundation for this full sharding design without requiring this entire kind of shard data network to be live. Proto, do you want to walk us through, like, what are the things that 4844 does to get us toward data sharding? And also, like, how does that help layer twos? Like, how do L2s then use 4844 to provide their users with lower transaction fees?
Right, so we just ended with how danksharding basically introduces data availability sampling and these other new advanced tech features to try and distribute this job across the network better.
But this comes with additional complexity.
It will take more time to properly test and introduce it to Ethereum.
So instead of waiting for the full version of danksharding, we can reduce the feature set and go with an amount of data in between these things.
We can offer additional data and we can look at layer two, like what kind of security
properties they need and optimize for that.
And they can already make a big win there.
And then later, with the additional features, we can get to full danksharding.
And so what we started with here is the changes to pay for the data as a layer two,
this type of transaction that introduces the data to the network.
And we need some changes to distribute the data across the network.
But it won't be as much data just yet.
So it's manageable for the whole network to download.
And so we don't need sampling yet.
And we can have everyone download the data.
Got it.
And how does a layer two then use that data?
Like from, say, optimism's point of view,
how do you actually leverage that?
Is it just changing where you post the data
that's currently posted in normal transactions as call data?
Right.
Is there more involved?
You need to optimize what a layer two really needs.
You can take apart all the things that the layer two uses.
One of those things is publishing the data,
making sure that this honest minority that protects the layer two
is able to get the data in the first place.
and then there is this other property that the layer two uses right now, of getting the data long term. But these are very different. Data availability is this property that ensures actors are able to get the data, and this can be for a more limited amount of time. And so you do need to ensure that even with downtime, and even with censorship and whatever other unforeseen circumstances, these actors on the network, on layer two, are able to get the data. But after some amount of time, this should be sufficient to guarantee the security of the layer two, because you want the actors to be able to reconstruct the state, because only with the full state, only if you have the full history, are they able to challenge the operator, challenge the sequencer of the rollup.
Got it. I think we've mentioned like a few times
already, and I want to make sure we kind of clarify for folks, is this idea of data availability.
And I think this is something that, at least to me, was not really clear until I spent way more time looking at sharding: when we say data availability, what exactly do we mean, and how is that different from, like, the data we store on the Ethereum blockchain today?
I don't know, Vitalik, do you want to give a quick overview?
Sure.
Yeah, I mean, I think it's definitely a very important and subtle topic.
Like I think even the really big point of comparison a lot of people have is like what's the
difference between what we're doing and IPFS, right? And this, like, IPFS is a platform where
if you publish data, then that, like, presumably, you know, if the incentives are right or different
enough people care about the data, the data gets broadcasted, and then anyone who wants to download
the data is able to download the data. The difference between that and what Ethereum is doing and
going to be doing is that Ethereum is and will be providing consensus on data availability.
Basically, so you just, you can always have a hard consensus on the question of whether or not
a piece of data with a particular hash or a particular commitment actually is available.
And what we mean by available basically is did the data go through this publication process
where it got broadcasted on a public network, and anyone who wanted to actually download the
data actually did have a lot of time during which they could have downloaded the data.
So basically the difference between that and something like IPFS has to do with the case
of a malicious publisher, right?
Like, if I'm a malicious publisher, then on IPFS, I could potentially do something where, you know, I control some small number of servers, and then I publish through that small number of servers, and then that small number of servers might respond to some people and say the data is available, but they might not respond to other people.
And so some people get the data, other people don't get the data,
but you never actually get this kind of very binary consensus on whether or not the data was actually published.
Now, the reason why this kind of concept of consensus on data availability is necessary has to do with
a lot of these layer two protocols where those layer two protocols depend on data being out there
and use, not downloaded by everyone by default, but downloadable by anyone in the case that they want to download it, for some security properties, right?
So one very simple example is a ZK rollup, right?
In a ZK rollup, you have a sequencer; that sequencer accepts transactions.
That sequencer publishes these blocks that contain state deltas, and they contain a proof.
And that sequencer also, like, basically manages this kind of internal state, like what is the state of the ZK rollup, like, you know, the balances, contracts, whatever, inside of that ZK rollup.
Now, the difference between a ZK rollup and a Validium, right,
is that a ZK rollup has like state deltas or inputs on chain.
In a validium, you only have the proofs on chain and you have everything else off chain.
From a security point of view of, like, can they force invalid stuff to go into the system, both rollups and validiums protect against that, right? Because the ZK-SNARK prevents you from actually bringing in anything invalid. The place where they're different is what
happens if the sequencer disappears, right? What happens if the sequencer becomes malicious, and they just basically shut off from the network and never talk to anyone again? And the reason why they would do this is basically because they want to just make it not possible for someone else to interact with that system going forward.
And so if people have money inside the system, that money gets stuck.
Now, in a validium, this is actually a problem.
If the validium operator does this, then they can't steal, but they can make people's money stuck.
And so, you know, if they're really mean, they could potentially, like, extort, and they can basically say, like, hey, you know, if the whales don't send 20% of their money to a ransom address, then they're just going to make everyone's money stuck forever.
In a ZK rollup, on the other hand, there's this guarantee that because either the inputs or the state deltas get published to the chain, if the original sequencer disappears, you can always have a new sequencer come in, read the data from the chain, and basically initialize the exact same state that the original sequencer had, and so be in the exact same position and have the exact same capability to then be able to continue providing ZK rollup blocks, processing withdrawals, and processing transactions, right? So because that data is on chain, and so someone else can come in and reconstruct it and, like, basically slot themselves into the same role, you don't have this same security problem that validiums have, right? So the
difference between the two, basically, is: is the data, like, on chain or is it off chain? Now, why does it matter if it's on chain? Because on chain is a very simple, convenient medium where if you see the data is on chain, even if you don't personally
process it, even if you personally don't care about it, you still know that if something terrible
happened and you needed to recover, then you will be able to actually go on chain and grab that
data, right? So what proto-danksharding and then eventually full danksharding try to do is they basically
try to like really zero in on providing a platform that provides exactly that capability, right? So
the beacon chain would actually only contain hashes of data. And so if you're a client,
then you would be just downloading the beacon chain and you would get hashes of everything.
But when I say hashes here, I mean like hashes of KZG commitments and the, you know, blah, blah, complicated math. But like, think of them as hashes. Like, actually, yeah, a KZG commitment is a cryptographically equivalent hash function by, you know, the definitions of collision resistance and pre-image resistance and so forth. But basically, yeah,
the actual full data would instead live in this like sharded system where it would be inside of shards,
and it would be inside of peer-to-peer subnetworks.
And the point of all of this machinery around data availability sampling and that sort of stuff is to basically
provide a way of kind of checking and guaranteeing that the data actually has been published
through this mechanism where if in the future you need it, you will be able to get it without actually
requiring everyone to, like, directly download all of the data themselves, right? Now, the chain does not have to store the data, or the shards do not have to
store that data forever, right? So, like, the plan is for them to delete that data after some
period of time. Like, it's, you know, numbers have been thrown around of, like, somewhere, like,
could be 30 days, could be a couple of months. And then, basically, but the point is to give enough
time that any mechanism or, like, anyone that wants to be able to download the data will be able to download the data, and, like, for there to be enough time for all of the people that would be making backups of data related to a particular application to actually have the time to do that.
Right.
So basically, you know, create the system that's like really optimized around this idea of like,
how can we just provide this exact guarantee of like data availability, like proof that the underlying
data behind a particular hash has actually been published to this public notice board where
if people want it, they can get it.
So the rollups, similarly, can take advantage of that
for scalability without incurring the complexity costs
of actually trying to shard, like, full-on EVM execution
or whatever.
Right, right.
And so it's like the guarantee of the Ethereum L1 protocol
is quite tight.
It's like we guarantee that this data will have been
published on the network for this amount of time.
Beyond that, obviously,
there's still ways to retrieve that data.
They're just not guaranteed by the Ethereum protocol, right?
Then it could very well be on IPFS,
but that's not a guarantee that the protocol can make, you know.
Right, exactly.
Yeah.
And we've touched kind of on this a couple of times already, like this idea of data availability sampling and only verifying, like, having each validator verify parts of the data to make sure that, like, the overall entirety has been published.
Dankrad, do you want to give us like an overview of how data availability sampling works for, call it, an intermediate audience, you know, not a cryptographer?
Yeah.
Yeah.
Yeah.
So the idea behind data availability sampling is that you somehow want a scalable way to ensure that some amount of data is available.
And available basically means you could download it if you wanted to, right?
And so the obvious way, which is what we do now, is we just download it, and then we know it's available.
That's simple, right?
Because if you could download it, then you could download it.
Okay.
But how do we make it scalable?
So scalable means we have the same amount, a constant amount or maybe increasing a logarithmic amount,
but not like a linear amount of resources that we need to do this amount of work.
So we need to find some way of doing this in a more efficient way.
And the way data availability sampling does this is what nodes do is that they sample.
They pick these random parts of the data and they say, I want this and this and this,
and I'll try to get it.
And only if I can get all of these, or maybe a vast majority of them, then I will consider the data to be available.
And naively, like if you just do this on blocks as they are now, then it doesn't work.
And why? Because if there's, say, just one piece of the block missing, the probability that you catch it is tiny.
Because you would have to request exactly that piece of the block.
And you want to only request a really small part of it.
So the probability that you catch it is small.
So that doesn't work because we know it like in blockchains,
basically the thing with blockchains is like the bad things could happen anywhere.
Like even one single missing transaction could screw up the whole system.
So you cannot allow any part of the data to be missing.
Like just sampling directly doesn't work.
So what you have to do instead is you have to first encode the data.
And you encode it in such a way.
And that's this is called.
read Solomon codes, you encoded in such a way that any fixed fraction, for example, you can pick 50%.
If any 50% of the data are available, then you can reconstruct the whole from that.
So you encode it in that way.
And now the scaling becomes different, because now you don't have to ensure that all the data is available. You don't have to know that all the samples are available,
but you have to know that 50% are available. And that's a task that you can do statistically,
because if you download 30 samples, well, I mean, the correct way of saying it is like this: if someone is trying to hide the data, this is the attack we're trying to defend against, right?
Someone is trying to hide the data somehow. If they do that, they have to make less than 50% of the samples available. If they make less than 50% of the samples available and you download, for example, 30, then the probability that all of those will be available is two to the minus 30, which is about one in a billion. And so it's really small. And by downloading 10 more, you decrease it by roughly another factor of a thousand. So that is a scalable way of ensuring data availability. And that's the principle of how it works.
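To make that sampling math concrete, here is a minimal Python sketch of the back-of-the-envelope calculation Dankrad describes. It is purely illustrative: the 50% fraction, the sample counts, and the function name are assumptions for the example, not client code or spec parameters.

```python
# Illustrative sketch of the data availability sampling math (not client code).
#
# With a 2x Reed-Solomon extension, any 50% of the extended pieces are enough
# to reconstruct the original data. So an attacker who wants the data to stay
# unrecoverable can publish strictly less than half of the pieces. A client
# asking for k random pieces is only fooled if every request happens to hit a
# published piece, which happens with probability below (1/2)**k.

def fooled_probability(num_samples: int, available_fraction: float = 0.5) -> float:
    """Chance that every one of `num_samples` random queries gets answered."""
    return available_fraction ** num_samples

for k in (10, 20, 30, 40):
    print(f"{k} samples -> fooled with probability < {fooled_probability(k):.2e}")
# 30 samples already gives < 2**-30, roughly one in a billion; each extra sample
# halves it again (about 10 more samples per additional factor of a thousand).
```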
Aave is the leading decentralized liquidity protocol.
And now Aave V3 is here.
Aave V3 has powerful new features to enable you to get the most out of DeFi, including isolation mode, which allows for many more markets to be launched with more exotic collateral types.
And also efficiency mode, which allows for higher loan-to-value ratios.
And of course, portals, allowing users to port their Aave position across all of the networks that Aave operates on, like Polygon, Fantom, Avalanche, Arbitrum, Optimism, and Harmony.
The beautiful thing about Aave is that it's completely open source, decentralized, and governed by its community, enabling a truly bankless future for us all.
To get your first crypto collateralized loan, get started at aave.com.
That's A-A-V-E.com.
And also check out the Aave protocol governance forums to see what more than 100,000 DAO members are all talking about at governance.aave.com.
Living a bankless life requires taking control over your own private keys.
And that's why so many in the bankless nation already have their ledger hardware wallet.
And brand new to the Ledger lineup of hardware wallets is the Ledger Nano S Plus, a huge upgrade to the world's most popular hardware wallet.
With more memory and a larger screen, the Nano S Plus makes it easy to navigate and verify your transactions.
And the paired Ledger Live desktop app gets you increased transparency as to what is about to happen with your NFTs.
What you see is what you sign.
The Nano S Plus gives you the smoothest possible user experience while you're doing all of your crypto things.
So go to the Ledger website to check out the features of the new Ledger Nano S Plus and join the waitlist to get yours.
And don't forget about the Crypto Life card, also powered by Ledger.
The CL card is a crypto debit card that hooks right into the Ledger Live app,
right next to all the DeFi apps and services that you're already used to using,
like swapping tokens and staking.
So if you don't have a Ledger hardware wallet, go to Ledger.com,
grab a ledger and take control over your crypto.
Arbitrum is an Ethereum layer two scaling solution that's going to completely change how we use DeFi and NFTs.
Over 300 projects have already deployed to Arbitrum, and the DeFi and NFT ecosystems are growing rapidly.
Some of the coolest and newest NFT collections have chosen Arbitrum as their home, all the while DeFi platforms continue to see increased usage and liquidity.
Using Arbitrum has never been easier, especially with the ability to deposit directly into Arbitrum through all the exchanges, including Binance, FTX, Huobi, and Crypto.com.
Once inside, you'll notice Arbitrum increases Ethereum speed by orders of magnitude for a fraction of the cost of the average gas fee.
If you're a developer who wants low gas fees and instant transactions for your users, visit arbitrum.io slash developer to start building your dapp on Arbitrum.
If you're a degen, many of your favorite apps on Ethereum are already on Arbitrum, with many moving over every day. Go to bridge.arbitrum.io now to start bridging over your ETH and other tokens in order to experience DeFi and NFTs in the way it was always meant to be.
Fast, cheap, secure, and friction-free.
Right, right. And basically, building this entire system is why shipping danksharding is going to take a while. Proto, can you walk us through, like, in the meantime, in the 4844 world, how do we sidestep that? Why can we get away with not having all this already live?
So for the background here: first of all, with the merge, we separate Ethereum into a consensus layer and execution layer. We are not throwing more data at the execution layer, but rather we continue to scale the consensus layer.
And even then, we are only doing so with a limited amount of data.
So we're talking about a month or maybe three months,
some amount of data that is retained.
After the period, we start to prune the data.
So we can ensure that this is available for layer twos,
for a sufficient amount of time for them to secure their network.
But then at the same time,
it doesn't grow infinitely, like it doesn't grow indefinitely, where now you have a bounded amount of data to store on the consensus nodes, and they can distribute this between the different beacon nodes.
Right.
And I guess just to clarify this for the listeners, the amount of data that we make available in proto-danksharding is less than the amount that we make available in full danksharding, correct?
Right. So with full danksharding, we distribute the job of storing and propagating the data between all the nodes on the network, between the different validators, whereas with EIP-4844, we still require all of the consensus nodes to acquire all of the blob data, but we limit this.
We don't make it grow indefinitely, so we can increase the throughput.
Got it.
And can you give me an estimate, like, you know, how much do we lower the cost of storing data for layer twos with 4844?
And then how much do we lower it further with a full sharding deployment?
Roughly.
Sure.
So current Ethereum blocks are anywhere between like 50, maybe tops like 100 kilobytes.
It's very variable.
Worst case, it could grow a lot larger.
But you are paying for call data, data that is going through the EVM and that's available forever.
This is a very different type of data than what a rollup really needs.
So instead, we can try and optimize.
You can have this different type of data called blob compared to call data.
And we can grow it from this order of magnitude from like 50 kilobytes to maybe like a
megabyte per block.
And this is obviously already a huge increase that rollups could benefit from, with a reduced cost of it.
And with full danksharding, it can go another order of magnitude larger, because now we don't have to store all the data on one node, we can distribute it across 64 nodes.
So we could have a multiplier here in how we pull the data apart.
Got it.
So it's like we get an order of magnitude increase
with just 4844 in terms of how much data we can have
and then we get another order of magnitude
with full danksharding.
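As a rough sanity check on those orders of magnitude, here is a small back-of-the-envelope calculation. The numbers are only the approximate figures mentioned in the conversation (roughly 100 KB today, roughly 1 MB of blobs with proto-danksharding, another ~10x with full danksharding, a 30-day retention window), not spec values:

```python
# Rough throughput arithmetic from the figures discussed above (illustrative only).
SLOT_SECONDS = 12  # roughly one block per 12-second slot after the merge

scenarios = {
    "today (~100 KB calldata-heavy block)": 100 * 1024,
    "proto-danksharding (~1 MB of blobs)": 1 * 1024 ** 2,
    "full danksharding (~another 10x)": 10 * 1024 ** 2,
}

for label, bytes_per_block in scenarios.items():
    kb_per_second = bytes_per_block / SLOT_SECONDS / 1024
    gib_per_window = bytes_per_block * (30 * 24 * 3600 / SLOT_SECONDS) / 1024 ** 3
    print(f"{label}: ~{kb_per_second:.0f} KB/s, "
          f"~{gib_per_window:.0f} GiB over a 30-day retention window")
```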
And then one thing,
that's been interesting to me to learn as I've been spending time on this is the idea that like
the demand for storing data in these blobs or in the full sharding system is like independent
or at least decorrelated from the demand to use Ethereum gas, right? Like there might be people
who are willing to pay a lot to execute computation and people who are willing to pay a lot to store
data, but they don't necessarily overlap. And it's like it creates two different markets.
So what if you kind of walk us through, like, how we're like designing these two different markets and isolating them from each other to an extent?
Well, so it starts with the transaction type, where you add this additional fee parameter.
But with this fee, you create a different market.
And so if you really want to, you could separate the transaction pool, and the capacity and the type of resource is very different.
I think Vitalik already wrote a post about a multidimensional EIP-1559, where we can try and think of all the different resources in Ethereum as different markets.
And I believe Dankrad already has a post on how EIP-1559 could work for this type of blob data instead of regular gas.
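For a sense of what a second, independent fee market could look like, here is a hedged sketch of an EIP-1559-style update rule applied to blob data. All names and constants below are illustrative placeholders, not the actual EIP-4844 parameters or update formula:

```python
# Hedged sketch of a *separate* EIP-1559-style fee market for blob data.
# Parameter names and values are illustrative, not the EIP-4844 spec.

TARGET_BLOBS_PER_BLOCK = 8      # hypothetical target
MAX_BLOBS_PER_BLOCK = 16        # hypothetical cap
MIN_BLOB_FEE = 1                # hypothetical floor, in wei
ADJUSTMENT_QUOTIENT = 8         # max ~12.5% move per block, as in EIP-1559

def next_blob_base_fee(current_fee: int, blobs_in_parent: int) -> int:
    """Move the blob base fee toward equilibrium: up when the parent block used
    more than the target number of blobs, down when it used fewer."""
    delta = blobs_in_parent - TARGET_BLOBS_PER_BLOCK
    change = current_fee * abs(delta) // (TARGET_BLOBS_PER_BLOCK * ADJUSTMENT_QUOTIENT)
    if delta > 0:
        return current_fee + max(change, 1)
    if delta < 0:
        return max(current_fee - change, MIN_BLOB_FEE)
    return current_fee

# The execution-gas base fee keeps its own independent update rule, so heavy
# demand for blob space does not by itself make ordinary transactions pricier.
print(next_blob_base_fee(100, 16))  # blobs over target -> fee rises
print(next_blob_base_fee(100, 4))   # blobs under target -> fee falls
```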
Right. So it's like there's two auctions happening in parallel. One is people bidding for, like, transaction computation, and the other one is people bidding for storage. And we can use kind of the same mechanism which we have already for gas and call data, which is like weirdly bundled, to then separate it and have one 1559 that works for gas and one 1559 that works for shard data.
Another thing we haven't touched on a lot, but Vitalik, I think you mentioned it earlier on: the sharding design requires the introduction of KZG
commitments, and they're
kind of like a hash, but not really.
I know Dankrad, you had like
a great post about them. Do you want to explain
again, and sort of to non-cryptographers
what these are and, you know, how they
resemble the cryptography that's
currently in Ethereum and differ
from it? Yeah. Yeah,
I mean, so I mentioned earlier
at the end of explaining sampling
how the data has to be encoded in this way that we call Reed-Solomon code,
which is a way to ensure that any 50% of the data can be used to reconstruct the whole data.
So, I mean, I guess data is a bit misleading here because there's like the original data,
which is the actual payload that we're talking about.
But then the code expands this data, so it becomes twice as large in the process as well.
And so what is a Reed-Solomon code?
So a Reed-Solomon code is, well, so yeah, we call it, it's a polynomial.
So basically what it means is, you've learned about polynomials, maybe in mathematics classes.
It's a certain type of function.
And basically this type of function has the property that when you know it at a certain number of points,
which we call the degree of the polynomial,
then you know the whole polynomial.
So basically we use that property in order to put this polynomial function
through the data.
And then if you have like a certain number of samples,
which is like half the amount of the full encoding,
then you can get all of them.
And the reason we need KZG commitments is this.
Like when you just sample the data,
there's one thing that you can't decide from those samples.
And that is whether the encoding is correct.
What if someone just, like... Reed-Solomon codes have a certain structure, right?
They have a certain structure that allows us
to reconstruct the whole thing.
But what if someone encoded it in a different way?
What if they just put
garbage in it, then every 50% of the samples would give you different data. And of course,
that's not acceptable because the data has to be unique. It has to be that all 50% of the
samples give you exactly the same data. And the way we do that, I mean, there are different
approaches, but basically over the years we have ended up here where we just found this
amazing type of commitment, a Kate or KZG commitment, that you can basically see as a hash. It's similar to a hash of data, but with the property that instead of hashing just data, it hashes a polynomial. So it's a way to hash a polynomial function and basically reveal any point on it. And so that guarantees the correctness of the encoding. And that's why we need these KZG commitments.
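To make the "hash of a polynomial" picture a bit more tangible, here is a deliberately toy sketch of the commitment structure. It is not real KZG (real KZG works over elliptic curve points and uses pairings to verify point openings); the modulus, generator, secret, and polynomial below are made-up illustration values:

```python
# Toy illustration of the KZG commitment *shape* (NOT real KZG, and not secure):
# a commitment is a single group element g^{P(s)} that the committer computes
# from public "powers of a secret" g, g^s, g^{s^2}, ... without ever learning s.

P = 2**127 - 1             # toy prime modulus (a Mersenne prime, for convenience)
g = 3                      # toy generator
order = P - 1              # exponents live modulo the group order

# --- trusted setup (run once; the secret s must then be destroyed) ---
s = 123456789              # the "toxic waste"
powers = [pow(g, pow(s, i, order), P) for i in range(8)]   # g^(s^i)

# --- committing to a polynomial c0 + c1*x + ... using only `powers` ---
coeffs = [5, 0, 7, 11]     # an example polynomial
commitment = 1
for c, g_si in zip(coeffs, powers):
    commitment = commitment * pow(g_si, c, P) % P

# Sanity check, only possible here because this toy still "remembers" s:
def eval_poly(cs, x, m):
    return sum(c * pow(x, i, m) for i, c in enumerate(cs)) % m

assert commitment == pow(g, eval_poly(coeffs, s, order), P)
print(hex(commitment))
```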
Got it. And as I was reading about KZG commitments, one of the first things you stumble on as a non-cryptographer is that they require a trusted setup.
And, you know, I'd be curious for the viewers to walk through, like, what actually, what are
the trust assumptions in a trusted setup, you know, like, what are the things that, like,
we are trusting in that setup?
And as we make one to kind of enable this on Ethereum, are there things that, like, end users can do so that they can have kind of a higher assurance that the setup was performed correctly, and, like, minimize the trust assumptions they are making individually?
Right.
Yes.
Yeah.
So basically, the trusted setup: what we have to do is we have to generate these elliptic curve points that have a certain relation.
That's like one of the fundamental inputs of the KZG commitment scheme, and the trusted setup is basically a way to do that. And okay, in addition to that property that they have a certain relation, nobody is allowed to know the actual relation between them. So this has to be a secret. And that's why it has to be this trusted setup. And it's called a trusted setup because one of the ways to do a trusted setup is just to say, hey, we all trust Tim. Tim, you do it, and you give us the output. And then it's done and you throw it away and everything will be fine.
But the problem is, of course, that's not really sufficient for the Ethereum community.
People would be like, well, what if?
So instead, we have this way of distributing this trust and saying, like,
we let many, many people participate in this trusted setup.
And we can design trusted setups in a way so that it's enough if even a single one of these people that participate in it did it according to the protocol.
And the protocol means that you execute this whole thing,
you run this program, send your output,
and then you destroy your data.
Like you destroy the secret that you used to do it.
You don't keep it.
And if even a single person out of the potentially thousands
that are going to participate,
did this properly, then the setup is completely safe.
So even like, let's say 1,000 people do this,
999 colluded and they all kept their secret and they come together and try to reconstruct it,
but one person did it properly and they don't have it.
Even in this case, these 999 people know absolutely nothing that helps them to break it.
So that's the security guarantee, which we call like n minus one.
So even if n minus one collude, they can't get anything on that last person that participated properly.
And yes, I mean, obviously this has a nice property that if you are really, really worried about this, and you're like, oh my God, how can I trust these people, then you can just participate.
So like one obvious way is if you participated and you know you did things properly, then you don't need to trust anyone because you're part of it.
And yeah, it's done.
So like you can consider it secure.
Obviously there are also all the parts of, like, making sure all the software works properly and so on, that we're all very familiar with in our blockchain systems; all of this we're relying on for a functional Ethereum as well.
But obviously it all has to be very well audited. And we need several implementations of this as well. But yeah, so that's, I guess, the other trust assumption: we have to make sure that all the software is safe and works properly.
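A minimal sketch of the one-honest-participant idea behind this kind of ceremony, reusing the same toy group as the commitment sketch above. Real ceremonies work over elliptic curve points and publish proofs that each contribution was well-formed; everything here, including the participant count, is illustrative:

```python
# Toy powers-of-tau style ceremony (illustrative only, not secure, not a real spec).
# The transcript is the list [g^(s^0), g^(s^1), ...] for the current combined
# secret s. Each participant mixes in their own secret r, turning s into s*r,
# and then deletes r. As long as at least ONE participant really deletes theirs,
# nobody can reconstruct the final combined secret.

import secrets

P = 2**127 - 1       # toy prime modulus
g = 3                # toy generator
order = P - 1
NUM_POWERS = 8

def contribute(powers):
    r = secrets.randbelow(order - 2) + 2                        # participant's secret
    updated = [pow(p_i, pow(r, i, order), P) for i, p_i in enumerate(powers)]
    # r goes out of scope here and is forgotten -- that's the whole point.
    return updated

transcript = [g] * NUM_POWERS        # start from the trivial secret s = 1
for _ in range(100):                 # say, a hundred participants in sequence
    transcript = contribute(transcript)
```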
Right. So there are kind of different levels. It's like either, you know, you don't care at all and you just trust that somebody somewhere has been honest. And maybe it's not that you don't care, but you learn about
Ethereum 10 years from now, which is basically what you have to do because you can't participate.
If you're part of the Ethereum community today, there's going to be an opportunity for people
to participate individually.
So then as long as you're confident that like your participation was correct, then the whole
output should be correct.
And then if you're even kind of more concerned, there's a specification from which we can write
different implementations.
So I assume, you know, if you didn't trust the existing ones, you could write your own and also
produce an output, or at the very least kind of review the code of the different ones and make sure that they match up, and yeah, you would kind of get a high level of certainty that
things are correct. Okay, and I guess we're coming up on time here. The last thing I did want to
talk about is like, how do we actually get this deployed, which is the part I'm usually getting
involved in. I know Proto, you started prototyping 4844 along with some other folks.
Can you give just kind of a quick summary of like what was done so far in terms of prototyping
and what do you think are like the next steps that the community can expect?
Right. So earlier this year it started with an initial write-up of what it could be like.
Then during the hackathon in Denver, it transformed into this software where we have an actual implementation of the proposal.
And over time, we have been improving that and testing that.
And what we need to go and do from here is there are two different branches, right,
where we want to further develop the client software to be able to make a test network.
And we want to continue the development on this trusted setup so that we do have the cryptography,
everything there on that side all set as well for when we do want to deploy this.
and then once we have both ready, you can make larger and larger testnets and then eventually include it as an EIP via the All Core Devs process into Ethereum mainnet.
Got it, got it.
So we have some initial prototypes.
We want to productionize them, like, grow them, make them more robust, make sure the trusted setup is working
according to plan.
And then once we have that, it kind of becomes a normal EIP.
We need to shepherd through the process.
And then one thing that's worth knowing about 4844 is what's very neat about it is from the execution layer point of view.
So like the kind of smart contract and end user transaction generation point of view, sharding is basically done then.
Like there are no more changes needed for people to interact with this blob data.
What will happen is, then we need to deploy kind of this entire danksharding infrastructure on the consensus layer.
But from applications point of view,
that kind of just happens in the background.
But at the consensus layer, Dankrad, like, what are the steps to get this deployed?
Like, how many hard forks do we need to get there?
You talked a lot about proposer builder separation before.
So, like, what do you see as the logical set of stepping stones to get us to full sharding on the consensus layer?
Yeah, I mean, I hope it's two.
But I don't know yet.
I think, like, so clearly, I mean, this is the reason why we chose this stepping stone of proto-danksharding, that it's something that gets us substantially closer to the full implementation.
So it will become simpler.
Like, the interface, for example, will stay the same.
The execution layer changes will be minimal once that's implemented.
So that's why it's really nice if we can get that done in the Shanghai hard fork.
And then, I mean, I don't think it's very likely that it will be the very next hard fork after that.
But hopefully, relatively soon, we will get, yeah, we'll get this.
I, yeah, I don't know.
There are currently still, like, some things we definitely need to work on.
Like, there's a lot of work on the networking side to be done to have, yeah, full sharding rolled out.
I currently have no exact estimate, but I would hope that that can be done in one hard fork.
Okay.
I sympathize with not being able to give direct estimates about complex projects.
So I won't push you on there.
And maybe, yeah, to kind of close this off, like, Vitalik, if people want to contribute
from a research or, like, engineering point of view, what are, like, the big open questions
in sharding land that they should spend their brain cycles on?
I think one problem is definitely like figuring out the networking of data availability sampling.
Like there are designs that we have that work in theory.
Like there's doing it based on subnets.
There is the approach of trying to make a DHT much faster.
There's a couple of other techniques in the middle.
But, like, really taking those ideas and, from an engineering perspective, just trying to optimize it really hard.
Like, how do you actually make, basically, a specialized scalable DHT where publishing and downloading can happen extremely quickly?
So that's one problem.
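To put a shape on the sampling loop being described, here is a minimal, hedged sketch of the client side of data availability sampling: draw random cells of the erasure-coded data square, fetch each from the network, and accept the data only if every sample arrives. The matrix dimensions, the sample count, and the fetch_sample helper are hypothetical stand-ins for whatever the subnet or DHT layer ends up providing.

```python
import random
from typing import Optional

ROWS, COLS = 512, 512   # illustrative dimensions of the extended (erasure-coded) data square
SAMPLES = 30            # each clean sample roughly halves the chance of being fooled

def fetch_sample(row: int, col: int) -> Optional[bytes]:
    """Hypothetical network call: fetch one cell (plus its KZG proof) from
    whatever subnet/DHT layer ends up carrying the samples."""
    raise NotImplementedError

def data_seems_available(rng: Optional[random.Random] = None) -> bool:
    rng = rng or random.Random()
    for _ in range(SAMPLES):
        row, col = rng.randrange(ROWS), rng.randrange(COLS)
        if fetch_sample(row, col) is None:
            return False   # a missing sample means we refuse to treat the data as available
    # With a 2x extension, if less than half the data were published each sample would
    # fail with probability >= 1/2, so 30 clean samples give roughly 1 - 2^-30 confidence.
    return True
```

The hard engineering problem Vitalik is pointing at is not this loop but the network underneath it: making publishing and retrieval of these cells fast and reliable at scale.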
And I think in general, like, in the Ethereum ecosystem, the networking side is one of the sides that's talked about
the least. I mean, possibly just, I think, a bit of an accident of history that the
Ethereum core research community just happens to, like, never really have people who,
like, spent a lot of time thinking about networking stuff. It's generally, like, people have
spent a lot of time thinking about, you know, cryptography and incentives and economics.
But it's still a really important problem. And that's a problem where I think it would be
amazing if we can have more, like, very active networking expertise in Ethereum. So that's a
short-term, or that's, like, a very clear problem. Another problem is that with full danksharding,
there's this issue of, like, how to combine it well with proposer builder separation. And there,
there are some, like, economic challenges; there's still the challenge of, like, well, how do you actually
make a good proposer builder separation protocol? How do you add, like, censorship resistance lists
so you can bypass censoring builders? And once again there, like we have ideas on
each of those things, but there's still the question of digging into the details, combining a PBS design with the design of Ethereum's future proof of stake, which is something that at some point we'll probably start having to kind of talk and think more about, right?
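As a hedged sketch of the censorship-resistance-list idea mentioned above, under one proposed design the proposer publishes a list of transactions it has seen, and a builder's block only counts as valid if it includes every listed transaction that could still have fit. The names and structures below are illustrative assumptions, not any actual spec.

```python
from dataclasses import dataclass
from typing import List, Set

@dataclass
class Tx:
    tx_hash: str
    gas: int

@dataclass
class Block:
    txs: List[Tx]
    gas_limit: int

    def gas_used(self) -> int:
        return sum(t.gas for t in self.txs)

def satisfies_crlist(block: Block, crlist: List[Tx]) -> bool:
    """A block may omit a listed transaction only if it genuinely had no room for it."""
    included: Set[str] = {t.tx_hash for t in block.txs}
    for tx in crlist:
        if tx.tx_hash in included:
            continue
        if block.gas_used() + tx.gas <= block.gas_limit:
            return False   # the builder skipped a tx it could have included
    return True
```

The open research questions are exactly the ones named here: how to price this, how it interacts with builders' incentives, and how it composes with the rest of the proof-of-stake design.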
Like there's been this increasing effort within the Ethereum research and the protocol community of basically thinking through what would a better proof of stake design look like in the long term.
Like, do we want to have single slot finality?
How do we achieve single slot finality?
What other benefits can we achieve?
How can we offer more in-protocol of what Lido offers to people extra-protocol, to try to reduce, you know, staking pool centralization incentives?
And just generally increasing simplicity.
So I've written a bunch on that.
Like if you just Google for single slot finality, you can probably find it.
But, yeah, that, and the intersection of that and proposer builder separation, and the intersection of that and sharding, is going to be another research area going forward.
And then the other one is also adjacent to danksharding stuff,
and also adjacent to EIP-4444 stuff, which is very critical to proto-danksharding and danksharding
actually being viable, and just generally Ethereum scaling well. It's, like, creating systems that are
as decentralized as possible and as robust as possible, to give people the same guarantees
that they've come to expect out of Ethereum in terms of history retention, but without requiring
participants in the core Ethereum consensus protocol to all be retaining blocks forever.
Right? So there's a team working on Portal. There's things like The Graph. Like, there's this big
long list of projects, right? And trying to figure out how to make those better, or even create
better alternatives to those, is I think also an important area.
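To sketch the pattern being described here, and only as an assumed shape rather than how Portal or The Graph actually work: once core nodes stop retaining very old blocks (the EIP-4444 direction), a client would serve recent blocks locally and fall back to an out-of-protocol history network for the rest, verifying what it gets against known header hashes. The provider interface below is hypothetical.

```python
from typing import Optional, Protocol

class HistoryProvider(Protocol):
    def get_block(self, number: int) -> Optional[bytes]: ...

def fetch_block(number: int,
                local_store: HistoryProvider,
                history_network: HistoryProvider) -> Optional[bytes]:
    # Recent blocks are still held locally; older ones come from an external
    # history network and, in practice, would be checked against known header hashes.
    block = local_store.get_block(number)
    if block is not None:
        return block
    return history_network.get_block(number)
```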
Oh, also one last one I wanted to throw in. This is important. The switch to layer two is, I think, one where
it's very important for the ecosystem to try to maintain and even improve its
decentralization going through that switch.
And so, you know, we need light clients that can poke into Optimism and poke into
Arbitrum and poke into StarkNet.
And that's something that there's been a bit of like theoretical thinking work on.
But it's the sort of thing that I think there is a lot of room for people to slot themselves
in and like really try to improve that ecosystem a lot, right?
Like, basically, we should really think through: if everyone's really going to migrate over
to L2 over the next 12 to 24 months, especially as proto-danksharding comes alive and the
rollup costs go down even more, how do we really make sure that transition goes well, and
that it preserves all of the decentralization properties, and even improves on those properties,
that we've come to expect of Ethereum? Yeah, that's a really good list.
We have, like, five minutes left, so I wanted to leave a bit of time for the three of you.
If there's anything you wanted to share that you think is important about sharding or 4844,
or Ethereum generally, that, like, we haven't talked about, the floor is yours, whatever you want
to rant about or get people to pay attention to. I mean, the last week has been kind of
chaotic to say the least I think this is like the type of market where
If you feel a little bit down, like, maybe read a post, try and get involved with nascent projects.
It's not the bear market, it's the builder market.
Read the specs for the EIP.
There is the site called EIP4844.com.
That'll get you started.
And then there are simple diagrams, all the way down to, like, very elaborate posts about the
cryptography involved.
And yeah, just reach out and get building.
Stay optimistic, by the way.
I love it.
Thank you, Vitalik.
Anything else you want to share?
Stay optimistic, but in the long term,
hopefully, stay zero knowledge.
Wow.
Anything from your end, Dankrad?
Yeah, I mean, like, I mean, I think,
I think the week has shown that you need solid designs
and that we need to build things that can actually last
and that are built to last and won't go away.
And that's what we're trying to do here.
And yeah, so I'm optimistic on this,
but yeah, I am clearly also pessimistic
about many other things that are happening in the ecosystem.
So we just have to be better and build better.
Love it.
Yeah, then that's basically a wrap.
The bankless guys did ask me to end with a disclaimer.
So here it goes.
And I'm reading from the screen now, risk and disclaimers.
Crypto is risky.
You could lose what you put in, but we're headed west.
This is the frontier.
It's not for everyone, but we're glad you're with us on the bankless journey.
Thanks a lot.
And yeah, thanks a ton, Proto, Vitalik, Dankrad,
for coming at a bunch of different hours across your respective time zones.
I think this has been really helpful to explain the entire sharding roadmap to people.
