Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Trent McConaghy: BigchainDB – Scalable Public Distributed Databases

Episode Date: April 11, 2016

One of the major drawbacks of Bitcoin is its low transaction throughput. Maxing out at only a handful of operations per second, there have been many proposals to scale it up so that it can compete with existing distributed database technologies. As the blockchain's demand continues to increase, it's unclear if the Bitcoin protocol will ever be able to handle thousands, if not millions, of transactions per second. BigchainDB is taking a different approach. Rather than trying to scale up blockchain technology, it starts with a big data distributed database, RethinkDB, and adds blockchain features and characteristics. Trent McConaghy, Co-founder and CTO of Ascribe and BigchainDB, joins us to talk about how this protocol may become to databases what IPFS and Ethereum are to distributed file storage and computing, respectively. Able to perform more than one million writes per second, with capacities in the petabytes, BigchainDB has the ambition to become the world's public database platform.

Topics covered in this episode:
- A brief update on Ascribe since Trent was last on the show
- The motivations behind BigchainDB and what problems it's trying to solve
- How BigchainDB plans to solve the typical scalability bottlenecks found in blockchain protocols
- BigchainDB's capacity, performance and latency characteristics
- BigchainDB's consensus model, applied to RethinkDB
- Potential applications for BigchainDB, in both centralised and decentralised application stacks

Episode links:
- BigchainDB
- BigchainDB Whitepaper
- Ascribe
- Left Gallery
- 23vivi

This episode is hosted by Meher Roy and Sébastien Couture. Show notes and listening options: epicenter.tv/126

Transcript
Starting point is 00:00:00 This is Epicenter Bitcoin episode 126 with guest Trent McConaghy. This episode of Epicenter Bitcoin is brought to you by Ledger. Now accepting pre-orders for the all-new Ledger Blue Developer Edition, a Bluetooth and NFC touchscreen hardware signing device. Learn more about the Ledger Blue at ledgerwallet.com and use the discount code Epicenter to get 10% off your first order. Hi, welcome to Epicenter Bitcoin, the show which talks about the technologies, projects, and people driving decentralization and the global cryptocurrency revolution.
Starting point is 00:01:01 My name is Sébastien Couture. And I'm Meher Roy. Today we are going to be talking to Trent McConaghy, who is the CTO of Ascribe and BigchainDB. BigchainDB is aiming to be a new database technology that can eventually enable a planetary-scale database. But before we get into BigchainDB, let's have an introduction from Trent. Trent, your background intro, please. Sure, hi. It's a pleasure to be on here again, guys. I really appreciate whenever you have me on. So I will just make this one quick, because I know you're going to dive into some of the details. But I started out, spent almost 20 years in the semiconductor industry, doing AI for designing computer chips, machine creativity, that sort of thing. A lot of big data, a lot of distributed computing, a lot of other sort of pedal-to-the-metal coding and whatnot.
Starting point is 00:01:55 Almost three years ago, I started working on a project for IP on the blockchain, with a specific focus on digital art. That became Ascribe, the company and the product. We're continuing to focus on that, as well as, more recently, due to limitations we saw with blockchain technology, we have built another product called BigchainDB and released that into the market in February. So we'll be focusing on that, I guess, in this call. And I look forward to talking about it. Thanks for having me. So maybe we can get started, before we get into BigchainDB and cover that topic quite extensively, by talking about Ascribe.
Starting point is 00:02:35 And since the last time we had you on, it was about 10 months ago, episode 76. If anybody's interested, they can go back and listen to that episode with Trent. So tell us, how has Ascribe moved forward since then? How has the product developed? And what's going on with Ascribe? Sure. So overall, there's Ascribe the company, which has BigchainDB as a product, as well as Ascribe the product, and a third product called WhereOnTheNet.
Starting point is 00:03:01 I'll talk about that in a sec. So three products, one company. Ascribe itself has actually been quite true to the original vision, which is to make it easy for creators, artists, et cetera, to securely attribute their work onto the blockchain, time-stamping and so on. And not only securely attributing it, but also making it really easy to license it to others, such that, for example, if you're a collector of digital art,
Starting point is 00:03:31 or even physical art, actually, if you're a collector, you can truly own that art via a transfer of ownership from the artist to the collector, or perhaps via a gallery and a consignment model, and then from one owner to the next to the next. So whereas before you had no provenance, now you have perfect digital provenance. So that vision is as it always has been.
Starting point is 00:03:52 And it's been growing quite nicely, actually. So now we're at about 5,000 artists using the system. There are about 9,000 or 10,000 pieces of work, 40,000 editions. We're seeing heavy usage from many sectors. One of the ones we're most proud of is a lot more people from the Creative Commons community using us. We've worked closely with Creative Commons France
Starting point is 00:04:12 on that, and with other related organizations. And actually, there are probably about 20 or 25 organizations that have been using us in other interesting ways. Of that, several art prizes, probably now a total of, I don't know, 10 art prizes, and also sort of the GLAMs, so galleries, libraries, archives, museums, using us in various ways for things like archival and for selling artwork, et cetera. And then on top of us, we have a REST API. So this has matured a lot since February, March, April of last year.
Starting point is 00:04:48 And it's really rock solid. And because of that, we've actually had several people starting companies on top of Ascribe. We've got about seven or eight startups now that are working on top of Ascribe. The most recent ones are Left Gallery, which is an initiative from a world-class net artist named Harm van den Dorpel. So it's left.gallery if you go to that. And a more recent one yet is 23VIVI, and this is a startup out of the USA. They are sourcing photography from social media and other places, working with the people creating it, and then selling it. And they even have a secondary market now, limited editions, all this sort of thing.
Starting point is 00:05:29 So there's a lot of stuff happening with Ascribe which we're quite excited about. And I guess the other big thing, and it sort of leads to BigchainDB actually quite directly: with Ascribe, we always saw that people were concerned about sharing their work online. They would say, okay, I claim that I own this work, that I have the license to this, with all the legalese. You know, we have a full-time copyright lawyer who has worked out all the legals in all the countries to kind of make it general. So they got the attribution, but they were concerned to share it, because they have this
Starting point is 00:06:06 feeling that as soon as they put it online, they lose control over it, that they don't have any visibility into where it's going. So they would say, hey, can you encrypt it for us? Can you watermark it for us? And we're like, no, that totally goes against the point of this, right? That's, you know, basically DRM, and it's really not the way to go. And we said, okay, instead, let's shine light onto where your work is showing up. So what we did was we crawled the web, which works out to 220 terabytes' worth of text, if you limit it to text.
Starting point is 00:06:37 And from that, we indexed a bunch of images, 15 billion images. So you look at the links inside that; we downloaded those and computed feature vectors from them, and basically came up with what is essentially reverse image search, except it shows you copies versus time. So you can see exactly where your images show up on the web. And that gives you visibility into what's going on. So you regain what would have been a loss of control, via visibility. And this leads directly to the next thing. So we indexed these 15 billion images, and we wanted to record onto the Bitcoin blockchain, which we were building on, that we had sighted these.
Starting point is 00:07:16 But we did some quick back-of-the-envelope calculations. You know, the Bitcoin network, it starts to back up at about 1.5 transactions per second. You know, past that, it'll take more than 24 hours for a transaction to go through. So run the numbers, run the numbers, and it would take centuries for 15 billion images to get recorded onto the Bitcoin blockchain. So that was clearly a challenge.
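That back-of-the-envelope calculation is easy to reproduce. A minimal sketch, assuming the sustained rate of 1.5 transactions per second that Trent cites (the exact answer depends entirely on that assumption):

```python
# Rough math: anchoring 15 billion image records on Bitcoin,
# one record per transaction, at ~1.5 sustained transactions/second.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

images = 15_000_000_000   # records to write
tps = 1.5                 # assumed sustained Bitcoin throughput

years = images / tps / SECONDS_PER_YEAR
print(f"{years:,.0f} years")   # ~317 years: centuries, not months
```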
Starting point is 00:07:42 And it wasn't just these images, right? You know, we've been iterating with galleries and museums, et cetera, where they have a million images here, 20 million images there, et cetera. They have all these digitization efforts. And any one of those would basically be a show-stopper for really putting the stuff onto the Bitcoin blockchain. So it was really starting to hit us really hard, the limitations of the scalability of the Bitcoin blockchain, right? And people talk about blockchain bloat at 50 gigabytes. I mean, my thumb drive holds more than that.
Starting point is 00:08:06 So clearly something was up. And that's kind of what led us to thinking about how we could approach this problem. You've indexed 15 billion images. What kind of server technologies are you using to do that? Oh, we were running on AWS. Actually, AWS, when we started doing this, we hadn't really interacted with them a lot. So they called us up out of the blue and said, like, who are you guys? What's going on?
Starting point is 00:08:30 Because our bill was $30,000 a month, actually. So we quickly became their friends. Do you guys have, like, an Amazon rep just working right in your office or something? For a while, yeah. I mean, they actually flew down a team from elsewhere in Europe just to come and visit us, because, I mean, you know, what crazy little startup in the middle of nowhere... Actually, it's not the middle of nowhere. Berlin's an amazing city.
Starting point is 00:08:51 But what crazy startup, you know, has the audacity to go and crawl the web, right? So we did. And it's actually less about algorithms. You know, you have to have good algorithms, but it's more about commitment of time and resources. So we committed to it, because we saw it as another piece in the puzzle of serving the creators of the internet. And so from that problem of being able to index and timestamp all of this data, this image data, you came up with the idea for BigchainDB. Can you talk about how the idea came about?
Starting point is 00:09:33 Sure. So there was the 15 billion images challenge. And there are other ones too. One customer we talked to, a potential customer, they have 20 million users. They're going through more than 100,000 images a day, right? So that would roughly double the throughput right there. And actually, that particular customer we were talking to in 2014, in sort of fall of 2014.
Starting point is 00:09:56 You know, we had gone really sort of full-time on Ascribe in the summer of 2013. Sorry, summer of 2014. And so even then we were like, hmm, you know, this is going to be a challenge. And I even gave a talk. Actually, Brian runs this meetup in Berlin, and I'd been talking with him over a drink one night about this problem with Bitcoin. He's like, you know, you should really talk about this. I'm like, okay, I'll give a talk at your meetup, sure.
Starting point is 00:10:21 So I did. This was in fall of 2014. And I said, hey, look, you know, there's the Bitcoin blockchain. Here are the scalability issues, right? 1.5 transactions per second. You know, 50 gigabytes, people calling it bloated. And then you look around and you look at the internet. And you see, hmm, Netflix is, you know, using up 37
Starting point is 00:10:40 percent of the bandwidth of the internet. That's interesting. And there's this thing called big data. Well, what's that about, right? How big is that data? And, you know, I knew about this world already quite well, because I come from a world of big compute, right, running 1,000, 10,000 machines to do verification of memory chips, et cetera. So if you look into this world of big data, you know, that's the buzzword, but really it comes down to distributed databases. And distributed in the sense that compute resources are spread over more than one physical machine, whether it's processing resources or storage resources or whatever. So there's been distributed database technology out for decades, actually,
Starting point is 00:11:24 and it's gotten really good. And this is what powers Netflix. This is what powers Google. This is what powers all the big guys. But guess what? All the little guys, too, right? If you're a startup that quickly ramps to 10 million, even 50 million users, you're probably running on AWS, or maybe Azure or something. And you're just
Starting point is 00:11:43 paying more money for more compute resources. And it's not just your server in your back office. It's actually just more physical machines being added to serve your customers. So there's established technology for this. And in the database world, for the storage side, it's distributed databases. And if you dive deeper, you say, well, okay, if I put data into this database, does it get stored on every machine? Well, that wouldn't make it scale. So it only gets stored on a fraction of them, right? So it's the idea that you're sharding up your data.
Starting point is 00:12:16 You know, let's say you have 100 records to store and you have 100 machines, and you can store one record on each machine. But what if one machine goes down? So you have to make sure that you maybe have three or five copies. That's the idea of replication. But you don't have, you know, the same copy in every machine, full replication, because that would kill the scalability. So by doing this, if you have three or five copies instead of N
Starting point is 00:12:37 copies, as you add more and more machines, the capacity goes up. And if you do things just right, the throughput goes up too. So even looking at the benchmarks, you know, in fall of 2014, we saw that Netflix had done an experiment where they had 50 machines and they were getting to 200,000 transactions per second. And as they added more machines and more machines, the throughput went up, up, up, such that by the time they hit about 300 machines, they were doing more than 1 million transactions per second. So just when you say transactions, because, I mean, we're used to using the word transactions in the blockchain space, but what you're actually meaning is writes per second in the
Starting point is 00:13:19 blockchain or in the database. I apologize, I was being slightly loose there, and you're exactly right. It's writes per second in that sense. Interestingly, if you look into the database space, they also use the word transaction at a lower level, and a transaction can include, you know, two writes and a read, or one read, or whatever. But what I mean here is one database-style transaction, which includes one write inside. So it's kind of funny.
Starting point is 00:13:43 There's the sort of transaction in the blockchain sense, transaction in the database sense, and there's also transaction in the financial sense. So you can really get mixed up if you're not careful. So overall, yeah, what had led us to this was we saw, hmm, there's all this technology that is enabling big data distributed databases, this idea that as you add more resources, hard drives, et cetera, you can increase capacity, storage, as well as increase throughput.
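A minimal sketch of the sharding-plus-replication idea described above, with hash-based placement and a replication factor of three (illustrative only; real distributed databases use more sophisticated placement):

```python
import hashlib

NUM_MACHINES = 100        # machines in the cluster
REPLICATION_FACTOR = 3    # copies per record, instead of N (full replication)

def machines_for(key: str) -> list[int]:
    """Hash the key to pick a primary machine, then place the
    replicas on the next machines around the ring."""
    digest = hashlib.sha256(key.encode()).digest()
    primary = int.from_bytes(digest[:8], "big") % NUM_MACHINES
    return [(primary + i) % NUM_MACHINES for i in range(REPLICATION_FACTOR)]

# Each record lands on 3 of 100 machines, so adding machines grows
# capacity (and, done right, throughput) roughly linearly.
print(machines_for("record-42"))
```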
Starting point is 00:14:16 And that's pretty profound, right? And this is actually what powers all these big data challenges. This is why a smallish network is something like Visa or MasterCard. We're talking 2,000, 5,000 transactions per second in the sort of Visa sense. Or Twitter, you know, 5,000 per second. A bad ad network is 100,000 per second. A good one is 500,000 per second, right? So, you know, compare that to Bitcoin, which is, practically speaking, 1.5 transactions per second. So we're talking, you know, four, five, six orders of magnitude difference. And if you want to
Starting point is 00:14:54 have a network, a database network, you want to support more than just one application, right? Well, you might want to be running, say, just Visa or just SWIFTNet or something. But if you're doing something on a global scale, you actually want to be handling a whole bunch of different use cases. So this is, you know, what we were looking at. But we also saw that even for our own needs at Ascribe, you know, we have these 15 billion images that we wanted to index, and we wanted to put them somewhere where it wouldn't cost us, you know... If we said, hey, we're going to pay the 10 cents per transaction, or whatever it costs these days, to store them, that's $1.5 billion.
Starting point is 00:15:31 Can't really go to a VC for that one, can you? Let's take a short break so I can take you to Paris. I walked into La Maison du Bitcoin, the house of Bitcoin, in the heart of Silicon Sentier, home to many startups, including Ledger. And I spoke with Eric Larchevêque, Ledger's CEO, about the all-new Unplugged NFC hardware wallet. The Ledger Unplugged is an NFC-based hardware wallet that you can use with compatible Android phones.
Starting point is 00:15:57 The private keys are stored in a secure element, and you can use them with wallets such as Mycelium and GreenBits. Each time you want to make a transaction, the signature will be done by the Unplugged, and this way your private keys and critical data will never be exposed to the Android phone. This is a secure way to use your bitcoins on the go, and you will also be able to pay directly with the Unplugged
Starting point is 00:16:27 with compatible point-of-sale terminals. The Ledger Unplugged is the simple solution for secure, contactless Bitcoin payments. You can get the Unplugged at ledgerwallet.com, and when you use the code Epicenter at checkout, you'll get 10% off your order. By the way, that code works on their entire range of products. So we'd like to thank Ledger
Starting point is 00:16:45 for their support of Epicenter Bitcoin. So what a lot of companies have resorted to doing, and certainly what we're doing at Stratumn, is using Merkle trees to put large amounts of hash data into one single Bitcoin transaction. Why did you feel that that wasn't a good solution?
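The Merkle-tree batching mentioned in the question folds many document hashes into one root hash, so a single Bitcoin transaction can commit to all of them. A minimal sketch with Python's hashlib (illustrative, not Stratumn's actual implementation):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Fold a list of leaf hashes pairwise up to a single root."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if odd
            level.append(level[-1])
        level = [sha256(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

docs = [f"document-{i}".encode() for i in range(10_000)]
root = merkle_root(docs)   # one 32-byte commitment for 10,000 documents
print(root.hex())
```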
Starting point is 00:17:11 Yeah, well, we were looking around, right, and we asked ourselves the question. There are two ways to get to scale. One of them is to start with blockchain technology and try to scale it up, using various tricks and ideas, maybe from distributed databases, maybe with Merkle hashing, et cetera. The other way is to start with distributed database technology that already scales, and try to blockchainify it. And of course, we had to figure out what the word blockchainify meant in a practical sense. I can get to that. But overall, it's really these two options, right? Blockchainify big data, or big-datafy blockchain.
Starting point is 00:17:45 Now, if you look at the history of work that's been done in the blockchain world, some of it goes back to the time-stamping literature of the early '90s, and then the cypherpunks, et cetera, but it really took off with Bitcoin's release. But if you look at the world of databases, distributed databases and so on, this goes back to the '50s and '60s. There's a much longer history, a much longer lineage, and a much larger amount of R&D that has been done
Starting point is 00:18:15 and technology that has been developed. And it's not just about scale. It's not just about performance in terms of throughput, capacity and latency; it's also querying. It's also permissioning, and all of these other things, right? And if you want to do querying over a distributed database, that is actually a huge amount of engineering work that you have to do. You have to optimize to minimize latency, et cetera.
Starting point is 00:18:35 And the idea of a query itself, right, goes back to a ton of amazing research in the late '60s, early '70s, that led directly to relational databases and what we now think of as SQL. And we think of it as the most mundane thing in the world, but it's incredibly powerful. One line of SQL replaces
Starting point is 00:18:58 500 lines of custom code for your one application, right? So if you're thinking about how you manipulate data, do you prefer to write one or a few lines of SQL, or are you going to write 50 or 500 lines of custom code?
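To make the comparison concrete, here's a sketch using Python's built-in sqlite3 with a hypothetical works table; the one-line query stands in for the filtering, grouping, and counting you would otherwise hand-roll:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (artist TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO works VALUES (?, ?, ?)",
    [("alice", "untitled #1", 2016), ("alice", "untitled #2", 2015),
     ("bob", "study", 2014)],
)

# One line of SQL: works per artist since 2015 ...
rows = conn.execute(
    "SELECT artist, COUNT(*) FROM works WHERE year >= 2015 GROUP BY artist"
).fetchall()
print(rows)   # [('alice', 2)]

# ... versus the custom-code version: loop over raw records, filter by
# year, bucket by artist, count each bucket, and keep it all correct
# as the schema evolves.
```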
Starting point is 00:19:20 So this is basically the choice, right? You can take a blockchain and try to scale it up, and somehow bolt on all this other work too, with querying and permissioning. Or you can start with a distributed database, with all those 50 years, thousands of man-years' worth of research in distributed databases, and then bolt on the benefits that you get from blockchain ideas. So we went for the latter. And we, as far as I know, are really the only ones that went that way. But it's worked out marvelously. Happy to share the details. Cool. It does seem like a really cool idea.
Starting point is 00:19:42 But for starters, let's just walk through all the participants in, let's say, a BigchainDB implementation. Right. So, for our listeners and us, when we think of Bitcoin, we tend to think, like, okay, there are different participants: there are the people that are submitting transactions, the transactors; there are the nodes that verify transactions, the proof-of-work miners that create the blocks. These are all the participants. So with BigchainDB, you are eventually going to release, let's say, a public big chain, right? That is, like Bitcoin, a public database anyone can write to. Who would the participants in the system be? What are the different categories?
Starting point is 00:20:21 Yeah, sure. So this is probably drilling a little too deep this early in the call, but I'm happy to talk about it. So first of all, BigchainDB can be deployed in private scenarios and public scenarios. You know, people can take it right now and put it out as a public network for themselves, right? We are working on rolling our own deployment for a public one, working with a whole bunch of participants on that. And when people go to use it, anyone will be able to read anything. Anyone will be able to issue an asset. Anyone will be able to transfer an asset.
Starting point is 00:20:59 And there can be other permissionings, too, if you like. But for the public one, it's basically going to be very, very, very open. The one thing that isn't as open is simply validating. But then, right now, if you want to be a validator on the Bitcoin network, you have to spend $50 million on compute hardware to have any chance. And, you know, that's basically how it is for the public BigchainDB deployment that we're working on. One thing I'd like to talk about briefly, just before we kind of drill down into this too much, is to go back to Sebastien's question. When you want to scale a public blockchain, it's kind of clear; everyone kind of understands what a blockchain as a noun is. But when you want to take a distributed database and blockchainify it, you know, what does blockchain mean as an adjective, right?
Starting point is 00:21:50 What does it mean to blockchainify something, to make a blockchain database? So we actually looked around, and, you know, we had spent two years already in the space, right, since summer of 2013, working on Ascribe as the project and then the company. And we had a very good idea of the different benefits that emerge and sort of the technical ideas that led to them. So we like to define blockchain as an adjective meaning three specific things: decentralized, immutable, and assets. Decentralized as in no single entity owns or controls it;
Starting point is 00:22:27 immutable as in tamper-resistant, as in more tamper-resistant than traditional logging databases, because databases already have logs; and assets as in assets can actually live on this database, for which you need the prerequisite of the first two things, decentralized and immutable. And live as in you can issue assets, you can transfer assets, and they can just kind of be there. You don't need to have some other sort of means for them to exist.
Starting point is 00:22:59 You can treat that as the main means. And of course, you can have Ricardian contracts and all these other sorts of things around it too. But overall, it's just the idea that assets can be issued and transferred within this database network. So that's really key, because, you know, with that definition, then we asked: okay, we want to decentralize this, how do we do that? Okay, we want immutability, how do we do that? Okay, we want assets, how do we do that? And that's what led us down the path. You know, how do we do that on top of an existing distributed database?
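As a sketch of what "issue and transfer assets" looks like in practice, here's roughly the shape of the early BigchainDB Python driver's API (bigchaindb_driver); the endpoint URL is a placeholder, and method names varied a little across driver releases, so treat this as an approximation rather than the definitive interface:

```python
# Sketch only: follows early bigchaindb_driver releases; names may differ.
from bigchaindb_driver import BigchainDB
from bigchaindb_driver.crypto import generate_keypair

bdb = BigchainDB("http://localhost:9984")   # placeholder node endpoint
alice = generate_keypair()

# Issue (CREATE) an asset: the asset lives directly on the database.
prepared = bdb.transactions.prepare(
    operation="CREATE",
    signers=alice.public_key,
    asset={"data": {"artwork": "Edition 1 of 10"}},
)
signed = bdb.transactions.fulfill(prepared, private_keys=alice.private_key)
bdb.transactions.send(signed)   # later drivers renamed this send_commit()
```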
Starting point is 00:23:23 So what are the kinds of characteristics that your approach enables, in terms of performance, transaction throughput, write latency, et cetera?
Starting point is 00:23:45 Yeah, yeah. So overall, it's the traditional performance characteristics that you might see in a database, that people measure. But also, what is it that our customers care about, right? And, you know, Ascribe has been our lead customer. So for Ascribe, one of the things was capacity, right? We've got 15 billion records that we want to write, and not just that, you know, another 20 million here, another 20 million there, et cetera. So you have to have the capacity to store these records, as in the metadata, right? The media blobs themselves, databases aren't designed to store those directly, right? This is where file systems come in, whether it's S3, whether it's IPFS running on S3, or IPFS running in someone's garage, right? So media blobs are well suited to things like S3 and IPFS. So it's really about capacity as one.
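The division of labor he describes, heavy media blobs in file storage and small records in the database, looks roughly like this. A self-contained sketch; the in-memory dicts stand in for S3/IPFS and the database:

```python
import hashlib

BLOB_STORE: dict[str, bytes] = {}   # stands in for S3 or IPFS
DB: list[dict] = []                 # stands in for a database table

def register_artwork(media: bytes, metadata: dict) -> dict:
    """Store the media blob in file storage, and only its hash
    plus the metadata as a small record in the database."""
    content_hash = hashlib.sha256(media).hexdigest()
    BLOB_STORE[content_hash] = media              # big blob -> file storage
    record = {**metadata, "media_hash": content_hash}
    DB.append(record)                             # small record -> database
    return record

print(register_artwork(b"...jpeg bytes...", {"artist": "alice", "title": "untitled"}))
```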
Starting point is 00:24:38 Another one is throughput. That's really important, because if you can only get a few transactions per second going through, you'll never have something even city scale, let alone planetary scale, right? And there are lots of examples, right? I can throw a rock and find a banking app that will need 1,000 transactions per second, let alone 10,000 or 100,000. Anything IoT level, right? You know, even a small IoT deployment will have 1,000 or 10,000 nodes, at which you can easily hit 1,000 transactions
Starting point is 00:25:04 per second, right? Or energy metering, right? If you've got, you know, 50 million meters, the numbers I hear are on the order of 50,000 transactions per second recording what's going on in those meters, right? So capacity, throughput, and another one that's very important, depending on the application, is latency.
Starting point is 00:25:21 So it doesn't matter so much in things like art, right? Where, you know, an artist makes a piece every now and then, maybe one piece a day or one a week or maybe one a month. And, you know, that piece maybe gets resold, maybe five times in 100 years, right? But it does matter for other applications, things like financial, right? So if you're doing FX trading, you're dead in the water if you're running at 100 milliseconds, right? But you can get away with 30 milliseconds. If you're doing high-speed trading, you're actually dead in the water even at one millisecond, right? So there you need
Starting point is 00:25:52 super dedicated hardware. You need to be running on in-memory databases. You need other dedicated hardware. You need to be right next to the trading floor, et cetera, et cetera. And there's lots of deep, deep engineering optimization that you need for that. And, you know, what's cool is the team inside Ascribe is a bunch of engineers who have come from pedal-to-the-metal engineering, right: semiconductor design, aerospace design, chemical engineering, et cetera, where we're used to doing things that are running at a million transactions per second, et cetera. So this does not scare us. To us, it's exciting; it's a challenge. So those are three things, then, as sort of key performance characteristics that we like to think about: capacity, throughput, and latency.
Starting point is 00:26:31 There are others, but those are really important. And then besides that, I guess, sort of for databases, you'll see that we list this elsewhere too: query. So the ability to query in the first place, and then efficient querying, so how efficiently does it look things up? And querying also includes how you do a write, right? So a query isn't just about a read; it's also about a write. And then finally, permissioning, right? So does it support permissioning?
Starting point is 00:26:57 How does that work? So those are sort of the characteristics that matter. The last ones aren't really benchmarkable; the first ones are. Yeah. So let's review the five things. So one is
Starting point is 00:27:08 raw capacity, like how much aggregate data can a particular database technology store? Like in Bitcoin, that's 50 GB. Yeah. The second is
Starting point is 00:27:20 throughput, which in Bitcoin we can think of as transactions per second. So, you know, three transactions a second or whatever. The third thing is latency: how much time does it take,
Starting point is 00:27:30 from when the client submits a transaction, for it to be irrevocably confirmed? Right. So in Bitcoin, that might be an hour. And then there's querying ability, which Bitcoin doesn't have, right? Yeah. Is that right? Yeah. Okay.
Starting point is 00:27:45 And then finally, what was the fifth thing? Querying ability and permissioning, right? Like, the ability to restrict certain participants to predefined roles. So in terms of benchmarks, have you benchmarked the system? And for the first three things, storage capacity, transaction latency and transaction throughput, what are the kinds of results you've got for BigchainDB?
Starting point is 00:28:19 Yeah, for sure. So I'll just go in the order as before. So for capacity. And basically, maybe before I get there, right: a lot of the people in the audience, I'm sure, are very familiar with Bitcoin. One of the things about Bitcoin is that it's fully replicated, right, which means that every single node stores all the data. But if you want to have any sort of scale at all, at any size at all, that means that every single node has to be storing a huge amount of data, right? And it sort of encourages centralization, because who's going to have the capacity to store, you know, 50 exabytes of data, or even a petabyte of data, right? So it's kind of interesting. And even the bandwidth to support that.
Starting point is 00:28:58 So having a smaller number of replicas actually makes a big difference. So capacity: if you say, I'm going to have a replication factor of one or three or five, instead of N, what that means is that as you increase your total number of nodes, you are increasing your capacity, right, in a linear fashion. Replication factor of one means 2x the nodes,
Starting point is 00:29:25 2x the capacity. Replication factor of three, it's still a linear relation, right? So, linear scaling in capacity. We set things up where, if you're running on an Amazon Web Services XL instance, that's 48 terabytes. And working with the database technology we work with, each table can have 32 shards. So 48 terabytes times 32 shards gives you more than a petabyte.
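Spelling out that capacity arithmetic (48 TB per node and 32 shards per table are the figures just quoted; the replication factor of three is an assumption, and each extra copy divides the effective number):

```python
TB = 10**12

storage_per_node = 48 * TB    # e.g. an AWS instance with 48 TB of disk
shards_per_table = 32         # shards per table in the quoted setup
replication_factor = 3        # assumed; extra copies cost capacity

raw = storage_per_node * shards_per_table
print(f"raw: {raw / 10**15:.2f} PB")                        # 1.54 PB
print(f"with 3x replication: {raw / replication_factor / 10**15:.2f} PB")
```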
Starting point is 00:29:53 So we talked about that in the paper. We claim petabyte capacity. That's based on that example there. There are other ways you can roll things where you have even higher storage per node, but that's one example. On to the next one, throughput. So this one was very important to us to do a good job on, because, as we're framing towards this database that
Starting point is 00:30:14 can help to power the planet, how do you get to a scale that helps, right? And by the way, compare to global email: we ran some numbers, and by our estimations, email is running at 3.2 million transactions per second. So there are 3.2 million emails floating around the world on average every second. Of course, there are spikes and stuff, but that's the average for one day. So for throughput, it's interesting to get a feel for the orders of magnitude, right? And for throughput, our aim was really to actually get to 1 million writes per second.
Starting point is 00:30:47 And we actually had a lot of iterations. When we started working on the BigchainDB project in the late summer of last year, 2015, we said, okay, whatever we do, we have to make this thing go fast. So we designed it in a way where one of the constraints we realized was that, as you add nodes, you have to let the throughput go up, right, in the way that traditional distributed databases do. So that was a very explicit design decision that we made. And as we were going along, we were basically experimenting with different approaches, designing different algorithms, trying them, seeing whether they got in the way.
Starting point is 00:31:29 And we managed to sort of structure things such that, at the end of the day, our algorithms that added decentralization and immutability and assets didn't get in the way of the raw performance of the database itself. And that's really, really key. You know, the database itself, what it does, and we can get into this, but what it does on its own is ordering. That's the core thing that a database does: ordering. You know, in the theoretical approaches, you've got this replicated state machine,
Starting point is 00:31:54 and each one of the server nodes is creating a log, and you want all the logs to be kind of in sync, right? And so distributed databases do this, right? This is actually the essence of them, and then you have all this infrastructure on top. So what you have to do is make sure that
Starting point is 00:32:16 you let the database, the distributed database, keep doing its thing, creating that log, each machine, each server, maintaining its own state. And then all these things that we do on top just get out of the way, while, you know, achieving our goals of decentralization and immutability and assets. So for that, as we went along, we managed to get everything out of the way. And then we said, okay, great, now we can benchmark this distributed database itself, because all the other algorithms are out of the way. And we benchmarked and benchmarked. At first, the numbers were so-so, not amazing. We actually discovered several bugs in the distributed database that we had decided to build on. And so we iterated very closely with the developers of the database,
Starting point is 00:33:01 and they fixed the bugs. They had very good turnaround. They were really excellent. So we actually helped them get their database to the scale they needed. And this is RethinkDB. They're a really great company. It's a JSON-style document store we built on. So we basically inherit all the benefits of RethinkDB, including performance.
Starting point is 00:33:19 It's designed as a real-time database, a JSON store, which means it speaks the language of the browser, JavaScript and JSON, and has excellent scalability. So in the end, we managed to get this plot that we're very proud of, which shows the increase going up to 1 million writes per second as we increase up to 32 nodes. So that's throughput. Latency: really, the big, big bottleneck in latency, the thing that slows you down,
Starting point is 00:33:51 it's kind of funny, it's the speed of light, right? The speed of light is slow. It really is. You know, even if you have a database technology that is perfect, that takes zero time to do anything you want, infinitely fast, you'll never be able to do FX trading in a WAN setting, because, you know, 100 milliseconds is the limit for an FX trader. Yeah, it takes, I believe, about 150 milliseconds to go halfway around the planet. So a round trip, 300 milliseconds; I forget the exact number.
Starting point is 00:34:26 It's too slow. So it depends on the application. So latency really depends. If you're in a single data center, or within a single region, then you don't have to worry about the travel time in the same way, and it stops mattering as much. And you can actually get things down to the speed needs of things like FX trading, right, with the right optimizations.
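The speed-of-light floor he's referring to is easy to check, assuming light in fiber travels at roughly two-thirds of c over a half-circumference path of about 20,000 km:

```python
C_KM_PER_S = 300_000        # speed of light in vacuum, km/s
FIBER_FACTOR = 2 / 3        # light in fiber moves at roughly 2/3 c
half_planet_km = 20_000     # half of Earth's ~40,000 km circumference

one_way_ms = half_planet_km / (C_KM_PER_S * FIBER_FACTOR) * 1000
print(f"one way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms")
# ~100 ms one way, ~200 ms round trip: at or past the FX-trading limit
# before the database itself has done any work at all.
```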
Starting point is 00:34:49 When it comes to the other things, like, for example, the public BigchainDB, where it is a WAN setting, you're going to have delays, but that's okay, right? The consensus algorithm is designed to handle that. That's actually what a consensus algorithm does: it's about keeping all of the different nodes in sync, and knowing what is in sync and what isn't.
Starting point is 00:35:28 Right. So, you know, for everything to be in sync, where you're more satisfied about it, it might be half a second. That's okay, right? Because typically, in the full WAN settings, the things you need this for are like art and diamonds and IP. That's fine, right? If you really need speed in a WAN, then you have some hierarchical structure. And we see this, right? You know, people might do FX trading in New York, and they might do it in London, and then they have some sort of way to reconcile that, but sort of at a different level. So that's latency. We continue to do very extensive benchmarking. We've got engineers dedicated to that full-time, benchmarking these things in more and more detailed ways, and also understanding the effect of different faults, et cetera,
Starting point is 00:35:56 things like, you know, as you have a heavier and heavier DDoS attack, how does that impact throughput? Right. So this matters. And then finally, towards your question of scalability, I guess I implicitly answered it in the other ones, right? So as you increase the number of nodes, capacity goes up, throughput goes up. Latency goes up a bit, simply because you need the majority of nodes to agree, right?
Starting point is 00:36:17 So if they're spread out, then, you know, on average, you've just got statistics working against you. But it's not, you know, a horrible slowdown. And typically, you know, it's not like we're going for a thousand server nodes. You're typically talking, you know, on the order of 10, 20, 30 server nodes, right? And that's a very different thing. You can have thousands or tens of thousands, hundreds of thousands of clients. That's fine. But for the server nodes themselves, it's actually a relatively small number.
Starting point is 00:36:44 So that's why, you know, latency doesn't totally kill you that way. Today's magic word is big, B-I-G. Head over to letstalkbitcoin.com to sign in, enter the magic word, and claim your part of the listener award. So if we're talking about the public BigchainDB that will be released, I think we can all agree that the latency will not be the same as if you're operating in a private setting with servers that are close by. I mean, that's just logic.
Starting point is 00:37:27 Is there some point where throughput and latency and scalability sort of cap out, when you reach a certain number of nodes? Or are we talking about infinite scalability? I mean, I guess with latency you're limited there, but the scalability and the throughput aspects? Yeah, yeah, of course, yeah, latency, you're limited. Silly speed of light, darn physics. But with capacity, right, the way that RethinkDB is designed right now,
Starting point is 00:37:57 it has a fixed number of shards. You can keep swapping out any given shard and actually have more sort of virtual storage within each. That's one way. The other way that we see is simply having databases sitting side by side by side, where they have basically a shared namespace, with things like IPLD, et cetera. We can get into that in a bit if you like. But overall, it allows this sort of horizontal scaling, because you can actually have, yeah, databases sitting side by side by side. That's it.
Starting point is 00:38:29 It's still nice to have, you know, as much as you can in a single database, because it helps to make the querying much more efficient, but you can actually go from one to two to five to ten-plus databases. Okay. So this week, actually, you announced your partnership with RethinkDB. So correct me if I'm wrong, but BigchainDB is built on top; it adds functionality to RethinkDB. That's correct. Okay.
Starting point is 00:38:57 So RethinkDB is a distributed database system, as you mentioned, a JSON store in the sort of traditional distributed database schema that we might think of, and then BigchainDB adds all of the, I'm putting it in quotes here, blockchain features, such as permissions and consensus, et cetera. Can you talk about that partnership and how those two services work together? And before you answer: I installed it earlier, and the way that it works is that you install RethinkDB first, and then you install BigchainDB, and I guess it's sort of an extension to that technology. Yeah, sure.
Starting point is 00:39:44 So basically, it's worth mentioning, you mentioned, you know, permissioning and consensus. RethinkDB, actually, up until very recently, didn't have a permissioning system shipped. They'd been working on one since last fall, and they just shipped one, actually, this past week as well. So they have a very nice permissioning system. We have a different type of permissioning on top.
Starting point is 00:40:07 So if you think about a transaction in a blockchain-type setting, permissioning is based on private keys, and which signatures you need in order for things to go through: things like multi-sig crypto signatures, this sort of thing, right? And so we have that sort of permissioning. And we released a version one of our transactions that was just a very simple transaction:
Starting point is 00:40:36 one input, one output, not much more. And we're in the process of releasing a V2 that has all these much fancier features: multiple inputs, multiple outputs, multi-sig thresholds, and sort of a more general way to specify crypto-conditions. It's not Turing complete, doesn't have loops, none of that; it's not trying to, because smart contract decentralized processing
Starting point is 00:41:00 is complementary. But it makes it easy to do things like escrow at volume. So basically, permissioning actually kind of breaks down in a couple of ways, right? There are these things like crypto-conditions, and then there's permissioning in the traditional way, where, you know, this identity on this network has the ability to issue assets, or this identity has the ability to read. We were holding back from actually implementing that ourselves until RethinkDB came out with their permissioning system. Now that it's out, we will be moving towards supporting that more directly in BigchainDB.
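As an illustration of the threshold idea behind those crypto-conditions, here is a 2-of-3 signature check sketched with the PyNaCl ed25519 primitives (illustrative only; BigchainDB's actual transactions use its crypto-conditions format, not this code):

```python
from nacl.signing import SigningKey
from nacl.exceptions import BadSignatureError

message = b"transfer edition 3/10 to collector"
keys = [SigningKey.generate() for _ in range(3)]    # three co-owners
verify_keys = [k.verify_key for k in keys]

# Only two of the three owners sign (an m-of-n, escrow-style condition).
signatures = [keys[0].sign(message).signature, keys[2].sign(message).signature]

def threshold_met(threshold: int) -> bool:
    """Count how many distinct public keys validate some signature."""
    valid = 0
    for vk in verify_keys:
        for sig in signatures:
            try:
                vk.verify(message, sig)
                valid += 1
                break
            except BadSignatureError:
                continue
    return valid >= threshold

print(threshold_met(2))   # True: the 2-of-3 condition is fulfilled
```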
Starting point is 00:41:35 We talked about that traditional permissioning in the whitepaper; it's one of the things that wasn't there right in the very first release, but it's coming down the pipe. You also talked about consensus. So this is actually, you know, really important to stress. A distributed database, by its nature: how do the nodes keep in sync, right? How does the data keep in sync from one node to the next to the next? That's a consensus algorithm, right?
Starting point is 00:41:57 And even modern consensus algorithms go back to research from a guy named Leslie Lamport, now at Microsoft Research. He wrote this really wonderful paper in 1982, where he actually defined the Byzantine Generals' Problem, right? And then proposed a solution to it. And over the years, that technology got better and better, that line of research.
Starting point is 00:42:22 In 1990, he came up with an algorithm. He had actually set out to prove that it wasn't possible to solve a certain class of consensus problems, and in the process of trying to prove that, he accidentally discovered the solution. And it's called Paxos. He tried submitting the paper in 1990. The referees thought it wasn't interesting enough.
Starting point is 00:42:42 So he was like, ah, forget about it. But then things got more important again with the rise of the web, and he finally published the paper in 1998. It's notorious for being hard to understand, but it's actually more straightforward than people realize, too. And there are nicer explanations out there now, by Lamport himself as well as others. And there are derivatives of Paxos out there, things like Raft and many others that have improved upon it over the years.
Starting point is 00:43:07 Consensus algorithms are well-established technology. Once again, this is one of the core technologies that helps to power, you know, big data and the modern internet. So inside RethinkDB, it uses Raft, and therefore at the very, very core of BigchainDB, there is Raft as well. We have a consensus algorithm on top as well. Okay. Can you explain how it works and how it's different from, say, proof of work? Yeah. Sure.
Starting point is 00:43:35 So we have a two-layer consensus algorithm, and maybe I'll take a step back. Sure, happy to. So proof of work: basically, every 10 minutes there's a lottery to decide which one of the server nodes, aka miners, gets to make a call on what transactions get committed to the chain. And that lottery is based on how much electricity you spend, assuming equal hashing power, et cetera. So that's fine; you know, these days it takes a lot of money to have any fighting chance at all, but that's kind of proof of work, right? So it's lottery-based: once every 10 minutes, one node makes a vote, and then, you know, on average 10 minutes later, someone else gets to win, et cetera, et cetera. And it follows this longest-chain rule, which means
Starting point is 00:44:29 to be really sure that you really do have the longest chain, you probably have to wait around for two or three or four, five, six blocks. And even in the original Bitcoin whitepaper, he talks about the probability that you've got the longest chain, based on how many transactions, or how many blocks, have gone through. So it's a consensus algorithm, but it's actually quite slow, right? You're going to have to wait 10, 20, 30, 40 minutes. That's proof of work. Proof of stake: traditional proof of stake is basically,
Starting point is 00:45:14 your probability of getting to vote, to say yes or no, is proportional to how much money you have in the system, how much intrinsic internal cryptocurrency, et cetera. There were clearly problems with that. People have identified many attack vectors. So people started engineering improvements around it. And what we've arrived at today with proof of stake actually looks a lot more like a federation. And maybe I'll describe federation and come back to proof of stake. A basic federation is, you know, you've got five nodes, 10 nodes, 25 nodes, whatever. And for some transaction to go through, some quorum has to say, yes, this is good. It could be three, it could be five, it could be a majority, whatever. A majority or two-thirds is typically a good idea, depending on what sort of assumptions you're making. So that's a federation.
Starting point is 00:45:51 How do you choose who the members in the federation are? There are various ways, right? The original Hyperledger project, for example, was based on saying, do you have an SSL certificate? Or you can say, hey, you know, Bank A, Bank B, Bank C, Bank D, you're all my buddies. I'm Bank E. We're going to make a federation together. We know each other's public keys. That's my list of approved nodes. We're good to go. So proof of stake, how it's evolved with things like Casper, et cetera, is a federation with very, very dynamic membership
Starting point is 00:46:20 based on a cryptocurrency. And that's cool, right? It basically means you have a very open membership. You still have a lot of the complexities, but the complexities are basically mostly around, you know, the rules for membership. But you can have very simple rules too, or you can have external governance, et cetera. So overall, the three main types of ways to approach this
Starting point is 00:46:45 are proof of work, proof of stake, and traditional federation-style consensus. And then sort of on top of that, you know, when you have consensus, you could say, I'm tolerant to just crash faults, or I'm tolerant to crash faults and arbitrary malicious faults. And then on top of that, you could say, I'm going to allow fully open membership, to let anyone come in and do anything,
Starting point is 00:47:06 and maybe be an authenticator, right? And that's a citizenship thing. This could be the proof-of-work system. You know, of course, to have any chance at all, you need to spend a lot of money on mining equipment. Or another example is something like the Stellar Consensus Protocol, where, you know, everyone makes a call on who else they trust.
Starting point is 00:47:31 So if I want to trade with you, Sebastien, then, you know, you've decided that you've got, say, 10 people on a list that you might possibly trust. I've decided that I've got 20, and there's an overlap of four people. So it really comes down to those four people that we trust together. So with that, what is the consensus model in BigchainDB? Sorry, sorry about that, yeah. No, no, no, it's great to give a broad overview of all the consensus models;
Starting point is 00:47:58 also it's... Drill down, yeah; no, it's important to understand. But there's a long history, yeah, there's a long history to this, right? Consensus did not start with Satoshi. There's this huge belief that it did, and that he walks on water, and a bunch of other things. And that's fine, if people want to, you know, not look into the history of computing. But we care about the history of computing, because there are a lot of great ideas there. So what we have inside BigchainDB is basically two-level consensus.
Starting point is 00:48:25 At the lowest level, within RethinkDB, is Raft, which is part of the lineage of Paxos. It's designed to be more modular, easier to use, sort of like building blocks that are well isolated from each other, so it's easy to compose them and reason about them. And it is crash-tolerant, but not Byzantine fault tolerant, not fully tolerant to arbitrary attacks. One level up, and this is part of the BigchainDB server code, et cetera, we have technology we call the BigchainDB consensus algorithm, and it does voting.
Starting point is 00:49:09 So basically, remember how I mentioned before, it's all about the ordering, it's all about the ordering. You want to get out of the way of the database when it's writing. So we let the writes come in. One node will vet them, but then we write as soon as we can, as soon as we can. The ones that are in there are there for good, but they're not vetted yet. We just let them
Starting point is 00:49:31 write. Why? Because it's much faster to just let it write, let it write, get out of the way. But then, after the fact, what we do is we let the federation nodes vote. They basically each say: yes, this block is good, or no, it's not.
Starting point is 00:49:47 It's good if every single transaction inside is good. The block isn't good if any transaction is bad. So that emerges as consensus at a higher level, for that block. And if you think about it, you could actually do this one transaction at a time, but we simply group them into blocks for speed, because it takes time to hash a set of transactions and write it, et cetera. So we group them for speed, no more, no less. And if a block gets, you know, if the majority of nodes
Starting point is 00:50:24 voting say yes, this block is good, then it's considered good, and then anyone else doing reading afterwards knows that it's usable, that it's truth. If the nodes vote and the majority
Starting point is 00:50:40 says it's bad, then it's considered not truth, and any transactions that are still possibly okay get copied back into this sort of incoming buffer and tried again. So basically, in this mechanism, we've got two levels of consensus. The lower level is crash tolerant, and the higher level is
Starting point is 00:51:06 crash fault tolerant plus. So it's got a bunch of other mechanisms in place to validate and verify, and as time goes on, we are making it more and more resistant as well, towards Byzantine fault tolerance. And in fact, we have some pieces of technology right now that we're iterating on to make it do that in many settings. So, as a summary: two levels of consensus. The lower level is Raft, which is not Byzantine fault tolerant, but crash fault tolerant. And then the layer on... So there's basically one level of consensus that tells all the nodes what data to put in, and it takes all sorts of data: you know, correct transactions, incorrect
Starting point is 00:51:51 transactions, et cetera. And then, once all of the transactional data is inside the database, there's another level of consensus, which is nodes voting to order these transactions into blocks, into valid blocks. Not quite. So it's not voting to order the transactions in a block. The thing is, even a fault-tolerant database, you know, running Raft or whatever, right, is already ordering. Databases do this. They need this to do the logs, right? It's built into the database. Satoshi didn't invent ordering, right? So you get out of the way and let the database write these orders itself. So at the lowest level, all the ordering is being done. And actually, it's being grouped into blocks there too. The thing that's happening at the higher level
Starting point is 00:52:35 is voting on whether any given block is valid or not. And it's valid as long as all the transactions in the block are valid. Okay. So the lower level is imposing the ordering, and the higher level might be imposing, say, that no extra money is created, that kind of thing, yeah. So basically, what prevents double spending is the higher level of consensus, and the lower level tells us what transactions happened and which came first. Exactly. And this is absolutely critical to performance.
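A toy version of that higher-level vote (illustrative logic, not BigchainDB's server code): each federation node checks every transaction in a block, a block is only good if all of its transactions are, and a majority of votes decides.

```python
def tx_is_valid(tx: dict) -> bool:
    # Stand-in check; the real one verifies signatures and that
    # inputs haven't already been spent (no double spends).
    return not tx.get("double_spend", False)

def node_vote(block: list[dict]) -> bool:
    """One federation node's vote: good only if every transaction is."""
    return all(tx_is_valid(tx) for tx in block)

def block_decision(block: list[dict], num_nodes: int = 5) -> str:
    votes = [node_vote(block) for _ in range(num_nodes)]
    return "valid" if sum(votes) > num_nodes // 2 else "invalid"

print(block_decision([{"id": 1}, {"id": 2}]))                        # valid
print(block_decision([{"id": 1}, {"id": 2, "double_spend": True}]))  # invalid
```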
Starting point is 00:53:10 So basically, the bottleneck is in writes, right? So if you can actually manage to get the throughput for writing really, really high, by, you know, letting the distributed database do its thing, then you're okay on the writing. But then you have to say, okay, should everyone trust what just got written? And that's what this higher-level consensus is about, right? And that higher level is actually just voting as well. Each one of the nodes is doing its own writing, so there's no sort of central blocker, right?
Starting point is 00:53:50 So this is sort of the breakthrough. It's sort of reordering what's happening: letting the ordering happen with this big data distributed database technology that, you know, runs fast already, but bringing in the decentralized control via the nodes having a say on whether a block is valid or not. So, like, one of the interesting things in your whitepaper is sort of your claim that
Starting point is 00:54:17 BigchainDB implementations can't fork. Is that correct? And why is it so? Yeah, so this comes down to, once again, if you think about a distributed database, a traditional one, right? Each one of those server nodes is writing a log, right? Just writing a log, one entry, then the next,
Starting point is 00:54:34 then the next, then the next. This is the essence of a consensus algorithm, a traditional one, right? Like Paxos or Raft or what have you. So with Paxos it's like this: okay, imagine you're writing a piece of code and it's a for loop, right?
Starting point is 00:54:49 Can that for loop code suddenly be mutated such that it's, like, splitting into a nested loop? You know, it's just not in the code at all anywhere. You don't have the constructs, right? So within distributed databases, there is no construct for a fork, because those abstractions just haven't been built in. They would have to be built in at the very core, right? And that central problem of sorting out ordering, like, solving ordering, is exactly what Paxos and Raft do. And it's just an order. No more, no less, right?
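As a toy illustration of that point (hypothetical code, not any real database's implementation): the log that Paxos- or Raft-style algorithms maintain is one totally ordered, append-only list, and the data structure simply exposes no operation that could create a second history.

```python
class TotalOrderLog:
    """The only abstraction a Paxos/Raft-style log exposes: one ordered list."""

    def __init__(self):
        self._entries = []  # entry i always precedes entry i + 1

    def append(self, entry):
        # The single write operation: slot the entry into the next position.
        # There is no branch(), no competing tips, no longest-chain rule;
        # forking would have to be designed in at the very core, and it isn't.
        self._entries.append(entry)
        return len(self._entries) - 1  # the entry's place in the one history
```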
Starting point is 00:55:20 You can't; they're just slotted in there, and that's it. And how the order is resolved is based on what's getting stored within each server node, and then what the protocol of rules is around how to interpret that from each other server node. So basically, order comes as a first-class citizen with traditional consensus algorithms like Paxos,
Starting point is 00:55:46 and forking is not even in their vocabulary. So let's talk about some of the applications for BCHNDB. What I really like about the website is that you have some really nice illustrations there on how BCHNDB can be implemented and sort of more, I guess, traditional centralized infrastructure where you would have like an Amazon web services instance and a database, etc. and all the way to having it completely decentralized with Ethereum running as the VM and IPFS as the file system and the Big ChainDB as the data store. Could you talk about sort of the different use cases and what would be some practical applications for each of those types of scenarios?
Starting point is 00:56:37 Sure, sure, yeah. So to sort of summarize, right, there's a framework, or a style of deployment, where you can use BigchainDB where you keep your stack the same, right? Right now, maybe you're running five different databases in your stack. You've got, you know, maybe MySQL for your SQL database, and maybe MongoDB for your NoSQL database, and maybe, you know, Neo4j for a graph database or something. And maybe some others too, just for fun, right?
Starting point is 00:57:07 It's pretty common these days. So guess what? You throw in another database, right? This one, though, is special, BigchainDB, because you as yourself don't own or control it yourself. It's you and this federation of other organizations in your ecosystem, right? So as far as your system is concerned, and your developers and your CIO, etc., you've got this other database that you've got hooked into your system,
Starting point is 00:57:32 but it's special because it's data that you and the other organizations in your ecosystem are working against together, right? So that's really cool, because it means that you can, you know, maintain the rest of your stack. You don't have to go all in. You can just bring extra benefits to some of your existing applications. Or, another way to think about this: you're thinking about a new application, but you only really need to have, say, timestamping, right? You don't need to have full-on decentralized processing or full-on decentralized media storage. So great. You have your team of developers that understand modern web stacks
Starting point is 00:58:13 or modern enterprise stacks. Just use them, right? Keep doing that stack. Now you just have a new database that you can plug in and use to get some of these main benefits of decentralization. I mean, we know this very well because, you know, this is how we built Ascribe for IP. We came up with this idea in 2013, long before Ethereum; the Ethereum white paper came out six months later. And there had been talk of this, you know, in various places and stuff. But what's cool is, you know, Ascribe itself is running on a modern web stack, all the components you would expect, running in AWS, Heroku, et cetera.
Starting point is 00:58:51 And at the place where the rubber hits the road, that really matters, you know, who owns what? That is on a decentralized database. Right now it's on Bitcoin. We're in the process of porting it over; we will fully deploy it when the public BigchainDB goes live. And so that's a good example.
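A hedged sketch of that "keep your stack, add one more database" pattern might look like the following. Every name here is hypothetical; in particular, `ledger` stands in for whatever client the federated database exposes, and is not a real BigchainDB or Ascribe API.

```python
import hashlib
import json
import time

def register_artwork(sql_conn, ledger, artwork):
    # 1. Business data stays in your existing database, exactly as before.
    sql_conn.execute(
        "INSERT INTO artworks (title, artist) VALUES (?, ?)",
        (artwork["title"], artwork["artist"]),
    )
    # 2. Only the part that needs shared trust (who owns what, and when)
    #    goes to the database that you and the rest of the federation
    #    work against together.
    record = {
        "sha256": hashlib.sha256(
            json.dumps(artwork, sort_keys=True).encode()
        ).hexdigest(),
        "owner": artwork["artist"],
        "timestamp": time.time(),
    }
    ledger.write(record)  # hypothetical call to the shared, federated database
```

The rest of the application, the web framework, the SQL schema, the hosting, is untouched; the shared database is just one more store the app writes to.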
Starting point is 00:59:13 And so I'll talk about the general case, and then I'll talk about some very specific use cases. So, the general: there's sort of the partly decentralized stack that I just described, which is really easy to adopt, especially for the enterprise. The other stack is sort of this full-blown, fully decentralized stack, which we're very excited about. Of course, it can be deployed in a private setting,
Starting point is 00:59:35 many people are, for things like business logic or for lots of other applications. But what's really exciting to me is the public deployment, where you have this world computer, as a lot of the Ethereum folks talk about, which I think is really awesome, right? And so you can have a decentralized database, really a planetary-scale database that no one owns or controls,
Starting point is 01:00:00 a decentralized file system, and IPFS is emerging as the winner there, and processing on top, and Ethereum is certainly the leader here. There are some other technologies, we believe, that are really interesting and helpful there too, things like Tendermint, where you can actually have decentralized processing with other languages, right?
Starting point is 01:00:26 The Ethereum VM is really designed for Solidity, the Ethereum language, which is really great for a lot of applications. But for other developers, who have a lot of code in Python or only really feel comfortable in JavaScript, then that's cool too, because you have things like Tendermint. And these things together, right, decentralized processing, decentralized storage, and finally, decentralized communication,
Starting point is 01:00:51 just sort of built into the protocols and stuff. Those are really the elements of computing, i.e., you know, working together, a world computer, right? And then there's the applications on top, right? Decentralized apps, dapps. And there's already starting to be an explosion of them, thanks especially to Ethereum, and others too. And that's very exciting. And of course, Ethereum right now isn't quite at the scale that people want, but there are several really smart guys working on it.
Starting point is 01:01:21 And I've had deep, long conversations with them, and I'm excited about the directions they're headed, right? So it will get there. And that's great, because it means that, for example, Ethereum will be able to keep up with BigchainDB. You know, the faster it goes, the better it works for us. But also, you know, we've seen it's really hard to develop a decentralized app if you don't have one. Let's say you're used to being a web programmer, where you have, say, an MVC model, right? So M is for model, which is, you know, instantiated traditionally as a database. If you don't have that, maybe you only have RAM, what do you do, right?
Starting point is 01:01:58 Or a file system. So it's much easier when you have a through-and-through database to work against. Or, you know, the other way of thinking about it: there is the traditional LAMP stack, right? So Linux, Apache, MySQL, PHP. We're the M, the MySQL, for the modern decentralized stack. So this is sort of roughly the higher level, right: sort of partly decentralized, where the database really matters at the core for decentralization.
Starting point is 01:02:23 The rest can be centralized. And fully decentralized, where you actually have all the pieces decentralized. And there are shades of gray in between, too, right? You can have BigchainDB with IPFS, and that's it. And by the way, Ascribe itself will be moving towards a fully decentralized deployment bit by bit. So we're working with several organizations on that. Quite excited. For applications themselves, maybe I'll just give a quick sampling.
Starting point is 01:02:54 One of the companies we're working with very closely is Everledger. So Everledger is about diamonds on the blockchain, run by a woman named Leanne Kemp, who is a force of nature. She's really awesome. And she sees that there is major fraud and corruption in the diamond world. And it's basically been a very opaque industry. So, you know, diamonds get dug from mines. These mines themselves are often in very corrupt countries.
Starting point is 01:03:22 Some countries have actually become so corrupt that they're not even allowed to dig diamonds anymore. And then these diamonds, you know, these rough stones that come out, they get cut first, then taken to certification houses. There are about five in the world that matter. And then they get measured and certified, and an ID is even laser-etched in there, actually. And by the way, along the path of this,
Starting point is 01:03:50 there's something called the Kimberley Process, which is basically sort of to help vet if you're authentic or not. And then once it goes past these cert houses, each diamond has this sort of piece of paper that's supposed to be attached with it. And then it gets sent through various distribution channels until it ultimately ends up in the hands of retailers, whether it's online retailers or the local Tiffany's. And basically, each step along the way, there's tons of opportunity for fraud, because that piece of paper could get separated from that diamond. That ID might be lying.
Starting point is 01:04:21 And there are many, many examples that you can see. For example, one of the major cert houses got hacked, and their data records got altered; and records can end up wrong even without getting hacked. And if you look at the retailers themselves that are selling the diamonds: so we actually worked with Everledger, and what's interesting with blockchain technology, right, is that it encourages companies that are sort of traditional competitors to share data in new ways, to get sort of an ecosystem-wide benefit. In the case of Everledger, the different certification houses shared their data with Everledger,
Starting point is 01:04:58 and then we worked with Everledger to cross-reference that data against the retailer data. And basically, by cross-referencing that and applying machine learning techniques, we discovered 7% fraud at one of the major retailers. There were 20,000 diamonds a day going through this retailer. It worked out to $750 million of fraud in one year. So that's quite exciting, and that's just one retailer. This is pretty important, right? It's an $80 billion industry. The fraud was just 7% from that initial data mining; it could be as high as 30 or 40%. So we're talking up to $32 billion. That's diamonds, but it generalizes. So in general, supply chain transparency is a big deal for the world of blockchain technology.
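For reference, the quoted figures are roughly self-consistent. A back-of-the-envelope check, using only the numbers mentioned in the conversation (estimates, not audited data):

```python
diamonds_per_day = 20_000   # volume at the one retailer mentioned
fraud_rate = 0.07           # 7% fraud found in the initial data mining
annual_fraud = 750e6        # ~$750M per year of fraud at that retailer

# Implied average value of a fraudulent stone at that retailer:
print(round(annual_fraud / (diamonds_per_day * 365 * fraud_rate)))  # ~1468 dollars

industry_size = 80e9        # ~$80B per year diamond industry
print(0.40 * industry_size / 1e9)  # 40% upper estimate -> 32.0, i.e. the $32B quoted
```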
Starting point is 01:05:25 Yeah, I agree. I mean, this is something that we're particularly interested in at Stratumn: bringing more transparency and auditability to supply chains. I want to come back to something you mentioned earlier, which I thought was really interesting, which is having that database component
Starting point is 01:05:59 to a lot of the things that people are doing with blockchains, being able to timestamp things and notarize. And having that component there in that stack is really essential. We ask ourselves that question all the time. It's like, okay, well, our clients are going to be notarizing data through the Bitcoin blockchain, but where is that data being stored? Are they storing that on their servers? In some cases, that doesn't really make sense, because you want to have it, and you want
Starting point is 01:06:27 to have it decentralized. I wanted to come back, before we sort of wrap up here, to the idea that, well, with Bitcoin, you know, we have the Bitcoin protocol, the core protocol, and it's released as a public blockchain. But a lot of the new protocols that have been emerging, like Ethereum and IPFS and now BigchainDB, have this functionality within them so that they can be public, but they can also be private, and there are some shades of gray in there. So you can be on the Ethereum blockchain, but you can also deploy Ethereum as your own private network, if you like, or semi-private network. And so, can you talk about this?
Starting point is 01:07:13 You also talked about the public implementation of BigchainDB, and I suppose there would be some semi-public ones, some fields or industries that would roll their own implementation of BigchainDB, and then some totally private implementations. I'm particularly interested in the public one, because I think that can bring a lot of value. Talk about how you see that playing out. Like, who would be the validating nodes? Can anybody join, and can anybody use the network? Is it a public good? How does that work? Yeah, so we're spending a lot of energy on this, and we'll be having more announcements in coming months on this.
Starting point is 01:08:00 Overall, we will be rolling out a foundation for this, and it will obviously be a nonprofit. We are working with organizations as nodes that have demonstrated a commitment to the health of the future internet. And these are organizations that have demonstrated care in the past, or are demonstrating care going forward by dedicating real resources to this. Overall, the key thing is that permissioning will be set such that anyone can issue an asset, anyone can transfer assets, anyone can read, all of that. And those are really kind of the core things, right?
Starting point is 01:08:41 And the capacity will be so big, et cetera, that it really can be a store of world data. There are some really awesome apps that can emerge on this too, right? The one that gets us excited, and actually brings us full circle, is IP, creative works, right? So, you know, what is the mandate of a museum? It's to preserve the cultural history of what artists have done over the last, you know, decades and centuries. And there's a cultural mandate in museums and archives and many other places, libraries, etc., right? What does that look like in the 21st century, in the 22nd century, in the next 10,000 years, right? It's going to be digital.
Starting point is 01:09:28 You want to have it live, but you want to have it as well where you can trust that it's going to be around, right, and that it's immutable, et cetera. So it's a really great fit. So we're going down a path for this, which is basically: we're using Ascribe, the company itself and the product Ascribe, the team that's still working on Ascribe, to drive IP onto this public BigchainDB, and working with other cultural institutions too. For example, we continue to iterate with Creative Commons France and a lot of other organizations that really care about this.
Starting point is 01:10:05 So Ascribe, the team itself, actually almost acts like a nonprofit, right? It's quite well tuned towards the needs, right? We have a lot of people that are, you know, art pros, art curators, etc., sort of in that space, that we work with. And it's not just art, right? It's not just digital art. It's also things like music and books and film and video and video games and all of these things, right? These are all sort of creative cultural artifacts where the attribution deserves to be public,
Starting point is 01:10:42 and the ownership, for provenance, when people want it, deserves to be public too. This is a really big killer app. There are also other apps around this. We see a lot of companies doing really great things in identity and reputation, and much of this makes sense on a public BigchainDB too, or some other, you know, technology like this at that scale. So there are others too, right? Sometimes things do need to be more private. You know, a lot of banking things, simply due to banking regulations, need to be in private networks. That's fine, right? So there are really sort of shades of gray, but it's really
Starting point is 01:11:20 the public BigchainDB that is, you know, driving us as an organization philosophically. We're excited that we can help to play a role in creating this future internet, this future society that we want. Yeah, that's an interesting prospect. I mean, why would you go, you know, get a MySQL database? It opens up so many possibilities for developing applications without needing any sort of infrastructure, you know. I guess the permissionless side of it is what gets me really going about this. It's just being able to utilize this public good, much like Bitcoin, much like Ethereum and IPFS and other protocols.
Starting point is 01:12:04 But yeah. And so this foundation would, I guess, pick the validators. Yeah, yeah. So the foundation will be picking the validators. Here's how we see it, though. Right now, the technology does not exist to have open membership at scale in a way where you can store all the data you want, right? You have to make a trade-off, right? And so some of the other works out there have made the trade-off to say, we're going to allow, you know, thousands of nodes, but at a severe cost to scale, right?
Starting point is 01:12:41 Severe. We've seen it. But they're trying to scale up, right? And, you know, some of the efforts I commend in big ways, right? I think things like Casper and sharding are, okay, awesome, right? Great. Rock on. There's another way to approach it, though. So rather than starting with, you know, thousands of nodes as validators,
Starting point is 01:13:00 remember, you can have hundreds of thousands or millions of nodes as clients. So rather than starting with thousands of nodes as validators and trying to add scale, start with a smaller number of nodes as validators while still supporting hundreds of thousands or millions of clients. So: start with a smaller number of validators, have scale from day one, and then evolve the technology to support a larger number of validators.
Starting point is 01:13:28 And that's exactly the direction we're going, where things can be open membership, you know, two years, three years, four years from now, right? With things like the Stellar Consensus Protocol, or other protocols like that, right? So there's some emerging stuff right there, too. But in the end, we will all end up at a place where it is broadly decentralized, even in terms of validation, as well as at the scale that the planet deserves, right? No one is there yet. We have, you know, the scalability that the planet deserves, and a more limited set of nodes.
Starting point is 01:13:59 What's cool, actually, is we have a prototype for what we've been doing, an archetype, if you will. It's called the DNS, right? The decentralized DNS has been powering the Internet since the late 90s. Before that, you know, going back to the early 80s, it was actually a text file. One guy was maintaining it, you know, for many years, and then he handed it over to
Starting point is 01:14:23 his buddy, right? And it's kind of amazing, right? They just kind of had to do it. But, you know, as the internet got more important, it made sense to decentralize control, and this is what happened. And so we've been working with the technical architect of the DNS; his name is David Holtzman. We've actually been working closely with him for more than a year, because this is actually, you know, a linchpin technology. And, you know, when they created the DNS, there was the business side, too, with Jim Rutt; we work with him too. They handed it off to ICANN. And, you know, from 1999-2000 to now, you know, ICANN has devolved.
Starting point is 01:15:02 It's not as good as it was 15 years ago, right? There are those challenges. And, you know, there were some technical design challenges in the DNS. It wasn't perfect. But it was pretty good for the time. Now we have another 15 years' worth of learning from crypto, et cetera, and we have all the lessons from ICANN. We're drawing on those lessons for this next-gen database for the planet,
Starting point is 01:15:21 and something that's broader in scope than the DNS. At the very least, for IP, because that's what Ascribe is doing. But we also have a lot of other people who are very interested in using it for a lot of other applications, right? So that excites us. Like I said, it drives us. It's what gets us up in the morning. It's, you know, what makes us excited for the future.
Starting point is 01:15:42 Well, that's a great note to end on, Trent. Thank you for coming on the show once again, and hopefully we can have you on again in the future as BigchainDB continues to evolve. Awesome. Thank you for having me. I really appreciate it. And thank you to our listeners for listening. We are part of, sorry, we're part of the Let's Talk Bitcoin network. You can go to letstalkbitcoin.com to find all kinds of shows about blockchain, Bitcoin, and decentralized technologies.
Starting point is 01:16:09 You can listen to episodes of Epicenter Bitcoin every Monday. They come out on your podcast feed, as well as Stitcher, YouTube, or anywhere you get your podcasts, whether it be iTunes or on Android or wherever. And to our loyal listeners, you can always leave us a review wherever you can, whether on iTunes or any other platform. If you do, just send us an email at show@epicenterbitcoin.com and we will send you a T-shirt and some stickers.
Starting point is 01:16:35 And we thank you for the reviews that you've left so far. It's been really helpful in getting the show out to more people. So thanks so much. And we look forward to being back next week. Thank you.
