Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Trent McConaghy: BigchainDB – Scalable Public Distributed Databases
Episode Date: April 11, 2016

One of the major drawbacks of Bitcoin is its low transaction throughput. Maxing out at only a handful of operations per second, there have been many proposals to scale it up so that it can compete with existing distributed database technologies. As demand on the blockchain continues to increase, it's unclear if the Bitcoin protocol will ever be able to handle thousands, if not millions, of transactions per second. BigchainDB is taking a different approach. Rather than trying to scale up blockchain technology, it starts with a big data distributed database, RethinkDB, and adds blockchain features and characteristics. Trent McConaghy, Co-founder and CTO of Ascribe and BigchainDB, joins us to talk about how this protocol may become to databases what IPFS and Ethereum are to distributed file storage and computing, respectively. Able to perform more than one million writes per second with capacities in the petabytes, BigchainDB has the ambition to become the world's public database platform.

Topics covered in this episode:
- A brief update on Ascribe since Trent was last on the show
- The motivations behind BigchainDB and the problems it's trying to solve
- How BigchainDB plans to solve the typical scalability bottlenecks found in blockchain protocols
- BigchainDB's capacity, performance and latency characteristics
- BigchainDB's consensus model, applied to RethinkDB
- Potential applications for BigchainDB, in both centralised and decentralised application stacks

Episode links:
- BigchainDB
- BigchainDB Whitepaper
- Ascribe
- Left Gallery
- 23vivi

This episode is hosted by Meher Roy and Sébastien Couture. Show notes and listening options: epicenter.tv/126
Transcript
This is Epicenter Bitcoin episode 126 with guest Trent McConaghy.
This episode of Epicenter Bitcoin is brought to you by Ledger.
Now accepting pre-orders for the all-new Ledger Blue Developer Edition, a Bluetooth and NFC touchscreen
hardware signing device.
Learn more about the Ledger Blue at LedgerWallet.com and use the discount code Epicenter
to get 10% off your first order.
Hi, welcome to Epicenter Bitcoin, the show which talks about the technologies, projects,
and people driving decentralization and the global cryptocurrency revolution.
My name is Sébastien Couture.
And I'm Meher Roy. Today we are going to be talking to Trent McConaghy, who is the co-founder and CTO of Ascribe and BigchainDB.
BigchainDB is aiming to be a new database technology that can eventually enable a planetary-scale database.
But before we get into BigchainDB, let's have an introduction from Trent.
Trent, your background intro, please.
Sure, hi. It's a pleasure to be on here again, guys. I really appreciate whenever you have me on.
So I will just make this one quick because I know you're going to dive into some of the details, but I started out, spent almost 20 years in the semiconductor industry doing AI for designing computer chips, machine creativity, that sort of thing.
A lot of big data, a lot of distributed computing, a lot of other sort of pedal to the metal sort of coding and whatnot.
About almost three years ago, I started working on a project, which was for IP on the blockchain.
with a specific focus on digital art. That became Ascribe, the company and the product.
And continuing to focus on that, as well as more recently, due to limitations we saw with
blockchain technology, we have built another product called BigchainDB and released that into the market
in February. So we'll be focusing on that, I guess, in this call. And I look forward to talking about it.
Thanks for having me.
So maybe we can get started, before we get into BigchainDB and cover that topic quite extensively,
by talking about Ascribe.
The last time we had you on was about 10 months ago, episode 76.
Anybody interested can go back and listen to that episode with Trent.
So tell us, how has Ascribe moved forward since then?
How has the product developed?
And what's going on with Ascribe?
Sure.
So overall, Ascribe the company has BigchainDB as a product, as well as Ascribe the product,
and a third product called WhereOnThe.Net.
I'll talk about that in a sec.
So three products, one company.
Ascribe itself has actually stayed quite true to the original vision,
which is to make it easy for creators, artists, et cetera,
to securely attribute their work onto the blockchain, time-stamping and so on,
and not only securely attributing it,
but also to make it really easy to license it for others,
such that, for example, if you're a collector of digital art
or even physical art actually,
if you're a collector, you can truly own that art
via a transfer of ownership from the artist to the collector,
or perhaps via a gallery and a consignment model,
and then from one owner to the next to the next.
So whereas before you had no provenance,
now you have perfect digital provenance.
So that vision is as it always has been.
And it's been growing quite nicely, actually.
So now we're at about 5,000 artists using the system.
There are about 9,000 or 10,000 pieces of work,
40,000 editions.
We're seeing heavy usage from many sectors.
One of the ones we're most proud of is a lot more
people from the Creative Commons community using us.
We've worked closely with Creative Commons France
on that, and with other related organizations.
And actually, we've worked probably,
there's about 20, 25 organizations
that have been using us in other interesting ways.
So of that, several art prizes, probably now a total of, I don't know, 10 art prizes, and also sort of the GLAMs.
So galleries, libraries, archives, museums, using us in various ways for things like archival, things like selling artwork, etc.
And then on top of Ascribe, we have a REST API.
So this has matured a lot since February, March, April of last year.
And it's really rock solid.
And because of that, we've actually had several people starting companies on top of Ascribe.
We've got about seven or eight startups now that are working on top of Ascribe.
The most recent ones are Left Gallery, which is an initiative from a world-class net artist named Harm van den Dorpel.
So it's left.gallery if you go to that.
And a more recent one yet is 23vivi, and this is a startup out of the USA. They are sourcing
photography from social media and other places, working with the people creating it, and then
selling it. They even have a secondary market now, limited editions, all this sort of thing.
So there's a lot of stuff happening with Ascribe which we're quite excited about. And I guess the other big thing,
you know, it sort of leads to BigchainDB actually quite directly: with Ascribe, we always saw that
people were concerned about sharing their work online. They would say,
okay, I claim that I own this work, that I have the license to this, with all the
legalese.
You know, we have a full-time copyright lawyer that's worked
all the legals out in all the countries, to kind of make it general.
So they got the attribution, but they were concerned to share it, because they have this
feeling that as soon as they put it online, they lose control over it, that they don't
have any visibility into where it's going.
So they would say, hey, can you encrypt it for us?
Can you watermark it for us?
And we're like, no, that totally goes against the point of this, right?
That's, you know, sort of DRM, and it's really not the way to go.
And we said, okay, instead, let's shine light onto where your work is showing up.
So what we did was we crawled the web, which works out to 220 terabytes worth of text, if you limit it to text.
And from that, we indexed a bunch of images, 15 billion images.
So you look at the links inside that.
We downloaded those and computed feature vectors from them, and basically came
up with... it's basically like reverse image search, except it shows you copies versus time.
So you can see exactly where your images show up on the web. And that gives you visibility
into what's going on. So you regain what would have been a loss of control via visibility.
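The pipeline Trent sketches, crawl, extract image links, compute feature vectors, then match copies, can be illustrated with a toy similarity search. This is a minimal sketch: the vectors, URLs, and threshold below are invented, and the real system's feature extraction isn't specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def find_copies(query_vec, index, threshold=0.95):
    """Return (url, similarity) for indexed images close to the query."""
    return [(url, cosine_similarity(query_vec, vec))
            for url, vec in index.items()
            if cosine_similarity(query_vec, vec) >= threshold]

# Toy index: in practice the vectors would come from a perceptual hash or a
# neural net over ~15 billion crawled images (these URLs are made up).
index = {
    "http://example.com/a.jpg": [0.9, 0.1, 0.3],
    "http://example.com/b.jpg": [0.1, 0.8, 0.2],
}
print(find_copies([0.9, 0.1, 0.3], index))   # only a.jpg matches
```

Pairing each match with the crawl timestamp is what turns this into "copies versus time".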
So, and this leads directly to BigchainDB. So we indexed these 15 billion images. We were building
on the Bitcoin blockchain, and we wanted to record onto it
that we had sighted these. But we did some quick back-of-the-envelope calculations.
You know, the Bitcoin network, it starts to back up at about 1.5 transactions per second.
You know, past that, it'll take more than 24 hours for a transaction to go through.
So run the numbers, run the numbers, and it will take about a century for 15 billion images
to get recorded onto the Bitcoin blockchain.
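The back-of-the-envelope calculation is easy to reproduce. A minimal sketch, using the practical 1.5 transactions-per-second figure quoted above (at that rate the naive answer comes out even longer than a century):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # ~31.5 million seconds

def years_to_record(num_records: int, tx_per_second: float) -> float:
    """Years needed to push num_records through a chain at tx_per_second."""
    return num_records / tx_per_second / SECONDS_PER_YEAR

# 15 billion images at Bitcoin's practical ~1.5 transactions per second:
years = years_to_record(15_000_000_000, 1.5)
print(f"{years:.0f} years")   # -> 317 years: centuries, not hours
```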
So that was clearly a challenge.
And it wasn't just these images, right?
You know, we've been interacting with galleries and museums, etc.,
where they have a million images here, 20 million images there, etc. They
have all these digitization efforts.
And any one of those would basically be a show-stopper for really putting the stuff onto the Bitcoin
blockchain.
So it was really starting to hit us really hard about the limitations of the scalability
of the Bitcoin blockchain, right?
And people talk about blockchain bloat at 50 gigabytes.
I mean, my thumb drive holds more than that.
So clearly something was up.
And that's kind of what led us to thinking about how can we approach this problem.
You've indexed 15 billion images.
What kind of server technologies are you using to do that?
Oh, we were running on AWS.
Actually, AWS, when we started doing this, we hadn't really interacted with them a lot.
So they called us up out of the blue and said, like, who are you guys?
What's going on?
Because our bill was $30,000 a month, actually.
So we quickly became their friends.
Do you guys have like an Amazon rep just working right in your office or something?
For a while, yeah.
I mean, they actually flew down a team from elsewhere in Europe just to come and visit us
because, I mean, you know, what crazy little startup in the middle of nowhere.
Actually, it's not the middle of nowhere.
Berlin's an amazing city.
But what crazy startup, you know, has the audacity to go and crawl the web, right?
So we did.
And it's actually less about algorithms.
You know, you have to have good algorithms, but it's more about commitment of time and resources.
So we committed it, because we saw it as another piece in the puzzle of supporting the creators of the internet.
And so from that problem of being able to index and timestamp all of this data, this image data,
you came up with the idea for BigchainDB.
Can you talk about how the idea came about?
Sure.
So there was the 15 billion images challenge.
And there's other ones too.
One customer we talked to, a potential customer, they have 20 million users.
They're going through more than 100,000 images a day, right?
So that would roughly double the throughput right there.
And actually, that particular customer we were talking to in 2014,
in sort of fall of 2014.
You know, we had gone really sort of full-time on Ascribe in the summer of 2013.
Sorry, summer of 2014.
And so even then we were like, hmm, you know, this is going to be a challenge.
And I even gave a talk. Actually, Brian runs this meetup in Berlin,
and I'd been talking with him over a drink one night about this problem with Bitcoin.
He's like, you know, you should really talk about this.
I'm like, okay, I'll give a talk at your
meetup, sure.
So I did.
This was in fall of 2014.
And I said, hey, look, you know, like there's the Bitcoin blockchain.
Here's the scalability issues, right?
1.5 transactions per second.
You know, 50 gigabytes, people calling it bloated.
And then you look around and you look at the internet.
And you see, hmm, Netflix is, you know, using up 37
percent of the bandwidth of the internet. That's interesting. And there's this thing called big data.
Well, what's that about, right? How big is that data? And, you know, I knew about this world already
quite well because I come from a world of big compute, right, running 1,000, 10,000 machines to do
verification of memory chips, et cetera. So if you look into this, this world of big data, it's,
you know, that's the buzzword, but really it comes down to distributed databases and distributed
in the sense of compute resources are spread over more than one physical machine,
whether it's processing resources or storage resources or whatever.
So there's been distributed database technology out for decades, actually,
and it's gotten really good.
And this is what powers Netflix.
This is what powers Google.
This is what powers all the big guys.
But guess what?
All the little guys, too, right?
If you're a startup that quickly ramps to 10 million, even 15
million users, you're probably running on AWS or maybe Azure or something. And you're just
paying more money for more compute resources. And it's not just your server and your back
office. It's actually just more physical machines being added
to serve your customers. So there's established technology for this. And in the database world,
it's, you know, for the storage side, it's distributed databases. And if you dive deeper, yeah,
you say, well, you know, you think about, okay, if I put data into this database, does it get stored
on every machine? Well, that wouldn't make it scale.
So it only gets stored in a fraction of them, right?
So it's the idea that you're sharding up your data.
You know, let's say you have 100 records to store and you have 100 machines and you can
store one record in each machine.
But what if one machine goes down?
So you have to make sure that you maybe have three or five copies.
That's the idea of replication.
But you don't have, you know, one copy, the same copy in every machine, full replication,
because that would kill the scalability.
So by doing this, if you have three or five copies instead of n
copies, as you add more and more machines, then the capacity goes up.
And if you do things just right, the throughput goes up too.
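The sharding-plus-replication idea can be sketched as a toy model. This is not any particular database's placement algorithm; the replication factor of 3 simply matches the "three or five copies" mentioned above, and the key and node counts are made up.

```python
import hashlib

# Toy model of sharding with a fixed replication factor: each record lives
# on `replication` machines out of n, instead of on every machine.
def placement(record_key: str, num_machines: int, replication: int = 3):
    """Return the machines holding a record: a home shard plus replicas."""
    digest = hashlib.sha256(record_key.encode()).hexdigest()
    home = int(digest, 16) % num_machines
    return [(home + i) % num_machines for i in range(replication)]

def capacity(num_machines: int, disk_per_machine_tb: float, replication: int = 3):
    """Usable capacity grows linearly with machines, divided by the replicas."""
    return num_machines * disk_per_machine_tb / replication

# A record lands on 3 of the 100 machines, not all 100...
print(placement("record-42", num_machines=100))
# ...so doubling the machines doubles the usable capacity (linear scaling).
print(capacity(100, 48))   # 1600.0 TB
print(capacity(200, 48))   # 3200.0 TB
```

With full replication the `replication` factor would equal the number of machines, and the usable capacity would stay flat no matter how many machines you add.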
So even looking at the benchmarks, you know, in fall of 2014, we saw that Netflix had done an experiment where they had 50 machines and they were getting to 200,000 transactions per second.
And as they added more machines and more machines,
the throughput went up, up, up, such that by the time they hit about 300 machines,
they were doing more than 1 million transactions per second.
So just when you say transactions, because, I mean, we're used to using the word transactions
in the blockchain space, what you're actually meaning is writes per second in the
blockchain, or in the database.
I apologize.
I was being slightly loose there, and you're exactly right.
It's writes per second in that sense.
Interestingly, if you look into the database space, they also use the word transaction at a lower
level, and a transaction can include, you know, two writes and a read, or one read, or whatever.
But what I mean here is one database-style transaction, which includes one write inside.
So it's kind of funny.
There's the sort of transaction in the blockchain sense, transaction in the database sense,
and there's also a transaction in the financial sense.
So you can really get mixed up if you're not careful.
So overall, yeah, I mean, what had led us to this was we saw, hmm, there's all this technology
that is enabling big data distributed databases,
this idea that as you add more resources,
hard drives, etc., then you can increase capacity,
storage, as well as increase throughput.
And that's pretty profound, right?
And this is actually what powers all these big data challenges.
This is why a smallish network is something like Visa or MasterCard.
We're talking 2,000 to 5,000 transactions per second
in the sort of Visa sense, or Twitter, you know, 5,000 per second.
A bad ad network is 100,000 per second. A good one is 500,000 per second, right? So, you know,
compare that to Bitcoin, which is, you know, practically speaking, 1.5 transactions per second.
So we're talking, you know, four, five, six orders of magnitude difference. And if you want to
have a network, a database network, you want to support more than just one application, right?
You don't want to be running, well, you might want to be running, say, just Visa or just SwiftNet or something.
But if you're doing something on a global scale, you actually want to be handling a whole bunch of different use cases.
So this is, you know, what we were looking at.
But we also saw that, even for our own needs with Ascribe, you know, we have these 15 billion images that we wanted to index.
And we wanted to put them somewhere it wouldn't cost us, you know...
if we said, hey, we're going to pay the 10 cents per transaction, or whatever it costs these days to store, that's
$1.5 billion.
Can't really go to a VC for that one, can you?
Let's take a short break so I can take you to Paris.
I walked into La Maison du Bitcoin, the house of Bitcoin,
in the heart of the Silicon Sentier, home to many startups, including Ledger.
And I spoke with Eric Larchevêque,
Ledger's CEO, about the all-new Unplugged NFC hardware wallet.
The Ledger Unplugged is an NFC-based hardware wallet
that you can use with compatible Android phones.
The private keys are stored in a secure element
and you can use them with wallets such as mycelium and grid bits.
Each time you want to make a transaction,
the signature will be done by the unplugged
and this way your private keys, this critical data,
will never be exposed to the Android phone.
This is a secure way to use your bitcoins on the go in mobility
and you will also be able to pay directly with the unplugged
with compatible point-of-sale terminals.
The Ledger Unplugged is the simple solution
for secure, contactless Bitcoin payments.
You can get the Unplugged at ledgerwallet.com,
and when you use the code Epicenter at checkout,
you'll get 10% off your order.
By the way, that code works on their entire range of products.
So we'd like to thank Ledger
for their support of Epicenter Bitcoin.
So what a lot of companies have resorted to doing,
and certainly what we're doing at Stratumn,
is we're using Merkle trees
to put large amounts
of hashed data into one single Bitcoin transaction.
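The Merkle-tree batching described here can be sketched in a few lines. This is a generic illustration, not any specific product's anchoring scheme: many document hashes fold up into one 32-byte root that fits in a single transaction.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold a list of leaf values up to a single 32-byte root hash."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node if odd
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

# Thousands of document hashes collapse to one 32-byte value; only that
# root needs to go into the blockchain transaction.
docs = [f"document-{i}".encode() for i in range(1000)]
root = merkle_root(docs)
print(root.hex())
```

Any single document can later be proven to belong to the batch with a short path of sibling hashes, without revealing the other documents.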
Why did you feel that that wasn't a good solution?
Yeah, well, we were looking around, right, and we asked ourselves the question.
There's two ways to get to scale.
One of them is to start with blockchain technology and try to scale it up, using various
tricks and ideas, maybe from distributed databases, maybe with Merkle hashing, etc.
The other way is to start with distributed database technology that already scales and try to blockchainify it.
And of course, we had to figure out what the word blockchainify meant in a practical sense.
I can get to that.
But overall, it's really these two options, right?
Blockchainify big data, or big-datafy blockchain.
Now, if you look at the history of work that's been done in blockchain world, right?
Some of it goes back to the time-stamping literature of the early 90s,
and it continued, of course, with the cypherpunks, etc.,
but it really took off with Bitcoin's release.
But if you look at the world of databases, distributed databases and so on,
this goes back to the 50s and 60s.
There's a much longer history, a much longer lineage,
and a much larger amount of R&D that has been done
and technology that has been developed.
And it's not just about scale.
It's not just about performance in terms of throughput,
capacity and latency, it's also querying.
It's also permissioning and all of these other things, right?
And if you want to do querying over a distributed database,
that is actually a huge amount of engineering work that you have to do.
You have to optimize to minimize latency, et cetera.
And the idea of a query itself, right,
it goes back to a ton of amazing research in the late 60s, early 70s,
that led directly to relational databases and what we now think of as SQL,
and we think of it as the most mundane thing in the world.
But it's incredibly powerful.
One line of SQL replaces
500 lines of custom code for your one application, right?
So if you're thinking about how do you manipulate data,
do you prefer to write one or a few lines of SQL,
or are you going to write 50 or 500 lines of custom code?
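The SQL-versus-custom-code trade-off can be illustrated with a toy table. The schema and data here are invented purely for illustration, using Python's built-in SQLite bindings:

```python
import sqlite3

# A small in-memory table: artworks with an artist and an edition count.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (artist TEXT, editions INTEGER)")
conn.executemany("INSERT INTO works VALUES (?, ?)",
                 [("alice", 10), ("bob", 5), ("alice", 3)])

# The declarative version: one line of SQL.
total = conn.execute(
    "SELECT SUM(editions) FROM works WHERE artist = 'alice'").fetchone()[0]

# The hand-rolled filtering/aggregation code the same query replaces.
total_by_hand = sum(editions
                    for artist, editions in conn.execute("SELECT * FROM works")
                    if artist == "alice")

print(total, total_by_hand)   # 13 13
```

For this trivial query the custom code is only a few lines; for joins, grouping, and indexes over sharded data, the gap grows to the hundreds of lines Trent describes.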
So this is basically the choice, though, right?
You can take a blockchain and try to scale it up
and somehow bolt on all this other work too
with querying and permissioning.
Or you can start with a distributed database
and all that, 50 years, thousands of man-years'
worth of research in distributed databases, and then bolt on the benefits that you get from
blockchain ideas.
So we went for the latter.
And we, as far as I know, are really the only ones that really went that way.
But it's worked out marvelously.
Happy to share our details.
Cool.
It does seem like a really cool idea.
But for starters, let's just walk through all the participants in, let's say, a BigchainDB
implementation.
Right. So, for our listeners and us, when we think of Bitcoin, we tend to think, like, okay,
there are different participants. There's the people that are submitting transactions, the
transactors. There are the nodes that verify transactions, the proof-of-work miners that create the blocks.
These are all the participants. So with BigchainDB, you are eventually going to release, let's say, a
public big chain, right? That is, like Bitcoin, a public database anyone can write to. Who would
the participants in the system be? What are the different categories?
Yeah, sure. So this is probably drilling a little too deep this early in the call, but I'm happy to talk about it.
So first of all, BigchainDB can be deployed in private scenarios and public scenarios.
You know, people can take it right now and put it out as a public network for themselves, right?
We are working on rolling our own deployment for a public one, working with a whole bunch of
participants on that.
And when people go to use it, anyone will be able to read anything.
Anyone will be able to issue an asset.
Anyone will be able to transfer an asset.
And there can be other permissionings, too, if you like.
But for the public one, it's basically going to be very, very, very open.
The one thing that isn't as open is validation.
But right now, if you want to be a validator on the Bitcoin network, you have to spend $50 million on compute hardware to have any chance.
And, you know, that's basically it for the public BigchainDB deployment that we're working on.
One thing I'd like to talk about briefly, before we kind of drill down into this too much, is just to go back to Sébastien's question.
When it comes to scaling, it's kind of clear everyone kind of understands what blockchain as a noun means.
But when you want to take a distributed database and blockchainify it, you know, what does blockchain mean as an adjective, right?
What does it mean to blockchainify something, to make a blockchain database?
So we actually looked around, and, you know, we had spent two years already in the space, right, since summer of 2013, working on Ascribe,
first as the project, then the company.
And we had a very good idea of the different benefits that emerge
and sort of the technical ideas that led to them.
So we like to define blockchain as an adjective
meaning three specific things: decentralized, immutable,
and assets. Decentralized as in no single entity owns or controls it;
immutable as in tamper-resistant,
as in more tamper-resistant than a traditional logging database,
because databases already have logs.
And assets, as in assets can actually live on this database,
for which you need the first two things, decentralized and immutable, as a prerequisite.
And live as in you can issue assets, you can transfer assets,
and they can just kind of be there.
You don't need to have some other sort of means for them to exist.
You can treat that as the main means.
And of course, you can have Ricardian
contracts and all these other sorts of things around it, too.
But overall, it's just the idea that assets can be issued and transferred within this database
network.
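The issue-and-transfer idea can be sketched as a chain of records, where each transfer references the hash of the previous transaction. This is a toy model: the field names are invented for illustration and this is not BigchainDB's actual transaction format.

```python
import hashlib, json

def tx_id(tx):
    """Content hash of a transaction, so later records can reference it."""
    return hashlib.sha256(json.dumps(tx, sort_keys=True).encode()).hexdigest()

def issue(asset, owner):
    """Create an asset directly in the database; no external system needed."""
    return {"op": "ISSUE", "asset": asset, "owner": owner, "prev": None}

def transfer(prev_tx, new_owner):
    """Pass ownership along, linking back to the previous transaction."""
    return {"op": "TRANSFER", "asset": prev_tx["asset"],
            "owner": new_owner, "prev": tx_id(prev_tx)}

# Artist issues a piece, sells it to a collector, who resells it:
t0 = issue("artwork-001", owner="artist")
t1 = transfer(t0, "collector-a")
t2 = transfer(t1, "collector-b")
print(t2["owner"], t2["prev"] == tx_id(t1))   # collector-b True
```

Walking the `prev` links back to the issuance is exactly the digital provenance chain described earlier for Ascribe; a real system would also sign each transfer with the current owner's key.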
So that's really key because, you know, with that definition, then we asked, okay, we want to
decentralize this.
How do we do that?
Okay, we want immutability.
How do we do that?
Okay, we want assets.
How do we do that?
And that's what led us down the path.
You know, how do we do that on top of an existing distributed database?
So what are the kinds of characteristics that your approach enables,
in terms of performance, transaction latency, write latency, etc.?
Yeah, yeah.
So overall, it's the traditional performance characteristics that you might see in a database that people measure.
But also, what is it that our customers care about and whatnot, right?
And, you know, Ascribe has been our lead customer.
So Ascribe, you know, one of the things was capacity, right?
We've got 15 billion records that we want to write, but not just that, you know, another 20 million here, another 20 million there, etc. So you have to have the capacity to store these records as in the metadata, right? The media blobs themselves, databases aren't designed to store those directly, right? This is where file systems come in well, whether it's S3, whether it's IPFS running on S3 or IPFS running in someone's garage, right? So media blobs are well suited to things like S3 and IPFS.
So it's really about capacity as one.
Another one is throughput.
That's really important because if you can only get a few transactions per second going through,
you'll never have something, you know, even city scale, let alone planetary scale, right?
And there's lots of examples, right?
I can throw a rock and find a banking app that will need 1,000 transactions per second,
let alone 10,000 or 100,000.
Anything IoT level, right?
You know, even a small IoT deployment will have 1,000 or 10,000 nodes,
at which point you can easily hit 1,000 transactions
per second, right? Or energy metering,
right? If you've got, you know,
50 million meters, the
numbers I hear are on the order of
50,000 transactions per second recording what's going on
in those meters, right? So capacity,
throughput, another one that's very important,
depending on the application is latency.
So it doesn't matter so much in things like art, right?
where, you know, an artist makes a piece every now and then, you know,
maybe one piece a day or one a week or maybe one a month.
And, you know, that piece maybe gets resold,
maybe five times in 100 years, right? But it does matter for other applications, things like
financial, right? So if you're doing FX trading, you're dead in the water if you're running at 100
milliseconds, right? But you can get away with 30 milliseconds. If you're doing high-speed trading,
you're actually dead in the water if you're at one millisecond, right? So there you need
super dedicated hardware. You need to be running on in-memory databases. You need other dedicated
hardware. You need to be right next to the trading floor, et cetera, et cetera. And there's lots
a deep, deep engineering optimization that you need for that.
And, you know, what's cool is the team inside Ascribe, it's a bunch of engineers who have come
from pedal-to-the-metal engineering, right? Semiconductor design, aerospace design, chemical engineering,
etc. We're used to doing things running at a million transactions per second,
etc. So this does not scare us. To us, it's exciting, it's a challenge. So that's three
things then as sort of key performance characteristics that we like to think about, capacity, throughput, and latency.
There's others, but those are really important.
And then besides that, I guess, sort of for databases, and you'll see that we list this elsewhere too: query.
So the ability to query in the first place.
And then efficient querying, so how efficiently does it look up?
And querying also includes how you do a write, right?
So a query isn't just about a read, it's also about a write.
And then finally, permissioning, right?
So does it support permissioning?
How does that work?
So those are sort of the characteristics that matter.
The last ones aren't as measurable and
benchmarkable; the first ones are.
Yeah.
So, yeah,
so let's review the five things.
So one is
raw capacity,
like how much,
how much aggregate data
can a particular database
technology store?
Like in Bitcoin, that's 50GB.
Yeah.
The second is
throughput,
which means like,
which in Bitcoin we can think of
as transactions a second.
So, you know,
three transactions a second or whatever.
The third thing is latency.
How much time does it take
from when the client submits a transaction for it to be irrevocably confirmed?
Right?
So in Bitcoin, that might be an hour.
And then there's querying ability, which Bitcoin doesn't have, right?
Yeah.
Is that right?
Yeah.
Okay.
And then finally, what was the fifth thing?
Permissioning, right?
Like the ability to restrict certain participants to predefined roles.
So in terms of benchmarks, have you benchmarked the system?
And for the first three things, storage capacity, transaction latency and transaction throughput,
what are the kinds of results you've got for BigchainDB?
Yeah, for sure.
So I'll just go in the order as before.
So for capacity... and basically, maybe before I get there, right?
a lot of the people in the audience, I'm sure, are very familiar with Bitcoin. One of the things
that Bitcoin has, it's fully replicated, right, which means that every single node stores all the
data. But if you want to have any sort of scale at all, in any size at all, that means that
every single node has to be storing a huge amount of data, right? You know, and it's sort of like,
it encourages centralization because who's going to have the capacity to store, you know,
50 exabytes of data or even a petabyte of data, right? So it's kind of interesting.
And even the bandwidth to support that.
So having a smaller number of replicas
actually makes a big difference.
So capacity, if you say I'm going to have
a replication factor of one or three or five instead of n,
what that means is, as you increase your total number of nodes,
then you are increasing your capacity, right,
in a linear fashion, right?
Replication factor of one means 2x the nodes,
2x the capacity.
Replication factor of three, it's still a linear relation, right?
So linear scaling in capacity.
We set things up where, if you're running on an Amazon Web Services
XL instance, that's 48 terabytes.
And working with the database technology we work with,
you can have 32 shards.
So 48 terabytes times 32 shards gives you more than a petabyte.
So we talked about that in the paper.
We claim petabyte capacity.
That's based on that example there.
There's other ways you can roll things where you have even higher storage
per node, but that's one example.
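The arithmetic behind the petabyte claim is straightforward:

```python
# 48 TB of storage per node, 32 shards, one shard per node in this example:
tb_per_node = 48
num_shards = 32
total_tb = tb_per_node * num_shards
print(total_tb, "TB")          # 1536 TB, i.e. more than a petabyte
print(total_tb / 1000, "PB")   # ~1.5 PB
```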
On the next one, throughput.
So this one was very important to us to do a good job on,
because if we're framing this towards a database that
can help to power the planet, how do you get to a scale that
helps, right?
And by the way, for global email, we ran some numbers.
And by our estimations, email is running at 3.2 million transactions per second.
So there's 3.2 million emails floating around the world on average every second.
So, of course, there's spikes and stuff, but that's what the average is for one day.
So throughput, it's interesting for orders of magnitude, right, to get a feel.
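Trent's 3.2 million figure is consistent with commonly cited estimates of a few hundred billion emails per day. The daily total below is an assumed round number, not something stated in the episode:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: global email on the order of ~275 billion messages per day.
emails_per_day = 275e9
avg_per_second = emails_per_day / SECONDS_PER_DAY
print(f"{avg_per_second / 1e6:.1f} million emails per second on average")
```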
So for throughput, our aim was really to actually get to 1 million writes per second.
And we actually had a lot of iterations.
When we started working on the BigchainDB project in the late summer of last year, 2015,
we said, okay, whatever we do, we have to make this thing go fast.
So we designed it in a way where one of the constraints we realized was, as you add nodes, you have to let the throughput go up, right, the way that traditional distributed databases do. So that was a very explicit design decision that we made. And as we went along, we were basically experimenting with different approaches, designing different algorithms, trying them, seeing whether they got in the way. And we managed to structure things such that, at the end of the day, our algorithms that added decentralization and immutability and assets didn't get in the way of the raw performance of the database itself.
And that's really, really key.
You know, the database itself, what it does on its own, and we can get into this, is ordering. That's the core thing that a database does: ordering. In the theoretical approaches, you've got this replicated state machine, and each one of the server nodes is creating a log, and you want all the logs to be in sync, right? And so distributed databases do this; this is actually the essence of them, and then you have all this infrastructure on top.
So what you have to do is let the database, the distributed database, keep doing its thing, creating that log, each machine, each server, maintaining its own state. And then all these things that we do on top just get out of the way, while, you know, achieving our goals of decentralization and immutability and assets. So as we went along, we managed to get everything out of the way. And then we said, okay, great, now we can benchmark the distributed database itself, because all the other algorithms are out of the way.
And we benchmarked and benchmarked. At first, the numbers were so-so, not amazing. We actually discovered several bugs in the distributed database that we had decided to build on. And so we iterated very closely with the developers of the database, and they fixed the bugs. They had very good turnaround; they were really excellent. So we actually helped them get their database to the scale it needed.
And this is RethinkDB.
They're really great company.
It's a JSON-style document store we built on. So we basically inherit all the benefits of RethinkDB, including performance. It's designed as a real-time database, a JSON store, which means it speaks the language of the browser, JavaScript and JSON, and has excellent scalability.
So in the end, we managed to produce this plot that we're very proud of, which shows throughput increasing up to 1 million writes per second as we increase up to 32 nodes.
So that's throughput.
Latency. Really, the big, big bottleneck in latency, the thing that slows you down, it's kind of funny: it's the speed of light, right? The speed of light is slow. It really is. You know, even if you have a database technology that is perfect, that takes zero time to do anything you want, infinitely fast, you'll never be able to do FX trading in a WAN setting, because, you know, 100 milliseconds is the limit for FX trading. It takes, I believe, about 150 milliseconds to go halfway around the planet. So a round trip is 300 milliseconds; I forget the exact number. It's too slow.
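The speed-of-light bound he's estimating from memory can be worked out directly. The numbers below are not from the episode: they assume light in optical fiber travels at roughly two-thirds of c (~200,000 km/s), a standard engineering approximation, and ignore all routing and processing delays:

```python
# Rough lower bound on WAN round-trip time to the far side of the planet,
# ignoring all processing, queuing, and routing overhead.
C_FIBER_KM_S = 200_000              # approx. speed of light in fiber (~2/3 c)
EARTH_HALF_CIRCUMFERENCE_KM = 20_000  # half of Earth's ~40,000 km circumference

one_way_ms = EARTH_HALF_CIRCUMFERENCE_KM / C_FIBER_KM_S * 1000
round_trip_ms = 2 * one_way_ms
print(one_way_ms, round_trip_ms)  # 100.0 ms one way, 200.0 ms round trip
```

Either number already eats the entire ~100 ms budget he quotes for FX trading, which is the point: no database, however fast, can do global FX trading over a WAN, so latency-critical deployments stay within a region.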
So it depends on the application.
So latency really depends. If you're in a single data center, or within a single region, then you don't have to worry about the travel time as much. And you can actually get things down to the speed needs of things like FX trading, right, with the right optimizations.
When it comes to other things, like, for example, the public BigchainDB, where it is a WAN setting, you're going to have delays, but that's okay, right? The consensus algorithm is designed to handle that. That's actually what a consensus algorithm does: it's about keeping all of the different nodes in sync, and knowing what is and isn't in sync, right? And, you know, for everything to be in sync to the point where you're satisfied about it, it might take half a second. That's okay, right? Because typically, in the full WAN settings, you need this for things like art and diamonds and IP. That's fine. And if you really need speed in a WAN, then you have some hierarchical structure. And I see this, right? People might do FX trading in New York and they might do it in London, and then they have some sort of way to reconcile that, but at a different level.
So that's latency.
We continue to do very extensive benchmarking. We've got engineers dedicated to that full-time, benchmarking these things in more and more detailed ways, and also understanding the effect of different faults, et cetera: things like, as you have a heavier and heavier DDoS attack, how does that impact throughput? Right. So this matters.
And then finally, towards your question of scalability,
I guess I implicitly answered in the other ones, right?
So as you increase the number of nodes, capacity goes up, throughput goes up. Latency goes up slightly, simply because you need the majority of nodes to agree, right? So if they're spread out, then, on average, you've just got statistics working against you. But it's not a horrible slowdown. And typically, it's not like we're going for a thousand server nodes. You're typically talking on the order of 10, 20, 30 server nodes, right? And that's a very different thing. You can have thousands, or tens of thousands, or hundreds of thousands of clients; that's fine. But for the server nodes themselves, it's actually a relatively small number. So that's why latency doesn't totally kill you that way.
So if we're talking about the public BigchainDB that will be released,
so I think we can all agree that the latency will not be the same
as if you're operating in a private setting with servers that are close by.
I mean, that's just logic.
Is there some point where throughput and latency and scalability sort of cap out
when you reach a certain number of nodes,
or are we talking about infinite scalability?
I mean, I guess the latency, you're limited there.
But the scalability and the throughput aspects.
Yeah, yeah, of course, yeah, latency, you're limited.
Silly speed of light, darn physics.
But with capacity, right, the way that RethinkDB is designed right now, it has a fixed number of shards. You can keep swapping out any given shard and actually have more sort of virtual storage within each. That's one way. The other way that we see is simply having databases sitting side by side by side, where they have basically a shared namespace, with things like IPLD, et cetera. So we can get into that in a bit if you like. But overall, it allows this sort of horizontal scaling, because you can actually have databases sitting side by side by side. It's still nice to have as much as you can in a single database, because it helps to make the querying much more efficient, but you can actually go from one to two to five to ten-plus databases.
Okay.
So this week, actually, you announced your partnership with RethinkDB. So correct me if I'm wrong, but BigchainDB is built on top; it adds functionality to RethinkDB.
That's correct.
Okay.
So RethinkDB is a distributed database system, as you mentioned, a JSON store in the sort of traditional distributed database schema that we might think of, and then BigchainDB adds all of the blockchain, and I'm putting that in quotes here, features such as permissions and consensus, et cetera. Can you talk about that partnership and how those two services work together? And before you answer: I installed it earlier, and the way that it works is that you install RethinkDB first, and then you install BigchainDB, and I guess it's sort of an extension to that technology.
Yeah, sure.
So basically, it's worth mentioning: you mentioned permissioning and consensus. So RethinkDB, up until very recently, didn't have a permissioning system shipped. They've been working on one since last fall, and they just shipped one this past week as well. So they have a very nice permissioning system. We have a different type of permissioning on top. If you think about a transaction in a blockchain-type setting, permissioning is based on private keys, and which signatures you need for it to go through, things like multisig crypto signatures, that sort of thing, right? And so we have that sort of permissioning. And we released a version one of our transactions that was just a very simple transaction: one input, one output, not much more.
And we're in the process of releasing a V2 that has all these much fancier features: multiple inputs, multiple outputs, multisig thresholds, and a more general way to specify crypto-conditions. It's not Turing complete, it doesn't have loops, none of that; it's not trying to, because smart-contract decentralized processing is complementary. But it makes it easy to do things like escrow at volume. So basically, permissioning actually breaks down in a couple of ways, right? There are these things like crypto-conditions, and then there's permissioning in the traditional sense, where this identity on this network has the ability to issue assets, or this identity has the ability to read. We were holding back from actually implementing that ourselves until RethinkDB came out with their permissioning system. Now that it's out, we will be moving towards supporting that more directly in BigchainDB.
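As a toy illustration of the multisig-threshold idea he mentions: a threshold condition is fulfilled once some minimum number of the designated signers have signed. This sketch is not BigchainDB's actual crypto-conditions format (which is a richer, composable spec); the names and simple set logic here are purely illustrative:

```python
# Toy m-of-n threshold check: the condition is met once at least
# `threshold` of the designated signers have produced valid signatures.
def threshold_condition_met(valid_signatures: set[str],
                            signers: set[str],
                            threshold: int) -> bool:
    """True iff at least `threshold` designated signers have signed."""
    return len(valid_signatures & signers) >= threshold

signers = {"alice", "bob", "carol"}
print(threshold_condition_met({"alice", "carol"}, signers, 2))  # True
print(threshold_condition_met({"alice"}, signers, 2))           # False
```

Escrow falls out naturally from this shape: e.g. a 2-of-3 condition over buyer, seller, and arbiter releases funds when any two of them agree.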
So we talked about it in the white paper. It's one of the things in the white paper that wasn't there right in the very first release, but it's coming down the pipe.
You also talked about consensus.
So this is actually, you know, really important to stress.
A distributed database by its nature, how do the nodes keep in sync, right?
How do they, how does the data keep in sync from one to the next to the next?
That's a consensus algorithm, right?
And even modern consensus algorithms go back to,
research at Microsoft from a guy named Leslie Lamport,
he wrote this really wonderful paper in 1982,
where he actually defined the Byzantine General's problem, right?
And then proposed a solution to it.
And over the years,
that technology got better, better, better,
that line of research.
In 1990, he came up with an algorithm.
He actually said to prove that it wasn't possible
to do a certain class of consensus,
to solve a certain class of consensus problems.
And in the process of trying to prove it wrong, he accidentally discovered the solution.
And it's called Paxos.
He tried submitting the paper in 1990.
The referees thought it wasn't interesting enough.
So he's like, ah, forget about it.
But then things got more important again with the rise of the web.
And he finally published the paper in 1998.
It's notorious for being hard to understand, but it's actually more straightforward than people realize, and there are nicer explanations out there now, by Lamport himself as well as others. And there are derivatives of Paxos out there, things like Raft and many others, that have improved upon it over the years. Consensus algorithms are a well-established technology; once again, this is one of the core technologies that helps to power big data and the modern internet. So inside RethinkDB, it uses Raft, and therefore at the very, very core of BigchainDB, there is Raft as well.
We have a consensus algorithm on top as well.
Okay.
Could you explain how it works and how it's different from, say, proof of work?
Yeah.
Sure.
So we have a two-layer consensus algorithm, and maybe I'll take a step back.
Sure, happy to.
So proof of work: basically, every 10 minutes there's a lottery to decide which one of the server nodes, aka miners, gets to make the call on what transactions get committed to the chain. And that lottery is based on how much electricity you spend, assuming comparable hashing power, et cetera. So that's fine; these days it takes a lot of money to have any fighting chance at all, but that's kind of proof of work, right? So it's lottery-based: once every 10 minutes, one node makes a vote, and then, on average, 10 minutes later, someone else gets to win, et cetera. And it follows this longest-chain rule, which means that to be really sure you have the longest chain, you probably have to wait around for two or three or four, five, six blocks. Even in the original Bitcoin white paper, he talks about the probability that you've got the longest chain based on how many blocks have gone through. So it's a consensus algorithm, but it's actually quite slow, right? You're going to have to wait 10, 20, 30, 40 minutes for it. That's proof of work.
Proof of stake: traditional proof of stake is basically, your probability of getting to vote yes or no is proportional to how much money you have in the system, how much intrinsic internal cryptocurrency, et cetera. There were clear problems with that; people have identified many attack vectors. So people started engineering improvements around it. And what we've arrived at today with proof of stake actually looks a lot more like a federation. So maybe I'll describe federations and come back to proof of stake.
A basic federation is: you've got five nodes, 10 nodes, 25 nodes, whatever, and for some transaction to go through, some quorum has to say, yes, this is good. It could be three, it could be five, it could be a majority, whatever. A majority or two-thirds is typically a good idea, depending on what sort of assumptions you're making. So that's a federation. How do you choose who the members in the federation are? There are various ways, right? The original Hyperledger project, for example, was based on asking: do you have an SSL certificate? Or you can say, hey, Bank A, Bank B, Bank C, Bank D, you're all my buddies; I'm Bank E; we're going to make a federation together; we know each other's public keys; that's my list of approved nodes; we're good to go.
So proof of stake, as it's evolved with things like Casper, et cetera, is a federation with very, very dynamic membership based on a cryptocurrency. And that's cool, right? It basically means you have a very open membership. You do have a lot of complexities, but the complexities are mostly around the rules for membership. You can have very simple rules too, or you can have external governance, et cetera.
So overall, the three main ways to approach this are proof of work, proof of stake, and traditional federation-style consensus. And then on top of that, when you have consensus, you could say: I'm tolerant to just crash faults, or I'm tolerant to crash faults plus arbitrary malicious faults. And then on top of that, you could say: I'm going to allow fully open membership, to let anyone come in and do anything and maybe be an authenticator, right? And that's a citizenship thing. This could be the proof-of-work system; of course, to have any chance at all, you need to spend a lot of money on mining equipment.
Or another example is something like the Stellar Consensus Protocol, where everyone makes a call on who they trust. So if I want to trade with you, Sebastian, then, say you've decided that you've got 10 people on a list that you might possibly trust, and I've decided that I've got 20, and there's an overlap of four people. So it really comes down to those four people that we trust together.
So with that, what is the consensus model in BigchainDB?
Sorry, sorry about that, yeah.
No, no, it's great to give a broad overview of all the consensus models.
Drill down, yeah, no, it's important to understand.
But there's a long history, yeah, there's a long history to this, right?
Consensus did not start with Satoshi.
There's this huge belief that he did and he walks in water and a bunch of other things.
And that's fine.
If people want to, you know, not look into the history of computing,
but we care about the history of computing because there are a lot of great ideas.
So what we have inside BigchainDB is basically two-level consensus. At the lowest level, within RethinkDB, that is Raft, which is part of the lineage of Paxos. It's designed to be more modular, easier to use, sort of like building blocks that are decoupled from each other, so it's easy to compose them and reason about them. And it is crash-tolerant, but not Byzantine fault tolerant; not fully tolerant to arbitrary attacks.
One level up, and this is part of the BigchainDB server code, et cetera, we have technology we call the BigchainDB consensus algorithm, and it does voting. So basically, remember how I mentioned before: it's all about the ordering. You want to get out of the way of the database when it's writing. So we let the writes come in. One node will vet them, but then we write as soon as we can, into blocks. The writes that land are there for good, but they're not vetted yet; we just let them write. Why? Because it's much faster to just let it write and get out of the way. But then, after the fact, we let the federation nodes vote. Each basically says: yes, this block is good, or no, it's not. A block is good if every single transaction inside is good; the block isn't good if any transaction is bad. So that emerges as consensus at a higher level for that block.
And if you think about it, you could actually do this one transaction at a time, but we simply group them into blocks for speed, because it takes time to hash a set of transactions and write it, et cetera. So we group them for speed, no more, no less. And if the majority of voting nodes say, yes, this block is good, then it's considered good, and anyone doing reading afterwards knows that it's usable, that it's truth. If the nodes vote and the majority says it's bad, then it's considered not truth, and any transactions that are still possibly okay get copied back into this sort of incoming buffer and tried again.
So basically, in this mechanism, we've got two levels of consensus. The lower level is crash fault tolerant. The higher level is crash fault tolerant plus: it's got a bunch of other mechanisms in place to validate and verify, and as time goes on, we're making it more and more resistant towards Byzantine faults as well. In fact, we have some pieces of technology right now that we're iterating on to make it do that in many settings.
So as a summary: two levels of consensus. The lower level is Raft, which is not Byzantine fault tolerant but crash fault tolerant. So there's basically one level of consensus that tells all the nodes what data to put in, and it takes all sorts of data, correct transactions, incorrect transactions, et cetera. And then once all of the transactional data is inside the database, there's another level of consensus, which is nodes voting to order these transactions into blocks, into valid blocks.
Not quite. So it's not voting to order the transactions into blocks. The thing is, even a fault-tolerant database running Raft, whatever, right? It's already ordering. Databases do this; they need this for the logs. It's built into the database. Satoshi didn't invent ordering, right? So you get out of the way and let the database write these orderings itself. So at the lowest level, all the ordering is being done, and actually it's being grouped into blocks there too. The thing happening at the higher level is voting on whether any given block is valid or not, and it's valid as long as all the transactions in the block are valid.
Okay. So the lower level is imposing the ordering, and the higher level might be imposing, say, that no extra money is created, that kind of thing, yeah.
So basically, what prevents double-spending is the higher level of consensus.
And the lower level tells us what transactions happened and which came first.
Exactly.
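The two rules just described, that a block is valid only if every transaction in it is valid, and that a block is committed once a majority of federation nodes vote yes, can be sketched as follows. These function names and the trivial transaction format are hypothetical, not BigchainDB's actual code:

```python
from typing import Callable

def block_is_valid(transactions: list, tx_is_valid: Callable) -> bool:
    """A block is good iff every single transaction inside it is good."""
    return all(tx_is_valid(tx) for tx in transactions)

def block_decided_valid(votes: list[bool]) -> bool:
    """A block is committed once a strict majority of federation nodes vote yes."""
    return sum(votes) > len(votes) / 2

# Toy example: a trivial validity rule over made-up transactions.
txs = [{"amount": 5}, {"amount": 3}]
print(block_is_valid(txs, lambda tx: tx["amount"] > 0))  # True

print(block_decided_valid([True, True, False]))   # True  (2 of 3 voted yes)
print(block_decided_valid([True, False, False]))  # False (only 1 of 3)
```

Note how the ordering itself never appears here: as he stresses, ordering is handled entirely by the underlying database layer, and this vote only decides whether an already-written block counts as truth.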
And this is absolutely critical to performance. So basically, the bottleneck is in writes, right? If you can manage to get the throughput for writing really, really high, by letting the distributed database do its thing, then you're okay for the writing. But then you have to ask: should everyone trust what just got written? And that's what this higher-level consensus is about. And that higher level is actually just voting as well; each one of the nodes is doing its own writing, so there's no sort of central blocker, right?
So this is sort of the breakthrough. It's reordering what's happening: letting the ordering happen with this big-data distributed database technology that already runs fast, but bringing in decentralized control via the nodes having a say on whether a block is valid or not.
So, like, one of the interesting things in your white paper is your claim that BigchainDB implementations can't fork.
Is that correct?
And why is it so?
Yeah, so this comes down to once again,
if you think about a distributed database,
a traditional one, right?
Each one of those server nodes is writing a log, right?
Just writing a log, one order, then then the next,
then the next, then the next.
This is the essence of a consensus algorithm,
a traditional one, right?
Like Paxos or raft or what are these?
So it's not like Paxos says,
okay, suddenly imagine you're writing a piece of code
and it's a four loop.
Right?
Can that four loop code suddenly be mutated such that it's like splitting into a nested loop?
You know, like it's just not kind of in the code at all anywhere.
You don't have the constructs, right?
So within distributed databases, there is no construct for a fork because just those, you know, abstractions haven't been built in.
They would have to be built in at the very core, right?
And that central problem of sorting out ordering, solving ordering, is exactly what Paxos and Raft do. And it's just an order, no more and no less. The writes are just slotted in there, and that's it. How the order is resolved is based on what's getting stored within each server node, and then what the protocol rules are for how to interpret that from each other server node. So basically, order comes as a first-class citizen with traditional consensus algorithms like Paxos,
and forking is not even in their vocabulary.
So let's talk about some of the applications for BigchainDB. What I really like about the website is that you have some really nice illustrations there of how BigchainDB can be implemented in, I guess, more traditional centralized infrastructure, where you would have an Amazon Web Services instance and a database, et cetera, all the way to having it completely decentralized, with Ethereum running as the VM, IPFS as the file system, and BigchainDB as the data store.
Could you talk about sort of the different use cases and what would be some practical applications for each of those types of scenarios?
Sure, sure, yeah.
Sure, so to summarize: there's a framework, or a style of deployment, where you can use BigchainDB and keep your stack the same, right? Maybe right now in your stack you've got five different databases: maybe MySQL for your SQL database, and maybe MongoDB for your NoSQL database, and maybe Neo4j for a graph database or something, and maybe some others too, just for fun. It's pretty common these days. So guess what? You throw in another database. This one, though, is special, BigchainDB, because you yourself don't own or control it; it's you and this federation of other organizations in your ecosystem, right? So as far as your system is concerned, and your developers and your CIO, et cetera, you've got this other database hooked into your system, but it's special, because it's data that you and the other organizations in your ecosystem are working on together, right?
So that's really cool because it means that you can, you know,
maintain the rest of your stack. You don't have to go all in. You can just bring extra benefits
to some of your existing applications. Or you can also, another way to think about this is
you're thinking about a new application, but you only really need to have, say, time stamping,
right? You don't need to have full on decentralized processing or full on decentralized
media storage. So great. You have your team of developers that understand modern web stacks
or modern enterprise stacks. Just use them, right? Keep doing that stack. Now you just have a new
database that you can plug in and use to get some of these main benefits of decentralization.
I mean, we know this very well, because this is how we built Ascribe for IP. We came up with this idea in 2013, long before Ethereum; the Ethereum white paper came out six months later. And there had been talk of this in various places and stuff. But what's cool is, Ascribe itself is running on a modern web stack, all the components you would expect, running on AWS, Heroku, et cetera. And at the place where the rubber hits the road, the part that really matters, you know, who owns what, that is on a decentralized database. Right now it's on Bitcoin. We're in the process of porting it over, and we will fully deploy it when the public BigchainDB goes live.
And so that's a good example.
And so I'll talk about general,
and then I'll talk about some very specific use cases.
So the general, there's sort of the partly decentralized stack
that I just described, which is really easy to adopt,
especially for the enterprise.
The other stack is this full-blown, fully decentralized stack, which we're very excited about. Of course, it can be deployed in a private setting, and many people are doing that,
for things like business logic or for lots of other applications.
But what's really exciting to me is the public deployment,
where you have this world computer,
as a lot of the Ethereum folks talk about,
which I think is really awesome, right?
And so you can have a decentralized database, really a planetary-scale database that no one owns or controls; a decentralized file system, and IPFS is emerging as the winner there; and processing on top, and Ethereum is certainly the leader there.
There are some other technologies, we believe, that are really interesting and helpful there too, things like Tendermint, where you can actually have decentralized processing with other languages, right? The Ethereum VM is really designed for Solidity, the Ethereum language, which is really great for a lot of applications. But for other developers, who have a lot of code in Python or only really feel comfortable in JavaScript, that's cool too, because you have things like Tendermint.
And these things together, right: decentralized processing, decentralized storage, and finally, decentralized communication, just built into the protocols and stuff. Those are really the elements of computing, i.e., working together, a world computer, right? And then there are the applications on top: decentralized apps, dapps. And there's already starting to be an explosion of them, thanks especially to Ethereum, and others too.
And that's very exciting.
And of course, Ethereum right now isn't quite at the scale that people want, but there are several really smart people working on it, and I've had deep, long conversations with them, and I'm excited about the directions they're headed. So it will get there. And that's great, because it means that, for example, Ethereum will be able to keep up with BigchainDB. You know, the faster it goes, the better it works for us. But also, we've seen it's really hard to develop a decentralized app if you don't have, let's say you're used to being a web programmer, where you have, say, an MVC model, right? So M is for model, which is traditionally instantiated as a database. If you don't have that, maybe you only have RAM, or a file system; what do you do, right? So it's much easier when you have a through-and-through database to work against.
Or, the other way of thinking about it: there's the traditional LAMP stack, right? Linux, Apache, MySQL, PHP. We're the M, the MySQL, for the modern decentralized stack.
So this is sort of roughly like the higher level, right, sort of partly decentralized,
where the database really matters at the core for decentralization.
The rest can be centralized.
And fully decentralized, where you actually have all the pieces decentralized.
And there's shades of gray in between, for example, too, right?
You can have BigchainDB with IPFS and that's it. And by the way, Ascribe itself will be moving towards a fully decentralized deployment, bit by bit.
So we're working with several organizations on that.
Quite excited.
For applications themselves, maybe I'll just give a quick sampling.
One of the companies we're working with very closely is Everledger. Everledger is about diamonds on the blockchain, run by a woman named Leanne Kemp, who is a force of nature.
She's really awesome.
And she sees that there is major fraud and corruption in the diamond world.
And it's basically been a very opaque industry.
So, you know, diamonds get dug from mines. These mines themselves are often in very corrupt countries; some countries have actually become so corrupt that they're not even allowed to dig diamonds anymore. And then these rough stones that come out get taken to certification houses. Well, sorry, cut first, then taken to certification houses. There's about five in the world that matter. And then they get measured and certified, and an ID is even laser-etched in there, actually.
And by the way, along this path, there's something called the Kimberley Process, which is basically meant to help vet whether you're authentic or not. And then once it goes past these certification houses, each diamond has this piece of paper that's supposed to be attached to it.
And then it gets sent through various distribution channels, ultimately ending up in the hands of retailers, whether it's online retailers or the local Tiffany's. And basically, at each step along the way, there's tons of opportunity for fraud, because that piece of paper could get separated from that diamond, or that ID might be a lie.
And there's many, many examples that you can see.
For example, one of the major certification houses got hacked, and their data records got altered. And records can end up altered even without getting hacked.
And if you look at the retailers themselves that are selling the diamonds, so we actually
worked with Everledger, and here's what's interesting with blockchain technology, right?
It encourages companies that are traditional competitors to share data
in new ways to get sort of an ecosystem-wide benefit.
In the case of Everledger, the different certification houses shared their data with Everledger,
and then we worked with Everledger to cross-reference that data against the
retailer data, and we discovered basically by cross-referencing that and applying machine learning
techniques, we discovered 7% fraud on one of the major retailers.
There was 20,000 diamonds a day going through this retailer.
It worked out to $750 million of fraud in one year.
So that's quite exciting, and that's just one retailer.
This is pretty important, right?
It's an $80 billion industry.
And that 7% was just from the initial data mining.
It could be as high as 30 or 40%.
So we're talking up to $32 billion.
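The cross-referencing step described here can be sketched in a few lines: join the certification-house records against retailer listings by the laser-etched certificate ID, and flag listings whose attributes contradict (or don't match any) certificate. This is purely an illustrative toy; all field names and figures are made up, and Everledger's actual pipeline and machine-learning techniques are not public.

```python
# Toy sketch of cross-referencing certification records against retailer
# listings. Field names ("cert_id", "carats") are hypothetical.

def flag_mismatches(certificates, listings):
    """Return listings whose attributes contradict the certificate record."""
    certs_by_id = {c["cert_id"]: c for c in certificates}
    flagged = []
    for item in listings:
        cert = certs_by_id.get(item["cert_id"])
        # A listing is suspect if its certificate ID is unknown, or if the
        # advertised carat weight disagrees with the certified weight.
        if cert is None or abs(cert["carats"] - item["carats"]) > 0.01:
            flagged.append(item)
    return flagged

certificates = [
    {"cert_id": "GIA-001", "carats": 1.01},
    {"cert_id": "GIA-002", "carats": 0.75},
]
listings = [
    {"cert_id": "GIA-001", "carats": 1.01},  # matches certificate: fine
    {"cert_id": "GIA-002", "carats": 1.50},  # weight inflated: suspect
    {"cert_id": "GIA-999", "carats": 2.00},  # unknown certificate: suspect
]
suspect = flag_mismatches(certificates, listings)
print(len(suspect), "of", len(listings), "listings flagged")  # 2 of 3
```

In practice the matching is far messier (free-text descriptions, near-duplicate records), which is where the machine-learning techniques Trent mentions come in.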
That's diamonds, but it generalizes.
So in general, supply chain transparency is a big deal for the world of blockchain technology.
Yeah, I agree.
I mean, this is something that we're particularly interested in at stratum is bringing more transparency and auditability to supply chain.
I want to come back to something you mentioned earlier, which I thought was really interesting, which is having that database component.
A lot of the things that people are doing with blockchain involve being able to
timestamp things and notarize them.
And having that component there in that stack is really essential.
We ask ourselves that question all the time.
It's like, okay, well, our clients are going to be notarizing data through the Bitcoin
blockchain, but where is that data being stored?
Or are they storing that on their servers?
Or in some cases, that doesn't really make sense, because you want to have that data, and you want
to have it decentralized.
I wanted to come back, before we sort of wrap up here, to the idea that, well, with Bitcoin, you know, we have the Bitcoin protocol, the core protocol, and it's released as a public blockchain.
But a lot of the new protocols that have been emerging, like Ethereum and IPFS and now BigchainDB, have this functionality within them so that they can be public,
but they can also be private, and there's some shades of gray in there.
So you can be on the Ethereum blockchain,
but you can also deploy Ethereum as your own private network, if you like,
or semi-private network.
And so can you talk about,
so you also talked about a lot of the public implementation of BigchainDB,
and I suppose there would be some semi-public implementations, some fields or industries
that would implement their own instance
of a BigchainDB, and then some totally private implementations. I'm particularly interested in the
public one, because I think that can bring a lot of value. Talk about how you see that playing out.
Who would be the validating nodes? Can anybody join, and can anybody use the network?
Is it a public good? How does that work?
Yeah, so we're spending a lot of energy on this,
and we'll be having more announcements in coming months on this.
Overall, we will be rolling out a foundation for this,
and it will obviously be a nonprofit.
We are working with organizations as nodes
that have demonstrated a commitment to the health of the future internet.
And these are organizations that have demonstrated care in the past
or are demonstrating care now by dedicating real resources to this.
And overall, the key thing is that permissioning will be set such that anyone can issue an asset, anyone can transfer assets, anyone can read, all of that.
And that's really kind of the core things, right?
And the capacity will be so big, et cetera, that it really can be a store of world data.
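The permissioning model just described, where anyone can issue an asset, transfer one they own, or read, can be illustrated with a toy in-memory ledger. This is a conceptual sketch only; it is not BigchainDB's actual API or transaction model, and the class and method names are made up.

```python
# Toy sketch of open permissioning: anyone can issue, transfer, and read.
# The only rule enforced is that a transfer must come from the current
# owner -- there is no gatekeeper on who may participate.

class PublicLedger:
    def __init__(self):
        self.assets = {}    # asset_id -> current owner
        self.history = []   # append-only log of operations

    def issue(self, issuer, asset_id):
        if asset_id in self.assets:
            raise ValueError("asset already exists")
        self.assets[asset_id] = issuer
        self.history.append(("CREATE", asset_id, issuer))

    def transfer(self, owner, new_owner, asset_id):
        # Ownership check is the sole restriction on transfers.
        if self.assets.get(asset_id) != owner:
            raise PermissionError("only the current owner can transfer")
        self.assets[asset_id] = new_owner
        self.history.append(("TRANSFER", asset_id, new_owner))

    def read(self, asset_id):
        # Reads are unrestricted: the ledger is a public record.
        return self.assets.get(asset_id)

ledger = PublicLedger()
ledger.issue("alice", "artwork-42")
ledger.transfer("alice", "bob", "artwork-42")
print(ledger.read("artwork-42"))  # bob
```

In the real system, "the current owner" is established cryptographically by signatures against the previous transaction's output rather than by name, but the open-by-default permissioning is the same idea.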
There's some really awesome apps that can emerge on this too, right?
The one that gets us excited, and actually brings us full circle, is IP, creative works, right?
So, you know, what is the mandate of a museum?
It's to preserve the cultural history of what artists have done over the last, you know, decades and centuries.
And there's cultural mandate in museums and archives and many other places, libraries, etc., right?
What does that look like in the 21st century, in the 22nd century, in the next 10,000 years, right?
It's going to be digital.
You want to have it live, but you also want to have it where
you can trust that it's going to be around, right, and that it's immutable, et cetera.
So it's a really great fit.
So we're going down a path for this, which is basically we're using Ascribe, the company
itself and the product Ascribe, the team that's still working on Ascribe, to drive
IP into this public BigchainDB, and working with other cultural institutions too.
For example, we continue to iterate with Creative Commons France and a lot of other organizations
that really care about this.
So Ascribe, the team itself, actually almost acts like a nonprofit, right?
It's quite well tuned towards the needs, right?
We have a lot of people we work with that are, you know, art pros, art curators, et cetera, sort of
in that space.
So, and it's not just art, right?
It's not just digital art.
It's also things like music and books and film and video and video games and all of these things, right?
This is all sort of creative cultural artifacts where the attribution deserves to be public.
And the ownership, for provenance, when people want it, deserves to be public too.
This is a really big killer app.
There's also other apps too.
around this. We see a lot of companies doing really great things in identity and reputation.
And much of this makes sense on the public BigchainDB too, or some other, you know, technology
like this at that scale. So there's others too, right? Sometimes things do need to be more private.
You know, a lot of banking things, simply due to banking regulations, need to be in private networks.
That's fine, right? So there's really sort of shades of gray, but it's really
the public BigchainDB that is, you know, driving us as an organization philosophically.
We're, you know, we're excited that we can help to play a role in creating this future internet,
this future society that we want.
Yeah, that's an interesting prospect.
I mean, why would you go, you know, get a MySQL database? You know, it opens up so many
possibilities for developing applications without needing any sort of infrastructure.
I guess the permissionless side of it is what gets me really excited about this.
It's just being able to utilize this public good that is much like Bitcoin, much like Ethereum and IPFS and other protocols.
But yeah.
And so this foundation would, I guess, pick the validators.
Yeah, yeah.
So the foundation will be picking the validators.
Here's how we see it, though.
So right now, the technology does not exist to have open membership at scale in a way where you can store all the data you want, right?
You have to make a trade-off, right?
And so some of the other works out there have made the trade-off to say, we're going to allow, you know, thousands of nodes, but at a severe cost to scale, right?
Severe. We've seen it.
But they're trying to scale up, right?
And, you know, some of the efforts I commend in big ways, right?
I think things like Casper especially are, okay, awesome, right?
Great.
And rock on.
There's another way to approach it, though.
So rather than starting with, you know, thousands of nodes as validators,
remember, you can have hundreds of thousands or millions of nodes as clients.
So rather than starting with thousands of nodes as validators and trying to add scale,
start with a smaller number of nodes as validators while still supporting hundreds of thousands
or millions of clients.
So start with a smaller number of validators, have scale from day one.
and then evolve the technology to support a larger number of validators.
And that's exactly the direction we're going, where things can be open membership,
you know, two years, three years, four years from now, right?
With things like the Stellar Consensus Protocol or other protocols like that, right?
So there's some emerging stuff right there, too.
But at the end, we will all end up at a place where it is broadly decentralized
even in terms of validation, as well as having
the scale that the planet deserves, right?
No one is there yet.
We have, you know, the scalability that the planet deserves, the scale,
and a more limited set of nodes.
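The design Trent describes, a small fixed federation of validator nodes serving millions of non-voting clients, can be sketched as simple majority voting: a block is accepted once a strict majority of the federation votes it valid. This is a deliberately simplified illustration, not BigchainDB's actual voting protocol; the node names and threshold rule are assumptions for the example.

```python
# Toy sketch of federation voting: a small, known validator set votes on
# each block; clients only submit and read, they never vote.

def block_accepted(votes, validators):
    """Accept a block once a strict majority of validators votes valid.

    votes: dict mapping validator name -> True (valid) or False (invalid).
    Validators that have not voted yet simply don't count toward the tally.
    """
    yes = sum(1 for v in validators if votes.get(v) is True)
    return yes > len(validators) / 2

validators = ["node-a", "node-b", "node-c", "node-d", "node-e"]

# Three of five validators vote valid: strict majority, block accepted.
votes = {"node-a": True, "node-b": True, "node-c": True, "node-d": False}
print(block_accepted(votes, validators))  # True
```

The point of the trade-off is that this tally is cheap no matter how many clients exist, so throughput scales with the underlying database rather than with the validator count; growing the validator set later is then an evolution of membership, not of the data path.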
What's cool, actually, is we have a prototype for what we've been doing, an archetype, if you will.
It's called the DNS, right?
The decentralized DNS has been powering the Internet since the late '90s.
Before that, you know, going back to the Internet of the early '80s, it was actually a text file.
You know, one guy was maintaining it, you know, for many years, and then he switched over to
his buddy, right? And it's kind of amazing, right? And they just kind of had to do it. But,
you know, as the internet got more important, it made sense to decentralize control, and this is
what happened. And so, the technical architect of the DNS, his name is David
Holtzman, and we've actually been working closely with him for more than a year, because this is
actually, you know, a linchpin technology.
And, you know, when they created the DNS, there was the business side too, Jim Rutt; we work with him too.
They handed it off to ICANN.
And, you know, from 1999-2000 to now, you know, ICANN has devolved.
It's not as good as it was 15 years ago, right?
There's those challenges.
And, you know, there were some technical design challenges in the DNS.
It wasn't perfect.
But it was pretty good for the time.
Now we have another 15 years' worth of learning from crypto, et cetera,
and we have all the lessons from ICANN.
We're drawing in those lessons for this next-gen database for the planet
and something that's more broad in scope than the DNS.
At the very least, for IP, because that's what Ascribe is doing.
But we also have a lot of other people who are very interested in using it
for a lot of other applications, right?
So that excites us.
Like I said, it sort of drives us.
It's what gets us up in the morning.
It's, you know, what makes us excited for the future.
Well, that's a great note to end on, Trent.
Thank you for coming on the show once again, and hopefully we can have you on again in the future as BigchainDB continues to evolve.
Awesome.
Thank you for having me.
I really appreciate it.
And thank you to our listeners for listening.
We are part of the Let's Talk Bitcoin network.
You can go to letstalkbitcoin.com to find all kinds of shows about blockchain, Bitcoin, and decentralized technologies.
You can listen to episodes of Epicenter Bitcoin every Monday.
They come out on your podcast feed, as well as
Stitcher, YouTube, or anywhere you get your podcasts, whether it be iTunes or Android or
wherever.
And to our loyal listeners, you can always leave us a review wherever you can.
It could be on iTunes or any other platform.
If you do, just send us an email at show@epicenterbitcoin.com and we will send you
a T-shirt and some stickers.
And we thank you for the reviews that you've left so far.
It's been really helpful in getting the show out to more people.
So thanks so much.
And we look forward to being back next week.
Thank you.
