Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Trent McConaghy: Ocean Protocol – The Platform Making Waves in the Data Industry

Episode Date: November 19, 2020

Data is a huge industry, worth about $500 billion in Europe alone. And currently there's a fundamental misalignment between those creating data and those consuming it. There's a one-directional value flow in terms of those who are providing the value (the data) and those extracting it. These are big tech platforms that typically use that data to sell signals and advertising to brands and merchants. This is referred to as a shadow data economy, and it's time to flip this model on its head.

Ocean Protocol is a platform which creates data marketplaces, providing an alternative to the current model. Data providers can sell their data on the platform to whoever wants to buy it, and that data set is represented as a token. The value is a function of the usefulness of that data. This creates a much more equitable market where value flow is more cyclical than one-directional.

Trent McConaghy, Founder of Ocean Protocol, joins us to chat about the platform. They have just released V3, which has seen the introduction of the data token, the Ocean Market, and a new home on the Ethereum mainchain.

Topics covered in this episode:
- Trent's background and long history of building blockchain products
- The business models built around data and the "Shadow Data Economy"
- What Ocean Protocol is and what it achieves
- Technical components and stakeholders in the Ocean Protocol
- Staking and providing liquidity for datatoken markets
- The Ocean Market and activity since the launch
- Ocean Protocol and its compliance with privacy regulations like GDPR
- Ocean Protocol V3 and migrating from a sidechain to the mainnet
- The future of Ocean Protocol and the potential for data stream markets

Episode links:
- Ocean Protocol website
- Ocean V3 release
- Episode 78 - Ascribe – The Internet of Ownership
- Episode 126 - BigchainDB – Scalable Public Distributed Databases
- Episode 184 - IPDB – The Interplanetary Database and its Applications in AI
- Ocean Protocol on Twitter
- Trent on Twitter

This episode is hosted by Sebastien Couture. Show notes and listening options: epicenter.tv/366

Transcript
Starting point is 00:00:00 This is Epicenter, episode 366 with guest Trent McConaghy. Hi, I'm Sebastien Couture, and you're listening to Epicenter, the podcast where we interview crypto founders, builders, and thought leaders. On this show, we dive deep to learn how things work at a technical level, and we fly high to understand visionary concepts and long-term trends. If you like Epicenter, the best way to support us is to leave a review on Apple Podcasts. And if you're on a Mac or iOS device, the easiest way to do that is to go to epicenter dot rocks slash apple. Today my guest is Trent McConaghy, and longtime listeners of the podcast will recognize
Starting point is 00:00:46 Trent as a repeat guest. In fact, this was his fourth time on the show. And today he comes to us to tell us all about Ocean Protocol, which he and his team have been working on for some time now. In fact, Ocean Protocol just released its V3, which builds and improves on previous versions of the platform and now lives natively on the Ethereum main chain. Ocean Protocol is a platform to create data marketplaces. And if you haven't noticed, data is a big business. By some estimates, it is over $500 billion in Europe alone. And what is obvious to everyone
Starting point is 00:01:27 who is observing this space is that there's a fundamental misalignment between those creating the data and those consuming it. It's a very one-directional value for, flow in terms of those who are providing the value, which is the data itself, and those who are extracting that value. And those are big tech platforms who typically use that data to sell signals and advertising to brands and merchants. As Trent describes it, there is a shadow data economy just as there is shadow banking. And we need to flip this model on its head. And if you haven't watched the recent documentary by Tristan Harris, the social dilemma, I would encourage you to check it out because it really kind of summarizes this problem in a very concise way and in a way that's easy to wrap your head around. Anyway, the Ocean Protocol provides an alternative to this model, an alternative in which data providers, users typically can sell their data on the platform to whomever is interested in it, whomever wants to buy it.
Starting point is 00:02:34 and that data set is represented as a token. And that token's value is a function of the usefulness of that data to whomever wants to buy it. So it's a much more equitable market where the value flow is more cyclical than one directional. This was such a fun and fascinating conversation. I always enjoy speaking with Trent because he's such a visionary and just all around, a nice guy. And I hope to see this model develop. And I hope to see it really take hold because there's so much. much at stake here. And the current model has so many things that are broken with it. We're only
Starting point is 00:03:09 beginning to see the impacts that that has on our society. So I think that lots is at stake here. And Ocean Protocol provides an alternative, if not a solution to this problem. So with that, here's my conversation with Trent McConaughey. I'm here with fellow Canadian Trent McConaghy, who is a repeat guest on the show. In fact, I was looking just before Trent has actually been on the show four times. So he holds second place after Sean Jones, who of course was our regulatory affairs correspondent early in the day. And so welcome back for a fourth time on Epicenter, Trent. Great to be here. It's going to be fun as usual. Yeah. So actually, the last time you were on, you surprised me with Fred Ursham,
Starting point is 00:03:59 if you guys were at some conference. And you were supposed to come on to talk about IPDB. And And then little and behold, you're like, oh, I got Fred here. Let's talk about data and machine learning and data rights on the blockchain. And that was a really interesting conversation. And I was listening to a little bit of that in preparation for this. But for those who perhaps knew the show or haven't heard your previous episodes, remind our listeners of who you are and what you're doing and how you got here. Sure.
Starting point is 00:04:29 So grew up in rural Canada, trained as an electrical engineer, computer scientist. And then I did a couple AI companies towards computer chip design, as well as a PhD on that as well. In 2013, in 2010, I actually bought some Bitcoin, lost those private keys long ago. In 2013, I discovered, you know, really what blockchain meant, hanging out of Room 77 in Berlin, the eponymous room 77. And kind of, you know, that was the beginning of really, truly diving down the rabbit hole and started hacking away at ideas and stuff. and it led to a scribe then Big ChainDB and IPDB then Ocean now. So a scribe was basically a project about asking the question, how do you collect digital art?
Starting point is 00:05:13 And how do artists that create digital art get paid? And this was, you know, back in 2013, 2014, one of my co-founders was a professional curator, had worked in the Louvre, all these things. So we did this and, you know, we released beta in 2014 and worked with a lot of leading digital artists and went live in 2015 in full production. And, you know, it was really fun, really enjoyed it. Great cause, all that. But scale was an issue.
Starting point is 00:05:39 We had built on top of Bitcoin blockchain. Ethereum didn't even exist at first, right? And, you know, it was clear that the scaling solutions weren't going to be coming anytime soon. So we said, okay, we don't need full smart contract platform, etc. Anyway, so let's instead take an existing distributed database and decentralize it. So at first we wrapped RethinkDB. And then that pivoted to Mongo, and we basically put BFT algorithm around it, tendermint, actually, by the end.
Starting point is 00:06:07 So that was Big ChainDB, and that was in the sort of 2015, 2016, 2017 era. And that also, you know, was a pretty useful tool. It was very appealing, especially to enterprises, looking to dip their toes in blockchain, who kind of understood databases and so on. We worked with a lot of enterprises in that era, and it was pretty neat. At the same time, too, we saw that we really wanted to have. sort of this public blockchain database utility, if you will. So because Big CheneDB software was set up in a federated way,
Starting point is 00:06:41 now we call that a POA type approach, then we needed to do a lot of governance work, sort of lining up many people to be nodes, et cetera, et cetera, et cetera. And around that, we created a nonprofit organization, Germany-based one called IPDB, Interplanetary Database. And so that was basically, the nonprofit arm of Big ChainDB. This was all tokenless, by the way, as well. We didn't see the need for a token. There really wasn't. So we didn't have one. And so basically that went along,
Starting point is 00:07:11 and it's still around to this day as well. Finally, in 2017, we saw late, well, actually even 2016, but especially 2017, we were starting to do projects and think about how do you reconcile AI with blockchain. And at first just kind of exploring, but then people were coming to us saying, hey, you know, like Toyota said, we're doing autonomous driving. We really need to have way more data than Toyota itself has. So we think that a decentralized data exchange could be really, really useful. And, you know, what you wrote about decentralized exchanges, Trent, this is great. Let's, you know, build a prototype. So we built that and shipped that in spring of 2017. And just in the last show, Sebastian, that's probably when I talked about that. We had just announced it
Starting point is 00:07:53 that same week. In fact, that's right. I think when I came on with Fred, I had just come off the stage of announcing it in consensus. Exactly. Yeah, I remember that. So, so basically from then, though, and also in that episode, we had talked about data exchanges. And within a month or two after that, you know, the word ocean, we came up with the word ocean for this project that had been brewing inside, inside the organization. And it was really addressing, we basically, when we were thinking about AI and blockchain and all the different problems that you could solve, all roads led to data exchange. You know, there's problems related to social media, where if you are a, uh, a, Facebook or Google, you basically have all this data of all the people that's kind of signed it over
Starting point is 00:08:33 giving you permission to do whatever you want. And it's really, really harmful to society in many ways. You could go on and on or just watch the social dilemma or whatever. But there's a lot of other issues too, right? If you're an AI startup and you're trying to build an AI model, you're going to be star for data. You just don't know where to get the data from. And the thing is AI loves data with modern machine learning based AI, deep learning and all this sort of thing. To get more accuracy, you need more data. And you know, you want to go from, you know, 30% error to 10% to 1%, to 0.1% to 0.01%, depending on the application. And for that, you need, you know, 10x more data, 10x more data, 10x more data, 10x more data. And AI people don't have the tools for that.
Starting point is 00:09:12 You know, how do they get more data? And there's sort of been the state economy, but it's just like there's been a traditional banking economy that's all shadowed, right? You know, leading to the 2008 financial crisis and many financial crises before that. Well, we've got kind of the same thing in the world of data, right? There's a shadow data economy. And we have a data crisis now. That has been sort of brewing over the last 10 years, 20 years. And we haven't really noticed it because it's sort of been like a frog boiling in a pot of water.
Starting point is 00:09:41 At first it wasn't boiling, but bit by bit. Now it's clearly boiling. And people are asking what to do. And just as crypto said, okay, there's this shadow banking economy. Let's use crypto to try to create an open money economy. And we have that not just with Bitcoin as a store of value. value in Ethereum as a foundation, but also defy on top and all the things there. So what Ocean's goal is to say, hey, there's also a shadow data economy.
Starting point is 00:10:08 And let's use crypto tools to create an open data economy where there's shades of gray permissioning, privacy, and so on. Yeah, thanks for that recap. And all these things, you know, that you mentioned that you worked on, so first described, then big chain DB, then IPDB. To me, these things were all, these projects were all so visioning. like first putting art on the blockchain. I mean, this was in like 2014, I think the first time you came on. This was such an interesting concept and an interesting idea. I think it was really the one of the first times that, you know, people were confronted with the idea of a non-fundable good on the blockchain. And this was before Ethereum. This was before NFTs. And then in big chain DB, I mean, to me, like I latched onto that idea. And of course, IPDB afterwards, I thought that it was so powerful, you know, as a as a developer, like as a web developer, I love the idea of having a permissionless publicly available database available on the
Starting point is 00:11:03 blockchain on a peer-to-peer infrastructure, I guess, is a better way to put it. And same for IPSFS. I have this sort of similar attachment to that idea. Was that concept too simplistic? I think about some of the shortcuts that we sometimes take when trying to innovate with new technologies. And I wonder if the IPDB idea was. perhaps simplistic, like let's saying, let's put newspapers on the internet, but we'll just do that
Starting point is 00:11:32 in PDF format, right? Like, not considering what the technology affords and the types of innovations that the technology provides. Do you think that that was perhaps the case and that Ocean sort of builds like a new model that really leverages Web3 technologies to the fullest? So I think overall, first of all, simple is good in general. But it's, you know, you want to make things simple, but no simpler, right? to paraphrase Einstein. And it can be hard to arrive at simple. You know, Mark Twain has this quote that I just love. He said, I'm sorry, I had to write you a long letter because I had no time to write a short one. And I think this is really true. So, you know, simplicity is hard. Simple as hard. I think in the
Starting point is 00:12:17 case of Big ChainDB, it was a very simple conceptual idea, leveraging, you know, all this great infrastructure developed for the distributed systems world. Like how does Facebook actually scale, right? So leveraging those sorts of technologies. That part was sound. I think the challenge that Big Cheney B and IPDB had was the sheer underestimation of how hard it is to roll out something that is permissioned. The technology is easier, but the politics and everything else around it is much, much harder. And I've seen team after team after team say, hey, let's go for something that's permissioned, thinking that they can use that as a stepping stone for permissionless.
Starting point is 00:12:55 and it's just so much effort. We even did that for the first parts of ocean and stuff, right? We actually had spent a lot of effort doing the politics, if you will, talking to a lot of teams and organizations to get them to be nodes for IPDB and so on. I remember those conversations, yeah. I mean, Stratham was part of that initial group of people, and we were speaking about exactly this, you know, how do we set this up? And, like, you know, governance.
Starting point is 00:13:20 And that was the challenging part. It's like, not the technology. It's human interaction and governance. this is the most complicated. Exactly. So, you know, the governance as well as how does this relate to the law? Because once you have a POA or a federated network, then you have to, you know, it has typically much stronger ties to identity. And then these organizations that are running the nodes have to, they have higher degrees of liability. And then if you start to put together basically an agreement for sort of a node operator agreement of, you know, what rights and
Starting point is 00:13:52 responsibilities they have, the responsibilities and the liabilities start to away the rights quite a lot, actually, unless you're very careful, right? And if they start to get too much reward, on the other hand, from just running a node, then it has the risk of becoming a cabal as well. So it's really a tough thread to solve. And frankly, it wasn't really, like, we always had the plan with Big GDB to be permissionless anyway. It was just a stepping stone. So we saw that, okay, we could develop this more or we could go towards where we saw there was more, you know, more opportunity, in a sense, with Ocean and keep maintaining BGCNDB. It's not like we just shut down one and went for the other. We actually evolved from one or to the other, basically. Now, you know, Ocean is
Starting point is 00:14:35 on a permissionless and even the smart contracts themselves are permissionless by virtue of not having any built-in upgradeability. And it's basically, it's radically simpler from governance perspective. There is still governance, right? But, you know, on the substrate side, you have to do a hard fork. And one level above, you have to convince your community to switch over from one thing to the next. And I view that as a really good thing, right? It's a high friction for changing decisions that should take high friction. There are other cases where you want to have low friction, even at the substrate level.
Starting point is 00:15:08 And I'm sympathetic of all of those approaches as well for on-chain upgrades. Great. But for oceans needs, you know, fully permissionless at the substreet level and permissionless at the smart contract level is what makes sense. And that's where we're right now. Cool. Let's come back to this idea of the shadow data economy. Can you describe in your view what that means and what that looks like? And for people who perhaps are, I'm sure our listeners are all very aware of, you know,
Starting point is 00:15:33 the issues that exist with regards to people's data. And, you know, you talked about the social dynamic. I watched that movie just this week. Describe what the problem is here and what lurks in the shadows, if you will. The big problem overall is that. our personal sovereignty is at risk. And by that, I mean our ability to take action and make decisions without fear of basically oppression or, you know, negative things happening to us. And that sounds very kind of broad and vague and it kind of is, but then you can kind of drill in to what this means.
Starting point is 00:16:13 There's a quote from the World Bank two or three years ago that says, you know, the digital economy is the data economy. And that reflects this idea that as, you know, with every passing year, the world is becoming more and more digital. And what's powering the digital world is, you know, data itself. And so there's sort of this data flowing everywhere. But we haven't really given it its due in terms of this super important thing that we have to keep tabs on. And because of that, you know, the average citizen doesn't want to think about it. You know, they just want to share photos and stuff. And I get that. That makes tons of sense. But the thing is, there's businesses that have emerged that take advantage of this. It's an arbitrage. So they understand that citizens don't know about this or citizens kind of have given up on, even if they do know about it, they can't do anything about it. So these companies say, okay, well, let's take, you know, group these people into a thousand or a million or a billion people and then mine that data and then use that to sell stuff back
Starting point is 00:17:08 at them or basically change their decision-making behavior somehow, sometime, right? And it's at the level of the company. It's also at the level of the nation, right? And there's interplays as well. You know, there's companies out there and individuals and organizations that try to sway election results, right, by leveraging data in various ways. So there, you know, there have been flows of data for a long time going back to, you know, even before the early days of the PC, you know, from the birth of the computer. But it hasn't mattered for a long time because we simply haven't had the scale. And even in the internet, the early days, it was so small.
Starting point is 00:17:39 It didn't matter. People hadn't figured this out yet. But, you know, starting in the sort of early 2000s, that's when sort of AI. people realize that data was really, really important. And then, you know, Google themselves published this paper in about 2007 called the unreasonable effectiveness of data, realizing that, okay, if they can actually get more data, then they can have more accurate AI models, which basically turns into money for them, right? And the problem is that they are incentive misaligned between the people and Google. There's an incentive misalignment between the people and Facebook. Why? Because of ads,
Starting point is 00:18:12 right? In their case, they're trying to sell more ads, which basically means, you know, learning as much as you of what you can so you can be as, so that the ads can be as targeted as possible. And that, of course, so they're trying to gather as much data as they can. This data, of course, is also flowing into Prism, et cetera. You know, after the Snowden revelations, it's not like Prism went away. The government doubled down and there's a 10x or 100 X there. So our decisions are getting shaped more and more and more by companies with a profit motive that is against their interests. And it's very, very subtle, right? Twitter does it, Facebook does it, Google does it, all this, Instagram, all these.
Starting point is 00:18:50 It's unfortunate this way and it affects our decision-making ability. But like I said, it doesn't stop there, right? It's not just, you know, okay, you might see these things there and might change your thought, but it actually affects the outcomes of elections, which then leads to, you know, presidents being hired who then lead to basically ignoring pandemics, right? And you have a quarter of a million, you know, you have more people killed in the the pandemic in USA than the Vietnam War. And one of the big levers that affected this is because of this interplay between data and profit motives and this misalignment of incentives.
Starting point is 00:19:26 So that's kind of like the sort of the problem we're trying to solve from the perspective of a misalignment of incentives and so on around data. And you can view it as sort of a tear it down thing. But I also like to view things in sort of a build it up way. And I really prefer that. And this is where I see, okay, what's, the powerful way for people to think about this. And it really comes down to not your keys, not your data, your keys, your data. So riffing on Andreas Antonopoulos on the Bitcoin quote. And this basically means sovereignty over your personal data, right? So you should be able to control your data. You should be able to choose who sees what, when. You know, this also has
Starting point is 00:20:04 the side effect of controlling privacy. Privacy really is just about the shape of information flows, right? And that comes down to a question of access control. So if you do that at an individual level, then you can also group it into higher and higher level groups, families, small companies, larger companies, enterprises, cities, and even nation states, right? And they're all grappling with this. So if you can give them the tools to control and shape the flow of information about them and about the things around them, then it can lead us towards where there is not incentive misalignment and, in fact, there's incentive alignment where, you know, if they share more in a healthy way and so on, then they can actually get rewarded. And that's really the sort of build it up, of ocean is about, you know, tools for control of your data at a personal level all the way up to a national level in a way that helps to serve data sovereignty. I think that incentive misalignment is so key. And I mean, I think a good way to sum it up, right, if you kind of go up, you go up the stack of this problem and it's kind of fresh in my memory, but like I think the social dilemma documentary that I watched earlier this week kind of
Starting point is 00:21:12 sums it up pretty well, or at least it makes a good attempt at explaining it in layman's terms. Social networks, platforms that control large amounts of data are incentivized to use, well, the model that was created around that data is to sell advertising. The advertising is sold to companies that want access to people's information in order to target those customers more effectively. And at the same time, the companies that hold the data are incentivized to keep their users on the longest as possible, because that's how they might. maximize profit on the advertising side of that marketplace. The way that they achieve that goal of keeping the people on the platform is the longest, well, that goal was somewhat discovered
Starting point is 00:21:55 by machine learning algorithms that figured out that the best way to keep people on the longest is to show them outrageous shit that sends them into, you know, that essentially divides people. That has an effect on, you know, we talk about sovereignty, right? You talk about people's Sovereignty. Well, I mean, that exists at a conceptual level. The risks there at a conceptual level is sovereignty also in a general sense, as in the sovereignty of societies and countries, etc. Because people who are divided, obviously, are more vulnerable and at risk. And I think it's a huge problem that if you get really into the weeds of it, it's like, okay, well, incentive alignments are broken here. Well, you know, is it platforms or is it? is it simply the model? Is it capitalism that's broken in this particular model? It's kind of like a, it's sort of a chicken under the egg problem here, but I wonder if as long as there is sort of unfettered capitalism and unregulated capitalism, if this will ever change and if there will be better models that emerge. Of course, like Ocean Protocol is one antidote to that. Yeah. I have a couple
Starting point is 00:23:04 answers to that. So, I mean, one thing that capitalism recognizes that is super unhealthy for any market is monopolies, right? So that's why the Sherman Aditrust law was brought out. And then after a couple decades, they finally gave a teeth. And that's what broke up all the different pieces of Rockefeller's empire, for example, as well as led to the breakup of AT&T and so on. We have companies that are now far larger and more dominant than basically, you know, the U.S. steals of the world back in the day. Yet the the regulators have forgotten that they can actually apply this tool to address this. And that's, that's an issue. So this is basically not a, that's not a problem with capitalism per se. It's a, it's a problem with the enforcement of antitrust. So that's one piece of the tool. I want to mention
Starting point is 00:23:49 the other one, which is to me, the happier answer. And it's co-ops. So I grew up in rural Canada where the local grocery store was a co-op. It's literally called the co-op. We had those too. They're all closed now, though. Oh, wow. Yeah, yeah. Well, and the local bank was called the credit union, and that was the main bank where I grew up. And basically, there was kind of co-ops for everything. And these co-ups allowed, you know, a thousand, 10,000 people in the area to each collectively own that grocery store, for example, and then get the dividends from the grocery store, and it was acting in the best interest of the people and making a bit of profit on the side. That's a very healthy thing. There was another thing in Saskatchewan. In fact, at the time, I think
Starting point is 00:24:32 was one of the largest co-ups in the world, if not the largest, called the Saskatchewan Meat Pool. It had, I believe, on the order of 100,000 farmers as members. And what it was was collective bargaining for farmers. So you sell your wheat and your barley to the Saskatchewan Meat Pool. Then it goes as a large entity. It ships things by train to the ports. It ships things by ship to Asia and everywhere else through the ports. It does the marketing, et cetera, et cetera, et cetera.
Starting point is 00:24:56 So it was a win-win, right? The farmers basically got some dividends from the Saskatchewan Wheatpool. and they had a market for the grain. They had distribution and all of this. And it served everyone. So this is actually a capitalist notion, but it's capitalists saying cooperation is a good thing, right? And what I really love about the blockchain space is that it makes things like this much easier to implement as well as experiment with, right? To have, you know, collective organization of people towards the betterment of that group of people, right, in the form of DAOs and otherwise, right?
Starting point is 00:25:28 That's my kind of two answers. think we need to rethink capitalism. There's been a lot of calls for that. And even if we did need to rethink, I think it'd be very, very hard the way it is. But instead, we can say, okay, on one side, make sure that antitrust is respected and, you know, the privacy laws of Europe, et cetera. And in the other hand, double down in co-ops, right? And I think they could actually become much more common in this new age, given how much easier it is to form them in the world of blockchain, you know, like we're seeing, for example, the first venture capital firms that are Dow's that are actually truly working, right? I'm a member of Medicaretale Ventures,
Starting point is 00:26:05 and it's an amazing organism to observe. So let's talk about Ocean Protocol, it's switch gears a bit, and so describe what is Ocean Protocol and what does it try to achieve? Yeah, so I guess the goals are, like mentioned, address some of the issues around data in society, misalignment of incentives, as well as give tools for empowerment of people at the level of individuals, families, all the way up to the levels of cities and nations, right? Okay. And perhaps in the context of what we were talking about earlier in the model, how does Ocean sort of flip the model that we were describing earlier?
Starting point is 00:26:44 At the heart of it, it's, you know, your key is your data, right? And then you can choose how to share that where, but also you can turn this data into an asset, if you like, right? because, you know, the way we view it, data is IP, just like, you know, if you write a song and record it, then that is a piece of IP, then you can monetize via music publishers, etc. or via Spotify, whatever you want, right? Same thing with books or podcasts or whatever. So data falls into that same category. And of course, data is useful in the sense of, you know, people who build AI models need data in order to make those models accurate. They need more data.
Starting point is 00:27:24 and once those models are accurate enough, then they can monetize it in various ways. You know, more, you know, safer self-driving cars, more efficient traffic lights. A lot of kind of almost mundane things, but this stuff matters, right? So Ocean basically at the heart then is a set of tools to make this easier to do. At the heart, it's access control. So basically it makes it easy for people to establish a data set or data service as an asset. and then to share that asset or sell it, transfer it, whatever, where there's permissioning around it. And the way that we do that, as of the most recent release of Ocean, B3, the way we do it is every
Starting point is 00:28:05 single date of service is its own E or C20 data token. And by using that, we leverage the full infrastructure of Ethereum. There's a lot of really cool implications. And it also serves as, you know, at the heart, it's access control, right? So it's sort of like Unisox. You know, with Unisox, You can buy 0.1 unisox. You can buy 150.3 unisox. But if you have 1.0 unisox and send those to the uniswop team, then they will mail you back in the physical mail your pair of physical socks. And so you can redeem those unisox with that pair. Same thing with data tokens.
Starting point is 00:28:41 You can send 1.0 data tokens to the publisher of that data asset. And in return, they will give you access to that data asset. You know, you can still speculate. You can buy 0.1 data tokens. You can buy 100 of them, whatever you want. But in terms of redeeming getting access control, it's that magic number of 1.0. So that's maybe a good summary. Ocean is a set of tools to enable the Web 3 data economy, which is all about open while reconciling privacy.
Starting point is 00:29:07 You talked about that every data set is an ERC20 token. What is a data set? Help us to understand what that means specifically. So it's every data service. So data services can be data sets. So I'll start with a data set, though. It could be simply a PDF. It could be a spreadsheet or the machine learning version, which is a CSV file.
Starting point is 00:29:29 It could be a piece of music. It could be 10 gigabytes worth of files behind a directory. So in Ocean, we actually have, it's quite general. So you basically are simply selling a, well, we have a few different ways of defining a data service and it's flexible and it's going to expand over time. To start with there or two, one of them is a static, URL where it could be basically say a CSV file that you have sitting on Google Drive. And you then sell access to that CSV file as defined by that URL sitting on Google Drive. And of course,
Starting point is 00:30:04 it can be a decentralized network as well. So that's one example. The second example of the data service that we support is, this is in the privacy preserving angle, is compute two data. So rather than someone getting a URL and then downloading that CSV or whatever, instead, It's saying I'm a publisher, maybe I'm a big enterprise. I'm going to sell my data, but people aren't going to download it. Rather, they can go and run an algorithm right next to my data and maybe just compute a simple average from a particular column or a median, or maybe something fancy, building a linear model or maybe a fancy deep learning model, whatever, right?
Starting point is 00:30:39 So those are the two services right now, types of data services that Ocean supports. And with time, we will support more and more and more. You know, streaming data is coming down the pipe as one of the most important ones. But we see that this thing can get, there can be dozens or even hundreds of these things. So that's what we mean. And Ocean itself is sufficiently general. These data tokens, they're E or C20 with one extra field called blob, where it's just a string, basically a bunch of information inside.
Starting point is 00:31:07 And that basically helps to support the various types of service provisions. So you mentioned something that I'd like to touch on. You said that the data can be stored on Google Drive. When one thinks of the decentralized data marketplace that we'd wish to have, I don't think that storing data on traditional cloud platforms is the first thing that comes to mind. I mean, in terms of availability, censorship resistance, privacy, etc., help us understand the choice there of storing data on these platforms and what kinds of things as Ocean do to prevent some of the excesses
Starting point is 00:31:45 that we've been talking about since a while ago? Well, I mean, at the end of the day, you have to be pragmatic about understanding, you know, what tools people are using right now and then providing a bridge to them to sort of, you know, walk them bit by bit over this bridge. It's not like, you know, you snap your fingers and suddenly you're in a full permissionless, decentralized world with the whole planet behind you, right? I should mention, yes, you mentioned Ocean Market. So overall, Ocean is these tools, which is smart contracts as well as libraries,
Starting point is 00:32:11 Python and JavaScript libraries, and then React hooks on top. And then on top of that, we've shipped something that's a consumer-facing web app called Ocean Market. And it is a place where people can go, market.oceanprocical.com. It's a place where people can go to publish data assets and to consume them, like once you buy them. And, of course, you know, swap back and forth trade on them and you can even stake on them. And I can get into that a bit later. So then towards your question, if you think, okay, how do we make Ocean Market easy to use, right, for people? It's Web3 Native.
Starting point is 00:32:43 you know, you sign in basically by connecting your wallet. That's all that. But for people that aren't running their own, you know, hosted service with datasets, how can they actually share their data? You know, maybe they don't know about file coin or Ethereum Swarm or anything just yet or don't know how to use it, but they do know that they have a bunch of data on Google Drive or Dropbox or something, right? So you make it easy for them to publish their data. And you might say, well, that's not decentralized. The thing is, this is at the very leaf node, right? So it's that one. single person that has that data. So the connecting platform is decentralized on the, you know, permissionless Ethereum substrate as well as the smart contracts on top. But that one final
Starting point is 00:33:22 person at the edge node is centralized by nature of it being supplied by that one person, right? So even if you say, okay, we're going to oricalize this and stuff, it doesn't really help because it's still one person supplying the data or one entity, right? And that's okay, right? If they are supplying the data and people don't want it, they won't buy it. If they're supplying the data and they do a bad job where they have bad availability, people will stop buying. So they're incentivized to do a good job there. And I do see, though, that, of course, right? Like right now, people could store something on Google Drive and Dropbox and a few other, you know, decentralized services, wrap it all up, make those all pins with IPFS and then just
Starting point is 00:33:58 give the IPFS URL, right? So, and that's a nice bridge. IPFs service is a great bridge because you can have storage on centralized and decentralized storage mediums while at the same time providing that single URL. So it's all about a bridge to get people across towards eventually this sort of public utility network infrastructure where everything is permissionless, including, you know, the cloud storage, et cetera, for sort of all of humanity. But we have to get there one step at a time. Great. Yeah. I mean, I think that's a good approach to allowing people on board with more simplicity is to open it up to all the different types of storage providers that that exist. And if within that, we also have decentralized storage providers like IPFS or
Starting point is 00:34:40 SIA or some of these other ones, and, you know, that, at least provides alternatives so that we can have that censorship resistance option if we wish to choose it. Exactly. And just to add there too, like, Ocean doesn't care because it's just with the URL, right? So people can provide a URL that can point to a service on CA or an IPFS, which is wrapping some of the storage service or a Google Drive or Dropbox or whatever. So Ocean doesn't care. So let's talk about the different stakeholders in Ocean. So obviously there's like the owners of the data, you know, those who provide that data point that you URL you mentioned. Then there's also those who verify the data. There are the curators. Can you
Starting point is 00:35:18 describe all of these different participants, these different stakeholders, and how do their incentives align you to create, you know, this, this platform? Absolutely. So at the heart of it, you know, the heart of the value creation is a publisher publishes a dataset and someone else comes along and buys a data consumer. So you want to connect those data publishers with the data consumers. That's the heart. And the data consumer, when they consume it, they're adding value to their business or otherwise, right? So you want to make sure that that loop is a solid connection. So ocean market and the other markets that can be created. And then ideally that business is providing also value to the publisher, right? It's sort of this closed loop. Exactly. Well, this is the thing,
Starting point is 00:36:00 right? If data consumer doesn't, you know, finds that the data that they buy from the publisher is garbage, then they'll just stop buying from the publisher. So the publisher themselves is incentivized to create quality data, right? The heart of capitalism, frankly, right? In that way, and that's actually a good thing. You know, it aligns towards value creation that way. In between is at the lower level, of course, the main actor is simply the connectivity of Ocean
Starting point is 00:36:25 at the, you know, substrate level, Ethereum Mainnet right now, the small contract level, and then the marketplace is on top. And, you know, we've shipped a first marketplace ourselves as Ocean Protocol Foundation, which is, it's called Ocean Market, I mentioned it. And that's basically, you know, this, you can view it as a multi-sided platform where the two most critical
Starting point is 00:36:45 participants are the publishers and the consumers. However, you want a market to form, you want price discovery, all of this, right? And so for price discovery, you need the data services themselves to be sort of assets, you know, first class assets. And within it, within Ethereum, of course, you have kind of two choices. You can make them non-fungible or fungible. But if you think about data, right, if I'm a publisher publishing a data set, it's not just like one person is going to consume. You're going to have 10, 100,000, right? So it's clearly a more fungible thing. So basically we have these data tokens that are then there as assets, and it's those that flow from the publisher to the consumer. Now, you can say, okay, well,
Starting point is 00:37:28 how does the, let's say that the consumer, data consumer is looking at data sets, but they want to have a good feel for like what's a good data set versus not, right? This is where curation comes in. So, so how do you go about, you know, leveraging crypto infrastructure for good curation? Also, how do you go about leveraging crypto infrastructure for price discovery, right? And this is a question we would get again and again and again in ocean from, you know, from the very earliest days, 2016, 2017, to until basically shipping V3, like how do you set the price. And there's lots of, lots of theories about how to do this, right? You know, you could have auctions. you can have royalties, you can have order books, you can have automated market makers,
Starting point is 00:38:10 et cetera, et cetera, right? So what do you do? And we decided with all this that sort of taking a page from the defy playbook, let's put in an automated market maker. So when someone goes to publish their data set, it becomes a data asset, its own token. And then in the same, right after that, they have the option to publish a pool. And so we've got balancer technology into the hood. So they publish a pool that is the data token and ocean token. And they put in initial liquidity of the ocean token. And now you actually have an authentic price signal between basically as a ratio of the number of data tokens to the number of ocean tokens in there.
Starting point is 00:38:51 Other people can come in there and stake. And also people can go in and swap back and forth ocean tokens for data tokens. People can stake and instake to the pool. And the cool thing about AMM's automated market maker. marketers, adding liquidity, being a liquidity provider, is the same thing as staking, right? And this is very different than sort of the previous idea. Normally staking sort of locks up and slows down the velocity of tokens. But in AMMs, this sort of magical, beautiful thing happens where by providing liquidity,
Starting point is 00:39:20 it's staking that actually increases the velocity and the usefulness. So this is actually what ocean market has under the hood. It's these balancer pools. And that's providing this. It's basically helping to form this market around this specific data. token, right? In addition, if you think about it, you know, you go to Balancer website or the Uniswap website, and by default, if you want to look at all the pools, it will sort by which pools have the highest liquidity. That's a really good proxy for the quality of a given token. It's not perfect, but for sure,
Starting point is 00:39:54 you know, all the garbage spam stuff is at the very, very bottom with like $2 liquidity, so it's already gone. And it's a pretty good first cut, right? So that's actually also what Ocean has itself, if you think about it, by default, you go to Ocean Market, and it actually shows you a sort of list based on the amount of liquidity in there. So that's a signal for the quality of a dataset. It's really hard to arrive at a perfect signal for a quality of data set. What you need to do is provide a bunch of statistics, authentic signals that people can use to assess whether a data set is useful or not, useful to consume, but also useful to invest in,
Starting point is 00:40:27 right? And this is a key thing. So I'm going to mention there's sort of, in terms of the stakeholders, system. I've talked about the foundational ones, which is the publisher and the consumer data consumer. There is the liquidity providers, which are the stakers, which are the curators. It's the same thing in Ocean. And implicitly, they're also doing some soft speculation because right now there's a 70-30 wait. So anytime someone stakes in one of these pools, it's 70% Ocean, 30% data token. And that's simply because, you know, to avoid price fluctuations and to align incentives a bit better.
Starting point is 00:41:01 But besides that, people can just purely speculate if they want, right? They can invest in a data token and hold. If they think that the person who has published this data token is a high quality person, if they know them. If they think that, you know, maybe they try out the dataset, they see it as high quality, great. They can buy it and they can hold it. So basically, this is also a key person in the overall ecosystem. And so at the end, you have the publishers, the consumers, the LPs slash curators slash stakers and the speculators. And this is sort of the heart of it. And then you've got this across many, many, many different data tokens slash pools. And this is basically how the markets are forming. And what's happened, you know, we released Ocean Market over three weeks ago. And it looks like a microcosm of crypto itself, right? You've got, you know, you've got people that are doing the equivalent of an ICO, which is an
Starting point is 00:41:47 IDO, initial date offering. They're promoting on a Twitter. They're actually announcing the launch of it, you know, 24 hours before or a week before or whatever now. And then people pile in when the thing happens to invest and speculate and so on. And at the same time, and then you have data shillers and you have rug pulls and you have fraud and all of these things that you have in the broader crypto ecosystem. But you also then, you know, in this, all this messiness, you have a market forming. This is how markets are born. And so, you know, 2017, yes, there was an ICO bubble, but this is actually how the broader crypto market was born. Before that, it's pretty quiet, right? There was Bitcoin, there's Ethereum.
Starting point is 00:42:25 There is maybe 10 other coins, but 2017, after that happened, you know, now we have these indices on coin market cap and coin gecko that point to something quite healthy, you know, 100 plus coins that are really healthy. So this is what we're seeing in the world of data now with these initial data offerings, IDOs, and speculation and all this. You know, data is truly becoming an asset for the first time ever, and it's fully open, just like defy, right? That's really interesting.
Starting point is 00:42:51 The data tokens represent. specific data sets. So there's a data token for each data set, correct? For a data service, which is typically one dataset, but it could be a thousand, right? But it's whatever granularity the publisher decides. Yeah. Okay. I see. And so that's how you arrive. So there's a liquidity pool essentially, like an AMM for each of these data services. That's how you arrive at price discovery for that specific data service and the data set or data sets that exist within it. Exactly. Exactly. And then, Yeah, and I mean, it's, that's the primary market, right? So when the publisher publishes,
Starting point is 00:43:25 they deploy this pool, that's the primary market. But then people, if they want, they can set up secondary markets too. So we've seen people creating uniswap pool side by side and selling, you know, OTC, data tokens and stuff, right? Which is great. So as time goes on, you know, there's going to be some like large cap data tokens, right? Right now, you know, it's all relatively small. Yeah, maybe traded on exchanges. Yeah, exactly, right? So we're in early days. But already, you know, in the last three weeks, we're at about 2 million ocean staked, which according to the prices right now, it's about 1 million euros worth of a data asset stake. So, you know, it's early days, but that's quite exciting. And we're ocean market usage keeps growing, growing, growing. So I think the most
Starting point is 00:44:04 recent numbers are about 10,000 weekly active users. So I think that's the number. Yeah, things are just growing, growing, growing. And right now our challenges to adjust simply the scale issues, as well as make sure that the environment is as safe as possible to mitigate the effects of rug bulls and fraud and so on. Yeah. Well, we'll get back to the scale in one second. So this ocean market that you mentioned is the first and I presume the only market for the moment. Do you anticipate other markets forming and what kinds of specificities could those have? What are the types of things that would cause that to happen? Yeah, absolutely. So we want to see a lot of them, right? you can't have an economy, a data economy, with just one marketplace, right?
Starting point is 00:44:48 You need to have tens, hundreds, thousands of these things, right? And so Ocean Market Code itself is fully open, right? Apache 2 license, so basically very unrestrictive, very open to use. And so we do encourage that. And we even have, we point to people of how they can do that, you know, forks of ocean market or create their own thing. And under the hood, of course, everything is on an Ethereum main net, including the metadata, all of that, right?
Starting point is 00:45:13 So it's very easy. Anyone spinning up another marketplace can basically get, have all the data assets that are on ocean market on their marketplace. So how is this happening? We view it as sort of a top down and a bottom up thing. So top down, we're working with a few different organizations that are interested in building their own things. So for the better part of a year, we've been working with Daimler and around their own automotive data exchange marketplace. Still have your old enterprise clients, I see. Yeah. I mean, you know, these are long term relationships. and whatnot. And it's kind of exciting, right? Things have changed a lot in the last few years where enterprises are much more open and comfortable about this. So we're pretty happy about that. And there's some other, basically, government organizations and whatnot that we're working with, too, that we haven't publicly announced yet, but some of the information is already out there, I guess. In addition to that, there's sort of startups that have publicly announced that they want to have
Starting point is 00:46:08 ocean power data markets, such as molecule, boson protocol, dex freight, and otherwise. And each of these has their own sort of specific vertical, right? So Dexrade, for example, is in the logistics area where it's about data that, for example, their customers, they have about 10,000 trucking companies working with them. And each trucking company has one or a few trucks. And, you know, what is the specific location of each of these trucks over time? What are the goods inside each of these trucks? And right now, all that data is private. But that information is super, super useful in two ways.
Starting point is 00:46:41 One of them is to Wall Street, right? So rather than having to look at satellite images, you can get much more fine-grained information. But secondly, to optimizing the scheduling of the trucking themselves, right? So you can do basically better, more optimized logistics over time. And right now, the average truck, according to stats, I think it's either one-third or two-thirds empty because just the optimization is so poor. So once it starts to be a market formed around the data on this, then you can actually optimize against it better, better, better.
Starting point is 00:47:07 So those are a few there. But over time, we see that there's going to be marketplaces that's better. along a few dimensions. One is verticals, like I've mentioned, right? Automotive like Daimler or logistics like DexRite. Other dimensions are maybe you'll have a marketplace that's totally tuned to AI, right? Training AI models. And there can be variants of that, like human protocol, who we're working with as well. You know, they do sort of the H-CAPTCHA, which is sort of a variant of recapture that is basically much more incentive aligned with the users, basically. So there's opportunities with the H-captures of the world, the human protocols of the world, and other
Starting point is 00:47:42 AI plays too out there where it's just sort of a win-win. And other things, too, like privacy-first marketplaces or specific geographic regions. And remember, with a marketplace, you have to have a terms of service. It's sort of a last-mile-facing thing. So certain countries might have very specific regulations that you want to serve. So maybe you focus on just that country and maybe even geo-block everyone else as that marketplace. And that's okay, right? You know, we can't control things at the level of the substrate, you know, permissionless Ethereum and all that, or the smart contract. But that last mile, of course, things can be controlled by the marketplace operator. And that will actually help to serve specific niches. So there's quite a lot of
Starting point is 00:48:21 variety there. Oh, yeah, I guess one more important one is besides, you know, the publishing and consuming the data assets, how do you price, right? So by default, we have these balancer pools. But you can also, you know, make it maybe some people want to have uniswap pools or or Bankor or otherwise. Maybe people want to have order book based markets or Dutch auctions. Maybe you want to have a marketplace that does a better job of initial data offerings, IDOs, right? Just like if you think about ICOs, it's all these variants that people had, you know, and in that case, even too, right? Dutch auctions, et cetera, et cetera, et cetera. We can see the same thing for datasets. So there's a huge variety and we encourage people to play
Starting point is 00:48:58 with all of us. You know, the the CAPHA thing is, I think is one of the things that people don't realize to which point their incentives are misaligned, where there is incentive misalignment there. And actually, so you mentioned H. Captcha, and I've seen H. Captcha before, and I'm just on the website, and I had no idea that, you know, that sort of Brandon Ike was behind this
Starting point is 00:49:24 and that they're, you know, built on Ethereum, apparently, or, like, they leverage Ethereum. This is interesting. I'll have to look more into this. Yeah, they're probably, like, the most used dapp that no one's heard of, right? And the people behind it, yeah, it's an amazing team. I believe Brendan Eich isn't directly involved.
Starting point is 00:49:43 He's just an investor or something like that, but also helping to support it. But he's definitely, yeah, he's involved in some way, I guess, yeah. Yeah, I was confronted with this recently where, at least here in France, to, in order to use certain public services, you're obligated to fill in a Google recapture. And, you know, in French culture, and probably also in Germany would probably be like similar sentiment to this. if people knew that in order to access public services, they were obligated to enrich a GAFA, like what we call in French a GAFA, you know, the Google, Amazon, Facebook and Apple, etc. I think there would be some outrage, at least, at that idea. And maybe we can impact this because it's a nice example.
Starting point is 00:50:24 So, you know, when you fill in a recapture, there's usually two in a row that you do. The first one is to basically verify whether or not you're human. And the second one is basically Google hasn't yet classified whether, which, picture has a truck in it or not, which picture has a car in it or not, right? And you're basically providing labels for it, basically to help train the algorithms, because the algorithm needs to make this mapping of image to yes or no, there's a truck in it, right? And so you're supplying data for training. And that's, you know, hugely useful to Google for its other applications. So, you know, you think that, you know, you just need to get into the site, but Google itself.
Starting point is 00:51:00 I think 99% of people don't know this. Yeah. I think 90% people are totally unaware. And actually, I was unaware of this until recently. In fact, I was unaware that the previous capsules that we had before, which were just like fill in these letters or write this word, it was also operated by, well, most of them were operated by Google and was to train their book scanning algorithms. So, yeah, so to summarize then, right? So basically in doing that second step, you are giving value to Google by your human efforts, right? But what if instead of that value going to Google, that value was going to the person running the website and to the holders of a token? and even maybe back to you directly, right? And that's what human protocol is about, right? The website now can monetize, actually, from this.
Starting point is 00:51:45 They don't even need to serve ads anymore. They just monetize based on people filling out the, you know, proving they're not a bot, right? Which is pretty cool. And then also, though, you know, they're going towards having their token and stuff, and so there can be a nice alignment there for the people who believe in this. And finally, you know, maybe at some point also back to the person that is proving they're not a bot in the first place, right? So to me, it's a great use case. I don't know why, but this is kind of a tangent, but like, I have been just bombarded with
Starting point is 00:52:12 recapses recently. And I don't know if it's because my IP address is flagged or something, but I'll have to do 10 or 12 attempts before I can actually access a website. And it's, it's so painstaking. I think capsules. There's part of a reason for it. And that is basically the AI models have gotten better over the years, right? So they're running out of easy stuff. So they're basically getting you to do the hard stuff because the easy stuff has already been modeled, right? I think it might have to do with the fact that I'm filling in so many capsias that maybe they think I'm some kind of a capsia feeling bot or like someone in some far away country filling in capsules for pennies. That could be it too.
Starting point is 00:52:50 Yeah. So let's talk about privacy, which is so central to this entire topic. Well, I guess my first question is what types of compatibility or compliance does Ocean have with regulations like GDP? or the CCPA in California? Ocean serves these regulations very well. So in GDPR, for example, there's this idea that certain sensitive data can never leave European soil if it's generated on European soil, right, like medical data. Yet if I'm a medical researcher in, say, USA, the ideal is that I have data across,
Starting point is 00:53:31 you know, 10,000 hospitals across Germany and France and all of Europe as well as China and Australia and so on, right? So how do I get access to this data? With GDPR, it would be basically, well, the traditional way was where you basically try to make deals with hospital by hospital at a time and store it all in one big central database and then build a model from that. And actually, Google had something called Project Nightingale doing this and they had a huge pushback and rightly so, right, because like medical data is super sensitive. And actually there's other ones too, like other big organizations that were trying similar approaches. And that's really like, a big no-no. Fortunately, there are better ways to approach this. And so when I think about
Starting point is 00:54:12 privacy, the best way is it's not about like either, it's not a black and white of I see my data and no one else can because that's not very useful. It's more like structuring the flow of data, you know, who can see what, when, right? And that comes down to permission, right? Giving permissions to people, revoking permissions to people to see certain data services. So that's the heart of privacy. Now, going back to this example from health, what you can do is if you're trying to build an AI model, why not do something like Federated Learning, where in Federated Learning, you create an initial random AI model, neural network, whatever, and then it's just random at first. It's super stupid. But then it goes and it sort of, as this bot, it kind of walks to the first hospital, let's say a
Starting point is 00:54:55 hospital here in Berlin, Germany. And it updates itself based on the data in that hospital in Germany. So now instead of, you know, 50% error, like, you know, basically random, it's got 40% error, right? And then it goes to another hospital, let's say, in Paris, and it updates itself. And now it's got 30% error. And by the way, the whole model doesn't have to go to the hospital. It's just an update of the model. So you don't have an attack vector there. And then you go to another hospital in, say, L'Ions, France.
Starting point is 00:55:25 And you keep going, going, going from hospital to hospital to hospital, and the error keeps going down, down, down. So once you went across 10,000 different hospitals, you've got a very accurate model. But what's cool is the data inside each hospital never, ever left that hospital, right? You know, people have been developing these techniques for Federated Learning going back to 2015-2016, and Google started making it really popular and famous in sort of 2017 era. The thing is, Google's version of it is, yes, the data itself can stay at the leaf nodes at the edges. But guess who gets to play the middleman? Google, right?
Starting point is 00:55:59 So a problem once again, and you've got leakage there. So what if you could actually have a middleman that is not incentive-misaligned, that can still help to coordinate all of this learning of this model? And that's really where Ocean can help. So basically, you can do the training, weight updates, etc. at the last mile, at the edge, while the orchestration in between can be done using decentralized substrates. And Ocean can play a key role in this, because these data tokens are providing the access control of, you know, who can see what datasets when, right? And so that's kind of the dream. Right now, no one has built this
Starting point is 00:56:40 particular application of Ocean with federated learning, but, you know, there are some really great efforts around this in a fully open way. OpenMined is a project out of the UK led by Andrew Trask, an amazing person. And, you know, this is kind of where they're headed. And we hope to see an integration of OpenMined and Ocean at some point. And there are talks around that and stuff too. So I think that's a good example. And you can do this in a simpler way too. You don't even need to get fancy with building an AI model. It could be simply computing an average across 10,000 different hospitals, right?
Starting point is 00:57:12 And that's very, very useful. You know, if you're a multinational enterprise and the data can't leave any one of your offices, then you can compute an average across each. Or in Canada, right? Canada actually has this problem, too. They're trying to get health data from the different provinces. But each province has its own rules. And so if you take the intersection of the rules from all the different provinces of Canada, you end up with an empty set. It's zero. The rules collide. So a very nice solution to this is something like this: federated analytics for just averages, or federated learning to get a bit fancier. And that plays well with GDPR, basically, right? Because then the data never, ever leaves the soil.
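A minimal sketch of the federated-learning walk described above, in Python. Everything here is hypothetical: the toy model, the synthetic per-hospital data, and the function names. It only illustrates that the model update travels between sites while the raw records never leave them.

```python
# Toy federated-learning sketch: a model "walks" from hospital to hospital.
# Only weight updates leave each site; the raw records stay put.
import numpy as np

rng = np.random.default_rng(0)

# Each hospital holds its own private (features, labels) data locally;
# the data below is synthetic and stands in for real records.
hospitals = [
    (rng.normal(size=(200, 5)), rng.integers(0, 2, size=200))
    for _ in range(10)
]

def local_update(weights, X, y, lr=0.1):
    """Compute one logistic-regression gradient step on-site; only the
    weight delta is returned, never the underlying records."""
    preds = 1.0 / (1.0 + np.exp(-X @ weights))
    grad = X.T @ (preds - y) / len(y)
    return -lr * grad

weights = np.zeros(5)            # the initial model is "super stupid"
for X, y in hospitals:           # the model walks from hospital to hospital
    weights += local_update(weights, X, y)

print("final model weights:", weights)
```

In a setup like the one described here, a datatoken could additionally gate whether the orchestrator is allowed to request an update from a given hospital at all.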
Starting point is 00:57:49 So how does Ocean address permissioning of data as it's utilized and transformed? And, I don't know if this is the right way to think about this, but I imagine, you know, data being used to train a model. And then, you know, I kind of lose track of how that data exists, like in what form it exists. And in the context of GDPR or other regulations like it, do the owners of that data still have rights to the results of the computations that were performed on that data? You know, essentially, can one retrieve the data? Once it's baked in, does that even matter? And is this something we should be concerned about at scale, perhaps, more than just at an individual level? Yeah, so I think there are two pieces here to unpack. One of them is, you know, what about where there's sort of this pipeline of data being transformed, being transformed, being transformed. And the other is, you know, what are the rights that attach to that?
Starting point is 00:58:47 So, you know, in an AI compute pipeline, you might have, you know, some initial raw training data. And then it might get cleaned. So now you have cleaned training data. And then you might train your AI model. And then you'll have your trained AI model. And that's a piece of data as well. And then you might have some new data coming in that you want it to predict on. And it makes predictions.
Starting point is 00:59:08 And those predictions themselves are also data. So we've got data at various steps along this flow. Each one of those is its own data asset, right? It can be its own data token if you choose. You don't have to. It depends on what workflow you want. But Ocean doesn't care, right, whether it's the super raw training data at the very beginning or whether it's some predictions at the very
Starting point is 00:59:29 tail end or something in between, right? It really doesn't care. If you do want to have it at every step along the way, that actually probably helps towards provenance of the data, right? So with GDPR and all that, actually, one of the requirements is that you need to know where the data came from. So Ocean really helps that way, because then you can have a model that is trained in a way where you know where the raw data came from and you can kind of vouch for it, right? And that's a big problem, even pre-GDPR days, right? But this will actually help to address that. So now you actually have the provenance of each step along the way, from beginning to tail end: the provenance of the initial raw unclean data, the provenance of the clean training
Starting point is 01:00:14 data, the provenance of the trained model, and the provenance of the final predictions. And those predictions, you'll probably have many sets of those over time. So that's, you know, helpful for GDPR, and just as an asset in general, right, it helps to drive the value of the asset. Just like the Ascribe days, right? At Ascribe, you know, we were doing digital art, and the value of an artwork is only as good as its provenance, right? If I had a painting and I claimed it to be by Leonardo da Vinci, well, people would say, well, prove it, right? Like, show me the lineage of ownership, right? And if I was able to show that, then, you know, I could have a fortune in my hands. But if I can't show that, then it's probably worth zero,
Starting point is 01:00:53 although there can be some art experts that come along and try to verify it. And it's often kind of fuzzy. There's, you know, even a fraud market around just that, right? But the point is provenance really, really matters. And, you know, what Ascribe had done was establish provenance for digital art. And that helped to establish the value of the piece in a big way. And for data, it's also super helpful, right? Because we'll have, you know, a much better sort of culture around provenance of data and the models that are trained on it, and so on.
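One way to picture the provenance chain described here is each stage of the pipeline being published as its own asset that records which assets it was derived from. A hypothetical Python sketch, not Ocean's actual metadata format:

```python
# Hypothetical sketch of a compute pipeline where every stage is its own
# asset and records which assets it was derived from, giving a provenance
# chain from raw data through to predictions. Field names are made up.
from dataclasses import dataclass, field

@dataclass
class DataAsset:
    name: str
    derived_from: list = field(default_factory=list)   # provenance pointers

    def lineage(self):
        """Walk back through derived_from to list every ancestor asset."""
        ancestors = []
        for parent in self.derived_from:
            ancestors.append(parent.name)
            ancestors.extend(parent.lineage())
        return ancestors

raw   = DataAsset("raw training data")
clean = DataAsset("cleaned training data", derived_from=[raw])
model = DataAsset("trained model", derived_from=[clean])
preds = DataAsset("predictions", derived_from=[model])

print(preds.lineage())
# ['trained model', 'cleaned training data', 'raw training data']
```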
Starting point is 01:01:19 And that will actually make it a lot easier to comply with GDPR and so on. So that's the one part. The second part, and I'll be quick on this, is the rights. So basically, we view data as IP, and specifically as copyright. If you have put sweat of the brow into generating this data, then this is actually your copyrighted data, as an individual or as an organization. And so with that, you can do whatever you want with it. But the way that Ocean sets it up, in Ocean Market specifically, because, you know, the lower levels don't care, but Ocean Market has a terms of service that we thought through and even drew on from our
Starting point is 01:01:58 Ascribe days and so on, of course, which in turn draws on things like Second Life and otherwise. In those terms and conditions, it basically says, you are claiming that you have copyright, or at least the rights, to this data initially. And then when the next person buys it from you, they are getting rights to it. It's sort of like a license, right? In fact, it is a license. And then you have sublicense, sub-sublicense, and so on going along. But it could be where, you know, if you want, you can have, well, Ocean Market doesn't support this right now, but what we envision is that you can have licenses that are more restrictive towards reusing data in various ways. And this kind of goes in the realm of remix rights,
Starting point is 01:02:39 right? Like, you know, when an artist, say, a DJ, does a remix of a song created by, say, a rock musician, then they have to get a license from the rock musician to get to do the remix. And probably the rock musician will get a cut of the royalties at the end as well, right? So it's a similar thing here, where there needs to be some sort of legal agreement between the original creator of that work, or the exclusive owner of that piece of IP, and the sublicensee for the remix rights. And we'll see how this forms. We see that, you know, crypto is a funny thing. There is sort of the follow-the-letter-of-the-law approach. And there's also the sort of more
Starting point is 01:03:17 Wild West-y approach that some people prefer to follow, given that you already have a lot more protection built into the blockchain itself. So how much do you need to leverage existing IP laws? And we don't have a good answer to this. We don't know how it's going to play out. So we kind of support both. Yeah, and I know you guys have spent a lot of time thinking about this. I mean, these conversations were happening, you know, even in the days of IPDB and how IPDB could be compliant with GDPR. And there's been a lot of thought put into it. So I trust that you guys have, you know, found suitable solutions to these problems. One of the things we haven't really talked about very much is the fact that, so, we're now, of course, in V3 of Ocean.
Starting point is 01:03:59 The previous version was built as an Ethereum side chain. And you mentioned before the show that you're now on the Ethereum mainnet. Can you describe how that works? We haven't really talked about the smart contracts and what they do exactly, and then what actually does exist on the main chain. Explain for our listeners what that looks like. Sure, happy to. And the reason we initially did the POA side chain was we saw that we would run into scale issues. And the V1, V2 versions of Ocean were pretty complicated. So we had actually even run tests to deploy to Ethereum mainnet, and it was super painful, frankly. It took, you know, days for the first successful deployment.
Starting point is 01:04:27 And this is just simply because of the sheer complexity of the contracts. With our V3, we said, let's change the mental model, where instead of having our own sort of custom access-control smart contracts, let's put that into the context of ERC20. And then we can leverage all the infrastructure of ERC20 directly, right? Just like I mentioned before with the analogy to Unisocks, etc. So that's what the heart of the mental model for Ocean is: you know, you have access to a given data asset.
Starting point is 01:04:59 You have custody of this access if you have 1.0 data tokens. And then you can redeem that access if you send 1.0 data tokens to the publisher. And so that's the heart of it. And that allowed us to simplify Ocean a lot. So it's, you know, radically simpler than before, and also simpler conceptually to deal with,
Starting point is 01:05:27 much more interoperable, and so on, right? MetaMask, traditionally a crypto wallet, is now a data wallet, right? Trezor is now a hardware data wallet, right? Balancer and Uniswap are not only DEXes, they are now data exchanges, right? Aragon and DAOstack are now data DAOs, right? So this is all possible. And we can have, you know, stablecoins based on data assets.
Starting point is 01:05:49 This is all possible simply because of the ERC20 route. So that's kind of the heart of it, the mental model of data tokens as access control. And so what it looks like on Ethereum, there are basically three main groups of contracts. And they're quite simple. One of them is a factory to publish ERC20 data tokens. And so we have a template for an ERC20 data token. And I guess I mentioned before, it's actually simply the OpenZeppelin ERC20 template plus this extra field called blob. And that allows a lot of flexibility in terms of new sorts of data services. And the reason we have the template, rather than just doing it from a library and stuff, is to make it simpler to deploy and also to save gas costs. We're using the EIP-1167 proxy contract approach, such that it's just much, much cheaper to deploy. So that's the first part.
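A rough sketch of that first part and of the datatoken mental model, as toy Python rather than Ocean's actual Solidity contracts; the class, the blob contents, and the example names are all made up for illustration:

```python
# Toy sketch of the datatoken mental model: an ERC20-style balance plus a
# "blob" pointing at the service, where sending 1.0 token back to the
# publisher redeems access. Purely illustrative, not Ocean's contract code.

class DataToken:
    def __init__(self, symbol, publisher, blob):
        self.symbol = symbol
        self.publisher = publisher
        self.blob = blob                  # e.g. an (encrypted) service endpoint
        self.balances = {publisher: 0.0}

    def mint(self, to, amount):
        self.balances[to] = self.balances.get(to, 0.0) + amount

    def transfer(self, sender, to, amount):
        assert self.balances.get(sender, 0.0) >= amount, "insufficient balance"
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0.0) + amount

    def consume(self, consumer):
        """Redeem access: send 1.0 token to the publisher, get the service."""
        self.transfer(consumer, self.publisher, 1.0)
        return self.blob

# A factory would stamp out one such token per dataset from a shared template,
# which is where a proxy pattern like EIP-1167 keeps deployment cheap.
dt = DataToken("TRUCKGPS-1", publisher="alice", blob="https://example.com/feed")
dt.mint("bob", 1.0)          # bob acquired a token on some market
print(dt.consume("bob"))     # bob redeems it for access
```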
Starting point is 01:06:24 The second part is the Balancer pools. And once again, you know, we're close with the Balancer guys. I've been an advisor to them for a couple of years now. At the same time, we did not use the Balancer contracts that are deployed to Ethereum mainnet directly. And the reason is gas costs as well. It's okay for someone who, you know, wants to create a pool of ETH and, say, OCEAN with, you know, millions in liquidity, or ETH and DAI with tens of millions in liquidity, whatever, right? But if you've just got a
Starting point is 01:07:09 data token where it's a much more long-tail asset of, you know, a hundred bucks, a thousand bucks of liquidity, then are you willing to spend the gas fees of $50, $75, whatever the gas prices are, just to deploy that? It's much more of a stretch. So we said, you know what? Let's also do a friendly fork of the Balancer contracts, so BPool and so on, and the factory. So they've got a factory plus a template, and have it also follow this EIP-1167 proxy contract pattern. So we have that as well. So that's the second part. So we've basically got a factory and template for data tokens,
Starting point is 01:07:43 a factory and template for Balancer pools. And the third thing is simply just for metadata. It's a very simple thing. It's a simple contract, DDO.sol. And with it, basically, if you are the publisher of a given data token, then you have a slot in that particular smart contract where you can write the metadata and update it.
Starting point is 01:08:06 And the metadata is things like, you know, the name, the publisher, a description, which can be pretty long, and a few other fields. So those are basically the three main things. And they're all simple on purpose. And from that, right, further up in the stack, we have a piece of software called Provider, which does the handshaking for the publisher to receive the tokens and to give the access. We also have another piece of software called Aquarius, which is basically a metadata
Starting point is 01:08:23 cache to make it easy for the Ocean Market to serve the data without having to retrieve it directly from the main chain all the time. So those are the components, simple on purpose. And even as time goes on, hopefully even simpler. Obviously, you've been working on Ocean for some time. And since you began this project, other blockchains have come along the way that perhaps offer more scalability than, you know, V1 Ethereum. Have you considered building Ocean,
Starting point is 01:09:01 like a separate instance of Ocean on other blockchains, or perhaps at some point porting it to something like Solana, for example, where there's very high throughput? What would that additional scalability provide Ocean that it perhaps doesn't have today? Yeah, absolutely, we've thought about it, right? I mean, I gave a talk on Bitcoin scalability issues in 2014, and we had such scalability issues with Ascribe that, you know, we built our own blockchain just for scalability, right? A blockchain database, BigchainDB. So we've thought about it a long time.
Starting point is 01:09:31 Even wrote theories around it, right, that there's sort of this fundamental tradeoff between decentralization, consistency, and scale, right? Like, ideally you want to be fully permissionless and decentralized, and you are consistent, as in you solve double spend, and you are scalable, as in you can handle sort of planetary needs, right? And when I wrote that post in 2016, I think it was, you know, that was kind of a revelation, and it was very useful as a model, right? And so IPFS, for example, is decentralized, permissionless, and scalable, but it doesn't solve the double-spend problem, right? So it's not really a blockchain per se. But it actually has really cool data management with CRDTs and stuff. Ethereum and Bitcoin, they are permissionless and decentralized as well as consistent. You know, they solve double spend, but they haven't been scalable traditionally. And then BigchainDB went the other route, where it said, okay, we need scale and we need consistency, solving double spend, so we're going to loosen off a little bit on the decentralization by starting out being federated, you know, POA-ish, right? And that was the decision at the time. And others have discovered that since,
Starting point is 01:10:37 right? Now, you know, about a year later, or maybe half a year, Vitalik discovered it too. And now it's more commonly known as the scalability trilemma, right? I called it the DCS triangle. So it's quite a well-known thing. And the cool thing is, from the time that I wrote it, I was hopeful. I said, this is just an engineering problem. It's going to get solved. People are going to find ways. And lo and behold, they have, right? Which is great. And the usual trick for a lot of these is to leverage random numbers. You know, Monte Carlo algorithms are an amazing trick for scalability. Across the board, you know, they're used in a lot of places.
Starting point is 01:11:09 And, you know, how Eth2 does it, how Polkadot does it, how Algorand does it, and more, is where you have this list of a thousand or 10,000 candidate validators, and then you randomly select 100 of them or so, right? And then those become your validators for the next, you know, hour or 24 hours or whatever. And that's a very nice approach because it kind of addresses the issues. And there are other approaches as well, right? Solana, for example, they focused on solving the bandwidth issue, drawing on their days as Qualcomm engineers, right?
Starting point is 01:11:38 And I have great respect for Qualcomm engineers. I worked with them a lot in the past. So I think there are a lot of great approaches out there to, you know, essentially, the DCS triangle slash scalability trilemma. You know, at first there were just theoretical solutions, and then people started building the real solutions. And, you know, they're coming live, which is great. For Ocean, we've been, you know, tracking all of these.
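As a small illustration of the random-sampling trick mentioned a moment ago, here is a toy committee selection in Python. The numbers are hypothetical, and real protocols derive the randomness from a verifiable on-chain source rather than a plain seeded PRNG.

```python
# Toy sketch of random committee selection: from a large candidate set of
# validators, a small committee is drawn at random each epoch.
import random

candidates = [f"validator-{i}" for i in range(10_000)]

def select_committee(epoch_seed, size=100):
    rng = random.Random(epoch_seed)   # stand-in for a verifiable random beacon
    return rng.sample(candidates, size)

committee = select_committee(epoch_seed=42)
print(len(committee), committee[:3])
```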
Starting point is 01:11:59 And we said, you know, to start with, we're going to deploy on Ethereum. But after that, you know, we envision Ocean to be truly ubiquitous, and that is part of the goal of Ocean, you know, to have a true data economy for the globe. We need to be on all of them, basically. Any substrate that has any usage at all, we should be on it. And so it's a journey, right, to get there bit by bit by bit. So start with Ethereum, and then start deploying to others, either ourselves as the core Ocean team or encouraging other people out there in the ecosystem to deploy. But it's not just deploying the Ocean contracts; you need to have bridges to connect to the Ocean community, because we need long-term sustainability, and we can get into that as well. So that's actually sort of the
Starting point is 01:12:34 constraint that holds us back from deploying a bunch of them right now. And, you know, there are other examples of this out there already, right? Like, USDT is on many chains. More recently, Chainlink has done sort of a blitz-scale to lots and lots of chains as well, and I think that's great, right? So we envision the same. We think there are a lot of great teams with great technology out there. And Cosmos and Polkadot have done some extra cool stuff to make this easier for a lot of things too. So, you know, we have relationships with a lot of these teams. And we're hopeful. And it's definitely a part of Ocean's future towards ubiquity. Cool. It's good to hear. So as we wrap up, I wanted to talk a little bit about the future.
Starting point is 01:13:10 And the ways in which you envision Ocean will evolve. And one of the things that came to mind is this idea of data markets as a data stream. Currently in the Ocean Market, as I understand it, one uploads a data set, and that's sort of like a fixed asset that one can purchase. And, you know, that data set might have different versions, or it might evolve over time. But it's very much a sort of fixed type of asset. But of course, you know, data flows constantly. And one might have use for, like, a constant stream of data. And I think this would be useful for building applications like social networks, etc. How do you see Ocean evolving towards a more fluid marketplace where data flows can be supported?
Starting point is 01:14:02 Yep. So this is yet another type of data service. So just like right now, Ocean has static URLs and compute-to-data as two types of services, definitely streaming data is a type of service that is a priority for us to support. There are actually many variants of that, right? Like, GraphQL actually has built-in streaming support. And there are other, you know, sort of Web 2-ish technologies that also help to support streaming data. So we look forward to having support for that.
Starting point is 01:14:31 And then as time goes on, you know, we see that these are going to become more and more refined in terms of the support. So, more specialized. So maybe there are going to be 10 different variants of streaming data. You know, there are some great projects out there that do streaming data. Streamr comes to mind as one of them. And, you know, we're collaborating with them. Great team. And that will be a very nice feed
Starting point is 01:14:49 of data into the Ocean ecosystem and Ocean Market itself, right? So that's, I think, a good example. There's a nice stepping-stone piece, and that is, because the URLs themselves are static, under the hood people can keep updating the dataset. So we see people in Ocean Market where they post a dataset for sale, but then they promise to update it every four hours or every 24 hours, and that's happening, right? So that's sort of a way to kind of get there. A good example of that is Swash, which is this data union of thousands of members where they're collectively bargaining, collectively selling their browser data, their browser history data, as this data union. And then they're selling that actually on Ocean. And that's also related to the
Starting point is 01:15:33 Streamr project, too. So we're quite excited for many, many data services over time. And there are other, you know, decentralized data services that we think are going to be very, very useful feeding into Ocean as well. You know, the ones that are more pure data-ish, such as Numerai signals, Chainlink feeds, and more. And then also the storage services themselves, right? They're starting to accumulate more and more data too, the Filecoins and Sias and Ethereum Swarms of the world, and so on, right? So all these things, I think, are going to be, you know, better and better supported over time in more direct, less frictiony ways. As a note to end on, I'd like to ask you, what types of things do you hope people will build on Ocean? What would be for you, like, sort of a sign that Ocean has
Starting point is 01:16:16 achieved its goal? Overall, I mean, generally, right, this isn't really, it's a vague goalpost, but ubiquity, right? Like, where it's kind of just part of internet infrastructure in a way that everyone kind of accepts it, right? Just the way that TCP/IP is like that, and the web on top, and so on. And to me, that would be great, to get to that point. But it's going to take, you know, probably decades, right? That's okay, right? What does that look like in specific measurements?
Starting point is 01:16:48 Maybe just, you know, knowing that Ocean tools are used by all the different organizations at these higher levels, as just part of their overall toolbox. But also critical to that, you know, because this is not just a one- or two-year journey but, you know, a decades-long sort of thing, and also it shouldn't be dependent on just myself and the core Ocean team, we need to actually have a plan for long-term sustainability to help make sure this is well funded, right? And on that, you know, I won't go into too much detail, but we did design system dynamics around Ocean for exactly that, where there can be funding over time. 51% of the token supply follows a Bitcoin-style emission curve.
Starting point is 01:17:26 And this goes into basically funding for the community to keep building and developing projects on top, whether it's, well, a few things: core infrastructure, apps and integrations on top, outreach, and unlocking specific data assets, and so on. So those four things are things that can get funded over time by this earmarked supply of OCEAN. The majority of OCEAN supply is earmarked for this. And then, how do you curate that? And the answer to that is via DAO technology, right? That's basically technology for collective decision-making over time. So we are in the process of rolling out OceanDAO, which is sort of the final piece of the puzzle of the overall Ocean system. And it's going to be a humble sort of thing. We're going to start slowly, start
Starting point is 01:18:08 with a small budget, but then over time, bit by bit, grow the amount of funding, and then at some point flip the switch, and it'll be funded from this 51% supply as well. So that's kind of, you know, critical towards Ocean's goal of ubiquity. And from there, you know, there are going to be other goalposts and stuff that are going to be interesting along the way. But I guess another one is where, you know, you stop hearing people complaining about how bad Facebook is and so on, right? Like, when that's no longer part of the conversation, that'll be a pretty good goalpost, right? Or when people talk about how, you know, they're making half of their income from data that they're selling, from their personal
Starting point is 01:18:44 data or from other things on the side, that's a good goalpost. So things like that, right, towards this overall goal of ubiquity as sort of infrastructure for civilization. That's a great note to end on. Trent, thanks so much for coming on once again. And hopefully we'll have you on a fifth time, maybe in some time. And hopefully we'll get to see each other in person very soon. For sure. Thank you very much. Thank you for joining us on this week's episode. We release new episodes every week. You can find and subscribe to the show on iTunes, Spotify, YouTube, SoundCloud, or wherever you listen to podcasts. And if you have a Google Home or Alexa device,
Starting point is 01:19:22 you can tell it to listen to the latest episode of the Epicenter podcast. Go to epicenter.tv for a full list of places where you can watch and listen. And while you're there, be sure to sign up for the newsletter so you get new episodes in your inbox as they're released. If you want to interact with us, guests, or other podcast listeners, you can follow us on Twitter. And please leave us a review on iTunes. It helps people find the show,
Starting point is 01:19:43 and we're always happy to read them. So thanks so much, and we look forward to being back next week.
