Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Trent McConaghy: Ocean Protocol – AI & the Data NFT Marketplace (V4)
Episode Date: February 1, 2023

From technological breakthrough to philosophical defiance, the topic of AI has sparked intense debate ever since Alan Turing and John McCarthy advanced this branch of computer science. One might argue that this was caused by expectations overwhelmingly surpassing technological capabilities. However, the recent public release of ChatGPT has rekindled old sci-fi apocalyptic scenarios, even if it is (still) just a trained language model. Humanity gazed in awe as a simple input/output text interface could instantly disrupt fields such as software engineering and copywriting. The worst part? It also challenged (arguably) the most important human-specific trait: creativity (Midjourney). Data is the lifeblood of AI, as it increases a model's prediction accuracy: a larger input sample provides more examples for fine-tuning weights during back-propagation cycles (the model 'learns').

We were joined by Trent McConaghy, founder & CTO at Ocean Protocol, to discuss the advantages of applying blockchain (and NFT) technology to data management and trading, what makes a particular data set valuable, and what challenges derive from AI implementations and the data digital gold rush.

Topics covered in this episode:
- Trent's AI background & founding Ocean
- The evolution of Ocean Protocol
- Managing third-party access to data NFTs
- Minting data NFTs & their metadata
- Discovering & verifying data quality
- AI-generated data sets & intellectual property
- Ocean marketplace user base
- Data farming
- Data curation & veOCEAN rewards

Episode links:
- Trent McConaghy on Twitter
- Ocean Protocol on Twitter
- Ocean Marketplace

Sponsors: Omni: Access all of Web3 in one easy-to-use wallet! Earn and manage assets at once with Omni's built-in staking, yield vaults, bridges, swaps and NFT support. https://omni.app/

This episode is hosted by Friederike Ernst. Show notes and listening options: epicenter.tv/481
Transcript
This is Epicenter, episode 481 with guest Trent McConaghy.
Welcome to Epicenter, the show which talks about the technologies, projects,
and people driving decentralization and the blockchain revolution.
I'm Friederike Ernst and I'm speaking with Trent McConaghy today,
who has been on many times before. He's one of the co-founders of Ocean Protocol,
which deals with data in an on-chain manner. We will go into that in just a second.
Before we do, let me briefly tell you about our sponsors today.
Our sponsor today is Omni.
Omni is your new favorite multi-chain mobile wallet.
Omni supports more than 25 protocols,
so you can manage all of your assets in one place.
But what's really special about Omni is what you can do inside the wallet.
Want to get yield?
Omni allows you to get the best APYs with zero fees in three taps.
Need to swap?
Omni aggregates all major bridges and DEXs,
so you can bridge and swap across all supported networks in one transaction directly in your wallet.
Love NFTs?
Omni offers the broadest NFT support of any wallet so you can collect and manage your favorite
NFTs across all chains in one place.
Omni truly is the easiest way to use Web 3 and is fully self-custodial, meaning you never have to trust anyone with your assets other than yourself.
And they support Ledger.
So give it a try at Omni.app.
Hey, Trent.
It's so good to have you on again.
You've been on Epicenter on multiple occasions.
Nevertheless, briefly tell us who you are.
Sure. Hi, everyone.
And thanks for having me on.
It's great to be here again.
So a brief background, I spent more than 20 years in the world of AI,
focusing just on AI, largely on AI for designing computer chips,
driving Moore's Law, that sort of thing,
for use by circuit engineers,
as well as a lot on creative AI, you know, getting
AI to do things that were previously considered possible only for humans in creative design.
Since 2013, I've been deeply focused on blockchain. First with ascribe, which was
basically NFTs on Bitcoin. Ethereum didn't even exist then. Then that pivoted into BigchainDB,
which was basically MongoDB wrapped with Tendermint BFT consensus. And since 2017, the focus has been Ocean Protocol, decentralized data exchange.
So you have worked on Ocean for the better part of a decade. So since 2016 or so,
that's like seven years now. Can you rehash what Ocean was about in the beginning and kind of
what this trajectory has looked like over the past seven years? Sure. So, you know, by background,
right, I had spent a lot of time in AI and that was always sort of as a technology, a general purpose
technology. It was sort of my first love. I just always thought it was amazing and fascinating and
couldn't believe my luck that I could find a way to work on it professionally for years and years.
And, you know, by 2010 I had become excited by
blockchain and Bitcoin, a huge nerd. And, you know, that led eventually to doing ascribe.
And when I started working on ascribe, and really full time at it, I was a little bit sad because I felt
like I was pausing my work on AI just because AI has such great potential as a technology.
It's a really big lever for changing the world for the better.
And blockchain too.
So it's sort of like there's these two technologies that I felt that I could access,
that I could help make a difference with.
So I thought that I was putting AI in pause.
But there was this idea with ascribe, basically buying and selling digital art and empowering
creators, that it was worth putting AI on pause. And, you know, I worked on that,
worked on BigchainDB, which got a bit closer to the AI because of the big data thing. But
what led to Ocean was that in 2016 I started thinking a lot more about AI again, in
depth, and I started to learn, you know, what are the very specific USPs of
blockchain compared to traditional databases. And, you
know, BigchainDB is basically a database with blockchain properties.
So we had to be very precise in understanding those.
And the first property was decentralization, you know,
such that you can have political coordination among humans,
all the sorts of things you hear about with DAOs now.
The next property that leads to benefits was immutability,
which allows for provenance trails, censorship resistance, and more.
And finally, the third one was assets, the idea that
it's only your Bitcoin if you have the keys to your Bitcoin.
You know, your keys, your Bitcoin, not your keys, not your Bitcoin.
To paraphrase Andreas Antonopoulos.
And a little bit later, I realized there is one more, the most important one of all, probably, which is incentives.
You know, incentives are sort of like the superpower of blockchains.
You basically can get people to do stuff by incentivizing them with tokens.
And so those four properties that lead to very specific benefits, decentralization, immutability, assets,
and incentives, I thought about a lot.
And then I was playing around and basically hanging with friends as well as writing.
I wrote some blog posts to explore, you know, turn the crank.
How can these four things help for AI?
How can these four things help for big data?
And basically turning the crank on that, you know, it turns out that it helps a lot, right?
For example, decentralization can help for AI and data, things like,
collective bargaining around data.
You know, each as individuals, we have a lot of data,
but we might consider selling it rather than having, you know, Facebook, etc.,
essentially sell it on our behalf and then get all the money.
But it's a huge amount of effort to do on our own.
But if we can join some collective bargaining mechanism, for example,
a data DAO, then we could, you know, 100,000 of us, a million of us, 10 million of us,
then someone can bargain on our behalf and we can get the benefits.
So turn the crank on each of those things, right?
Decentralization, immutability.
In that one, it's provenance trails of the history of the data and the training.
Assets, you know, not your keys, not your data, not your trained model, et cetera.
And then incentives, right?
How do you incentivize people to, you know, share data in a way that is privacy preserving, et cetera?
So all of those things, you know, I wrote about and got quite excited about, and this was around late 2016.
There was one other set of thinking I had around AI DAOs, this idea that, imagine you have a DAO
where it's AIs rather than humans kind of controlling it, right?
And this was right on the heels of The DAO coming out.
You know, that was, I think, mid-2016.
So it was pretty exciting to think about what might happen there.
And it leads to the ideas of self-owning AIs, rights for AIs, all of that, you know,
basically an AI that, you know, has its own wallet and can accumulate wealth on its own over time.
And there's quite a few different configurations there.
So basically, it's those things that led to Ocean.
The AI DAOs thinking,
and then thinking on how blockchain can help big data, and finally how blockchain can help
AI. And in those latter articles, all roads pointed to the crux of the problem being a decentralized data marketplace.
The heart of it was data. You know, one thing that the AI world discovered in about 2005 or so was
the unreasonable effectiveness of data. You know, you don't have to have a PhD's worth of
inventing new algorithms; instead you can just throw 10x more data at a problem and you can
chop your error down by 2x or more, and you can keep doing it, another 10x
and another 10x, to chop down your error more and more and more.
And this is a bit embarrassing for AI people because, you know,
everyone wants to have a PhD to say they're cool and they're smart.
And instead, you know, you can just take, you know, pure engineering and throw more data
at it and you can get a lot from that.
And that's what's been happening, you know, ever since 2005 or so.
You know, we're 18 years into that and we're seeing the effects these days with, you know,
GPT-3, GPT-3.5, etc.
So, yeah, that's what led to Ocean.
We realized, okay, more data can help a lot.
But if we're not careful, all that data is going to end up in the hands of a few powerful players,
like was already happening by 2017, the likes of Facebook, Google, or, you know, well-funded AI companies these days,
the likes of OpenAI.
So how can you help to level the playing field around that?
How do you level the playing field around data?
How do you level the playing field around AI?
And that is the aim of Ocean.
Cool.
Yeah, no.
That makes total sense.
And also how you talked about having all these components that
kind of make Web3 systems special come together.
Absolutely.
So if you look at the last couple of years of Ocean Protocol,
you guys have gone through a series of major upgrades.
Can you speed-run us through these?
Sure.
So in the spring of 2022, we released Ocean V4.
So a very brief speed run.
Ocean V1 was about sovereign data.
So it was where if you have the keys to that data,
you own the data.
And in Ocean, we realized the heart of this was access control.
So it's not about storing the data.
That is lower down in the stack,
whether it's centralized storage on your own machine,
on some centralized storage cloud,
such as Amazon S3, or decentralized storage,
such as Filecoin or Arweave.
So that's lower in the stack.
One level up is, you know, basically when you go to share or something, right?
So when you go to share something, you can think of that as granting access to someone else.
So ultimately, it's who has the control to be able to share access.
And that's the layer that Ocean operates at.
So you can think of it as decentralized access control.
So our V1 was really about, you know, enabling decentralized access control, where you can access it if you have the keys to it.
And a good way of thinking of it these days, you know, sort of 2023-era framing, would be token-gated APIs and token-gated
dApps, although we only hit tokens a little later on.
I'll mention that.
So that's V1, with sovereign data,
your keys, your data, not your keys, not your data.
V2, people kept asking us,
okay, this is all great, you know,
I've got data, I can share it to others,
but what if someone else gets access,
you know, has access to that data via the access control?
What if they download it?
What will stop them from sharing it to others?
So in our V2, we said, okay, well, you know,
let's have it also where you have the option
to have compute to data.
So in that case, what happens is your data never leaves the premises and never leaves your local storage wherever you have it.
And instead, there's an algorithm on top, whether it's computing an average, training an ML model, whatever.
And the person who is buying that data, they're only buying the results of whatever that algorithm does, a trained model, an average, whatever.
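To make that flow concrete, here is a minimal sketch in Python, assuming a toy in-memory setup rather than Ocean's actual middleware: the buyer-supplied algorithm runs next to the data, and only its result is returned.

```python
# Illustrative sketch of the compute-to-data idea (hypothetical, not Ocean's actual API):
# the raw dataset never leaves the publisher's environment; the buyer only receives
# the output of an approved algorithm that runs next to the data.
import statistics

def run_compute_to_data(local_dataset, approved_algorithm):
    """Run a buyer-supplied algorithm next to the data and return only its result."""
    result = approved_algorithm(local_dataset)
    return result  # only this leaves the premises, never local_dataset itself

# Example: the buyer pays for "the average", not for the raw rows.
private_rows = [41.2, 39.8, 44.1, 40.5]          # stays on the publisher's infrastructure
average_only = run_compute_to_data(private_rows, statistics.mean)
print(average_only)                               # 41.4 is the only thing the buyer sees
```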
So that was Ocean compute-to-data, and that was part of our V2.
By the way, for doing this, we had looked at a bunch of other privacy-preserving protocols,
such as homomorphic encryption, multi-party compute, ZK stuff, et cetera.
And none of it was quite mature enough at the time.
That was several years ago now, I think four years ago.
But at the same time, compute to data, this is much more of a DevOps thing.
Can you arrange your compute and data in just the right way?
So it was practical to do.
You don't hear about this in privacy circles because it's hard to get published in a privacy conference
just for, you know, managing Docker containers, right?
But from a product perspective, it makes tons of sense.
So that was V2, releasing Ocean compute-to-data.
And basically, yes, it also is about managing Docker containers, et cetera.
You can have it play well with all of these other privacy-preserving protocols.
And we have a blog post on that.
That was our V2.
So V1, sovereign data; V2, privacy.
For V3: we had seen in V1 and V2 that this was mostly custom contracts for access control, et cetera.
And we had this giant list of things
we wanted to see in a data economy. We wanted to see data exchanges. We wanted to see really
great data custody in the form of wallets, right? And then things like multi-sig wallets and hardware
wallets. We wanted to see data management, the way that SAP does in a centralized way, for sharing
data with others and, you know, bank-level security, etc., key management. We wanted to see
DAOs, you know, collective bargaining around data, all of us. And we're like, wow, this is like a
mountain of things that we want to build or see happen.
And we've never had a huge team at Ocean,
typically between 20 and 40 people in the core team.
So we asked, okay, how can we be smarter about this?
And we realized that what we can do is tokenize access control.
And so what the heart of that means is for any given data set,
when you publish the dataset, you have ERC-20 tokens to access that
data set, whether it's download or compute-to-data,
or otherwise. And so if you, Friederike, have 1.0 datatokens to access, say, Trent's DNA
dataset as a CSV file, then you can come to the Ocean ecosystem, come to the provider component,
it's a middleware component, and say, hey, here's 1.0 datatokens for Trent's DNA data. I would
like to access it, please. And then it will handshake with you and give you access, either
in the form of a URL for downloading or otherwise.
Okay. So that's the heart of the idea of tokenizing access control.
And those are datatokens, right?
And it's ERC-20. Why ERC-20? Because this is a fungible concept, right?
I might want to share with 10 people, 100 people, 1,000 people. And so that was kind of a big
breakthrough for us, because by going ERC-20, it enabled all these things for Ocean.
It enabled data wallets. MetaMask became a data wallet. Trezor
became a hardware data wallet. Gnosis Safe became a multi-sig data wallet.
And same thing for exchanges. Balancer became a data exchange.
Uniswap became a data exchange, right?
All you have to do is go and start a Uniswap pool.
Some centralized exchange should come along and start adding data tokens if they wanted as well.
Aragon, Moloch, and other DAOs suddenly become data DAOs.
And then you can get fancier too, right?
You can have data-backed stablecoins, etc.
So that was all enabled by this one thing, by saying, let's make the access control itself tokenized, as ERC-20, therefore interoperable with the rest of the Ethereum and EVM ecosystem.
So we did that for Ocean v3. I'm very happy with that.
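To picture the handshake, here is a minimal toy sketch in Python of token-gated access with an ERC-20-style datatoken; the ledger class, price, and URL are hypothetical stand-ins, not Ocean's real contracts or provider API.

```python
# Hypothetical sketch of token-gated access control with an ERC-20 "datatoken".
# Names (DatatokenLedger, ACCESS_PRICE) are illustrative, not Ocean's real components.
ACCESS_PRICE = 1.0  # 1.0 datatokens buys one access to the dataset

class DatatokenLedger:
    """Toy stand-in for an ERC-20 balance map."""
    def __init__(self, balances):
        self.balances = dict(balances)

    def transfer(self, sender, recipient, amount):
        if self.balances.get(sender, 0.0) < amount:
            raise ValueError("insufficient datatokens")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0.0) + amount

def request_access(ledger, consumer, provider, dataset_url):
    """The provider releases the download URL only after 1.0 datatokens are paid."""
    ledger.transfer(consumer, provider, ACCESS_PRICE)
    return dataset_url  # in practice this would be a signed, short-lived URL

ledger = DatatokenLedger({"friederike": 1.0, "trent": 0.0})
print(request_access(ledger, "friederike", "trent", "https://example.com/trents-dna.csv"))
```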
So then you can think of Ocean as an on-ramp to publish your data assets into ERC-20s and an off-ramp to consume them.
And everything in between, people can apply whatever blockchain and DeFi tools they want.
And we make some things easy, such as exchanges, et cetera.
but anyone can do anything. That was our V3. Actually, as part of it, we also launched an exchange
called Ocean Market. And in it, we had free pricing, which is basically, you know, just share
for free; fixed price, you know, buy and sell at a fixed price, you can price in OCEAN or whatever
other tokens you want, as well as automatic pricing, which we had with Balancer pools. And I've,
you know, had worked with Balancer for a long time, great relationship with them. So that was very nice.
And that was our V3.
And so we had the underlying technology, you know, the smart contracts deployed to Ethereum
mainnet, all, you know, fully decentralized, immutable, permissionless, et cetera, the middleware
components and the ocean market front end, as well as SDKs slash drivers for both JavaScript and
Python.
That was our V3.
And we launched that, had really great traction in terms of people buying and selling.
And people really liked the aspect of trading on,
you know, speculating on, the data assets.
That was actually a little bit too crazy.
They were focusing too much on the speculation side of the data assets rather than
actually buying and consuming the assets. So we said, okay, for V4, we're going to soften this.
And we had seen one specific issue around rug pulls. Basically, when you publish a data asset,
you are a datatoken whale. You know, you have tens of thousands, millions that you
can mint yourself as that publisher. And so in an AMM context, that is
quite detrimental to anyone trying to stake on that.
So for a V4, we fixed that, or we thought we fixed that with one-sided staking.
And at the same time, we did two more things for a V4, which came out this past spring.
We introduced Ocean Data NFTs, and I'll get into that in a second, as well as better
community monetization.
So I'll just go through these three things.
So, on the one-sided staking: it looked like a nice, clean, elegant solution.
But within a month of releasing Ocean v4, someone found an exploit, which was kind of sad.
We had already gone through a lot of security audits, but this is how it is.
It wasn't an exploit in the logic of the software in the smart contract level.
It was actually in some of the incentives.
So we actually realized, you know, there's too much focus on the automatic pricing with the speculation.
So we just simply turned it off and said, everyone, we're going to focus on the fixed price.
And we're going to have another approach to curation because up until that point,
the AMMs were a really great way to curate assets,
you know, how much liquidity is in a given datatoken pool.
And I'll get into that in a bit,
but the summary of how we have curation now
is with VE Ocean and some data farming stuff.
So I'll talk about that in a bit.
So in V4, while we brought in, you know,
a fix for the rug pulls of datatokens,
we soon removed all the AMM stuff,
simply because it was detrimental overall.
It's still there in the smart contracts.
Someone else can go and use it if they want,
but we don't recommend it.
But the other two things are still there and going strong
and we're really happy with.
Basically, these other two features,
data NFTs and community monetization,
were all around people using Ocean looking for more flexibility.
So let's stop for a second and think about IP in general.
If you are, say, AC/DC and you publish a new album,
say Back in Black, right?
You've already got a record deal with Universal Music,
and you record the master tapes and you have copyright as AC/DC the band.
You have the copyright.
So you can do whatever you want with those.
And if anyone else copies that and sells it on their own,
you have full legal recourse to come after them to stop them, etc.
And that copyright lasts for decades.
In fact, 70 years after the death, right, of the creators.
Which is too crazy long, but so be it.
That's how the laws work.
So basically, AC/DC would record this and then they could
sell and distribute on their own, but that's not really the forte of a band.
So they instead said, we're going to give an exclusive license of essentially all the rights
that copyright gives to Universal Music,
and basically call that the master tapes, right?
So Universal then says, okay, we're going to go and manufacture a million CDs for Back
in Black.
And each CD has a very specific set of rights.
Anyone who buys that CD can listen to that for personal use.
they can't go and play it directly on a radio station.
That's a different right.
They can't go and resell that CD.
Sometimes it happens in a used fashion,
but that's not built into the rights per se.
That's a CD, but also what about, you know,
licensing to radio stations or streaming on Spotify
or selling as vinyl or as cassette tapes?
So there are five or six or seven
different distribution formats with different licensing terms,
each against that same base IP.
We saw the same thing with data, right?
So when people publish a data asset,
they're claiming copyright, they're claiming that they have that IP.
And then they might want to license that IP for just one day, or maybe for one month,
or maybe for a compute-to-data context versus a download context.
So basically what we said was, let's map this idea for music of master tapes to the licenses in the form of CDs, etc., to the world of blockchain.
In this case, the NFTs are the master tapes.
We call those data NFTs.
And then data tokens are the CDs or the cassette tapes or the vinyl, etc.
So when I go to publish a data asset, I publish a data NFT to claim that copyright, etc.
And then on top of that, I can deploy zero, one, or many ERC20 smart contracts.
So one smart contract might be for downloading for a day with a one-day license.
One might be for downloading with a one-month license.
One might be for compute-to-data.
And that's what we built for V4, and it's much more flexible.
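A rough data-model sketch of that mapping, in Python with hypothetical class names: the data NFT plays the role of the master tapes, and each datatoken contract deployed under it plays the role of one distribution format with its own license terms.

```python
# Rough data model of the V4 mapping (hypothetical class names, not Ocean's contracts):
# one data NFT as the base IP, zero or more datatoken contracts as license "formats".
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatatokenContract:            # plays the role of an ERC-20 deployed under the NFT
    label: str                      # e.g. "download, 1-day license"
    license_duration_days: int
    compute_to_data_only: bool

@dataclass
class DataNFT:                      # plays the role of the ERC-721 "master tapes"
    name: str
    owner: str
    datatokens: List[DatatokenContract] = field(default_factory=list)

dna_nft = DataNFT(name="Trent's DNA dataset", owner="trent")
dna_nft.datatokens.append(DatatokenContract("download, 1-day license", 1, False))
dna_nft.datatokens.append(DatatokenContract("download, 1-month license", 30, False))
dna_nft.datatokens.append(DatatokenContract("compute-to-data, 1-month license", 30, True))
```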
And yeah, so the data NFTs are ERC-721, one of one.
And the datatokens are ERC-20, fungible.
So it's a really nice mapping of non-fungible to fungible.
Therefore, you know, using these things as they were intended,
the non-fungible is truly non-fungible, the fungible is truly fungible,
etc. And so that's the one thing. And by the way, these can be used anywhere. Like someone
could take Ocean Market, fork it, it's all Apache-licensed down to the front end, and launch their own, you know,
music NFT marketplace with Ocean, or digital art NFT marketplace. And that was actually fun for us,
because it kind of reconciles back to ascribe in 2013, right, with NFTs then. And actually,
the terms of service of Ocean Market is a riff on the terms of service from ascribe days.
So, you know, we were able to leverage all the legalese for that. And of course, you know,
anyone running their own market can have whatever legals they want.
At the heart of it, though, you know, at the L1 level, at the blockchain level, it's fully decentralized.
You know, there's no sort of legals attached.
People can attach legal meaning for whatever jurisdiction they are.
And finally, the monetization: the simple thing there is, basically, people can monetize in various places.
If you're a marketplace selling this stuff, you can get a cut now.
If you're a provider helping to provide some of the compute or some of the middleware, you can get a cut.
If you're the marketplace for publishing or for consuming, I should mention, you can get a cut.
So we made all of that possible and just to make it much easier for people to monetize in various
ways to help keep the community healthy and strong.
So yeah, that's what we're up to, V1, V2, V3, V4.
Maybe that was a bit longer, but to summarize once again: V1 was self-sovereign data with access
control; V2, privacy; V3, ERC-20 datatokens for huge interoperability, leveraging
all the awesome stuff built in the Ethereum and EVM ecosystem. And then V4 was a refinement of V3,
especially with data NFTs and community monetization. That was incredibly comprehensive. And
obviously, I took notes before; I went through your docs and kind of put down questions and things
to talk about before this episode. And you actually went through all of my points one by one.
So I think this was a perfect explanation.
I particularly like transposing this to CDs because obviously everyone understands that.
Nevertheless, I have kind of a couple of remaining questions regarding this kind of topic area.
So you gave an example of, you know, Trent's DNA.
And you may be okay with me using your DNA for specific things.
So for instance, maybe I'm, I don't know, an Alzheimer's researcher and kind of you want to contribute to that, right?
But you might not want your DNA data to be used for something else.
So you might not want the DNA data to be used for, I don't
know, making Trent-mice chimeras, right?
Like in a bio lab somewhere in China or so.
So once you put your data out there and you mint
datatokens for access,
do you have any way of kind of restricting access for those datatokens?
I mean, in principle, they're freely transferable.
So I could, I could sell them to the bio lab in, you know,
Shenzhen and they could use it for whatever.
So what kind of recourse do you have?
Yeah, that is a great question actually.
And we actually have an answer.
You know, we've iterated with multiple users,
large and small over the years,
everyone from individuals to cities,
to governments, to big enterprises.
And one really great collaborator over the years
has been Daimler, which is Mercedes-Benz, right?
And so that goes back years.
And as we've been iterating with them,
They finally launched their marketplace in production, based on Ocean technology, about six months ago.
And as we were iterating with them towards launching that marketplace in production, it's
called Acentrik.
They have a spin-off specifically just for this.
As they were iterating, they said, you know, we've got this automotive data that we want
to be selling, and some of our partners want to sell via the Acentrik marketplace, but they
don't want to sell to just anyone, just like you said, what can we do?
So we worked with them and iterated and we came up with the idea of fine-grained permissions.
What does that mean?
It means that on any given data asset, a data NFT, you can optionally tack on top an allow list or a deny list.
So you can make it where the allow list is like a whitelist, sort of, here are the parties that can access, right?
And those can be specified as ETH addresses,
but it's also flexible enough to allow credentials in various forms.
So then it's a form of role-based access control, R-back, right?
Which makes it super flexible.
So you can do this from a white list or a blacklist perspective, basically.
You can say, here's the people who are allowed,
or you can say, I'm going to allow anyone except for these very specific entities, right?
And that solves the problem very nicely.
And it's very useful in some other contexts too.
For example, what if you just want to be sharing data freely among, say, a
consortium, right? Then you can say, okay, maybe there's 20 members of that consortium.
You publish a data asset, a data set, into that consortium, saying if someone shows up
with a credential, a verifiable credential, like a W3C-style verifiable credential, or maybe a
soulbound claim or whatever, saying that they have this particular credential, then
they can access it. Otherwise, they can't. So that's an example. So to summarize, yes, we have a very
explicit answer for that called fine-grained permissions.
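A minimal sketch of that allow/deny check, assuming a plain Python function rather than the actual on-chain credential format:

```python
# Hypothetical sketch of fine-grained permissions: an optional allow list and deny list
# checked before access is granted. Real Ocean assets express this in asset metadata;
# the function and field names here are illustrative only.
def may_access(address, allow_list=None, deny_list=None):
    """Return True if `address` passes the optional allow/deny lists."""
    if deny_list and address in deny_list:
        return False                      # explicit blacklist always wins
    if allow_list is not None:
        return address in allow_list      # whitelist mode: only listed parties
    return True                           # no lists attached: open access

consortium = {"0xAlice", "0xBob", "0xCarol"}
print(may_access("0xAlice", allow_list=consortium))     # True
print(may_access("0xMallory", allow_list=consortium))   # False
print(may_access("0xAnyone", deny_list={"0xMallory"}))  # True
```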
That makes perfect sense.
What about the data NFTs?
So, can you give us an idea of how many different data NFTs have been minted?
Yeah.
So, you know, you can mint one, by the way, in like one line of code, right, in JavaScript or
Python or if you go to Ocean Market, you know, you can publish your own in, you know,
a couple minutes, five, you know, five minutes if you've never done it before, two minutes,
if you have, right? And by the way, I should mention, you can do this on Ethereum Mainnet,
and we've also deployed to four other production chains so far, and that's Binance Smart Chain,
Energy Web Chain, Polygon, and Moonriver. And we've got a short list of another five or 10
that we want to do, including some L2s. And obviously Gnosis Chain is on that list. We're big fans of Gnosis
Chain. So I just wanted to mention that, because that means that the publishing cost can be,
you know, just a few cents or less. And
from Ocean Market, you just choose whichever one your wallet is connected to.
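For a flavor of what that "one line of code" looks like, here is a sketch in the style of the ocean.py quickstart; the import paths, config helper, and create_url_asset signature are assumptions that vary between ocean.py releases, so check the current README before running it.

```python
# Sketch in the style of the ocean.py v4 quickstart. The import paths, the config helper,
# and the create_url_asset signature are assumptions here and vary between releases;
# check the current ocean.py README before running this.
from ocean_lib.example_config import get_config_dict   # assumed helper name
from ocean_lib.ocean.ocean import Ocean                 # assumed import path

ocean = Ocean(get_config_dict("polygon"))   # connect to one of the supported chains
alice = ...                                 # a funded wallet/account, set up per the README (elided)

# The "one line": publish a downloadable asset from a URL. Under the hood this deploys
# a data NFT (ERC-721), a datatoken contract (ERC-20) beneath it, and posts the metadata.
data_nft, datatoken, ddo = ocean.assets.create_url_asset(
    "Trent's DNA dataset", "https://example.com/trents-dna.csv", {"from": alice}
)
print(data_nft.address, datatoken.address, ddo.did)
```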
But going back to your question then, how many have been minted?
So far, last I checked, it's at least 500.
Maybe it's even 750 by now.
I last checked a few weeks ago.
You can just go to market.oceanprotocol.com and check.
Maybe I can do that right now.
Just give you a very precise answer for your audience.
It's listed at the very bottom here.
So pulling it up.
And here we go.
Oh, wow, much more than I thought.
We have 1,562 data NFTs that have been published.
And those are ones that are valid for ocean market.
There's ones beyond too.
A lot of these are just test things and so on, right?
You know, to me, what really matters is, are these useful, right?
Are people buying and selling these?
Is there actual consume volume against this, et cetera, right?
So because I could go and write a script that publishes 10,000 more and, you know, run that
script in 10 minutes and then give you this other stat that is kind of meaningless unless it's
really end to end for value creation, right?
But yeah, these ones, you know, as far as I know, no one is writing scripts to publish a whole
bunch of these at once in a crazy way.
So, so yeah, that's the number.
So how do I understand what's in these data NFTs, right?
So basically, I mean, obviously the person who publishes them, they also mint
these datatokens to access them.
So basically, kind of before I, you know, go all
in and purchase a datatoken to kind of access this,
how do I understand what kind of data is for sale
and whether it's good data?
Yeah, that latter question is a great question.
So the first one, to understand what kind of data is in there:
if you go to Ocean Market, which is, you know,
market.oceanprotocol.com, or some third-party market that also can see the same
data, you know, they're all basically accessing the data that is listed on-chain,
right?
So on chain, you know, the data
NFT has all the metadata about the assets, right?
And that metadata includes a title, a creator, a description,
which can include markup, et cetera,
you can have HTML markup, et cetera.
And then the main thing is it describes access.
How do you access this thing, right?
And usually access, well, for the downloadable version,
is a URL, right?
Or a URI, to be more
general. So it could be an HTTP-style URL that points to a specific S3 bucket, for example.
It could be an IPFS URI. It could be an Arweave URI, et cetera. And we have native support for IPFS and
Arweave, by the way. So the heart of it is, you know, a few pieces of metadata and then this
URL. If it's not downloadable and instead compute-to-data, then it has a bit more information
about basically the compute script that needs to run,
typically wrapped in a Docker container,
and then it can have a whitelist or blacklist of who can access that too.
So that's the heart of it.
But then there's a bunch of extra fields, of course, too.
And it all follows the ERC 721 format at the base.
We've also extended it such that it follows the OpenSea metadata format,
therefore anything you publish in Ocean Market,
or via ocean.js or ocean.py, you can render in OpenSea as well.
You know, it just shows up there.
There's links from Ocean Market too.
We've also extended it with ERC-725, which is what Fabian Vogelsteller invented, the ERC-20 inventor.
He did that for identity, for his work with LUKSO and fashion, et cetera, but he knew that it would be much more general.
So, you know, we iterated with him to put that into Ocean, and ERC-725 is an extension to ERC-
721; one part of it allows basically an arbitrary key-value store.
So you can have any other key value pairs.
And that basically turns it into sort of a NoSQL-type database, right?
So we have that as well.
And we have that very easy to access.
So that's basically what's inside a data NFT.
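Pulling those pieces together, the metadata for a downloadable asset looks roughly like this; the field names below are simplified for illustration, not the exact on-chain or DDO schema.

```python
# Simplified illustration of the metadata attached to a data NFT (field names are
# approximations for readability, not the exact on-chain/DDO schema).
data_nft_metadata = {
    "name": "Trent's DNA dataset",
    "author": "trent",
    "description": "Whole-genome CSV. <b>HTML markup allowed</b> in descriptions.",
    "access": {
        "type": "download",                          # or "compute" for compute-to-data
        "files": ["ipfs://Qm.../trents-dna.csv"],    # HTTP(S), IPFS, or Arweave URIs
    },
    "compute": None,                                 # for compute-to-data: Docker image + entrypoint
    "credentials": {                                 # optional fine-grained permissions
        "allow": ["0xAlice"],
        "deny": [],
    },
    "additional_key_values": {                       # ERC-725-style arbitrary key-value pairs
        "species": "homo sapiens",
        "license": "1-day download",
    },
}
```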
And we've biased it towards data use cases, but people can use it for anything, right?
So actually, the day that Puja Ohlhaver and Vitalik published their soulbound
tokens piece, by coincidence I was publishing a piece on profile NFTs into the ocean.py
READMEs. So right after it published, I DM'd Puja like, hey, Puja, here you go.
Implemented, it's done. And, you know, you can turn transferability off, yes or no, right?
So people are using Ocean data NFTs for soulbound tokens. It's just, you know, another approach
to identity and stuff, et cetera. We have READMEs around that too. And actually, we have a
Gitcoin hackathon happening right now around the soulbound token aspect of data NFTs.
So I hope that answers your first question. Maybe I'll pause there before I answer your second
question. Yeah, no, this answers my first question. But basically my second question is: how do I know whether the data is good data, in the sense of, how do I know that what it says on the face of it is actually what's in there?
Yeah, exactly. That is a great question. And it's not the sort of thing, you know, you can answer definitively.
So what we don't want to do is have some trusted authority in between saying this is good, right? So if you look into the literature and the research,
on data marketplaces and buying and selling data,
there's hundreds, maybe thousands of papers.
And lots of these papers are academic,
they have ideas about what quality data means.
And we take a very pragmatic view,
call it evolutionary or call it a market-based view.
The quality of the data is simply how much people
have spent money on it to buy it and consume it.
That's it, right?
That is the most important signal: are people buying
and selling this thing?
And otherwise, people can make all their arguments
of why they have pretty data,
why they have good data, but at the end of the day,
are people actually using the data, right?
So you can view that as a signal.
And overall, you know, why do you want to know
whether data is good or et cetera?
You want it for discovery, typically for people to consume, right?
So then it comes to the question of discovery.
This is a, you know, a long researched question
in the world of interfaces and web apps in particular
over the last 25 years.
So if you think about discovery,
there are three main pillars
to discovery: browsing, search, and filtering.
And so if you want to have good discovery,
you want to have at least basic support for all three
and then make it better, better, better,
and you want to have good signals around that.
So browsing means you go to market.oceanprotocol.com
or otherwise, and you browse.
You scroll through seeing what's good, what's bad, etc.
And searching, you type in one or a few text-based queries.
It gives you some results, and then after that,
you can filter against it with,
other signals. What sort of signals can you have? So signals, you know, this is complementary to the
discovery to the browse search filter aspect. And so one key signal, like I mentioned, is what is the
volume of a given data asset? Other signals include how much stake is there against it. And I'll
get into that in a bit, probably when you ask; I'll wait for that. And also, you know, who published
it? Is it someone that I know? They can choose to make themselves anonymous or they can choose to
attach their publishing profile to ENS, which is supported directly, or to other identity things.
And in the future, we have in the backlog things like comments and ratings.
So there's various signals.
There's also, though, if someone has published something fraudulently, let's say that I publish a dataset that's doing well,
you come along, you decide to be, you know, evil Friederike, which is probably never going to be the case.
But let's say you publish this thing and start selling it as your own.
then the community can flag this.
And if it gets flagged,
it comes to whoever is running that marketplace.
So the Ocean core team is running market.oceanprotocol.com.
We take a quick look, and we basically,
then it gets put into purgatory,
which means the asset never shows up in Ocean Market.
But there are rules for that coming out of purgatory.
And by the way, if your asset goes into purgatory,
you as the publisher also go into publisher purgatory.
So we actually took
a riff from GitHub's policies on this.
And it's around the use of fraudulent publishing,
you know, violation of copyright,
sensitive data, impersonation, all of those things.
So, you know, rather than saying it's blacklisted,
whitelisted, it has purgatory with very specific rules.
It's basically a state machine of, you know,
a data asset entering that and ways that it can exit
as well as for the actor itself.
So basically, overall, to your question of
how people discover if an asset is good or not:
this is really a question of
discovery, where we have the main discovery pillars of browsing, searching, and filtering, with really great signals behind them, including data consume volume, stake, you know, reading the description, and, for the really flagrant violations, we have purgatory.
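One way to picture how those signals combine is a toy ranking function; this is entirely illustrative and not Ocean Market's actual ranking logic.

```python
# Toy ranking of assets by market-based signals: consume volume and stake.
# Entirely illustrative; Ocean Market's real ranking logic is not reproduced here.
assets = [
    {"name": "weather-eu-2022", "consume_volume": 1200.0, "stake": 300.0},
    {"name": "tron-prices",     "consume_volume":   10.0, "stake":  50.0},
    {"name": "dna-samples",     "consume_volume":  400.0, "stake": 900.0},
]

def score(asset, w_volume=0.7, w_stake=0.3):
    """Weight actual consumption more heavily than stake, per the 'people buy it' signal."""
    return w_volume * asset["consume_volume"] + w_stake * asset["stake"]

for a in sorted(assets, key=score, reverse=True):
    print(a["name"], round(score(a), 1))
```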
I totally agree that there are cases which are clear cut. So say I'm evil Friederike, I take your data set, I republish it as mine.
That is, I mean, that is pretty evident that, you know, it was your data set first and I just republished it.
Or you sell your DNA and I open it and there's nothing in there, right?
So basically, obviously, that's pretty obvious.
But say you sell something as your DNA.
Sorry, I keep harping on about your DNA now.
You sell something as your DNA and I purchase it and try to clone you,
and instead of you, I get, I don't know, Justin Sun.
So, I mean, it's not evident, you know, from the get-go that this is not what it says.
And I assume the Ocean Market team, the marketplace team, does not have, you know,
capacity to kind of run inference on all of these things.
So how would that be handled?
Yeah, that was a great question.
So actually, one thing I forgot to mention, we also have it where the publisher can publish sample data.
So that's, you know, maybe 5% of the total data, and that can help to give people a feel for whether the data is in the right place.
Like if it looks okay, you know, data scientists can look at the distribution, is this kind of sane, et cetera.
Ultimately, though, in a lot of cases, they're not going to know until it's useful.
And even if it's good data, right?
Like, let's say I'm trying to predict the price of ETH and I buy data from Ocean Market to try to predict the price of ETH better.
Maybe some historical data for Tron, right?
But, you know, it turns out maybe that Tron has no bearing
on the price of ETH, right? The Tron price. So you don't know beforehand. So it's actually really,
really tough to tell whether something is useful or not before. And that's where, you know,
ultimately, people can stake against it, and you can see the consume volume
against it, et cetera, right? But also, if it turns out that, you know, it was fraudulently published,
going back to this Justin Sun example, then after the fact, you could come to me and say,
you know, hey, this is fraudulent, here's why I believe so.
And the Ocean core team makes a judgment call.
Now, this is just for Ocean Market itself.
Like I mentioned earlier, there is actually a GitHub repo,
you know, github.com/oceanprotocol/market.
It's fully Apache too.
Anyone can come along and copy and paste that.
Like fork that, make their own version,
spin it up within an hour.
We actually have a blog post that describes how to do this in an hour.
So you can have your own data NFT marketplace or NFT marketplace in an hour.
You want to have your own CSS, great.
Go change it.
Half a day later, you've got your own, you know, branded marketplace, etc.
And so if you really don't like what the Ocean core team is doing with its
curation, or with what's in its purgatory and what's not, then go have your own, right?
And that's completely okay.
So and you know there are a few third party marketplaces out there.
Daimler is probably the top example, but there's some other ones too.
And there's more people doing that all the time.
And people are not only making small forks, but larger forks.
You know, the Algovera team has leveraged the Ocean Market code for their version of sort
of like a Web3-native Hugging Face, if you will, right?
And there's a lot of stuff like that out there.
So, yeah, overall, you know, all of this, we're trying to be permissionless and follow this idea that, you know, at the L1, the on-chain stuff, no one can change.
It's immutable, censorship resistant, all of that.
But the last mile, it really depends on the jurisdiction.
So, you know, the Ocean core team is running Ocean Market out of the jurisdiction of Singapore.
If you're using Ocean Market, it actually has an arbitration clause that points to Singapore.
But, yeah, in the future, it can be whatever.
As time goes on, someone could have a fork of Ocean Market that has fully decentralized arbitration using, say, Kleros, right?
That would be great.
We'd love to see that, right?
You know, we've had that on our backlog for a long time, but, you know, we have had other higher priority items.
Do you think using AI, there's a way of kind of generating data sets that are indistinguishable from the real thing, right?
So basically, say, I have, like, data on weather patterns.
Right. And basically I feed that into an AI and say,
generate me data patterns that are just like this one.
And I generate a hundred different data patterns.
Obviously, there's nothing new to learn from them because basically all the other information that went into making them was already in the first one.
Do you have any way of kind of making sure where exactly the data
came from?
Is there some sort of provenance that I can prove that I actually got it from,
I don't know, I'm really NOAA and I really got this from like satellites and ocean
probes and whatever?
Yeah, that's a great question.
So yeah, the ideal is kind of what you're hinting at: the chain of provenance of the data.
And, you know, it's basically data, compute, data, compute, right?
Data being transformed through some compute to another data set, being
transformed through another compute to another data set or data stream feed, right?
So the ideal is that all of that is fully trustless, on-chain, trackable, etc., right?
You can loosen that slightly where you have the hashes along the way or some other sort
of tracking that's partly there with the claims being made.
And, you know, ultimately, the claims being made, you know, you've got the economic incentives,
which we've kind of been talking about so far, right?
But ideally, you can have it where it's even lower friction by fully on-chain throughout the whole flow.
There is some of that starting to show up in Ocean Market, such as with the Sovereign Nature Initiative team, SNI.
They have sensor data that's coming straight from a sensor, being put on-chain and recorded there, and then being put into the Ocean ecosystem, right?
So that's a great example.
And then, you know, ideally you have it where it goes from that going through compute.
That is on chain.
And then, you know, to some other new data set maybe of hotspots for, say,
temperature, et cetera, rather than the raw temperature data or otherwise, right? On-chain compute is
still not super mature. There has been, you know, decentralized compute teams, but that's mostly
been around decentralized compute marketplaces, which isn't quite on-chain compute, right?
But we have more and more happening over time where it's towards, you know, verifiable compute,
at least even if it's not on-chain per se, right? And, you know, L2, L3 compute, you can get away
with it there too, as we have, you know, that will be happening over time, too. So we're going to get
to, you know, ideally on-chain data and compute for everything, but the infrastructure is not
quite there yet. In the meantime, we can have hashes and claims along the way, and that can get
you most of the way, right? So, you know, Ocean, basically, we try to support, you know,
the best-in-class technologies at the lower level as it matures, right? So right now, for example,
we're taking a very close look at the state of the art of the decentralized compute technologies,
whether it's the sort of decentralized compute marketplaces, like the iExecs and Golems of the world,
or it's more towards pure on-chain compute.
You could run Wasm, sorry, run C or Python on some Wasm L1, if you want, right?
Whether it's on Cosmos or otherwise.
And then there's other approaches too, of course.
So as time goes on, we're going to just make a point of supporting that better and better
as the technology comes ready.
This is maybe out of scope.
But in terms of AI, can you be sure that you can prove that some data is not made up?
Is there a way to actually mathematically prove that?
Right.
Yeah.
So going back, actually, I didn't answer a question before.
Can you generate fake data?
There's actually a subfield of AI called synthetic data generation.
And this is actually used lots and lots, right?
The main general idea is, you know, you build a PDF, a probability density function, a model that describes the distribution of the data, and draw more random samples from that PDF, right?
That's one way.
Sometimes you need to go fancier than that.
And that's actually really useful for a lot of AI tools, you know, if you want to have a more biased sampling of zero versus one data, true versus false data for building an AI classifier that's very useful.
So I just wanted to close that loop.
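As a minimal example of that "fit a PDF, then sample from it" idea, using a kernel density estimate as the density model; nothing here is Ocean-specific.

```python
# Minimal synthetic-data-generation sketch: fit a density model (here a Gaussian KDE)
# to real observations, then draw new samples from it.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
real_temperatures = rng.normal(loc=15.0, scale=4.0, size=500)  # pretend "real" weather data

kde = gaussian_kde(real_temperatures)   # the fitted PDF
synthetic = kde.resample(1000)[0]       # draw 1,000 synthetic samples from that PDF

print(real_temperatures.mean(), synthetic.mean())  # the distributions look alike...
# ...but the synthetic set contains no information that wasn't already in the original.
```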
On your question of, can you tell if it's been artificially generated or not?
There are watermarking techniques out there for AI.
I just saw, even this past week,
I think it was OpenAI or some other team talking about watermarking inside some of their
ChatGPT-generated responses.
And of course, though, there's going to be an arms race back and forth with that, right?
You know, someone with their own AI technology will take a ChatGPT-generated output, try to figure
out what ChatGPT is measuring against it, and optimize against that, right?
So this happens all the time, you know, back and forth optimizing internally in a loop.
And this is the idea, you know, one subfield of that is called generative adversarial
networks, GANs.
And this was really hot in the world of AI two or three years ago.
And it's used a lot, right, especially in a lot of the world of game playing and all that
we see from DeepMind, AlphaGo, all that.
So overall, to your question, the answer is no.
There's no great, perfect definitive way.
But there's a bunch of approximate stuff that helps you get along the way.
But this is where we come to the economic side, right?
Are you going to train on data that you don't know where it comes from if you're going to get monetization from it, right?
And I know, like, a friend of mine had been running data at SoundCloud.
This was like eight years ago, 10 years ago.
And they actually came up with a really awesome AI model to predict, to recommend songs.
But the problem is they didn't know where a bunch of that data came from initially.
And they were really worried about copyright issues, et cetera.
And in the world of music, copyright is insane, right?
Like universal music and otherwise will come after you in a heartbeat.
So they actually never shipped that product because of this worry.
And we have this problem right now too with a lot of the art being generated from Stable Diffusion, etc.
A lot of that is almost certainly, you know, copyright infringing, right?
And a good example: in copyright, basically, if you have a work that you have created
and it riffs even, say, 10% on someone else's work, then you need to get permission from that person, right?
A great example of this is Vanilla Ice with his song Ice Ice Baby.
You know, David Bowie came after him saying, hey, this is too much like one of my songs.
And in the end, David Bowie won, right?
Who was right?
I don't know, but the point is that this is how copyright works, right?
So there's going to be a push in the world of AI for sort of IP clean, you know,
not just for image generation, but for data generation in general.
And then provenance is really going to matter.
So, you know, going on chain is going to help a lot.
And this is where Ocean Protocol can help a lot.
In my view, you know, like this is, it's helpful overall.
But, you know, the bigger picture of AI, like, you know,
we're not going to be worrying about copyright in half a year or one year.
There are much bigger, crazier things that are going to be happening in the world of AI soon.
So we can get into that in a bit here.
But, you know, copyright is the least of our concerns.
So just to clarify, say, I have something like stable diffusion.
And basically it's trained on, say, 5 million different pictures, right?
And basically one of those pictures is my picture.
But basically there was, I mean, those pictures all came from somewhere, right?
So basically, are there consequences for the AI that's being generated or the model that's being generated by what it's been trained on?
So basically, are you saying that training something on copyrighted
or copyrightable things is not necessarily allowed or allowable?
Well, you can do it, but it's going to be hard to make money from that, right?
Let's say, for example, I train an AI on a million different songs,
including David Bowie's song Under Pressure, I believe it is, right?
And then my AI spits out a song that basically looks like, you know, Ice Ice Baby from Vanilla
Ice, right? And that would have meant it
was probably partly inspired by David Bowie's Under Pressure, right?
So David Bowie will say, hey, like, it doesn't matter whether it went through the AI's brain and computation, or my brain and computation, or Vanilla Ice's brain and computation.
At the end of the day, it's someone trying to make money from it, right?
Not the AI, but someone, right?
So Vanilla Ice tried making money from it.
And in the end, it didn't pan out.
So if it's some AI researcher trying to sell that song that was generated, but it sounds an awful lot like
Under Pressure, then David Bowie has every right.
And at the end of the day, who decides?
The courtroom decides, right?
That's the ultimate arbiter that's very expensive.
But what will happen, I'm sure is going to happen is we're going to have, you know,
thousands of letters sent out by people, you know, copyright lawyers.
And there's actually letter factories that do this already on behalf of the music
industry and the image industry, for Getty, for Universal, et cetera.
And they're going to be basically emailing all the people doing these AI images saying,
This is under copyright, you know, cease and desist, right?
Otherwise, we're going to charge you, you know, $1,000 or $10,000 for every image that you're selling without our licensing, right?
So it's going to get shut down and that's going to happen within a year.
It's sad, but this is how it is, right?
This is simply the state of the art of legals.
And that's not going to change anytime soon because, you know, the lobbies are too powerful.
And ultimately, it's kind of sad because copyright was meant to protect the artist.
And in this case, it actually is protecting David Bowie or Picasso or whatever.
So, you know, if you're a creator and you want your stuff to be remixed and you want to get a cut, then go and publish it CC0, where you're saying, I'm going to give away, you know, I give away all my rights to this and people can do whatever they want with this.
And if you want, go and pay me 1% afterwards as a thank you, right?
And I think that's a really great direction for people to go.
And Simon de la Rouviere has written blog posts about this direction for NFTs, etc.
Publish the NFT CC0 and that way the existing legal system can't interfere.
And instead, it's all about do you have the private keys to that particular image?
Do you have the provenance?
And I think that's by far the healthiest thing you can do, right?
And so we still have this corpus of body of work of all the music of the past and images of the past.
It's going to be problematic.
But the more people can go CC0, the more they should.
Data has the opportunity to go there sooner, because we don't have as much data where people care about back catalogs.
So, you know, everyone out there thinking about data, please publish it CC0, protect it
with blockchain, and let technology protect you rather than this archaic legal system.
That's my summary.
Super interesting.
So say I have an AI and it generates a song or picture that is not sufficiently like any of the
pictures it has been trained on to kind of trigger those copyright infringements.
Do I still somehow infringe on the
copyright of the artist just by training on the picture? Because, I mean, clearly
humans do this all the time, right? Say I'm an artist, I can go to the National
Gallery, I can look at pictures and, yeah, it kind of does something to my brain
and may kind of have an effect on what I output after. But is this different
for AI? So, I mean, can
I demand that an AI not be trained on a picture of mine?
So basically, it's just training an AI on a picture that I own.
Is that copyright infringement?
Well, if it's a picture that you own, that only you care about, then it doesn't matter, right?
But let's say I train an AI on a picture you own, and I don't do anything with it.
I don't publish it.
I don't try to get any financial gain from it or anything.
Then no one cares, right?
It doesn't matter, right?
A good example of that is about 15 years ago,
this musician came along and took the White Album,
I think from, I forget who, and the Black Album,
I think the White Album from the Beatles and the Black Album from Jay-Z or something,
and mixed them, called it the Grey Album.
Danger Mouse, yes.
But that was totally copyright infringing, right?
But they just dumped it onto BitTorrent and just left it.
They didn't do anything else with it.
So it didn't matter what copyright lawyers tried to do in court because there was no monetary gain.
It didn't matter.
Like, what are you going to sue for?
So that's completely okay.
So ultimately it comes down to: is someone trying to make money from it, yes or no?
But if basically, if I let someone use my AI, right, then I am making money from it because basically, so that would then not be okay.
So basically I need to own the data.
I need to own rights to the data that I train my AI on if I want to use the AI commercially.
Yeah, yeah.
Okay.
Well, yeah, exactly.
Yeah, it's kind of too bad that, you know, well, it's actually in many ways like copyright law can be used for good or for bad, right?
And, you know, part of the vision in ascribe, and ascribe once again was the NFT marketplace on Bitcoin, basically, right?
Part of the vision was seeing that artists didn't really understand copyright or any of that.
They didn't know how to leverage it to protect themselves.
So when they registered their work on a scribe, you know, and, you know, had the proof of claim on Bitcoin, blockchain, etc.,
they were also getting this copyright claim with pure legal language, et cetera.
And then when they transferred ownership to someone else, it was a license to the next person.
Then they didn't have to think about the legals, right?
It was just there out of the box for them.
So, you know, there are kind of two ways to protect yourself overall.
One is legal system, copyright, all that, lawyers, courts.
The other way is technology, right?
And so traditionally it's all been legal system, et cetera.
And blockchain offers the possibility to have, you know, really great protection via technology.
And you can have both.
That was kind of a bit of the vision of a scribe.
But, you know, now we're getting to a realm where we can potentially get away with mostly, or entirely, just blockchain, if we have the right technology, and we're getting there, right?
So that's why I was suggesting the CC0 thing, et cetera.
So overall, copyright is your friend, even right now, with Ocean Market, with Ocean in general, and with other data NFT marketplaces too, the ones that think about this well, right?
Like, Rarible has awesome licensing. They were very thoughtful about it. Full kudos to the Rarible team.
Other NFT marketplaces, it depends on the marketplace, but some are really not great.
So with the status quo, right now we're in pretty good shape for all NFTs, and in very good shape for data NFTs.
Going forward, though, let's all transition to evolve beyond what the legal system offers and have something pure and clean with just blockchain. That's the vision, right?
Everything on L1, decentralized, immutable, permissionless.
Then you don't need this sort of archaic system of copyright, etc.
Let's kind of change gears a little bit.
Let's talk about your marketplace.
So who are your users at the moment?
I mean, both buy side and sell side.
And do you think those users, those user groups will change?
Yeah.
So maybe to go more general: Ocean Market is one of the many applications or uses on top of the Ocean stack.
So to refresh, there are the smart contracts that are deployed to five chains right now in production.
There is the middleware, including the JavaScript
and Python drivers slash SDKs.
And then there's front-end apps.
So Ocean Market is a front-end app.
Like I mentioned before, there are more than 1,500 assets for sale on it.
And there are buyers and sellers.
If you look at the data for sale,
some of the things with top volumes include some video game data assets. Other assets are around NFTs and DeFi.
There are some virtual world assets that are quite popular.
So that's sort of now, but I see it as really such early days
that it's not predictive of what things will look like,
you know, one year from now, five years from now, whatever, right?
It's just sort of what's getting going now.
Besides what's in Ocean Market, there are many really great teams that have emerged in the ecosystem doing really great stuff that is, in many ways, independent of what's going on in Ocean Market.
So, I briefly mentioned before, there's a team called Algovera that is building a Web3 version of Hugging Face. For your audience, I'll give some background.
Probably most of your audience knows GitHub, right? It's a front end for Git that is really great, that most developers use to publish their code, to have version control, and to share and work collaboratively.
And, you know, there are Web3 versions of that coming along too, of course.
And imagine if you're an AI researcher, a data scientist, whatever, is there an equivalent
to GitHub for sharing your models, to iterate on your models, your scripts around that?
And there have been many tools like that over the years, such as OpenML, but Hugging Face really
emerged in the last year, year and a half as it just blew up and took off in the best possible way
for data scientists to share their models. So it's got tens of thousands of models now, I think 50,000 last I checked.
There are tens of thousands of data scientists on there, using and iterating.
Now, it is still Web 2. But it's really great, right? You know, one step at a time.
So HuggingFace has a really great community.
You know, world-class engineers working at Hugging Face,
iterating with the community.
They're publishing, you know, the latest large language models
within, you know, a week of them being published elsewhere, etc., etc.
So that is the state of the art for sort of community of data scientists out there in the world.
But of course, with Hugging Face, it's really hard to monetize your model, right?
What if you're a data scientist and you want to make a living by buying and selling models?
What about having different access control?
And Hugging Face itself is based in the USA; what if you have some models that the U.S. government doesn't like, right? From, say, Russia or something.
Should Hugging Face, the company, be stopping you from publishing that?
And so this team Algovera, part of the Ocean ecosystem, has been working on a Web3 version of Hugging Face that basically has a lot of the features of Hugging Face, including the social community features for sharing models, etc., but is also working towards the monetization, et cetera.
One of their stepping stones, interestingly, was integrating Ocean itself into Hugging Face.
That got some of the functionality, but from that they realized, okay, this is getting there, but we need to have something more native yet.
So they're doing that.
Another example: in the world of Web 2, there is something called Kaggle. And this is data science competitions.
If you're a data scientist, maybe a PhD student, you're probably poor. Most PhD students are pretty poor, and you want to make an extra buck. Can you do it by using your AI chops, right?
Win a thousand dollars here, $5,000 there, whatever. Even 500 bucks makes a difference.
So Kaggle came along, I think in the mid-2000s or so, out of Australia, and pretty quickly it blew up in a good way and became the de facto platform for competitions among data scientists.
And it riffed on the Netflix Prize, which was a million dollars from Netflix for improving their recommendations.
But anyway, Kaggle, over the years, has been the leading platform for data science competitions.
And any time you go to Kaggle.com, K-A-G-G-L-E dot com, you'll see 10, 20, 50 competitions for data scientists to participate in, with prize money of 100 bucks, 1,000 bucks, 10,000 bucks or more.
Once again, there are challenges, because it's a centralized company. In fact, Kaggle was bought by Microsoft, sorry, Google, a few years back.
So it's controlled by a centralized company, with similar issues to what Hugging Face has, right? The censorship issues, the control issues. As well as privacy, in this case, too.
There are large corporations that come to Kaggle slash Google and say, hey, we want to have this private competition, and then Kaggle will hook up maybe their top 100 data scientist participants and loop them into this thing under KYC, etc.
What if you could do all that trustlessly? So there's a team now called DeSense, D-E-S-E-N-S-E, that is doing basically decentralized Kaggle, overcoming the issues of Kaggle, or decentralized data science competitions in general.
And we've been running our own data science competitions in Ocean for about a year now, Ocean Data Bounties, for things like predicting the price of ETH, and more. And now we are starting to use this DeSense platform, because it's gotten mature enough to use.
So that's also really healthy, right? It's sort of a proven model from Web 2, with issues in Web 2 that we can address in Web 3.
Besides that, maybe I'll just run quickly through a few more. There's this idea of federated learning, which Google, you know, pioneered, but others have been driving it too, where you want to collect together datasets from, say, 10 or 100 or 1,000 different hospitals to, say, predict cancer better, right?
But if you collect that together centralized,
it's a privacy nightmare, of course.
Google and others push something called federated learning,
where they said, let's collect it all together,
but it's still running on Google data centers, et cetera.
So it's sort of like privacy theater.
Google can still see it.
They just pretend that they can't.
But what if you could do a truly decentralized federated learning,
right?
And that's where ocean stack can really help.
So there's a team called FELT Labs that's doing that, truly decentralized federated learning, leveraging Ocean Compute-to-Data and more,
to enable things like, you know, gathering together data sets across a thousand or 10,000 hospitals
so that you can build the model to predict cancer across, say, a billion people, which is amazing,
right? And that makes a huge difference. The sooner that you can detect cancer, the better,
you know? Imagine you can detect, say, lung cancer at stage one instead of stage two. That
might make all the difference for survivability. So these things really matter. Continuing on: there's DataDAO, which is basically doing data co-ops, data DAOs. And there's deltaDAO, which is working closely with Gaia-X.
Gaia-X is a Europe-wide data initiative that's being driven by the German government, the French government, and more,
around trying to help ensure that Europe itself has data sovereignty.
They really don't like the idea that most Web 2 apps, etc., are running on AWS or Microsoft infrastructure on data centers in Europe, but controlled by American companies.
This is actually very dangerous to the sovereignty of Europe.
So they say, we want to be able to not rely on this.
What do we do?
And so they kicked off this initiative called Gaia-X about four years ago.
And the vision of it was actually pretty similar to Ocean.
And when we saw it, we're like, great, that's pretty cool.
And we've been monitoring it and we were early members, et cetera.
And now there's this team called deltaDAO that is just a phenomenal team.
And they're working with the Gaia-X main core team, the CTO, CEO, etc., as well as the many various spokes of Gaia-X for various use cases: for automotive, for financial big data, for agriculture, and so on.
And they're using Ocean across the board, right? So this is Ocean not just for individuals in a pure decentralized Web3 fashion, but really serving nations and, geopolitically, more broadly. And we're happy to serve that, right?
Ocean is meant for all, like I mentioned earlier in the call. It's around leveling the playing field for everyone, whether you're an individual, a family, a small startup, a large startup, an enterprise, a city, a government, a nation, multiple nations, right?
We're not trying to say no to nations. We can't. It's permissionless, right? And in fact, we think it's really helpful if everyone is leveraging this technology to manage their data assets.
So that's basically a run across the gamut of many excellent projects in the Ocean ecosystem. There are many more, right? This is the tip of the iceberg.
Actually, one recent one that's probably worth highlighting is something called Ottawa Ocean. And the reason it's worth highlighting is that the builder has a website that shows different stats for Ocean and its traction.
And one thing that's pretty cool is that, as of about a week and a half ago, we crossed $1 million in consume volume per week for Ocean. So that's a run rate of more than $50 million a year.
Now, that comes with a big caveat: a lot of that is driven by our data farming program, and maybe I'll get into that in a bit. You can ask about the caveat, but I'll stop there, because otherwise I'll just keep going.
Yeah, let's stop there. Let's talk about data farming. Let's get into it.
Sure, great. So data farming, at the top of it, is inspired by this idea, this superpower of blockchains: you can get people to do stuff by incentivizing them with tokens. And Bitcoin is the OG at doing this, right?
Bitcoin was set up such that it incentivizes people to maximize its security. It's sort of an objective function called maximize security, right? And how does it do it? Well, it measures security as the hash rate of the Bitcoin network.
And so, if you contribute, right now, every 10 minutes, Bitcoin will pay out 6.25 Bitcoin pro rata to the people who have added to the security of the network.
And that's pro rata in an expected value sense.
So if I add 10% to the hash rate of Bitcoin network,
so 10% of the security of it,
I can expect every 10 minutes to get 0.625 Bitcoin.
Now, that's lumpy, of course. On average, one in 10 times I'm going to get 6.25 Bitcoin, and the other nine times I'm not.
You can smooth that out if you want, if you join a pool.
But overall, you're getting an expected 10% of the rewards every 10 minutes when it pays out.
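As a quick aside, here is that expected-value math as a minimal Python sketch; the 6.25 BTC subsidy and 10-minute block interval are the figures quoted above, and fee income is ignored:

    # Expected mining reward, pro rata on hash rate share (6.25 BTC subsidy assumed, fees ignored).
    BLOCK_REWARD_BTC = 6.25
    BLOCKS_PER_DAY = (24 * 60) // 10  # roughly 144 ten-minute blocks per day

    def expected_btc(hashrate_share: float, blocks: int = 1) -> float:
        """Expected BTC over `blocks` blocks for a miner with `hashrate_share` (0..1) of network hash rate."""
        return hashrate_share * BLOCK_REWARD_BTC * blocks

    print(expected_btc(0.10))                  # 0.625 BTC expected per ~10-minute block
    print(expected_btc(0.10, BLOCKS_PER_DAY))  # roughly 90 BTC expected per day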
So that's the objective function that Bitcoin is going for. And other protocols do this too, right? Any other proof-of-work network, at that level, is maximizing security. Ethereum until recently was like that; Gnosis Chain until recently was like that, et cetera.
But you don't have to stop there. You can use incentives to incentivize other things too. And so with Ocean, you know, I thought about this years ago and I realized, wow,
this is really similar to optimization, right? If you're from the world of AI or optimization, you're used to writing down an optimization problem, which is a set of objectives and constraints: maximize this and minimize that, subject to these three constraints being met.
So a lot of the early ideas in Ocean were like, okay, hey, let's leverage the idea of incentives to maximize the traction of the Ocean network, towards getting Ocean ubiquitous for this level playing field, right?
And the way you get to ubiquity is sustained growth over long periods of time. Facebook is ubiquitous because it kept growing at 10%, 30% a month for years and years, right?
So that's what you want. You want ubiquity the way that the web is, the way that the internet is, etc. You do this by growth, and you can catalyze growth by leveraging incentives.
And how do you leverage incentives? It's all about this objective function, or set of objectives and constraints.
So in Ocean, we said, okay, what do we want to try to maximize? We could say, let's try to maximize the amount of data that gets published. You do that, great. But then you'll have people publishing tens of thousands of data sets and there's no consume, right? So that doesn't work, right?
That's basically wash publishing. And people have seen this happen: in an NFT marketplace, they tried it and they got wash publishing, and they fixed it after a while, but it took a while.
You see this happening all the time. You'll see it with exchanges and wash trading, right? People are used to wash trading, where exchanges say, hey, look at all this amazing volume we have, but it's just crazy wash trading, right?
And so in Ocean we said, okay, well, what is the best measure of value creation? That's what you really want in a data system, right? And like I mentioned before, value creation is really where the rubber hits the road for the quality of data.
And you can measure this by, well, let me just talk about the data value creation loop first. In the data value creation loop, someone spends money to buy data or create their own data, then they use that to build an AI model. The AI model predicts something, basically, and guides them to an action. They execute that action, and from that action they make money. And then with that money they've made, they use it to buy more data or create more data, and they loop around.
The loop can run quickly, with very low latency, or slowly, with maybe a span of five or ten years. A fast example would be DeFi, for low latency. A slow example would be, say, medical health data for prediction of cancer, because you need FDA approval, et cetera. So, five years, ten years.
Okay.
So, the data value creation loop. That's the name of the game.
You really want everyone in this data ecosystem to go through this data value creation loop.
So when we're making incentives, we want to incentivize people going through this data value creation loop.
A simple way to measure this is simply at the point of them buying data and then at the point of them consuming data. If they're doing both, then it's basically a test for them going through this creation loop.
So that's what data farming measures. We say: we're going to reward you if you are driving data consume volume.
Once again, data consume volume is the amount of money spent on buying data and consuming it in a given time interval, in a week, in a month, or whatever. So that's what we reward people for.
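A minimal sketch of that metric as defined here, in Python; the event fields are illustrative, not Ocean's actual schema:

    # Data consume volume: money spent on assets that were bought AND consumed within a time window.
    from datetime import datetime, timedelta

    def consume_volume(events, window_start, window_days=7):
        """Sum spend over events that fall in the window and were actually consumed."""
        window_end = window_start + timedelta(days=window_days)
        return sum(
            e["price"]
            for e in events
            if window_start <= e["timestamp"] < window_end and e["consumed"]
        )

    events = [
        {"price": 120.0, "timestamp": datetime(2023, 1, 2), "consumed": True},
        {"price": 80.0, "timestamp": datetime(2023, 1, 3), "consumed": False},  # bought but never consumed: not counted
    ]
    print(consume_volume(events, datetime(2023, 1, 1)))  # 120.0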
Now, we also want to drive staking in the Ocean ecosystem, and we want to have some form of curation, and I hinted at this before.
We have curation in terms of data consume volume, which is a great signal for curation, but we've also got OCEAN token holders, etc. They want to stake, they want to put their money where their mouth is.
And when you stake, we get people to point their stake towards the assets they think will have high volume, high data consume volume, to be precise. That's the heart of data farming.
And then they get rewarded based on how much stake they have on those assets and the volume of those assets.
So it's those two things together. The reward is a function of the volume of a given data asset and your stake in it, and that couples the two things.
That's the heart of data farming. That's the objective function.
And this therefore incentivizes data consume volume, but it also really helps curation, because people want to point their locked OCEAN to where they're going to get rewards.
And it's zero-sum on how much locked OCEAN they have, right? So they want to point it towards the high-volume data assets. They're incentivized to point to those, and they predict what those might be.
So that's what we have.
One detail there: I've talked about locked OCEAN a couple of times. What does that actually look like?
Within the world of blockchain, there's this challenge. There are the short-termers, the degens, the apes. They buy, they sell, they go; they're mercenaries, whatever.
And then there are the long-termers, the people who are really thinking about things for the long term: one year, four years, ten years, whatever, right?
I'm certainly a long-termer, but at the same time, a huge portion of the market overall is short-termers. And if you ignore the short-termers, then you're kind of at a disadvantage for your own long-term growth.
So the question is, how do you reconcile this? Can you? Is it kind of crazy to think you can?
So it turns out that the folks at Curve cracked this problem with something called veCRV.
They said: we're going to have something called veCRV where you take your CRV and you lock it, and you get rewards as a function of how much veCRV you hold.
Critically, veCRV is not transferable.
If you lock your CRV for four years, you get one veCRV. If you lock it for two years, you get half of one. If you lock it for one year, you get a quarter of a veCRV, and less and less, of course.
So if you want to maximize your yield from CRV, you lock it for four years, and then you point it towards, in Curve's case, its own AMMs and stuff, right?
And that reconciles, for Curve's case, the near-term degens, the apes, etc., with the long term, right?
In Ocean, we do similarly. We introduced veOcean in the fall, a few months ago, and the mechanics are just like veCRV.
It's actually a copy-paste of the Curve contracts, right? We think they're wonderful, beautiful, and they've been battle-hardened over years.
So we use those, and people can take their OCEAN and lock it for up to four years. If you lock for four years, you get one veOcean.
And then, in our case, you point your veOcean to various data assets, the data NFTs.
If you point your veOcean to data NFTs with high data consume volume, you can earn a lot, right? Ten, twenty percent or more APY. Or if you point it to really bad assets that no one's consuming, then you get a much lower return, right?
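A minimal sketch of the lock math described here: the voting-escrow balance scales linearly with lock duration, up to four years. Numbers are illustrative only; the actual veCRV/veOcean contracts also decay the balance as the lock runs down.

    # veOcean received for locking OCEAN: linear in lock duration, capped at four years.
    MAX_LOCK_YEARS = 4.0

    def ve_balance(ocean_locked: float, lock_years: float) -> float:
        return ocean_locked * min(lock_years, MAX_LOCK_YEARS) / MAX_LOCK_YEARS

    print(ve_balance(100, 4))  # 100.0 -> one veOcean per OCEAN at a four-year lock
    print(ve_balance(100, 2))  # 50.0  -> half
    print(ve_balance(100, 1))  # 25.0  -> a quarter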
So that's the summary of veOcean. And by doing this, we get this really great new signal for curation.
In fact, it's such a good signal that when you go to Ocean Market and browse, it actually sorts the results by the amount of veOcean pointed to the various data assets. That's the default sort.
So that is data farming and veOcean.
They go hand in hand.
And yeah, the cool thing about this: it's staking, but there's no risk of impermanent loss like you might have when adding liquidity to an AMM or anything, yet you still get really great yields.
And it has this really great mechanic for Ocean: we get curation. At the same time, it's sort of optimized for the token itself, right? Because we're reconciling near-term incentives with long-term incentives, and, for a given demand, it's reducing circulating supply.
So it's sort of a win across the board. We're quite happy with it.
So, yeah, I'll summarize there.
I'll stop there.
Maybe I'll summarize for everyone. veOcean allows people to lock their OCEAN for up to four years and then use that to gain rewards in the data farming program.
Data farming gives rewards based on how much veOcean you have locked and allocated to assets, and the volume of those assets.
And these things overall drive traction for Ocean. And ultimately for us, our idea of driving traction is driving data consume volume. That's the Ocean equivalent to Bitcoin, where in Bitcoin it's about driving security.
I don't understand part of the curation part, I think. So ideally, when you pay people for curation, you particularly want to pay the people who surface as-of-yet undiscovered content.
And if I listen to your explanation, economically the smartest thing would be to just point at whatever most people are already pointing at, because that's what's most discoverable.
So basically there's almost no incentive for finding the hidden gem, right?
Yes. Well, it's twofold; that is a great question. So if no one else has pointed to a given hidden gem yet... actually, I'll explain the mechanics a bit more.
The way the mechanics work is that each data asset gets rewards allocated to it each week, pro rata on how much volume it has.
So if one asset has 50% of the data consume volume, another asset has 25%, and another has 25%, then the rewards, which right now are 75,000 OCEAN a week, on the order of $20,000, get split accordingly. The asset with 50% would get about $10,000 a week allocated to it, right?
If you're the sole staker on that asset, then you get all of the reward going to that asset. If there are two stakers on that asset, and they're each staking equally, you each get half of the reward going to that asset.
So that is how it is right now, which means that right now you want to find the undiscovered gems, the assets that are just about to hit high consume volume and that no one else has staked on. That's sort of the incentive right now.
But as soon as you find one, other people might do a fast follow and get rewards for the rest of the week too. We measure how much you've staked on an asset many times through the week, something like 50 samples a week right now.
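Here is that worked example as a small Python sketch, using the 75,000 OCEAN weekly budget and the 50/25/25 volume split from above; the stake figures are made up for illustration:

    # Weekly data farming rewards: split across assets by consume-volume share,
    # then across each asset's stakers by their share of veOcean allocated to it.
    WEEKLY_REWARDS_OCEAN = 75_000

    def asset_rewards(volume_by_asset):
        total = sum(volume_by_asset.values())
        return {a: WEEKLY_REWARDS_OCEAN * v / total for a, v in volume_by_asset.items()}

    def staker_cut(asset_reward, my_stake, total_stake_on_asset):
        return asset_reward * my_stake / total_stake_on_asset

    rewards = asset_rewards({"asset_a": 50, "asset_b": 25, "asset_c": 25})
    print(rewards["asset_a"])                    # 37,500 OCEAN, roughly the $10k/week mentioned
    print(staker_cut(rewards["asset_a"], 1, 1))  # sole staker keeps the whole allocation
    print(staker_cut(rewards["asset_a"], 1, 2))  # two equal stakers split it 50/50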
So that's how the mechanics are. But also, what about someone publishing the asset in the first place, right? Should you be incentivized there too?
That's the second part, and we're actually just about to inject extra rewards for people who publish themselves too.
Now, with all of this, there is one caveat, and that's wash consume. I can get into that, but I'll pause there.
Hopefully that answers your first question about people basically lazily following everyone else. There's a bit of that right now, and for right now we're happy with that, but we keep evolving the objective function.
Just like someone trying to solve an optimization problem, you might run your optimizer, see how it works, and then change the objectives and constraints.
And this is exactly what we're doing. Week by week, month by month, we keep tuning the objectives and constraints based on what we've learned.
And this is similar to liquidity mining programs such as Balancer's, where they came out with a very simple objective function at first for driving their own liquidity, the Balancer liquidity mining program, but then, as time went on, they kept tuning it.
One example: at first they gave equal reward for people adding liquidity to a stablecoin pool versus adding liquidity to, say, an ETH-BAL pool. But of course, the risk of impermanent loss is much higher on the ETH-BAL pool, right?
So after a few weeks they changed it such that you only get a 10% multiplier on the stablecoin pools, because that sort of balances out the risk.
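For illustration, here is how a reward multiplier like that plays out in a liquidity-mining split; the multipliers and liquidity figures are made up, not Balancer's actual parameters:

    # Rewards split pro rata on multiplier-weighted liquidity; stablecoin pools get a reduced multiplier.
    def weighted_rewards(pools, weekly_rewards):
        """pools: {name: (liquidity, multiplier)} -> {name: reward}"""
        weighted = {name: liq * mult for name, (liq, mult) in pools.items()}
        total = sum(weighted.values())
        return {name: weekly_rewards * w / total for name, w in weighted.items()}

    pools = {"USDC-DAI": (1_000_000, 0.1), "ETH-BAL": (1_000_000, 1.0)}
    print(weighted_rewards(pools, 10_000))
    # With equal liquidity, the stablecoin pool now earns one tenth of what the volatile pool earns.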
So we're doing a similar thing.
And by the way, Balancer kept evolving over time, and once it got hardened enough, they handed it off to DAO-like governance of that objective function.
A similar thing is happening with Ocean. So Balancer was a big inspiration here, and kudos to the Balancer team, a great team.
And yeah, in the near term, there's the small perverse incentive you describe, where there's not quite enough incentive for people to publish their own assets or uncover hidden gems themselves. It's not fully solved yet, but that's actually changing in the coming weeks and months.
Cool. Unfortunately, we're kind of running out of time a little bit. I would have been really interested to also hear about AI DAOs, but maybe we can leave that for another time.
Tell us what's on your roadmap for 2023 and how people can get in touch with the Ocean community. You also have a really cool grants program. So how does one get involved, and what's on the docket for this year?
Right. So I'm going to wrap up the data farming thing, because it's also part of the roadmap.
Right now there is still an issue with the rewards: if someone publishes an asset and then fake-consumes it, they can still make money from it. We call that wash consume.
We actually allowed that on purpose, to help drive engagement. And we are now squeezing it such that in about eight weeks or so it will be unprofitable, because you can make it so that profitability goes down over time with the right fees in the right places.
So that's changing. In about eight weeks, wash consume will no longer be profitable, and after that, the only thing that will be profitable is assets with truly legitimate consume.
So, going back to this one million per week in consume volume: it could be that 90%, even 99%, of that is wash consume. So even though I stated it as a number, it's not something to be 100% proud of yet.
I'll be proud of the number once wash consume is no longer profitable. So that's really the number to look for.
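One plausible way to picture the squeeze being described, as a toy model and not Ocean's actual parameters: a wash consumer pays fees on every self-purchase and earns back data farming rewards in proportion to the volume they generate, so raising fees pushes the profit negative.

    # Toy model of wash-consume profitability (illustrative assumptions, not the real fee schedule).
    def wash_consume_profit(self_spend, fee_rate, weekly_reward_value, total_weekly_volume):
        """Best case for the wash consumer: they are the sole staker on their own asset."""
        fees_paid = self_spend * fee_rate
        rewards_earned = weekly_reward_value * (self_spend / total_weekly_volume)
        return rewards_earned - fees_paid

    # Profitable only while weekly_reward_value / total_weekly_volume > fee_rate:
    print(wash_consume_profit(10_000, 0.01, 20_000, 1_000_000))  # 200 - 100 = +100, still profitable
    print(wash_consume_profit(10_000, 0.03, 20_000, 1_000_000))  # 200 - 300 = -100, unprofitable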
That said, though, Ocean is getting pretty awesome traction across the board, like I mentioned, with some of these other projects too.
And one other great project is H2O, the stable asset that's backed by OCEAN. They're doing some other great stuff too. So I'll mention that.
In terms of roadmap: for the last few years, since we really launched as a project in 2017, we went through V1, V2, V3, V4, as mentioned, and then launched data farming and veOcean.
If you think about building a version of something, that's really about building. And we've decided, now that we're at this maturity, having built everything we said we'd build in the whitepaper, and then some, that the name of the game is really doubling down and focusing on traction, measured by data consume volume.
So data farming is a key part of that.
And we've had two grants programs, which we've filtered down to one.
We had a grants program we called OceanDAO grants, where the community would curate on giving grants to teams to build various things, right? Anywhere from 1K, 5K, 10K, 20K per team, even as often as monthly.
And we realized there was a bit more gaming happening in that over time. And we saw there were two things we wanted to tune it towards. We wanted to make it retroactive, kind of the way that Gitcoin works, a great project. And we also wanted to make it much more objectively measured, based on data consume volume.
And we realized that if we actually built that into OceanDAO grants, we'd end up with data farming. So the end game of OceanDAO grants, once you add retroactive rewards and fully objective measurement on data consume volume, is that you literally end up at data farming.
So we actually wound down OceanDAO grants simply because we had already shipped data farming. Why build something redundant? And it allowed us to make data farming that much more awesome.
We still have another grants program called Shipyard, which is curated by the Ocean core team. The core team has a lot of context about what's important, et cetera.
And there have been several teams going through that, I think maybe 10 so far, with some really nice successes there. So that's continuing to happen as a grants program.
If you're interested in Shipyard, just go to oceanprotocol.com and click on the top right; there's a link to funding or Shipyard, I forget exactly.
Other ways to get involved: you can go to Ocean Market, buy assets and consume them as a data scientist. You can publish assets, as a data scientist or otherwise.
If you hold OCEAN, or if you buy OCEAN, you can lock and stake it, and you'll get passive rewards; basically half of the data farming rewards are simply for passive holding of locked OCEAN, and the other half are for the active allocation. So you can be a passive holder and still get some yield.
There's the grants.
And then finally there's the building side. There are sort of two approaches to the building side: the Python approach and the JavaScript approach.
The Python side is really tuned towards data scientists, and the JavaScript side is really tuned towards app developers, dApp developers.
On the data science side, if you're a data scientist and you want to build some really cool ML model, and have some data science flow going on where you're publishing your assets, selling them, consuming them, whatever, then your main place to go would be ocean.py.
That's a GitHub repo, github.com/oceanprotocol/ocean.py, and its README has quickstarts. And to be honest, I think it's actually one of the best ways for anyone in Python to get into Web3 in general. You can just go there, do those quickstarts, and within an hour you're going to be doing all this stuff with publishing assets and more. It's quite fun.
On the JavaScript side, there's ocean.js, a similar repo name. You can go to that repo and go through its quickstarts; that's for building apps, of course, and it's one path on the ocean.js side.
Also, we've got really great docs at docs.oceanprotocol.com to help overall conceptually, but also to flesh out the JavaScript side a bit more.
We've got one blog post that talks about how to fork Ocean Market to have your own market, or just use some of the components of Ocean Market to build your own dApps, whether it be your own data wallet.
There's a project called Data Whale that has a datatoken-focused wallet, a really first-class app and wallet. I'm really proud of that team.
Or other things, right? Data unions, et cetera, et cetera.
So those are the various ways you can get involved.
From a pure building side, there's the Python side with ocean.py or the JavaScript side with ocean.js. There's the higher level with Ocean Market.
With the OCEAN token, there's pure passive staking, or maxing out your yield with data farming. And then finally, there's getting grants around any of these.
Cool.
Thank you, Trent.
It's been a pleasure having you on again, always super good conversations.
And I look forward to the next time.
Cool, for sure.
Yeah, there's so much on just the AI side too. That's where Ocean is headed, right? Much more deeply into the AI, et cetera.
Basically, to wrap up: up till now, it was sort of phase zero of Ocean, building the stack, doing all the hard technology leaps. From here on, a lot of it is basically doubling down on getting Ocean used in all these really cool ways, more and more deeply for AI.
So I'll conclude with that: leveling the playing field for data and AI. And once again, thank you for having me.
Thank you for coming on.
Thank you for joining us on this week's episode. We release new episodes every week. You can find and subscribe to the show on iTunes, Spotify, YouTube, SoundCloud, or wherever
you listen to podcasts. And if you have a Google Home or Alexa device, you can tell it to listen
to the latest episode of the Epicenter podcast. Go to epicenter.tv slash subscribe for a full list
of places where you can watch and listen. And while you're there, be sure to sign up for the
newsletter, so you get new episodes in your inbox as they're released. If you want to interact
with us, guests or other podcast listeners, you can follow us on Twitter. And please leave us a
review on iTunes. It helps people find the show, and we're always happy to read them.
So thanks so much, and we look forward to being back next week.
