Orchestrate all the Things - Compute to data: Using blockchain to decentralize data science and AI with the Ocean Protocol. Featuring Ocean Protocol Foundation Founder Trent McConaghy

Episode Date: June 19, 2020

The conflict between access to data and data sovereignty is key to understanding how AI works, and moving it forward. The Ocean Protocol Foundation wants to help resolve that conflict by introducing a way of letting AI work with data without giving up control. The goal is to decentralize data science and AI using blockchain, and the latest milestone, called Compute-to-Data, brings this one step closer. We connected with Trent McConaghy, Ocean Protocol Foundation Founder, to discuss how it all works. Article published on ZDNet in June 2020.

Transcript
Welcome to the Orchestrate All the Things podcast. I'm George Anadiotis and we'll be connecting the dots together. Today's episode features Trent McConaghy, founder of the non-profit Ocean Protocol Foundation. The conflict between access to data and data sovereignty is key to understanding how AI works and moving it forward. The Ocean Protocol Foundation wants to help resolve that conflict by introducing a way of letting AI work with data without giving up control.
The goal is to decentralize data science and AI using blockchain, and the latest milestone, called Compute-to-Data, brings this one step closer. We've been working on the Ocean Protocol project since summer of 2017, and it actually itself was derived from earlier blockchain projects that we had: Ascribe, which was about blockchain for IP, and BigchainDB, which was about bringing the world of big data to blockchain.
And, you know, drawing on the background of those, as well as my own background in artificial intelligence going back to the 90s, we realized that the world of AI had a really big problem. And that was that if you're an AI researcher, you want more data. It really helps to have a lot more data to, you know, make more accurate AI models. And AI researchers didn't have very good tools to get more data. They had some tools to need less data, but many people had shown the unreasonable effectiveness of data itself. Because there's kind of two paths to get an accurate model. One path is to spend two years or four years investigating better and better algorithms, where you can maybe get away with 2x or 10x less data. But the other way is simply to just get more data, and if you can get 10x more data in one day or one month, then that's great news. But the question is, how do you get that data? So we saw this as a challenge for the AI folks, and we saw that blockchain and cryptography tools could really help to address that problem. These are tools that aren't really common in the AI space, but we could bring those over. And at the same time, we saw that there was data out there that was not being leveraged, most notably the large data sets in large enterprises
that they were very reluctant to make available at all because of issues of control and privacy. You know, they were worried about data escapes. And, of course, a lot of that data is customer data that has personally identifiable information, and you don't want that to get out. So basically there's sort of two options, right? One option is to do nothing and then just have less accurate AI models. But that leaves a lot of opportunities on the table, things like, you know, imagine if you could have more accurate AI models for modeling cancer and stuff, right? So that was one: do nothing, but then it's just really slow going. The other option that people saw was, okay, have way more data, but then have this issue of data escapes and privacy issues and control issues. And with Ocean, we realized, what if we could sort of have our cake and eat it too? What if we could resolve this dilemma to get the benefit of having way more data without these issues of privacy and control? And that's really what Ocean is about. So, you know, with that premise at the very beginning,
we saw that there were a few ways to do this. One of them was simply to better connect the people with the data and the people who want the data. And a good way to connect is simply marketplaces, data marketplaces that play this role of connector. And there have been many attempts at data marketplaces in the past, but they've always been custodial, which means the data marketplace is sort of this middleman that you have to trust. They're holding the data and all that, which is really dangerous.
It's yet another sort of trust issue, and these data escapes get even worse. But what if you could have that marketplace as the connector without it actually holding the data assets, without having to trust the marketplace? And this is where the role of decentralized marketplaces comes in. And we realized, okay, we want to have data marketplaces, but not just one. We want to have a whole ecosystem of them, because otherwise you would have, once again, sort of this centralized middleman that you don't want to trust. So you want to have a decentralized ecosystem of data marketplaces. So Ocean basically set about building this protocol to make it really easy to do this.
And this is really what we've been up to for the last two and a half years, around an ecosystem of decentralized data marketplaces. But then also, even if you have this set of decentralized data marketplaces, you still have this question: okay, if you, George, come along and want to buy data from me as an individual or as an enterprise, you buy that data.
Starting point is 00:04:58 But if you download it, then that data is still leaving my premises. It's leaving my phone or it's leaving my cloud, internal cloud, et cetera. And I don't know what you're going to do with that. So that's still problematic. So we've solved the problem about the centralized data marketplaces, but we still need to solve the problem of you getting benefits from that data without me losing control of it.
And so this is really what we've been working on quite diligently for the last year. And that's really what we have been shipping more recently with our Ocean V2 Compute-to-Data. So Ocean V2 is about resolving this other major issue, which is allowing data marketplaces to buy and sell private data without compromising privacy or control. And the big trick is that the data stays with the data owner. And then Ocean basically plays this role of orchestration, such that the compute can be brought to the data, the AI practitioner can build their models,
Starting point is 00:06:08 and then the models that they get, the resulting models, aggregate across all this information such that you can't yank out personally identifiable information or other private data. It's just sort of this aggregate. And that's the big trick, right? So just like if you have 1,000 data points, if you compute the average from that, then that doesn't have any real privacy leakage anymore, right? And so average is, you know,
a very simple model, if you will, but you can do fancier models. And that's really what AI models are doing: basically aggregating data in fancier ways such that you can make predictions and so on. And if you do a good job of it, you won't have any private info leaking. So that's kind of a summary of where we've come from with Ocean. And we've come a long way, you know, from the initial inception of the idea. We had gone through fundraising rounds and built and shipped our V1 in alpha last spring, last March, I believe, February, March. And then, sorry, that was beta. And then sort of the production version last summer.
Starting point is 00:07:02 And then for the last, you know, nine months, ten months, we've been working diligently to have this V2. So the V1 was really about the decentralized data marketplaces and addressing, you know, this issue of marketplaces as a centralized middleman. And V2 is addressing this issue of data escapes. And that's what we've been shipping just in the last few weeks here. So that's kind of where we're at with Ocean. We have many exciting things ahead to reduce developer friction more and make Ocean itself even more decentralized. But that's kind of
what we have been up to. Okay, well done. I mean, you summarized a long journey and many milestones along the way pretty fast, so I have to give it to you. Well, since I'm personally familiar to a certain degree with what you do anyway, it's relatively easy, let's say, or straightforward for me to catch up from the last time we touched base. But there are a number of things that people who are not as familiar would be curious about. So if I try to think, wearing the hat of someone who doesn't know
the first thing about what you do, but is perhaps familiar with some of the techniques that are usually applied in AI, then what I would say is that, well, okay, what you describe about the latest milestone, Compute-to-Data, the way you describe it, it sounds a lot like federated machine learning, where, you know, data stays where it is, and you just run a certain algorithm, and then the algorithm basically sends the trained model back to wherever it needs to be sent.
Would you say that, from the 10,000-feet view, it's similar? And if there's a difference, where exactly would the difference be? Yeah, that's a great question. So they're complementary, and I'll describe it in more detail. So federated learning, traditional federated learning, works like this. Let's say that you are a cancer researcher and you want to build a model that predicts whether or not someone has, say, lung cancer. And so you want to have a model that takes in some scans of their lungs and some other
things, maybe their DNA and some other things, and spits out a prediction of their chance of having early-stage lung cancer or medium or whatever, right? And so to build that model, you need to have a bunch of past examples of healthy patients and patients with lung cancer, right? And in the past, traditionally, that data is really hard to come by. You know, a friend of mine, he's an AI researcher working on cancer data, and he's a happy man if he gets a data set of 100 people, and that's basically from one hospital. And for each hospital you go to, it's a whole bunch of NDAs and agreements that take months and months and months to sign. So ideally you have data across not just one hospital, not just 10, but 100 or 1,000 or 10,000, right?
And if you have way more data, then you can have way more accurate models. With my friend, because he's got such a small number of data points, and he's taking DNA as input variables, he's got thousands of input variables but only hundreds of data points. So his models really aren't very predictive. But if he had data from across more and more hospitals, they could be very, very predictive. So the sort of naive approach to do that would be saying, okay, let's just try to pool all this data into a single big database, right? Grab data from hospital one, grab data from hospital two, and so on. And then in the end you might have data, say, from 10,000 hospitals, and maybe a thousand people per hospital, so you've got 10 million people, which is great. You can build a really great model from that, but of course this would just kill privacy, right? And so the idea of federated learning, and also what we're working on here, but I'll just focus on federated learning for a second, is to let the data stay inside each hospital silo. So how this works is, basically you bring some training algorithm to that. So you start off with just a random model, just a randomly initialized, say, neural network. And then you have this sort of little robot that drives to hospital number one, you know, electronically drives, and updates the weights, and then takes those weight updates and sends them to this neural network
that you are running centralized, basically. And then you go to hospital number two and you say, okay, what's the weight update from this hospital? And it looks at that data. There's computation running inside the hospital, but it's only to compute the weight update there. And critically, the person who has kicked off these algorithms and so on is not seeing that data in that hospital.
It's just this weight update algorithm that's only run locally. So then only the updated weights are sent back to the neural network. And by doing this, you know, hospital number one and hospital number two, and then three, four, five, and so on, each of these weight updates comes to this neural network model that's getting learned. And in the end, after, you know, 100 hospitals or 10,000 hospitals, you've got a model that's actually quite accurate. So that's the general idea of federated learning. And with traditional federated learning, it's decentralizing this last mile, with the data. But the person who kicked off the federated learning, and basically how the model gets built from each hospital, there are very centralized aspects there. And so we have the likes of Google, et cetera, promoting federated learning, but they are playing the role of centralized middleman on Google Cloud architecture, as well as some of these higher-level things. So there's risk of PII leakage, of leakage of personally identifiable information.
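To make the weight-update flow described above concrete, here is a minimal, illustrative Python sketch of federated averaging. The synthetic "hospital" data, the simple linear model, and all of the function names are assumptions made purely for illustration; this is not Ocean's, Google's, or any hospital's actual implementation.

```python
import numpy as np

# Illustrative federated-averaging sketch: each "hospital" keeps its raw
# records, and only weight updates (never raw data) leave each silo.

rng = np.random.default_rng(0)

def make_hospital_data(n_patients, n_features=5):
    """Synthetic stand-in for one hospital's private records."""
    X = rng.normal(size=(n_patients, n_features))
    true_w = np.arange(1, n_features + 1, dtype=float)
    y = X @ true_w + rng.normal(scale=0.1, size=n_patients)
    return X, y

def local_weight_update(weights, X, y, lr=0.05):
    """Runs inside the hospital: one gradient step on local data only."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad            # only these numbers leave the silo

hospitals = [make_hospital_data(n) for n in (120, 80, 200)]
weights = np.zeros(5)                      # zero-initialized global model

for _ in range(50):
    # Orchestration step (the role Ocean aims to decentralize): visit each
    # silo, collect the returned updates, and average them. A real system
    # would typically weight each update by the silo's sample count.
    updates = [local_weight_update(weights, X, y) for X, y in hospitals]
    weights = np.mean(updates, axis=0)

print("aggregated model weights:", np.round(weights, 2))
```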
So there's sort of three levels, right? There's the person who's done the kickoff of the job. There's the orchestration in the middle. And there's the hospitals at the end. Traditional federated learning has decentralized that third level, the last mile at the end, but it hasn't decentralized the other two things. And, well, the last mile is sort of decentralized, but the middleman, the orchestration, is still centralized. Where Ocean comes in is it really helps to decentralize the orchestration,
Starting point is 00:14:02 the sort of basically connecting the dots between the person who's kicked off the job and the actual weight update jobs at each hospital node. That's where Ocean really plays a strong role. And then by doing that, basically you don't have the centralized middleman doing the orchestration. So the overall idea of federated learning, we think it's a wonderful idea, and we embrace that.
But we make sure we take it another level further and address one of the major issues, which is the centralized orchestration. So Ocean addresses that, basically. So overall, Ocean and federated learning are complementary, and Ocean Compute-to-Data helps make it that much better. Okay, okay. Actually, well, actually you touched upon the role of the middleman, let's say, or the
data marketplace, as it is called in Ocean parlance. And I think this is a key role. And actually, you know, even by providing the protocols and doing what you do, you're certainly facilitating the existence of such marketplaces, of such actors. Again, having been familiar with what you do for a while, it was quite obvious to me from early on that this is a key role in the ecosystem.
And I know that you have been trying to kickstart as many of those as possible. And so what I was wondering is, how is that part of the effort going? So what's the status there, basically? How many do you currently have, and what's the outlook like? Yeah, so that's a great question, thank you. And yes, overall, one way to think about Ocean is that we have two parallel efforts for products. One of them is at the level of the platform, which is a platform to enable the building of data marketplaces, to make that really easy, right? As well as just sharing data more generally,
Starting point is 00:16:12 as well as supporting direct federated learning flows that might not even have a data marketplace. But that's the lower level platform, and it's got the blockchain aspect with smart contracts and then libraries sitting on top. The second product, if you will, is basically products for building marketplaces themselves at sort of a higher level. And we view this as two pieces. One of them is for small and medium-sized enterprises, SMEs, as well as enterprises themselves, to build their own data marketplaces and deploy them.
So that's one. And we call these third-party marketplaces. And like I said, they can be small businesses, large businesses, whatever. In addition to that, we are moving towards having our own Ocean community marketplace that is basically directly for the Ocean community, without depending on any given third-party marketplace. And, you know, they can have unified backends such that there's shared metadata. But this is sort of the offering.
And by doing it like this, these offerings are complementary on the marketplace side. So these third-party marketplaces, it's really nice because if you're a developer who's entrepreneurial, you can build and launch a marketplace quite quickly. Or if you're a business that's already existing, then with
a relatively low cost, you can just launch data marketplaces, and data marketplaces that overcome these past problems, right? They're not custodial, like I mentioned before. They don't have the data escapes. And that is really, truly killer-app functionality, right? So that's very nice.
And then from the community side, it's basically this marketplace that is right now in alpha stage that we will be rolling out as time goes on. And in that case, the additional benefit is that it's not reliant on any given third-party marketplace for their specific vertical or their specific niche. It's just a broader community. So that's what the products are. Now then, to go back to your question of, you know, where are we at with these marketplaces and customers and partners? So we've already announced dexFreight. They are a logistics startup.
And so they have trucking data that they're gathering from thousands of different trucking companies. And then we're working with them on deploying a marketplace, and we already have an alpha version of this too, a marketplace that then is consumed in a few different ways. One of them is for Wall Street types that like to use trucking data to trade stocks, etc. The other one is actually for operations research, to improve the efficiency of the logistics of the trucking itself. So, you know, when a truck figures out what load to take, often it's not well allocated. So trucks can actually be 25% empty or even more on any given load.
And these are actually the stats. You know, on average, trucks are 25% or more empty when they go around in the USA. So there's a lot of, you know, money on the table that's being left right now, and Ocean can help with this. So that's the example with dexFreight. We're also working with other partners quite intensively that we've been working with for
even going back to last summer and last fall, large enterprises and so on. And we've actually built and deployed an alpha version with one of them as well, but we can't announce it yet. It's just we're not at that point where we can announce these partners. So those are basically the two biggest, most prominent ones that we have going on. And like I mentioned, of course, we've got more in the pipeline as well.
And of course, there's this community marketplace, which we see will be more of a sort of one individual at a time or one SME at a time that's using it. Okay.
Okay, yeah, I would say, you know, that's a lot, and this is quite diversified, actually. Because, well, to build your own community is one thing. It's kind of, you know, eating your own dog food, so to speak. So you're using your own protocols and you're also doing community development, which are two orthogonal things, actually. I mean, you can be very good at building protocols but not necessarily at community development, and you're trying to succeed in both. And you're also trying to work with other parties, as I figured, to help them build
communities and, well, not necessarily communities, in that case marketplaces. Because you talked about logistics data, for example, and hearing about that, it kind of struck a chord with me, because it happened that maybe a couple of years ago I was having this conversation with some people from a startup called Freightos. And what they did sounded similar to what your partners do. So they also collect lots of logistics data, from shipping specifically. And they basically try to use it to facilitate things for their customers. So they try to do things such as optimize routing and optimize what goes into what ship and so on and so forth. So having that in the back of my head, the question that kind of naturally pops up would be, okay, so how do you sell it to someone like your partner?
So what benefit do they have by implementing a marketplace using your protocols, versus doing something like what Freightos does, which is pretty straightforward, pretty centralized: they just collect data, aggregate it, process it, and sell services to their clients. Yeah, so there's actually several benefits for, basically, people that are considering building marketplaces. So basically, the world is headed towards being more and more digital, you know, and we've seen this even accelerated in the times of this
pandemic, where way more people are working from home, etc. And in this sort of more digital era, there's also just a lot more data flowing around and a lot more opportunities for data to be bought and sold. And I'd mentioned before already that these enterprises, all these large enterprises, are sitting on large amounts of data. And right now those data sets are actually treated as liabilities, because if they get hacked, then they might have to pay a fine, such as, you know, Equifax getting hacked and having to pay a $700 million fine because all this credit card information of millions of people was leaked. So, you know, it's a pain point for them. And, you know, rather than just solving the pain point, can you flip it around and actually turn it from a liability into an actual asset? And even have those assets on the books. So if you're an enterprise, you know, that's your incentive. But if you think about just selling your data directly, you run into these big issues of privacy and control, right?
So it's a massive issue, especially for enterprises, because they have a lot to lose liability-wise. This is where, if you can monetize that data, it's very useful. But you have to address these big issues. They need to control the marketplace, or know that no one else is controlling it strongly, so it needs to be decentralized or controlled by them. They need to make sure that their data isn't going to escape. Ocean Protocol is the only offering out there that actually addresses this, right? And under the hood, it's using blockchain technology, of course, to help solve this problem, right?
So you really need it at both levels, right? You need to decentralize the marketplace, and basically you need to minimize trust on the marketplace side and you need to minimize trust on the data escape side, and we're doing both. So that's kind of the main, if you will, pitch to enterprises: they can not only address the liability of their data, they can flip it around and actually monetize it. There are other benefits for them too. For example, in the automotive industry, we're moving towards an era of self-driving cars, autonomous vehicles, and it's once again all about the data, right? RAND a few years ago estimated that 500 billion miles driven was needed in order to get sufficiently accurate autonomous vehicles.
And, you know, for any of the big car companies, it's going to be a long, long time, like decades plus, before they get that amount of information. So this is actually where Ocean was partly kicked off from. We had a project with Toyota Research Institute in early 2017. It was one of the inspirations that led to Ocean, where they were actually trying to solve this. They realized they don't have enough data on their own, and they did a projection, and they won't have enough data on their own for a long time, for more than a decade, to do their own autonomous vehicles. But what if they could pool their data with other big automakers,
such that they could collectively benefit from each other's miles driven and then ship autonomous vehicles that much sooner, such that they're at least as safe as human driving and ideally much more so. So pooling the data together basically was useful. And it's sort of a defensive industry thing, right? If you're a Toyota, your expertise is not self-driving cars. Same thing for most of the other big automakers. But what if, rather than running that risk of not being the world expert in it, you can basically defuse that risk by collectively having enough data for everyone to ship their own autonomous vehicles? And so from this initial effort with Toyota, that effort actually turned
into the Mobility Open Blockchain Initiative, MOBI, led by Chris Ballinger, who had been our collaborator at Toyota. And one of the efforts within that alliance is actually about autonomous vehicles. And we continue to collaborate with MOBI and work with Chris and the rest of the team towards that mission. And it's a long-term mission, right? These things don't happen overnight, but it's another one. So to summarize, I guess the benefits for enterprises are twofold. Converting data from a liability into an asset that can even be on the books if they want,
Starting point is 00:26:40 and that they can monetize. And the second thing is, rather than the selling of data directly, it's about data exchange, so that they can get more data in order to fulfill their business needs, such as autonomous vehicles. That's from the enterprise side. And then, I guess, from the SME side, small and medium-sized businesses, startups, and so on,
Starting point is 00:26:59 it's really just an opportunity to get in at the very beginning of this burgeoning open data economy and create marketplaces that people couldn't create before, ones that are decentralized, ones that address the data escape issue and so on. Ocean marketplaces have these superpowers and we make it easy for others to create new businesses on it. Does that answer your question?
Starting point is 00:27:29 Yeah, it does. I mean, of course, as these things always work with me, I get a ton of additional questions, but I'm sure you have answers to those as well. Just to follow up on that, basically. So how exactly do you work with your partners currently? What kind of support do they ask for? Do they need support with the technical implementation? How to implement the protocols?
How to connect the data sources? How to run their algorithms? Do they need support with the commercial side of things? How do you get... what's your experience? Yeah, so these are great questions. And, you know, when we first,
when we launched V1 in production last summer, we started to kick off initiatives towards working with external partners who have real-world problems and use cases more diligently. We'd been iterating with some before, of course, like MOBI and stuff, but we started to do it more diligently.
And we did our first POC with a large enterprise last fall, and we actually worked very, very closely on that with them, with their technical team and our technical team, because, you know, the technology is young. So there were a lot of kinks to iron out and whatnot. And we were very curious, you know, how easily can others build on this technology too, without as much hand-holding. And to that end, we were part of a hackathon last fall, an Outlier Ventures hackathon, and about eight teams built on Ocean. And we had our technology team there just for people to come to for support,
but these teams were able to build interesting applications on top of Ocean. So that was kind of cool to see. The technology is still nascent, so it's not the easiest to work with, but it was already far enough along at that point last fall that people could build on it.
Then on the heels of that, we also had what we called the Data Economy Challenge, which was sort of, think of it like an extended hackathon that has prizes. And we had more than 20 teams that built really interesting applications on top of that. And they had a lot less hand-holding. That was basically December, January, February of this year, and that was pretty interesting too. So it was pretty heartening to see that teams could build on Ocean. And, you know, there's still some lifting, like I said, the technology isn't fully mature yet, it's taking time, but it was enough that people could do things.
And then with V2 coming out, we've gone out of our way to make it even easier for people to build, to have even better documentation, improve the interfaces, and so on, to reduce friction for developers onboarding. And we've continued to work with partners, like I mentioned, dexFreight, as well as this other enterprise partner, engaging with them and more going forward. So our team, which is Ocean Protocol working with BigchainDB directly, does provide support. In addition to that, we've started basically an Ocean Partners program, a broader set of service partners,
Starting point is 00:30:45 companies that have ramped up in building on Ocean Protocol, and then they can offer their own professional services for this. You know, that's not our goal. Our goal is just to help, you know, kickstart this ecosystem, but we want to have a whole ecosystem of companies that are able to build. So to that note, we've been working closely with Altoros, which has hundreds of basically sort of application developers, application engineers who can build. And then the subset of that company, Protofire, with dozens of engineers who know blockchain really well. So we've been working closely with them, as well as a company called DAC, D-A-C.
Starting point is 00:31:24 And similar in that case, they're a product company. They do product, they do applications and so on. And in both cases, you know, we've worked closely with them to ramp up their engineers such that their engineers can take Ocean and run with it with their own customers. And we are excited to see more and more of these sorts of organizations happening in the future. And this is all towards the goal of decentralizing the overall ecosystem, making sure that the platform itself is, you know, that people can build on it, that the marketplace technology,
people can use that to have their own, you know, quick out-of-the-box marketplaces, and then have this ecosystem of service partners to help other companies, you know, build the apps and marketplaces they want to build. Yeah. I was wondering if you could just do a bullet-style list of the top five or top ten technologies people need to be proficient in to be able to work with Ocean. Right.
Starting point is 00:32:22 So I'm going to focus on developers to start with because that's basically the main people we're targeting to start with here. And so if you're a developer that's interested in building a marketplace with Ocean or building some other app, then you have to be able to use one of our libraries. We have libraries in JavaScript and in Python. So if you're proficient in one or the other,
Starting point is 00:32:44 then that's critical. And actually more recently, we've been making React hooks. So it's one level above the JavaScript library and that makes it that much easier to have front-end web apps that are using Ocean. So that will help. That's the starting point.
The next thing is you need to have a sort of conceptual understanding of just what's going on with Ocean, right? And it doesn't need to be super detailed, just sort of the bird's-eye view. So you need to understand computer science concepts, things like the idea of bringing compute to data and other things like that. So you need to have sort of enough technical proficiency to understand that. Right now you also need to have enough technical proficiency to understand blockchain at a basic level. So, you know, proficient enough to be able to use the MetaMask wallet or other blockchain wallets. If you're a developer, you're going to be interacting with the MetaMask wallet or other
things, so generally, that's Ethereum technology, right? So generally with Ocean, if you're familiar with Ethereum technology, that will help a lot, right? And Web3.py, Web3.js, basically, that's the main interface for that. So basically knowing JavaScript or Python and then the equivalent Web3 library for those. And, you know, having the technical understanding, those are the main things, I think.
And you can get going from there. And actually, for the React hooks library, of course, you need to know React better. But that actually abstracts things away a bit more. So I think those are the main ones. Obviously, if you're doing data science apps, then you're going to have to know whatever data science tools you like to use, whether it's PyTorch or whatever. But yeah, those are the other ones. It's not actually a big lift that you have to have. I guess one other thing too:
if you really are going to be using Compute-to-Data, then you're going to want to know how to set up the sort of server-side compute and stuff for running all that. And we use Kubernetes there. So you're going to want to actually understand how to use that, and we have tutorials to work through this and so on. With all of this, we go out of our way to make it easy for developers to ramp up,
with just docs.oceanprotocol.com and then, you know, various tutorials in various ways. And for developers who like to learn and stuff, I think, you know, with those basics of Ocean and Web3, like the libraries for Ocean plus the Web3 libraries, beyond that, you'll see what you need to learn as you go along and you can pick that up as you go.
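For a rough sense of how those pieces fit together in practice, here is a deliberately simplified Python sketch of a Compute-to-Data interaction from the data buyer's side. Every function name below (search_datasets, order_dataset, start_compute_job, fetch_result) is a hypothetical placeholder rather than the actual API of Ocean's Python or JavaScript libraries; the sketch only shows the conceptual shape: discover an asset, pay via a wallet-signed transaction, kick off a remote job, and get back only an aggregate result.

```python
# Hypothetical, simplified Compute-to-Data flow; none of these functions
# are the real Ocean library API, they just name the conceptual steps.
from dataclasses import dataclass
from typing import List

@dataclass
class ComputeJob:
    dataset_did: str          # decentralized identifier of the data asset
    algorithm: str            # reference to the training script to run remotely
    status: str = "pending"

def search_datasets(keyword: str) -> List[str]:
    """Query a marketplace's metadata store for matching data assets."""
    return ["did:op:1234-lung-scans"]                 # placeholder result

def order_dataset(did: str) -> str:
    """Pay for compute access via a smart-contract transaction, signed with
    a wallet such as MetaMask through a Web3 provider."""
    return "order-receipt-0001"                       # placeholder receipt

def start_compute_job(did: str, algorithm: str, receipt: str) -> ComputeJob:
    """Ask the data provider's environment (e.g. their Kubernetes cluster)
    to run the algorithm next to the data."""
    return ComputeJob(dataset_did=did, algorithm=algorithm, status="running")

def fetch_result(job: ComputeJob) -> dict:
    """Only the aggregate output (a trained model) comes back, never raw records."""
    return {"model_weights": [0.12, -0.7, 1.3]}       # placeholder model

if __name__ == "__main__":
    did = search_datasets("lung cancer scans")[0]
    receipt = order_dataset(did)
    job = start_compute_job(did, algorithm="train_classifier.py", receipt=receipt)
    print(fetch_result(job))
```

However the real libraries are shaped, the division of responsibilities is the one described in the interview: the buyer signs and pays, the provider's side runs the job, and only an aggregate result crosses the wire.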
Starting point is 00:35:18 So it's meant to be, you know, not a huge ramp of learning curve. It's still more than we'd like, frankly, but that's where it's at. Again, another follow-up question on the services side of things. So actually, that's the question. Is this, I mean, working with partners,
something that you offer to them as a paid service, or something that you do in order to boost adoption of Ocean? You know, I'm just asking because, from a potential adopter's side of things, this is something that they would need to consider, right? Yes. Yeah, it's towards adoption of Ocean, right? So, you know, the BigchainDB team, which is the core team that has been building Ocean, is a relatively small team.
And we want to make sure that we spend our developer bandwidth on building the core products, right? The platform and the marketplace products. And when we do, you know, services engagements ourselves, it's only really to help refine the products and help kickstart and showcase a few initial use cases. That's it. Whereas, you know, we don't have the bandwidth, and we don't want to be the ones doing more and more and more POCs and hand-holding towards many, many deployments. Right.
Starting point is 00:36:44 And that's where these other service partners come in. These other service partners actually also can simply be helpful even in the POCs, et cetera, that we're doing, right? And they have been, which is great. So it's really about, you know, broader ecosystem usage of Ocean. And also, you know, we are a decentralized project. So this is very explicitly part of the goal, right? So they see, you know, they like the mission of Ocean.
Starting point is 00:37:13 They've invested in Ocean, et cetera, that way. But they also see that, you know, they are businesses and they see opportunity, right? So we want to make sure that Ocean is a technology that is interesting and compelling for them to spend time to learn such that they can have business opportunities, right? And this is what's happening. So it's very nascent, it's young, and that's okay, right? But they're in on the ground floor on this
with already some service engagements. And we are hopeful that that will continue to grow and grow, and their opportunities will grow and help their businesses' bottom line too. From an end user perspective, let's say someone who uses social networks, or someone who has an electronic health record and so on, any kind of personal data that they would like to protect, holding on to their privacy, but at the same time, you know, they realize that this data could be of value to someone. I guess, you know, the way for them to work through this would be to basically find some marketplace run by someone they trust and are comfortable doing business with, and engage with them, and then leave the selling side of things to the manager of the data marketplace. What would your advice be to end users? I mean, what can they do to promote adoption of this idea and this set of protocols?
Yeah, so I mean, I'll be honest: ultimately, we hope that Ocean will have a big influence on consumer applications and so on, everything from, you know, helping health to reducing the stranglehold that social media sites, etc., have on society in a sense, you know, to really equalize opportunities much more. But it's going to be years, honestly, before we get to that point in any major way, right? So, in terms of advice towards consumers in the near term: there aren't going to be consumer apps in the near term. So consumers, you know, as these marketplaces emerge, they can play with them, they can put their data up for sale, and maybe that will take off,
right? I think that could be cool. And as time goes on, though, some of these consumers, they'll play with it and they'll see maybe opportunities, so some of these consumers turn into entrepreneurs, seeing where, hey, you know, they can work with various folks to help sell health data and so on. And I'm excited about that. You know, a good example is, many, many years ago I signed up for 23andMe, and they promised, they promised, they promised that they would never, ever, ever sell my DNA data. So I'm like, great. You know, I spit in the vial, sent it in the mail and got my information. And then, you know, I just didn't think about it for years. And then about, you know, six or nine months ago, I got an email from 23andMe saying, we've changed terms and conditions. Please click here. They didn't say what, but I followed the news, and they had changed
Starting point is 00:40:24 their terms and conditions where they're giving themselves permission to sell my data. And so I'm like, well, I don't want this. So I went to 23andMe.com to change, to basically delete my account. But I wanted to download my data first. I mean, I paid for it. And it took them about a month to put together my data to download it. And in the meantime, they sold my data. So to me, this is like super uncool. And just an example of, you know, my very sensitive health data, DNA data might be the most sensitive health data. Many people have claimed that. And here's a company that promised not to sell my data. And here they are. And it's because they're holding it. And they saw that they needed a way to monetize, and they chose to basically do something that broke the most fundamental promise they could have made. So I want to be able to, you know, I as part of Ocean, we as Ocean, want to change the rules here such that consumers never, ever give up this data in the first place.
Yet at the same time, they can monetize, rather than 23andMe monetizing my data, right? So most people probably didn't even think about it, and 23andMe is making a lot of money from this now, right, by breaking this promise. And it's similar elsewhere too, right? Like when Facebook bought WhatsApp, they promised, they promised, they promised they would never merge WhatsApp with Facebook. They waited about a year and then they merged these things. Right. And the EU scrambled like crazy to try to prevent this, but they saw that they couldn't.
They actually were able to change the law since then, such that those future things can't happen as easily. But once again, you know, promises get broken all the time. And this is basically, you know, Google or Facebook can say they don't want to be evil. But if they still can be evil, then incentives get in the way. And Facebook and Google and many other companies are incentivized to basically farm us, to get our data and mine it in order to sell more ads. And what we need to get to is to move away from don't be evil to can't be evil. And how do you do this? By people keeping control of their own private data, sovereign personal data, right?
Starting point is 00:42:33 It's going to be years to get there in a full-blown way, but this is the journey that we all need to take, and it's really important for society. Because what's happening right now is the opposite. We're getting these tech companies that keep getting larger and larger and larger as they mine more and more and more of our data. And we've given it away without even realizing. And we don't understand how valuable it is. And it's valuable, especially in aggregate. So this is where we want to head, where we want to head in Ocean. And, you know, there's a path to this, you know, there's paths
with, you know, the marketplaces in the near term, but then in the medium term, a big thing is things like data trusts, data co-ops, data unions, and Ocean is going to really help to make these things easy to build and implement, because you combine DAO technology, which is basically these organizations that live online. Think of it like a, you know, a Reddit, a subreddit, but one that has its own wallet, where, you know, all the community members inside it are working together and they can actually manage, you know, their data collectively. So you can have, for example,
a data DAO that people sign up for, for, say, location in Berlin. So wherever I am in Berlin, I'm giving permission to a data DAO to take that and then sell it on my behalf in an aggregate fashion. And it can be organized, you know, legally as a trust or as a co-op or whatever. But that's really what it's about. So it's collective bargaining for citizens, you know, tens of thousands, hundreds of thousands of people potentially at once, to be able to sell their data to buyers. And you can choose opt-in or opt-out, however you want.
Starting point is 00:44:10 Rather than right now, it's either click yes or no. Do you want to give all your data to Facebook? Yes or no, right? So in the future, we will be able to choose whether or not we want to sell our data to Facebook. But it will no longer be, you know, this binary yes or no where Facebook basically gets all of our data for free and then reaps the rewards. So this is where we want to head, and this is where we are headed. And like I talked about, it's basically one step at a time, right? We need to have this fundamental protocol for the marketplaces and then extend that to remove the need for trust on data escapes.
So solving the data escapes problem, which we have with Compute-to-Data. And as time goes on, there are more things too. So, you know, reducing the friction for trading of these assets, such that, you know, these DAOs can manage these data assets more easily. And that's what we're working on now, basically: towards much better data asset management via data tokens and more. There's a lot in that to unpack. I'm sure you have many more questions, but hopefully that helped to summarize.
And I guess another way to summarize from the consumer perspective: it's going to be many years before there's a huge impact on consumers broadly, but the steps towards that are super-interested consumers playing with marketplaces now, then the next step is data unions, data trusts, data co-ops implemented via DAOs, and then beyond that, it's the consumer-level apps more generally.
Okay, there's one aspect that I think we didn't really emphasize that much in the conversation so far, and that's the data scientist aspect. And wrapping up, a question on that. So what kind of frameworks does Compute-to-Data support? Actually, that's the entry question. So are there specific frameworks that it works with and supports, or is it a generic framework that can work with anything? And by frameworks, I mean things like SageMaker or TensorFlow or PyTorch and so on and so forth.
Yeah, so Ocean is designed to be a tool that can be used with any of these. And the first thing we had to do was simply just ship Ocean, like Ocean with Compute-to-Data, put it out there, and then the integrations can come. But one thing we've already done as part of what we shipped is, remember, we have a Python library, and most AI and ML is in Python. So we have a Python library, and then, if you go to datascience.oceanprotocol.com, it will launch a Jupyter notebook on JupyterHub, and inside this Jupyter notebook it's got template examples where it's actually using Ocean Compute-to-Data via the Ocean libraries. So that's a very good starting point, and then people can, you know, copy and paste this code into what they're doing with TensorFlow or scikit-learn or whatever. Or of course they can just, you know, look at the documentation directly for the Ocean Python library, which is called squid-py. The JavaScript one is squid-js. So yeah, basically out of the box right now, it's already in Jupyter notebooks and JupyterHub. But because it's a Python library, there's lots of room for integration with other leading AI and ML tools.
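As a rough illustration of that integration point, here is a sketch of the kind of training script a data scientist might run inside a Compute-to-Data job using scikit-learn. The input and output paths are assumptions made for illustration, not Ocean's documented conventions; the template notebooks at datascience.oceanprotocol.com are the authoritative starting point. The sketch falls back to synthetic data so it can run standalone.

```python
# Illustrative Compute-to-Data training script: read data from wherever the
# provider mounts it, train a model, and write out only the fitted parameters.
# The paths below are hypothetical, not Ocean's documented convention.
import json
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

INPUT_CSV = "/data/inputs/dataset.csv"     # assumed mount point for the data
OUTPUT_JSON = "model.json"                 # assumed location for the result

def load_data() -> pd.DataFrame:
    if os.path.exists(INPUT_CSV):
        return pd.read_csv(INPUT_CSV)      # real records stay provider-side
    # Synthetic fallback so the sketch runs standalone.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    y = (X.sum(axis=1) > 0).astype(int)
    return pd.DataFrame(X, columns=["f1", "f2", "f3"]).assign(label=y)

def main():
    df = load_data()
    X, y = df.drop(columns=["label"]), df["label"]
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Only aggregate parameters leave the premises, never the raw data set.
    with open(OUTPUT_JSON, "w") as f:
        json.dump({"coef": model.coef_.tolist(),
                   "intercept": model.intercept_.tolist()}, f)

if __name__ == "__main__":
    main()
```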
Okay, so one last question then. If the algorithm runs where the data is, then this means how fast it will run depends on the resources available at the host. So the time needed to train algorithms that way will probably be longer compared to the centralized scenario. I mean, if you factor in the overhead for communications and cryptography, plus users giving access to algorithms to run on their data, one should expect to see performance degradation. So is there some way to mitigate this, or is this something that data providers and users will have to live with? In a typical scenario, compute needs to move from the client to the host data side. The compute needs don't get higher or lower, they simply get moved. Ocean Compute-to-Data supports Kubernetes, which allows massive scale-up of compute if needed. That means there's no degradation of compute efficiency if
it's on the host data side. There's also a bonus: the bandwidth cost is lower, since only the final model has to be sent over the wire rather than the whole data set. So this is the typical flow. There's actually another flow, where Ocean Compute-to-Data is used to compute anonymized data. For example, using differential privacy, where random noise is added to the training data, or using decoupled hashing, where every combination of input value and variable is hashed. Then that anonymized data would be passed to the client side for model building there.
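To illustrate the two anonymization approaches described here, the following Python sketch adds Laplace noise to a numeric column (a differential-privacy-style perturbation) and hashes each variable/value combination for a categorical column. The noise scale and the hashing scheme are illustrative choices, not Ocean's actual implementation.

```python
import hashlib
import numpy as np
import pandas as pd

# Illustrative anonymization before data leaves the provider: numeric columns
# get Laplace noise, categorical columns get each (variable, value) pair hashed.

rng = np.random.default_rng(42)

def add_laplace_noise(series: pd.Series, scale: float = 1.0) -> pd.Series:
    """Perturb numeric values with Laplace noise before release."""
    return series + rng.laplace(loc=0.0, scale=scale, size=len(series))

def hash_value(column: str, value) -> str:
    """Hash the (variable, value) combination so raw values are never exposed."""
    return hashlib.sha256(f"{column}={value}".encode()).hexdigest()[:12]

df = pd.DataFrame({
    "age": [34, 51, 29],
    "postcode": ["10115", "10245", "10405"],
})

anonymized = pd.DataFrame({
    "age": add_laplace_noise(df["age"], scale=2.0),
    "postcode": [hash_value("postcode", v) for v in df["postcode"]],
})

print(anonymized)   # this table, not the raw one, is what the client receives
```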
In this anonymized-data scenario, most of the compute is client side, and bandwidth usage is higher because the anonymized data set has to be sent over the wire. So overall, Ocean Compute-to-Data is flexible enough to accommodate all these scenarios. That's all. I hope this answers your question, and I hope you have a great day. Thanks. Bye.
I hope you enjoyed the podcast. If you like my work, you can follow Linked Data Orchestration on Twitter, LinkedIn, and Facebook.
