Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Ben Fielding & Harry Grieve: Gensyn – The Deep Learning Compute Protocol

Episode Date: November 25, 2022

Artificial Intelligence is a fascinating field that has made tremendous progress over the past few years. From GPT-3 to DALL-E and Stable Diffusion, the vast scope of applications for AI and neural computing is becoming clearer than ever. At the base layer, there is a need for a decentralized p2p AI compute platform that provides flexible access to the wider public. Gensyn sets out on a mission to provide just that by building a marketplace protocol for AI compute - ultra-low cost, hyperscale, permissionless.

In today's episode we are joined by the project's co-founders Ben Fielding and Harry Grieve. We take a deep dive into the current landscape for AI and AI compute, the reasoning and trade-offs in building Gensyn, the protocol's focus and edge cases, and how the protocol uses blockchain technology as both incentive layer and credibly neutral platform.

Topics covered in this episode:
- The founders' backgrounds and reasons for building on blockchain
- The current state of AI development and a breakdown of AI concepts
- The current landscape for AI modeling and compute
- Gensyn's differences from other protocols, such as Golem
- How Gensyn adheres to the decentralized default, and making trade-offs in product building
- Building Gensyn as a Layer-1 protocol
- Deep dive into the Gensyn protocol and edge cases
- Other players in the protocol: solvers, verifiers, and whistleblowers
- Building a blockchain

Episode links:
- Gensyn
- Gensyn on Twitter
- Ben Fielding on Twitter
- Harry Grieve on Twitter

Sponsors:
- Tally Ho: Tally Ho is a new wallet for Web3 and DeFi that sees the wallet as a public good. Think of it like a community-owned alternative to MetaMask. - https://epicenter.rocks/tallycash

This episode is hosted by Friederike Ernst. Show notes and listening options: epicenter.tv/471

Transcript
Starting point is 00:00:00 This is Epicenter, episode 471 with guests Harry Grieve and Ben Fielding from Gensyn. Welcome to Epicenter, the show which talks about the technologies, projects and people driving decentralization and the blockchain revolution. I'm Friederike Ernst, and today I'm speaking with Harry Grieve and Ben Fielding, the founders of Gensyn. Gensyn is an AI blockchain project that is looking to enable you to buy AI compute in a decentralized manner. And we will get to that in just a second. Before that, I will tell you about our sponsor this week, though. Our sponsor is Tally Ho, an open source wallet redefining the wallet as a public good. With Tally Ho, you can safely connect to DeFi and Web3 with everything you need from MetaMask plus a lot more. You can view your NFTs in wallet across
Starting point is 00:01:09 Ethereum, Polygon, Optimism and Arbitrum. There is also no need to manually add these networks, they already come plugged in. Tally Ho has the best Ledger support around, built by a community of developers that listen to users. Swap between assets in wallet at a fraction of the price and conveniently view all of your account balances across multiple networks with our new and improved portfolio tab. Currently, they're running a campaign called Dogsey, D-O-G-S-E-Y, a doggo adventure that rewards users for exploring the Arbitrum ecosystem with Tally Ho. From now until December 2nd, so hurry up.
Starting point is 00:01:49 Bridge funds to Arbitrum with one of their participating bridges and claim your trusty space dog NFT, plus be entered into a giveaway for a rare blueberry club NFT. Head to their blog at blog.t.tale.comash or their Twitter at Telekash for more info. Teleho isn't just building a wallet that works. Teleho is building a wallet Web3 can believe in. Visit teleho.org today. with the wallet and join over 150,000 people in signing their community pledge.
Starting point is 00:02:18 Okay, fantastic guys. Hang on. Let me reorder you a bit. Put myself in the middle here. Fantastic. Harry and Ben, thank you so much for joining me. Hey, Friederike. Thanks so much for having us.
Starting point is 00:02:33 I think we said this on Twitter, but we've both been longtime listeners of the Epicenter podcast. So really pleased to be here. Fantastic. That's so good to hear. Harry and Ben, tell me about yourselves. What are your backgrounds and what did you do before Jensen? Sure.
Starting point is 00:02:50 So yeah, I guess my background's in machine learning research mainly. So I did a PhD in deep learning focused on neural architecture search as a problem, which is essentially searching the space of deep neural network structures to find one that's kind of most performant for a specific task. So did a PhD in that, finished that in 20, and then moved into the startup world and co-founded a data privacy startup. So I've got quite a kind of strong interest in individual data privacy, kind of data sovereignty and things like that. Did that for a couple of years and then joined an accelerator program in London called Entrepreneur First, which is
Starting point is 00:03:31 where i met harry and we kind of went down the rabbit hole of what we're building with jensen yeah and on on my side, my backgrounds and applied econometrics, so kind of a fusion of economics and statistics. I was sort of introduced to machine learning during my postgrad, doing my master's degree, whilst studying econometrics and fell in love with it. From there, I just thought it was so cool to be able to essentially quantify everything. The kind of next step for me was leading a data research team at an AI startup in London. So whereas Ben comes from more of a kind of technical and kind of academic background, mine's more on the applied side commercially, got to the point where I really wanted to build something in the space.
Starting point is 00:04:15 I saw a lot of issues with respect to scaling. And, yeah, joined the Entrepreneur First accelerator, met Ben. For anyone who doesn't know what EF or the Entrepreneur First accelerator is, it's been described as Love Island meets Shark Tank. So you join as an individual, and then you find a co-founder, and then they kind of invest in you. So it was sort of pre-idea. Met Ben, shared a similar vision for the future of AI,
Starting point is 00:04:42 shared a similar sense of humor. So yeah, the rest is history. So it seems like you both come from a fairly extensive AI deep learning background. What moved you to kind of marry this entire thing with blockchain? Good question. It wasn't a sort of instant thing. It happened over a relatively long period of time to be, honest. And essentially, it was technology driven. So we knew we wanted to build massive scale
Starting point is 00:05:13 AI infrastructure. And essentially, as we were doing the research to figure out how we could make this the absolute maximum scale, we realized that in order to do that, you need to have a trustless layer, essentially. You need to be able to unite compute without having to do centralized onboarding of new providers. Because at that point, you end up with an administrative kind of like scaling limit and we don't want any limits. So we went down the kind of the road of verifiable computation research until we hit that kind of block of that always has to be a trusted third party. There has to be this judge or arbiter when you're checking a computation who makes a kind of a decision on whether something's been done correctly. Blockchain represents a way to kind of
Starting point is 00:05:58 break that and do it by by consensus essentially. So a large group of people can do it without having to nominate a single person to make the decision. And that was the lightbulb moment for us where we said, like this has to kind of be the next step for AI to get the scale that we want, like planetary AI scale. There has to be this kind of consensus layer introduced and blockchain is the way to do it. Before that, interestingly, we were kind of blockchain skeptics to an extent. We hadn't kind of dived into the space before.
Starting point is 00:06:26 We'd sort of taken the typical technical path of kind of saying, oh, a read-only database can do the same thing, therefore I won't kind of dive into it. But I know for me personally, realizing that kind of trust layer was an absolute lightbulb moment. It was when I realized the kind of actual power behind it and got very into the space. Yeah. Interestingly, Ben and I shared a lot of the kind of ideals that you see championed in the wider decentralization scene. So we both were very much free speech maximalists, and a lot of the kind of censorship stuff that we saw with Snowden and things like that, that we bonded over prior to even talking about blockchain. So it kind of felt almost like obvious that we should have started in the blockchain space, but we didn't. Interestingly, right before making the switch,
Starting point is 00:07:12 we were trying to do federated learning, which is an area of deep learning where you train lots of models across distributed data and then combine them to create a kind of meta model that can learn from all the data sources. And we were doing that with banks. So the kind of realization for us or for me at least was that, There's a much bigger problem with accessing compute, or essentially just the processors on which the models could be trained.
Starting point is 00:07:38 And to do that, you need a decentralized kind of method of trust, and that's basically a blockchain. Okay. So basically it's kind of the platform and decentralized incentive layer that kind of did it for you in terms of that form. Yeah, in terms of moving this to a blockchain. Maybe let's talk about AI first, before we kind of go into what Jensen exactly does as a blockchain protocol,
Starting point is 00:08:06 because most listeners of this podcast will be familiar with blockchain to a certain extent, but AI is not so much our usual cup of tea. So let's talk about the state of AI today. As an outsider, it kind of seems like it's totally on fire. I mean, with GPT-3, and GPT-4 is going to come out soon, I think. And then things like DALL-E and, I mean, it's completely mind-blowing. Can you guys talk about the advances in AI in the last couple of years? Absolutely, yeah.
Starting point is 00:08:47 I think it's interesting being in the kind of AI space and watching this explosion happen because the sort of AI and machine learning space over the past seven years, I guess, has basically been a series of mini explosions. So this one is just kind of the next one in the sequence. But I think to the wider world, it's one of the first times they've seen it actually create real impact and create applications that people see the value in, essentially. But yeah, I think deep learning fundamentally has been the big change that's kind of enabled all of this.
Starting point is 00:09:19 It was when I first started my PhD, the deep learning kind of explosion was just happening. It just started. It just kind of taken computer vision. as an area by storm. They'd shown that essentially using a deep neural network, you could blow away all of the benchmarks set by sort of manual computer vision methods in the past. So very, very, very briefly, computer vision before that
Starting point is 00:09:42 used to be kind of manually defining sort of filters over images and then figuring out how to detect lines and things. And then you would have to define this filter to detect the kind of line that you're looking for and textures that you're looking for and is a very manual process. Deep learning is, essentially just came on the scene and said, we can do all of this straight from the data.
Starting point is 00:10:01 And that was such a huge change. It took away all of that kind of expert knowledge that was required and just allowed somebody with enough compute to design a kind of relatively simple model, apply it to a very large amount of data and then just have the outcome that they want. What we're seeing now is essentially the kind of building on top of that, building models that can do even more, and then crucially getting them to the consumer or to the developer who doesn't necessarily know the specific problem. That's been going on for years, but Imagen, DALL-E, GPT-3, et cetera,
Starting point is 00:10:59 have really kind of fast-tracked that. I don't know if you want to speak to some of the deep learning stuff as well, Harry. Yeah, I think whenever we kind of talk to crypto crowds about it at conferences, we always do a kind of sharpener on the distinction between three terms. So AI, machine learning, and deep learning, because they're used essentially interchangeably, but they're quite different. And the best way to think about it is a series of kind of circles, like a Matryoshka doll, almost,
Starting point is 00:10:59 where on the outside you've got AI, and AI, by the loosest definition possible, many people will disagree with this, but the loosest definition is, it's just programming a machine to do something. So, you know, our kind of washing machine is, in a sense, a narrow version of artificial intelligence. You tell it to do something,
Starting point is 00:11:16 and it kind of programmatically does it, or it works out how to do it. Machine learning kind of came into the scene much more prominently in the kind of 90s in the kind of early 2000s, wherein you, instead of having, as Ben said, expert systems where you say, you know, if this, then this, you use data to essentially work out the kind of probability with which a certain decision will be made. Deep learning takes that concept, but allows different kind of concepts
Starting point is 00:12:27 to be modeled with much more kind of fidelity. So it kind of has hierarchical feature representation, which means that the way that the model works, different parts learn different things. The classic example is, if you want to recognize handwritten digits, a neural network typically pushes the image through lots of different layers. Each layer will kind of pick up something, like, this kind of number has a closed loop in it or has a stem. And then over time and over lots of computational cycles and lots of tweaking, the model will be able to generalize any new image it sees to one of these kind of categories, you know, a number between 0 and 9, say.
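To make the handwritten-digit example concrete, here is a minimal sketch (not from the episode) of the kind of layered network being described, written in PyTorch. The layer sizes and names are illustrative assumptions, not a reference implementation.

```python
# Minimal sketch of a layered digit classifier (illustrative only).
# Assumes 28x28 grayscale images and 10 classes (digits 0-9), as in MNIST.
import torch
import torch.nn as nn

class DigitClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Flatten(),          # 28x28 image -> 784-dimensional vector
            nn.Linear(784, 128),   # earlier layers pick up low-level strokes
            nn.ReLU(),
            nn.Linear(128, 64),    # deeper layers combine them into shapes (loops, stems)
            nn.ReLU(),
            nn.Linear(64, 10),     # one score per digit class, 0 through 9
        )

    def forward(self, x):
        return self.layers(x)

model = DigitClassifier()
scores = model(torch.randn(1, 1, 28, 28))  # a random "image", just to show the shapes
print(scores.shape)                        # torch.Size([1, 10])
```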
Starting point is 00:12:27 That's basically the distinction between AIML and deep learning. Deep learning is where you see all the kind of big breakthroughs coming in. So all the things you mentioned, like GPD-free, Dally, et cetera, stuff like stable diffusion, all that's deep learning. And the story for deep learning over the past kind of, I guess, you know, I guess since like 2016, 2015, has been Transformer models, which are a specific type of deep learning model that have been very useful for things like large language modeling. I think what's crucial as well as a kind of more social point is if you told people, you know, at the beginning of the 2010s, that they'd be able to essentially generate a comic book, which is in really kind of convincing, with really convincing art, just from a series of text prompts. I honestly don't think most people would believe that's possible, particularly at the kind of consumer grade for like a normal person just to be able to type text prompts and create a comic book. In my next few years, the kind of same order. magnitude jump is going to happen. So in the 2020s, the ability to sit down in front of, say, you know, Netflix, and instead of picking a movie, which has been, you know, pre-made, you simply
Starting point is 00:13:37 magnitude jump is going to happen. So in the 2020s, the ability to sit down in front of, say, you know, Netflix, and instead of picking a movie which has been, you know, pre-made, you simply enter a text prompt. And you're like, you know, I want to see three technologists talk on a podcast for an hour, you know, about AI or something. And with other kind of prompts and maybe like a kind of set of initializations, they'll be able to generate an entire movie. An entire movie which you can then kind of steer, maybe, at different points if you want. Or as a final point, maybe you have the same story, but you can change the genre of the story. So you could turn something like, you know, I don't know, Halloween into like a sci-fi movie, or you could change Jurassic Park into a love story or something, all by using the same script but changing the kind of rendering. Lots of exciting things coming, in my opinion. Can we talk about
Starting point is 00:14:19 kind of the paradigm shift behind this? So basically, I mean, if you look at old school programming, it's a lot of deterministic if-this-then-that and so on. And in my understanding, and admittedly this is a very lay understanding, you kind of use like some sort of neural network with, you know, like complex connectivity. And where exactly, I mean, do people exactly understand how decisions in a neural network are actually reached? Is this something that you could kind of transport?
Starting point is 00:14:59 I mean, obviously, you don't use real neural networks, right? So basically everything's in a regular computer. You don't have to go to like a bio lab, although that would be. Not yet. I'm not sure whether that would be terrifying or fun. But basically everything's anyways in a computer. So basically kind of you're modeling, like, in a way, a different system that's kind of more interconnected and more flexible.
Starting point is 00:15:28 Maybe you can kind of qualify how the system you're modeling with your regular computers is different from just giving the computer prompts. I mean, yeah. Sure. I think the black box sort of nature of deep learning models is just down to the absolute size of them. At the end of the day, you're still tracing a path through a series of kind of decision points in the network, it's just that that path is absolutely enormous and it's hard to kind
Starting point is 00:15:58 of link the weights or the parameters within that model down to exactly why they're that sort of value, because they've come to that value after being fed millions of samples. And you could do that deterministically, you could track every single update, but the size of data that you would end up generating would be absolutely enormous. I think there's sort of two things that I see happening as we kind of go through this. One is the black box nature is sort of falling away a little bit as we start to understand more and more about the models that we're building. Deep learning as a kind of research area has sort of gone through an interesting fast period where there's been a lot of experimentation that wasn't driven by the sort of fundamentals of
Starting point is 00:16:45 the research. It was more driven by seeing what we could get out of it. So we throw more data at it, we try out new architectures and we just see what happens, rather than starting from first principles and designing this thing and knowing exactly how it works. So there's been that kind of exciting period where everything has been very black box. I think a lot of the gains that happened there are sort of starting to sort of slow down a little bit. And we're seeing people revisit those architectures and sort of check and say, why does this work so well? Let's dig into it and let's kind of prove it out. So in some ways, that kind of curtain is lifting.
Starting point is 00:17:17 The other thing that's happening, which is a bit more controversial, I guess, is the shift in people's perspectives as to whether a kind of computational system needs to be fully deterministic or whether we can live in a probabilistic world. We live in a probabilistic world as people. The kind of self-driving cars example is probably the clearest, where when we're driving around, we accept that there are kind of stochastic events that happen and there can be small accidents and there can be issues that happen. With a self-driving car system, we don't accept that at all and we say that this has to be a fully, completely deterministic process. I think one of the challenges that the self-driving car industry has had has been an assumption that people would just accept that probabilistic mechanism applied to self-driving cars, and they haven't. But I think that will change. And that's probably the controversial bit, as we as a society go towards actually allowing kind of probabilistic computational systems to exist alongside us. Not sure if it will be an easy road, but I think it'll happen.
Starting point is 00:18:17 Yeah, thank you. Before we dive into the current landscape, there's one term I have come across often in preparing for this episode also. Maybe that's a question for Harry, because you already talked about the different kind of machine learning, deep learning, artificial intelligence. So basically there's this term of artificial general intelligence. Is that different from the three terms you already talked about? Yes.
Starting point is 00:18:45 So it's a term which was popularized, I believe, by Ben Goertzel, who's an AI researcher and entrepreneur. The idea of AGI is similar also to the singularity. So it's the idea that you get human-level intelligence from a machine. So you have, right now, what you might describe as a kind of artificial narrow intelligence, whereby machines are good at doing certain tasks. So for example, machines are very, very good at detecting certain types of cancer from medical scans right now. Pattern recognition. Yes, yeah. But kind of scaling that up to a general intelligence whereby a machine can be good at doing a task which is kind of maybe simple to humans, but actually quite difficult to reflect in a kind of computational prediction space. Can you give an example?
Starting point is 00:19:34 Yeah, a good example would be a machine being able to walk through a crowded area in a smooth way, whilst being able to essentially make discrete assumptions about all the inputs around it. It's one of the reasons that, I can't remember the level of driverless cars, I think it's like maybe level 10 or something, it's one of the reasons that driverless cars do really well on the motorway, because it's a very kind of, it's a problem which to humans might feel quite complex, but it's quite like a simple sort of mathematical problem
Starting point is 00:20:05 because there's not much variation. But when you take that same car and you put it kind of in a city, street in Rome, you know, going over cobbles, tourists walking out in front of everything, it becomes extremely difficult. So it's kind of, yeah, some of the stuff which we think is really kind of difficult, like being really good at chess, it's actually quite easy for a machine. But some of the stuff that we think is really easy, like being able to kind of walk down the street or, you know, being able to like, I guess, certain kind of things in conversation,
Starting point is 00:20:35 like, you know, understanding, looking at someone's entire body language and looking at everything someone's saying and being able to kind of draw an emotion from that. A machine might be good at various things like pose estimation, you know, how someone's sitting, but combining that all together and making a kind of decision is quite difficult. So, yeah, artificial general intelligence basically means a model or a set of models or a system which is able to essentially be as good as humans at everyday tasks.
Starting point is 00:21:13 Critically, the kind of advent of AGI leads to artificial superintelligence, because it follows that once a machine has kind of mastered everything a human would reasonably do, their rate of kind of marginal mastery of tasks moves a lot faster than humans, as a kind of function of both the kind of complexity of their model and the amount of compute available to them. So if we throw all the computers in the world at a model which is already at human level, it's got much more energy than a kind of normal human does, and it's also got an infinite lifespan. And it's also got a perfect memory. Or near perfect memory.
Starting point is 00:21:40 So that's where you kind of get into the realm of kind of science fiction horror movies. This is what Elon is afraid of. Yes. You hear it a lot in these kind of examples. And you hear also kind of, one of the kind of pathways that people, or at least I, estimate will kind of take us there is the kind of fusion of, you know, humans and machines. So for example, if you have a kind of brain-computer interface or brain-machine interface, BMI, and you're able to essentially augment your lived experience with,
Starting point is 00:22:11 you know, machine kind of inputs. That machine learns from all your kind of, the way your brain's working and firing, it learns patterns, you're helping it train, it's kind of helping you train your own brain. And that's going to help speed up that process as well. It raises a kind of, you know, treasure trove of ethical, you know, kind of issues. But, yeah, that's basically the definition of AGI and then subsequently ASI, artificial superintelligence. Cool. Super interesting. So let's look at what the landscape currently looks like, right?
Starting point is 00:22:45 So basically you say, I want to run an AI model. Where do I buy AI compute? So, I mean, I could just get an instance on AWS or I could run it on my local machine. So kind of walk me through the options. Yeah, so it really depends on the scale of the model you're training. If you're a kind of student learning about AI, maybe you're an undergrad, you typically just use AWS or, for small enough models, your local machine, as you kind of say. The next level up, you might be a kind of startup. You've just burned through your kind of 100K of Amazon credits, and you're kind of looking at the kind of marginal cost of training models.
Starting point is 00:23:26 You might go for an on-demand AWS instance, you might go for something more kind of fixed, more kind of permanent, which is typically cheaper when you book demand in advance. But there reaches a certain point when you're training models that you, A, kind of are experiencing enormous cost in AWS, or B, you can't actually achieve the scale required in terms of GPUs. So you just get kind of limited by AWS in terms of scale. At that point, you see companies go in-house. So in our kind of research, prior to raising our last funding round, Ben and I spoke to about 150 machine learning researchers and engineers at
Starting point is 00:24:13 a variety of places, from kind of FAANG companies to startups to academia. And whilst a lot of academics at top universities have access to kind of clusters and large kind of high performance compute, and people at, say, Facebook have access to the FAIR research lab supercluster there, which is the biggest AI cluster in the world, most people in our experience didn't manage to get the scale that they wanted. And one of the ways that some of them kind of dealt with that would be they'd buy GPUs themselves and they'd bring them in-house and they'd manage them. And we heard all these horror stories about people like in the south of England having a spare bedroom with a fan in it and loads of GPUs, like a kind of Bitcoin mine upstairs, and also people who would have them in their offices. And it's a bit of a kind of fragmented market. However, basically the
Starting point is 00:24:52 bottom line is if you buy the GPUs outright, typically it costs less marginally over the long term to run them. And that's a function of basically not having to pay this sort of 65%-ish premium, or I should say margin, for accessing Amazon EC2 instances. So that's kind of cloud, local, or kind of getting your own cluster. There's also high-performance compute if you're in academia and you have access to that type of compute, but then again there can be bottlenecks there. There's other kind of options. So for example, if you're a kind of, I guess, benevolent organization and you're wanting to
Starting point is 00:25:28 solve a highly parallelizable computer science problem, a good example of that would be like Folding@home, you can access volunteer compute networks using things like BOINC from Berkeley. Originally, maybe a lot of listeners will remember things like SETI@home. Well, it's not, you know, machine learning, it's kind of just processing signals, but it's a really good example of grid computing reaching very large scale. I think right now that Folding@home, which is its kind of successor, has the largest kind of compute volume anywhere in the world, even greater than supercomputers like Fugaku. So yeah, to summarize, you kind of go from your local machine onto the cloud, maybe via a high performance cluster at the university you're at, and then ultimately back off the cloud, taking it back on-prem. The goal of Gensyn, as a segue, is to give everyone access to the same kind of compute scale that the people who currently have on-prem clusters can achieve,
Starting point is 00:26:33 and crucially to do so in a way which allows fair access. So it kind of, it can't be turned off by a centralized entity. There have been projects like this in the blockchain space before. One of the very old, by blockchain standards, projects is Golem. I believe they actually did their ICO in 2016, which is basically like 50 years ago in blockchain. So how does Gensyn compare to Golem? Yes, great question.
Starting point is 00:27:04 So we think of it in kind of two axes. The first one is the kind of thinness of the protocol, so to speak. So Golem is a general compute protocol. You can do lots of things on it. And we are a thin protocol, more similar to, like, Render protocol, if you want a kind of analog there, where we do one thing, and that's training machine learning models. The second kind of point is on the kind of scalability of the verification. So what we see in a lot of the kind of earlier projects is a tendency to use things like reputation
Starting point is 00:27:38 or to use kind of less Byzantine tolerant, or fault tolerant, I should say, methods of like replication. And when we looked at those kind of architectures for verification systems, they just didn't work for us as people who train machine learning models. We just wouldn't have enough faith in the results.
Starting point is 00:27:59 It doesn't mean that they don't work. It's just that, purely for kind of machine learning, they just weren't suitable. And when we had conversations with kind of Web2 machine learning people, they kind of agreed. So for us, the goal
Starting point is 00:28:11 is to basically take a lot of those initial learnings around how do you kind of position our compute protocol in this kind of world, crypto world, but do it crucially in a way which is only for machine learning. So you can make super, you know, optimizations around the kind of the speed and the cost of the protocol. Number one, but number two, how do you kind of reach a satisfactory
Starting point is 00:28:33 level of verification? Right now, that verification and consensus piece is really like the vast majority of our time and energy. You know, it's the question. And we had a, good initial stab at it with our kind of inaugural light paper, but we've expanded on it since then. I don't know, Ben, if you'd add anything to that. Yeah, probably just to emphasize the kind of general purpose approach that most people before have taken. It's quite an attractive one.
Starting point is 00:29:02 You want to get the biggest market you can possibly kind of get to, so saying we do general purpose computation, any scale, any kind of computational problem is attractive at first, but you fall so quickly into the two traps, Harry mentioned. The first trap is the verification problem. It's very, very, very difficult. Our thesis is that you have to narrow and that we will have a big sort of set of thin protocols at the bottom of the kind of the decentralized infrastructure stack.
Starting point is 00:29:29 If you think about AWS, but in Web3, we think all of the kind of functionality that exists there will be ported over, and it'll exist as this sort of hierarchical stack of things, getting closer and closer to the user as you go up. And on the bottom are protocols like Gensyn, protocols like Render Token, where you do one specific type of computation really efficiently with really strong verification. And then on top of that, you can have the
Starting point is 00:29:53 kind of general purpose compute networks that fall back onto that. So that's our kind of vision for the decentralized infrastructure. I think as part of that when you launch as one of those thin protocols, you have a much easier job in initially targeting your market. So our market isn't doing kind of like chess simulations and things like that. They're just building machine learning models, that's it. It can be really sort of attractive to say, oh, we could just do this extra thing, we could do this extra thing. Maybe we could attach ourselves to an existing sort of thing that's quite popular right now. Maybe we could generate NFTs, things like that. But I think when you do that, you split the mindshare massively in terms of product, and people don't know
Starting point is 00:30:29 what you are. For Gensyn, we will always be very clear that we're machine learning compute. If that's what you want, then you come here. If you want something else, you go to a different protocol. Maybe it falls back onto Gensyn at some point, but fundamentally, that's all we kind of are. And I think the very long term of it is we're behind the scenes. We're just like HTTP, but for machine learning compute. To an end user and a developer, you won't even know that Gensyn exists. All you know is that the world has changed. And now when you train a machine learning model, it goes out somewhere and it gets performed by someone in the world through a series of kind of apps and dapps and things until it eventually sits on the Gensyn protocol. We think that's the
Starting point is 00:31:08 the kind of best way to provide this compute to the world is via that kind of hierarchical infrastructure where we gradually go more and more behind the scenes, essentially. I'd add one final point to that, which is, when we think about the kind of properties that the network has to have, it needs to be targeted towards machine learning engineers and researchers. It needs to have the verification piece. But crucially, on the kind of permissionless side, it needs to have that level of sort of censorship resistance,
Starting point is 00:31:35 but also kind of an agnostic relationship with hardware. So in the kind of, I guess, deep learning hardware space, you know, dominated by companies like NVIDIA, there's companies which are doing their own proprietary ASICs, like Google through TPUs, tensor processing units, or Graphcore, another good one, with their IPUs, intelligence processing units. What is a kind of trap, I feel, which some protocols, not even in the kind of deep
Starting point is 00:32:04 learning space, have gone down before is shipping like proprietary hardware. So the idea that you, you know, I think maybe a good example for general compute, I've listened to actually the Epicenter session with ICP and Dfinity, where they have their own boxes basically and they're sold by them. That's actually very attractive to us, the idea that you can essentially ship your own hardware, because then all the kind of issues you have with sort of, you know, rerunning proofs in a way which is deterministic for hashing, et cetera, a lot of that gets solved,
Starting point is 00:32:39 but crucially, it creates a choke point of centralization. So one of the kind of rabbit holes we've seen some other kind of compute protocols go down is they rely on certain kind of, how would you say, kind of secure enclaves. So certain secure enclaves like Intel SGX, where they're like, you know, we can run, you know, computations for you in a way which is kind of private,
Starting point is 00:33:04 but, you know, you have to use the specific chip which is manufactured by this specific company, and, you know, it's only rentable on these specific services, and it just doesn't hold true to the decentralized ethos, in our opinion. It also doesn't scale well, currently at least. Yeah, I mean, if you look at what appeals to me about Gensyn's offering most, it's that it kind of, it can use resources that are currently lying fallow. And I mean, this would not be the case if you actually had to buy a dedicated piece of hardware to kind of partake in the network, no?
Starting point is 00:33:42 Yeah, exactly. I think, like Harry said, it's really attractive to go that route from a technical perspective because it's so easy. But I think it intersects with one of the biggest things that we think about when designing our verification system, which is, what assumptions are we making and how are we constraining the system? Because essentially, we have to make some assumptions and we have to put some constraints in. But a constraint like that to us is massive.
Starting point is 00:34:05 It's huge. We don't want to do that unless we absolutely, absolutely have to. There's other things that we can do. We can sort of narrow the space of devices in a temporary sense or in a permanent sense. We can look at certain manufacturers. We can look at certain libraries that provide determinism and things like that. But every time that we make any decision like that, we make it very deliberately. And I think it's quite easy to jump over those in the rush to ship something. But if you're going to build the network that we want to build, that kind of takes
Starting point is 00:34:45 step change, essentially. It's almost zero or one. If you make those assumptions, you won't reach that, that kind of end state. I think it sort of fits on three axes. There's product assumptions, there's research assumptions and there's technical assumptions. And essentially, you have to balance all of those things, which makes it, I think, uniquely tricky. You have to have kind of voices of each of them equally kind of valid in the company. And that's something that we've focused on quite strongly with hiring and things like that. Just making sure that we don't accidentally overweight a certain kind of area. I think there's some protocols we've looked at before who've fallen into traps.
Starting point is 00:35:22 There's some traps with research where you can go down a "let's make the most formally verifiable system we possibly can" route, and then you never ship anything. And then you can go the other route where you make the kind of flashiest thing that an end user will like. You ship something really quickly. And in previous startup terms, that would be fantastic.
Starting point is 00:35:39 Ship it, it breaks, build it again. In the Web 3 world, not quite as good. It breaks isn't just a little thing anymore. It's a big problem. So I think it's sort of a unique area, Web 3,
Starting point is 00:35:51 where you have to walk this, I think of it like a ridge, where there's really attractive looking paths that go down either side, but they're not attractive. they quite quickly drop off the cliff, and we're being very careful to stay on that ridge. Cool.
Starting point is 00:36:06 Before we dive into the ins and outs of the protocol itself, so Gensyn is its own layer one blockchain. In principle, it could have also been built as a dapp on another chain. Why did you go the layer one route? Yeah, it was a big question for us at the start. Like we said, the sort of blockchain world for us was all about tech. So when we entered it, we were quite sort of deliberate about it. We looked at all of the potential ways we could build it. We made a massive list of pros and cons and we kind of navigated through figuring out what the,
Starting point is 00:36:45 I guess, the constraints and assumptions were for each one. We quite quickly moved from layer two to layer one because we wanted the freedom to kind of change things on the layer one side, essentially the consensus mechanism. We didn't want to be constrained by a certain smart contract system. We wanted to be able to do as much as we possibly could because we knew this was going to be a big sort of open-ended problem. Essentially, being a layer one allows us to do a lot more work on the node side than we would otherwise be able to do.
Starting point is 00:37:16 I think if we'd built in the EVM, which you could absolutely do, you could build what we're talking about there, you'd be very, very constrained by what you can do in Solidity, essentially. Whereas building in Rust, for us, we can do certain things, we can call out and do some machine learning processes, maybe we can do some tensor processing, things like that, that just wouldn't be available to us within the EVM. I guess in a nutshell, it was a future-proofing thing for us. We don't want to constrain ourselves early when we don't understand fully why we're making
Starting point is 00:37:43 those constraints. So we kept it as open as possible. And fundamentally, we also believe in a multi-chain future. We think that the future is true multi-chain. It's not sort of ecosystems full of chains. It's individual chains interacting with each other. with a kind of generally agreed messaging protocol. I think we've seen some movements through the ecosystem,
Starting point is 00:38:03 having their own message passing, and now we're moving back into kind of general message passing. And I think realistically, we're seeing the multi-chain future sort of play out. So we're quite pleased with that kind of bet so far. So you're looking at building this as a parachain. Why build this on Substrate in the Polkadot ecosystem? So we're not fully certain whether we'll be a parachain or not yet. The Substrate decision was essentially the technology. So when we looked at everything, we looked at the sort of frameworks that we
Starting point is 00:38:37 could use and the libraries that existed from a tech perspective, just what was nice, what had sort of the best technology built in. And substrate came out on top for us. We weren't blockchain people. We were machine learning people. We came in knowing that we wanted to like stand on the shoulders of giants if you will, like we don't want to rebuild consensus from scratch. We want to use whatever the best one is, and then carry on with building the machine learning stuff that we're focused on. And substrate provided that to us as a way to very quickly iterate, build up the chain, and then get on with the off-chain stuff,
Starting point is 00:39:09 with enough flexibility to change it when we need to. So the kind of FRAME subsystem allows us to quickly get something running, but then if we need to, step in and completely change it, which is really attractive. It's written in Rust, and we're fans of Rust as a language. It just kind of made sense from that perspective. It's interesting. This was a year and a half ago and the kind of two that came out on top were Cosmos and Substrate, and essentially Substrate won because of the tech and the kind of nice libraries and the developer tooling and
Starting point is 00:39:38 things like that. But yeah, the parachain decision is one that we essentially will make later, as a bit of a cheat answer. We can be a parachain. We could not be a parachain. We don't need to decide right now. So essentially we don't.
Starting point is 00:40:04 If the ecosystem starts to fill up with things that we can interact with, so if there's like storage layers in there, if there's sort of sovereign data layers and things like that that we would want close ties with, then maybe it makes sense. If they exist elsewhere, then maybe it makes sense to kind of bolt IBC on and exist in the wider world. But yeah, we'll see. You guys should look at solutions like Cartesi also. So things that kind of allow you to kind of have a legacy operating system that kind of hooks into a blockchain
Starting point is 00:40:27 for provable compute. It's super interesting. I was just going to say it sounds interesting. I've not come across it before, but yeah, we'll definitely check it out. I'll share the link later. So let's dive into the protocol. So there's a couple of participants in the Gensyn economy. There are submitters, solvers, verifiers, and whistleblowers.
Starting point is 00:40:50 The submitters are the people who actually want work done. So let's start in the beginning. Let's say I'm a submitter. What kind of AI problems can I submit? Am I constrained in any way? Yes. Currently you're constrained in that your AI problem has to use gradient-based optimization at some point in the computational process, basically.
Starting point is 00:41:17 We use portions of the gradient calculation. as part of our proof system. That's not necessarily set in stone. I think as Harry mentioned earlier, we've got our light paper, which is public. We're iterating on that internally, and there are lots of kind of things in play, essentially. But right now it's gradient-based optimization.
Starting point is 00:41:36 We use the signals from that as part of the verification mechanism. What does gradient-based optimization mean? So to me, as an AI noob, so how would I know whether a problem falls into that category or not? Sure. Yeah, so I guess, fundamentally, if we think about a neural network, it is a big set of layers that have parameters in them. And those parameters are essentially just real numbers. There could be millions, billions,
Starting point is 00:42:02 now trillions of those numbers in there. But fundamentally, they are the kind of deciding factor in the output of the network. And the training of the network is setting those to realistic values that allow data to go through and trigger the kind of outputs that you like at the end of the network. So you go through lots of matrices, layers of these real numbers. It changes the current input as it's going through, and then you get the output that you want by all of those changes that have happened. You need to update those numbers to reflect the output that you want for a certain input. And previously, way, way back in the day, that would be done manually.
Starting point is 00:42:39 So maybe not with a neural network, but with certain systems, you would set those using expert knowledge, and then you would know that when an input goes through, you would get the right output. There's also different ways of setting them programmatically. So you could imagine a super sort of naive way of just randomly setting all of the parameters, running a sample through,
Starting point is 00:43:00 checking how far away from the realistic sample it is, and then just doing random ones again, and then doing a random search essentially until you make a smaller error value at the end, and then you just keep decreasing that error value. You can do other strategies where you do sort of more targeted updates, and there's lots of ways that you can do that. Gradient-based optimization talks about essentially what was the big change for neural networks
Starting point is 00:43:23 and deep learning, which was showing that you could essentially use the gradient, so differentiating the error with respect to the parameters of each layer as you go through the network. And you can use the chain rule to apply that all the way back through the hierarchical network that Harry described. Essentially, in that way, you get the position on the hill of loss, if that makes sense. So if you modeled the loss in like Euclidean space, you would see it as this kind of really bumpy area where somewhere there's a big dip and at the bottom is where the loss is really, really small. And you're trying to find that dip. Getting the gradients for each layer essentially shows you for that layer where you exist on that surface and what direction you should go in.
Starting point is 00:44:07 So you use the gradient to say, hey, we've got a massive like drop here. Let's go down it. So the direction that we want to update the parameters in is this way, and we want to update them with this sort of size of step, because this is really steep or it's not steep. So we want to make a big jump or a smaller jump. And essentially that's it. You're just navigating this huge bumpy surface looking for a big dip, and the gradients give you sort of a position on that surface so that you know which direction to go in. And it was a huge leap because that signal, that direction, is kind of really clearly useful rather than just taking
Starting point is 00:44:43 random leaps all over the space and figuring out, hey, I'm on the top of a hill now or, hey, I'm at the bottom of a trench. You know where you are. You know that you're on the side of a trench or that you're on this weird flat bit and you need to make a big jump to try and get out of the flat bit or something like that. How do you know there's only one trench, or how do you make sure you're in the right trench? Because basically, if there's lots of trenches, you want to end up in the deepest one. You don't want to get stuck on a molehill, right? Yeah, I mean, you want to go to Mount Everest. So basically, how do you make sure, how do you know how far you can go with your model? Very good question. That's one of the big, big problems
Starting point is 00:45:21 in deep learning itself. Essentially, there's lots of techniques for doing that. The very, very simple answer is assume it's convex and then you don't have to think about there being any others, but obviously in the real world it doesn't work like that. Essentially, there's lots of sort of regularization techniques that happen in deep learning training that make it a really complex thing. It makes it more of an art than a science, because a lot of people have their sort of little tricks that they do. There's things like learning rate schedules.
Starting point is 00:45:53 So you'll use a learning rate to set the magnitude of the jump that you'll take in that gradient space. But you can use certain schedules to sort of decay the learning rate, make it smaller over time, which means make smaller and smaller jumps so you don't accidentally jump over a kind of trench. But in the same case, you can suddenly randomly introduce a huge jump, which just allows you to say, maybe if I am in a local minimum, maybe if I'm in a tiny trench here and there's a massive one over here, I'll just do a huge jump. I don't know where I'll end up, but it should, it could be better. If not, I'll probably roll back to where I was before. So there's lots of techniques
Starting point is 00:46:29 like that that are sort of more trial and error than they are sort of deliberate. But like I said before, they're becoming more deliberate over time. So now that people have introduced these regularization techniques, dropout, norms and things like that, now people are looking back at them and saying, hey, did this work for the right reason? Or was it just some weird random quirk of the model architecture that made it work here? And can we kind of figure out exactly why it works? But yeah, it comes down to an art more than a science, to be honest. It can be very frustrating.
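To ground the "bumpy surface" picture and the learning-rate-schedule trick in something concrete, here is a small self-contained sketch (my own illustration, not from the episode): plain gradient descent on a toy one-dimensional loss, with the step size decayed over time so later steps take smaller and smaller jumps.

```python
# Illustrative sketch of gradient-based optimization with a decaying learning rate.
# The loss function and the schedule are toy choices; deep learning frameworks
# (e.g. PyTorch optimizers plus lr_scheduler) automate exactly this pattern.

def loss(w):
    # A simple 1-D loss surface with its dip (minimum) at w = 3.
    return (w - 3.0) ** 2 + 0.5

def grad(w):
    # Derivative of the loss with respect to the parameter w: which way is downhill.
    return 2.0 * (w - 3.0)

w = -4.0             # arbitrary starting point on the surface
learning_rate = 0.1  # initial step size
decay = 0.97         # shrink the step size a little every iteration (a simple schedule)

for step in range(50):
    g = grad(w)                  # slope at the current position
    w -= learning_rate * g       # step downhill, proportional to the steepness
    learning_rate *= decay       # smaller and smaller jumps over time
    if step % 10 == 0:
        print(f"step {step:2d}  w = {w:6.3f}  loss = {loss(w):.4f}")

print(f"final w = {w:.3f} (true minimum at 3.0)")
```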
Starting point is 00:47:09 In terms of, I mean, can you talk about like real world problems and say which ones are gradient optimizations and which ones aren't, just so I can get like a feeling for what kind of problems I should be able to submit? Yeah, I mean, the simplest way is thinking pretty much every neural network uses gradient-based optimization. There are other problems that use it as well, but within neural networks, all of the kind of big steps that we've seen, all the big changes have been neural networks recently. So it's a logical place to focus for us whilst also allowing this big space of other places. So any optimization problem, you could theoretically, as long as it's differentiable and can use the chain rule to flow back, it could use gradient-based optimization. And some use other optimization techniques with the gradient as a signal in there. As long as you're calculating a gradient, we can use it.
Starting point is 00:48:05 It's useful. But yeah, fundamentally, it's all neural networks. Every maybe two, three years, somebody comes out with a paper that says, hey, we're training neural networks with evolutionary optimization that doesn't use gradients, and it's better. It's never better. It's better in a really constrained system, and it never takes hold. Not to say it never will do, but so far.
Starting point is 00:48:28 our gradients have managed to stay pretty kind of solidly at the top. Okay, then I'll turn the question around. What kind of problems can't I submit? So what problems are not solvable or not well solvable with neural networks? Hmm, good question. Expanding the question to neural networks in general, neural networks are generally quite data-hungry algorithms. So if you have a problem with very low data volume,
Starting point is 00:48:58 A good example of that would be, I might be wrong about this, but some of the kind of toy examples which are used to teach people to do machine learning, like the Iris data set and stuff, where you have like a very, like you could actually have a spreadsheet with like 100 rows and maybe, I think, like seven or eight features. I don't think that, intuitively, they're like well suited to neural networks; they're typically better handled by like statistical machine learning techniques. So I think data volume is one of them. There's also just certain types of neural networks are just very large, so like fitting them on edge devices can be a challenge. But in terms of like the actual, I guess when you say problem, if you think about the kind of what's the type of thing you're trying to predict in the world, there isn't something which immediately comes to mind that neural networks are, like, explicitly currently and always will be bad at. I don't know if you have any intuition there, Ben.
Starting point is 00:50:04 I guess you can think of a neural network as a universal function approximator. So theoretically, it can do all of the things that you would do with other methods. I think, like Harry said, the reason you would not use a neural network would typically be down to data volumes, where with a statistical machine learning method you could get a better result, essentially, and then you wouldn't train that using gradient-based optimization. But fundamentally, you could do it with a neural network if you wanted to. It just might be a bit worse. Okay, cool. So I understand that I can submit almost any question. So basically say, do I ask for an entire program? Or, I mean, do I ask for like a DALL-E kind of output? Can I say, like, I want a picture of accountants in hot air balloons over a waterfall and there should also be a rainbow with scorpions on it? And it'll do that for me. Or can I ask, I'm
Starting point is 00:50:52 building, I'm building this car and I need an AI to drive that car. Can you deliver that AI? Is that kind of both within the scope, or does one of those fall out of scope? So I guess it's more like you'd use Gensyn to train the model itself. So what you would do would be you'd think, I want those things. I want to be able to create my scorpion rainbow kind of image generator from the text prompt "scorpion rainbow", which I love. And you'd build a model which receives a text prompt and then converts that text prompt into images. And then you would have the training data which facilitated the kind of learning of that model. And then to the Gensyn network, you would submit model, data, and then some hyperparameters which determine, you know, like we
Starting point is 00:51:40 mentioned, the learning rate schedule, things like that, how long you wanted to train for. And then the kind of artifact you receive from that training process, the product you get, is the trained model. And then you can host that model and then you can submit, you know, "scorpion rainbow". How do I decide which untrained model to use? That's a brilliant question. I think there's kind of two ways of thinking about it. So there's a kind of emerging and highly kind of popular concept around foundation models currently,
Starting point is 00:52:11 which is, you know, you get a big company like OpenAI or someone like Midjourney or something and they build the base model. And then you take the base model with your training data, which might have lots of rainbows and scorpions in it, and then you train the model on that. And then the output of that is a model that is very good at, you know, approximating those outputs. That's kind of option one.
Starting point is 00:52:33 And that's the most common for people who are quite compute restricted, which is the kind of theme in the industry just now. The second option is you build the model that you would otherwise build from scratch. And I don't know if you want to talk about that then, from our... Yeah, I suppose a lot of our thinking comes down to the foundation models approach because we think it's kind of the future of the space. My research back in my PhD was specifically on essentially AutoML techniques.
Starting point is 00:53:00 So the idea of allowing somebody to optimize that model structure and find the best model structure without necessarily being an expert, that's another way of doing it. And we've seen that sort of happen within like AWS SageMaker, for example, and GCP's compute cloud as well, where they build in some auto-ML techniques to say to a developer, you don't need to know the specific machine learning architecture because essentially we can just see that as something that's trainable as well. And we apply an optimization technique on top of that.
Starting point is 00:53:30 Jensen as a protocol can't have that if you wanted to. We would see that as something that you would build as a DAP that would use Jensen. And that DAP might implement a evolutionary optimization technique or something like that. It would submit the sort of individual architectures it wants to train and test to the Jensen protocol. It would have them trained and then it would iterate on the structure. and it could build up the kind of the model that you want. And that's a bit of a theme in the way that we think about Jensen as being purely machine learning compute.
Starting point is 00:54:00 All of these interesting things that exist around it, we would love to see built out as an ecosystem, essentially. So all of the nice things that you see on SageMaker and GCP, we see as being additional things on top. I think it could be very attractive to build them ourselves, but ultimately it's a trap. But yeah, on the foundation models, we've seen that because of the compute
Starting point is 00:54:20 problem like Harry described. We've seen people take foundation models from very large research papers that have spent maybe $10 million in funding in order to kind of trial out all of these different architectures and then they publish a new architecture and say, hey, this is the best in the world at doing these three computer vision tasks. And then you take that, you use the vast majority of that pre-trained network that cost millions to train. You would add some layers on the end, you chop some layers off, and then you train those layers on a smaller set of data. And you have a kind of usable model for that and it generalized loads of information from the first kind of training it did. So you call that pre-training and then you've got fine-tuning and that's very
Starting point is 00:55:00 kind of classic in the deep learning space. One of the things that we find particularly difficult with that is the bias that gets introduced in that pre-training. So one organization doing that on a proprietary data set or on a data set that they haven't disclosed features about means that when somebody else comes to use it, they don't know what's gone on. Because of those black box issues that you mentioned earlier, you can't go back in and say, why did it make this decision? The solution to that in our minds isn't to go fully deterministic and kind of get rid of the black box. It's to open it up to everyone and say, hey, everyone train this foundation model. So design it together. We train it together on an infrastructure that nobody owns.
Starting point is 00:55:43 And at the end of the day, we have a model that we can all use that's kind of global and hasn't necessarily been biased by a specific company's cache of data that they've kept back and they don't want to tell you what's in it and things like that. So once we've got those kind of global foundation models, then anyone can come along and say, I'll find the hash of that model on the chain. I know it's been trained. I'll pick that spot and I'll continue training from there on my data set for my problem or task. And then I'll have a model that I know is at most as biased as the entire global population, rather than being as biased as a company in California. Okay, so basically until we have the global foundation model, maybe we can talk about how we plan
Starting point is 00:56:26 on kind of delivering that later. But before we have that, I kind of have to decide on one of the commercially available ones. And I've now submitted my problem. Who gets to work on it? Do solvers need to kind of fulfill some prerequisites? And basically, is it one solver per problem? Or can you parallelize this? Sure. I would say it's at the task level, it's one solver per task, but a model can break out into lots of different tasks.
Starting point is 00:56:58 So typically when large language models are trained, it's interesting. They've kind of been built in a way which maxes out the current hardware at their time of creation. So, you know, they're designed to, fit chunks on, you know, certain video processors, etc. You imagine a similar thing happening across the network. It's complicated by the fact that there's heterogeneous devices on the network,
Starting point is 00:57:18 but essentially for any given task, you know, a supplier of compute, so a verifier or a worker, they have their ability to basically say, I'll take that from the men pool, and then they are randomly chosen from the pool of people who say that they'd like to take that task. So everyone can do it. If the model and the data can't fit on your device and you've said that it can, then it follows that there's the latter likely be a penalty there because it's kind of clogging at the system.
Starting point is 00:57:52 But essentially, if a task can fit on your machine, then yes, your ability to run it is essentially just determined by a verifiable random function, which selects you from a subset of the available miners, or workers, I should say. How do you verify what kind of capacities the miners have? So basically, if I say I have like a 16-core GPU and 400 gigabytes of RAM, how do you verify that? Yeah, so it's essentially in the verification of the computation.
Starting point is 00:58:28 They won't be able to do the computation if they don't have that compute device, essentially, or that capacity. and when they come to submit their proof, when that gets checked, it'll be found that they couldn't do the computation. There's a little bit of a sort of question there in how big you make a task, because if you made a task an enormous piece of compute,
Starting point is 00:58:50 then that would kind of be an issue because you could quite easily dos the system by grabbing lots of tasks and saying, I can do these, never doing them, wasting everybody's time and kind of money and things like that. So it feeds into that decision. Exactly. It feeds into that decision on the size of a task,
Starting point is 00:59:05 and that's, there's lots of other things that feed into that. There's a parallelization that you mentioned as well and how you split the tasks up into the most optimal structure. At the end of the day, we're doing a lot of research on figuring out what that should be based on the constraints. When we launch our test net, we'll do it based on the kind of practical aspects as well. When we see how this actually works in the real world, we're very conscious that it's easy to kind of define this in the kind of perfect system
Starting point is 00:59:30 and say, yes, this is the best size of task. And then you go out, launch a test net, someone does something really weird and you realize that you have to completely change it again. So it's part research, part. Let's just see how it functions when we get it out there, essentially. How do, so basically, if I get a specified model and the training data, how can you make sure that I've actually done the job, right? Because it's very much not deterministic.
Starting point is 00:59:57 So it's not like, you know, you can make me do a hash. And then the hash will tell you whether I've done it or not. How do you build in checkpoints into this process? Because otherwise I could just pretend to do the work and then kind of, you know, this was a lazy model. It kind of didn't do the work. It kind of maybe stupid. I don't know.
Starting point is 01:00:19 I've done it, but it just couldn't be taught. Essentially, that's the big challenge, that verification system. It's a huge challenge. And I think the simplest, most secure solution to say is a zero knowledge. proof of the entire computation essentially. That's sort of what you think about in X years' time, we should be able to do any computation as part of a zero knowledge proof. And then you can check that proof to say definitively whether someone's done that computation or not. Don't you need a new circuit for each given computation? Yeah. So right now that's the case.
Starting point is 01:00:56 And it's horrible to try and do for machine learning work. The computations are massive. You need a DSL for defining a circuit with respect to a machine learning computation. It's horrible, essentially. Our approach is to have a hybrid between that and a probabilistic mechanism. We sort of follow some principles in a work called Proof of Learning by a Nicholas Papernauts Group, which is, it was a paper within the machine learning world essentially showed that using the path through gradient space that we described before, you can sort of create this certificate proof using checkpoints in that space, that theoretically, it's just as hard to generate a realistic looking path as it is to just do the work. And then using a kind of financially rational assumption
Starting point is 01:01:44 on all of the participants, you can say they would just do the work essentially. There were issues with that paper and kind of flaws with the ways it checked things. But fundamentally, what it showed was using a essentially random auditing scheme on top of a path through gradient space, you can build up a relatively robust check. And essentially, we take that one step further by introducing zero knowledge proofs at certain steps and on top of the kind of global loss of the model just to add another definitive kind of proof on top.
Starting point is 01:02:16 We package all of that up within a game theoretic mechanism that looks quite a bit like Truebit from way back in the day with staking and slashing, solving the verifies dilemma with random jackpots essentially, with whistleblowers, and that's the full system. I'm aware that is just a big word vomit of things. So happy to dig into bits of it. Yeah, maybe there's so much to unpack here. So maybe kind of let's back up to Trubit. So I think lots of people kind of remember Trubit. It's kind of this, basically it lets you do large computations off-chain
Starting point is 01:02:50 and then basically you can prove it with a binary search on chain, if anything. Is that a fair summary? Yeah, I think essentially that's exactly it. So Trubit prove that you could. take a very large computation that wouldn't fit in the EVM or would be absolutely massive and really expensive, do it off chain and using that challenge mechanism and that search that you described, eventually prove it on chain with the chain doing a tiny operation. We take that same principle. We apply it on top of some of the sort of certificate proof stuff that we mentioned before. So if you applied that to a full machine learning training job, you'd be searching forever.
Starting point is 01:03:27 Like, it's enormous. So we distill that down into a kind of a smaller, proof that is still representative of the larger, i.e. rather than doing the full thing, you do one in a hundred checkpoints or something like that. You've already reduced the size by 100. Then you go into that challenge mechanism. There's also some work in the machine learning space again, which is applied the true bit mechanism to neural networks, but rather than using virtual machine instructions for that search, you use a graph and you traverse a kind of Merkel tree graph of a neural network graph, essentially, of operations. And you can do that.
Starting point is 01:04:01 that at different granularities. So you can do it at native operations like within Pytorch or TensorFlow on a convolution and then you can step into that convolution and do the matrix multiplications that are involved. And then you can step into that matrix multiplication and do the individual kind of floating point operations that are involved. It's quite large overhead. So it requires you to do that big reduction before you get to that stage. But once you do, it provides that crucial link that goes from random off-chain participant to full consensus of the chain with the chain running something. And that links back to what we said earlier about being a layer one versus a layer two. As a layer one, we can also increase the size of computation that
Starting point is 01:04:40 the chain can do. So if we make the chain do a matrix multiplication and that's okay, then we get to kind of skip that step in to a matrix multiplication and doing flowing points and things, which is quite nice. At the end of the day, it's constraints and assumptions again. You increase the hardware, the validators and things. So you've got to be careful, but we like the flexibility and being able to kind of change all those levers and things. I totally get that. So I think basically kind of fixing the block gas limit and kind of maybe repricing some upcodes.
Starting point is 01:05:12 And I mean, obviously it gives you, it goes a long way, right? Maybe let's talk about the, how the blockchain itself works in just a bit. But there are two more parties in the process. So there's the verifier, whoever. actually make sure that the checkpoints have been checked. And then the whistleblower, who makes sure that the verifier actually operates correctly. Can you go through what their respective roles are? Yeah, so the verifier and the whistleblower have a relationship similar to the verifier
Starting point is 01:05:48 on the kind of worker in the tribute paper. So essentially, the whistleblower solves the verifier's dilemma problem, which is the idea that you won't necessarily verify work unless you know you can reasonably expect there to be work worth kind of you know catching as being wrong and being rewarded for so uh the whistleblower essentially checks that the the verifiers work's being correct but it's also incentivized to do so by forced errors from the verifier so the verifier that's also from the trubert paper right yeah yeah It's kind of like the dogs at the baggage carousel where basically if they don't find, you know, any drugs, their handlers put, you know, like a suitcase with drugs so, you know, they don't end up depressed and, you know, stop working. The dogs need dog treats occasionally. So that's basically the kind of thinking there. So it follows that basically the solver does the work. If the work is incorrect, the verifier shows it.
Starting point is 01:06:53 it's being incorrect and the whistleblower can then confirm it's incorrect. That then goes back onto the kind of chain, which we can talk to in a moment in terms of being verified on chain, but essentially periodically and also kind of the rate of which is kind of linked to the security of the system, I guess, the verifier will show an error on purpose to the whistleblower, which keeps the whistleblower wanting to be kind of engaged with time. as well. If the whistleblower does find a problem, they play a game, a pinpoint protocol where they narrow down, he whitted down the computation to a single kind of point in the kind of, I guess, you could view it like the Merkel tree of computations for that area of the neural network,
Starting point is 01:07:38 and then that goes to the chain for arbitration. That's the kind of the version of it in kind of a plain way that we originally had. As Ben mentioned earlier, we've advanced on it in a couple of areas after basically closing our seat around and doing more research work. But yeah, that's the verifier and it was still more. So tell me how that fits into the blockchain as a whole. So obviously someone has to build blocks. There has to be, I assume this is some kind of staking network. So there has to be a staking token.
Starting point is 01:08:11 How does all of this fit in with the Jensen protocol? Yeah. essentially it's a vanilla to an extent substrate blockchain. We use the proof of stake, grandpa babe, consensus mechanism, validators, just kind of doing things in the normal way that they will. And all of the parts that Harry described and I described earlier happen off-chain. They're all kind of off-chain participants doing portions of work and kind of being incentivized by the fact that they've staked through a kind of normal staking palette within substrate.
Starting point is 01:08:48 in a smart contract, it could just be submitting a certain amount of tokens, and then there will be rewarded when that work ultimately gets checked. The kind of game theoretic difficulty here is making sure that all is staking, potential slashing amounts, and the reward amounts all add up so that there isn't an incentive somewhere for somebody to either be lazy or to do something that is malicious, essentially. So it gets complicated when you add more participants in. The kind of having the whistleblower there as an additional participant is annoying because it's overcomplicated, but it's crucial for us, given the size of the computations, to have it there to assure the honesty of the verifier, essentially. It's not certain that we'll always have to have that.
Starting point is 01:09:32 We do keep thinking about ways that we can potentially remove the whistleblower. There's certain zero-knowledge-proof techniques that mean that we potentially could, but we don't want to get ahead of ourselves, essentially. So right now, it kind of looks like what's described in the light paper, but we're chipping away at each bit. of it to try and simplify it. We think if you look at the way that other protocols have gone in the past, there's a tendency to launch with a complicated system. And then once you get it out there, realize that you can simplify it. And we're expecting to go through that essentially. We kind of saw the same thing with Pocod on the fisherman mechanism that sort of got removed after the thing
Starting point is 01:10:07 had launched and it was out and live. I'd add one other point there just on the kind of our augmentation of the vanilla kind of substrate chain. There's an issue in the verification system as we originally proposed it and also as it currently looks for us state of the art, whereby if the data which is being used to perform the initial work from the solver is removed or made inaccessible halfway through the verification process, you reach a kind of standoff because if the verifier can't access the data, then there's, it follows that they can't verify. So you need some kind of data availability solution that kind of plugs into it?
Starting point is 01:10:49 Precisely, yes. So we built that in on top of the kind of substrate. So we have a proof of availability. P of A is kind of what we've dubbed it internally layer, which is erasure encoded, et cetera, and basically provides what we couldn't find in the wider kind of storage layer market. If anyone's listening to this who's building in that space and this does exist, I'd honestly be fascinated to see it. But essentially a layer wherein you can lock data for a period of time in a way, which is pinned, unpinnable for that period of time and verified on chain that exists there. You can't do that on our weave?
Starting point is 01:11:33 It's too expensive on Rweave. So Rweave is the answer, but the cost for, if you think about a terabyte of training data, being stored forever on R-Weave, it just doesn't work when the kind of alternative is like, you know, storing it on S-free. So yeah, I should also caveat. It has to be inexpensive. But yes, just on that with R-Weave, the reason we need it is, it's for the data, the training data, but it's also for some of the intermediate, like, proof data. And that doesn't need to be around for very long. It could just be 20 seconds while we go through a certain number of block, block releases or something.
Starting point is 01:12:14 With RWeave, we don't need like 200 years of storage of that thing, but that's what we're paying for essentially. So if somebody has really short term with the guarantees of RWeave, so bringing the price down because it doesn't need to be 200 plus years, that's what we want, basically. We just haven't seen it anywhere. You should, you guys should talk to Rweef. I mean, our weave storage rent, this would be a, you know, this might be a thing.
Starting point is 01:12:40 Yeah, it's like a sort of perma web, a temporary web. A temporary perma web. Correct. So I assume that there's going to be a Jensen token somewhere in this eventually. Tell me about that. Sure. So the Jensen token fundamentally is required for the tech, essentially. Everything we've just described assumes that you have this token built in that can be used
Starting point is 01:13:07 to stake slash provide rewards. et cetera, and also maintain the consensus of the system itself, essentially. So use kind of a small inflation amount to pay out validators and then be used in that game theoretic mechanism to allow us to kind of guarantee that financially rational kind of assumption over the entire system. I think crucially for us, that's what it's for and that's like the only thing it's for. We're very, very deliberate to say it's a technical thing that we need that we will bring it in when we technically need it and not before. We've seen what's happened with with kind of utility
Starting point is 01:13:44 tokens in the past where people have kind of launched them too early and then it's it's a distraction for the team. It's a distraction for everyone. People aren't buying it to use it in the system. They're buying it for other reasons. We don't want any of that ideally. I mean, it's easy to say that. It's hard to kind of see what will happen in practice. But our approach is to essentially delay it as much as possible and then quietly bring it in when it's required to maintain that consensus and pay out the participants with the game theoretic mechanism essentially. It's critical to know it as well that along with some other kind of early movers in the deep learning crypto space, we're very much a minority with respect to the rest of a deep
Starting point is 01:14:25 learning community. There's at least in our experience quite a lot of skepticism about crypto more broadly. I mean, Ben and I, the history's kind of testament to that, you know, we were skeptical. We've, obviously from a technological and kind of ideological point, we like it. We think it's the right way. However, when we initially provide, when the network initially launches, we anticipate that the majority of the deep learning users will pay in fiat, and it will just simply be swapped into tokens. The solvers and the people participating on the supply side of a network will facilitate
Starting point is 01:15:00 with tokens. And we have huge interest from a lot of the kind of old Ethereum one miners who have lots of GPUs who want to attach them. to something. But yeah, it's crucial that there's a kind of, the kind of crypto, the scary crypto words like Tolkien are kind of removed from the end, the end deep learning and machine learning users, which is exciting because to us, this is one of those use cases, which really bridges the two worlds of Web2 and Web3. You know, there's an economic rationale for this existing. There's the technology now to enable it, its existence.
Starting point is 01:15:38 It's now almost like a kind of execution question to a large extent. You know, how do you just, how do you get people comfortable for the idea that they're not using Amazon? And, you know, there's a kind of variable price kind of concept happening here with the token and how do you obfuscate that as much as possible. And crucially, how do you obfuscate it in a way which is decentralized? Because it'd be very easy just to stand up some centralized API front end, which, you know, just, yeah, automatically converts the tokens on a centralized exchange somewhere.
Starting point is 01:16:07 but that brings its own problems. So yeah, I just add that to the pot. So what does the roadmap look like for you guys? There's going to be a test net early next year here. Yep, test net early next year. It won't be incentivized kind of to our chat earlier. And it's more to pick up to kind of do two things. Firstly, to kind of battle test some of the tech that we've been building internally.
Starting point is 01:16:37 to get feedback on the usability of it overall. That proceeds our kind of incentivized test net, which will be, yeah, well, essentially, you'll be able to train models kind of, you know, in anger on it. The rate of which we move is something that we talk about a lot. You know, we could ship something very, very soon, which doesn't really give us any meaningful feedback,
Starting point is 01:17:04 but it kind of looks good because it's like, oh, you kind of shift. something. And we don't want to fall into that trap because, yeah, there's, there's been lots of things which have kind of come and gone for us where there have been like incentives to kind of ship something super early. So even kind of, you know, earlier this year, there was lots of kind of hype around the idea of doing generative NFT art. And we could have kind of provided like an inference solution for that, you know, really quickly. But we decided that it kind of, it kind of buzzed outstep of our principles, you know, it doesn't solve the big problem. It's not really on the way
Starting point is 01:17:36 to solving the big problem. It had lots of other things we have to build. So yeah, I guess what I'm trying to say is we're not like in a kind of immediate rush to release something tomorrow. We'd rather release something which is meaningful, which takes time given how fundamental some of the stuff. It's particularly the zero knowledge stuff, which is, you know, which is pretty involved in terms of time. Cool. Thank you guys. Where can people go to learn more about Jensen? Yeah, so jensen.a.I is your kind primary source. We have a Discord. We don't have a telegram group. The discards where a lot of a chat happens. We're also hiring just now. So if anyone's listening to this and is interested in, you know, building a permissionless deep learning compute protocol, then we're, we're it. And I guess
Starting point is 01:18:20 moreover, next year we should be hosting a zero knowledge machine learning summit. So if anyone's kind of particularly interested in that kind of crossover, there's that. And maybe as a final point, maybe more for traditional kind of deep learning or machine learning people if anyone's listening. We're sponsoring the New Europe's conference in New Orleans next week. So bed and I will both be in Louisiana at the conference, attending the talks, flying the flag for crypto. And yeah, if anyone's there, we're more of an happy to chat. Super cool.
Starting point is 01:18:53 Thank you both for coming on. This was super interesting. We look forward to kind of seeing how this plays out with the TestNet. the maintenance. Fantastic. I appreciate it. Thanks for having a sudden. It's been a pleasure. Yeah, thanks so much for having us on. Really enjoyed it. Great questions as well. Really interesting. Thank you. Thank you for joining us on this week's episode. We release new episodes every week. You can find and subscribe to the show on iTunes, Spotify, YouTube, SoundCloud, or wherever you listen to podcasts. And if you have a Google Home or Alexa device, you can tell it to listen to the latest episode of the Epicenter podcast.
Starting point is 01:19:28 Go to epicenter.tv slash subscribe for a full list of places where you can watch and listen. And while you're there, be sure to sign up for the newsletter, so you get new episodes in your inbox as they're released. If you want to interact with us, guests, or other podcast listeners, you can follow us on Twitter. And please leave us a review on iTunes. It helps people find the show, and we're always happy to read them. So thanks so much, and we look forward to being back next week.
