Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Richard Craib: Numerai – The Crowdsourced Predictive Model Hedge Fund

Episode Date: July 14, 2020

Numerai, the "hardest data science tournament on the planet". It's a hedge fund with its own distributed research platform designed specifically for AIs. It's the first of its kind and takes a radically different approach to making market predictions. It's completely crowd-sourced: data scientists around the world compete to create the best predictions and get paid with cryptocurrencies. Users are completely blind to the data and Numerai is blind to the code and the predictive models users generate. The result is a hedge fund which is market neutral, currency neutral, and geography neutral. It makes all decisions solely on the data.

Richard Craib, Founder of Numerai, was first on the show 3 years ago when the project's token, Numeraire, was just launched. Since then they have released a number of additions to the network including Numerai Signals and Erasure Bay, with plenty more in the pipeline.

Topics covered in this episode:
An overview of the past 3 years since the last episode
The Numerai staking model
Numerai datasets
What is a market neutral fund
Numerai’s current assets and their position in the market
Staking liquid NMR
The latest project - Numerai Signals
Numerai's prediction markets - Erasure Bay
How the future looks for hedge funding on the blockchain

Episode links:
Numerai website
Erasure Bay
Numerai Signals
Numerai Medium
Numerai Twitter
Richard Craib Twitter

This episode is hosted by Friederike Ernst & Meher Roy. Show notes and listening options: epicenter.tv/348

Transcript
Starting point is 00:00:00 This is Epicenter, episode 348 with guest Richard Craib. Hi, I'm Sébastien Couture and you're listening to Epicenter, the podcast where we interview crypto founders, builders, and thought leaders. On this show, we dive deep to learn how things work at a technical level and we fly high to understand visionary concepts and long-term trends. If you like the show, the best way to support us is to leave a review on Apple Podcasts. If you're on a Mac or iOS device, the easiest way to do that is to go to epicenter.rocks/apple. Today our guest is Richard Craib. Richard is the founder of Numerai, and you may remember that we interviewed him three years ago in June of 2017 shortly after the company had launched. That was episode
Starting point is 00:00:55 191, if you want to go back and listen to it. Numerai is a hedge fund with its own distributed research platform designed specifically for AIs. In fact, it's the first of its kind, and it takes a radically different approach to making market predictions. It's totally crowdsourced, and it incentivizes anonymous data scientists around the world to collaborate and create the best prediction models. Users are totally blind to the data and Numerai is blind to the code or predictive models that users generate. The result is a hedge fund that is market neutral, currency neutral, and geography neutral. It makes all its decisions solely based on the data that is presented to it. And as Richard puts it in the interview, data plus intelligence equals money. If you enjoyed this
Starting point is 00:01:41 conversation, you'll want to stick around afterwards for Friederike and Meher's interview debrief, and you can hear it by becoming an Epicenter Premium subscriber. As a premium subscriber, you'll get access to your own private RSS feed, where you can hear the debrief after every episode, and you get enhanced features like the full episode transcripts and chapters that allow you to easily skip to specific sections of the interview. You'll also get access to exclusive roundtable conversations with Epicenter hosts and bonus content we put out from time to time. We've just made changes to the pricing, and there's now a monthly subscription plan. You can go to premium.epicenter.tv to learn more and sign up.
Starting point is 00:02:20 And now here's our conversation with Richard Craib. Hi, my name is Friederike Ernst, and we're here with Richard Craib from Numerai. Richard, we did an episode with you three years ago, so we'll keep the background short. A lot of the things that we talked about back then are still true. Your background is in maths, and then you became a quant and built a machine learning-based fund working on proprietary data. And then your idea was to turn this financial modeling problem into a pure data science and machine learning problem through homomorphic encryption, so basically encryption that keeps the internal structure of the data intact, and to open it to the wider public of data scientists for them to actually contribute to financial forecasting.
Starting point is 00:03:08 Tell us what has changed in the past three years. Well, I think when I spoke to you guys, we'd maybe just launched Numeraire, our own cryptocurrency. So Numerai started in 2015, and we started making payouts in Bitcoin just for convenience. It wasn't really like we were a blockchain company or anything. But there was something about Numerai that was very much in the spirit of crypto. And when Ethereum did launch, we decided to start paying in Ether, and then we decided around June 2017 to release our own cryptocurrency. And the point of it was to do staking and burning.
Starting point is 00:03:50 And the problem with doing any company like Numerai, where you have to trust that the data you're getting from your community is legitimate, is that it really requires staking. I mean, truly the company couldn't work without it. We needed our data scientists, when they upload predictions, to put some skin in the game by staking those predictions with the cryptocurrency. And since we last spoke, that has really grown a lot.
Starting point is 00:04:17 I think maybe when we were speaking, we had like a few thousand dollars staked. And a lot of people barely understood cryptocurrency. Ethereum was very new. I think we announced Numeraire when Ethereum was $5. So we were very early and we were lucky enough to have this very technical user base who actually understood how to use these things before other people. But we've grown from a few thousand dollars in stakes to over three million dollars in stakes from the community. That sounds like quite an uptake. Can you give us an idea about how many people are currently placing predictions or giving you model data on a weekly basis?
Starting point is 00:04:59 Yeah, so the whole of Numerai is structured in that way. There's a weekly tournament. People download data, build models and then submit predictions every week. The number that we care about is the number of users who are submitting and staking. So you can just sign up and submit for fun, but we only use the models that are staked. We only want to use the models that the users believe in themselves enough to stake. And we have about 700 weekly stakers. And just those 700 people, some of the top ones are staking almost half a million dollars of Numeraire. That's staked and at risk every week. That's what's interesting about Numerai. We're not trying to grow the number of users like a consumer internet company or something like that. In fact, we like it when users join and lose money because they get burned because they're not good enough. So they're automatically dropping out of the system. In some sense, when someone chooses not to continue to stake, that's even a good thing. But over the last eight months or so, that's been the main growth. I think even just eight months ago, we had like $30,000 at stake.
Starting point is 00:06:07 and now it's 100xed. People really believe in their models more and more, and the performance of the whole thing is better and better. Is there some kind of power law in stakers? Some stakers stake a lot, and then it drops off really quickly? Yeah, definitely. If you go to numer.ai/tournament, you can see the leaderboard and you can see all the stakers,
Starting point is 00:06:33 and the top staker has about 10,000 NMR at stake, which is about $200,000, but they're allowed to submit multiple different models. So that user in particular has two other models that also have very high stakes. And yeah, it does come down. Some people are staking as little as, you know, $1. But even that is actually a good signal. I mean, what's important is we're not asking people to stake on Numerai so that we can make money. There's no way we can ever earn the money that people stake.
Starting point is 00:07:09 The only thing that can happen is it gets burnt or they get rewarded. So they're not playing against us. They're really playing with us to make the meta model, the combination of all the models, do better and better. So just to give some background here, you've introduced the staking mechanism in order to make it costly for people to actually submit models,
Starting point is 00:07:39 so that you can't Sybil attack the system, right? Because otherwise you can just submit like 10,000 different models and just by virtue of how many models you submit, you can kind of brute force your way into a reward. And basically in order to make that impossible, you have to stake. And then basically the reward that you obtain is also related to the amount you stake, right? That's exactly right. So without the staking, you can never trust the models. And I find it very surprising. Some industries have moved almost entirely onto the internet over the last few decades, like the travel industry or whatever. You have travel agents online like Expedia, and any number of industries can move online, especially if they're just about information. And the industry that's almost the most about information is hedge funds.
Starting point is 00:08:28 But why isn't there an internet hedge fund? The reason is you could never do the incentives correctly, because people would sign up and submit whatever they wanted and then they'd delete their accounts or make a new account if it didn't work or whatever. So the thing you have to get right is: how do normal hedge funds do it? Well, if you're in a normal hedge fund, typically you actually have your own money in the fund. Now, we can't let Numerai users have money in our fund and that's not even quite what we would want.
Starting point is 00:09:00 We want them to have a stake in their own predictions. And doing the staking on the predictions achieves that sense of, well, now I have something to lose. And it prevents a Sybil attack and it basically is this huge quality control. If we use the non-staking users, the performance is much worse than the staking users. If you look at the different models that you receive from people who stake or people who don't stake, you somehow still have to combine these into a meta model to kind of tell you what stocks to buy and what stocks to sell, right? So has your thinking on this evolved over the
Starting point is 00:09:40 past three years? Because I imagine there's quite a steep learning curve there on how to combine different models that kind of have different sorts of assumptions. Yeah, it's a big problem. The first thing we really struggled with was: can you beat just the simple average of all the users? So every user is predicting on about 5,000 stocks. And if we took the simple average of those users, it would be hard to beat that because that's the whole point of crowdsourcing in a way. You know, if you can average the crowd, you can do better. So it's quite hard to beat averaging. But then what we tried more recently was, well, if people are staking and some people are staking a lot more than others, is that a kind of expression of confidence that we can rely on? Is someone
Starting point is 00:10:28 who's staking 10 times as much as the next guy, does he have a better model? And then we started the stake-weighted meta model. So your proportion of the stake was your weight in the meta model. And that outperformed averaging. But the sort of third level of this, which we're doing now, which we've just launched, is what if you could kind of do this other layer of machine learning? So you could somehow combine all the models in a very smart way by basically figuring out how they blend best together in a complicated way. And can you have that actually outperform stake weighting? And it's not that easy to make it outperform. And that's the thing with stock market data: you're talking about such small edges. So like we're talking about
Starting point is 00:11:19 trying to get a 51% edge over the market. And so if you have a new idea, and it looks like it's 51.5%, that's a very big deal. But the sensitivity is so high that it might actually not be half a percent better. It might be half a percent worse. It's a very difficult problem to combine them. But staking is certainly the key thing that makes it possible in the first place.
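To make the two blending schemes described above concrete, here is a minimal Python sketch. It compares a simple average with a stake-weighted average, where each model's weight is its share of the total stake. The function names, the three models, their predictions and their stakes are all made up for illustration; this is not Numerai's actual code, and the third, machine-learned blending layer is not shown.

```python
def simple_average(predictions):
    """Equal-weight blend: mean prediction per stock across all models."""
    n_models = len(predictions)
    n_stocks = len(predictions[0])
    return [sum(p[j] for p in predictions) / n_models for j in range(n_stocks)]


def stake_weighted(predictions, stakes):
    """Blend in which each model's weight is its share of the total stake."""
    total = sum(stakes)
    if total <= 0:
        raise ValueError("total stake must be positive")
    weights = [s / total for s in stakes]
    n_stocks = len(predictions[0])
    return [
        sum(w * p[j] for w, p in zip(weights, predictions))
        for j in range(n_stocks)
    ]


# Three hypothetical models scoring the same three stocks (made-up values).
predictions = [
    [0.9, 0.5, 0.1],
    [0.6, 0.5, 0.4],
    [0.3, 0.5, 0.7],
]
stakes = [10.0, 1.0, 1.0]  # hypothetical stakes, in any common unit

avg_blend = simple_average(predictions)
stake_blend = stake_weighted(predictions, stakes)
print(avg_blend)
print(stake_blend)
```

With these numbers the heavily staked first model pulls the stake-weighted blend toward its own view, while the simple average treats all three models equally.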
Starting point is 00:11:47 No model can perform equally well in all situations, right? So in theory, you can have a model that performs well in some situations and not in others, whereas some other model could perform well in a larger number of situations, but its edge might be lower. So how do you weigh how much to pay out to which kind of model? In other words, maybe it isn't always just about the performance of a model, right? So there could be more factors. So what other factors are there and how do you weigh these different factors?
Starting point is 00:12:22 These are the kind of key problems with quantitative finance. You can make models and they work brilliantly for 10 years, but they only worked because that whole 10 year period was a bull market. The attribution would say, well, the only reason you actually made money was because you were just kind of long the market or something. So figuring out how to determine whether a model is good has also been the big research challenge of the company. If you have a model where, every time you go long a certain stock in a certain country, in a certain sector, you're also short a stock in the same country and in the same sector, and you still do well, then that means you really have an edge. You know, we have a lot of people right now, very delusional about their skills in the stock market.
Starting point is 00:13:12 So, you know, you spent the last four years buying tech stocks on Robinhood and they all went up. But really, if you'd bought the tech industry, you would have done better. So you think you're good, but you're not really good. And that's the key way that we filter out models: are they good in all environments and are they good even when you neutralize them to all known risks and all sectors and countries? And if they're good at that, that's very hard to do by luck. And it's also more likely to be the kind of thing that works in the future. So if you can make money even though you're market neutral, that means you're clearly resistant to the market because you have no market exposure.
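One common way to "neutralize" a set of predictions to a known risk is to subtract the part of the predictions that a linear fit on that risk exposure can explain, and keep only the residual. The Python sketch below does this for a single exposure. It is an assumption that a linear projection is the method meant here, since the conversation does not specify one, and the exposure values (a hypothetical market beta per stock) and raw scores are made up.

```python
def neutralize(predictions, exposure):
    """Return the residual of predictions after a one-variable
    least-squares fit (with intercept) on the given exposure."""
    n = len(predictions)
    p_bar = sum(predictions) / n
    e_bar = sum(exposure) / n
    p_c = [p - p_bar for p in predictions]  # centered predictions
    e_c = [e - e_bar for e in exposure]     # centered exposure
    var_e = sum(e * e for e in e_c)
    if var_e == 0:
        return p_c  # a constant exposure explains nothing beyond the mean
    beta = sum(p * e for p, e in zip(p_c, e_c)) / var_e
    return [p - beta * e for p, e in zip(p_c, e_c)]


# Hypothetical per-stock market beta and raw model scores (made-up values).
market_beta = [0.5, 1.0, 1.5, 2.0]
raw_scores = [0.2, 0.3, 0.7, 0.8]

neutral_scores = neutralize(raw_scores, market_beta)

# After neutralization the scores carry no linear exposure to the factor.
beta_mean = sum(market_beta) / len(market_beta)
leftover = sum(s * (b - beta_mean) for s, b in zip(neutral_scores, market_beta))
print(neutral_scores, leftover)
```

A model whose neutralized scores still predict returns has an edge that does not come from simply loading up on that one risk, which is the filtering idea described above.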
Starting point is 00:13:59 For every $100 long you have, you also have a $100 short position. So if you can make money like that, that means you're not building a model that's only going to work when the market's going up. How exactly do you combine these different models? Because naively it would seem to me that if you have different strategies and you mix them up, typically you make them worse. A strategy, ideally, is something that's self-consistent, and as soon as you kind of mix it up with other strategies,
Starting point is 00:14:27 it's like a bit of this and a bit of that. And it's really difficult to make sure that these bits actually go together and act as a greater whole. So how much manual labor actually goes into picking strategies that kind of go together or models that kind of go together, and how much do you actually know what assumptions these models are based on? Because, I mean, people just submit predictions, right? So you don't know what goes into making these predictions.
Starting point is 00:14:54 So how do you actually turn this into a self-consistent meta-model? One thing that's unique about Numerai that people don't really understand unless they've done data science is that, because Numerai is the one who controls the dataset, we control every possible input that is involved in the model. So if there's something we do research on and we're like, you know, that feature tends to make strategies that only work in Japan, then we're not going to even give that feature to the community.
Starting point is 00:15:27 So we're curating the datasets and setting up the problem in a very particular way. That's on the feature side. And then we're also deciding on the target variable. The target variable is this very residualized return. And so what that means is you can't really get paid. And it's also on a specific time horizon. So it's about a one month forward return that you're trying to model. So yes, if we had it much more open and we said you can use any data you want and you can model any horizon you want, you might have someone come up with a model that works extremely well for one day trading in Japan and then interacts very badly with every other model because it's much higher turnover.
Starting point is 00:16:15 And that kind of model basically cannot be discovered or developed on Numerai because the target isn't set up to do that. And users have no idea what data they're looking at because we've obfuscated the data and given it to them in this very structured form. So what that means is because every model is training on the same data, the models are eminently combinable. They're all almost perfectly ready to be averaged together. They all have the same goal because they all use the same data and model the same target. So that's a really important part. It certainly is better for you to have more models that are uncorrelated. And a recent thing we did is something called metamodel contribution.
Starting point is 00:17:05 So you might make a really good model on Numerai's data, but it's very correlated to models we already have from other users. So we want to pay you kind of because you've made a good model, but it's not actually additive. If both of you submit the same model, the crowdsourcing part of it doesn't matter. But if you submit an uncorrelated model that's trained to achieve the same goal, but it's still uncorrelated, that's where the ensembling really helps. And so we've also started paying people for that. They can stake not on their own performance, but on how much their model helps our meta model. And that has been a really important new development at Numerai over the last few months.
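The point about correlated versus uncorrelated models can be illustrated with a small, self-contained Python sketch. It is a toy construction under stated assumptions, not Numerai's metamodel contribution calculation: the target, the meta model and the user model below are built from made-up orthogonal patterns so the effect is easy to see, and "performance" is taken to be the Pearson correlation with the target.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def blend(a, b):
    """Equal-weight average of two prediction vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


# Made-up target and two mutually orthogonal noise patterns.
target = [1, 1, 1, 1, -1, -1, -1, -1]
noise_1 = [1, 1, -1, -1, 1, 1, -1, -1]
noise_2 = [1, -1, 1, -1, 1, -1, 1, -1]

# Existing meta model: signal plus the first noise pattern.
meta = [t + n for t, n in zip(target, noise_1)]
# A duplicate of what the meta model already contains.
duplicate = list(meta)
# A noisier user model whose errors differ from the meta model's errors.
different = [t + 1.5 * n for t, n in zip(target, noise_2)]

corr_meta = pearson(meta, target)
corr_different_alone = pearson(different, target)
corr_with_duplicate = pearson(blend(meta, duplicate), target)
corr_with_different = pearson(blend(meta, different), target)

print(corr_meta, corr_different_alone)
print(corr_with_duplicate, corr_with_different)
```

In this toy setup the second user model is worse than the meta model on its own, yet blending it in raises the correlation with the target, while blending in the duplicate changes nothing.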
Starting point is 00:17:48 And you have people making these really creative and weird models that actually aren't very good by themselves. But when you combine them, they are extremely helpful to the model. In that scenario, when I'm staking on my own model, in order to decide how much to stake, don't I need to know how the Numerai meta model itself behaves? So that's a really good point. What's nice about modeling the targets is you have the targets and you know how you can check your own performance. But if you're making a model that's being paid on the contribution, you don't know what you're going to be scored against. But the way we basically deal with that is we just tell people every week your meta model contribution was this.
Starting point is 00:18:33 And they can see like, okay, the meta model is in kind of one place. And they can realize, well, this week I was uncorrelated with the meta model. And this week, I would have made more if I'd staked on meta model contribution. And so over time, they can kind of learn. So the meta model is in some kind of stable place. And they can easily estimate their contribution after staking a few rounds. So typically people will stake just on their raw performance. And then over time, as they kind of become pro users, they might start staking on their metamodel contribution. This is kind of an evolutionary mechanism that feeds into it. Let's dig down into this a little bit more. So can you tell us more about the
Starting point is 00:19:15 datasets that basically you homomorphically encrypt and give to people? So basically what kind of features, and how many features, do your data sets have? And how often do you change these and what kind of things are they, to give listeners some idea of what people actually end up working on? We don't say for a reason, you know, we hide it for a reason. If we gave away the data in a raw form, people could just kind of run off with it and start their own funds. And they wouldn't be working together on one thing and we wouldn't be able to do all this stuff.
Starting point is 00:19:52 So we don't really talk too much about the data, but I think it's about a gigabyte or something. It's about a million rows and 310 feature columns. It just looks like a million rows with 310 columns and every number is between zero and one. So you don't know what the data means, but we do do a few things. Like we put features into groups where those groups tend to have features that are correlated with each other. The kind of data it is, is kind of structured quant data. So the thing is, as much as it seems like there's more and more data available online about stocks, it's actually weirdly not true. Like it's harder and harder, the barrier to entry to have basic data that every other hedge fund has
Starting point is 00:20:32 Like it's harder and harder, the barrier to entry to have basic data that every other hedge fund has. is more and more expensive. And so we pay something like half a million dollars a year on data, which isn't very much. We see our edge more in the modeling than the data. Even that is completely prohibitive to a normal person who's just trying to build a model. So by buying it and doing the curation of the data and choosing the features and setting up the problem, we're kind of taking out that side of things, taking out the finance part of it, and just allowing our community to fit that dataset,
Starting point is 00:21:09 which is the fun part, kind of the machine learning part. So in a way that's kind of like fitting like a 310 dimensional scatter plot, right, in like the simplest terms. Exactly. If there were just two features, you could just plot it nicely. But yeah, it's a 310 dimensional space and you have to find a curve, basically, that fits that space to the targets. Yeah, that's super interesting. So do you have any dummy features in there? I've worked in finance
Starting point is 00:21:40 for a long time myself and I know what kind of features you would probably base models on and I know what happens after the fact. I can kind of look at the data you gave me a month ago and then see what happened afterwards. So would I be able to deduce your encryption mechanism, or is that difficult for other reasons, or do you enter dummy data so that I'm kind of sidetracked? Yeah, so we recently did have this genius Japanese guy write a blog post where he seems to have a lot of quantitative finance experience already. And he started to make some claims about some of our features. He thinks he knows that one of the features is momentum, which is a really common feature to have in any quant data set. And so he wrote this kind of case for why he thinks this feature group is momentum.
Starting point is 00:22:35 And I don't want to say whether or not he's right, but it was interesting to see that some people are thinking about that. You might be able to make a sort of, yeah, estimate like that. Like I think this looks a little bit like momentum, but it'd be very hard for you to fully map row for row, column for column, you know, what that data is because it's basically impossible. Like the obfuscation will make that impossible for you to do. But people are still thinking about that. And it is interesting when people do that. So one reason it's good to obfuscate the data is to protect, you know, the data from leaking and going into other places or people using our data but not actually submitting to us.
Starting point is 00:23:16 But the other maybe more important thing is it actually stops people from imposing their own human ideas. If people do think this one feature group is momentum features and they read in the newspaper that momentum is going to struggle, or they remember their finance professor told them momentum is a bad feature for stocks to use in the long run or in a bear market, suddenly they'll impose all of these kind of human ideas onto the data. And they might say, I'm going to drop that whole group from my modeling. And that we really don't want people to do. We do want people to use machine learning. And in its real form, it is not about hand-picking things. It is about, you know, the answer is in the data. And you just have to discover the model that's the best fit for the data.
Starting point is 00:24:12 It's not your job to impose your human biases on us. That's really interesting. The story this reminds me of is, very recently you have these chess engines that are built using reinforcement learning. And it turns out that these new chess engines that are built using reinforcement learning are able to defeat the older computer programs that were built by humans to play chess. So these newer ones are something like AlphaZero and an older one is something like Stockfish. So programmers have been building Stockfish for 20 years. And it's like this perfect chess program that beats all humans. But then this new reinforcement learning algorithm comes along and it ends up beating Stockfish.
Starting point is 00:24:54 But not only beating Stockfish, but playing way more creatively than Stockfish. And one of the key differences between these two chess playing algorithms is that humans designed Stockfish. And intrinsically, humans gave Stockfish some of our own value judgments. Like, for example, a bishop is slightly more valuable than a knight. A queen is at least two times more valuable than a knight. So Stockfish was built on these human ideas of how valuable the various pieces on the chess
Starting point is 00:25:26 board are, and Stockfish is trying to optimize its position given some measure of how valuable these pieces are with respect to each other, whereas with these new reinforcement learning algorithms, you don't tell the machine how valuable a piece is vis-a-vis another piece at all. Let the machine discover things by itself, and don't even tell it the rules of the game. Yeah, don't even tell it what things are important, what features or what pieces are important. And then it turns out that once the machine learns with that blank slate, it actually performs even better, plays even more creatively than a machine that is given those features. So it feels very similar in that regard. Yeah, it really is. And it's such an
Starting point is 00:26:14 important, yeah, almost philosophical change in the last few years, where you had the same thing with speech recognition and these hand-built features. There was a huge field of people who were really good at this, making features from audio and doing speech recognition. And then 100% of that knowledge is not needed anymore because the neural nets got so good and the algorithms got so good. And you just needed more data and you could outperform. And I think that will definitely be the story of finance. Yeah, there's some sense where the old guard of finance might say, well, you know, you really have to know a few things about the real economy and you have to know a few things about inflation and the macro stuff if you want to
Starting point is 00:26:59 be a good trader. But it's all not true. It's all going to still come down to a mathematical problem. Yeah, I think we are showing that to the extreme. Here's how Numerai works: 100% of our modelers have never seen the data, right? They're just modeling the obfuscated data. And for 100% of the models we use in trading, we have never seen the code that created those models. So the users are blind to the data and we are blind to the code that they're writing. And somehow that works. Like, that's really taking all this finance stuff out of the problem. I totally see that you don't need to know what exactly it is that a company that a stock belongs to does and so on. But sometimes our world fundamentally changes. An algorithm that is trained on
Starting point is 00:27:51 back data has no way of knowing that. Look at the corona crisis. So basically, had I told you at the beginning of this year that there's going to be a huge pandemic, you might, as a critically thinking human, have been able to infer possible consequences. So things like airline stocks are probably going to go down, pharma company stocks may go up, stay-at-home stocks go up and so on. There's no way that a machine learning algorithm that's trained on the past would have known that, right? That's exactly right. We never tried to model those things. There's never been a time where we've had any net exposure to any particular industry. So whenever we were long an airline in the first place, we were also short one. We never had any exposure to any sector or any exposure to any country or any exposure to any economy, basically. So I know that sounds kind of crazy. Like if you are long and short in all these things, you're not really playing the game people think. Like, we're never
Starting point is 00:28:57 like trying to predict what stocks will go up. I don't know, that seems like a crazy thing to say, but that's not ever what we're trying to do. We're really looking at relative positioning of things in the market, like: this stock, for its features, is undervalued relative to this one. That doesn't say anything about what their absolute value should be. You know, if we're buying Tesla and maybe everyone thinks Tesla is overvalued now, it doesn't really matter because we don't care about the overall levels. And that's what's very different about market neutral funds versus human-based funds, just like, I want to buy Snapchat because I believe in the future
Starting point is 00:29:36 of the social media industry or something. That kind of judgment is kind of for the world of humans in a way. But the relative judgment based on all the data is more the domain of machines. So for people who are not well-versed in the finance universe, what exactly is a market-neutral fund? So basically, how exactly do you construct a set of stock options that kind of have that property? Yeah. Well, the market-neutral part means you don't have stock market exposure. Now, how can you not have stock market exposure if you're trading stocks?
Starting point is 00:30:12 So we trade with about four times leverage. So we have, for every $100, we'll have $200 long in the stock market and $200 short. So if the stock market as a whole fell by 50%, and we owned random stocks in our longs and random stocks in our shorts, how much will we fall by? 0%. Because we were short, an equal dollar amount. So that's the key thing. It should make sense to people that if you buy stocks at random and you go long and short them at random, your outcome will be that you'll make no money and maybe pay money in trading costs.
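The arithmetic above can be checked with a few lines of Python. The sketch assumes the simplest reading of "four times leverage": gross exposure is four times capital, split equally between longs and shorts, and trading costs are ignored. The function name and the second scenario (longs falling slightly less than shorts) are made up for illustration.

```python
def book_pnl(capital, gross_leverage, long_return, short_return):
    """Profit or loss of a dollar-neutral book, ignoring trading costs.

    Half of the gross exposure is held long and half is held short.
    """
    gross = capital * gross_leverage
    long_notional = gross / 2
    short_notional = gross / 2
    # Longs gain when their stocks rise; shorts gain when their stocks fall.
    return long_notional * long_return - short_notional * short_return


capital = 100.0  # $100 of capital gives $200 long and $200 short at 4x gross

# The whole market falls 50% and both sides hold random stocks that fall with it.
crash_pnl = book_pnl(capital, 4.0, -0.50, -0.50)

# Same crash, but the longs fall 48% and the shorts fall 52% (hypothetical edge).
edge_pnl = book_pnl(capital, 4.0, -0.48, -0.52)

print(crash_pnl, edge_pnl)
```

In the first scenario the $100 lost on the longs is exactly offset by the $100 gained on the shorts. In the second, the book makes money in a crashing market purely from the relative difference between its longs and its shorts.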
Starting point is 00:31:00 And so that's the market neutral part. That typically is what it means: you're dollar neutral to the market exposure. And that's why, you know, with a hedge fund, if the market's down 40% or 50%, if it's a real market neutral hedge fund, you wouldn't be able to predict how much that fund is down because they might have done well because half their portfolio fell by 50%, but they were short that half. And the other half where they were long also fell back. But usually market neutral means more than that now. So you might have something that is dollar neutral, but is actually exposed to a certain sector.
Starting point is 00:31:37 So 100% of your longs, you're long tech, but you're short retail. Okay, all your shorts are retail companies and all your longs are tech. You're not that smart if that portfolio works. It'll only work because of your sector exposure, not because of your stock picking ability. But if you then say, well, I want to be market neutral and sector neutral, and you go even further, and country neutral, and momentum neutral, and value neutral, and volatility neutral, and on and on and on, your whole portfolio is perfectly balanced, the number of longs and shorts you have is always balanced, then in some sense there's very little place for your portfolio to go
Starting point is 00:32:20 because you're neutral to everything. But if you can still make money when you're neutral to everything, that's really powerful because if someone looks at your portfolio, they'd be like, I wonder if you made money just because you got lucky on your sector bet. Oh, you didn't have any sector bet.
Starting point is 00:32:38 You're neutral to sectors. Oh, I wonder if you made money just because the yen crashed. Nope, you had no yen exposure the entire time. So if you had no exposure to any risks and you're still making money, that's the dream of the market-neutral hedge fund. That's absolutely fascinating. So let's talk about your results with this. So can you tell us how much assets under management you currently have? Not really.
Starting point is 00:33:03 We can't really talk about the results. So, yeah, we are a small fund. The way we've done Numerai is completely different to other hedge funds. We've raised venture capital for the company. And we've even sold tokens, not in an ICO, but to professional investors, crypto funds. But usually hedge funds start by raising a bunch of money into their fund. And there's a rule with the SEC. I think if you have over 150 million, you have to announce that to the SEC and you get regulated in a special way. So it's almost been good for us to stay well below that while we were doing the sort of R&D.
Starting point is 00:33:46 There's so many new things we're doing simultaneously at Numerai. No one's ever done crowdsourcing like this. Even just the machine learning part is unique enough. Then we have the blockchain stuff on top of that. So there was a lot of time that we needed to get things right. So yeah, we don't talk about the AUM, but you can infer from the fact that we haven't filed that we're probably quite a lot less than 150 million. And then returns is also something the SEC doesn't like hedge funds to be talking about
Starting point is 00:34:15 because, you know, we don't want to be seen to be promoting our fund. Our fund is not even investable by individuals. It's really like an institutional-grade fund. So there's a couple of individuals in it, but they're just like very big investors in the hedge fund space or something like that. I would have hoped to hear some numbers, but I understand that you can't. Can you tell us how you've done with respect to other hedge funds? So are you hedge fund neutral or are you up or down? Yeah, I mean, we can say like, so we did, I did talk before about market neutral,
Starting point is 00:34:55 spoken to other press about market neutral funds during this pandemic. And some funds did extremely badly, even though they market themselves as market neutral. In fact, Renaissance have a market neutral fund. Renaissance is maybe the most, yeah, successful hedge fund. And it's down 20%. And it's like, why are you down 20%? Like you're supposed to be market neutral. And so things can go wrong in a financial crisis where you realize, even if you're market neutral,
Starting point is 00:35:27 you were holding a lot of the stuff that other hedge funds were holding, and your strategy wasn't very differentiated, and everyone was pulling their money out of the good stuff, the good stocks, and that liquidity event, that liquidity crisis, made a lot of funds lose money. But we were also pandemic neutral during this time. So we actually don't hold a lot of the same stocks that other hedge funds hold. So in March of 2020, when this crisis was particularly bad, we were fine. We did a lot better than our peers.
Starting point is 00:36:06 Cool. Being fine in these times is a, yeah, it's a very good thing. So can you tell us what percentage of the liquid NMR supply is staked on a regular basis? Yeah, right now, so it's maybe, yeah, if it's 3.3 million, depending on how you look at the circulating supply, that's about 5% of the total that's out there. And that's a good amount because Numerai has held back a lot of the, because we never did an ICO, there wasn't like this time a long time ago where all of our tokens were bought by speculators who would never use Numerai. We were much more careful.
Starting point is 00:36:48 We gave out a small amount to our users and slowly pay them from our reserves. And because of that, Numerai has a lot of the tokens, like six and a half million of them or something out of 11 million. And the users have a lot. And the sort of speculators don't have as much, and don't command the price as much as the actual usage does. And I really like that. Like if you look at Numeraire, the amount of volume on Uniswap is the kind of interesting thing to look at versus the amount of volume on centralized exchanges, because a lot of our users, or a lot of people who are really using a DeFi app, they need 50 tokens in order to use Numerai or something, will use Uniswap, not a centralized
Starting point is 00:37:38 exchange, but the speculators will use a centralized exchange. So if a lot of our volume is on Uniswap, that means like we actually have a lot of organic users, which is more than you can say of most crypto projects. And I think that's an important thing. And I think people are looking at that now and they're like, well, do you want to make a bet on something that's actually being used or do you want to just like be part of the speculator class? I have kind of a general and abstract question. So you have this Numerai model where you publish an obfuscated data set. People develop models against it, the models are private to them, and using their models, they submit predictions to you. You combine those
Starting point is 00:38:26 predictions in some way and that method keeps evolving, and then using that combination of predictions you're doing a trade and trying to make returns on the stock market. How general is an architecture like that? For example, could I basically record my entire stream of senses, sight, vision, sound, everything I see, everything I hear, obfuscated, that's the data set, send it to the Numerai community and say, hey, my objective is to make the most money over the next five years
Starting point is 00:39:03 and somehow get models on what I should do. And like those models are somehow then combined into what action I should actually take. So maybe that's the individual level problem. But an organization can also think like that. So there's an organization which has like 20 plants and it manufactures some things and there's this entire data set. It has its entire data set in SAP.
Starting point is 00:39:29 Somehow extracts it all, obfuscates it, publishes it. And then a distributed community is actually analyzing what sort of manufacturing decisions this entity must take in order to get some objective. So is this approach general like that? Or does it specifically work only for the stock market problem? Yeah, I think it is not general. I think the stock market is uniquely positioned for something like Numerai. And the reason for that is the sort of productionizing of the whole thing.
Starting point is 00:40:10 So there are websites, data science competition websites, where they give out a bunch of like health care data. And then it's like figure out, you know, based on these x-rays, whether the person has cancer or something. And people attack the problem and they find a solution. And the crowd might find a good solution. But it's the productionizing of that model where the crowdsourcing part of it just stops. It's like, okay, well, thanks for the competition.
Starting point is 00:40:40 We're just going to hire the best person from this who made the best model. And then he's going to develop it and deploy it into a system. Because imagine you are a doctor and you had this x-ray analyzer. In production time, you're going to want to have the model running locally. You're not going to want to say, okay, I'm running an x-ray and I'm going to send this particular example to the crowd to get their feedback. You actually need to have the model locally. And that's the unique thing with Numerai. We don't really need to have the models locally because we can just ask users to submit predictions every time we want to trade. And that works out a lot better.
Starting point is 00:41:27 So productionizing it in a real way, where we're using the user predictions on the real stock market, is something you can't really do in other industries as well. The other part, yeah, well, that's, yeah, I mean, that is kind of the main thing. But the other part is the accuracy. So as it happens, imagine you are 85% correct on whether someone has cancer based on an x-ray. If you can get that to 86 or 87, that's actually not that valuable. I know that sounds like kind of crazy, but it's not valuable
Starting point is 00:42:01 to the medical, like if you could tell them that, they would be like, oh, that's a small increase. Our biggest problem is actually implementing this stuff and educating the doctors about this stuff. Our problem isn't the accuracy. But in the stock market, to go from 51% accurate to 53 is extremely valuable and worth the crowdsourcing effort to have that small increase in accuracy. So yeah, I do think it's kind of like, in fact, when I was pitching Numerai to VCs back in 2015, they're like, you know what you should do? You should just say you're going to be a technology company and you're going to do this for all industries. And that way you'll get a higher valuation than if you're just doing a hedge fund.
Starting point is 00:42:48 And I was like, no, it doesn't make any sense to do this for anything except the hedge fund. So I still think that's true. Cool. So one of the most recent additions and improvements to Numerai is Numerai Signals. Can you tell us about what that is? Yeah. So you almost have to forget everything I just told you. Because Numerai Signals is sort of a different way of doing this.
Starting point is 00:43:13 So on Numerai, we give out a very pre-processed dataset, and we're crowdsourcing the intelligence on the dataset. But how are we going to get more data? So we could go out and buy more data and add it to the dataset we give to users. Or we could tell our users, well, you know what? If you have a data set or a signal on stocks, you can also just send that up to us directly. And that's what Numerai Signals is. It's really if you already have a model on the stock market
Starting point is 00:43:47 and you've used some other data source to build that model. And you already know, I want to predict Apple's going up, Google's going up. You want to predict on the actual stocks, and you don't want to look at our obfuscated data. We want to be able to crowdsource the data part of Numerai too. The crowdsourcing of the models, we've got a huge, you know, maybe the second biggest data science tournament on the planet after Kaggle. And we've got all the staking there, but to get to the next level,
Starting point is 00:44:19 it's always a function of data plus intelligence equals money for hedge funds. And so if we got the intelligence part, perfect, but the data is like just normal data that other people probably have, how do we get the data part like really good? And we wanted to do it in a way that also leverages staking and the other things we've built.
Starting point is 00:44:40 So you're going to be able to upload. Right now you can do it already on signals.numer.ai. You can upload predictions on stocks. In a CSV, you still have to predict like 5,000 stocks. It's still very quanty. It's not like if you think Snapchat's going to go up or something like that, that you can use Numerai Signals.
Starting point is 00:44:57 It's not for you. It's really for quants who can make predictions on thousands of stocks. And when you upload predictions, you're scored in a way that's not, again, it's not like you make money if the stocks go up. It's not quite like that. It's much more complicated. But it also will have the staking. And so we're about to launch staking for Numerai Signals where that is how we will know whether we can trust the data.
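As a rough picture of what "predictions on stocks in a CSV" could look like, here is a minimal sketch. The column names `ticker` and `signal`, the three tickers, the numeric values, and the helper `write_signal_csv` are all hypothetical placeholders; the episode does not describe the actual Numerai Signals file format, and a real upload would cover thousands of stocks.

```python
import csv
import io


def write_signal_csv(predictions, out):
    """Write one row per stock: a ticker and a numeric signal.

    The header names are hypothetical, not the real submission schema.
    """
    writer = csv.writer(out)
    writer.writerow(["ticker", "signal"])
    for ticker, value in sorted(predictions.items()):
        writer.writerow([ticker, f"{value:.4f}"])


# Hypothetical values: higher means the stock is expected to do relatively better.
predictions = {"AAPL": 0.71, "GOOG": 0.64, "SNAP": 0.35}

buffer = io.StringIO()
write_signal_csv(predictions, buffer)
csv_text = buffer.getvalue()
```

The point of the sketch is only the shape of the data: one relative score per stock across a broad universe, rather than a single bet on one company.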
Starting point is 00:45:22 So the reason Numerai didn't start with this was that what's great about Numerai is we can control the data part of it. And so we can trust the quality even more. But now that we have staking, maybe we can trust things a little bit that come from Numerai Signals. Maybe we can trust that no one in their right mind would stake $10,000 on this if they didn't believe it was good data. And some of our meta model could be coming from Numerai Signals as well as normal Numerai. And so in that way, we're crowdsourcing both the best data scientists who don't have any data and the best data providers. Does this in a way kind of question the original model of Numerai, though? Because basically, I mean, you wanted to target people who have no idea about stocks, right? So basically, you have no financial market experience, but you happen to be a very good data scientist. And now you're kind of asking people to give you their best take of what's going to happen with the market. Wouldn't it be easier for them to just go play the market themselves? Why would they actually use you guys?
Starting point is 00:46:36 Very good question. They might have a model that's not that good by itself but will help us. So imagine, let's say they still don't know anything about finance, but they have scraped a bunch of data from Twitter to do like sentiment analysis, like how many times are people talking about Google on Twitter. And they make a little signal out of that where the high numbers are the things with high sentiment and the low numbers are the things with low sentiment according to their data. Now, they don't want to make a hedge fund out of this one signal. They don't know anything about hedge funds and they might not make any money on just the raw signal by itself. Let's say that it makes 3% a year, that signal by itself. But when combined with Numerai data,
Starting point is 00:47:33 because we don't use any sentiment data, it turns our, say, 20% a year return into a 25% return. So by itself, they wouldn't want to make a hedge fund out of that signal, but they would prefer to license that signal to a hedge fund that can actually implement it. Remember, the hedge fund industry is very hard to get into. No one's going to let you trade 5,000 stocks on four times leverage and trade swaps in South Korea, you know, without you having like serious backing and a proper prime broker and things like that. So I think it's a way to
Starting point is 00:48:13 access that kind of data that by itself might not be valuable, but on Numerai would be. Super interesting. So we'd like to move on to Erasure. So Erasure is this protocol that's being developed by your organization? So what is it? And my follow-up question there would be, is it different from a prediction market? Yeah, so the protocol that powers the staking on Numerai, we were one of the first things to be doing staking. People know about staking maybe through proof of stake, but now there are many other things that are using staking.
Starting point is 00:48:54 But that was the thing we had to build. So we built some smart contracts that powered Numerai specifically. But then when we were building kind of version two of it and upgrading those smart contracts, we were like, you know what, we might as well build this really well, such that other developers could use the protocol. And maybe other applications would start that use Numeraire in some new way, some kind of crazy ideas.
Starting point is 00:49:21 Why can't you stake your tweets to have them show up higher on people's feeds because you've got a stake? Or you can grief Donald Trump's tweets if you're unhappy with something he said. And like maybe you could have staking be all over Web 2. So we were thinking about these ideas like two years ago and decided to build the next version of the smart contracts that power staking on Numerai in a general way so that other people could build applications.
Starting point is 00:49:52 And that's what Erasure is. And we also decided to build another application on top of the protocol to demonstrate this. And that's called Erasure Bay. And Erasure Bay is like, it's kind of like a Twitter bot, but you sign up and you can make a request and put a stake on it so that people can actually trust. You can say, I'm looking for, I don't know, one of the things I've posted on was I'm looking for Vitalik Buterin's home address. And I'm putting a $500 stake to prove that I'm serious, that I really want this. And you can check in the blockchain that I'm really committed to it.
Starting point is 00:50:32 And also, if you provide an address that's wrong, I get to burn your stake. So someone actually did give me Vitalik's home address on Erasure Bay. But he basically gave me, he said Vitalik's home address is vitalik.eth. And I thought that was really funny, but it wasn't what I was going for, and I didn't want to pay $500 for that. So I had the right to burn all of his stake. So I did. And that small dynamic, I think, of staking and griefing, kind of peer-to-peer staking and griefing,
Starting point is 00:51:14 I think could be used for many things. And so that's why we built Erasure and built Erasure Bay. It's all very helpful for Numerai. We're going to use Erasure, obviously, for Numerai Signals. We're going to use it for Numerai. We've got Erasure Bay using it. And other developers are interested in using it. And I think in the future, there will be, especially when this like layer scalability stuff's fixed, I think you might have websites like, one of them is like Stack Overflow.
Starting point is 00:51:40 Why can't you just put like a little stake reward for someone who provides you the answer to your Stack Overflow question or grief someone who gives you, you know, malicious code or something? I think the internet, the whole internet could be improved just like the quality of predictions on Numerai were improved with staking. So what happens if you give me a good answer and I decide that the answer somehow isn't good enough? Is there like an arbiter that kind of decides who of us is actually in the right? Or can you then just grief me back? Or what's going to happen? Yeah, there's no arbiter. So you decide, were you happy with it or not?
Starting point is 00:52:22 And so, but the question is, why would you do it? Why would you grief the person if you, so what happened with my stake with Vitalik's home address, it was like, I had to pay a little bit of money to do the griefing. So I wouldn't do that maliciously because it's hurting me, but I would do it to build a reputation on Erasure that if you mess with me and you give me bad data, I'm going to burn you. And just like Numerai, there is no arbiter, except for Numerai, saying whether we liked your predictions or not. We just decide. And we scored you. No one can check if we're scoring correctly. But they know it's in our interests to score correctly.
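The staking-and-griefing dynamic described here, where burning the other party's stake also costs the griefer something, can be modeled with a toy sketch. Everything in it is an assumption made for illustration: the class `StakedAgreement`, the `griefing_ratio` parameter, and the dollar amounts are hypothetical and are not the Erasure smart contract interface.

```python
from dataclasses import dataclass


@dataclass
class StakedAgreement:
    """Toy model of a two-party stake with costly griefing (hypothetical)."""
    requester_stake: float
    fulfiller_stake: float
    griefing_ratio: float  # assumed: dollars burned per dollar the griefer pays

    def grief(self, punishment):
        """Requester burns `punishment` of the fulfiller's stake and pays a cost."""
        if self.griefing_ratio <= 0:
            raise ValueError("griefing_ratio must be positive")
        if punishment < 0 or punishment > self.fulfiller_stake:
            raise ValueError("punishment must be between 0 and the fulfiller's stake")
        cost = punishment / self.griefing_ratio
        if cost > self.requester_stake:
            raise ValueError("requester cannot afford this punishment")
        self.fulfiller_stake -= punishment  # burned, not transferred to anyone
        self.requester_stake -= cost        # griefing hurts the griefer too
        return cost


# Hypothetical numbers: a $500 request stake, a $50 fulfiller stake,
# and a ratio where $1 paid by the requester burns $10.
agreement = StakedAgreement(requester_stake=500.0, fulfiller_stake=50.0,
                            griefing_ratio=10.0)
cost_paid = agreement.grief(50.0)  # burn the whole fulfiller stake
```

Because neither side profits from the burn, there is no arbiter to capture; the only reason to grief is to punish bad data and protect a reputation, which matches the incentive argument made in the answer.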
Starting point is 00:53:14 They know we want the best models. And so the same on Erasure Bay. If you're asking people for data, why would you be malicious? And if you look at the real data on Erasure, people are asking about this. Maybe people will do things and maybe need an arbiter. And I think all that's like such a distraction, because on Erasure about half the requests get fulfilled and something like one or two percent get griefed. So it clearly is working and the griefing is a small minority. And even when people are griefed, it's like because they didn't quite, like they asked for some 10 videos and they got five videos or something, and they griefed them a little bit because they were still happy with it. So it's like, yeah, I think it's enough
Starting point is 00:53:58 that peer-to-peer ability to torch all the stakes that you both have can actually be enough economic tension to get the right outcome. But from a game theoretic point of view, reputation systems are incredibly difficult to design, right? And because you could just ask for 10 things and then kind of answer this with a second account that you have. And basically it would look like you have a fantastic track record. and then the next person who comes along,
Starting point is 00:54:29 you can just be a total jerk to just because you feel like it, right? Does this ever happen? Has this come up? Well, the other thing, we are, you know, leveraging Twitter. So we are using that Web 2 reputation graph that's already been built. When I asked for Vitalik's home address, people knew it was me because I posted it on Twitter. And people knew, people know I
Starting point is 00:54:57 have a lot of NMR, more than anybody. And so they know I don't mind griefing. And so bringing in that part helps a lot. So it depends on your threat model, kind of. Like if you're building proof of stake for Ethereum, now you have a serious threat model because you have massively wealthy bad actors who could influence that system. And if they influence it, it hurts everybody. But on Erasure it's like, it'll only be that one relationship. It doesn't like break the whole system, like if you broke proof of stake in Ethereum or something.
Starting point is 00:55:37 And then it's also, yeah, kind of lower stakes. And it's also got outside reputation where you're not really anonymous. So I think people should be more open-minded. I know my background's in mathematics and, you know, I study machine learning and game theory and all these things. But I also think people should lean on the intuitive side of things too. The game theory of BitTorrent doesn't quite make sense, but people use BitTorrent and it works. And the game theory of all social media doesn't quite
Starting point is 00:56:11 make mathematical sense, but people still click the like button billions of times a day. They are losing, it's costing you time to click that and you're getting nothing from it, kind of. It's like, I don't know. There's a lot of things you can say about that, but yeah, I don't think people should be trying to write mathematical proofs for everything. And it always depends on the model and the emergent behavior on the system. Yeah. Could you give us some practical examples of how people have used erasure? Yeah.
Starting point is 00:56:47 So there have been some really weird things. Some of them made by me, but others as well. So one guy actually asked for lung scans of people with COVID-19 early in the crisis. We launched Erasure Bay pretty much in the beginning of the coronavirus, and he asked for a bunch of lung scans so he could look at the, there's a special term of art for the kind of damage you can have in lungs from coronavirus. And he asked for these scans, and it was a fulfilled request. We can't see the data. We're not in the middle of the transaction.
Starting point is 00:57:24 It's just a peer-to-peer thing. But he got these lung scans. And then another person asked for videos of Jeffrey Epstein, full deposition videos of Jeffrey Epstein. And he didn't actually get what he wanted, but someone replied who was a lawyer on that case and said, the video that you posted actually is the full video. It looks like it cuts out.
Starting point is 00:57:52 There wasn't much left after that. So he sort of like found out this sort of like interesting information about Jeffrey Epstein and even got a list of all the lawyers who were involved in that case. Something I did that was sort of like civil disobedience was I asked someone to dig out the, they filled the skate park in Venice with sand and there was this drone footage of them just filling the skate park with sand, trying to prevent people from skating because of coronavirus. And I thought that was like excessive. So I put up a stake for someone to dig out the sand and send me a video of them
Starting point is 00:58:31 digging the sand out and saying Numerai or die while they did it. And that video was sent over Erasure Bay. The guy got like a $400 reward and I released the video to the public. And I just think, yeah, what's interesting about that is I don't think you can have these kinds of transactions in any other way. I don't think that without the staking part, you can make it work. Because if it's going to take some time and cost for that other person to go and do that thing or get that data, they want to know that the stake really is there.
Starting point is 00:59:10 And it's not the person just saying, because I could just tweet, please give me Vitalik's address, right? But somehow, without the stake, it's not like a legitimate thing. I might not end up paying them or whatever. So setting it up in this like staking way allows for different applications. So I think it's kind of interesting. I think like, yeah, Erasure Bay's followers on Twitter keep going up, kind of been doubling quite frequently. And the requests are more and more interesting. And it's just like this really simple use of the protocol to kind of get people thinking about how you could use it in other ways. Can you put a number on how much the protocol is being used? So basically, number of asks or
Starting point is 00:59:56 NMR staked or any of those numbers? Yeah, there's only, I think, yeah, there's like 1,500 Twitter followers on the account. And there's been like a few hundred requests. Some of the requests are stakes of just $10. Like I'm looking for a list of employees who were recently fired by Airbnb when they cut their staff. Little things like that for like $10. Yeah. So I think it's like a few hundred requests, very few griefs. And probably right now, there's probably like $6,000 or so of stakes on it. So it's a very small part of the, you know, Numerai has obviously got $3.5 million of stakes. So it's still very early and it's just a small part of the protocol.
Starting point is 01:00:49 But it does make people think and I think there will be other applications of it soon. So yeah, we're working on some new ways to make it even easier for new developers to integrate staking on their website without knowing any Solidity code. And I think that could be cool as well. Cool. So listeners, if you're looking to make an easy 75 bucks, there's currently an open ask on Erasure Bay. And it only asks for an original equity investment idea, must be longer than 500 words with a detailed valuation methodology and analysis. So $75 and there you go. That sounds good.
Starting point is 01:01:31 Yeah, I think that's the kind of thing. I mean, that's a lot of work to ask someone to do, I guess. Yeah, I can imagine someone asking for that kind of thing on the internet, but I can't really imagine it working quite as well without the staking. Can we talk about the future? So as I understand it, you are currently still buying and selling real stocks, right? So basically there's representations of stocks on the blockchain now,
Starting point is 01:02:00 so things like UMA and similar products. How do you feel about these? So do you think it's time to move away from, you know, the legacy world or are you quite comfortable in that? Yeah, I think it'll take a very long time for it to be that all the stocks are on the blockchain or something like that. There's so many regulations in every different country and stuff that these things violate nearly all of them. So it'll take so long to change that. But, you know, ultimately the stock market is also, it's so messy. Like there's so many little things
Starting point is 01:02:38 you need to know. Like you bought this Korean equity on swap and it had a split and then it paid a dividend and then it went bankrupt or whatever. And you have to like deal with this. Like we have to have like a back office of people thinking about these sort of real things that are happening in the world with these companies. But if you could really easily just, you know, get what you really want, which is, you basically do want just like some derivative that just maps the price. If you could get that exposure without being on the real stock market, I think people would do it, but I don't see it being possible to do properly because it requires so many things, including, you know, the oracle problem and other things like that.
Starting point is 01:03:28 So yeah, we're not against it. And I do think more things will be tokenized and stuff. but I don't think it'll happen in a short time, and I think it'll be mainly a regulatory problem that needs to be overcome. And how do you see the threat from the legacy world the other way around? So basically I listened to the podcast that you did three years ago again, and you said that AI wasn't fully integrated into traditional quant trading. Has that changed? Because machine learning and artificial intelligence has come such a long way,
Starting point is 01:04:02 And if you look at the really cool things that, you know, the Googles and Facebooks and so on of the world are currently doing, has this penetrated quant trading? And if not, why not? Yeah, it's a strange thing. I think it definitely has, you know, more and more, especially in the last three years, nearly every hedge fund is talking about machine learning. And you never quite know, you know, how much they're using it because they're hiding everything.
Starting point is 01:04:36 So, you know, you can kind of check how good Google's machine learning is by using Google, but you can't check how good hedge funds are at machine learning or whether they're really using it. And there are also so many parts in the process you could use it. Like you could use it to create features or you could use it just in the optimization part of your portfolio construction, but not use it in the deriving alpha part. There's a lot of people who are saying they're using it and not really using it. I think there's a big difference between the funds that are like ours, where we really started with it, like that's our whole idea, it's centered around it, versus people just tacking it on
Starting point is 01:05:18 or making a new product that's like, this is our ML fund, and it just like uses a random forest or something like that. Like I don't think that's as important as having the machine learning be like part of your DNA. But yeah, there was a time a couple of years ago when I was playing poker at the Math for America poker tournament, this like famous tournament in New York City that Jim Simons does, the founder of Renaissance. And I was playing poker with them and I was playing at the table with the CEO of Renaissance, Peter Brown. And he was making fun of me about Bitcoin and stuff. But I also asked him, you know, how much machine learning, what do you think the impact of machine learning will be in finance? And he said,
Starting point is 01:06:04 I think it'll have a really big impact in things like self-driving cars, but I don't think it'll have a big impact in finance. And I thought that was kind of a strange thing to say from the CEO of Renaissance, because I know he knows a lot about machine learning. And I kind of know that, but it's always unclear maybe what he meant. But yeah, there's something very different about modeling video data and images and things where the neural nets have really dominated computer vision. Like the accuracy is so much better, but maybe in financial applications, he doesn't see the edge as being so significant that you have to care about machine learning. I mean, I know Renaissance uses a lot of simple linear models, but they get all the other stuff
Starting point is 01:06:51 perfectly right. The data is perfectly right. The execution is perfectly right. And maybe you don't need the models to be that good if you can have data no one else has for 30 years or whatever. So yeah, there are a lot of interactions. But I do think it will continue to be a big deal, at least in the story of finance
Starting point is 01:07:12 and companies will continue to say they're doing machine learning and probably will use it more and more. So you have this hypothesis that, you know, these machine learning models will be important to hedge funds at least and finance broadly. Let's say that assumption is right and it plays out over 20 years or 25 years. And so in the future there are hedge funds using more of these models. What improvement does it bring to the market?
Starting point is 01:07:46 How does it improve either the forecasting abilities of the market or the efficiency of the market? Where will society reap its benefit? Yeah, I think that's a very good question. Part of what I like about Numerai, so maybe if I take a step back, like if you have a crypto project, like let's say you have a crypto company that's like a decentralized exchange
Starting point is 01:08:12 and they have lots of people using the thing. So maybe the decentralized exchange has usage, but every single token that's being traded on it doesn't have any usage. So the whole thing isn't really having any impact on the real world. And people don't like going all the way to the final impact on the world question. But hedge funds, I really think, have a very powerful, positive impact on the world. And I know that's not really what you can read in the media.
Starting point is 01:08:46 People hate hedge fund managers and stuff. But if you were to rerun U.S. economic history without stock markets, without hedge funds, you don't get Tesla. You don't get a venture capital industry without hedge funds and stock markets. So I think that's the first thing. It is very important. But how does it look? So I think what's kind of indicative is if you do look at the data in the past, if you look
Starting point is 01:09:15 at the data in the 90s and you can look at these patterns in the data and you can see, man, the market was really badly priced at this time. And that's very bad for society. Like imagine there is a company that actually is good, but its price in the market is extremely low. That means employees don't want to join that company. That means banks don't want to lend to that company. And a hedge fund stepping in, buying up shares, moving the price up,
Starting point is 01:09:47 is extremely valuable. It's like invisible. You know, you wouldn't really know that it was actually hedge funds that saved Tesla or something because they went long at just the right time or whatever. That's an important way to think about it: in the past, there were a lot of inefficiencies and we fixed them. And now there are fewer inefficiencies. But there are still a lot of very nonlinear inefficiencies maybe. And now machine learning is exposing those.
Starting point is 01:10:18 And it's saying, well, this isn't quite right. Under this model, with this data, things look very inefficient to us, say. And having those be corrected over time is super beneficial. I mean, the aspect that a hedge fund produces societal good, it's not obvious, but it's also important because in some sense, prices result in decision making. If you don't have a pricing structure in an economy, you won't be able to make decisions. For example, I want to build a road from here to New York, and some farmers say, this is the price for my land. Because the farmer is able to express a price, there is a decision on whether to do the project
Starting point is 01:11:05 or not. So pricing always translates into decisions. And if you don't have pricing information, decision making will be poor. And as a result, resource allocation will be poor. And so what hedge funds and ultimately the market are doing is trying to generate as accurate price information as possible. And that is the societal utility of it? And is it just that like machine learning feeding into price discovery is basically improving price discovery in some way? It has to.
Starting point is 01:11:38 But is it possible to quantify exactly in what way it is going to improve price discovery? Is that possible, or is it not predictable? Yeah, it is. There's a new trend in finance, I think it's called ESG funds or something, where the fund has a mandate: they have to invest in nice companies and aren't allowed to invest in tobacco or defense, or they have to invest in, I don't know, you know, nice companies that are good for the world. But yeah, we don't really look at that, and so it's not really clear to me that, like, because of machine learning,
Starting point is 01:12:19 more of the morally good companies will get more capital or something like that. I don't think you can really say that. But you can say that, you know, it'll be more efficient for all companies. And so there are two things at play. One is modeling and the other is data. So the amazing thing could be if whole new sources of data reach the market. It used to be basically a boys' club on Wall Street: they had the data and they sort of did insider trading for, like, the 20th century. And now it's like, well, there are all these other data sources and it's too much for people to handle. And there are so many different ways of modeling it. And Numerai can be the place where it's like, we have a lot of data and you can model it however you want.
Starting point is 01:13:07 And with Numerai Signals, you can bring in new data that's not being looked at, and together all of that makes the market a lot more efficient. But I would say the other kind of key thing that I think is very important for the societal stuff is there are 10,000 hedge funds in America or something like that. And they all hire the same people and they all buy the same data and they all pretty much end up building models that are very similar to each other. And the societal cost of that is the most important thing to me: all the brightest people going into finance to basically, you know, dig and fill holes, because they're doing almost completely redundant work that's already been done across the street by
Starting point is 01:13:53 their classmates from Harvard. And I don't really like that. I think it's over-extended. My favorite thing to happen would be we lose all those 10,000 hedge funds, except for one. But the one that's left is open, as in you can put any data you want into it and you can put any model you want into it, but you don't have to rebuild the wheel and recreate all the boring stuff, like buying all the boring data, dealing with all the prime brokers, setting up all the trading execution infrastructure. All of that is so inefficient and we don't need so many people doing that. And so my dream is that, from the 10,000th hedge fund up, hedge fund number 10,000 will decide, you know what I'm going to do? I'm just going to start sending these signals to Numerai.
Starting point is 01:14:50 I hate trading. I don't have enough capital to keep this going, but I do think I have a good signal. And according to Numerai, it is a good signal. So I'm just going to start sending it to Numerai. And then hedge fund number 9,999 says the same thing, and 9,998. And they slowly start basically exporting their alpha to us because we can handle it more efficiently than anybody else and because we can be incentive-aligned through staking. I would totally concur that you don't need 10,000 hedge funds in America, but basically the notion of having a single one, doesn't that kind of defeat the purpose? So basically, because I mean with a hedge fund,
Starting point is 01:15:33 what you're trying to do is you're trying to beat the market, right? So in order to beat the market, someone else has to, you know, underperform, right? Yeah, there's only so much you can have. So, I mean, you can think about a limiting case. Like, let's just say there are only two hedge funds left. But both of those two hedge funds, they actually do have a lot of data, and maybe you don't. Yeah. So I'm just saying it's still possible that with
Starting point is 01:16:06 just two hedge funds, if they had more data and more modeling power and other things like that, you could have more efficient pricing in the market than if you had 10,000. So I think that's undisputed. Like, it's possible that the 10,000 are actually only just incurring trading costs and don't really have alpha. And actually, by the way, I should say, I think hedge funds are good for society, but the hedge fund industry isn't good for society in its current state. But it could be if we had fewer hedge funds and if we were more efficient about how we did the whole thing. So yeah, in the limiting case, let's just say when it gets to just two hedge funds left, I still think we can have a very efficient world. And then if you just stop thinking
Starting point is 01:16:54 about, then to go to the last hedge fund, you'd have to just stop thinking about Numerai as being a hedge fund that's trying to compete. But really just think about Numerai as the stock market. So the sub-elements of Numerai are the models. They are now the hedge funds. They're the modelers. And they're bringing data. But Numerai is more just like infrastructure. And we're just like making the trades happen, sending all the money to the companies who need it the most, you know, shorting all the bad companies.
Starting point is 01:17:25 And we're just like the new stock market. Yeah. Cool. I think it's been super fascinating. And I think no one can say you're not ambitious. So tell us what's happening in the next year for Numerai? Well, Signals is a big thing. I'm very excited about it.
Starting point is 01:17:44 It's very key to Numerai. I mean, the best argument against Numerai is: what if the modeling part of hedge funds isn't that important? And you get really good at modeling, you get this amazing community, and the modeling part isn't the magic. The magic is having data that no one else has. And so I think Numerai Signals is the answer to that. It's like we're going to have the best intelligence and we're going to have an open
Starting point is 01:18:16 platform for anyone to submit new data to. And it's going to be trustable because it's staked. So that is sort of phase two of the master plan that I described many years ago. Number one, monopolize intelligence. Number two, monopolize data. Number three, monopolize money. Number four, decentralize the monopoly. That's it. Simple. Cool. So, a four-step plan toward domination. Exactly. Thank you. This has been a very illuminating interview. Thank you for coming on, Richard. Yeah, thank you guys. You asked incredibly good questions. Thank you. Thank you. It doesn't end here. There's more to this conversation, and you can hear it on Epicenter Premium. As a premium subscriber, you'll get access to
Starting point is 01:19:04 a private RSS feed where you can hear the interview debrief and get enhanced features like full episode transcripts and chapters which allow you to easily skip to specific sections of the interview. You'll also get exclusive access to roundtable conversations with Epicenter hosts and bonus content we put out from time to time. Go to premium.epicenter.tv to become a subscriber and support the podcast.
