Epicenter - Learn about Crypto, Blockchain, Ethereum, Bitcoin and Distributed Technologies - Richard Craib: Numerai – The Crowdsourced Predictive Model Hedge Fund
Episode Date: July 14, 2020

Numerai, the "hardest data science tournament on the planet". It's a hedge fund with its own distributed research platform designed specifically for AIs. It's the first of its kind and takes a radically different approach to making market predictions. It's completely crowd-sourced and data scientists around the world compete to create the best predictions and get paid with cryptocurrencies. Users are completely blind to the data and Numerai is blind to the code and the predictive models users generate. The result is a hedge fund which is market neutral, currency neutral, and geography neutral. It makes all decisions solely on the data.

Richard Craib, Founder of Numerai, was first on the show 3 years ago when the project's token, Numeraire, was just launched. Since then they have released a number of additions to the network including Numerai Signals and Erasure Bay, with plenty more in the pipeline.

Topics covered in this episode:
An overview of the past 3 years since the last episode
The Numerai staking model
Numerai datasets
What is a market neutral fund
Numerai's current assets and their position in the market
Staking liquid NMR
The latest project - Numerai Signals
Numerai's prediction markets Erasure Bay
How the future looks for hedge funding on the blockchain

Episode links:
Numerai website
Erasure Bay
Numerai Signals
Numerai Medium
Numerai Twitter
Richard Craib Twitter

This episode is hosted by Friederike Ernst & Meher Roy. Show notes and listening options: epicenter.tv/348
Transcript
This is Epicenter, episode 348 with guest Richard Craib.
Hi, I'm Sébastien Couture and you're listening to Epicenter, the podcast where we interview
crypto founders, builders, and thought leaders. On this show, we dive deep to learn how things
work at a technical level and we fly high to understand visionary concepts and long-term trends.
If you like the show, the best way to support us is to leave a review on Apple Podcasts.
If you're on a Mac or iOS device, the easiest way to do that is to go to epicenter.rocks/apple.
Today our guest is Richard Craib. Richard is the founder of Numerai, and you may remember that we
interviewed him three years ago in June of 2017 shortly after the company had launched. That was episode
191, if you want to go back and listen to it. Numerai is a hedge fund with its own distributed
research platform designed specifically for AIs. In fact, it's the first of its kind, and it takes
a radically different approach to making market predictions. It's totally crowdsourced, and it
incentivizes anonymous data scientists around the world to collaborate and create the best prediction
models. Users are totally blind to the data and Numerai is blind to the code or predictive models
that users generate. The result is a hedge fund, which is market neutral, currency neutral,
geography neutral. It makes all its decisions solely based on the data that is presented to it.
And as Richard puts it in the interview, data plus intelligence equals money. If you enjoyed this
conversation, you'll want to stick around afterwards for Friederike and Meher's interview debrief,
and you can hear it by becoming an Epicenter Premium subscriber. As a premium subscriber, you'll get
access to your own private RSS feed, where you can hear the debrief after every episode,
and you get enhanced features like the full episode transcripts and chapters that allow you to easily
skip to specific sections of the interview. You'll also get access to exclusive roundtable
conversations with Epicenter hosts and bonus content we put out from time to time.
We've just made changes to the pricing, and there's now a monthly subscription plan.
You can go to premium.epicenter.tv to learn more and sign up.
And now here's our conversation with Richard Craib.
Hi, my name is Friederike Ernst, and we're here with Richard Craib from Numerai.
Richard, we did an episode with you three years ago, so we'll keep the background short.
A lot of the things that we talked about back then are still true.
Your background is in maths,
and then you became a quant and built a machine learning-based fund working on proprietary data.
And then your idea was to turn this financial modeling problem into a pure data science and machine learning problem through homomorphic encryption.
So basically encryption that keeps the internal structure of the data intact and kind of open it to the wider public of data scientists for them to actually contribute to financial forecasting.
Tell us what has changed in the past three years.
Well, I think when I spoke to you guys, we'd maybe just launched Numeraire, our own cryptocurrency.
So Numerai started in 2015, and we started making payouts in Bitcoin just for convenience.
It wasn't really like we were a blockchain company or anything.
But there was something about Numerai that was very much in the spirit of crypto.
And when Ethereum did launch, we decided to start paying in Ether,
and then we decided around June 2017 to release our own cryptocurrency.
And the point of it was to do staking and burning.
And the problem with doing any company like Numerai,
where you have to trust that data you're getting from your community is legitimate,
it really requires staking.
I mean, truly the company couldn't work without it.
We needed our data scientists to, when they upload predictions,
to put some skin in the game,
by staking those predictions with the cryptocurrency.
And since we last spoke, that has really grown a lot.
I think maybe when we were speaking, we had like a few thousand dollars staked.
And a lot of people barely understood cryptocurrency.
Ethereum was very new.
I think we announced Numeraire when Ethereum was $5.
So we were very early and we're lucky enough to have this very technical user base
who actually understood how to use these things before other people.
But we've grown from a few thousand dollars in stakes to over three million dollars in stakes from the community.
That sounds like quite an uptake. Can you give us an idea about how many people are currently placing predictions or giving you model data on a weekly basis?
Yeah, so the whole of Numerai is structured in that way. There's a weekly tournament. People download data, build models and then submit predictions every week.
The number that we care about is the number of users who are submitting and staking.
So you can just sign up and submit for fun, but we only use the models that are staked.
We only want to use the models that the users believe in themselves enough to stake.
And we have about 700 weekly stakers.
And among just those 700 people, some of the top ones are staking almost half a million dollars of Numeraire.
That's staked and at risk every week.
That's what's interesting about Numerai. We're not trying to grow the number of users like a consumer internet company or something like that. In fact, we like it when users join and lose money because they get burned because they're not good enough. So they're automatically dropping out of the system. In some sense, when someone chooses not to continue to stake, that's even a good thing. But over the last eight months or so, that's been the main growth. I think even just eight months ago, we had like $30,000 at stake,
and now it's 100xed.
People really believe in their models more and more,
and the performance of the whole thing is better and better.
Is there some kind of power law in stakers?
Some stakers stake a lot, and then it drops off really quickly?
Yeah, definitely.
If you go to numer.ai/tournament,
you can see the leaderboard and you can see all the stakers,
and the top staker has about 10,000 NMR
at stake, which is about $200,000, but they're allowed to submit multiple different models.
So that user in particular has two other models that also have very high stakes.
And yeah, it does come down.
Some people are staking as little as, you know, $1.
But even that is actually a good signal.
I mean, what's important is we're not asking people to stake on Numerai so that we can make money.
There's no way we can ever earn the money that people stake.
The only thing that can happen is it gets burnt or they get rewarded.
So they're not playing against us.
They're really playing with us to make the meta model,
the combination of all the models, do better and better.
So just to give some background here,
so you've introduced the staking mechanism in order to make it costly for people
to actually submit models,
so that you can't Sybil attack the system, right? Because otherwise you can just submit like 10,000
different models and just by the virtue of how many models you submit, you can kind of brute force
your way into a reward. And basically in order to make that impossible, you have to stake. And then
basically the reward that you obtain is also related to the amount you stake, right?
That's exactly right. So without the staking, you can never trust the models. And that's something
I find very surprising.
Some industries have moved entirely onto the internet over the last few decades, like the travel industry.
You have travel agents online like Expedia, and any number of industries can move online, especially if they're just about information.
And the industry that's almost the most about information is hedge funds.
But why isn't there an internet hedge fund?
And the reason is you could never do the incentives correctly,
because people would sign up and submit whatever they wanted
and then they'd delete their accounts or make a new account
if it didn't work or whatever.
So the thing you have to get right is: how do normal hedge funds do it?
Well, if you're in a normal hedge fund, typically you actually have your own money in the fund.
Now, we can't let Numerai users have money in our fund and that's not even quite what we would want.
We want them to have a stake in their own predictions.
And by doing the staking on the predictions, it achieves that, that sense of, well, now I have
something to lose.
And it prevents a Sybil attack and it basically is this huge quality control.
If we use the non-staking users, the performance is much worse than the staking users.
If you look at the different models that you receive from people who stake or people who don't
stake, you somehow still have to combine these into a meta model to kind of tell you
what stocks to buy and what stocks to sell, right? So has your thinking on this evolved over the
past three years? Because I imagine there's quite a steep learning curve there on how to combine
different models that kind of have different sort of assumptions. Yeah, it's a big problem.
The first thing we really struggled to beat was can you beat just the simple average of all the
users. So every user is predicting on about 5,000 stocks. And if we took the simple average of those
users, it would be hard to beat that because that's the whole point of crowd sourcing in a way.
You know, if you can average the crowd, you can do better. So it's quite hard to beat averaging.
But then what we tried more recently was, well, if people are staking and some people are staking
a lot more than others, is that a kind of expression of confidence that we can rely on? Is someone
who's staking 10 times as much as the next guy, does he have a better model? And then we started
the stake weighted meta model. So it was your proportion of your stake was your weight in the
meta model. And that outperformed averaging. But the sort of third level of this, which we're
doing now, which we've just launched, is what if you could kind of do this other layer of
machine learning. So you could somehow combine all the models in a very smart way by basically
figuring out how they blend best together in a complicated way. And can you have that actually
outperform, stake weighted? And it's not that easy to make it outperform. And that's the thing with
the stock market data that you're talking about such small edges. So like we're talking about
trying to get a 51% edge over the market. And so if you have a new idea
and it looks like it's 51.5%,
that's a very big deal.
But the sensitivity is so high that it might actually not be half a percent better.
It might be half a percent worse.
It's a very difficult problem to combine them.
But staking is certainly the key thing that makes it possible in the first place.
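The two simpler combination schemes described above, the simple average and the stake-weighted meta model, can be sketched as follows. This is a minimal illustration, not Numerai's actual implementation: the model names, predictions, and stake amounts are made up, and the function names are hypothetical.

```python
# Illustrative sketch only: the model names, predictions, and stakes below are
# made up, and these helpers are hypothetical, not Numerai's code.

def simple_average(predictions):
    """Equal-weight average of every model's prediction, stock by stock."""
    models = list(predictions.values())
    n_stocks = len(models[0])
    return [sum(m[i] for m in models) / len(models) for i in range(n_stocks)]


def stake_weighted(predictions, stakes):
    """Average where each model's weight is its share of the total stake."""
    total_stake = sum(stakes[name] for name in predictions)
    n_stocks = len(next(iter(predictions.values())))
    return [
        sum(stakes[name] * preds[i] for name, preds in predictions.items()) / total_stake
        for i in range(n_stocks)
    ]


# Three hypothetical models, each predicting a score in [0, 1] for three stocks.
predictions = {
    "model_a": [0.9, 0.1, 0.5],
    "model_b": [0.6, 0.4, 0.5],
    "model_c": [0.2, 0.8, 0.5],
}
# model_a stakes ten times as much as the others (amounts in NMR, made up).
stakes = {"model_a": 10.0, "model_b": 1.0, "model_c": 1.0}

equal_weight_meta = simple_average(predictions)
stake_weight_meta = stake_weighted(predictions, stakes)
print(equal_weight_meta)
print(stake_weight_meta)
```

With these made-up numbers, the stake-weighted score for the first stock is pulled toward model_a's view because model_a holds most of the stake. The third level mentioned above, learning how to blend the models, is not sketched here because the conversation gives no detail on how it works.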
No model can perform equally well in all situations, right?
So in theory, you can have a model that performs well in some situations
and not in others, whereas some other model could perform well in a larger number of situations,
but its edge might be lower.
So how do you weigh how much to pay out which kind of model?
In different words, like it isn't maybe always about just performance of a model, right?
So there could be more factors.
So what more factors are there and how do you weigh these different factors?
These are the kind of key problems with quantitative finance.
You can make models and they work brilliantly for 10 years, but they only worked because that whole 10 year period was a bull market.
The attribution would say, well, the only reason you actually made money was because you were just kind of long the market or something.
So figuring out how to determine whether a model is good has also been the big research challenge of the company.
If you have a model where, every time you go long a certain stock in a certain country,
in a certain sector, you're also short a stock in the same country and in the same sector,
and you still do well, then that means you really have an edge.
You know, we have a lot of people right now, very delusional about their skills in the stock market.
So, you know, you spent the last four years buying tech stocks on Robinhood and they all went up.
But really, if you'd bought the tech industry, you would have done
better. So you think you're good, but you're not really good. And that's the key way that we
filter out models is, are they good in all environments and are they good even when you neutralize
them to all known risks and all sectors and countries? And if they're good at that, that's very
hard to do by luck. And it's also more likely to be the kind of thing that works in the future.
So if you can make money even though you're market neutral, that means you're clearly resistant
to the market because you have no market exposure.
For every $100 long you have, you also have $100 short position.
So if you can make money like that, that means you're not building a model that's only going
to work when the market's going up.
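One simple way to picture neutralizing a model to sectors is to subtract each sector's average score, so that every long within a sector is offset by shorts in the same sector. The sketch below assumes exactly that (group-wise demeaning); it is not Numerai's actual neutralization procedure, and the tickers, sectors, and scores are invented.

```python
from collections import defaultdict

# Illustrative sketch only: group-wise demeaning is an assumed stand-in for
# neutralization, and all tickers, sectors, and scores are invented.

def neutralize_by_group(scores, groups):
    """Subtract each group's mean score so every group nets out to zero."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for score, group in zip(scores, groups):
        totals[group] += score
        counts[group] += 1
    return [score - totals[group] / counts[group] for score, group in zip(scores, groups)]


tickers = ["TECH1", "TECH2", "SHOP1", "SHOP2"]
sectors = ["tech", "tech", "retail", "retail"]
raw_scores = [0.9, 0.7, 0.2, 0.4]  # the raw model simply likes tech more than retail

# After neutralization, positive means long and negative means short.
neutral_scores = neutralize_by_group(raw_scores, sectors)
for ticker, score in zip(tickers, neutral_scores):
    print(ticker, round(score, 3))
```

The raw scores amount to a "long tech, short retail" bet. After demeaning, that sector bet disappears and only the within-sector ranking is left, which is the part that reflects stock-picking skill.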
How exactly do you combine these different models?
Because naively it would seem to me if you have different strategies and you mix them up,
typically you make them worse.
Strategies, ideally, it's something that's self-consistent,
and as soon as you kind of mix it up with other strategies,
it's like a bit of this and a bit of that.
And it's really difficult to make sure that these bits actually go together
and act as a greater whole.
So how much manual labor actually goes into picking strategies
that kind of go together or models that kind of go together,
and how much do you actually know what assumptions these models are based on
because, I mean, people just submit predictions, right?
So you don't know what goes into making these predictions.
So how do you actually turn this into a self-consistent meta-model?
One thing that's unique about Numerai that people don't really understand
unless they've done data science is that, because Numerai is the one who controls the dataset,
we control every possible input that is involved in the model.
So if there's something we do research on and we're like, you know,
that feature tends to make strategies that only work in Japan,
we're not going to even give that feature to the community.
So we're curating the datasets and setting up the problem in a very particular way.
That's on the feature side.
And then we're also deciding on the target variable.
The target variable is this very residualized return.
And so what that means is you can't really get paid.
And it's also on a specific time horizon.
So it's about a one month forward return that you're trying to model.
So yes, if we had it much more open and we said you can use any data you want and you can model any horizon you want, you might have someone come up with a model that works extremely well for one day trading in Japan and then interacts very badly with every other model because it's much higher turnover.
And that kind of model basically cannot be discovered or developed on Numerai because the target isn't set up to do that.
And users have no idea what data they're looking at because we've obfuscated the data and given it to them in this very structured form.
So what that means is because every model is training on the same data, the models are eminently combinable.
They're all almost perfectly ready to be averaged together.
They all have the same goal because they all use the same data and model the same target.
So that's a really important part.
It certainly is better for you to have more models that are uncorrelated.
And a recent thing we did is something called metamodel contribution.
So you might make a really good model on Numerai's data,
but it's very correlated to models we already have from other
users. So we want to pay you kind of because you've made a good model, but it's not actually additive.
If both of you submit the same model, the crowdsourcing part of it doesn't matter. But if you submit an
uncorrelated model, that's trained to achieve the same goal, but it's still uncorrelated, that's where
the ensembling really helps. And so we've also started paying people for that. They can now stake not on their
own performance, but on how much their model helps our meta model.
And that has been a really important new development at Numerai over the last few months.
And you have people making these really creative and weird models that actually aren't very good by themselves.
But when you combine them are extremely helpful to the model.
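The conversation does not give the formula behind meta model contribution, so the following is only a sketch of the idea under stated assumptions: the meta model is an equal-weight blend, quality is measured by Pearson correlation with the target, and contribution is the change in that correlation when a candidate model is added. All numbers and function names are hypothetical.

```python
from math import sqrt

# Illustrative sketch only: this is NOT Numerai's scoring formula. It assumes
# an equal-weight blend and Pearson correlation, with invented numbers.

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def blend(models):
    """Equal-weight average of several models' predictions."""
    return [sum(values) / len(values) for values in zip(*models)]


def contribution(candidate, existing, target):
    """Change in the blend's correlation with the target when candidate joins."""
    before = pearson(blend(existing), target)
    after = pearson(blend(existing + [candidate]), target)
    return after - before


target = [1, 2, 3, 4, 5, 6]
existing_model = [2, 1, 4, 3, 6, 5]     # decent model already in the meta model
duplicate_model = list(existing_model)  # a copy adds nothing new
diverse_model = [-1, 4, 1, 6, 3, 8]     # weaker alone, but its errors differ

print(pearson(existing_model, target), pearson(diverse_model, target))
print(contribution(duplicate_model, [existing_model], target))
print(contribution(diverse_model, [existing_model], target))
```

In this toy case the diverse model scores worse on its own than the existing model, yet adding it improves the blend, while the duplicate contributes nothing. That matches the point made above about models that are not very good by themselves but help when combined.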
In that scenario, when I'm staking on my own model, in order to decide how much to stake,
don't I need to know how the Numerai meta model itself behaves?
So that's a really good point.
So what's nice about modeling the targets is you have the targets and you know how you can check your own performance.
But if you're making a model that's being paid on the contribution, you don't know what you're kind of going to be scored against.
But the way we basically deal with that is we just tell people every week your meta model contribution was this.
And they can see like, okay, the meta model is in kind of one place.
And they can realize, well, this week I was uncorrelated with the meta
model. And this week, I would have made more if I'd staked on meta model contribution. And so over
time, they can kind of learn. So the meta model is in some kind of stable place. And they can easily
estimate their contribution after staking a few rounds. So typically people will stake just on their
raw performance. And then over time, as they kind of become pro users, they might start staking
on their metamodel contribution. This is kind of an evolutionary mechanism that
feeds into it.
Let's dig down into this a little bit more. So can you tell us more about the
datasets that basically you homomorphically encrypt and give to people. So basically what kind of
features, how many features do your data sets have? And how often do you change these and what kind
of things are they to give listeners like some idea of what people actually end up working on?
We don't say for a reason, you know, we hide it for a reason. If
we gave away the data in a raw form, people could just kind of run off with it and start
their own funds.
And they wouldn't be working together on one thing and we wouldn't be able to do all this
stuff.
So we don't really talk too much about the data, but it is about, I think it's like about
a gigabyte or something.
It's about a million rows and 310 feature columns.
It just looks like a million rows with 310 columns and every number is between zero and one.
So you don't know what the data means, but we do do a few things.
Like we put features into groups where those groups tend to be, have features that are correlated with each other.
The kind of data it is, is kind of structured quant data.
So the thing is, as much as it seems like there's more and more data available online about stocks, it's actually weirdly not true.
Like it's harder and harder, the barrier to entry to have basic data that every other hedge fund has
is more and more expensive.
And so we pay something like half a million dollars a year on data, which isn't very much.
We see our edge more in the modeling than the data.
Even that is completely prohibitive to a normal person who's just trying to build a model.
So by buying it and doing the curation of the data and choosing the features and setting up the problem,
we're kind of taking out that side of things, taking out the finance part of it,
and just allowing our community to fit that dataset,
which is the fun part, kind of the machine learning part.
So in a way that's kind of like fitting like a 310 dimensional scatter plot, right,
in like the simplest terms.
Exactly.
If there were just two features, you could just plot it nicely.
But yeah, it's a 310 dimensional space and you have to find a curve, basically,
that fits that space to the targets.
Yeah, that's super interesting. So do you have any dummy features in there? I've worked in finance
for a long time myself and I know what kind of features you would probably base models on and
I know what happens after the fact. I can kind of look at the data you gave me a month ago and then
see what happened afterwards. So would I be able to deduce your encryption mechanism or is that
difficult for other reasons or you enter dummy data so that I'm kind of sidetracked?
Yeah, so we recently did have this genius Japanese guy write a blog post where he seems to have a lot of quantitative finance experience already.
And he started to make some claims about some of our features.
He thinks he knows that one of the features is momentum, which is a really common feature to have in any quant data set.
And so he wrote this kind of like case for why he thinks this feature group is momentum.
And I don't want to say whether or not he's right, but it was interesting to see that some people are thinking about that.
You might be able to make a sort of, yeah, estimate like that.
Like I think this looks a little bit like momentum, but it'd be very hard for you to fully map row for row,
column for column, you know, what that data is because it's basically impossible.
Like the obfuscation will make that impossible for you to do.
But people are still thinking about that.
And it is interesting when people do that.
So one reason it's good to obfuscate the data is to protect, you know, the data from leaking and going into other places or people using our data but not actually submitting to us.
But the other maybe more important thing is like it actually stops people from imposing their own human ideas.
And if people do think this one feature group is momentum features and they read in the newspaper that momentum is
going to struggle, and they remember their finance professor told them
momentum is a bad feature for stocks to use in the long run or in a bear market, suddenly they'll
impose all of these kind of human ideas onto the data. And they might say, I'm going to drop that
whole group from my modeling. And that we really don't want people to do. We do want people to
use machine learning. And in its real form, it is not about hand-picking things. It is about, you know,
the answer is in the data. And you just have to discover the model. That's the best fit for the data.
It's not your job to impose your human biases on us.
That's really interesting. The story this
reminds me of is, very recently you have these chess engines that are built using reinforcement learning.
And it turns out that these new chess agents that are built using reinforcement learning are able to defeat the older computer programs that were built by humans to play chess.
So these newer ones are something like AlphaZero and the older one is something like Stockfish.
So programmers have been building Stockfish for 20 years.
And it's like this perfect chess program that beats all humans.
But then this new reinforcement learning algorithm comes along and it ends up beating Stockfish.
But not only beating Stockfish, but playing way more
creatively than Stockfish.
And one of the key differences between these two chess playing algorithms is when
Stockfish was designed, humans designed Stockfish.
And intrinsically, humans gave Stockfish some of our own value judgments.
Like, for example, a bishop is slightly more valuable than a knight.
A queen is at least two times more valuable than a knight.
So Stockfish was built on these human ideas of how valuable the various pieces on the chess
board are, and Stockfish is trying to optimize this position given some measure of how
valuable these pieces are with respect to each other, whereas with these new reinforcement learning
algorithms, you don't tell the machine how valuable a piece is vis-a-vis another
piece at all. Let the machine discover by itself whatever things, and don't even tell it
the rules of the game, yeah, don't even tell it what things are important, what features or what
pieces are important. And then it turns out that once the machine learns with that blank
slate, it actually performs even better, plays even more creatively than a machine that is given
those features. So it feels very similar in that regard.
Yeah, it really is. And it's such an
important like, yeah, almost philosophical change in the last few years where you had the same
thing with speech recognition and these hand-built features. There was a
huge field of these people who were kind of really good at this, making features from audio and
doing speech recognition. And then 100% of that knowledge is not needed anymore because the neural
nets got so good and the algorithms got so good. And you just needed more data and you could
outperform. And I think that will definitely be the story of finance. Yeah, there's some sense where
the old guard of finance might say, well, you know, you really have to know a few things about
the real economy and you have to know a few things about inflation and the macro stuff if you want to
be a good trader. But it's all not true. It's all going to still come down to a mathematical problem.
Yeah, I think we are showing that to the extreme. Here's how Numerai is: 100% of our modelers have
never seen the data, right? They're just modeling the obfuscated data. And 100% of the models we use
in trading, we have never seen the code that created those
models. So the users are blind to the data and we are blind to the code that they're writing.
And somehow that works. Like, that's really taking all this finance stuff out of the problem.
I totally see that you don't need to know what exactly it is that the company a stock belongs
to does and so on. But sometimes our world fundamentally changes. An algorithm that is trained on
back data has no way of knowing that. Look at the
corona crisis. So basically, had I told you at the beginning of this year, there's going to be a
huge pandemic. And you might have, as a critically thinking human, you might have been able to
infer possible consequences. So things like airline stocks are probably going to go down,
pharma company stocks may go up, stay-at-home stocks go up and so on. There's no way that a machine
learning algorithm that's trained on the past would have known that, right?
That's exactly right. We never tried to model those things. There's never been a time where we've had any net exposure to any particular industry. So whenever we were long an airline in the first place, we were also short one. We never had any exposure to any sector or any exposure to any country or any exposure to any economy, basically. So I know that sounds kind of crazy. Like if you are
long and short in all these things, you're not really playing the game people think. Like, we're never
trying to predict what stocks will go up. I don't know, that seems like I'm
saying this crazy thing, but that's not ever what we're trying to do. We're really looking at
relative positioning of things in the market: for this stock, for its features, it is undervalued
relative to this one. That doesn't say anything about what their absolute value should be.
You know, if we're buying Tesla and maybe everyone thinks Tesla is overvalued now,
it doesn't really matter because we don't care about the overall levels.
And that's what's very different about market neutral funds versus funds you might,
yeah, or like human-based, just like, I want to buy Snapchat because I believe in the future
of the social media industry or something.
It's like that kind of judgment is a kind of for the world of humans in a way.
But the relative judgment based on all the data is more the domain of machines.
So for people who are not well-versed in the finance universe, what exactly is a market-neutral fund?
So basically, how exactly do you construct a set of stock options that kind of have that property?
Yeah.
Well, the market-neutral part means you don't have stock market exposure.
Now, how can you not have stock market exposure if you're trading stocks?
So we trade with about four times leverage.
So we have, for every $100, we'll have $200 long in the stock market and $200 short.
So if the stock market as a whole fell by 50%, and we owned random stocks in our longs and random stocks in our shorts, how much will we fall by?
0%.
Because we were short, an equal dollar amount.
So that's the key thing.
It should make sense to people that if you buy stocks at random and you go long and short
them at random, your outcome will be that you'll make no money and maybe pay money in trading costs.
And so that's the market neutral part.
That typically is what it means: you're dollar neutral to the market exposure.
And that's why, you know, a hedge fund, if the market's down 40% or 50%, if it's a real market neutral hedge fund,
you wouldn't be able to predict how much that fund is down because they might have done well
because half their portfolio fell by 50%, but they were short that half.
And the other half where they were long also fell back.
But usually market neutral means more than that now.
So you might have something that is dollar neutral, but is actually exposed to a certain sector.
So 100% of your longs, you're long tech, but you're short retail.
Okay, all your shorts are retail companies and all your longs are tech.
You're not that smart if that portfolio works.
It'll only work because of your sector exposure, not because of your stock picking ability.
But if you then say, well, I want to be market neutral and sector neutral, and you go even further, and country neutral, and momentum neutral, and value neutral, and volatility neutral, and on and on and on, your whole portfolio is perfectly balanced,
the number of longs and shorts you have is always balanced,
then in some sense there's very little
place for your portfolio to go
because you're neutral to everything.
But if you can still make money
when you're neutral to everything,
that's really powerful
because if someone looks at your portfolio,
they'd be like,
I wonder if you made money just because you got lucky on your sector bet.
Oh, you didn't have any sector bet.
You're neutral to sectors.
Oh, I wonder if you made money just because the yen crashed.
Nope, you had no yen exposure the entire time.
So if you had no exposure to any risks and you're still making money, that's the dream of the market-neutral hedge fund.
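One way to picture "neutral to everything" is to add up the signed dollar positions within each group (sector, country, and so on) and check that every group nets to zero. The sketch below assumes a toy portfolio: the tickers, dollar amounts, and the helper names `net_exposures` and `is_neutral` are all hypothetical, and a real fund would use a much richer risk model.

```python
from collections import defaultdict


def net_exposures(positions, attribute):
    """Sum signed dollar positions for each value of `attribute`."""
    totals = defaultdict(float)
    for pos in positions:
        totals[pos[attribute]] += pos["dollars"]
    return dict(totals)


def is_neutral(positions, attributes, tolerance=1e-9):
    """True if every group of every listed attribute nets to about zero."""
    return all(
        abs(total) <= tolerance
        for attribute in attributes
        for total in net_exposures(positions, attribute).values()
    )


# Dollar neutral, but all longs are tech and all shorts are retail.
sector_bet = [
    {"ticker": "TECH_A", "sector": "tech", "country": "US", "dollars": 100.0},
    {"ticker": "TECH_B", "sector": "tech", "country": "JP", "dollars": 100.0},
    {"ticker": "RETAIL_A", "sector": "retail", "country": "US", "dollars": -100.0},
    {"ticker": "RETAIL_B", "sector": "retail", "country": "JP", "dollars": -100.0},
]

# Same names, but longs and shorts are balanced inside each sector and country.
balanced = [
    {"ticker": "TECH_A", "sector": "tech", "country": "US", "dollars": 100.0},
    {"ticker": "TECH_B", "sector": "tech", "country": "JP", "dollars": -100.0},
    {"ticker": "RETAIL_A", "sector": "retail", "country": "US", "dollars": -100.0},
    {"ticker": "RETAIL_B", "sector": "retail", "country": "JP", "dollars": 100.0},
]

print(net_exposures(sector_bet, "sector"))
print(is_neutral(sector_bet, ["sector", "country"]))
print(is_neutral(balanced, ["sector", "country"]))
```

Both books are dollar neutral overall, but only the second one would survive the "did you just get lucky on your sector bet?" question.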
That's absolutely fascinating.
So let's talk about your results with this.
So can you tell us how much assets under management you currently have?
Not really.
We can't really talk about the results.
So, yeah, we are a small fund.
The way we've done Numerai is completely different to other hedge funds. We've raised venture capital for the company. And we've even sold
tokens, not in an ICO, but to professional investors, crypto funds. But usually hedge funds start
by raising a bunch of money into their fund. And there's a rule with the SEC. I think if you have over
150 million, you have to announce that to the SEC and you get regulated in a special way. So it's almost
been good for us to stay well below that while we were doing the sort of R&D.
There's so many new things we're doing simultaneously at Numerai.
No one's ever done crowdsourcing like this.
Even just the machine learning part is unique enough.
Then we have the blockchain stuff on top of that.
So there was a lot of time that we needed to get things right.
So yeah, we don't talk about the AUM, but you can infer from the fact that we haven't filed that we're probably quite a lot less than 150 million.
And then returns is also something the SEC doesn't like hedge funds to be talking about
because, you know, we don't want to be seen to be promoting our fund.
Our fund is not even investable by individuals.
It's really like an institutional-grade fund.
So there's a couple of individuals in it, but they're just like very big investors in the hedge fund space or something like that.
I would have hoped to hear some numbers, but I understand that you can't.
Can you tell us how you've done with respect to other hedge funds?
So are you hedge fund neutral or are you up or down?
Yeah, I mean, we can say, so I did talk before about market neutral, I've spoken to other press about market neutral funds during this pandemic.
And some funds did extremely badly, even though they market themselves as market neutral.
In fact, Renaissance have a market neutral fund.
Renaissance is maybe the most, yeah, successful hedge fund.
And it's down 20%.
And it's like, why are you down 20%?
Like you're supposed to be market neutral.
And so things can go wrong in a financial crisis where you realize, even if you're market neutral,
you were holding a lot of the stuff that other hedge funds were holding,
and your strategy wasn't very differentiated,
and everyone was pulling their money out of the good stuff, the good stocks,
and that liquidity crisis made a lot of funds lose money.
But we were also pandemic neutral during this time.
So we actually didn't have, we don't hold a lot of the same stocks that other hedge funds hold.
So in March of 2020, when this crisis was particularly bad, we were fine.
We did a lot better than our peers.
Cool.
Being fine in these times is a, yeah, it's a very good thing.
So can you tell us what percentage of the liquid NMR supply is staked on a regular basis?
Yeah, right now, so it's maybe, yeah, if it's 3.3 million,
depending on how you look at the circulating supply, that's about 5% of the total that's out there.
And that's a good amount because Numerai has held back a lot of the, because we never did an ICO,
there wasn't like this time a long time ago where all of our tokens were bought by speculators who would never use Numerai.
We were much more careful.
We gave out a small amount to our users and slowly pay them from our reserves.
And because of that, Numerai has a lot of the tokens, like six and a half million of them or something out of 11 million.
And the users have a lot.
And the sort of speculators are not, yeah, don't have as much as, and don't command the price as much as the actual usage does.
And I really like that.
Like, if you look at Numeraire, the amount of volume on Uniswap is the kind of interesting thing to look at versus the amount of volume on centralized exchanges, because a lot of our users, or a lot of people who are really using a DeFi app, they need 50 tokens in order to use Numerai or something, will use Uniswap, not a centralized exchange. But the speculators will use a centralized exchange. So if a lot of our volume is on Uniswap, that means we actually have a lot of organic users, which is more than you can say of most crypto projects. And I think that's an important thing. And I think people are looking at that now and they're like, well, do you want to make a bet on something that's actually being used or do you want to just be part of the speculator class?
I have kind of a general and abstract question. So you have this Numerai model where
you publish an obfuscated data set. People develop models against it, and the models are private to them. Using their models, they submit predictions to you. You combine those
predictions in some way and that method keeps evolving and then using that combination of predictions
you're doing a trade and trying to make returns on the stock market. How general is an architecture
like that? For example, could I basically record my entire stream of senses, sight, vision, sound,
everything I see, everything I hear,
obfuscate it, that's the data set,
send it to the Numerai community
and say, hey, my objective is to make the most money
over the next five years
and somehow get models on what I should do.
And like those models are somehow
then combined into what action actually I should take.
So maybe that's the individual level problem.
But an organization can also think like that.
So there's an organization which has like 20 plants and it manufactures some things and there's this entire data set.
It has its entire data set in SAP.
It somehow extracts it all, obfuscates it, publishes it.
And then a distributed community is actually analyzing what sort of manufacturing decisions this entity must take
in order to get some objective.
So is this approach general like that?
Does it specifically work only for the stock market problem?
Yeah, I think it is not general.
I think the stock market is uniquely positioned for something like Numerai.
And the reason for that is the sort of productionizing of the whole thing.
So there are websites, data science competition websites,
where they give out a bunch of like health care data.
And then it's like figure out, you know,
based on these x-rays, whether the person has cancer or something.
And people attack the problem and they find a solution.
And the crowd might find a good solution.
But it's the productionizing of that model where the crowdsourcing part of it just stops.
It's like, okay, well, thanks for the competition.
We're just going to hire the best person from this who made the best model.
And then he's going to develop it and deploy it into a system.
Because imagine you are a doctor and you had this x-ray analyzer. In production time you're going to want to have the model running locally. You're not going to want to say, okay, I'm running an x-ray and I'm going to send this particular example to the crowd to get their feedback. You actually need to have the model locally. And that's the unique thing with Numerai: we don't really need to have the models locally because we can just ask users to submit predictions every time we want to trade.
And that works out a lot better.
So productionizing it in a real way, where we're using the user predictions on the real stock market, is something you can't really do in other industries as well.
The other part, yeah, well, that's, yeah, I mean, that is kind of the main, that is kind of
the main thing.
But the other part is the accuracy.
So as it happens, imagine you are 85% correct on whether someone has cancer based on an x-ray. If you can get that to 86 or
87, that's actually not that valuable. I know that sounds like kind of crazy, but it's not valuable
to the medical, like if you could tell them that, they would be like, oh, that's a small
increase. Our biggest problem is actually implementing this stuff and educating the doctors
about this stuff. Our problem isn't the accuracy. But in the stock market, to go from
51% accurate to 53 is extremely valuable and worth the crowdsourcing effort to have that small
increase in accuracy. So yeah, I do think it's kind of like, in fact, when I was pitching
Numerai to VCs back in 2015, they were like, you know what you should do? You should just say you're going to
be a technology company and you're going to do this for all industries. And that way you'll get a
higher valuation than if you're just doing a hedge fund.
And I was like, no, it doesn't make any sense to do this for anything except the hedge fund.
So I still think that's true.
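A rough way to see why 51% to 53% matters so much is to treat each prediction as an even-money bet, which is a simplifying assumption for illustration and not how Numerai scores or trades. Under that assumption the expected profit per unit staked is p*(+1) + (1-p)*(-1) = 2p - 1, so the edge is what matters, not the raw accuracy. The function name `edge_per_bet` is hypothetical.

```python
def edge_per_bet(accuracy):
    """Expected profit per unit staked on an even-money bet.

    A win pays +1 and a loss pays -1, so the expectation is
    accuracy * 1 + (1 - accuracy) * (-1) = 2 * accuracy - 1.
    """
    return 2.0 * accuracy - 1.0


edge_at_51 = edge_per_bet(0.51)   # roughly 0.02 per unit staked
edge_at_53 = edge_per_bet(0.53)   # roughly 0.06 per unit staked
edge_multiple = edge_at_53 / edge_at_51

print(round(edge_at_51, 4), round(edge_at_53, 4), round(edge_multiple, 2))
```

Under this toy model, two extra points of accuracy roughly triple the expected profit per bet, which is the sense in which a small accuracy gain is worth a large crowdsourcing effort.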
Cool.
So one of the most recent additions and improvements to Numerai is Numerai Signals.
Can you tell us about what that is?
Yeah.
So you almost have to forget everything I just told you.
Because Numerai Signals is sort of a different way of doing this.
So on Numerai, we give out a very pre-processed dataset, and we're crowdsourcing the intelligence on the dataset.
But how are we going to get more data?
So we could go out and buy more data and add it to the dataset we give to users.
Or we could tell our users, well, you know what?
If you have a data set or a signal on stocks, you can also just send that up to us directly.
And that's what Numerai Signals is.
It's really if you already have a model on the stock market and you've used some other data source to build that model.
And you already know, I want to predict Apple's going up, Google's going up.
You want to predict on the actual stocks, and you don't want to look at our obfuscated data.
We want to be able to crowdsource the data part of Numerai too.
The crowdsourcing of the models, we've got a huge, you know, maybe the second biggest data science tournament on the planet after Kaggle.
And we've got all the staking there.
But to get to the next level, it's always a function of data plus intelligence equals money for hedge funds.
And so if we've got the intelligence part perfect, but the data is just normal data that other people probably have, how do we get the data part really good?
And we wanted to do it in a way that also leverages staking
and the other things we've built.
So you're going to be able to upload.
Right now you can do it already on signals.numer.ai; you can upload predictions on stocks in a CSV.
You still have to predict like 5,000 stocks.
It's still very quanty.
It's not like if you think Snapchat's going to go up or something like that, you can use Numerai Signals.
It's not for you.
It's really for quants who can make predictions on thousands of stocks.
And when you upload predictions, you're scored in a way that's not,
again, it's not like you make money if the stocks go up.
It's not quite like that.
It's much more complicated.
But it also will have the staking.
And so we're about to launch staking for Numerai Signals, where that is how we will know whether we can trust the data.
So the reason Numerai didn't start with this is that what's great about Numerai is we can control the data part of it.
And so we can trust the quality even more.
But now that we have staking, maybe we can trust things a little bit that come from Numerai Signals.
Maybe we can trust that no one in their right mind would stake $10,000 on this if they didn't believe it was good data.
And some of our meta model could start coming from Numerai Signals as well as normal Numerai.
And so in that way, we're crowdsourcing both the best data scientists who don't have any data and the best data providers.
Does this in a way kind of question the original model of Numerai, though? Because basically, I mean, you wanted to target people who have no idea about stocks, right? So basically, you have no financial market experience, but you happen to be a very good data scientist. And now you're kind of asking people to give you their best take of what's going to happen with the market. Wouldn't it be easier for them to just go play the market themselves? Why would they actually use you guys?
Very good question. They might have a model that's not that good by itself but will help us. So imagine, let's say they still don't know anything about finance, but they have scraped a bunch of data from Twitter to do sentiment analysis, like how many times are people talking about Google on Twitter, and they make a little signal out of that where the high numbers are the things with high sentiment and the low numbers are things with low sentiment according to their data.
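As a sketch of what "a little signal" like this could look like, the snippet below ranks made-up mention counts into evenly spaced values between 0 and 1 and writes them out as CSV text. The counts, the `ticker` and `signal` column names, and the helper names are all assumptions for illustration; this is not the actual upload format that Numerai Signals expects.

```python
import csv
import io


def rank_signal(raw_scores):
    """Map raw scores to evenly spaced values in (0, 1) by rank.

    The highest raw score gets the highest signal; ties are broken
    by ticker name so the result is deterministic.
    """
    ordered = sorted(raw_scores, key=lambda ticker: (raw_scores[ticker], ticker))
    count = len(ordered)
    return {ticker: (rank + 0.5) / count for rank, ticker in enumerate(ordered)}


def to_csv_text(signal):
    """Render the signal as CSV text with hypothetical column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ticker", "signal"])
    for ticker in sorted(signal):
        writer.writerow([ticker, signal[ticker]])
    return buffer.getvalue()


# Made-up mention counts standing in for scraped sentiment data.
mention_counts = {"GOOG": 900, "AAPL": 1200, "XYZ": 15, "ABC": 300}

signal = rank_signal(mention_counts)
csv_text = to_csv_text(signal)
print(csv_text)
```

Ranking throws away the raw magnitudes and keeps only the ordering, which is one common way to make a noisy count comparable across thousands of stocks.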
Now, they don't want to make a hedge fund out of this one signal. They don't know anything about hedge funds and they might not make any money on just the raw signal by itself. Let's say that it makes 3% a year, that signal by itself. But when combined with Numerai's data, because we don't use any sentiment data, it turns our, say, 20% a year return into a 25% return.
So by itself, they wouldn't want to make a hedge fund out of that signal, but they would prefer to license that signal to a hedge fund that can actually implement it.
Remember, the hedge fund industry is very hard to get into.
No one's going to let you trade 5,000 stocks on four times leverage and trade swaps in South Korea without you having serious backing and a proper prime broker and things like that. So I think it's a way to access that kind of data that by itself might not be valuable, but on Numerai would be.
Super interesting. So we'd like to move on to Erasure. So Erasure is this protocol that's being
developed by your organization? So what is it? And my follow-up question there would be,
is it different from a prediction market?
Yeah, so the protocol that powers the staking on Numerai,
we were one of the first things to be doing staking.
People know about staking maybe through proof of stake,
but now there are many other things that are using staking.
But that was the thing we had to build.
So we built some smart contracts that powered Numerai specifically.
But then when we were building kind of version two of it and upgrading those smart contracts, we were like, you know what, we might as well build this really well, such that other developers could use the protocol.
And maybe other applications would start that use Numeraire in some new way, some kind of crazy ideas.
Why can't you stake your tweets to have them show up higher on people's feeds because you've got a stake?
Or you can grief Donald Trump's tweets if you're unhappy with something he said.
And maybe you could have staking be all over Web 2.
So we were thinking about these ideas like two years ago and decided to build the next version of the smart contracts that power staking on Numerai in a general way so that other people could build applications.
And that's what Erasure is.
And we also decided to build another application on top of the protocol
to demonstrate this.
And that's called Erasure Bay.
And Erasure Bay is kind of like a Twitter bot, but you sign up and you can make a request and put a stake on it so that people can actually trust it.
You can say, I'm looking for, I don't know, one of the things I've posted on was, I'm looking for Vitalik Buterin's home address.
And I'm putting a $500 stake to prove that I'm serious, that I really want this.
And you can check in the blockchain that I'm really committed to it.
And also, if you provide an address that's wrong, I get to burn your stake.
So someone actually did give me Vitalik's home address on Erasure Bay.
But he basically said Vitalik's home address is vitalik.eth.
And I thought that was really funny, but it wasn't what I was going for, and I didn't want to pay $500 for that.
So I had the right to burn all of his stake.
So I did.
And that small dynamic, I think, of staking and griefing,
kind of peer-to-peer staking and griefing,
I think could be used for many things.
And so that's why we built Erasure and built Erasure Bay.
It's all very helpful for Numerai.
We're going to use Erasure, obviously, for Numerai Signals.
We're going to use it for Numerai.
We've got Erasure Bay using it.
And other developers are interested in using it.
And I think in the future, especially when this layer scalability stuff's fixed, I think you might have websites, like one of them is Stack Overflow.
Why can't you just put a little stake reward for someone who provides you the answer to your Stack Overflow question, or grief someone who gives you, you know, malicious code or something?
I think the internet, the whole internet could be improved just like the quality of predictions on Numerai was improved with staking.
So what happens if you give me a good answer and I decide that the answer somehow isn't good enough?
Is there like an arbiter that kind of decides who of us is actually in the right?
Or can you then just grief me back?
Or what's going to happen?
Yeah, there's no arbiter.
So you decide, were you happy with it or not?
But the question is, why would you do it?
Why would you grief the person? So what happened with my stake with Vitalik's home address, it was like, I had to pay a little bit of money to do the griefing. So I wouldn't do that maliciously, because it's hurting me, but I would do it to build a reputation on Erasure Bay that if you mess with me and you give me bad data, I'm going to burn you. And just like on Numerai, there is no arbiter, except for Numerai, saying whether we liked your predictions or not. We just decide. We scored you.
No one can check if we're scoring correctly. But they know it's in our interests to score correctly.
They know we want the best models. And so it's the same on Erasure Bay. If you're asking people for data, why would you be malicious? People are asking about this: maybe people will do things and maybe you need an arbiter. And I think all that's such a distraction, because if you look at the real data on Erasure Bay, about half the requests get fulfilled and something like one or two percent get griefed. So it clearly is working and the griefing is a small minority. And even when people are griefed, it's like, they asked for some 10 videos and they got five videos or something, and they griefed them a little bit because they were still happy with it. So, yeah, I think that peer-to-peer ability to torch all the stakes that you both have can actually be enough economic tension to get the right outcome.
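The stake-and-grief dynamic described here can be modelled as a toy two-party agreement in which burning the other side's stake costs the griefer something too. This is a minimal sketch under stated assumptions: the class name, the `grief_cost_ratio` parameter, the counterparty's $50 stake, and the 10% cost are all hypothetical and are not taken from the Erasure smart contracts.

```python
from dataclasses import dataclass


@dataclass
class StakedAgreement:
    """Toy model of a peer-to-peer request with stakes on both sides."""

    requester_stake: float
    fulfiller_stake: float
    grief_cost_ratio: float  # hypothetical: cost to requester per unit burned

    def grief(self, amount):
        """Requester burns `amount` of the fulfiller's stake and pays a cost."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        cost = amount * self.grief_cost_ratio
        if amount > self.fulfiller_stake or cost > self.requester_stake:
            raise ValueError("not enough stake to grief that amount")
        self.fulfiller_stake -= amount   # burned, nobody receives it
        self.requester_stake -= cost     # griefing is not free
        return cost


# The $500 request stake is from the conversation; the rest is assumed.
agreement = StakedAgreement(
    requester_stake=500.0, fulfiller_stake=50.0, grief_cost_ratio=0.1
)
cost_paid = agreement.grief(50.0)  # burn the fulfiller's whole stake

print(cost_paid, agreement.requester_stake, agreement.fulfiller_stake)
```

Because the requester loses money every time they burn a stake, griefing a good answer is irrational unless they value the reputation it builds, which is the tension that stands in for an arbiter.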
But from a game theoretic point of view, reputation systems are incredibly difficult to design,
right?
And because you could just ask for 10 things and then kind of answer this with a second account
that you have.
And basically it would look like you have a fantastic track record.
and then the next person who comes along,
you can just be a total jerk to just because you feel like it, right?
Does this ever happen?
Has this come up?
Well, the other thing, we are, you know, leveraging Twitter.
So we are using that Web 2 reputation graph that's already been built.
When I asked for Vitalik's home address,
people knew it was me because I posted it on Twitter.
And people know I have a lot of NMR, more than anybody. And so they know I don't mind griefing. And so bringing in that
part helps a lot. So it depends on your threat model, kind of. Like if you're building proof of
stake for Ethereum, now you have a serious threat model because you have massively wealthy bad actors
who could influence that system. And if they influence it, it hurts everybody. But on Erasure
it'll only be that peer-to-peer, it'll only be that one relationship.
It doesn't break the whole system, like if you broke proof of stake in Ethereum or something.
And then it's also, yeah, kind of lower stakes.
And it's also got outside reputation where you're not really anonymous.
So I think people should be more open-minded to,
I know my background's in mathematics and, you know,
I study machine learning and game theory and all these things.
But I also think people should
lean on the intuitive side of things too. The game theory of BitTorrent doesn't quite make sense,
but people use BitTorrent and it works. And the game theory of all social media doesn't quite
make mathematical sense, but people still click the like button billions of times a day.
They are losing, it's costing you time to click that and you're getting nothing from it,
kind of. It's like, I don't know. There's a lot of things you can say about that, but
yeah, I don't think people should be trying to write mathematical proofs for everything.
And it always depends on the model and the emergent behavior on the system.
Yeah.
Could you give us some practical examples of how people have used Erasure?
Yeah.
So there have been some really weird things.
Some of them made by me, but others as well.
So one guy actually asked for lung scans of people with COVID-19 early in the crisis.
We launched Erasure Bay pretty much at the beginning of the coronavirus, and he asked for a bunch of lung scans so he could look at the,
there's a special term of art for the kind of damage you can have in lungs from coronavirus.
And he asked for these scans, and it was a fulfilled request.
We can't see the data; we're not in the middle of the transaction.
It's just a peer-to-peer thing.
But he got these lung scans.
And then another person asked for videos of Jeffrey Epstein,
full deposition videos of Jeffrey Epstein.
And he didn't actually get what he wanted,
but someone replied who was a lawyer on that case and said,
the video that you posted actually is the full video.
It looks like it cuts out, but there wasn't much left after that. So he sort of found out this interesting
information about Jeffrey Epstein and even released it, and got a list of all the lawyers who were
involved in that case. Something I did that was sort of like civil disobedience was I asked
someone to dig out the, they filled the skate park in Venice with sand and there was this
drone footage of them just filling the skate park with sand, trying to prevent people from skating
because of coronavirus.
And I thought that was like excessive.
So I put up a stake for someone to dig, dig out the sand and send me a video of them
digging the sand out and saying "Numerai or die" while they did it.
And that video was sent over Erasure Bay.
The guy got like a $400 reward and I released the video to the public.
And I just think, yeah, what's interesting about that is
I don't think you can have these kinds of transactions in any other way.
I don't think that without the staking part, you can make it work.
Because if it's going to take some time and cost for that other person to go and do that thing or get that data,
they want to know that the stake really is there.
And it's not the person just saying it, because I could just tweet, please give me Vitalik's address, right?
But somehow, without the stake, it's not like a legitimate thing.
I might not end up paying them or whatever. So setting it up in this
staking way allows for different applications. So I think it's kind of interesting. I think,
yeah, Erasure Bay's followers on Twitter keep going up, kind of been doubling quite frequently.
And the requests are more and more interesting. And it's just like this really simple use of the
protocol to kind of get people thinking about how you could use it in other ways.
Can you put a number on how much the protocol is being used? So basically, number of asks or
NMR staked or any of those numbers? Yeah, there's only, I think, yeah, there's like
1,500 Twitter followers on the account. And there's been about, there's been like a few
hundred requests. Some of the requests are stakes of just $10. Like, I'm looking for a list of
employees who were recently fired by Airbnb when they cut their staff. Little things like that for like
$10. Yeah. So I think it's like a few hundred requests, very few griefs. And probably right now,
there's probably like $6,000 or so of stakes on it. So it's a very small part of the, you know,
Numerai has obviously got $3.5 million of stakes.
So it's still very early and it's just a small part of the protocol.
But it does make people think and I think there will be other applications of it soon.
So yeah, we're working on some new ways to make it even easier for new developers to integrate staking on their website without knowing any Solidity code.
And I think that could be cool as well.
Cool.
So listeners, if you're looking to make an easy 75 bucks, there's currently an open ask on Erasure Bay.
And it only asks for an original equity investment idea, must be longer than 500 words with a detailed valuation methodology and analysis.
So $75 and there you go.
That sounds good.
Yeah, I think that's the kind of thing.
I mean, that's a lot of work to ask someone to do, I guess.
Yeah, it's not really, yeah, I can't see that.
I can imagine someone asking for that kind of thing on the internet,
but I can't really imagine it working quite as well without the staking.
Can we talk about the future?
So as I understand it, you are currently still buying and selling real stocks, right?
So basically there's representations of stocks on the blockchain now,
so things like UMA and similar products.
How do you feel about these?
So do you think it's time to move away from, you know,
the legacy world, or are you quite comfortable in that?
Yeah, I think it'll take a very long
time for it to be that all the stocks are on the blockchain or something like that. There's so
many regulations in every different country and stuff that these things violate nearly all
of them. So it'll take so long to change that. But, you know, ultimately, the stock
market is also so messy. Like there's so many little things
you need to know. Like you bought this Korean equity on swap and it had a split and then it paid a
dividend and then it went bankrupt or whatever. And you have to like deal with this. Like we have to
have like a back office of people like thinking about these sort of real things that are happening
in the world with these companies. But if you could really easily just, you know, get what you
really want, which is the, you basically do want just like some derivative that just maps the price.
If you could get that exposure without being on the real stock market, I think people would do it,
but I don't see it being possible to do properly because it requires so many things,
including, you know, the oracle problem and other things like that.
So yeah, we're not against it. And I do think more things will be tokenized and stuff.
but I don't think it'll happen in a short time,
and I think it'll be mainly a regulatory problem that needs to be overcome.
And how do you see the threat from the legacy world the other way around?
So basically I listened to the podcast that you did three years ago again,
and you said that AI wasn't fully integrated into traditional quant trading.
Has that changed?
Because machine learning and artificial intelligence has come such a long way,
And if you look at the really cool things that, you know, the Googles and Facebooks and so on
of the world are currently doing, why has this not, has this penetrated quant trading?
And if not, why not?
Yeah, it's a strange thing.
I think it definitely has. You know, more and more, especially in the last three years,
nearly every hedge fund is talking about machine learning.
And you never quite know, you know, how much they're using it because they're hiding everything.
So, you know, you can kind of check how good Google's machine learning is by using Google,
but you can't check how good hedge funds are at machine learning or whether they're really using it.
And there are also so many parts in the process you could use it.
Like you could use it to create features or you could use it just in the optimization
part of your portfolio construction, but not use it in the deriving alpha part.
There's a lot of people who are saying they're using it but not really using it.
I think there's a big difference between the funds that are like ours, where we really
started with it, like our whole idea is centered around it, versus people just tacking it on
or making a new product that's like, this is our ML fund, and it just uses a random forest
or something like that. I don't think that's as important as having the machine learning
be part of your DNA. But yeah, there was a time a couple
of years ago when I was playing poker at the Math for America poker tournament, this
famous tournament in New York City that Jim Simons does, the founder of Renaissance. And I was playing
poker with them and I was playing at the table with the CEO of Renaissance, Peter Brown. And
he was making fun of me about Bitcoin and stuff. But I also asked him, you know, how much machine
learning, what do you think the impact of machine learning will be in finance? And he said,
I think it'll have a really big impact in things like self-driving cars, but I don't think it'll
have a big impact in finance. And I thought that was a kind of a strange thing to say from the CEO
of Renaissance, because I know he knows a lot of machine learning. And I kind of know that, but it's
always unclear maybe what he meant. But yeah, there's something very different about modeling,
video data and images and things where the neural nets have really dominated computer vision.
Like the accuracy is so much better, but maybe in financial applications, he doesn't see the edge
as being so significant that you have to care about machine learning.
I mean, I know Renaissance use a lot of simple linear models, but they get all the other stuff
perfectly right.
The data is perfectly right.
The execution is perfectly right.
And maybe you don't need the models to be that good if you can have data no one else has for 30 years or whatever.
So there's a lot of, yeah, there's a lot of interactions.
But I do think it will continue to be a big deal,
at least in the story of finance
and companies will continue to say they're doing machine learning
and probably will use it more and more.
So you have this hypothesis that, you know,
like these machine learning models will be important to hedge funds at least and finance broadly.
Let's say like that assumption is right and it plays out over 20 years or 25 years.
And so in the future there's like the hedge funds using more of these models.
What improvement does it do to the market?
How does it improve either the forecasting abilities of the market or the efficiency of the market?
Where will society reap its benefit?
Yeah, I think that's a very good question.
Part of what I like about Numerai,
so maybe I'll take a step back,
like if you have a crypto project,
like let's say you have a crypto company
that's like a decentralized exchange
and they have lots of people using the thing,
but every single token on the decentralized exchange,
so maybe the decentralized exchange has usage,
but every single token that's being traded on it doesn't have any usage.
So the whole thing isn't really having any impact on the real world.
And people don't like going all the way to like the final impact on the world question.
But hedge funds, I really think, have a very powerful, positive impact on the world.
And I know that's not really what you can read in the media.
People hate hedge fund managers and stuff.
But if you were to rerun U.S. economic
history without stock markets, without hedge funds, you don't get Tesla.
You don't get a venture capital industry without hedge funds and stock markets.
So I think that's the first thing.
It is very important.
But how does it look?
So I think what's kind of indicative is if you do look at the data in the past, if you look
at the data in the 90s and you can look at these patterns in the data and you can see, man,
the market was really badly priced at this time.
And that's very bad for society.
Like imagine there is a company that actually is good,
but its price in the market is extremely low.
That means employees don't want to join that company.
That means banks don't want to lend to that company.
And a hedge fund stepping in, buying up shares, moving the price up,
is extremely valuable.
But it's like invisible.
You know, you wouldn't really know that it was actually hedge funds that saved Tesla or something
because they went long at just the right time or whatever.
That's an important way to like think about it, is that we, in the past, there was a lot of inefficiencies and we fixed them.
And now there's less inefficiencies.
But there's still a lot of very nonlinear inefficiencies maybe.
And now machine learning is exposing those.
And it's saying, well, this isn't quite right.
Under this model, with this data, things look very inefficient to us, say.
And having those be corrected over time is super beneficial.
I mean, the aspect that a hedge fund produces societal good, it's not obvious, but it's also important because in some sense, prices result in decision making.
If you don't have a pricing structure in an economy, you won't be able to make decisions.
For example, I want to build a road from here to New York and Saigo and some farmers say,
this is the price for my land.
Because the farmer is able to express a price, there is a decision on whether to do the project
or not.
So pricing always translates into decisions.
And if you don't have pricing information, decision making will be poor.
And as a result, resource allocation will be poor.
And so what hedge funds and ultimately the market are doing is trying to generate as accurate price information as possible.
And that is the societal utility of it?
And is it just that like machine learning feeding into price discovery is basically improving price discovery in some way?
It has to.
That's why you're able to make money.
But is it possible to quantify exactly in what way it is going to improve price discovery?
Kind of, is that possible or is it not predictable?
Yeah, it is. It's not, there's a new trend in like finance, like, I think it's called
ESG funds or something where it's their, their fund has a mandate to, they have to invest
in like nice companies and aren't allowed to invest in tobacco or defense or they have to invest in like,
I don't know, you know, nice companies that are good for the world. And like, but yeah, we don't
really look, yeah, I think, and so it's not really clear to me that, like, because of machine learning,
more of the morally good companies will get more capital or something like that. I don't think
you can really say that. But you can say that, you know, it'll be more efficient for all
companies. And so there are two things at play. One is modeling and the other is data. So the amazing
thing could be if whole new sources of data reach the market.
And it used to be basically a boys' club on Wall Street: they had the data and they sort of did insider trading for like the 20th century.
And now it's like, well, there's all these other data sources and it's too much for people to handle.
And there's so many different ways of modeling it.
And Numerai can be the place where it's like we have a lot of data and you can model however you want.
And with Numerai Signals, you can bring in new data that's not being looked at,
and together all of that makes the market a lot more efficient.
But I would say the other kind of key thing that I think is very important for the societal stuff is
there are 10,000 hedge funds in America or something like that.
And they all hire the same people and they all buy the same data and they all pretty much end up building models that are very similar to each other.
And the societal cost of that is the most important thing to me:
all the brightest people going into finance to basically, you know, dig and fill holes because
they're almost doing completely redundant work that's already been done across the street by
their classmates from Harvard. And I don't really like that. I don't think, I think it's over-extended.
My favorite thing to happen would be we lose all those 10,000 hedge funds, except for one,
but the one that's left is open as in you can put any data you want into it and you can put any
model you want into it but you don't have to rebuild the wheel and recreate all the boring stuff
like buying all the boring data dealing with all the prime brokers setting up all the
trading execution infrastructure all of that is so inefficient and we don't need so many people
doing that. And so my dream is that from the 10,000th hedge fund up, hedge fund number 10,000 will
decide, you know what I'm going to do? I'm just going to start sending these signals to Numerai.
I hate trading. I don't have enough capital to keep this going, but I do think I have a good signal.
And according to Numerai, it is a good signal. So I'm just going to start sending it to Numerai.
And then hedge fund number 9,999 says the same thing and 9,998. And they slowly start basically exporting
their alpha to us because we can handle it more efficiently than anybody else
and because we can be incentive-aligned through staking.
I would totally concur that you don't need 10,000 hedge funds in America,
but basically the notion of having a single one, doesn't that kind of defeat the purpose?
So basically, because I mean with the hedge fund,
what you're trying to do is you're trying to beat the market, right?
So in order to beat the market, someone else has to, you know, underperform, right?
Yeah, there's only so much you can have.
So, I mean, you can think about a limiting case.
Like instead, if you just say, well, let's just say there are only two hedge funds left.
But both those two hedge funds, they actually do have a lot of data and they maybe, maybe you don't.
Yeah.
So I'm just saying, it's still possible that with
just two hedge funds, if they had more data and more modeling power and other things like that,
you could have more efficient pricing in the market than if you had 10,000. So I think that's
undisputed. Like, it's possible that the 10,000 are actually only doing, only just incurring
trading costs and don't really have alpha. And actually, by the way, I should say, I think hedge funds
are good for society, but the hedge fund industry isn't good for society in its current
state. But it could be if we had fewer hedge funds and if we were more efficient about how we did
the whole thing. So yeah, in the limiting case, let's just say when it gets to just two
hedge funds left, I still think we can have a very efficient world. And then if you just stop thinking
about, then to go to the last hedge fund, you'd have to just stop thinking about Numerai as being
a hedge fund that's trying to compete. But really just think about Numerai as the stock market.
So the sub elements of Numerai are the models.
They are now the hedge funds.
They're the modelers.
And they're bringing data.
But Numerai is more just like infrastructure.
And we're just like making the trades happen, sending all the money to the companies who need it the most, you know, shorting all the bad companies.
And we're just like the new stock market.
Yeah.
Cool.
I think it's been super fascinating.
And I think no one can say you're not ambitious.
So tell us what's happening in the next year for Numerai?
Well, Signals is a big thing.
I'm very excited about it.
It's very key to Numerai.
I mean, nearly the best thing you can say, the best argument against Numerai is:
what if the modeling part of hedge funds isn't that important?
And you get really good at modeling.
You get this amazing community.
And the modeling part isn't the magic.
The magic is having data that no one else has. And so I think Numerai Signals is the answer to that.
It's like we're going to have the best intelligence and we're going to have an open
platform for anyone to submit new data to. And it's going to be trustable because it's staked.
So that is sort of phase two of the, of the master plan that I described many years ago.
Number one, monopolize intelligence. Number two, monopolize data. Number three,
monopolize money. Number four, decentralize the monopoly. That's it. Simple.
Cool. So, four step plan toward domination. Exactly. Thank you. This has been a very
illuminating interview. Thank you for coming on, Richard. Yeah, thank you guys. You asked incredibly
good questions. Thank you. Thank you. It doesn't end here. There's more to this conversation,
and you can hear it on Epicenter Premium. As a premium subscriber, you'll get access to
a private RSS feed where you can hear the interview debrief and get enhanced features like
full episode transcripts and chapters which allow you to easily skip to specific sections of
the interview. You'll also get exclusive access to roundtable conversations with Epicenter
hosts and bonus content we put out from time to time. Go to premium.epicenter.tv
to become a subscriber and support the podcast.
