The a16z Show - a16z Podcast: Data Network Effects
Episode Date: March 8, 2016
If network effects are one of the most important concepts for software-based businesses, then that may be especially true of data network effects -- a network effect that results from data. Particularly given the prevalence of machine learning and deep learning in startups today. But simply having a huge corpus of data does not a network effect make! So how can startups ensure they don't get a lot of data exhaust but get insight out of and add value to that data and the network? How can they make sure that the (arguably inevitable) data aspect of their business isn't just a sideshow or accident? How should founders strike the balance between not overbuilding/building a data team vs. having enough data for those data scientists to work with in the first place? And finally, what are the ethical considerations of all this? The a16z general partners most focused on bio and fintech -- Vijay Pande and Alex Rampell -- join this episode of the a16z Podcast to share their observations and advice on all things data network effects.
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
Hi everyone, welcome to the a16z Podcast. I'm Sonal, and today we're doing a podcast on data network effects. And we have two general partners here to have that conversation with us. We have Vijay Pande, who covers all things bio, and Alex Rampell, who covers all things fintech as well as other areas. Welcome, guys. Thank you. Okay, so first let's just kick off by talking about what a data network effect is. In its simplest form, it's a network effect that results from data. And if a network effect is defined as something where the
value to users and all the participants increases as more users use a particular platform
or marketplace, how does this play out with data?
So if you think about eBay, more buyers go to eBay because more
sellers go to eBay, and more sellers go to eBay because more buyers
go to eBay. That is the canonical network effect. And there, commerce is happening. That's the
transaction. For a data network effect, typically there's no commerce per se. There's
an extraction. You're either reading or writing, and in most cases, you're reading.
And by reading or writing, you mean in the database sense, like reading from a database,
writing into a database.
And as more people write, the value of each read goes up.
That's the way of thinking about it.
So an example would be the credit score.
I could figure out what your credit is by just looking at you and profiling you in legal ways, not illegal ways, and saying,
here's what I think your proclivity to repay is.
But if every bank on earth is using one central repository, then they will pay more money to actually extract, to read.
The reads become far more valuable.
And if a new company started tomorrow and said,
hey, we're going to do credit scores
and we're going to charge a dollar per extraction per read
and not $10 per extraction,
well, there's nothing to extract.
Like they can't actually provide any value.
If they end up having more data than the current number one person,
then they could charge a lot more than $10.
They could charge $100.
And in fact, the value of the number two person goes to zero
because they actually have a demonstrably poorer product,
which is why there aren't really any competitors to eBay.
Right. It's also why people often talk about companies that have network effects as winner-take-all markets.
Yeah, which is generally the case, or winner-take-most, or winner-take-the-vast-majority.
But the way I think of it is, if you think about it in the database sense of reads and writes, the reads just become disproportionately more valuable as more people are using this central repository of data.
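To make the read/write framing concrete, here is a minimal toy sketch in Python. It is not from the speakers: the `SharedRepository` class, the `simulate` function, and the "hit rate" used as a stand-in for the value of a read are all illustrative assumptions. It only shows that, when each contributor writes what it knows about a different slice of the population, the chance that a read finds anything useful rises with the number of writers.

```python
from collections import defaultdict


class SharedRepository:
    """Toy central repository: contributors write records, anyone can read."""

    def __init__(self):
        # subject id -> list of (contributor, observation) records
        self._records = defaultdict(list)

    def write(self, contributor, subject, observation):
        """A contributor adds what it knows about one subject."""
        self._records[subject].append((contributor, observation))

    def read(self, subject):
        """Return everything known about a subject (possibly nothing)."""
        return list(self._records.get(subject, []))

    def hit_rate(self, subjects):
        """Fraction of looked-up subjects that have at least one record."""
        subjects = list(subjects)
        if not subjects:
            return 0.0
        hits = sum(1 for s in subjects if self._records.get(s))
        return hits / len(subjects)


def simulate(num_contributors, subjects_per_contributor=3, population=12):
    """Hypothetical setup: each contributor knows a different slice of people."""
    repo = SharedRepository()
    for c in range(num_contributors):
        for k in range(subjects_per_contributor):
            subject = (c * subjects_per_contributor + k) % population
            repo.write(f"bank-{c}", subject, "repaid on time")
    # Proxy for the value of a read: how often a lookup finds any data at all.
    return repo.hit_rate(range(population))


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "contributors -> hit rate", simulate(n))
```

In this toy setup a new entrant with a single contributor answers only a quarter of lookups, which is one way to read the "there's nothing to extract" point above.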
On the medical side, there's interesting aspects that combine in with machine learning as well.
Because the database model is, I think, a very natural one. But then if you put data science
and machine learning on top, the reads can become much higher value because of the insights
that you can gain from the data as well, especially these new modern machine learning methods
like deep learning just crave data. And so often you have to reach a critical mass before they can
even be used. Well, I'll give you another example like Google right now and Facebook.
When you think about translation services, which has nothing to do with fintech, but I want to go
translate text or I want to go look at images and figure out what they are. Google, Baidu, Facebook,
people that have large corpuses just have such a huge advantage if I want to figure out translation at scale and I have no data on which to draw. This isn't a read-write problem, because you're not making a central repository. eBay is a central repository where it's a marketplace where people trade, and the credit bureaus, or anti-fraud companies that work with lots of e-commerce companies, are central marketplaces where people trade. Here it's just like Google can have,
not the best computer science.
They probably do have the best computer science.
But imagine that they didn't,
but they had the biggest corpus of data.
They can just go acquire the best computer science.
And the unfair advantage that they have,
their data network effect is effectively,
as they get better translation,
they can actually use that to make their translation software even better.
And they also have users to autocorrect that as well.
So that's another example of a data network effect
where, like, the corpus is the demonstrable advantage.
So one thing that confuses me here,
and I feel like we overuse this as a result,
is that sometimes people conflate
having a lot of data... To your point, in this case, the large corpus is required in order to create
better results, which is a feature of machine and deep learning. But sometimes people conflate
having a lot of data and say, we have a data network effect. And that's actually not true.
So how do we sort of travel from having data to actually having a data network effect that results
from that data? Yeah, you have to have a plan to actually do something with the data, right?
And usually this is something where you could either provide something of higher quality,
let's say in diagnostics, because you know so many other results that you can actually do a better job at predicting and diagnosing or do something cheaper.
And obviously the combination of the two, higher quality, at lower cost, is really a game changer.
Well, the other thing is that having a lot of data is not a network effect if having a lot of data doesn't make your data better.
So go back to the credit bureau.
I have a lot of data.
I'm Experian.
Therefore, people write and read from me.
And therefore, I get more data and my data gets better. As opposed to, like, look at Visa.
Visa bought my company,
and they have a tremendous amount of data.
They can predict the U.S. economy down to probably the ninth percentage point or decimal point.
But having all of that data doesn't make their data better.
People don't want to go transact.
It's an output.
So it's like exhaust.
So a lot of data actually takes that form of an exhaust,
and it makes it very, very valuable.
But there's no network effect typically to exhaust type data
as opposed to when the data is actually,
it's a key component of the business model.
And there is this concept of more people want to write because more people want to read, more people want to read, more people want to write, and replace that with the commerce aspect of like buyers and sellers.
Right. So if you were to operationalize that and make it even more concrete than that, one thing I've heard is that you have to have an algorithm to actually take the data out and then, to your point, add value back in.
I mean, how would you sort of operationalize this more concretely for people who are building products if they want to build data network effects?
What should they do?
Yeah, I mean, that really varies, obviously, in terms of the domain and the company,
but, you know, some sort of data science or machine learning is very natural to be able to apply to this.
But I think, you know, sometimes this doesn't have to be fancy machine learning or anything like that,
just the ability to monetize something from that.
And really something where your company gets better.
Yeah, I think part of the problem as well is that algorithms, like if you look at compression algorithms over time,
there's one called LZW, which has been around for a very, very long time.
It's pretty good.
And then the next one that was better
was maybe 1% better.
The next one that was better
was 2% better.
And if you are an algorithm company,
it's very, very hard
to build any kind of value
because somebody else comes out
with a marginally better algorithm.
So you need to pair the algorithm
with the data.
And there's actually a shift
going on right now
from outputting just the data
to more outputting
a learned aspect
on top of that.
That's the algorithm part.
I mentioned fraud
and anti-fraud companies.
So there are a bunch of companies.
There's one in our portfolio
called Signifyd. There's another one that I invested in as an angel a long time ago called Sift Science. And for a long time, many of these companies will tell you, okay, do we think this is risky? They'll tell you all of the answers that got pulled from the data. So go back to credit reports. Well, credit reports are a combination of, like, what you did in your past. You got this thing when you got out of college. You didn't pay that loan on time. You were a deadbeat for this doctor, whatever, whatever. And then there's a credit score, which is a heuristic that's built on top of that. So what's actually interesting, if you look at credit reporting right
now, the goal of applying machine learning is to actually come up with a better heuristic.
So this is the thing where you need the data repository, and ideally it's proprietary to you,
because then you can extract more economic rent if you're building a company here.
And then you want to have a better set of heuristics
on top of that. That's the algorithm.
And neither one alone is really sufficient.
I mean, it is sufficient.
I guess you could say if you have the data, the data network effect tends to be more valuable
than the algorithm.
But you can extract more value if you're not
saying, here are 50 things for you to go analyze on your own, and we're the only one
that has access to that, and then you have to hire a team of 50 people to go analyze it,
but now you actually have an algorithm that outputs you a decision, and you can use that
decision, and that's an even bigger advantage for a tech company that has a data network effect.
And while they're not formally related, usually one follows the other.
If you're the one with the big giant corpus, you'll attract the very best data scientists
because they'll want to dive into that.
They'll come up with the right features and the right ideas, and that
will be another sort of effect on top.
So how do you solve the chicken-and-egg problem in this scenario?
And by the chicken-and-egg problem, we mean the conundrum of where do you start.
Like an example you just shared, Vijay, is it the corpus that comes first and then the data scientists?
Or do you get the data scientist first to create that corpus?
Like, how does it sort of come together?
Yeah, there's a couple different strategies.
One common strategy is to sell something at cost or not necessarily with huge margins in order to be able to gather data.
You know, in principle 23andMe was doing something where they were getting these kits out and gathering
huge datasets and then downstream making big research deals. That's a canonical example,
but that's not easy to do to build up that size so quickly. Yeah, another example is, I mean,
Google didn't set out in whatever it was, 1998 or whenever they were incorporated, a long time ago,
almost 20 years ago, to become a deep learning company. This was almost like, wow, we've been
scanning the web forever. We have hundreds of thousands of servers or however many they have around
the planet. We have all these images that we've stored. Now we have a corpus, and we also have
a very profitable business, let's go get a bunch of data scientists and machine learning people
and figure out what we can do. So that's called the accident. That's the atypical one, but that is actually,
it's atypical, but at the same time it is quite typical because some of the best people out there today
are working at companies like Google or like Facebook, and Facebook didn't want to be an image
recognition company back in the day. It fell into it because they have such an enormous corpus.
The other example is you kind of move up the value chain over time. So I'll talk about the fraud example
here where a lot of the anti-fraud companies, like Twitter has a fraud problem, but what is the
economic impact of fraud on Twitter? It means that somebody opened an account and they've been
spamming somebody or there's trust and abuse or things that don't have massive economic impact.
They're annoyances, but they're not really, really problematic. Blue Nile has a much, much bigger problem.
Blue Nile sells diamonds online. So, as you know, diamonds are very, very expensive. They're
very, very small, and, you know, one pound of diamonds is worth millions of dollars. So if you lose
the equivalent of one pound in diamonds to fraud, like, that's not good. Right. It has
severe economic consequences. So, you know, you can imagine on the fraud scale, and yet actually
there's overlap because bad people tend to do lots of bad things. So somebody who's truly a bad person
might open up a bad Twitter account and then actually steal a credit card number as well, and then
use that stolen credit card number to go steal a diamond. And then they might do all sorts of other
unsavory things as well.
And the nice thing is that bad people,
because they don't exist in pockets,
there is horizontal overlap here
across all these different verticals.
If you go, and it's almost like what Vijay was saying,
where it's not even giving it away for free,
because that's hard to sustain for too long,
but you can go to people that have vast, vast numbers of writes,
going back to the read-write analogy,
so Twitter would be able to say,
okay, we will give you information
on everybody who's potentially a bad account
or a good account.
We'll just let you watch these people,
not watch their data,
but just profile them,
like, you know, here's their browser type, here's their IP address, here's a cookie that was
on their machine, things like that. And now you build up 50 million bad people. And Twitter will pay
a little bit of money for this, not that much because there isn't a data network effect.
Then you merge that with Tumblr. Then you merge that with somebody else. And none of these people
will pay that much. But now the value of a read is getting to be of substantial size to Blue Nile, the
diamond company or to any other e-commerce company. Whereas if you went to Blue Nile from scratch and
you said, hey, you should use our anti-fraud technology and not these guys' anti-fraud technology,
A, you don't have a data network effect at all. So it's very hard to say, like, you might have a
better algorithm, but again, it's hard to extract that much economic rent from a marginally better
algorithm because it's only marginally better today and not tomorrow potentially. And you don't
have enough data as well. So you might bootstrap yourself via a different vertical.
So part of what you also touch on is this notion of pooling data among different sources.
How does this play out in both fintech and bio?
Because I would think if data is your advantage and yet you need more data, especially
in science, where you have open science and sharing, how do you then sort of overcome that sort of
silo effect and create that shared central repository when everyone wants to protect their data?
Yeah, it's a huge challenge on the health side because of things like HIPAA, which require
anonymity and become natural barriers.
But that's also therefore an opportunity for the company that can put everything together.
But also, you know, what's interesting is
that there just is so much data there. I mean, whether we're talking about data from clinical
trials or from patients or from pharma. And so the opportunity is huge if a company can work out
those logistical issues. Yeah, and likewise, I mean, it's very hard to get competitors to work together.
So as an example, if you carry credit card debt, imagine that you have five credit cards.
Every credit card company should want to know how much you're spending on the other credit cards.
Because if you go, imagine that you decide I'm going to flee the country and renounce my U.S. citizenship
and never pay any of my debts back. And you have five
credit cards that each have a $20,000 limit, well, you could just go steal $100,000 with impunity,
and that would be very bad. So Chase should want to know how much you're charging on your Amex
card at any point in time. Amex doesn't want to tell Chase, and in many cases, this actually
creates the opportunity for a separate company, and you anonymize everything, you wash it,
you make sure that nothing is actually of discernible value, because if Amex is turning over their
complete customer list to Chase every night, that would not be, like I can't imagine that
agreement ever, ever happening. So part of what the data company does is they figure out how to
sanitize it. They deal with the political issues and then everybody benefits from being part of this
cooperative. And it's very hard to get these things off the ground. But the nice thing is that the
companies themselves, left to their own devices, will never do it. And yet at the same time,
it's a very, very big problem for them. So is the ideal opportunity then for a startup to be sort
of at that center of all these different players, like play a broker-like role or to try to create
something in its own vertical. I mean, like, where do the opportunities lie here for startups in both
of your spaces and beyond? I think it really, I mean, I hate to say it depends, but it really
depends. Because, I mean, in some cases, you're creating something new and you're not really, I mean,
like in the fraud case, it's not like you're extracting, like, very, very confidential information
and sanitizing it. Or there's a company called Yodlee, which is very, very interesting. Like,
every fintech company pretty much on Earth right now is in some way, shape, or form using Yodlee to
aggregate information across all of these different financial services companies. So you have an E*TRADE
account, you have your IRA with Fidelity, and you've got your bank account with Bank of America,
and you want to put them in a Mint-like interface, whether on mobile or on the desktop. Yodlee is
typically the player behind the scenes that's aggregating all of that. But then Yodlee actually retains
all of that information as well, and they can use it, on an anonymized basis, for their own
purposes. That didn't exist before. So people are doing all sorts of cool
things on that data as well to figure out, you know, what's happening in the world. So it really
depends on whether or not you're, like, there's the, I have to build a cooperative and there are
only 10 companies that have this data and I'm going to be the UN between them. Sure, that's
very, very valuable, but it's very, very hard to be the UN because these are very, very large
monolithic companies that can't agree on anything and getting them to agree to work with you or
anybody for that matter is a, that's an uphill battle. If you can get it, there's a lot of value there.
I tend to like the companies that are not reliant on playing peacemaker with 10, where instead there are thousands, and then eventually you can build up with thousands, and then, sure, those 10 have no choice but to use your information, because acting in a centralized manner is so important, and there's nothing else quite like it, going back to the network effect piece.
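The "sanitize it" step a data cooperative performs, as in the credit card example above, can be sketched roughly as follows. This is only an illustrative Python sketch under loud assumptions: the `Cooperative` class, the `COOP_KEY` secret, and the issuer and customer names are hypothetical, and a keyed hash (HMAC-SHA256 from the standard library) is just one simple pseudonymization technique, not something the speakers describe. Real anonymization, especially under regulation, needs far more than this.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the cooperative's operator (an assumption).
COOP_KEY = b"example-key-not-for-production"


def pseudonymize(customer_id: str, key: bytes = COOP_KEY) -> str:
    """Replace a raw identifier with a keyed hash so it is never stored as-is."""
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


class Cooperative:
    """Toy broker: members write balances, reads return only an aggregate."""

    def __init__(self):
        # pseudonymous token -> {member name: reported balance}
        self._balances = defaultdict(dict)

    def report(self, member: str, customer_id: str, balance: float) -> None:
        """A member writes the balance it sees for one of its customers."""
        self._balances[pseudonymize(customer_id)][member] = balance

    def total_exposure(self, customer_id: str) -> float:
        """Sum across all members, without revealing who reported what."""
        return sum(self._balances.get(pseudonymize(customer_id), {}).values())


if __name__ == "__main__":
    coop = Cooperative()
    coop.report("issuer_a", "customer-123", 18000)
    coop.report("issuer_b", "customer-123", 15000)
    # Each issuer learns the combined figure, not the other's customer list.
    print(coop.total_exposure("customer-123"))
```

The design choice worth noting is that reads return an aggregate rather than the underlying records, which is what lets competitors contribute without handing each other their customer lists.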
Yeah, I think there's a lot that the healthcare side can learn from the fintech side. My assessment of things is that it's maybe a little bit further behind, and there's a lot of different reasons for this. One reason is that even just the use of electronic medical records, or EMRs,
is only relatively recent.
And that's really changing, but that's much more recent.
And to speak to Alex's point, there are generally just a few big players.
There's not like a thousand health insurance companies or something like that.
So there are these new challenges, but I think I'm always curious when Alex and I chat
to see what tricks can be borrowed from the fintech space into the healthcare space.
So one question, and you may not have the answers for it, but I think it's worth us discussing,
is sort of the ethical implications of users in a system where
the biggest value where the network effect now accrues from data. And as it is, users are always,
you know, there's a lot of advocacy groups who say, like, users should have the right to extract
their data and do whatever they want with their own data, which is a separate point, but related
in the sense that it touches on how much agency, who has that agency, and what is sort of the ethics
associated with all of this. Vijay, we should probably start off with you because I think with HIPAA,
it's automatically a constraint in place. Yeah, there's HIPAA, which, you know, requires the
anonymization. And sometimes that is not as obvious as you might think. It's not just removing someone's
name. If someone has a scan of your brain, like an MRI of your brain, is that anonymized?
Because maybe that could be traced back to you. Is your genome sequence anonymized?
Just having that sequence alone might be enough to be able to connect it to you with a blood test;
probably it is. And so it's actually much more of a profound sort of philosophical issue to think about.
But on the flip side, the upside could be really quite huge. It could be the difference between
pooling everyone's information to be able to predict whether you're going to get cancer or not.
And I would like to have my information in there, and I'd like to know those things.
And so there's going to be something that we're going to have to sort of figure out on the regulatory
and policy side to figure out what's the best thing to balance these two forces.
And the other interesting point there is that there is, I mean, in economics, there's this
concept of the public good, the free-rider problem.
And you often have that.
So going back to reads and writes, everybody wants to read, but nobody wants to write.
And in many cases, like, I mean, if writing means giving your blood and actually going to a
phlebotomist and getting blood drawn from you.
Like, reading is very easy.
Reading is a lot of fun.
Writing actually requires a lot of work.
And so there are two ways that I think about that.
That's obviously a health-related analogy.
But I think about this in terms of, on the one hand, you've got kind of regulatory
issues.
And there's also just like a lack of consumer understanding.
So I remember a good friend of mine who's not very literate computationally or technologically
was saying, oh my God, Alex, I have all these cookies on my computer, like I'm being
tracked. This is terrible. How do I... like, cookies are dangerous. And I think some tech
column has contributed to all this confusion over cookies. I was trying to explain to this friend that
if you go to the New York Times, do you want to have to log into the New York Times every time you go
to New York Times.com? It's like, no, I'd hate to log in every time. That's annoying. It's like,
well, that's what a cookie's doing. It's remembering on your own browser some information
so the New York Times can reference you and actually de-anonymize you. And then when he understood
it that way, it was like, oh, I like cookies. It was just this kind of fundamental misunderstanding.
user sort of greater than, right? So part of it is the free rider thing of, like, sometimes
providing data actually makes you better off. Like if I'm willing to give up more information for
insurance purposes, like, okay, will I let my car insurance company see how fast I'm driving?
And on the one hand, that sounds like really, really spooky. Like, oh my God, they're watching
what I'm doing. And then Big Brother this and 1984 that, that sounds terrible. On the other hand,
if I'm willing to give that up and I show that I never drive past the speed limit,
and I never veer out of my lane.
You get a better insurance rate.
I get a better insurance rate.
I'd take that.
So part of it is like, it's not caveat emptor.
It's like whatever the Latin phrase would be like, choose your own destiny kind of thing.
Some people will value time more than money.
Some people value money more than time.
The same thing goes with privacy.
Some people value privacy more than money.
Some people will value money more than privacy.
And I think part of it is just making it transparent.
So that's one side.
The other side is just how you educate people.
Right.
I think how you talk about it to your point, transparency, how you talk about it.
and sometimes giving users a choice to opt in or out of a system.
And also it doesn't have to be black or white, I think, especially with machine learning,
you could learn features from data without having to share the data itself,
and that's useful for IP or for HIPAA and so on.
So I think there's a lot of ways that one can contribute to network effects
without making your data even publicly known or even exchanging data necessarily.
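One way to picture learning from data without sharing the data itself is federated averaging, where each site trains on its own records and only model parameters are pooled. The speaker does not name a specific method, so treat this as a minimal Python sketch of one well-known approach rather than what was meant. The one-parameter model `y ~ w * x`, the function names, and the two tiny datasets are all assumptions chosen for illustration.

```python
def local_update(w, data, lr=0.01, epochs=20):
    """Gradient descent on one site's private (x, y) pairs for y ~ w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites, rounds=30, w=0.0):
    """Each round, sites train locally; only weights reach the coordinator."""
    total = sum(len(data) for data in sites)
    for _ in range(rounds):
        local_ws = [local_update(w, data) for data in sites]
        # Average the local weights, weighted by how much data each site has.
        w = sum(lw * len(data) for lw, data in zip(local_ws, sites)) / total
    return w


if __name__ == "__main__":
    # Hypothetical private datasets, both generated from y = 3x.
    site_a = [(1.0, 3.0), (2.0, 6.0)]
    site_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]
    print(round(federated_average([site_a, site_b]), 3))
```

Note that shared parameters can still leak information about the underlying records in some settings, so a sketch like this does not by itself satisfy requirements such as HIPAA.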
Right. And I think, actually, in most cases, the benefit of the doubt.
I mean, right now it's like the company that's using data is the evil company,
and they're up to some pernicious, whatever.
And that's almost never the case.
I just think that that's a lot of, like,
Congress goes and investigates company XYZ
because they're using data
or what are they doing with consumer data.
And part of it is, like, the default assumption
is that these guys are out to get you.
And in most cases, that's not true.
And there are a lot of good things
that do come from being part of this cooperative.
And I think as people do, like,
I would love it if you don't get charged more,
because that's where people would react poorly or negatively.
If your insurance company says, hey, we saw that you were speeding, you're getting charged
a lot more. Can you imagine how terribly people would react to that? It's like up in arms,
congressional inquiries, blah, blah, blah. On the other hand, if you got a giant rebate check
from your insurance company saying, hey, you've been driving very safely, or you haven't gone to
see the doctor in a long, long time, and the last time you went to go see the doctor, all your vitals
were better. Here's a rebate check. People would love that. And that's coming from data as well.
So I think part of it is just the psychology of how you reward people for sharing their data.
When in many respects, it's already being shared anyway.
You're right.
Okay, so this has been helpful so far.
So let's talk about the fact that we think data network effects are really important for software-based companies,
especially in this age, as you mentioned, of machine learning, deep learning, AI, all these trends coming together.
So what concretely can entrepreneurs do to, A, build data network effects or think more strategically about it early on versus by accident?
And secondly, what do you want to see in pitches from entrepreneurs when they talk about data network effects?
Yeah. So, you know, in terms of a startup, usually startups do well when they focus on one area.
And so the challenge here is that how can the data network effect really accelerate what they're doing?
I think too often that what happens is the data network effect almost suggests a side business or something like that.
And so one challenge is how to think about what is the real go-to-market.
Is the data network effect really germane and central and key to
the focus, and then how can you monetize it? How can you take advantage of it? What often happens
is I think there's the aspiration for taking advantage of the data network effect or the assumption
that it will just come. But often we see situations where maybe that plan hasn't been
well thought out yet. And I would say that in many cases it's about going up the value chain,
so starting at the bottom, where you're accumulating writes with the
purpose of hopefully charging for reads down the road, and/or hopping
across different verticals. So you start off in vertical X where, again, you're write-heavy,
which is great, because every write that you're getting is more data that you can eventually
learn from. And even if you're not learning from, as we talked about, there's a network effect
that might play out there. And then eventually you go into an area where it has high monetary
value and you're charging for reads, but you're still continuing to get writes along the way.
And I think economics is really the best way of looking at how effective this really is,
because there are a lot of people that claim, I have a data network effect, I have a data
network effect. Or they'll say, I will have one eventually. And it's like, sure, like Google had one
eventually or Facebook had one eventually. I mean, everybody has one eventually, but it's very hard
to prognosticate that eventuality or when that happens. And how do you make it more deterministic
versus like this random thing? That's where kind of economics comes in. So like imagine that you're at the
stage where you actually are charging for your product, a good sign is that assuming that you kind of
started off in the low monetary value area and now you're charging for reads in the high
monetary value area, if you are charging more than the incumbent,
I mean, normally you say, oh, if I can charge one-tenth as much, then it's going to be very disruptive and I'm shrinking the market.
But you actually have the opportunity to charge a lot more, value-based pricing.
So if you can really show that you're charging 20, 30, 40 percent more than the competition, and they're actually willing to pay for it, and they're switching from a lower-priced product.
Either they're totally irrational.
They say, hey, I want to lose more money this year and increase my cost, which, by the way, almost never happens.
Or you've actually demonstrated in the eyes of many, many customers that they are willing to
pay more because your data is better. And they're contributing back to this collective as well,
which almost de facto means that you do have a data network effect. And it's not about prognosticating.
It's like it's actually real. That's a great example. I will say that the exception, it seems to me,
is in the very early days of a company where people are actually not quite, you can't really use
the proxy of people paying quite yet. So then how do you sort of figure it out? There can be other
network effects. I mean, Google and Facebook had clear other network effects and that sort of helped
create the data network effect. So I think that's one mechanism by which this can be bootstrapped.
Or likewise, as long as the entrepreneur has a pretty clear plan and they have access to a lot of
writes and they don't necessarily have to be charging for those, but it just seems pretty evident.
Like the hard thing is, in my example on e-commerce fraud, how do you get Twitter to sign up
to supply you with the writes? Or how do you get some other like massive publisher that doesn't
really have that much economic downside from fraud, but has tons and tons of data if it was
actually signed up with a company like this. If you sign up 10 of those, it's almost very easy to
see the blueprint of, wow, you've solved the biggest problem, which is you now have the writes.
You have a different, more execution-oriented problem, which is how do you go charge for the
reads and how do you show that you have enough value? But at least you've solved the write side of
the database. So what I'm really hearing is a theme from you guys is, you know, it can happen by
accident, but if you have a plan, if you're even aware and intentional about some of the decisions
you make, those are all contributing factors to actually just create and be better at building
data. And the other thing, going back to this winner take all thing, is like you're never going
to get to a network effect if it's a, if there are 25 companies doing exactly what you do,
and they're all about the same size, and nobody gets the big, that nobody has like a just
demonstrably better system, then the data is actually, it looks more like the algorithm.
Remember how we talked about how the algorithm gets like 1% better every year, and this company
can out-algorithm that company until they can't. The same thing goes for data. If nobody really
gets to that critical point, then it's never going to be that much better, and you can't
charge excess rent either. Great. So any parting advice for entrepreneurs before we wrap it up?
You know, on the data science side, my personal opinion is that some of the best data science is
ones where people can go deep within the domain. And it's something where it's not just taking
off-the-shelf algorithms and so on. And this is especially important in this case with a data
network effect because this is where speaking to Alex's discussion of data and algorithms, often
the two are really tightly connected and having a deep experience in the domain and on the algorithm
side can really bring that about. We call it founder-market fit. It's almost like data-algorithm-founder
fit. And there's a profound HR implication of this as well because the algorithm side is
related to the data side because if you have the best data, then guess what, you're going to be
able to hire the best people. Because if you're an amazing statistician, you don't want to work on a database
that has five rows in it. You want to
work on a database that has five trillion rows in it. And if you have that, then you come up with a
better algorithm and therefore more people want to contribute data. You're getting more writes,
therefore you're getting more reads, and that kind of continues on and on. And the HR component
is very important because the best people, again, you can attract them to a startup. I often
advise companies where they say, oh, you know, we're going to hire these five data scientists,
but they don't have any data yet. And what they don't really realize is that
if they're data scientists who are happy to take out the trash and, you know, clean the toilets and do all the other things that are fun about running a startup, then that's fantastic. But if they really, really are very laser-focused on this one task, they're going to burn out or not even burn out. They're just going to leave because there's nothing to do. Chronologically, I mean it's great if that's in the founding team DNA. But you also just have to be careful about not overbuilding until you actually know that you have enough there.
When you say not overbuilding, you mean...
Well, I mean, I just look at a lot of companies where they say, wow, you know, we're going to hire, we have 10 data, like here's a new ad network or it's a new this, and we have 20 data scientists and they're amazing.
I see these companies all the time.
They have like just, they're overweighted on the data science side and they have no data.
Yeah.
And like what they don't realize is that like they're increasing their burn.
Like just, yeah, if they can't find these people and it's a 20-month hiring process, then okay.
But they're going to lose these people unless they manage the other side of their network.
Unless they manage the supply side of the data
and actually figure out how they get those people,
how they get the writes coming in the door,
you're going to lose your team.
That's the vicious cycle.
The virtuous cycle is obviously the opposite,
where if you have data, you can hire the best people.
If you hire the best people,
you get the best algorithms,
you get the best clients, you get the best data.
I'm glad you brought that up
because you're actually focusing on the flywheel effect
for data network effects
as including talent as a component.
Well, there's a flip side of this,
which is that I think if the data science
is tacked on too late,
It also, I think, has some strong negatives.
The chicken-and-egg problem.
So what Alex spoke to, I think he mentioned it being in the founder DNA.
I think that's what I love, where in terms of the vision of the company from the beginning, it's there, but not overbuilt, you know, before you're ready to go to war in that area.
But for that to be in the founder DNA, I think is perfect.
Yeah.
And I think part of that is just what is the architecture of, I mean, like, you know, I was mentioning the rows of the database.
If it's a non-technical team and they've got two columns to their database, they're not really collecting that much stuff.
That makes it harder to actually append the data scientists to do all of these great things,
especially if you end up becoming a data company by accident.
So imagine that you are a background check company.
What is an exhaust of a background check company?
Well, it's how many people are applying for jobs.
I mean, there's all sorts of interesting things on the data side,
but hopefully you are collecting things not in free-form text entry,
but you're doing it in a much more itemized way,
in a much more like defined and controlled way or enumerated way,
or pre-enumerated way,
where you can do things that are much more relevant.
So this is a blanket generalization,
but is it fair then to say that almost by definition
as every company becomes a software company,
that every software company is by definition a data company?
Well, maybe.
I think it just, part of it depends on how lucrative
your primary business model is,
because every company of scale has amazing exhaust,
and the question is whether or not you want,
like, you know, I'll go back to Visa.
Visa's exhaust is very, very valuable,
but if they shared that,
then their clients wouldn't like them very much
and then they lose their clients.
So even though their exhaust is worth,
it's probably worth billions of dollars a year.
They can predict the economy down to the line.
Oh, yeah.
I mean, every hedge fund on Earth would pay for that.
They just can't do it.
And they're not making a mistake by not doing it.
So, yes, they are a data company,
but they also are a network.
And being a network is probably more important
than being a data company.
But it really just depends on the particular use case. I mean, I think you will have different products.
I mean, every company that gets to scale that's touching enough consumers or businesses will have the
opportunity to have a very, very valuable data suite beyond just whatever they use for their own purposes.
But the question is whether or not they want to. And part of that is just how lucrative.
Like Apple could be the biggest, you know, insert the X there. A lot of that pertains to data.
But Apple makes too much money selling an iPhone. So I don't think they're going to do that.
You know, towards that end, the accident that Alex referred to is often really an inevitability that, you know, this will happen.
The question is, what do you do with it? And is it part of your core business or is it something that you have to leave on the side?
Well, thank you, guys. And we'll talk more about this a lot.
Thank you.
Yeah, thank you.
