a16z Podcast - a16z Podcast: Data Network Effects

Starting point is 00:00:00 Hi, everyone. Welcome to the a16z Podcast. I'm Sonal. And today we're doing a podcast on data network effects. And we have two general partners here to have that conversation with us. We have Vijay Pande, who covers all things bio, and Alex Rampell, who covers all things fintech as well as other areas. Welcome, guys. All right. Thank you. Okay. So first let's just kick off by talking about what a data network effect is. In its simplest form, it's a network effect that results from data. And if a network effect is defined as something where the value to users and all the participants increases as more users use a particular platform or marketplace, how does this play out with data? So if you think about eBay, more buyers go to eBay because more sellers go to eBay, and more sellers go to eBay because more buyers go to eBay. That is the canonical network effect. And there, commerce is happening. That's the transaction. For a data network effect, typically there's no commerce per se. There's an extraction. You're either reading or writing,

Starting point is 00:00:58 and in most cases you're reading. And by reading or writing, you mean in the database sense, like reading from a database, writing into a database. And as more people write, the value of each read goes up. That's the way of thinking about it. So an example would be the credit score.
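[Editor's sketch] The read/write framing above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `SharedRepository`, its methods, and the `hit_rate` measure (the fraction of lookups that return anything) are hypothetical names made up for illustration, not anything the speakers describe. It shows that as more participants write, a given read is more likely to return something useful.

```python
from dataclasses import dataclass, field


@dataclass
class SharedRepository:
    """Toy central repository (hypothetical): participants write, anyone reads."""

    # subject_id -> list of observations contributed by writers
    records: dict = field(default_factory=dict)

    def write(self, subject_id: str, observation: str) -> None:
        """A participant contributes one observation about a subject."""
        self.records.setdefault(subject_id, []).append(observation)

    def read(self, subject_id: str) -> list:
        """Return everything known about a subject (empty list if nothing)."""
        return list(self.records.get(subject_id, []))

    def hit_rate(self, queries: list) -> float:
        """Fraction of queried subjects the repository knows something about.

        Used here as a crude, assumed proxy for the value of a read.
        """
        if not queries:
            return 0.0
        hits = sum(1 for q in queries if q in self.records)
        return hits / len(queries)


if __name__ == "__main__":
    repo = SharedRepository()
    lookups = ["alice", "bob", "carol", "dave"]
    print(repo.hit_rate(lookups))  # empty repository: no read is useful
    repo.write("alice", "paid loan on time")
    repo.write("bob", "missed a payment")
    print(repo.hit_rate(lookups))  # more writes: more reads return data
```

A new entrant with an empty repository sits at a hit rate of zero no matter how cheaply it prices each read, which is the point made next about the credit score.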

Starting point is 00:01:11 I could figure out what your credit is by just looking at you and profiling you in legal ways, not illegal ways, and saying, here's what I think your proclivity to repay is. But if every bank on earth is using one central repository, then they will pay more money to actually extract, to read, the reads become far more valuable. And if a new company started tomorrow and said, hey, we're going to do credit scores and we're going to charge a dollar per extraction per read

Starting point is 00:01:37 and not $10 per extraction, well, there's nothing to extract. Like they can't actually provide any value. If they end up having more data than the current number one person, then they could charge a lot more than $10. They could charge $100. And in fact, the value of the number two person goes to zero because they actually have a demonstrably poorer product, which is why there aren't really any competitors to eBay. Right. It's also the way people often talk about companies that have network effects, as winner-take-all markets. Yeah, which is generally the case, or winner-take-most, or winner-take-the-vast-majority. But the way I think of it is, if you think about it in the database sense of reads and writes, the reads just become disproportionately more valuable as more people

Starting point is 00:02:17 are using this central repository of data. On the medical side, there are interesting aspects that combine with machine learning as well, because the database model is, I think, a very natural one. But then if you put data science and machine learning on top, the reads can become much higher value because of the insights that you can gain from the data as well. Especially these new modern machine learning methods like deep learning, they just crave data. And so often you have to reach a critical mass before they can even be used. Well, I'll give you another example, like Google right now and Facebook, when you think about translation services, which has nothing to do with fintech, but I want to go translate text, or I want to go look at images and figure out what they are.

Starting point is 00:02:53 Google, Baidu, Facebook, people that have large corpuses, they have such a huge advantage, because if I want to figure out translation at scale, and I have no data on which to draw, this isn't a read-write problem because you're not making a central repository. eBay is a central repository where it's a marketplace where people trade, or the credit bureaus are marketplaces where people trade, or anti-fraud companies that work with lots of e-commerce companies, they're central marketplaces where people trade. Here, it's just, like, Google could have not the best computer science. They probably do have the best computer science. But imagine that they didn't, but they had the biggest corpus of data, they could just go acquire the best computer science. And the unfair advantage that they have, their data network effect, is effectively that as they get better translation, they can actually use that to make their translation software even better. And they also have users to autocorrect that as well. So that's another example of a data network effect where, like, the corpus is the demonstrable advantage. So one thing that confuses me here, and I feel like we overuse this as a result, is that sometimes people conflate having a lot of data. To your point, in this case, the large corpus is required in order to create better results, which is a feature of machine and deep learning. But sometimes people conflate having a lot of data and say we have a data network effect. And that's actually not true. So how do we sort of travel from having data to actually having a data network effect that results from that data?

Starting point is 00:04:14 Yeah, you have to have a plan to actually do something with the data, right? And usually this is something where you can either provide something of higher quality, let's say in diagnostics, because you know so many other results that you can actually do a better job at predicting and diagnosing, or do something cheaper. And obviously, the combination of the two, higher quality at lower cost, is really a game changer. Well, the other thing is that having a lot of data is not a network effect if having a lot of data doesn't come with a plan to make your data better. So go back to the credit bureau: I have a lot of data, I'm Experian, therefore people write and read from me, and therefore I get more data and my data gets better, as opposed to, like,

Starting point is 00:04:50 look at Visa, you know, where Visa bought my company, and they have a tremendous amount of data. They could predict the U.S. economy down to, like, probably the ninth percentage point or decimal point. But knowing, having all of that data doesn't make their data better. People don't want to go transact. It's an output. So it's like exhaust. So a lot of data actually takes that form of an exhaust, and it makes it very, very valuable. But there's no network effect typically to exhaust type data, as opposed to when the data is actually, it's a key component of the business model. And there is this concept of more people want to write because more people want to read. More people want to read because more people want to write. And replace that

Starting point is 00:05:28 with the commerce aspect of like buyers and sellers. Right. So if you were to operationalize that and make it even more concrete than that, one thing I've heard is that you have to have an algorithm to actually take the data out and then, to your point, add value back. I mean, how would you sort of operationalize this more concretely for people who are building products if they want to build data network effects? What should they do? Yeah, I mean, that really varies, obviously, in terms of the domain and the company, but you know, some sort of data science or machine learning is very natural to be able to apply to this, but I think, you know, sometimes this doesn't have to be fancy machine learning

Starting point is 00:06:03 or anything like that, just the ability to monetize something from that, and really something where your company gets better. Yeah, I think part of the problem as well is that algorithms, like if you look at compression algorithms over time, there's one called LZW, which has been around for a very, very long time. It's pretty good. And then the next one that was better was maybe 1% better. The next one that was better was 2% better. And if you are an algorithm company, it's very, very hard to build any kind of value because somebody else comes up with a marginally better algorithm. So you need to pair the algorithm with the data. And there's actually a shift going on right now from outputting just the data to more outputting a learned aspect on top of that. That's the algorithm part. I mentioned fraud and anti-fraud companies. So there are a bunch of companies. There's one in our portfolio called Signifyd. There's another one that I invested in as an angel a long time ago called Sift Science. And for a long time, many of these companies would tell you, okay, do we think this is risky? They'll tell you all of the answers that got pulled from the data. So go back to

Starting point is 00:07:01 credit reports. Well, credit reports are a combination of like what you did in your past. You got this thing when you got out of college. You didn't pay that loan on time. You were a deadbeat for this doctor, whatever, whatever. And then there's a credit score, which is a heuristic that's built on top of that. So what's actually interesting, if you look at credit reporting right now, the goal of applying machine learning is to actually come up with a better heuristic. So this is the thing where you need the data repository, and ideally it's proprietary to you, because then you can extract more economic rent if you're building a company here. And then you want to have a better set of heuristics on top of that. That's the algorithm.

Starting point is 00:07:38 And neither one alone is really sufficient. I mean, it is sufficient. I guess you could say if you have the data, the data network effect tends to be more valuable than the algorithm. But you can extract more value if you're not saying, here are 50 things for you to go to analyze on your own, and we're the only one that have access to that. And then you have to hire a team of 50 people to go analyze it. But now you actually have an algorithm that outputs you a decision. And you can use that decision. And that's an even bigger advantage for a tech company that has a data network effect. And while they're not formally related, usually one follows the other. If you're the one with the big giant corpus, you'll attract the very best data scientists because they want to dive

Starting point is 00:08:16 into that. They'll come up with the right features and the right ideas, and that will be another sort of effect on top. So how do you solve the chicken-and-egg problem in this scenario? And by the chicken-and-egg problem, we mean the conundrum of where do you start. Like, in the example you just shared, Vijay, is it the corpus that comes first and then the data scientist? Or do you get the data scientist first to create that corpus? Like, how does it sort of come together? Yeah, there's a couple different strategies. One common strategy is to sell something at cost or not necessarily with huge margins in order to be able to gather data. You know, in principle 23andMe was doing something where they were getting these kits out and gathering huge data sets and then

Starting point is 00:08:50 downstream making big research deals. That's a canonical example, but that's not easy to do to build up that size so quickly. Yeah, another example is, I mean, Google didn't set out in whatever it was 1998 or whenever they were incorporated. A long time ago, almost 20 years ago to become a deep learning company. This was almost like, wow, we've been scanning the web forever. We have hundreds of thousands of servers or however many they have around the planet. We have all these images that we've stored. Now we have a corpus, and we also have a very profitable business. Let's go get a bunch of data scientists and machine learning people and figure out what we can do. So that's called the accident. That's the atypical one, but that is actually, it's atypical, but at the same time

Starting point is 00:09:30 it is quite typical because some of the best people out there today are working at companies like Google or like Facebook, and Facebook didn't want to be an image recognition company back in the day. It fell into it because they have such an enormous corpus. The other example is you kind of move up the value chain over time. So I'll talk about the fraud example here, where a lot of the anti-fraud companies, like Twitter has a fraud problem. But what is the economic impact of fraud on Twitter? It means that somebody opened an account and they've been spamming somebody or there's trust and abuse or things that don't have massive economic impact. They're annoyances, but they're not really, really problematic. Blue Nile has a much, much bigger problem. Blue Nile

Starting point is 00:10:11 sells diamonds online. So, as you know, diamonds are very, very expensive. They're very, very small, and, you know, one pound of diamonds is worth millions of dollars. So if you lose the equivalent of one pound in diamonds to fraud, like that's not good. Right. It has a severe economic consequence. So, you know, you can imagine on the fraud scale, and yet actually there's overlap because bad people tend to do lots of bad things. So somebody who's truly a bad person might open up a bad Twitter account and then actually steal a credit card number as well and then use that stolen credit card number to go steal a diamond. And then they might do all sorts of other unsavory things as well. And the nice thing is that bad people, because they don't exist in pockets,

Starting point is 00:10:48 there is horizontal overlap here across all these different verticals. If you go, and it's almost like what Vijay was saying, where it's not even giving it away for free, because that's hard to sustain for too long, but you can go to people that have vast, vast numbers of writes, going back to the read-write analogy, so Twitter would be able to say, okay, we will give you information on everybody who's potentially a bad account or a good account. We'll just let you watch these people, not watch their data, but just profile them like, you know, here's their browser type, here's their IP address, here's a cookie that was on their machine, things like that. And now you build up 50 million bad people, and Twitter will pay a little bit of money for this,

Starting point is 00:11:24 not that much because there isn't a data network effect. Then you merge that with Tumblr, then you merge that with somebody else, and none of these people will pay that much, but now the value of a read is getting to be of substantial size to Blue Nile, the diamond company, or to any other e-commerce company, whereas if you went to Blue Nile from scratch and you said, hey, you should use our anti-fraud technology and not these guys' anti-fraud technology. A, you don't have a data network effect at all. So it's very hard to say, like, you might have a better algorithm, but again, it's hard to extract that much economic rent from a marginally better algorithm because it's only marginally better today and not tomorrow

Starting point is 00:11:58 potentially. And you don't have enough data as well. So you might bootstrap yourself by a different vertical. So as part of what you also touch on is this notion of pooling data among different sources, how does this play out in both fintech and bio? Because I would think if data is your advantage, and yet you need more data, especially in science, where you have open science and sharing, how do you then sort of overcome that sort of silo effect and create that shared central repository when everyone wants to protect their data? Yeah, it's a huge challenge on the health side because of things like HIPAA, which require anonymity and become natural barriers. But that's also therefore an opportunity for the company that can put everything together.

Starting point is 00:12:39 But also, you know, what's interesting is that there just is so much data there. I mean, whether we're talking about data from clinical trials or from patients or from pharma. And so the opportunity is huge if a company can work out those logistical issues. Yeah, and likewise, I mean, it's very hard to get competitors to work together. So as an example, if you carry credit card debt, imagine that you have five credit cards. Every credit card company should want to know how much you're spending on the other credit cards. Because if you go, imagine that you decide I'm going to flee the country and renounce my U.S. citizenship and never pay any of my debts back, and you have five credit cards that each have a $20,000 limit, well, you could just

Starting point is 00:13:13 go steal $100,000 with impunity, and that would be very bad. So Chase should want to know how much you're charging on your Amex card at any point in time. Amex doesn't want to tell Chase, and in many cases, this actually creates the opportunity for a separate company, and you anonymize everything, you wash it, you make sure that nothing is actually of discernible value, because if Amex is turning over their complete customer list to Chase every night, that would not be... Like, I can't imagine that agreement ever, ever happening. So part of what the data company does is they figure out how to sanitize it. They deal with the political issues, and then everybody benefits from being part of this cooperative.

Starting point is 00:13:52 And it's very hard to get these things off the ground. But the nice thing is that the companies themselves, left to their own devices, will never do it. And yet at the same time, it's a very, very big problem for them. So is the ideal opportunity then for a startup to be sort of at that center of all these different players, like play a broker-like role, or to try to create something in its own vertical? I mean, like, where do the opportunities lie here for startups in both of your spaces and beyond? I think it really, I mean, I hate to say it depends, but it really depends. Because, I mean, in some cases, you're creating something new, and you're not really, I mean, like in the fraud case, it's not like you're extracting, like, very, very confidential

Starting point is 00:14:25 information and sanitizing it. Or there's a company called Yodlee, which is very, very interesting. They are, like, every fintech company pretty much on Earth right now is in some way, shape, or form using Yodlee to aggregate information across all of these different financial services companies. So you have an E-Trade account, you have your IRA with Fidelity, and you've got your bank account with Bank of America, and you want to put them in a Mint-like interface, whether on mobile or on the desktop. Yodlee is typically the player behind the scenes that's aggregating all of that, but then Yodlee actually retains all of that information as well, and they can use it, on an anonymized basis, for their own purposes. That didn't exist before. So people are

Starting point is 00:15:06 doing all sorts of cool things on that data as well to figure out, you know, what's happening in the world. So it really depends. There's the case of, I have to build a cooperative and there are only 10 companies that have this data and I'm going to be the UN between them. Sure, that's very, very valuable, but it's very, very hard to be the UN because these are very, very large monolithic companies that can't agree on anything, and getting them to agree to work with you or anybody for that matter, that's an uphill battle. If you can get it, there's a lot of value there. I tend to like the companies that are not reliant on playing peacemaker with 10, but where there are thousands, and then eventually

Starting point is 00:15:41 you can build up with thousands, and then, sure, those 10 have no choice but to use your information because acting in a centralized manner, it's so important, and there's nothing else quite like it, going back to the network effect piece. Yeah, I think there's a lot that the healthcare side can learn from the fintech side. My assessment of things is that it's maybe a little bit further behind, and there's a lot of different reasons for this. One reason is even just the use of electronic medical records or EMRs is only relatively recent. And that's really changing, but that's much more recent. And to speak to Alex's point,

Starting point is 00:16:11 there are generally just a few big players. There's not like a thousand health insurance companies or something like that. So there are these new challenges, but I think I'm always curious when Alex and I chat to see what tricks can be borrowed from the fintech space into the healthcare space. So one question, and you may not have the answers for it, but I think it's worth us discussing, is sort of the ethical implications of users in a system where the biggest value where the network effect now accrues from data. And as it is, users are always, you know, there's a lot of advocacy groups who say,

Starting point is 00:16:41 like, users should have the right to extract their data and do whatever they want with their own data, which is a separate point, but related in the sense that it touches on how much agency, who has that agency, and what is sort of the ethics associated with all of this? Vijay, we should probably start off with you because I think with HIPAA,

Starting point is 00:16:57 it's automatically a constraint in place. Yeah, there's HIPAA, which requires the anonymization. And sometimes that is not as obvious as you might think. It's not just removing someone's name. If someone has a scan of your brain, like an MRI of your brain, is that anonymized? Because maybe that could go back to you. Is your genome sequence anonymized? Just having that sequence alone might be enough to be able to connect it to you with a blood

Starting point is 00:17:20 test, probably it is. And so it's actually much more of a profound sort of philosophical issue to think about. But on the flip side, the upside could be really quite huge. It could be the difference between pooling everyone's information to be able to predict whether you're going to get cancer or not. And I would like to have my information in there, and I'd like to know those things. And so there's going to be something that we're going to have to sort of figure out on the regulatory and policy side to figure out what's the best thing to balance these two forces. And the other interesting point there is that there is, I mean,

Starting point is 00:17:48 in economics, there's this concept of the public good or free rider problem. And you often have that. So going back to reads and writes, everybody wants to read, but nobody wants to write. And in many cases, like, I mean, if writing means giving your blood and actually going to a phlebotomist and getting blood withdrawn from you, like reading is very easy. Reading is a lot of fun. Writing actually requires a lot of work. And so there are two ways that I think about that. That's obviously a health-related analogy. But I think about this in terms of, on the one hand, you've got kind of regulatory issues. And there's also just like a lack of consumer understanding. So I remember a good friend of mine who's not very literate

Starting point is 00:18:24 computationally or technologically was saying, oh my God, Alex, I have all these cookies on my computer, like I'm being tracked. This is terrible. How do it? Like cookies are dangerous. And I think some tech column has contributed to all this confusion over cookies. I was trying to explain to this friend that if you go to the New York Times, do you want to have to log in to the New York Times every time you go to the New York Times? It's like, no, I'd hate to log in every time. That's annoying. It's like, well, that's what a cookie is doing. It's remembering on your own browser some information so the New York Times can reference you and actually de-anonymize you.

Starting point is 00:18:54 And then when he understood it that way, it was like, oh, I like cookies. It was just this kind of fundamental misunderstanding. The benefit to the user is sort of greater than, right. So part of it is the free rider thing of, like, sometimes providing data actually makes you better off. Like, if I'm willing to give up more information for insurance purposes, like, okay, will I let my car insurance company see how fast I'm driving? And on the one hand, that sounds like really, really spooky, like, oh, my God, they're watching what I'm doing. It's Big Brother, this is 1984, that sounds terrible. On the other hand, if I'm willing to give that up, and I show that I never drive past the speed limit, I never veer out of my lane.

Starting point is 00:19:32 You get a better insurance rate? I get a better insurance rate. I take that. So part of it is like, it's not caveat emptor. It's like whatever the Latin phrase would be like, choose your own destiny kind of thing. Some people will value time more than money. Some people value money more than time. The same thing goes with privacy.

Starting point is 00:19:46 Some people value privacy more than money. Some people will value money more than privacy. And I think part of it is just making it transparent. So that's one side. The other side is just how you educate people. Right. I think how you talk about it, to your point, transparency, how you talk about it. And sometimes giving users a choice to opt in

Starting point is 00:20:02 or out of a system. And also it doesn't have to be black or white, I think, especially with machine learning, you could learn features from data without having to share the data itself, and that's useful for IP or for HIPAA and so on. So I think there's a lot of ways that one can contribute to network effects without making your data even publicly known or even exchanging data necessarily. Right. And I think, actually, in most cases, the benefit of the doubt.

Starting point is 00:20:24 I mean, right now it's like the company that's using data is the evil company and they're up to some pernicious, whatever, and that's almost never the case. I just think that that's a lot of like Congress goes and investigates company XYZ because they're using data or what are they doing with consumer data. And part of it is like the default assumption is that these guys are out to get you. And in most cases, that's not true. And there are a lot of good things that do come from being part of this cooperative.

Starting point is 00:20:49 And I think as people do, like, I would love it if you don't get charged more, because that's where people would react poorly or negatively. Your insurance company says, hey, we saw that you were speeding, you're getting charged a lot more. Can you imagine how terribly people would react to that? It's like up in arms, congressional inquiries, blah, blah, blah. On the other hand, if you got a giant rebate check from your insurance company saying, hey, you've been driving very safely, or you haven't gone to see the doctor in a long, long time, and the last time you went to go see the doctor, all your vitals were better. Here's a rebate check. People would love that. And that's coming from data as well. So I think part of it is just the psychology of how you reward people for sharing their data. When in many respects, it's already being shared anyway. You're right. Okay, so this has been helpful so far. So let's talk about the fact that we think data network effects are really important for software-based companies,

Starting point is 00:21:40 especially in this age, as you mentioned, in machine learning, deep learning, AI, all the things, kind of trends coming together. So what concretely can entrepreneurs do to, A, build data network effects or think more strategically about it early on versus by accident? And secondly, what do you want to see in pitches from entrepreneurs when they talk about data network effects. Yeah, so, you know, in terms of a startup, usually startups do well when they focus on one area.

Starting point is 00:22:04 And so the challenge here is that how can the data network effect really accelerate what they're doing? I think too often what happens is the data network effect almost suggests a side business or something like that. And so one challenge is how to think about what is the real go to market? Is the data network effect really germane and central and key to the focus?

Starting point is 00:22:24 And then how can you monetize it? How can you take advantage of it? What often happens is I think there's the aspiration for taking advantage of the data network effect or the assumption that it will just come. But often we see situations where maybe that plan hasn't been well thought out yet. And I would say that in many cases it's about going up the value chain. So starting at the bottom where your data doesn't, like you're accumulating writes with the purpose of hopefully charging for reads down the road and/or hopping across different verticals. So you start off in vertical X where, again,

Starting point is 00:22:57 you're write-heavy, which is great, because every write that you're getting is more data that you can eventually learn from. And even if you're not learning from it, as we talked about, there's a network effect that might play out there. And then eventually you go into an area where it has high monetary value and you're charging for reads, but you're still continuing to get writes along the way. And I think economics is really the best way of looking at how effective this really is, because there are a lot of people that claim, I have a data network effect, I have a data network effect. Or they'll say, I will have one eventually. And it's like, sure, like Google had one eventually or Facebook had one eventually. I mean, everybody has one eventually,

Starting point is 00:23:31 but it's very hard to prognosticate that eventuality or when that happens. And how do you make it more deterministic versus this random thing? That's where kind of economics comes in. So like imagine that you're at the stage where you actually are charging for your product, a good sign is that, assuming that you kind of started off in the low monetary value area and now you're charging for reads in the high monetary value area, if you are charging more than the incumbents, I mean, normally you say, oh, if I can like charge one-tenth as much, then it's going to be very disruptive and I'm shrinking the market, but you actually have the opportunity

Starting point is 00:23:58 to charge a lot more, value-based pricing. So if you can really show that you're charging 20, 30, 40% more than the competition, and they're actually willing to pay for it, and they're switching from a lower-priced product. Either they're totally irrational.

Starting point is 00:24:13 They say, hey, I want to lose more money this year and increase my cost, which, by the way, almost never happens. Or you've actually demonstrated in the eyes of many, many customers that they are willing to pay more because your data is better, and they're contributing back,

Starting point is 00:24:26 to this collective as well, which almost de facto means that you do have a data network effect. And it's not about prognosticating. It's like it's actually real. That's a great example. I will say that the exception, it seems to me, is in the very early days of a company where people are actually not quite, you can't really use the proxy of people paying quite yet. So then how do you sort of figure it out? There can be other network effects. I mean, Google and Facebook had clear other network effects and that sort of helped create the data network effects. So I think that's one mechanism by which this can be bootstrapped.

Starting point is 00:24:57 Or likewise, as long as the entrepreneur has a pretty clear plan and they have access to a lot of writes, and they don't necessarily have to be charging for those, but it just seems pretty evident. The hard thing is, in my example, on e-commerce fraud, how do you get Twitter to sign up to supply you with the writes? Or how do you get some other, like, massive publisher that doesn't really have that much economic downside from fraud, but has tons and tons of data if it was actually assigned to a company like this?

Starting point is 00:25:23 If you sign up 10 of those, it's almost very easy to see the blueprint of, wow, you've solved the biggest problem, which is you now have the writes. You have a different, more execution-oriented problem, which is how do you go charge for the reads and how do you show that you have enough value? But at least you've solved the write side of the database. So what I'm really hearing is a theme from you guys is, you know, it can happen by accident. But if you have a plan, if you're even aware and intentional about some of the decisions you make, those are all contributing factors to actually just create and be better at building data network effects. And the other thing, going back to this winner-take-all thing, is like you're never going to get to a network effect

Starting point is 00:25:59 if there are 25 companies doing exactly what you do, and they're all about the same size, and nobody has like a just demonstrably better system. Then the data is actually, it looks more like the algorithm. Remember how we talked about how the algorithm gets like 1% better every year,

Starting point is 00:26:17 and this company can out-algorithm that company until they can't. The same thing goes for data. If nobody really gets to that critical point, then it's never going to be that much better and you can't charge excess rent either. Great. So any parting advice for entrepreneurs before we wrap it up? You know, on the data science side, my personal opinion is that some of the best data scientists are ones where people can go deep within the domain. And it's something where it's not just taking off-the-shelf algorithms and so on. And this is especially important in this case with

Starting point is 00:26:45 a data network effect because this is where, speaking to Alex's discussion of data and algorithms, often the two are really tightly connected and having a deep experience in the domain and on the algorithm side can really bring that about. We call it founder market fit. It's almost like data algorithm founder fit. And there's a profound HR implication of this as well because the algorithm side is related to the data side

Starting point is 00:27:07 because if you have the best data, then guess what? You're going to be able to hire the best people. Because if you're an amazing statistician, you don't want to work on a database that has five rows in it. You want to work on a database that has five trillion rows in it. And if you have that, then you come up with a better algorithm and therefore more people want to contribute data,

Starting point is 00:27:23 you're getting more writes, therefore you're getting more reads, and that kind of continues on and on. And the HR component is very important, because the best people, again, you can attract them to a startup. I often advise companies where they say, oh, you know, we're going to hire these five data scientists, but they don't have any data yet. And what they don't really realize is that if they're data scientists

Starting point is 00:27:42 who are happy to take out the trash and, you know, clean the toilets and do all the other things that are fun about running a startup, then that's fantastic. But if they really, really are very laser-focused on this one task, they're going to burn out, or not even burn out, they're just going to leave because there's nothing to do. Chronologically, I mean, it's great if that's in the founding team DNA, but you also just have to be careful about not overbuilding until you actually know that you have enough there. When you say not overbuilding, you mean...

Starting point is 00:28:08 Well, I mean, I just look at a lot of companies where they say, wow, you know, we're going to hire... We have 10 data, like here's a new ad network or it's a new this, and we have 20 data scientists and they're amazing. I see these companies all the time. They have like just, they're overweighted on the data science side, and they have no data. And like what they don't realize is that like they're increasing their burn, like just, yeah, if they can't find these people and it's a 20-month hiring process, then okay, but they're going to lose these people unless they manage the other side of their network, unless they manage the supply side of the data, and actually figure out how they get those people, how they get the writes coming in the door, you're going to lose your team. That's the vicious cycle. The virtuous cycle is obviously the opposite, where if you have data, you can hire the best people. If you hire the best people, you get the best algorithms, you get the best clients, you get the best data. I'm glad you brought that up because you're actually focusing on the flywheel effect for data network effects as including talent as a component.

Starting point is 00:29:02 Well, there's a flip side of this, which is that I think if the data science is tacked on too late, it also, I think, has a strong negative. The chicken-and-egg problem. So what Alex spoke to, I think he mentioned it being in the founder DNA. I think that's what I love, where in terms of the vision of the company from the beginning, it's there, but not overbuilt, you know, before you're ready to go to war in that area. But for that to be in the founder DNA, I think is perfect. Yeah, and I think part of that is just what is the architecture of, I mean, like, you know, I was mentioning the rows of the database.

Starting point is 00:29:34 If it's a non-technical team and they've got two columns to their database, they're not really collecting that much stuff. That makes it harder to actually append the data scientists to do all of these great things, especially if you end up becoming a data company by accident. So imagine that you are a background check company. What is an exhaust of a background check company? Well, it's how many people are applying for jobs? I mean, there's all sorts of interesting things on the data side,

Starting point is 00:29:59 but hopefully you are collecting things not in free form text entry, but you're doing it in a much more itemized way, in a much more like defined and controlled way or enumerated way, or pre-enumerated way, where you can do things that are much more relevant. So this is a blanket generalization, but is it fair then to say that almost by definition, as every company becomes a software company, that every software company is by definition a data company?

Starting point is 00:30:23 Well, maybe. I think it just, part of it depends on how lucrative your primary business model is, because every company of scale has amazing exhaust. And the question is whether or not you want, like, you know, I'll go back to Visa. Visa's exhaust is very, very valuable. But if they shared that, then their clients wouldn't like them very much, and then they lose their clients. So even though their exhaust is worth, it's probably worth billions of dollars a year.

Starting point is 00:30:47 They can predict the economy, down to the... Oh, yeah. I mean, every hedge fund on Earth would pay for that. They just can't do it. And they're not making a mistake by not doing it. So, yes, they are a data company, but they also are a network. And being a network is probably more important than being a data company. But it really just depends on the particular use case. I mean, I think you will have different products. I mean, every company that gets to scale that's touching enough consumers or businesses will have the opportunity to have a very, very valuable data suite beyond just whatever they use for their own purposes. But the question is whether or not they want to. And part of that is just how lucrative. Like Apple could be the biggest, you know, insert the X there. A lot of that pertains to data. But Apple makes too much money selling an iPhone. So I don't think they're going to do that.

Starting point is 00:31:30 You know, towards that end, the accident that Alex referred to is often really an inevitability that, you know, this will happen. The question is, what do you do with it? And is it part of your core business or is it something that you have to leave on the side? Well, thank you, and we'll talk more about this a lot. Thank you. Yeah, thank you.
