a16z Podcast - a16z Podcast: Revisiting the Gene

Episode Date: January 10, 2018

The complete sequencing of the human genome is one of the most powerful examples of technology and science in action: We've gone from needing $3 billion and over 13 years to read a single human genome... to today, to where we can do that same amount of work for about $1,000 in roughly 2 days -- and the price will only continue to drop. But beyond pricing, what does understanding the gene -- and moving from the sequencing layer to the applications layer -- mean to us; what new questions arise now that we can sequence DNA quickly, reliably, and cheaply? This conversation -- with co-founder and CEO of Jungla Carlos Araya and co-founder and CEO of Freenome Gabe Otte, moderated by a16z General Partner Jorge Conde (based on a discussion that took place at a16z’s annual Summit in November 2017) -- takes a step back and considers all these questions. Every time a human genome sequence is completed, there are on the order of 3,000,000 new variants identified. So how do we think about interpreting all that data? Actionability? And how do we derive meaning from all this, for applications in the clinical space? ––– The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. 
References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments and certain publicly traded cryptocurrencies/ digital assets for which the issuer has not provided permission for a16z to disclose publicly) is available at https://a16z.com/investments/. Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

Transcript
Starting point is 00:00:00 The content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. Hello and welcome to the a16z Podcast. Sequencing the human gene has dramatically changed how we understand how we as human beings are coded. We're now entering a phase of building an applications layer on top of the sequencing layer. So how do we make sense of and apply all this new information that genomics gives us? And what will this translate into as it meets the realities of the health care system? This conversation, which took place at our annual
Starting point is 00:00:42 summit event in November 2017, includes Carlos Araya, co-founder and CEO of Jungla, and Gabe Otte, co-founder and CEO of Freenome, and was moderated by a16z General Partner Jorge Conde. The first Human Genome Project took $3 billion over 13 years to generate a single human genome. Today we can do that same amount of work for about $1,000 in a couple of days. In 1999, a writer by the name of Matt Ridley writes a book called Genome: The Autobiography of a Species in 23 Chapters. That book was a fascinating example of the optimism that we all had as the first Human Genome Project was coming to an end. But what we've also learned is that reading the DNA
Starting point is 00:01:24 isn't the same thing as understanding it. We've gone through this period of really trying to make sense of all of this information. But what's extraordinary is that now that we can sequence DNA quickly and reliably and cheaply, we've created this incredible sequencing layer
Starting point is 00:01:40 on top of which we can build applications. So I think one of the big questions is how do we think about actionability? How do we think about deriving meaning from our ability to interpret? Both of you are developing applications in the clinical space. So let's start with Jungla.
Starting point is 00:01:55 Carlos, if memory serves, every time a new human genome sequence is completed, there are on the order of three million new variants identified. That's right. So, you know, people talk about missing the forest for the trees. How do you make sense of all of that information? How do you figure out which trees matter in that forest? You start by really trying to form context. And so although we nowadays have access to
Starting point is 00:02:23 our genomes, when we go to interpret our genomes, there's really, you know, only a certain number of places in the genome that are relevant for any given condition that we're considering. And putting the information in context of a condition, in context of a family history, in context of other tests, is really important. And so you start there, and then you look at individual genes that are associated with these. Typically, it's genes. There are other types of elements. And we look for the effects of variation in there. And one of the interesting things is that while, yes, it costs roughly a thousand dollars, and going down to the hundreds now, per genome to acquire the data, the cost of interpreting that data is actually really high. Although there are three million
Starting point is 00:03:08 variants identified, there will be roughly a hundred variants that are novel variants in disease-associated genes. So these are places of the genome that we know really matter. And interpreting each one of those under the current clinical practices costs $50 to $100. So we're talking about a hundred- to thousand-fold increase in the cost of interpretation relative to the cost of data acquisition. So we build models, computational and experimental, to provide variant interpretation teams guidance that can tell them or help them understand how these variants relate to molecular and cellular effects so they can interpret these.
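As a rough check on the figures quoted above, the per-genome interpretation cost can be worked out directly. The following is a minimal Python sketch, not anything described by the speakers: the constant and function names are hypothetical, and the inputs are simply the round numbers mentioned in the conversation (about 100 novel variants in disease-associated genes, at $50 to $100 per variant).

```python
# Back-of-the-envelope arithmetic using the round figures quoted in the
# conversation. These are the speakers' estimates, not measured data, and
# all names below are hypothetical, chosen only for this illustration.

NOVEL_VARIANTS_PER_GENOME = 100      # novel variants in disease-associated genes
COST_PER_VARIANT_USD = (50, 100)     # quoted low/high interpretation cost per variant


def interpretation_cost_range(n_variants, per_variant_range):
    """Return (low, high) total interpretation cost in USD for one genome."""
    low, high = per_variant_range
    return n_variants * low, n_variants * high


low, high = interpretation_cost_range(NOVEL_VARIANTS_PER_GENOME, COST_PER_VARIANT_USD)
print(f"Interpretation cost per genome: ${low:,} to ${high:,}")
```

With these inputs, interpreting only the roughly 100 novel variants comes to $5,000 to $10,000 per genome, set against the roughly $1,000 quoted for acquiring the sequence data.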
Starting point is 00:03:43 And for the most part, are you looking at variants that are associated with an increased risk of disease, or are you looking at variants that are causing a disease? For the most part, it's increased risk of disease. There are, of course, causal variants. For many genes, there's a spectrum. There are mutations that have very strong effects, and then a gradient of mutations that have lower effects. And understanding these differences is really, you know, it's an ongoing challenge. If we look across all of the disease-associated genes that we know today, we only have clinical interpretations for roughly 0.6% of the possible mutations in them. So 0.6%. 0.6%. Wow. Yeah. How do you actually communicate this information to a physician? Because the vast
Starting point is 00:04:26 majority of doctors out there are not geneticists. Agreed. And, you know, something interesting that's happened is that while it is, you know, physicians and doctors who order the test, increasingly this interpretation is happening in the back end, in really the genetic test provider space, where you have dedicated teams of interpreters that are doing the work of classifying these mutations. And so the goal is really, I think, to give to physicians really very clear guidance on what the effects of mutations are. And when we don't know, we need to say, also, we really don't know. What we do is really we build these models that can say, okay, for all these other mutations, you know, the 99.4% of mutations that we don't know, here is a subset for which we can make predictions.
Starting point is 00:05:10 And this is how well the predictions do, and here are the diagnostic metrics of value. They're prospectively tracked and they can be audited. Speaking of diagnostics, Gabe, you're working on a diagnostic application as well using DNA. So walk us through what you're doing. I want to understand how what we're doing now with this kind of technology is different than how we've all sort of historically viewed diagnostics. Sure.
Starting point is 00:05:31 It's good to take a step back and talk about, like, what do we mean by a gene or what do we mean by the DNA right now, right? So a bunch of people got 23andMe done. The truth of the matter is, 23andMe is trying to capture information about you from less than 1%. And that's just the static DNA, what we think of as, like, DNA that we're born with. But DNA is not static. DNA is actually incredibly dynamic,
Starting point is 00:05:57 and it changes in all sorts of ways. This is the reason why twins with the same DNA have very different outcomes. Think of it within your own bodies as well, right? In your bodies, there are neurons that are a meter long, a single cell a meter long, and then there are white blood cells that are literally turning over every day. But they have the same DNA. How does that happen? How do we get these radically different phenotypes? The truth of the matter is, what is in your DNA is probably far less consequential than how
Starting point is 00:06:27 your DNA is being used and when these genes are being turned on and off. Because the truth of the matter is, less than 1% of your DNA is being used by any particular cell. And so it really matters what that 1% is that ultimately makes your cells what they are, makes you who you are. So when you're looking at it from sort of a non-static perspective, the DNA that's, for example, floating in your blood that is turning over every 20 minutes can give you an insight into what's happening in your body. Why are the cells in your body dying at that moment? And what is the composition of the cells in your body? This is all relevant dynamic information that can be read out from your dynamic DNA that you can't get from your static DNA. And the majority of the
Starting point is 00:07:16 focus in the past 10, 20 years on our DNA has been really focused on that static DNA, and not to mention a very small subfraction of that static DNA. What we're looking at at Freenome is that dynamic DNA as assayed from our blood sample that allows us to get an instantaneous snapshot of your molecular health, which will then allow us to know whether you have a particular disease like cancer. So this is an important distinction because I think one of the things that was so incredibly exciting and promising regarding this ability to have a sequencing layer on top of which we could build diagnostic applications was this idea that you sequence once and every time you want to test, it's just a software query. So the idea of every time you want to do a new test,
Starting point is 00:08:02 you basically have to redo the biology, that goes away if you're looking at inherited risk. But what you're describing is that since DNA is dynamic, you would actually sequence in time periods, much like we get, you know, dental x-rays every year. Yeah, if you're really talking about being able to get a sense of how your body is changing over time, you'd have to sequence multiple times, because the DNA that you're born with, that static DNA, is not deterministic enough where you can predict everything from that single point of sequencing. And so we're talking about how DNA is so dynamic in your body. Does that mean that the applications for what you could diagnose using this approach are very broad? Or is it just cancer? Well, I think it really
Starting point is 00:08:45 depends on what type of DNA you're looking for, right? If you're looking at DNA fragments that are in your bloodstream that are coming from the cancer cells, and that's all you're focusing on, then presumably you can only detect cancer. What we're detecting is DNA fragments that are actually coming from the immune cells that are turning over in your body. And if you can capture how your immune system is changing at different times, it's sort of the common denominator to all disease conditions. If there's something wrong with you, chances are your immune system is changing in some way. And yes, that signal is extremely convoluted, and it's very hard to figure out what type of change is specific to a particular disease state. But that's where things like
Starting point is 00:09:27 machine learning, artificial intelligence come in to help us figure out the specific signal so that we can turn that into a specific diagnostic for a disease. But the underlying biology should theoretically enable us to detect any disease where there is an immune change. So you guys have laid out the promise of this approach. Let's talk a little bit about the potential peril. Carlos, you were talking about what to do when you find a variant that has an unknown clinical significance. And there was a lawsuit in Oregon where a woman had a hysterectomy because her physician misinterpreted the genetic test. Yeah, that's right. And it's really unfortunate because this is a very common
Starting point is 00:10:06 situation where she had a family history. She got a genetic test. The genetic test actually came back negative, but they included in the test information that said that there was a variant that was found that was of unknown significance. And so the test very clearly indicated there was no clinically significant mutation found by the sort of practicing guidelines. It was a negative test. Yet for, I think, reasons that, at least to me, are unknown, the recommendation for her was that she would have a bilateral mastectomy and a hysterectomy. And I think it was particularly sad because the mutation was actually in a gene that's not associated strongly, at least, with breast cancer. It was in MLH1, I believe. And, you know, there's a strong association
Starting point is 00:10:50 with colorectal cancer and endometrial cancer, but there really was no strong support for a breast cancer association. To me, this poses the challenges of how complex, really, this data has gotten to communicate even to physicians, and then to have that message go clearly to patients. That's an incredible story. Yeah. So, look, thinking about Freenome, you know, perhaps this is an unfair comparison or an example, but if we think about early screening, we've been using mammograms for 30, 40 years, and the data suggests that while we've actually done a lot of early detection of breast cancer using mammography, the number of late-stage cancers, breast cancers, actually hasn't gone down. So that implies that we've actually over-diagnosed things,
Starting point is 00:11:33 or in some cases diagnosed the wrong things. Is there an analogy here, or is that a worry for you, and if so, how do you control it? So there's a lot of concerns in terms of launching a diagnostic. I think what you're talking about is really on the technology, the science side of things, right? And this is one of the really interesting things for us, because breast cancer and specifically mammography as a screening method has a false positive rate of 50%. 50%. So from a false positive perspective, you're better off flipping a coin than, you know, doing a mammography, really. And there's a reason for this, which is that no clinical trial that we've done has ever been large enough. You don't actually compensate for all the false positive
Starting point is 00:12:15 cases that could potentially happen, all the false negative cases that could potentially happen in a clinical trial. And so once you launch these diagnostics into the market, the only direction the performance goes is down. It really doesn't really go up, right? Because there are all these edge cases that you didn't account for. For the first time in our history,
Starting point is 00:12:35 we're able to compensate for that problem, where because we can now make a diagnostic that's fundamentally AI-based, even after we launch a test into the market, we can actually work with our partners to get results of the tests that we sell back so that we can teach the artificial intelligence that it made mistakes after we've sort of launched a test and it never makes that mistake again. So for the first time,
Starting point is 00:13:00 we have an opportunity to make the direction of the accuracy of a test after we launch it go up as opposed to go down. And that's really the only way we're going to compensate for this, because the largest clinical trial that's ever been announced but hasn't been performed yet is 120,000 people. Last year, 35 million people should have gotten screened for colorectal cancer in the United States alone and didn't. 35 million. 35 million. Right. So if you can capture even a fraction of that market and learn from that information and make sure that you don't make mistakes from those tests ever again, then all of a sudden you have a clinical trial that's an order of magnitude, if not two, greater than the largest clinical trial that's ever been
Starting point is 00:13:40 announced. So what you're both doing is so potentially transformative for our health system. Who pays for this? How does this get covered? Especially on the diagnostic side, reimbursement is a particularly tough issue. There are so many stakeholders in health care that it is not clear who pays for it. If you're looking at the average statistics, generally when you're launching a diagnostic test, only about 20% of the tests that you sell actually get fully reimbursed. So 80% of the tests that you're selling are actually not being paid for properly. Long story short, most of the time, payers don't see a clear return on investment that the diagnostic tests represent. So if I pay $500 for this test now, am I actually going to make that money back because I'm
Starting point is 00:14:28 detecting this disease earlier so we don't have to spend as much money curing this person when the disease has progressed further? That's a question that we need to be able to answer clearly before we can get these tests paid for by the payers. I think we're stuck in this model where we're relying on payers to pay for these tests. There are new models coming out that leverage life insurance companies, that leverage these closed systems where the hospitals are their own payers and are easier to sort of convince of the value of the tests. I think we're going to see leveraging of these new models much more, but it's still early days. Excellent. Thank you, Carlos. Thank you, Gabe, for being here. You're working in a fascinating
Starting point is 00:15:12 space, and it's clear you're going to change how we think about disease forever and for always. So thank you. Thank you. Thank you, guys.
