The a16z Show - When AI and Genomics Collide

Starting point is 00:00:00 We built a language model for biology. The big problem in biology is that biology is hard. Biology is really hard. Biology is very hard. So why are you doing it? What is the big win? I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled. Where do you think this confluence between AI and life sciences goes from here?

Starting point is 00:00:20 There are so very few people who actually have the language of both disciplines that are able to bring them together. If you are a listener to this podcast, you're probably familiar with Moore's Law, where the number of transistors on an integrated circuit has doubled every two years. But what you might not know is the cost of sequencing the genome has fallen even faster. Since the Human Genome Project in 2003, the cost has fallen from about $1 billion to less than $1,000 today. And now in 2023, these two trends meet, and one of the people at this fascinating intersection is Daphne Koller. Daphne, longtime professor in computer science at Stanford and co-founder of Coursera, has decided to step back into the arena with her company insitro, right at this intersection

Starting point is 00:01:13 of computation and biology. In fact, the name is even a nod to this intersection as a blend of in silico and in vitro. And in today's episode, you'll get to hear a16z Bio and Health General Partner Vijay Pande speak with Daphne about the why now, but also how machine learning is fundamentally changing our ability to understand genetics. Without machine learning, without AI, the space would be so complex and so high-dimensional

Starting point is 00:01:43 that you couldn't even make sense of it far less bridge between those two different worlds. So, what can this unlock? We can now finally, for the first time, measure biology at scale. And this is like a truly engineered approach to discovery. Listen in to get a glimpse into what may be a new era of digital biology and how this AI wave is really not just a moment of opportunity in bits, but also atoms. The next frontier of the impact that AI can have is when AI starts to touch the physical world.

Starting point is 00:02:20 I think this convergence is a moment in time for us to make a really big difference using tools that exist today that did not exist even five years ago. This episode also continues our coverage of a16z's exclusive AI Revolution event from just a few weeks ago, which housed some of the most influential builders across the ecosystem, including the founders of OpenAI, Anthropic, Character AI, Roblox, and more. So be sure to check out the full package, including all of the talks in full at a16z.com slash AI Revolution. As a reminder, the content here is for informational purposes only,

Starting point is 00:03:02 should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com slash disclosures. Daphne is like the OG's OG in AI.

Starting point is 00:03:32 She was a pioneer at Stanford in different areas of AI, especially in PGMs. She left Stanford to co-found Coursera with Andrew Ng, and actually is now the founder and CEO of insitro, a tech bio company using AI to develop drugs in life sciences. So Daphne, given all the things you could be doing, why life sciences? It's one of the really hard and really important problems. And there are very few things that are as challenging, as exciting, as intervening in a safe and effective way in human health. And so it's just a thing that absolutely needs to be done if we are going to use AI for good, which I think is one of the things that I at least really strive to do. The second part of the answer is why now?

Starting point is 00:04:21 And what brought me back to this field back in 2016 post-Coursera was to realize that we can now finally, for the first time, measure biology at scale, both at the cellular level, sometimes at the subcellular level, and at the organism level via ways of quantitating human biology. And that gives us, for the very first time, the ability to deploy machine learning in ways where it is truly meaningful to do that, because the data sets are large enough for really interesting machine learning methods to be deployed. And the third part of the answer is, well, okay, but why me? And I am a big believer in leverage.

Starting point is 00:05:04 That is, places where you can have a disproportionately large impact. And because of the fact that I had spent a large part of my Stanford career working in these two spaces simultaneously, core machine learning on the one hand and machine learning in service of biomedical data on the other, I actually have the ability to sort of bridge the chasm between these two very disparate disciplines. And when I was leaving Coursera in 2016, and I looked around me and I saw that machine learning was changing the world, it wasn't having much of an impact in the life sciences.

Starting point is 00:05:34 And I believe one of the main reasons for that is because there are so very few people who actually have the language of both disciplines that are able to bring them together. So I felt like I could have impact in AI across many things, but here I could have disproportionate impact. Well, you know, you spoke about the why now. What's your take on AI for life sciences? What's the why now there and what's different now than even what we could do even just, let's say, five years ago? So I think it comes back to this ability to collect, but even more than collect, to generate data at scale. So one of the things that we have at insitro that is truly unique is we have a data factory. We have put together the tools that have been developed by people who are taking pluripotent stem cells,

Starting point is 00:06:20 which are cells from you or me or anyone in this audience and turning them into these pluripotent stem cells, which can make a Daphne neuron in a dish or a Daphne hepatocyte. And mine is going to be different than the Vijay neuron and the Vijay hepatocyte because we have different genetics. And that's going to manifest in how these cells behave and how these cells look in different measurements. We can engineer those to introduce a disease-causing mutation and ask what does that disease-causing mutation do to a Daphne neuron versus what does it do to a Vijay neuron? And what does this mutation do versus that mutation? Just a quick note for the audience, pluripotent stem cells are unique cells that are able to undergo self-renewal.

Starting point is 00:06:57 The term pluripotency in particular speaks to these cells being able to give rise to all cell types, including the ectoderm like the skin or nervous system, the endoderm like the liver or respiratory tract, and the mesoderm like bone or muscle. All right, back to Daphne speaking to just how special this application is. So we're able to kind of do data generation on spec, and that is a truly unique capability, which frankly is not that easy to do even in other areas where AI is being deployed. You don't get to make your own data in many cases, but here we do. And that creates both really important discovery opportunities for life sciences, but also really cool and interesting

Starting point is 00:07:39 machine learning problems. Well, maybe you could dive a little deeper and give an example. So, like, your paper on the POSH approach, I think that came out on an archive. Could you double click on that, tell people what you did there? Especially, like, why is AI in life sciences a big deal? What could you hope to get? So, first of all, let me tell you a little bit about that platform, which is called POSH, or pooled optical screening in humans. You take a bunch of cells and you put them with a pool of CRISPR guides that edit them, and each cell gets a different guide. So now you have a bunch of cells, each with a dynamically diverse mutation. And now they're all sitting there in a pool. You can measure them with a microscope.

Starting point is 00:08:14 You can measure them as they move around and do their thing. You can basically fix them, and you sequence the barcode that came with the guide. So now you can say, oh, this cell that got this guide behaved this way and this other cell behaved that way. I can tell you that one of the really challenging things about cells is that they're alive. If you put different cells in different wells, then they each have a slightly different environment and you get subtle differences and it's really hard to reconcile. When they're all in the pool, you eliminate all of those artifacts and all of a sudden you have the ability to measure a genome-wide CRISPR screen, basically.
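The bookkeeping described here (read each cell's barcode, look up which guide it received, then compare how cells with different guides behaved) can be sketched in a few lines. This is only an illustrative sketch, not insitro's pipeline: the barcodes, gene names, feature values, and the use of a non-targeting control as the baseline are all hypothetical assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell records: (barcode sequenced from the fixed cell,
# one image-derived feature measured under the microscope).
cells = [
    ("BC01", 1.0), ("BC01", 1.5),
    ("BC02", 2.0), ("BC02", 2.5),
    ("BC03", 1.25), ("BC03", 1.25),
]

# Hypothetical lookup from barcode to the gene targeted by that cell's guide.
barcode_to_gene = {"BC01": "GENE_A", "BC02": "GENE_B", "BC03": "NON_TARGETING"}


def mean_feature_by_gene(cells, barcode_to_gene):
    """Group cells by the gene their guide targets and average the feature."""
    grouped = defaultdict(list)
    for barcode, feature in cells:
        grouped[barcode_to_gene[barcode]].append(feature)
    return {gene: mean(values) for gene, values in grouped.items()}


def effect_vs_control(per_gene, control="NON_TARGETING"):
    """Express each gene's mean feature as a shift relative to control cells."""
    baseline = per_gene[control]
    return {gene: value - baseline for gene, value in per_gene.items() if gene != control}


per_gene = mean_feature_by_gene(cells, barcode_to_gene)
effects = effect_vs_control(per_gene)
print(effects)
```

Because every cell shares the same dish, the comparison against control cells is not confounded by well-to-well differences, which is the advantage of pooling described above.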

Starting point is 00:08:49 So 20,000 genes in the genome, all modifying the same cellular background in the same dish with a different genetic intervention, and you're measuring that at a genome-wide scale in like 10 or 12 plates in two weeks. Now imagine doing that rinse and repeat, and doing genome-wide scale on this genetic background or in this cell type, and so you can really start to decipher the genotype-phenotype connection and the way in which individual genetics makes a difference on cellular phenotypes, which we then translate to what we believe they will have in terms of clinical impact. And that is the beginning of an understanding of what it is that we want to modify in order to have meaningful therapeutic interventions. And this is like

Starting point is 00:09:33 a truly engineered approach to discovery. Another quick reminder for those of you, like me, where biology class was a little further back than you'd like to admit, genotype refers to the genetic makeup of an organism, while phenotype refers to the observable characteristics, like hair color or blood type, that are yielded from a genotype, but also an organism's environment. This is really important because phenotype, or the observable impact, isn't only due to genetics, as sometimes even the same genes can produce different results. One such example of this is in honeybees, in which colonies will have the same genome, but vary significantly in phenotype, like size, shape, or behavior of queen bees.

Starting point is 00:10:18 And as Daphne said, new technologies are bringing us closer to understanding this very important connection. Well, the biology part is really critical because now you get the data, and we all know how important that is. But I think one of the things that I've really found intriguing is the creation of a latent space for human biology and especially being able to tell the difference between disease

Starting point is 00:10:38 and non-disease or even different disease phenotypes. So how does that come about? And especially, you know, how is AI driving it? So actually, I'm going to go back one step further because you said, of course, one of the things we need to do is get the data. And I should have mentioned that it's impossible to run this instrument without AI being built into it, because you can't even segment the

Starting point is 00:10:56 cells. You can't call the barcodes. I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled. But then, to your point, Vijay, now you have a whole bunch of cellular images and what do you do with them? And so the first thing we do is we built this latent space. We built a language model for biology. Now everyone's an expert in language models. I used to have to explain this to people, like, oh, language of biology, and no one knew what I was talking about, but now it's like, I'm just saying, look, it's just like GPT, but for cells. So we have the language of, you know, cells and what cells look like or the transcriptional or gene expression profiles of cells, and you measure hundreds of millions of cells in different

Starting point is 00:11:36 states. And now with a much more limited amount of data, because we have this latent space, then just like the large language models for natural language, with a small amount of data, you can start asking, okay, how does disease move you? Like, how does a disease-causing gene move you from one place to the other? How does a treatment move you hopefully back in the opposite direction from the disease state back to the healthy state? And that's super powerful. Kind of like other language models, it keeps getting better the more data you feed it. And let me just say, this is not just for cellular data. The other source of data that we use is clinical data. So we do the same thing with histopathology. There's so much more in histopathology

Starting point is 00:12:15 than your pathologist typically looks at. In MRI data, your radiologist doesn't see more than like a small percentage of what's there in your radiology images, but also not just imaging. There's also other modalities where there's an equal amount of information left on the table. And over time, we're learning these languages of different biological modalities and the ability to translate between them. I think this concept of a foundation model for biology is particularly exciting because, you know, 10 years ago you could have ML that was predictive.
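One simple way to picture the idea of a disease "moving" a cell through a latent space, and a treatment moving it back, is to treat the shift from the healthy centroid to the disease centroid as a direction and score other cells along it. The sketch below is only an illustration under stated assumptions: the two-dimensional embeddings are made up, and the centroid-difference scoring is a stand-in chosen for clarity, not the method described in the conversation.

```python
# Hypothetical 2-D latent embeddings; a real model would produce many more dimensions.
healthy_cells = [(-0.5, 0.0), (0.5, 0.0)]
diseased_cells = [(0.5, 1.0), (1.5, 1.0)]


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return tuple(sum(component) / count for component in zip(*vectors))


def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


healthy_center = centroid(healthy_cells)
disease_direction = subtract(centroid(diseased_cells), healthy_center)


def disease_score(embedding):
    """Position along the healthy-to-disease axis: 0.0 is healthy, 1.0 is diseased."""
    offset = subtract(embedding, healthy_center)
    return dot(offset, disease_direction) / dot(disease_direction, disease_direction)


untreated = (1.0, 1.0)    # hypothetical diseased cell
treated = (0.25, 0.25)    # hypothetical cell after an intervention
print(disease_score(untreated), disease_score(treated))
```

A treatment that lowers the score has moved the cell back toward the healthy state along this one axis; it says nothing about changes in other directions, which a real analysis would also need to check.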

Starting point is 00:12:45 You just needed maybe 100 actives. And the problem is like if you have 100 cases, examples of a drug that works, you don't need to design a drug. And so these low-shot, zero-shot approaches that come from a foundation model are really night and day. So how far does this go?

Starting point is 00:13:00 I mean, the big problem in biology is that biology is hard. Biology's really hard. Biology's very hard. So why are you doing it? What is the big win? Where does this go, let's say, by the end of the decade? Like, what could you hope to do that we couldn't do before? We want to come up with a very, almost like, systematic recipe for how do you go from a decision that I want to work on ALS or I want to work on fatty liver disease through a sequence of steps towards something that results in a meaningful intervention in the right patient population.

Starting point is 00:13:31 The hope is that by the end of this decade, we will have built this process. We will have run through it a number of times. We will have delivered some medicines to patients in our first tranche of indications. but then we will have learned enough from that so that we can now say, okay, and here's how we're going to do it here and here and here, and because it's not only machine learning that moves forward over time, it's also the biological tools that we're relying on.

Starting point is 00:13:56 I mean, it used to be that there wasn't any CRISPR. There was just siRNA; actually, there wasn't even that. And then there's CRISPR base editing, and now there's CRISPR Prime that replaces entire regions of the genome. So the tools that we're building on also get better and better over time, which unlocks more and more diseases that we could tackle in a meaningful way.

Starting point is 00:14:14 Well, let's step back for a second, because I think this may not be clear for everyone why biology is so hard. And one of the biggest reasons is that we can do tons of experiments on mice. You know, I have to feel like it's a great time to be a rich mouse. You could be cured of any disease, right?

Starting point is 00:14:28 Like all these diseases can be cured in mice. But, you know, it's obviously unethical to experiment on people. And that's one of the big reasons why trials fail, right? Because when you go into a clinical trial, you spend all this money to get there, you're spending all this money,

Starting point is 00:14:41 hundreds of millions of dollars, in the trial, and turns out mice are different than people, and it fails. Yeah. So, like, how can AI help that? So first of all, this notion of, you know, we can cure lots of mice is something that really drove our discovery strategy at insitro, which is that all of our work is done in human and human-derived systems that incorporate at least some subset of human cells working together. So that's one piece, and the nice thing about it is that it allows you to intervene in those systems

Starting point is 00:15:10 and ask the what-if questions, the counterfactual. It was like, what if I had this person's biology, but in a world where this gene was inactive versus active or the other way around? So that's great, but obviously you want to cure people, not cells or even organoids. And so the other source of data that we bring in is data from people, from clinical records. And what we end up doing is kind of bridging between the two using machine learning. So using machine learning on the cellular data, using machine learning on the human data, bridging between those in representation space, but also in genetic space,

Starting point is 00:15:46 because genetics is kind of like this thread that ties the two together. And without machine learning, without AI, the space would be so complex and so high dimensional that you couldn't even make sense of it far less bridge between those two different worlds. Yeah. Well, that makes sense. I'm curious to change gears a bit and talk a bit about company building. Yeah.

Starting point is 00:16:08 And one of the interesting things that you've done is that you've brought together people that are biology experts with people that are ML and AI experts. And how do you build that culture? What does that look like, especially since they're from fairly different parts of the universe? So first of all, it may not have been obvious to everybody, but the company name insitro is actually a blend of in silico and in vitro, in silico being in the computer and in vitro being in the lab. Those elements of bringing those two strands together are so deeply woven even into our logo. And how do you build that is really hard because you take your...

Starting point is 00:16:42 average, you know, machine learning scientist and your average life scientist, even if they're very well intentioned, you put them into the room together, they might as well be talking Thai and Swahili to each other. The languages are different. The ways in which you think are totally different. So how do you create a shared language, a shared vision? And so there's a few approaches that we use. First of all, we hire some number of people, you can't get enough of them, unfortunately, who are in the middle, who are able to be translators and talk to both sides and kind of bring them together. And then I think the other really important part is that you create a culture and you hire very rigorously to that culture of people who are genuinely interested in engaging

Starting point is 00:17:23 with, you know, the other side. And we have a list of company values. And the final value, which is one that I hold particularly dear, it's last not because it's least important, but because they're ordered from what we do to how we do it, which is that we engage with each other openly, constructively, and with respect. Openly means an openness to asking really naive questions when you don't understand, and to accepting really naive suggestions from somebody else, because sometimes the best ideas come from an orthogonal mindset. It's something that I experienced even as a kid.

Starting point is 00:17:54 So my parents are scientists, and my mom warned me, no matter what I do, even though this is what I was doing as a kid, because I was programming as a kid, I should not get into programming and computer science. No one's ever going to make money by selling software. So I think maybe then, you know, especially for this audience here that are coming from the AI side, especially as AI gets into areas that are sort of not just the world of bits, but the world of atoms. Yeah. Any advice for how to bridge those gaps?

Starting point is 00:18:21 I mean, first of all, having a deep respect for atoms is really important. Yeah, I mean, my closest friends are atoms. I think we're all atoms, but I think having an appreciation for the complexity of atoms and the fact that, especially when your atoms are part of live systems, they behave in unexpected, unpredictable, idiosyncratic ways that sometimes cause a lot of pain. And I can tell you that when you do biological experiments, one of the strongest signals when you apply machine learning to it

Starting point is 00:18:54 is who was the technician who actually did the experiments. You could read that very clearly off the cells because they behave a little bit differently. They pipette a little bit differently. They treat the cells a little bit differently. It's amazing how hard it is to clean that up, which is one of the reasons why we spend so much of our time building robots, because they do the same thing over and over again.
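The technician effect described here is a kind of batch effect: if the average readout differs by who ran the experiment, a model can pick up on the operator instead of the biology. The sketch below shows how such a shift shows up in per-technician averages and how a crude centering step removes it. The readout values and technician labels are hypothetical, and per-group mean centering is a generic textbook correction used here for illustration; the conversation itself describes robots, not a statistical fix, as the remedy.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readouts: (technician who ran the plate, one per-cell feature).
readouts = [
    ("tech_1", 1.0), ("tech_1", 1.5),
    ("tech_2", 2.0), ("tech_2", 2.5),
]


def mean_by_technician(readouts):
    """Average the feature separately for each technician."""
    grouped = defaultdict(list)
    for technician, value in readouts:
        grouped[technician].append(value)
    return {technician: mean(values) for technician, values in grouped.items()}


def center_by_technician(readouts):
    """Subtract each technician's own mean, removing operator-level offsets."""
    offsets = mean_by_technician(readouts)
    return [(technician, value - offsets[technician]) for technician, value in readouts]


operator_means = mean_by_technician(readouts)
corrected = center_by_technician(readouts)
corrected_means = mean_by_technician(corrected)
print(operator_means)
print(corrected_means)
```

Centering like this is only safe when each technician handled a comparable mix of conditions; if one person ran all the diseased samples, subtracting their mean would erase the biological signal along with the handling artifact.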

Starting point is 00:19:16 So I think having a lot of respect for atoms, but also I would say an appreciation for the fact that the next frontier of the impact that AI can have is when AI starts to touch the physical world. And we've all seen just how much harder, that is, we've all seen how hard it is, astonishingly, to build a self-driving car compared to building a chatbot. So having an appreciation for that complexity, but also an appreciation for the magnitude of the impact, if you can actually nail it. So one last topic, then we'll go into closing, but you're talking about life sciences in terms of health care and drug design, but there's a lot more to biology than just drugs, right? Where do you think this confluence between AI and life sciences goes from here?

Starting point is 00:20:03 So I actually think that there is this incredible opportunity at this point, at this intersection between the two fields. And I think about it from a little bit of a historical perspective of think back on the history of science. And at certain times in our history, there have been eras where a particular scientific discipline has made incredible amounts of progress in a relatively short amount of time because there was kind of like a click. where we started to see the world in a different way, or there was a tool that wasn't available before. So if you think back to the late 1800s, that was chemistry, where we suddenly realized, we couldn't really turn lead into gold.

Starting point is 00:20:46 And there was this thing called the periodic table, and there were electrons, and it really shifted chemistry. And then in the early 1900s, obviously, that discipline was physics, and the connection between energy and matter and between space and time completely shifted our understanding of the universe. In the 1950s, that discipline was computing, where we get these machines that perform calculations that up until that point only a human was able to perform. And then in the 1990s, there was this interesting bifurcation.

Starting point is 00:21:16 On the one side, there was data science that drew on computers, but also had elements of neuroscience and optimization and statistics and ultimately gave us modern-day machine learning and AI. And then the other side was what I think of as quantitative biology, which was the first time when we actually started to measure biology at a scale that was more than like track three genes across an experiment that took five years. And that was the first microarray data and the first human genome and so on and so forth. And I think this is the time when those last two disciplines are actually going to merge. And they're giving us an era of what I think of as digital biology, which is the ability to measure biology at unprecedented fidelity and scale, interpret the

Starting point is 00:22:03 unbelievable masses of data, at different biological scales and in different systems, using the tools of machine learning and data science, and then bring that back to engineer biology using tools like CRISPR, genome editing, and so on, so that we can make biology do things that it would otherwise not want to do. Well, like what? So I think there's obviously, as we said, applications in human health, but I think there's applications in agriculture. I don't think we need to tell anybody anymore, although there's still some people who

Starting point is 00:22:30 might need to hear about the impact of climate change on our world and the fact that we need to have crops that are much more resistant to drought and severe weather. And to feed 10 billion people. To feed 10 billion people. I think there are opportunities in the environment to maybe do better carbon sequestration using plants or algae or who knows what. I actually wish I knew more about that because that is what my alternate life would have been. Well, there's still time, right?

Starting point is 00:22:58 Well, there's still time. So are you funding me for that, Vijay? You come up with the deck, we'll talk. Okay. Yeah, yeah. So there's that. I think there's, you know, biomaterials and so on. There's so many opportunities at this intersection

Starting point is 00:23:15 that I would encourage any of you in this audience who are looking for something truly aspirational and exciting to do. I think this convergence is a moment in time for us to make a really big difference in the world that we live in using tools that exist today that did not exist even five years ago. Yeah, with that, I think that's the opportunity at hand. Maybe we'll wrap up there. Let's thank Daphne one more time. If you like this episode, if you made it this far, help us grow the show. Share with a friend or, if you're feeling really ambitious, you can leave us a review at ratethispodcast.com slash a16z. You know, candidly, producing a podcast can sometimes

Starting point is 00:23:58 feel like you're just talking into a void. And so if you did like this episode, if you like any of our episodes, please let us know. I'll see you next time.
