a16z Podcast - When AI and Genomics Collide

Starting point is 00:00:00 We built a language model for biology. The big problem in biology is that biology is hard. Biology is really hard. Biology is very hard. So why are you doing it? What is the big win? I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled.

Starting point is 00:00:17 Where do you think this confluence between AI and life sciences goes from here? There are so very few people who actually have the language of both disciplines and are able to bring them together. If you are a listener to this podcast, you're probably familiar with Moore's Law, the observation that the number of transistors on an integrated circuit doubles roughly every two years. But what you might not know is that the cost of sequencing a genome has fallen even faster. Since the Human Genome Project was completed in 2003, the cost has fallen from about $1 billion to less than $1,000 today. And now in 2023, these two trends meet, and one of the people

Starting point is 00:00:58 at this fascinating intersection is Daphne Koller. Daphne, a longtime professor of computer science at Stanford and co-founder of Coursera, has decided to step back into the arena with her company insitro, right at this intersection of computation and biology. In fact, the name itself is a nod to this intersection: a blend of in silico and in vitro.
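To put the intro's two trends side by side, here is a small Python sketch of the arithmetic. It uses only the figures quoted above (about $1 billion in 2003, under $1,000 in 2023, one transistor doubling every two years); the function names are illustrative, and because the end cost is "less than $1,000", the sequencing factor it computes is a lower bound.

```python
import math

def moores_law_factor(years: float, doubling_period: float = 2.0) -> float:
    """Fold-increase in transistor count if it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

def cost_reduction_factor(start_cost: float, end_cost: float) -> float:
    """How many times cheaper something became between two price points."""
    return start_cost / end_cost

def implied_halving_time(years: float, factor: float) -> float:
    """Years per halving of cost implied by an overall fold-reduction."""
    return years / math.log2(factor)

years = 2023 - 2003                              # HGP completion to this episode
moore = moores_law_factor(years)                 # 2 ** 10
sequencing = cost_reduction_factor(1e9, 1_000)   # $1 billion -> ~$1,000 (an upper bound on today's cost)
halving = implied_halving_time(years, sequencing)

print(f"Moore's Law over {years} years: ~{moore:,.0f}x")
print(f"Sequencing cost over {years} years: ~{sequencing:,.0f}x cheaper")
print(f"Implied cost halving time: ~{halving:.1f} years")
```

Running it gives a 1,024-fold gain for Moore's Law against a 1,000,000-fold drop in sequencing cost, which works out to the cost halving roughly once a year rather than once every two years.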

Starting point is 00:01:22 And in today's episode, you'll get to hear a16z Bio + Health General Partner Vijay Pande speak with Daphne about the why now, but also about how machine learning is fundamentally changing our ability to understand genetics. Without machine learning, without AI, the space would be so complex and so high-dimensional

Starting point is 00:01:43 that you couldn't even make sense of it, far less bridge between those two different worlds. So what can this unlock? We can now, finally, for the first time, measure biology at scale. And this is like a truly engineered approach to discovery. Listen in to get a glimpse into what may be a new era of digital biology,

Starting point is 00:02:06 and how this AI wave is really a moment of opportunity not just in bits, but also in atoms. The next frontier of the impact that AI can have is when AI starts to touch the physical world. I think this convergence is a moment in time for us to make a really big difference using tools that exist today that did not exist even five years ago. This episode also continues our coverage of a16z's exclusive AI Revolution event from just a few weeks ago, which hosted some of the most influential builders across the ecosystem, including the founders of OpenAI, Anthropic, Character.AI, Roblox, and more. So be sure to check out the full package, including all of the talks in full, at a16z.com

Starting point is 00:02:54 slash AI Revolution. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com slash disclosures. Daphne is like the OG's OG in AI. She was a pioneer at Stanford in different areas of AI, especially in PGMs, probabilistic graphical models. She left Stanford to co-found Coursera with Andrew Ng, and is now the founder and CEO of insitro,

Starting point is 00:03:44 a tech-bio company using AI to develop drugs and advance the life sciences. So Daphne, given all the things you could be doing, why life sciences? It's one of the really hard and really important problems. And there are very few things that are as challenging, and as exciting, as intervening in a safe and effective way in human health. And so it's just a thing that absolutely needs to be done if we are going to use AI for good, which is one of the things that I, at least, really strive to do. The second part of the answer is why now?

Starting point is 00:04:21 And what brought me back to this field in 2016, post-Coursera, was the realization that we can now, finally, for the first time, measure biology at scale, both at the cellular level, sometimes at the subcellular level, and at the organism level via ways of quantitating human biology. And that gives us, for the very first time, the ability to deploy machine learning in ways where it is truly meaningful to do that, because the data sets are large enough for really interesting machine learning methods to be deployed. And the third part of the answer is, well, okay, but why me? And I am a big believer in leverage: places where you can have a disproportionately large impact. And because of the fact that I had

Starting point is 00:05:11 spent a large part of my Stanford career working in these two spaces simultaneously, core machine learning on the one hand and machine learning in service of biomedical data on the other, I actually have the ability to sort of bridge the chasm between these two very disparate disciplines. And when I was leaving Coursera in 2016, I looked around me and saw that machine learning was changing the world, but it wasn't having much of an impact in the life sciences. And I believe one of the main reasons for that is that there are so very few people who actually have the language of both disciplines and are able to bring them together. So I felt like I could have impact with AI across many things, but here I could have disproportionate impact.

Starting point is 00:05:50 Well, you spoke about the why now. What's your take on AI for life sciences? What's the why now there? And what's different now from what we could do even just, let's say, five years ago? So I think it comes back to this ability to collect, but even more than collect, to generate data at scale. So one of the things that we have at insitro that is truly unique is a data factory. We have put together the tools that people have developed for taking cells from you or me or anyone in this audience and turning them into induced pluripotent stem cells, which can then make a Daphne neuron in a dish or a Daphne hepatocyte.

Starting point is 00:06:29 And that is going to be different from the Vijay neuron and the Vijay hepatocyte, because we have different genetics, and that's going to manifest in how these cells behave and how these cells look in different measurements. We can engineer those cells to introduce a disease-causing mutation and ask what that disease-causing mutation does to a Daphne neuron versus a Vijay neuron, and what this mutation does versus that mutation. Just a quick note for the audience: pluripotent stem cells are special cells that are able to undergo self-renewal.

Starting point is 00:06:57 The term pluripotency in particular refers to these cells' ability to give rise to all of the body's cell types, including those derived from the ectoderm, like the skin or nervous system, the endoderm, like the liver or respiratory tract, and the mesoderm, like bone or muscle. All right, back to Daphne, speaking to just how special this application is. So we're able to kind of do data generation on spec, and that is a truly unique capability, which frankly is not that easy to do even in other areas where AI is being deployed.

Starting point is 00:07:29 You don't get to make your own data in many cases, but here we do, and that creates both really important discovery opportunities for life sciences and really cool and interesting machine learning problems. Well, maybe you could dive a little deeper and give an example, like your paper on the POSH approach that came out as a preprint. Could you double-click on that and tell people what you did there?

Starting point is 00:07:49 Especially, why is AI in life sciences a big deal? What could you hope to get? So, first of all, let me tell you a little bit about that platform, which is called POSH, or pooled optical screening in human cells. You take a bunch of cells and you put them with a pool of CRISPR guides that edit them, and each cell gets a different guide. So now you have a bunch of cells, each with a different mutation. And now they're all sitting there in a pool.

Starting point is 00:08:13 You can measure them with a microscope. You can measure them as they move around and do their stuff. Then you can basically fix them and sequence the barcode that came with each guide. So now you can say, oh, this cell that got this guide behaved this way, and this other cell behaved that way. And I can tell you that one of the really challenging things about cells is that because they're alive, if you put different cells in different wells, then they each have a slightly different environment, and you get subtle differences that are really hard to reconcile.
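The bookkeeping at the heart of the pooled design described above is a join: what the microscope saw in each cell is matched to the barcode read out of that same cell, so each phenotype can be attributed to a guide and its target gene. The Python sketch below shows that join plus a per-gene comparison against control cells from the same pool. Everything in it is hypothetical: the cell IDs, barcodes, gene labels, the single scalar phenotype, and the use of non-targeting guides as a baseline are illustrative assumptions, not insitro's actual pipeline (which, per the conversation, also relies on AI for cell segmentation and barcode calling).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell phenotype from imaging (e.g. a single morphology score).
phenotypes = {"c1": 1.0, "c2": 1.2, "c3": 2.9, "c4": 3.1, "c5": 0.9, "c6": 1.1, "c7": 5.0}
# Hypothetical barcode read from each cell after fixation; c7's barcode was not read.
barcodes = {"c1": "AAGT", "c2": "AAGT", "c3": "CCTA", "c4": "CCTA", "c5": "GGAC", "c6": "GGAC"}
# Hypothetical guide library: barcode -> gene targeted by the guide it shipped with.
guide_library = {"AAGT": "NON_TARGETING", "CCTA": "GENE_X", "GGAC": "GENE_Y"}

def phenotype_by_gene(phenotypes, barcodes, guide_library):
    """Group each cell's phenotype under the gene its guide targets, then average."""
    groups = defaultdict(list)
    for cell_id, value in phenotypes.items():
        gene = guide_library.get(barcodes.get(cell_id))
        if gene is None:  # barcode missing or not in the library: drop the cell
            continue
        groups[gene].append(value)
    return {gene: mean(values) for gene, values in groups.items()}

def effect_vs_control(gene_means, control="NON_TARGETING"):
    """Shift of each gene's mean phenotype relative to control cells in the same pool."""
    baseline = gene_means[control]
    return {gene: m - baseline for gene, m in gene_means.items() if gene != control}

means = phenotype_by_gene(phenotypes, barcodes, guide_library)
effects = effect_vs_control(means)
print(effects)
```

Comparing against controls drawn from the same pool, rather than from a separate well, is the design choice that sidesteps the well-to-well environmental differences Daphne mentions.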

Starting point is 00:08:39 When they're all in a pool, you eliminate all of those artifacts, and all of a sudden you have the ability to run a genome-wide CRISPR screen: basically, 20,000 genes in the genome, each modifying the same cellular background in the same dish with a different genetic intervention. And you're measuring that at genome-wide scale in like 10 or 12 plates in two weeks. Now imagine doing that, rinse and repeat, at genome-wide scale on this genetic background or in this cell type. And so you can really start to decipher the genotype-phenotype connection and the ways in which individual genetics make a difference to cellular phenotypes, which we then translate into what we believe they

Starting point is 00:09:23 will have in terms of clinical impact. And that is the beginning of an understanding of what it is that we want to modify in order to have meaningful therapeutic interventions. And this is like a truly engineered approach to discovery. Another quick reminder for those of you who, like me, took biology class a little further back than you'd like to admit: genotype refers to the genetic makeup of an organism, while phenotype refers to its observable characteristics, like hair color or blood type, which result from the genotype but also from the organism's environment.

Starting point is 00:09:57 This is really important because phenotype, the observable impact, isn't only due to genetics, and sometimes the same genes can produce different results. One example of this is honeybees: members of a colony can be genetically very similar yet vary significantly in phenotype, with queens differing from workers in size, shape, and behavior largely because of how they are fed as larvae. And as Daphne said, new technologies are bringing us closer to understanding this very important connection. Well, the biology part is really critical because now you get the data, and we all know how important that is. But I think one of the things

Starting point is 00:10:31 that I found really intriguing is the creation of a latent space for human biology, and especially being able to tell the difference between disease and non-disease, or even between different disease phenotypes. So how does that come about? And especially, you know, how is AI driving that? So actually, I'm going to go back one step further, because you said, of course, one of the things we need to do is get the data. And I should have mentioned that it's impossible to run this instrument without AI being built into it, because you can't even segment the cells. You can't call the barcodes. I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled. But then, to your point,

Starting point is 00:11:07 now you have a whole bunch of cellular images, and what do you do with them? And so the first thing we did was build this latent space. We built a language model for biology. Now everyone's an expert in language models. I used to have to explain this to people. I'd say, oh, language of biology, and no one knew what I was talking about. But now I'm just saying, look, it's just like GPT, but for cells. So we have the language of, you know, cells and what cells look like, or the transcriptional or gene expression profiles of cells. And you measure hundreds of millions of cells in different states, and now with a much more limited amount of data, because we have this latent space,

Starting point is 00:11:42 then, just like the large language models for natural language, with a small amount of data you can start asking, okay, how does disease, like a disease-causing gene, move you from one place to the other? How does a treatment move you, hopefully, back in the opposite direction, from the disease state back to the healthy state? And that's super powerful, and like other language models, it keeps getting better the more data you feed it. And let me just say, this is not just for cellular data,

Starting point is 00:12:08 the other source of data that we use is clinical data. So we do the same thing with histopathology. There's so much more in histopathology images than your pathologist typically looks at. In MRI data, your radiologist doesn't see more than a small percentage of what's there in your radiology images. And it's not just imaging: there are other modalities where an equal amount of information is left on the table. And over time, we're learning these languages of different biological modalities

Starting point is 00:12:36 and the ability to translate between them. I think this concept of foundation model for biology is particularly exciting because 10 years ago, you could have ML that was predictive. You just needed maybe 100 actives. And the problem is like if you have 100 cases, examples of a drug that works, you don't need to design a drug.
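The latent-space idea described above can be made concrete with a toy calculation. If a pretrained model maps each cell to a vector, a handful of healthy and disease examples is enough to define a "disease direction", and a treatment can be scored by how far it moves cells back along that direction. This Python sketch assumes the embeddings already exist: the 2-D vectors are made up for readability (real embeddings would have many more dimensions), and the `rescue_score` metric is an illustrative choice, not insitro's method.

```python
from statistics import fmean

Vector = list[float]

def centroid(points: list[Vector]) -> Vector:
    """Mean embedding of a group of cells, one coordinate at a time."""
    return [fmean(coords) for coords in zip(*points)]

def subtract(a: Vector, b: Vector) -> Vector:
    return [x - y for x, y in zip(a, b)]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def rescue_score(healthy: list[Vector], disease: list[Vector], treated: list[Vector]) -> float:
    """Fraction of the healthy-to-disease shift undone by a treatment.

    1.0 means treated cells are back at the healthy centroid along the disease axis,
    0.0 means no movement, and a negative value means a move further toward disease.
    """
    h, d, t = centroid(healthy), centroid(disease), centroid(treated)
    disease_axis = subtract(d, h)     # direction the disease moves cells
    treatment_shift = subtract(t, d)  # direction the treatment moves them
    return -dot(treatment_shift, disease_axis) / dot(disease_axis, disease_axis)

# A few labeled examples per group; the heavy lifting (learning the embedding)
# is assumed to have happened already on large amounts of unlabeled data.
healthy = [[0.0, 0.0], [0.2, -0.2]]
disease = [[2.0, 1.0], [2.2, 0.8]]
treated = [[1.0, 0.5], [1.2, 0.3]]
score = rescue_score(healthy, disease, treated)  # about 0.5: halfway back toward healthy
```

Projecting onto a single disease axis is the simplest possible readout; it ignores off-axis changes a treatment might cause, which a real analysis would also want to check.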

Starting point is 00:12:53 And so these low-shot, zero-shot approaches that come from a foundation model are really night and day. So how far does this go? I mean, the big problem in biology is that biology is hard. Biology is really hard. Biology is very hard. So why are you doing it? What is the big win? Where does this go, let's say, by the end of the decade? Like, what could you hope to do that we couldn't do before? We want to come up with an almost systematic recipe for how you go from a decision

Starting point is 00:13:21 that I want to work on ALS or I want to work on fatty liver disease, through a sequence of steps, to something that results in a meaningful intervention in the right patient population. The hope is that by the end of this decade, we will have built this process. We will have run through it a number of times. We will have delivered some medicines to patients in our first tranche of indications. But then we will have learned enough from that so that we can now say, okay, here's how we're going to do it here, and here, and here. And it's not only machine learning that moves forward over time, it's also the biological tools that we're relying on. I mean, it used to be that there wasn't any CRISPR. There was

Starting point is 00:13:59 just siRNA, and actually, before that, there wasn't even that. And then there's CRISPR base editing, and now there's CRISPR prime editing that replaces entire regions of the genome. So the tools that we're building on also get better and better over time, which unlocks more and more diseases that we can meaningfully tackle. Well, let's step back for a second, because I think this may not be clear for everyone: why biology is so hard. And one of the biggest reasons is that we can do tons of experiments on mice. You know, I have to feel like it's a great time to be a rich mouse.

Starting point is 00:14:26 You could be cured of any disease, right? Like, all these diseases can be cured in mice. But, you know, it's obviously unethical to experiment on people. And that's one of the big reasons why trials fail, right? Because when you go into a clinical trial, you've spent all this money to get there, you're spending hundreds of millions of dollars on the trial, and it turns out mice are different from people, and it fails. Yeah.

Starting point is 00:14:46 So how can AI help with that? So first of all, this notion that, you know, we can cure lots of mice is something that really drove our discovery strategy at insitro, which is that all of our work is done in human and human-derived systems that incorporate at least some subset of human cells working together. So that's one piece, and the nice thing about it is that it allows you to intervene in those systems and ask the what-if questions, the counterfactuals. Like, what if I had this person's biology, but in a world where this gene was inactive versus active

Starting point is 00:15:20 or the other way around? So that's great, but obviously you want to cure people, not cells or even organoids. And so the other source of data that we bring in is data from people, from clinical records. And what we end up doing is kind of bridging between the two using machine learning: using machine learning on the cellular data, using machine learning on the human data, and bridging between those in representation space, but also in genetic space, because genetics is kind of like this thread that ties the two together. And without machine learning, without AI, the space would be so complex and so high-dimensional

Starting point is 00:15:57 that you couldn't even make sense of it, far less bridge between those two different worlds. Yeah. Well, that makes sense. I'm curious to change gears a bit and talk about company building. One of the interesting things that you've done is that you've brought together people who are biology experts with people who are ML and AI experts. How do you build that culture? What does that look like, especially since they're from fairly different parts of the universe? So, first of all, it may not have been obvious to everybody, but the company name, insitro, is actually a blend of in silico and in vitro, in silico being in the computer and in vitro being in the lab,

Starting point is 00:16:32 and those elements of bringing those two strands together are so deeply woven, even into our logo. And building that is really hard, because you take your average machine learning scientist and your average life scientist, even if they're very well-intentioned, and you put them into the room together, and they might as well be speaking Thai and Swahili to each other.

Starting point is 00:16:51 The languages are different, and the ways in which you think are totally different. So how do you create a shared language, a shared vision? There are a few approaches that we use. First of all, we hire some number of people (you can't get enough of them, unfortunately) who are in the middle, who are able to be translators, talk to both sides, and kind of bring them together. And then I think the other really important part is that you create a culture

Starting point is 00:17:16 and you hire very rigorously to that culture, for people who are genuinely interested in engaging with, you know, the other side. And we have a list of company values. The final value, which is one that I hold particularly dear (it's last not because it's least important, but because they're ordered from what we do to how we do it), is that we engage with each other openly, constructively, and with respect. Openly means an openness to asking really naive questions when you don't understand, and to accepting really naive suggestions from somebody else, because sometimes the best ideas come from an orthogonal mindset.

Starting point is 00:17:52 It's something that I experienced even as a kid. My parents are scientists, and my mom warned me that, no matter what, even though I was already programming as a kid, I should not get into programming and computer science, because no one was ever going to make money by selling software. So I think maybe then, you know, especially for this audience here coming from the AI side, and especially as AI gets into areas that are not just the world of bits but the world of atoms. Yeah. Any advice for how to bridge those gaps? I mean, first of all, having a deep respect for atoms is really

Starting point is 00:18:25 important. Yeah, I mean, my closest friends are atoms. I think we're all atoms, but I think having an appreciation for the complexity of atoms and the fact that, especially when your atoms are part of live systems, they behave in unexpected, unpredictable, idiosyncratic ways that sometimes cause a lot of pain. And I can tell you that when you do biological experiments, one of the strongest signals when you apply machine learning to them is which technician actually did the experiment. You can read that very clearly off the cells, because they behave a little bit differently.

Starting point is 00:19:03 They pipette a little bit differently. They treat the cells a little bit differently. It's amazing how hard it is to clean that up, which is one of the reasons why we spend so much of our time building robots, because they do the same thing over and over again. So I think having a lot of respect for atoms, but also, I would say, an appreciation for the fact that the next frontier of the impact that AI can have

Starting point is 00:19:30 is when AI starts to touch the physical world. And we've all seen just how much harder that is. We've all seen how hard it is, astonishingly, to build a self-driving car compared to building a chatbot, right? So having an appreciation for that complexity, but also an appreciation for the magnitude of the impact if you can actually nail it. So one last topic, then we'll go into closing,

Starting point is 00:19:52 but you're talking about life sciences in terms of health care and drug design, and there's a lot more to biology than just drugs, right? Where do you think this confluence between AI and life sciences goes from here? So I actually think that there is this incredible opportunity at this point, at this intersection between the two fields. And I think about it from a bit of a historical perspective. Think back on the history of science. At certain times in our history, there have been eras where a particular scientific discipline has made incredible amounts of progress in a relatively

Starting point is 00:20:29 short amount of time, because there was kind of a click where we started to see the world in a different way, or there was a tool that wasn't available before. So if you think back to, you know, the late 1800s, that was chemistry, where we suddenly realized, you know, we couldn't really turn lead into gold, and there was this thing called the periodic table, and there were electrons, and it really shifted chemistry. And then in the early 1900s, obviously that discipline was physics, and the connection between energy and matter and between space and time completely shifted our understanding of the universe. In the 1950s, that discipline was computing, where we got these machines that perform calculations that up until that point only a human was able to

Starting point is 00:21:12 perform. And then in the 1990s, there was this interesting bifurcation. On the one side, there was data science, which drew on computers but also had elements of neuroscience and optimization and statistics, and ultimately gave us modern-day machine learning and AI. And on the other side was what I think of as quantitative biology, which was the first time we actually started to measure biology at a scale that was more than, like, two or three genes in an experiment that took five years. And that was the first microarray data and the first human genome and so on and so forth. And I think this is the time when those last two disciplines are actually going to merge, and they're giving us an era of what I think of as digital biology,

Starting point is 00:21:56 which is the ability to measure biology at unprecedented fidelity and scale, to interpret the unbelievable masses of data across different biological scales and different systems using the tools of machine learning and data science, and then to bring that back to engineer biology using tools like CRISPR genome editing and so on, so that we can make biology do things that it would otherwise not want to do. Well, like what? So I think there are obviously, as we said, applications in human health, but I think there are also applications in agriculture. I don't think we need to tell anybody anymore, although there are still some people who might need

Starting point is 00:22:31 to hear it, about the impact of climate change on our world and the fact that we need crops that are much more resistant to drought and severe weather. And to feed 10 billion people? To feed 10 billion people. I think there are opportunities in the environment to maybe do better carbon sequestration using plants or algae or who knows what. I actually wish I knew more about that, because in my alternate life, that is what I would have done. Well, there's still time, right? Well, there's still time.

Starting point is 00:22:59 So are you funding me for that? You come up with the deck, we'll talk. Okay. Yeah, yeah. So there's that. I think there's, you know, biomaterials and so on. There are so many opportunities at this intersection that I would encourage any of you in this audience

Starting point is 00:23:18 who are looking for something truly aspirational and exciting to do to look at this space. I think this convergence is a moment in time for us to make a really big difference in the world that we live in, using tools that exist today that did not exist even five years ago. Yeah, with that, I think that's the opportunity at hand. Maybe we'll wrap up there. Let's thank Daphne one more time.

Starting point is 00:23:38 If you liked this episode, if you made it this far, help us grow the show. Share it with a friend, or if you're feeling really ambitious, you can leave us a review at ratethispodcast.com slash a16z. You know, candidly, producing a podcast can sometimes feel like you're just talking into a void. And so if you did like this episode, if you liked any of our episodes, please let us know. I'll see you next time. Thank you.
