a16z Podcast - Mark Zuckerberg & Priscilla Chan: How AI Will Cure All Disease

Starting point is 00:00:00 This is a space that, I mean, that there's just going to be a huge amount of leverage with AI. It still seems like there could be a lot more effort in this space around building tools. And it's kind of this crazy thing that we're, you know, here in, you know, 2025 and there's not the kind of periodic table of elements equivalent for biology. We think that this is, like, probably one of the most important sets of tools that you need to build. When we first set out the goal to cure and prevent disease by the end of the century, people, like, honestly, most scientists couldn't look at us with a straight face. And they're like, you're crazy. Yes. And it was true, because if you just decided to spend the money

Starting point is 00:00:38 funding the next best grant for every single lab in the country, like, there's no pathway to that being true. The biology folks, I think, looked at it as if it were crazy ambitious, and then the AI folks are like, well, that's kind of boring. That's just automatically going to happen. I know, it's like, okay, there's something in between there that needs to be bridged. The scientific community needs fundamentally new tools to cure disease, not just more funding. For decades, biological research has been constrained by the same limitations. Small grants that fund incremental progress, isolated labs working on narrow questions, and the lack of shared infrastructure to tackle the biggest challenges in medicine.

Starting point is 00:01:18 But what if we could change that? Today, you'll hear from Priscilla Chan and Mark Zuckerberg on their 10-year journey building the infrastructure for modern biological research. We discuss how they accidentally created the standard for biology data with their cell atlas project, cataloging millions of cells in an open source format. We explore why they're betting on virtual cells that let scientists test high-risk hypotheses in silico

Starting point is 00:01:40 before investing in extensive wet lab work. And we dive into BioHub, their play to accelerate discovery by pairing frontier biology with frontier AI. Hope you enjoy. Mark, Priscilla, hello, welcome to the a16z podcast. Thanks for having us.

Starting point is 00:01:57 Yeah, great to be here, excited. All right, excited to have you. You're doing exciting stuff. Yeah. To that end, almost a decade ago, you guys started the Chan Zuckerberg Initiative with the mission and intent to cure, prevent, manage all disease by the end of this century. There's a lot of missions that you guys could have poured your time and resources into. Take us behind the conversations of why you guys pick this one.

Starting point is 00:02:16 Maybe Priscilla, why don't we start with you and your side of the story? It always surprises people when I talk about how we work in basic science research. I trained as a pediatrician and people always think, oh, it must be about medicine. And for me, I went into medicine because I wanted to improve people's lives. I wanted to make a difference. I wanted to be able to help others. And I think training as a pediatrician at UCSF, I met a lot of patients and frankly, like little kids and families, for which we just had no idea what the problem was, and they might have a specific gene that they could name if they were lucky or they could be grouped into a bunch of other diseases

Starting point is 00:02:57 and there'd be a general sort of PDF they'd print out like this is what we know and then it was my job as an intern or resident to try to translate like a few lines of information to how we were supposed to take care of the patient. And for me, that's when I really realized the power of basic science and how we'd need to work on basic science to advance the forefront of what's possible, I think of it as the pipeline of hope. Yeah.

Starting point is 00:03:25 And why did you think you could cure all disease? Because that's like a very, like, aggressive goal. Do you want to answer that one? Yeah, well, I mean, we're not going to cure all diseases, to be clear. I mean, the strategy is to help scientists and the scientific community cure all diseases. So the strategy is really one of accelerating the pace of basic science. And the theory that we had was if you look at

Starting point is 00:03:49 the history of science, most major breakthroughs are basically preceded by the invention of a new tool to observe phenomena in a new way. Right. So I think about things like the microscope, right? Being able to observe bacteria or other fields, the telescope or, you know, but it's, just to use an engineering example, without those kind of tools, it's kind of like you're coding without being able to step through the code. And you both things, right? That's like the old days, isn't it? Yeah, yeah. So our whole approach on this is, basically, let's help build tools that will accelerate the pace of the whole field. And I think that there's a niche that I think fits that, because if you look at how funding works in science,

Starting point is 00:04:31 the vast majority of funding comes from the government and NIH grants. It's parceled out into these relatively small grants that allow individual investigators to investigate usually pretty near-term things. And the development of these kind of new types of tools, whether it's imaging or building now a lot of AI things like virtual cell models are longer term, oftentimes more expensive to develop. So think about like on the order of maybe $100 million to a billion over a 10 to 15 year period. And then you try to unlock those tools and give them to the scientific community to accelerate the pace. So that's kind of the theory. Right. And it seems like there's also something that is you don't really get credit for the tools in a lot of ways.

Starting point is 00:05:13 I mean, we have companies that use your tools and they're very happy about it. But I didn't even know that that was the case. That's why it's philanthropy. Yeah, well, it is, but most people do philanthropy to get credit, too. I mean, that's kind of a part of it. So I guess did you think about that, or were you just like, no, like, this is going to work, and if it works, that's all we need? We're super focused on, like, actually making every scientist better and beyond science,

Starting point is 00:05:40 like startups, startup founders, because the point is we can't do this alone. And when we first set out the goal to cure and prevent disease by the end of the century, people, like, honestly, most scientists couldn't look at us with a straight face. And because...

Starting point is 00:05:56 You're crazy. Yes. And it was true because if you just decided to spend the money funding the next best grant for every single lab in the country,

Starting point is 00:06:06 like, there's no pathway to that being true. But if you forced people to really think about this and, like, okay, what is the most credible pathway to doing this, and what are the barriers to that credible pathway? Then we sort of got somewhere, right? They were like, well, like, there's no shared tools or we're not working on

Starting point is 00:06:25 big projects and building the right data sets. And we're like, okay, well, then we can start doing something about that. And so that's where the idea of building shared tools, because no one right now in the site. Well, that's so interesting. So basically, you're like, we're going to cure all disease, and they're like, can't be done. Why can't it be done? Well, because we don't have the tools, okay? That's a pretty cool sequence. Yeah, I mean, there's also this funny thing where the biology folks, I think, looked at it as if it were crazy ambitious, and then the AI folks are like, well, that's kind of boring. That's just automatically going to happen. I know that's like, there's something in between there that needs to be bridged. And if you can like kind of use the kind

Starting point is 00:07:05 of modern AI tools in order to build the types of tools that biologists need. So that's a big part of how we think about our work is... AI has got to be the most overestimated and underestimated technology ever, like simultaneously. I mean, yeah, well, probably, like the internet early on. But we kind of think about ourselves and the work that we're doing at the BioHub

Starting point is 00:07:24 as Frontier Biology paired with Frontier AI. So there are labs that do Frontier AI that basically, you know, are building the most advanced models. And then there are lots of biological research organizations that effectively do

Starting point is 00:07:40 very leading-edge research to build to either discover new data sets or looking to certain challenges. But so far, there hasn't been anyone who's tried to do both of those at once. And when you look at, I mean, even something like AlphaFold, which is amazing, right? It was built off of this data set

Starting point is 00:07:58 that was a public data set that had been produced decades ago, right? And what I think you have the opportunity to do if you do both of those together is produce specific data sets for the purpose of training AI models to build virtual cells that can do specific things. So I think that that's like a pretty interesting zone to be in.

Starting point is 00:08:18 And of all the things that we've worked on, actually when we started CZI, we kind of actually focused on a number of areas. And what we found is just that the science research has had by far the biggest returns, so we've just doubled down on it over and over and over until now we're at the point that we're 10 years in. And BioHub is really the main focus of our philanthropy at this point. But yeah, I mean, that's kind of, that's basically the focus.

Starting point is 00:08:39 Maybe you're not giving yourselves enough credit because you're sort of saying, well, there's bite-sized science. We don't want to do that. There's century-scale science, and that seemed like a long time horizon, but achievable, ambitious. But you've actually identified, which I think is really fantastic, grand scientific challenges that are right in between. They're 10-to-15-year horizons, at least per kind of the way you communicate about them and the way you energize the scientific community about them. Ten to 15 is kind of an interesting time horizon. Sort of similar to the time horizon of venture-backed companies, similar to the time horizon on which a team can work together for that period of time. How did you get to that number? And then how are you thinking about the challenges that you take on in each 10-to-15-year wave? Because that's concrete, achievable. You build a lot of credibility around it the way that you've announced those challenges. Well, I'm curious how you guys think about it. But for us, when we looked at the grand challenges on the 10 to 15 year time horizon, it needs to be, like,

Starting point is 00:09:40 when you look at it, you're like, I see a path. Right. Not everything needs to be solved for us to take it on. In fact, if everything's solved, then that feels like that should just go. And it wasn't ambitious enough. Yeah, like we have some risk appetite, right?

Starting point is 00:09:53 So we want things where we're like, there's a credible pathway, someone who is at the helm who can do this, and there's enough ambiguity where we feel like we could take on that risk. And if we do it, like, the returns could be higher than even expected. And the way we modeled that in the biohubs is we have three biohubs. We have one in San Francisco, one in Chicago, one in New York.

Starting point is 00:10:18 The one in New York works on cell engineering. Can we engineer cells to go in and detect signals, read it out, or to take certain actions? In Chicago, we're building tissues and looking at cell communications within tissues. And then in San Francisco, we're looking at deep imaging and transcriptomics. And that work, the locations are not by accident. We also look at the partner universities because we have folks who come to the biohubs to do this work, collaborative, interdisciplinary, and sort of unconstrained by the traditional lab, but we also build off of the labs at these academic institutes that support the work. And so that's how we sort of choose the grand challenge

Starting point is 00:11:02 and the locations. And then the sort of layering and the large language models and AI coming into the picture has been so interesting because we were already building tools to measure interesting data, building the data sets.

Starting point is 00:11:20 But we didn't really know what to do with them yet. And large language models coming onto the scene, we're like, wow, we can make sense of all of this now. I'm curious what you view success as in the therapeutic realm. So, you know, we think a lot about understanding biology and

Starting point is 00:11:39 sometimes we bet on startups that want to unlock completely new biological areas, diseases, where we don't know what's going wrong. And then there's another group of folks who kind of say, hey, okay, now that we understand what's going wrong, let's fix it. Let's come in with a drug. Let's come in with a new type of chemistry, a new type of antibody. How do you, what do you think success for the CZ Biohub looks like 10, 20, 50 years from now, in terms of the new medicines that you've enabled? We want there to be like an explosion of a community who are building these, just the new wave of what it means to be deploying precision medicine.

Starting point is 00:12:16 Like I think for rare diseases and common diseases alike, you're really talking about individual biology that we sort of lump together. And we often don't know how it happens, right? We know that you have this mutation or, the worst nightmare, you have a variant of unknown significance. What does that even mean? The horrible VUS. Yes, horrible. And you're like, you tell someone you kind of know something, but we don't know what it means.

Starting point is 00:12:43 But if you look at the way we've been able to look at variants and look at single cell transcriptomics, we're starting to be able to say, okay, this variant actually impacts this set of downstream cells. And then we start looking at the proteins that get expressed and how it looks similar or different to what a healthy cell would look like. Then you can start targeting, okay, like, let's look at that as a target, and you both know the specificity of the target you want to build based on the ability to connect mutation to protein expression, as well as to be able to predict off-target effects.

Starting point is 00:13:20 What are the side effects? Because you also know where else that drug will be able to interact with the body. And so those are rare, like, but I really think most diseases should be thought of as rare diseases, because each one of our biology is different. And right now we just get lumped, right? We get lumped based on age, demographics, ancestry, if we're lucky to have that level of understanding. But truly, each one of our biology is different and say, like, if you look at hypertension or depression, like we kind of just go by trial and error and saying, like, let's just try that drug, see what happens. But what should really happen is being able to precisely and accurately and quickly treat people by looking at individuals' biology. We want to enable the basic science, and we would

Starting point is 00:14:09 be thrilled if people picked up the models that we build to be able to build the diagnostics, the therapeutics that need to come. You've built amazing data sets. I have to say, like, I mean, you may not hear the feedback from the startup community and the pharma community and the R&D community, but it's there because you've committed to open source. And so people may not be, they may not all be writing papers, but they are using those tools. There's a startup in our portfolio working on idiopathic pulmonary fibrosis. The name tells you how vexing the disease is.

Starting point is 00:14:42 It's idiopathic. We don't know why it happens. The IPF is named that way. And so, you know, he was telling me that he used your Cell by Gene atlases to look at millions of single cells in patients, with disease, without disease, try to pinpoint the fibroblasts, double-click on the fibroblasts and their gene expression. It's incredible.

Starting point is 00:15:02 And try to, you know, use that to inform, hey, where could I go after a new drug target in this disease that's fundamentally a strange clump of idiopathic origin. So I think there's a huge group of innovators who love the tools, the visualizations, the query systems, and really the software approach that you built to making that data incredibly accessible. So Cell by Gene is like almost an accident though. Tell us more. So do you want to share a little bit about Cell by Gene or do you want me to start? Well, I mean, I don't know which part you want to get into. But I mean, but the Cell Atlas work overall. And it's kind of this crazy thing that we're, you know, here in 2025 and there's not the kind of periodic table of elements equivalent

Starting point is 00:15:46 for biology. Right. So that was sort of a lot of the inspiration of it was, all right, how do we both through work that we're going to do in the biohub and through other grants, be able to pull together and standardize a format where you can have all this data. And when we were starting off, we didn't even necessarily have in mind that we were going to use that to build virtual cell models. I think that that's sort of just come into focus as the AI work has advanced. But that's a very exciting thing. We should definitely spend a bunch of time on the virtual cell models.

Starting point is 00:16:16 But I'm not sure what you wanted to get into on the Cell Atlas. Well, the single-cell work was one of our first RFAs 10 years ago when we started, and we were like, okay, we think this is possible. We actually funded the methodology for it to standardize how it was going to be done. So that was 10 years ago. And then we seeded a few labs to start building out that data set. But we're like, there are like millions or billions of different cell types and different permutations. Like, how are we going to do this, and especially with like a burgeoning technique.

Starting point is 00:16:50 And so we ended up seeding a few groups and they started doing work. And then they told us they had a problem. There was a bottleneck in their workflow because they couldn't annotate the data fast enough. And so we built, Cell by Gene was an annotation tool. That's the original source of this. So we built the annotation tool

Starting point is 00:17:12 to make it easy for people who are doing single cell science to be able to annotate the data. And then we put the data that we collected publicly so people could share. But because everyone started using the same annotation tool, everyone was standardized then on the same data formats. And then there started being a community around the tool, and they wanted to share back and build the Atlas.

Starting point is 00:17:35 So now after 10 years, there are millions of cells that have been built into this shared resource for the entire scientific community. We only funded about 75% of it. Sorry, that's wrong. We've only funded 25% of it. 75% came from the broader community saying this is useful and there's an easy way for us to standardize and build this together.

Starting point is 00:17:58 You have the same metadata. Yeah. That's right. It's like what you'd call a network effect. I was going to say it sounds like the internet. Come for the annotations, stay for the virtual cell model. Well, it was very important when we were getting started with the work to have everyone who is doing it have a consistent format. So that way it could be used and portable.

Starting point is 00:18:19 And then once that kind of took off as the way that it would get done, then other people just found it valuable. Yeah, and even relative to prior databases like GEO and whatnot, they're just simply not as standardized or QC'd. Yeah. Yeah. Let's get into virtual cells. Sure. Sure. The grand challenge that you're focused on. Maybe talk about what the promise or the hope is, and maybe some of the challenges, or where we're at with it. Yeah. I mean, we think that this is going to be one of the most important tools at this point: basically building up the kind of hierarchy from proteins to just different structures, from the cell to, like, a whole virtual immune system

Starting point is 00:18:57 or different levels of hierarchy and we think this is going to end up being like a very important set of tools for people to effectively generate hypotheses for different science work, you know, even before you get to the point where you're really running full experiments in it, you can come up with some estimate of how that might run. It will be useful for some of the precision medicine type examples that Priscilla was talking about a few minutes ago. But we think that this is probably one of the most important sets of tools that you need to build. And it's not a single thing, right? So there's different angles to come at this from. The Cell Atlas data is helpful for understanding things on a cellular level.

Starting point is 00:19:42 One of the kind of most important things that we're doing right now, there's this great company, EvolutionaryScale, which actually had a bunch of researchers who'd formerly worked at Meta on protein folding models, is joining the biohub, and Alex Rives, the leader of it, is actually going to be the kind of head

Starting point is 00:20:01 of the whole science program, which is actually kind of interesting when you think about it, where it's like you have AI and biology coming together and really it's like an AI person who understands biology is running it rather than a biologist who has some understanding of AI I think just kind of speaks a little bit

Starting point is 00:20:15 to where we think the relative weight of these things is but I mean we basically view you know like Priscilla was saying with the different biohubs and then New York doing cellular engineering will basically make it so that you can have cells that can record

Starting point is 00:20:31 different things that are going on around the body and share that data and then you can build that into models. The Chicago BioHub being able to record inflammation and basically study that in order to kind of help understand that's a different data set. We have the Imaging Institute, which is we just trained our first set of models around that, which are the first like spatial models around understanding like the way that cells look in different states. And eventually, just like you have this analogy on the

Starting point is 00:21:04 kind of the industry side or on language models where you have different capabilities and then over time you train them into models and it gets more and more general. That's kind of the idea here. So we'll build the biohubs around grand biological challenges. The biohubs will build tools that will generate novel

Starting point is 00:21:21 data sets. We will build models based on those and then eventually combine the models into an increasingly general view of a virtual cell that will be useful both for scientists and hopefully startups and companies that are working on finding drugs, which is not our part of the whole thing, but I think is obviously a really important part of what needs to happen.

Starting point is 00:21:41 Yeah. And, you know, you guys think about risk all the time in terms of when you make investments. Like, I think the promise of being able to do virtual biology using a virtual cell model is you can actually take on riskier ideas. Right now, like grant funding can be hard to come by. and the wet lab work is expensive and slow, and it's not just money, it's also time. And so you have to choose something that you think is going to have some likelihood of success to keep your lab career going.

Starting point is 00:22:14 And so it naturally lends people to take on some risk, but not a lot of risk, because they need to make sure that they are hitting a certain percentage of the time to make tenure or publish or whatever they need to do. But if you had a virtual cell model where you could simulate really high-quality biology, you could actually then start testing and tinkering on the computational side and, like, ask riskier questions, things that would have been expensive and costly

Starting point is 00:22:42 in terms of time and resources to do in the lab, and actually see if there is promise doing the experiments in silico before you make the time and money investment in the wet lab. Do you think of it kind of like a model organism? Yeah. Like it's the new fruit fly. Yeah. I was just going to ask, given the complexity of a cell, like, how close, like how accurate do you think you'll get the model to? I mean, just assuming, I mean, maybe you get it to like a perfectly accurate representation of a cell, but like how accurate to be useful would the virtual cell have to be?

Starting point is 00:23:16 I think it will obviously iterate and get better and better because right now we, like right now we're still just talking about transcriptomics. We're expanding into different ways of looking at the cell. but you get more and more accuracy. But I don't think it needs to be 100% accurate to be useful because you just want to be able to de-risk the idea on the front end a little bit. And the more and more you de-risk it, the more efficient it gets, obviously. But it will be useful if you even get directional signal. And yes, we do think about it as a model organism.

Starting point is 00:23:51 But in a way that's like has fidelity to the human body. Like, you know, like, I don't want to... All models are wrong, some are useful. Yeah. Yes. Yes. It has utility on certain axes. Exactly.

Starting point is 00:24:04 And just like the language models, you build in specific capabilities. So it's not... So, for example, you know, one of the models that we're publishing is VariantFormer, right? It basically, you know, makes it so that, you know, it's trained on a bunch of, effectively, pairs. If you have a cell, you apply CRISPR to it in a place, you see what comes out at the other side. So it basically is able to make that kind of a prediction. Like, okay, if you have this edit that you're doing to a cell, what is likely going to happen?

Starting point is 00:24:34 Another one of the models is it's this diffusion model. Basically, you can describe a type of cell that you would like it to simulate, then it will just produce a kind of synthetic model of the cell. Again, I mean, it's kind of interesting because to Priscilla's point before about how everyone is different and different cells have kind of,

Starting point is 00:24:56 you want to be able to simulate these kind of rare configurations, having at least a synthetic version of what that could look like is interesting, and then you can test against that. The cryo model, I think, is interesting because it's spatial. So it kind of gives you a sense of there are all these different models that you can have that allow you to basically look at different kinds of things, and then you just train them in to be increasingly general over time. Wow.

Starting point is 00:25:20 Very interesting. And is the modeling technology basically, LLMs are like, is there a reasoning model? Is it like a just... Oh, that's actually, yeah, no, that's a fascinating one too. Because one of the new models, I think this one is very early, but it's basically the first reasoning model over biology. So the idea is that, yeah, you effectively have these models that kind of simulate world

Starting point is 00:25:46 models in different ways, and then you want it to be able to not just be able to spit out correlations, right, in terms of like what it's found. but actually be able to kind of reason through how things would evolve and why things would happen. I know it's quite early, but it's – but it is interesting, conceptually, as what I think is clearly going to be an important direction in terms of how these models evolve. Yeah, because that's what I was thinking, you know,

Starting point is 00:26:18 that if it doesn't work, the next question you have is why. Yeah, you know, like – But I think what you find in reasoning – Because if you're married to your hypothesis. Sure, sure. Yeah, I mean, the, yeah, I think you're saying if the reasoning model doesn't work, why. I mean, I think the language model analogy for that would be

Starting point is 00:26:40 you need better kind of world models or better pre-trained models in order to get the reasoning to be good. But it's, yeah, you build more capabilities into it. And I think that there's probably an order to. So the work that Alex and the EvolutionaryScale folks worked on is a lot of it is protein, which is interesting because that's at a kind of smaller resolution, obviously, than the cellular data, the Cell Atlas. But part of the hypothesis is that you can look at all these different cells and you can kind of simulate how they might behave, but you're going to have a somewhat shallow understanding unless you actually have this hierarchical understanding of how the subcomponents of the cells are going to interact. So our view is that you basically want to build up a state-of-the-art protein model

Starting point is 00:27:29 and then have that be a part of the state-of-the-art cellular model. And then once you have that, you build things like the virtual immune system, which allows you to simulate much more complicated systems. But it's sort of this hierarchical approach to building up these virtual models. That makes a lot of sense. Because also as you get into personalization, you've got like common proteins combining into a unique cell so that

Starting point is 00:27:55 makes it like from a system standpoint that makes it like much more manageable that makes a lot of sense. Interesting. Yeah. Wow. Yeah, no, it's it's very fascinating stuff. Yeah. So you guys are announcing some big news this week. Do you want to give us a sneak preview? Well, the big news is thinking about how we are going to be coming together

Starting point is 00:28:17 as one team. And you know, in the past we've run biohubs, we've built software, we've done some AI research. But all of it has been a little bit decentralized. But now under Alex's leadership, we are going to come together as the biohub, an operating philanthropy where we are doing the science in service of a singular goal together, and how do we actually advance the state of biology and research at the intersection of AI and biology. Amazing. Alex is amazing. Yeah, no, he's great. And then the other thing is the piece that I mentioned earlier, which is just, yeah, I mean, CZI has focused on a number of

Starting point is 00:29:01 different things. We've really just found over time that we feel like we've been able to make the biggest difference in science. So we've just kept on doubling down on it. And we're going to continue doing work in education. We're going to continue supporting local communities and in those different pieces. But going forward, the biohub is really going to be the main thrust of our philanthropy. And we're very excited about that because I think that this is, there has been, you know, when we started the mission to see if we could help the scientific community cure and prevent diseases by the end of the century, I do think with the advances in AI, that should be possible to do significantly sooner. And that is a very worthy and important and very exciting goal that we think we kind of have a unique place in the ecosystem that we can help empower others to make fast progress on that. So there's obviously like plenty of advantages to decentralization from a management communication overhead and so forth.

Starting point is 00:29:55 And so, like, what are you trying to add by adding this kind of new layer slash unification on top? Like, what are the outputs? And then I guess what are the complexities to that? Because that's, I'm sorry, to ask a CEO question. No, no. I mean, I'm like, I'm just like, obsessed with this stuff. We think about this.

Starting point is 00:30:14 You want to go first? Then I can jump in. Yeah. So there are obviously amazing groups doing frontier AI and a lot of groups doing great frontier biology. And what we think we can do uniquely is actually tie these two together. And we are, we've funded datasets, we've built datasets. We're like building the instrumentation now to be able to look at the cell, whether it's, you know, at the tissue, cell-cell communication, or cryo-EM, where we can look at the cell at nearly atomic level. So we have the ability to

Starting point is 00:30:48 not only build the data sets, but actually shape and form them the way we want based on what we see as necessary to complement the existing body of knowledge. And so we have amazing teams doing that work, and we're building these AI models. And so the reason to do it together is then we can actually complete the flywheel. Like, you know, the model is looking like it has some gaps and blind spots in this area. Okay, who do we talk to? How do we build the next data set? And, you know, we're seeing this in the lab.

Starting point is 00:31:23 Like, the metadata is going to be so rich that we can feed back into the way that we do this modeling. And so if we can close that loop, which is our goal and bringing everyone together, I think it's going to be incredibly powerful. And it's more than just, like, you know, writing down a spec and saying, like, please deliver this.

Starting point is 00:31:42 Like, these people need to be sort of working shoulder to shoulder and shaping each other's work for this to actually be a more and more accurate model of how the human cell works. Well, you know, it's so interesting because that is exactly, like, that has been the biggest surprise in the industry for us in AI world, like forget biology for one second, is that the domain-specific models have been, like, super interesting. The original thesis is, well, like, there's just some AIs that are going to be so smart.

Starting point is 00:32:15 They're going to be smarter than everybody at everything. But like on video models, like every video model is best at something, but not everything. And so knowing what problem you're solving actually turns out to be sort of ironically very important in AI because you can actually get to a way better result if you put the two together. Like, yeah, we're seeing that over and over again in a way that is, I would say, very, very counterintuitive to the whole narrative kind of going into it. In biology, it used to be the, or at least, you know, one assumption was all the data sets aren't on the Internet.

Starting point is 00:32:53 So part of the reason you need a domain-specific model is that the data sets are not public. You guys are kind of bucking that trend, too, by creating a lot of open-source access to the data. And then even then, it sounds like you're betting, you know, on the trend that we're seeing in other industries. But still, there will be nuance in how you annotate that data, curate that data, well, how you talk to a scientist, right? Right, because you have to not only know the data in the model and so forth,

Starting point is 00:33:18 but like the conversation is what we keep finding out ends up being very, very important. So rich and so important in how you actually. A scientist isn't going to talk to it like, you know, I talk to ChatGPT or whatever. Well, this is the fly you can talk to. Yeah, yeah, yeah, that's really, that's super exciting. And the user interface is actually really important. You talked about you guys have a founder who's using CELLxGENE. That user interface was intentionally designed to not need to have a computational or really a very deep biological background to be able to use because you want people coming from different fields to look at the problem.

Starting point is 00:33:53 It's like, look here, help us solve problems here. And so building the user interface in a way where it's not a very high barrier to entry to be able to poke around and learn something and bring knowledge back to your work, that's intentional. And we're really hoping when we build these virtual models that we get to a place where we can allow a lower and lower barrier to entry for people to say, like, you know, like, I have some knowledge about this. Maybe I can contribute. A very pertinent example is, turns out, I think immunology has a ton to do with neurodegeneration, right? Seems like immunology is behind all this. Everything. So might be part of your century vision.

Starting point is 00:34:33 So you need to be able to allow the immunologists to come in and understand neurodegeneration and understand how their world fits in. And so the more you lower the barrier to entry, the more it allows people to actually think in a sort of truly collaborative and interdisciplinary way. So will the biohub grow as a team? Like will you employ more people at the biohub proper or are you moving towards more of a network model with more sites, more labs, more community-driven data sets, like which is the thrust or maybe it's both.

Starting point is 00:35:07 Probably a little of both. And we've added new biohubs over time. And then we're also building up more of this like central AI team. Cool. So, but I know, I think that these organizational questions of how do you set this up are fascinating. And a lot of our approach is sort of informed by what the rest of the field is doing. Because you kind of think about science as it's this portfolio, right? Society has a portfolio of stuff

Starting point is 00:35:36 that it's trying to do. And in terms of philanthropy, you want to be the most additive that you can be by trying to figure out what else is underrepresented. So science by default is very decentralized, right? It's like kind of the way that granting has worked, the way that I think scientists by default want to work. So I think a lot of what we've found

Starting point is 00:35:57 is that figuring out ways to encourage collaboration in ways that otherwise seem very simple but weren't happening before can unlock a lot of value. So the very first biohub, what we did, there were two kind of interesting things. One was, it was this collaboration between UCSF, Stanford, and Berkeley.

Starting point is 00:36:18 And there were all these really smart people at all these different places who previously, I guess in theory, they could have figured out a way to work together, but there was not really a formal construct for them to do that. And this just allowed a lot more collaboration. The other one is cross-discipline,

Starting point is 00:36:33 basically having biologists sit next to engineers and this view that, like, these two disciplines are things that need to, um... And I don't know, I mean, I'm sure you've seen this in a lot of the companies, but, like, there's so many interesting... And the companies, they always set them apart. Well, it's interesting. No, it's interesting how many organizational questions or problems you can fix just by having two teams sit together, right? It's like it doesn't matter what the org chart is or, like, whatever. It's like, you guys need to sit next to each other until you get this thing to work. And that's something I really believe in.

Starting point is 00:37:07 And you have time. You have 10 to 15 years. Well, no, it's all like communication is such an underrated problem in general in all kinds of, and building anything or solving anything. So that's pretty neat. Yeah, and it's just really kind of simple stuff. But I think it's sort of novel as a model. And one of the things that's, so we've now copied this from the first

Starting point is 00:37:32 BioHub to the biohub network and expanded it to other models. But it's also just been neat to see other folks who are working in the field also adopt similar models because it's a pretty intuitive thing. But at some point, you'll reach the point where, you know, actually it's really good to have decentralized work too, right? So it shouldn't be that like, we're not saying that this is like the way that all science should work. We're just saying that there's a space for this that can unlock a lot of value because it, for whatever reason, hasn't been the default. Yeah, and we still rely on, like... Yeah, there's famous, like, stories in the MIT lab about that.

Starting point is 00:38:05 That's how they invented lasers and so forth. It's, put a bunch of people from different departments in the same space. Well, actually, physics is where we got a lot of the inspiration. Like, physics has just historically been, like, labs have just rallied around big projects and big shared resources. And we will, you know, we are relatively centralized, but we still depend on a lot of labs who are doing sort of exact frontier work or complementary work to come together to support those. There's that.

Starting point is 00:38:35 But one more thought on your expansion question is like, and maybe this is like the modern AI lab, we are not expanding like a lot of square footage per se, but we're expanding our compute. The researchers, they don't want employees working for them. They don't want space. Yeah, they just want GPUs. Yeah.

Starting point is 00:38:53 So it's just like, in a sense, that's new lab space. It's much more expensive than wet lab space. And you guys have always been creative on that even in the last few years. You've created ways to share access to compute, you've enabled academic labs to, you know, I forgot the name of your program.

Starting point is 00:39:14 Scientists in residence, or something like that. Yeah. But you're a complete rental kind of pro-telling of clusters. You know, if you look at individual labs, they'll have like a large lab would have tens of GPUs. And we were the first to really build a large-scale compute cluster. A thousand, now we have plans to move to the 10,000 range. And that, one, requires a different type of project, obviously.

Starting point is 00:39:44 You're able to ask different types of questions. And it's a resource that we use, but also we've invited scientists to apply and say, like, what question do you have that could use this amount of resource? and be able to step, sort of seed collaborations that way. And so if a scientist is out there listening, like, who's not employed by the biohub or working at the biohub but wants to collaborate with the biohub, that you're going to create really interesting, interesting doors

Starting point is 00:40:15 to utilize their resources. That's awesome. Yeah, I mean, the GPUs are somewhat zero-sum, right? So the data is, so, yeah. Yeah, fair enough. Yeah. So you're about to celebrate 10 years doing this. As you look out in the years to come, what else can you tell us about either things

Starting point is 00:40:34 that you're thinking about for the future or maybe even principles or a North Star that's going to guide how you guys grow and evolve going forward? You know, it's been really interesting in the past 10 years because I actually spent the first few years completely envious of people working for for-profit companies because there's so much clarity. Like the market, whether or not it's private or public, will tell you if you're doing a good job. If they think you're doing it. If they think you're doing it. They're not always right.

Starting point is 00:41:04 They're not always right. But I was still envious because that was, I was like, I craved that feedback, like, am I doing a good job? And, you know, 10 years in, you know, the reason why we're doubling down on biology is like, not only did we achieve what we said we were going to do when we set out on these projects, it actually delivered more than we thought we were going to. And I was like, okay, that's a signal I can latch onto. And like, that's a signal. We can really continue doubling down and doing more of that.

Starting point is 00:41:35 And so I think it's continuing to tolerate the early ambiguity when you're like, okay, I'm going to do more of this. And being patient, but being willing to have a long time horizon, but be impatient at the same time. Because it's all those iterations along the way that have sort of allowed us to get to this place where, you know, to get lucky, right, having built data sets to take advantage of AI and large language models, that's because of all the work that we have been doing. And so being able to continue moving forward in this ambiguity and sometimes lack of signal on a big goal, like I think we sort of set the DNA for that.

Starting point is 00:42:18 Amazing. Oh, no pun intended. Yeah. But we get to see how many people use the tools and the feedback. Yeah, yeah. Yeah, you have customers for that, which is pretty cool. Yeah.

Starting point is 00:42:27 Yeah. For philanthropy. Like, that's awesome. Yeah. No, it's one of the fun things about building tools. It's like, you kind of get to see, how valuable do people find the tools? Do people use the tools in order to publish important work? Right.

Starting point is 00:42:41 Right, right, right. Yeah. Well, I mean, our feedback is they're awesome. Our feedback is great. And completely unique, by the way. So, like, the other thing is, like, what would you use if you didn't have this? It's like, there's nothing. No, yeah. It's a real kind of void. I mean, there's this whole pipeline that needs to exist from accelerating basic science to funding a lot of people to use it to then you can get into the biotechs that basically can start to work on basically coming up with novel therapies and then you get the pharma companies that do them at scale. And then there's a space for philanthropy on the other side of public health of basically taking the therapies and kind of bringing them out to everyone in the world. But this is a space

Starting point is 00:43:25 that there's just going to be a huge amount of leverage with AI. And it is, yeah, it still seems like there could be a lot more effort in this space around building tools and just accelerate the whole thing a lot better. Yeah, and I do think it is the place where you are completely unique, right? The other things, there are other people who can do that, but there's nobody doing what you're doing. Yeah, it's got good, good founder market. Yeah, it's founder market. If we didn't exist, would it be a problem?

Starting point is 00:43:55 Yes, like those questions really land. It's like one of us is an engineer, the other one is a scientist, doctor. Yeah, very happy in this direction. We thank you very much, not only for our companies, but for us as humans, for your work on this. It's amazing work. Oh, thank you. Thank you. Thank you, guys.

Starting point is 00:44:15 Thank you so much. Thanks for listening to this episode of the a16z Podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or review and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at a16z and subscribe to our Substack at a16z.com. Thanks again for listening, and I'll see you in the next episode. As a reminder, the content here is for informational purposes only,

Starting point is 00:44:46 should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com forward slash disclosures.
