No Priors: Artificial Intelligence | Technology | Startups - Biohub: The Future of Biology is Open-Source with Co-Founders Mark Zuckerberg, Priscilla Chan, and Head of Science Alex Rives

Starting point is 00:00:00 We just want to give tools to the whole scientific community. We want to understand how biology works. I want to understand the genetics of this person. I want to understand the risks they have to different illnesses. My goal is to be able to treat the individual as an individual, understand the mechanisms, and be able to intervene. We'll have a bigger impact by getting this in more scientists' hands quicker, by doing it as open source projects instead.

Starting point is 00:00:23 It's not just like there's some factory somewhere that you can pay to produce the data. You actually need to invent new, novel scientific approaches. The theory isn't that we're going to cure the diseases. We're not. It's that we want to help accelerate the pace of progress for the whole scientific field. We folded over 1.1 billion proteins and predicted their structures. And we didn't design a model for antibodies. We didn't design a model to be able to bind one particular target. We just designed a model that could understand proteins. If we could design a protein to actually change the physiology, then we can actually cure someone.

Starting point is 00:00:58 Today on No Priors, we're joined by Mark Zuckerberg, Priscilla Chan, and Alex Rives. We'll be talking about Biohub and all their various efforts to now start applying AI at scale to do world models of cells and different levels of interactions across biology. Mark, Priscilla, thank you for doing this. Yeah, thanks for having us. This is fun. Alex, congratulations on new missions. Thank you. You guys made Biohub your primary philanthropic effort and then committed $500 million to this virtual biology initiative.

Starting point is 00:01:34 Can you tell us a little bit about, you know, why do that and how did you go from we should fund this to this is like who we are? So Biohub in its current form, we're super excited about. We feel like it's a really good fit for who we are and what we bring to the table and what we can achieve together. But this work started 10 years ago when we were thinking about how can we give back. And Mark wanted to build an organization that could cure, prevent, and manage all disease by the end of the century. And we had a series of hilarious meetings with scientists where, like, famous Nobel Prize-winning scientists were just laughing at us. Was that your starting line? We're just going to cure all disease. No, no. And to be clear, we don't think that we're going to be the ones curing the diseases.

Starting point is 00:02:21 Our goal was always to build tools that could accelerate the whole scientific field. That way, the scientific field collectively could cure all the diseases. But still, I think people thought that by the end of the century was a stretch. Now I think it's like too conservative. And so we kept being like, okay, well, we had this series of funny, awkward educational conversations where we were like, okay, but like why? Like, why do you think it's impossible? And like, you know, just being the person in the room who's just like, well, I don't know why. You tell me. Finally we got people to like, they're like, fine, if you really must know.

Starting point is 00:02:58 And we're like, you know, we do. It seems important. You know, they were like, well, we work in silos, and when you publish, information doesn't get shared, it gets locked up for long periods of time, and we don't have tooling. They gave the example of, like, a great tool gets built by one postdoc in a lab, and it lives on their computer, and when they graduate, the tool is gone. And what we heard was that it was very hard to build shared tools, to move science

Starting point is 00:03:28 faster, build a shared knowledge base to quickly move science faster. And that's sort of where we began in thinking about, okay, like if those are the problems, like what can we contribute? Yeah. I mean, so the original Biohub model was basically to focus on long-term tool development by bringing together engineers and scientists across multiple universities. And basically it worked. And we started off with CZI doing a number of different things. And I think over time we just felt like, okay, the science piece is really working. And we just kept on investing more and more and more in it until now it is basically the primary and main thing that we're doing. And we've expanded the original San Francisco Biohub to a handful now at this point. There's New York.

Starting point is 00:04:18 There's Chicago. The real focus and the unifying theme at this point is the virtual biology initiative around taking the unique data sets that are able to be generated in order to model effectively, starting with the smallest pieces, proteins, but then eventually cells and whole biological systems. But that's kind of how we've evolved is, you know, this idea that we talk about around that some of this is an AI problem

Starting point is 00:04:52 and you want to build a frontier AI lab, but you need to couple that with a frontier biology effort that can do the work of basically being able to understand and get the data that you need to actually be able to build these models. Because unlike with language models, where there's just like a lot of data out there on the internet, that's not really the case with biology. I mean, there are obviously a bunch of different data sets that exist that academia and scientists have generated over the decades. But a lot of the stuff that I think we want to put into this, it doesn't exist, right? It's like you want to be able to visualize things that people haven't been able to see before, which is why we're doing the imaging work.

Starting point is 00:05:30 You want to be able to record things that are going on inside the body, which is why we're doing the kind of cellular engineering work. You want to be able to measure things like inflammation in ways that haven't been possible, which is why the Chicago Biohub is focused on building those kinds of devices and being able to do that. And that will fundamentally create new types of data sets that will allow new types of models. And I think it's just a very exciting thing. Going back to what you were saying, if what the scientific field primarily needs is tools that empower scientists across the field to do their work faster, that's what we think we can provide through this kind of long-term focus on tool development. But I think there's a fun through line from where we started to the work that Alex is driving now, which is that our very first request for

Starting point is 00:06:23 application RFA here was around single cell sequencing. And we wanted to look at sort of like the RNA that is transcribed in individual cells. And that was possible, but it was still pretty early on in understanding how different cells were expressing their DNA. To the point where at the beginning, we were just funding methods, like getting people to describe how to do it so that others could share that methodology. And then that became us funding the Human Cell Atlas, which is now one of the largest databases of single-cell transcriptomes. It was getting hard for scientists to annotate the data. So we built CELLxGENE, which was like a very simple annotation tool that scientists could use to make use of that data. Then a community built up around CELLxGENE and started

Starting point is 00:07:13 contributing more and more data that we had nothing to do with sort of creating or funding or making happen in the world. And now CELLxGENE is a corpus of knowledge that a lot of the transcriptomic-based models are based off of and is used regularly by the scientific community. But still, there are always critiques. Like, this is just stamp collecting. Like, you're just gathering bits of knowledge, sorry, bits of data. And we're not going to be able to pull scientific knowledge and wisdom and insights out of it. And we're like, well, we didn't have an answer for a while. And then imagine our delight when large language models, which could make sense of large amounts of data, became a huge topic of conversation. And for me, it was like, what if we could actually understand how biology worked?

Starting point is 00:08:06 Move it from a discovery-based science to an engineering-based science where we could systematically understand how living beings, living cells worked and be able to understand why things go wrong. And so when we saw that moment, we're like, this is it. Something really big could happen here. Alex, you started at Meta FAIR, but you were on the path to, you know, you'd assembled the team at EvolutionaryScale and you'd raised venture and you were making progress in your models. What was the pitch from Mark and Priscilla where you said, like, that's actually the right way to go after the mission? Well, I think for me, it was really kind of the moment when I understood that, you know, they really saw this as an integration of frontier AI and frontier biology. And I think I had developed conviction that, you know,

Starting point is 00:08:57 this is really a new era of science that's just beginning kind of what's going to be possible with artificial intelligence. And, you know, we're in the age of information theory at scale. And we have these systems that can basically kind of predict the next token. And they can, you know, learn world models from that. They can learn biology from the data. And so, you know, I think that it was really clear that, you know, to build kind of that next institution for the next era, you would really need to have frontier artificial intelligence. You would have to have frontier biology. You would need to start to put those things in feedback and really have models that are learning from the

Starting point is 00:09:36 biology. And I think, you know, you need the right scale and the right people. And so this just really felt, I think, like the way to do that. There's a variety of different models that you all have been working on. And I think it's kind of interesting because some of the earliest breakthroughs in biology were things like AlphaFold, where, you know, there was a Google model that showed that you could do protein folding at scale in a really interesting way that people didn't realize was very tractable, and this was pre sort of the really big transformer waves that came later. And then you're working on a variety of different things at different scales, right? You're doing incremental molecular modeling and protein folding.

Starting point is 00:10:09 You're doing cell-based stuff. You're thinking about interrogating larger-scale systems in biology. How well do you think that extends from sort of the micro to the macro? You mentioned almost starting with building blocks and building up, but modeling cellular behavior is very different from modeling protein folding. The data is very different. The modeling is different. I'm just curious, like, do you think it's all similar in terms of just data and you train stuff? Or do you think it's actually, there's some differences in terms of how you actually have to deal with these systems? I mean, there are probably some differences. I mean, you can probably talk more to the specifics around this.

Starting point is 00:10:39 But, like, I mean, I think each layer is going to end up being somewhat qualitatively different, right? But you need to be able to understand the protein interactions in order to be able to understand how cells work. So you can't just go straight to cells in a way without understanding the protein modeling. And then if you're trying to understand something like the, you know, the way the immune system works or a bunch of cells interact together, then, you know, it's tough to do that without first understanding cells. I mean, you might be able to, like, at a very high level of abstraction, simulate a system. But if you really want to like understand how it's going to work, you kind of want to build the simulations at each level hierarchically. So that's basically

Starting point is 00:11:14 the approach that we're going through, starting with the building blocks. So, the proteins. But yeah, I mean, I think that there's going to be different types of data that you want to collect for each. The modeling techniques, I think we'll see. I mean, that'll all keep on advancing across the board. But I do think that a big part of the strategy is this view that you need to build it up hierarchically. And, you know, one of the things that's unique about us in the space is we were very intentional that the AI efforts and the wet lab efforts were a single effort. And we've done a lot of work to bring them together. And one of the really neat things that we can do is really try to pull and gather data that helps us connect across sort of the

Starting point is 00:11:58 hierarchy. You know, you can look at transcriptomics with space within a cell and look at where it's localizing. We can look at translucent zebrafish and look at the development across different cells and when the brain develops. We have sensors that allow us to look at cell-cell communication and different molecules. And so we can be strategic about the types of experiments and data we want to collect that help us bridge across these, so that there's some connective tissue that helps drive the modeling, you know, the modeling magic that happens. Yeah, the reason I asked the question, by the way, is I used to be a biologist. I have a PhD in biology, and I worked in wet labs for almost a decade and everything else.

Starting point is 00:12:40 Are you looking for a job? We can talk about that later. It's not a no. At this point in my career, you know, I'm like Danny Glover, you know, in Lethal Weapon. I'm almost at retirement. But I think, you know, one of the things that was always lacking was this integrative nature across the different layers of biology. The developmental biologists would work on their own. The molecular biologists would be doing different experiments. And so that's what I was curious about.

Starting point is 00:13:09 Yeah. Typically, there's a reductionist view of biology, and there's a systems view, and those people didn't really work together deeply. And so one of the exciting things about what you're doing actually is how you're bridging that. And so that was kind of the basis for the question as well. Yeah, and if I could add something there, you know, I think that, you know, we're in the age of this kind of information theory in biology. And so, you know, there are levels of complexity and hierarchy in biology. And kind of each level is made up of and, you know, constituted by the lower levels.

Starting point is 00:13:38 And so as you want to have that kind of more complete description, and you want to have systems that can really generalize and begin to actually answer experimental questions digitally that you could ask in the lab, you need to have kind of the right basis for modeling at every level. And so I think what's really unique about what we can do is to, as Priscilla and Mark were saying, you know, really build information at each of these different layers,

Starting point is 00:14:04 collect them, collect kind of those connection points, but then also really do it at the scale that will reveal that underlying information architecture. And that's going to be really critical to actually be able to build digital representations that can answer new experimental questions. One of the things that inspires me most about this effort is really what Priscilla said, which is like, well, there's so much we actually don't understand about biology and what if we could, which I think is actually very different from lots of other incredibly interesting and useful AI problems we attack, where we're like trying to replicate human behavior. And I'm like, a lot of that data is,

Starting point is 00:14:40 you know, on the internet or captured. And without pretending to understand all human behavior, you can predict a lot of it. I thought one of the most interesting things in your release was actually, you know, the like mechanistic interpretability stuff you alluded to, which is can we actually extract new knowledge from, you know, what the model believes is happening, right? Can you talk a little bit about that? Yeah, I'm really excited about that. So I think, you know, in mechanistic interpretability kind of traditionally it's been applied to large language models with the goal of understanding, you know, kind of what is the representation space of a large language model? How does it compute things? And does that really connect to, you know, what we understand about

Starting point is 00:15:21 our intuitive understanding of the world? And so there's, I think, this really rich toolkit that has been developed to start to be able to ask those questions. So kind of what does that mean for biology? One of the classes of models that we train are these protein language models. So they're really, you know, trained on the codes of proteins. And so anything they learn about biology is kind of emergent. And we've seen that they can learn things like biological structure and biological function. And that's just kind of emergent from this, you know, token prediction training task. So, you know, as we think about like mechanistic interpretability in those models, you know, we're really seeing the unknown because the models have

Starting point is 00:16:03 been trained on billions of protein sequences. They've been trained on, you know, both known and unknown biology. And yet they're developing these representations that start to kind of capture things that we can really see correspond to that reductive picture of biology that's been built up over the centuries. So kind of you can start to connect the dots between proteins where we kind of really don't know anything about them with proteins where we do know something because there's that kind of underlying structure, grammar that's linking them in the representation space of the model. And at the extreme, it could be, you know, we're going to understand systems in the body that we didn't before or the mechanism of action for a new treatment because we can ask the model,

Starting point is 00:16:46 right, interrogate that representation. That's right. The hope is that you kind of really learn the underlying basis for how it's making the predictions. And so you open up the black box and you can actually understand kind of the biology that the model is representing. So, asking for a friend, you know, you guys all believe in venture-backed companies as a way to have impact on the world. What was it like collecting data on zebrafish or the span of the data or the wet lab work or just the scale? Like what makes this a better fit for this big nonprofit, you know, ecosystem effort versus a venture-backed company? Well, I think we just want to give tools to the whole scientific community. And I mean, like so I think in order to have the

Starting point is 00:17:30 biggest impact. I mean, part of it is just we're, I mean, it's not actually clear that we couldn't run it as a business if we wanted to. I just think that we'll have a bigger impact by getting this in more scientists' hands quicker by doing it as open source projects instead. So, yeah, I mean, I think that that's kind of the approach. But I don't know, it's an interesting question. I'm not sure that, I mean, obviously you were doing it as a for-profit company, a bunch of the modeling before, then you run into certain issues. I mean, you have to raise a large amount of money in order to build compute

Starting point is 00:18:06 clusters. You know, I mean, it's, I think in a lot of ways, the data is actually even more of a constraint. And because if you look at like the scale of these models compared to language models, they're smaller, but they're smaller because the amount of data is less. In order to get the data, it's not just like there's some factory somewhere that you can pay to produce the data. Like, you actually need to invent new novel scientific approaches to be able to do the, you know, for example, the type of cellular engineering we're doing in New York or the

Starting point is 00:18:37 types of devices in Chicago, which is why, you know, when we're talking about this concept of frontier biology and frontier AI, the frontier biology is you need to do real science to advance different biological methods in order to be able to observe the things that create the data that go into the model. So it's not just like an off-the-shelf thing that you can create. Now, that's a pretty big effort. I don't know that there are like that many things like that that are done as biotechs. I think it's just the scale of the ambition of what we're doing, the time horizon over which we're committed to doing it. I think part of the theory is like if you're building tools that are this complicated, you kind of want to have a 10 to 15 year time

Starting point is 00:19:16 horizon on building out these efforts. And then the scale of capital required. I mean, I guess there's no rule that said that you couldn't do it as like an incredibly well-funded startup, but I think that this just made more sense. And then it also is simplifying strategically to not have to think about how you're going to make money with the different things. And we just, we want to get the models in people's hands. We release them as open source. I think that that's a very valuable thing to do. And again, I mean, the theory isn't that we're going to cure the diseases. We're not. It's that we want to help accelerate the pace of progress for the whole scientific field. As the person least experienced with making money here, I would say that

Starting point is 00:19:56 the sort of neutral nonprofit nature of our work actually helps harness more people to enter this effort. And to actually achieve the mission of like understanding the totality of human biology and to cure, prevent, manage all diseases, you actually do need the entire academic biotech industry to come together and to work on this in a sort of unified way, in part because there's a lot of talent out there, and it's not helpful to leave any talent, exclude any talent from the effort. And there's a super long tail of diseases. There are the common ones, and even the common ones, I think if you unbundle heart disease, cancer, neurodegenerative diseases, even if you unbundle like dementia or depression, there are many, many, many subcategories that become more

Starting point is 00:20:51 and more niche, and that's not even looking at the long, long tail of rare diseases. Those often get orphaned and don't get brought along when we're sort of looking at the most efficient way to impact the lives of many. But if you sort of decentralize the effort and put the tools in many people's hands, you start getting people who are like, you know what, I am super interested in spinal muscular atrophy. And that's something I care deeply about. And if you put the tools in that person's hands, they're going to be able to make progress in a way that, if you had to focus your efforts and make big bets, you probably wouldn't, because it's just a niche, individual, small-group disease that actually will in turn, if we can understand that disease process, help us unlock knowledge

Starting point is 00:21:38 about a lot more of how the human body works. Do you have any thoughts or predictions in terms of what disease areas this work will impact first? I know it's very hard to be predictive about these things. But just given the nature of the work and the nature of the models, are there areas you're most optimistic about in the short to medium term? That's actually not how I think about it, at least. The way I think about it is like we want to understand how biology works. In the ideal world, you would say, I understand the genetics of this person. So I want to think about people at the individual level. I want to understand the genetics of this person. I want to understand the risks they have to different illnesses. I want to

Starting point is 00:22:17 understand the mechanistic connection between, say, a gene variant, a protein, and a disease process. Because if you understand that through chain, then you can design a protein, design a drug, bespoke to them, and actually make an intervention. And right now, I'm sure we've all had experiences being sick. And if you have something that's even remotely non-standard, you go into PubMed, you look up a paper, you look up the supplement, and then you start going through the methods, and you're like, am I represented in this paper? And we're just making guesses. We really have no mechanistic understanding. We're saying, like, okay, you're kind of like these people that we studied.

Starting point is 00:23:03 And this drug kind of impacts the pathway that we think is implicated. Let's try and see if anything happens. And time passes, and sometimes it works and sometimes it doesn't. So my goal is to be able to treat the individual as an individual, understand the mechanisms, and be able to intervene. And there are different diseases that are at different stages of filling out that whole through line. And so for some diseases, you just want to understand which gene variants actually cause disease and which don't. And that in itself can be super empowering to patients. And beyond that, there are some diseases where we understand the chain, we just can't intervene and change a specific protein function. That's super

Starting point is 00:23:55 exciting too. Like if we could design a protein to actually change the physiology, then we can actually cure someone. But to me, like, that is just as exciting as contributing to our understanding of like how someone gets sick in the first place. Yeah. And so it's a very exciting vision because you're basically saying you can bring generalizable tools to provide very personalized things for each individual person. Yes. And that's the power of the approach is you have these big models that you build that can then apply anywhere. I know that you mentioned earlier that you were going to try and cure or prevent all diseases within 100 years. And you mentioned, hey, it could actually be sooner now, given all the advances in AI. Do you have some thought of when we think we'll be

Starting point is 00:24:35 closer to that goal or some? I mean, I'm optimistic it'll be sooner. I mean, I think the thing that's complicated is that it's a dynamic system, right? So if you fix something, there will obviously be future things that you need to work on. So I don't think that the current set of things that we're aware of are going to be the only things that need to get worked out. But I don't know. I think that the progress with AI is obviously, you know, very exciting on this. The other thing that I'd say, just adding to what you were saying a second ago, is we really look at more kind of systems than specific diseases. So, for example, one area that seems really important to understand is inflammation.

Starting point is 00:25:18 We talked about this a bunch. This is a big focus of the Chicago Bio Hub. There's a lot of data on that. It seems quite clear that it's connected to a bunch of different diseases, but we don't, rather than studying the specific diseases, we think that by trying to understand inflammation more broadly, that will make it so that other companies that can then use these tools can work on specific therapies. Another example is, I think that the immune system, I think, is a very good case to study for some of the work that we're doing in cellular engineering.

Starting point is 00:25:54 And when we kind of ladder up from proteins to cells to like whole dynamic systems within the body, I think that that one makes sense. I mean, it's sort of privileged. You know, the cells can travel around through the body, all that. You know, so obviously that has a big part in addressing different diseases. How do you make the immune system function better? But exactly how do you connect that last mile, I think is going to be more something that biotech or other academics individually studying things will be better suited to do. So this is like kind of how we think about building out the tool set that just helps accelerate

Starting point is 00:26:24 all these other folks. Whether the timeline is 10 years, hopefully, you know, less than 100 now, I think it's useful for maybe your average doctor or patient, human being, everybody's a patient, to think about like what's externally visible in the progress here. You worked with patients for a long time at UCSF. Like what should doctors look out for? What should people look out for if you're actually accelerating progress? This is the part, you know, I'm super excited about the progress, especially with this launch that Alex and his team have put forward. And I think it's very clear that science is going to start moving pretty quickly. And I think the thing that's less

Starting point is 00:27:09 clear to me is exactly how we translate to the clinic and what that looks like. And I think what has to change is actually the way we do clinical research. And my hope is that we're really shortening the distance between bench research and patient impact. But there's a lot of steps there where we need people who actually take care of patients to think creatively and think about how to deploy safely. And that's a gap that we have some work in. We partner with Jennifer Doudna on a CRISPR program at UCSF. So we're dipping our toe in and understanding how the deployment of research needs to change, given how quickly research will be progressing. But that one, I think, is still shaping up. Maybe I could say something about our most recent launch,

Starting point is 00:28:07 because I think it also kind of... You should actually talk explicitly about it. Yeah. Yeah. So, you know, because I guess it was just a week ago about now. So we announced the new ESMFold. And so this is basically an open system for scientific discovery in protein biology. It's a world model of protein biology that's been trained. It's language-model-based, so it's been trained on billions of protein sequences, and kind of learns these emergent representations of protein biology. And then we can use it to make predictions of atomic-resolution protein structure, and it's really fast. So it's blazing fast. It's kind of illustrating this Pareto-optimal frontier of kind of speed and accuracy in structure prediction. And so this allows us to kind of characterize, you know, really vast kind of stretches of the protein universe. So we folded over 1.1 billion proteins and predicted their structures and identified kind of features connecting all of them through mechanistic interpretability. But I think the thing that I thought was most exciting about this model is it's this really

Starting point is 00:29:21 general model of kind of protein biology. And so you can use it as a world model. You can actually really start to search the space of the world model to design new proteins. And it's really hitting state-of-the-art across pretty much every structure prediction benchmark, and especially on protein-protein interactions and protein-antibody interactions, which is really critical for therapeutic design. And so what we found is you can actually now use the model to design proteins and to design actually single-chain antibodies. And so you can do all of this digitally and then, you know, really in a small number of experimental trials,

Starting point is 00:30:02 basically like a 96-well plate, you know, select from hundreds of thousands of trajectories digitally, actually synthesize, you know, 96 proteins, test them in the lab, in a really kind of short, easy experimental cycle. And we found nanomolar binders there. And so, you know, that's really the level for therapeutic activity. So it's really, I think, showing that you can have these kind of general purpose models. We didn't design a model for antibodies. We didn't design a model to be able to bind one particular target. We just designed a model that could understand proteins, and you kind of get protein design as an emergent property. And then I also think it illustrates the power of open science and open source, because we release this as basically an open discovery engine.

Starting point is 00:30:57 And so really anyone can build on it. And so it takes what are these really intensive laboratory experiments where, you know, you have to screen through hundreds of thousands or millions of antibodies in high-throughput screens in the lab. And, you know, you can really just kind of spin up an instance and compute and now, you know, be able to generate antibodies. You should say more about sort of like we took that data when we did an antibody screen and then we validated it. We looked at PDL in cells. And then we looked at it under the cryo-EM and sort of how all that complemented, validated what you were seeing in the models. That's right. Yeah. So, I mean, I think it's really critical, you know, to actually go and characterize these molecules in the lab. And it's, you know, we have a structural biology center here. We have incredibly powerful cryo-EM microscopes.

Starting point is 00:31:52 And so we're really able to kind of look at these proteins biophysically and functionally. And so, you know, we designed proteins for several therapeutically relevant targets. And we're able to confirm their function. It's delightful when it works the way it's supposed to. Yeah, that's very amazing. We're able to look at the structure also. So you can see atomic resolution, kind of at the binding interface.
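A rough illustration of the digital-to-plate funnel described in this part of the conversation (many candidate designs scored digitally, then 96 synthesized and tested): the Python sketch below simply keeps the 96 best-scoring candidates. The scores are random stand-ins, and `select_plate` and the `design_N` identifiers are hypothetical names for illustration, not part of any released tool.

```python
import heapq
import random

PLATE_SIZE = 96  # one 96-well plate, as mentioned in the conversation


def select_plate(scored_designs, plate_size=PLATE_SIZE):
    """Return the plate_size highest-scoring (design_id, score) pairs, best first."""
    return heapq.nlargest(plate_size, scored_designs, key=lambda pair: pair[1])


# Hypothetical stand-in for in-silico scores; a real pipeline would use model outputs.
rng = random.Random(0)
candidates = [(f"design_{i}", rng.random()) for i in range(100_000)]

plate = select_plate(candidates)
print(len(plate))  # 96
```

In practice the interesting work is in how candidates are generated and scored; the selection step itself is just a top-k filter sized to the plate.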

Starting point is 00:32:14 Is correct. I know a lot of your work is really focused on basic research and kind of building out the fundamentals. If I look at actual translation into drugs or drug development, often a clinical trial will be 15 years, it'll cost $1.5 billion. About 50 million of that often is the molecule in preclinical work, and it's a few years of work. And the other 1.45 billion and decade plus is actually the drug development side of it. A lot of that seems to be gated on some regulatory issues, some of its recruitment, it's a variety of things, but a lot of it also has to do with the failure of drugs

Starting point is 00:32:43 and trials around things like absorption or toxicity or things like that. Have you considered at all tackling that other chain of sort of molecular design and thinking, or is the primary focus more on the basic biology and sort of the initial sort of molecules. I mean, at least my hope in building this like comprehensive model of how, you know, cells work is actually also being able to predict off-target effects. I think you can do some of that actually with biological models. Because right now some of the off-target effects are we just didn't know, you know, your kidney cell also expressed this receptor.

Starting point is 00:33:17 And then when we tested in humans, like we see it happening and we see renal toxicity. And if you have a single cell atlas that looks at all the different cell types, some of which actually were not predicted before we modeled them, you can start looking at which cells actually do have receptors for the target you thought you were exclusively targeting and be able to predict some of these downstream effects before we get into the human trials. And I think that that's actually one of the more exciting applications of like a transcriptomic model, to understand actually how the different cells will react when you intervene and do something. And, you know, but I think when you think about delivery mechanisms and patient care, that's where you start having to be creative about, when you ask like, what disease do you want to cure first. There are certain diseases that will be easier to, like, deliver a therapeutic to, or the risk reward makes more sense. And, you know, I think we were all inspired by Baby KJ, I think last year now, when the team at CHOP was able to deliver a CRISPR therapeutic to edit a mutation that he had that would have inevitably led him to significant neurotoxicity and altered his life. But that disease was very carefully chosen because we needed to target his liver cells.

Starting point is 00:34:53 And if we could easily deliver a product that would work in his liver. And I think that's when the creativity, the wherewithal to choose the right applications can help us unlock the first applications. Maybe something just to add to that also. You know, because, I mean, kind of you describe the conventional, you know, drug development process, right? And I think, you know, these kind of tools have the potential to have a lot of impact on that process. But, you know, what's interesting is to really start to think about kind of the new paradigms that can open up. And, you know, what does it mean if you can, you know, the barrier to develop a drug, to design a molecule, you know, to kind of get through all of those stages is so much lower.

Starting point is 00:35:37 And so you have programmable biology, and you can, you know, really start to, you know, create a medicine for every individual patient. I think that has enormous implications for how we, you know, how we do drug development and what the future of medicine looks like. It'll be an exciting day when the FDA accepts like a virtual clinical trial for the phase one or something or, you know, it's based on some person to view of that person. Yeah. Yeah. But even short of that, like thinking about the specific, like, mechanisms where you see this acceleration, like, I imagine if people feel like they can predict impact in kidney cells or have a stronger perspective on tox because they have this

Starting point is 00:36:15 broader understanding, they'll be willing to try many more programs, right? Yeah, the recruitment could also change. And we have this program, Rare As One. And the basic idea is that a lot of people focus on the most common diseases, but there's this long tail. And the economics don't quite work out for companies to focus on those diseases. But if you can make it so that the groups of patients can kind of come together and organize and say, hey, we would take an experimental drug on this, then it actually, because of the cost that you're talking about

Starting point is 00:36:47 and how that's a huge amount of the overall cost, if you can flip that, then it actually makes it, so that the economics make a lot more sense to then if you can generate something more easily and you can pair it with a group of people. I think one of the interesting things from science and engineering is that often, you know, you can hit your head against the wall on the common problems and in this case diseases. But a lot of times you learn a lot more about a system from finding some kind of rare or like weird side thing that's happening. Edge case. So I don't know, I think that that's like always been kind of an interesting part of this that actually connects pretty well to this because now you're going to be able to enable a long tail of new kind of

Starting point is 00:37:29 ideas to get tried and enable them to potentially get tested more easily. Yeah. That's a really good point on rare. In our rare disease cohorts, first of all, they're incredibly inspiring and powerful. But patient groups are self-organizing patient registries, natural history registries, biobanks. They're organizing their own clinical trials. There's gene therapy that one disease group has moved forward over the course of like, I want to say, like, three to five years rather than decades. And the speed is so fast because the patients themselves have organized the resources that a scientist or a clinician might need. And it's incredible. But I think to some degree you're going to need something like this because there are

Starting point is 00:38:18 going to be many more new things that can get created. But that doesn't mean that for like the general population that you're not going to want the same level of vetting that we've had historically. But making it so that people who want to be on more of the frontier have the ability to do that is, I think, also going to be pretty helpful. Yeah. Letting people opt in to be part of trials, I think, is one of the big shifts that is starting to happen but could really help accelerate biology in general. All three of you have mentioned at different points, like the power of open ecosystems in such a large space. Like, I think some of that logic around open source and the breadth or diversity of data collection, as you were describing, it should also

Starting point is 00:38:57 apply in the like language model world and the multimodal AI world. Like, do you think that's right? Does any of the work you're doing here change how you think about AI and Meta? I mean, I think it's sort of a similar philosophy overall. And, you know, Priscilla was talking about this, that, you know, a lot of our focus is building tools that empower individuals to do things. And that's sort of a common theme across a lot of the things that I work on is just kind of putting the technology in individuals' hands.

Starting point is 00:39:27 We don't believe in this, like, very centralized future where there should be a small number of institutions that basically are advancing all of the stuff. Our vision is not that there's going to be like some central superintelligence that solves all of science. I think like people are really important and I think we'll be more important in the future. And giving people more tools to be more productive is going to be like a critical part of any kind of positive future. That both and that's how progress has always been made historically, right? It's not through centralization. It's through empowering individuals to try things that are somewhat out of the mainstream that other people didn't think were good ideas because they thought they were good ideas that already have been done.

Starting point is 00:40:06 So I think that that's very central to the whole ethos of, I mean, to some degree, it's like why you create something like social media or to give people a voice. It's, you know, I think a lot of the stuff that I care about in terms of empowering people with individual AI. Open source is one instantiation of it. It's not the only way to do it. It certainly is one way that you basically are saying we're going to take this technology and put it in everyone's hands.

Starting point is 00:40:32 In terms of science, I think it really makes sense. and we're deeply committed to open source. There are obviously interesting considerations on this that are important too because there's a lot of considerations around biosafety and things like that that we're going to need to balance and think through how to how to handle. But I think overall this is like very deep in the ethos of the work that we're doing both at BioHub and like probably a theme for a lot of the stuff that I do is just like we believe that a positive future is one where you build a technology as a tool,

Starting point is 00:41:03 you put it in individuals' hands, and that's kind of how society makes progress. You have this, like, I think, an incredibly ambitious mission at BioHub, and yet, you know, the AI scientists that work here could also go work in commercial enterprises. How do you think about the talent and, like, how to bring people to BioHub? I mean, when you want to start? You know, yeah, I mean, it's a very hot market for AI researchers. But I think that part of what that means is that there's a lot of demand and you like if they're very in demand and can work on the things that they want to work on.

Starting point is 00:41:44 And I think this gets back to this point again about frontier AI and frontier biology. Right. So if, so yeah, I mean, I think like the AI researchers who work here could go work on on language models or things at any of the main labs. But those labs don't have the, frontier biology part attached to it. So I think that there's also a just very large mission component of this, which is like there's an ability to do this unique work here that you just can't really do it the other places. So if that's what your focus is, then this, then, you know, I don't actually think that there's any other organization in the world that's doing both the frontier biology and the frontier AI. Yeah. Why are you here, Alex? I mean, I think it's really simple.

Starting point is 00:42:30 Yeah. Our mission is take care of disease. And I think, you know, there's, it's just such a powerful. And you say it with a straight face and a less than 100-year timeline. It's very serious now. There's no more. Yeah. Yeah. It's a really powerful mission. And I think, you know, you, yeah. I mean, it's just, you know, scientists, I think are very motivated by that. It's something people are deeply motivated by. And I think, you know, we're at this moment in time where that actually seems like something that can be achieved. And I think, you know, we're building a really unique place where we're tackling that problem. And, you know, we have the resources. I think kind of the right things to actually really, really go after that and do that. Yeah. I mean, that resonates with me as somebody who, you know, talks to and hires a lot of research scientists. They want to know if you have the data, if you have the tools, if you have the compute, if you have the talent and then what the mission is. And so I actually think that's super competitive.

Starting point is 00:43:34 The other thing is that you don't need a very large team. Right. So I think it's like an interesting thing about the world is that people care about different missions. And that's good. I think that's like part of the whole, I mean, part of why building these tools and giving people the ability to explore what they care about, whether it's like across science or just across everything, is like such a powerful way to make progress in society is that people care about different things. And in order to make progress in AI, you don't need like many, many hundreds of

Starting point is 00:44:04 AI researchers or thousands or anything like that. I think you can really make progress with, you know, a very strong group of a dozen or a couple dozen people. And yeah, I mean, finding people who like care about this mission is not a particularly hard thing. I mean, this is like a super important thing in the world. So I think that that's, yeah, it's just kind of a cool thing about the world is that people obviously are drawn to different missions. So I think the simplest mental models that folks have, even if they're paying attention to the space, are essentially like, okay, you know, structure prediction models for proteins and protein-protein interaction models. And then so there's this one piece, which is fundamental

Starting point is 00:44:44 understanding. And then there's this like theory of someday we're just going to be able like zero shot things into either the clinic or the clinic with much, much better hit rate. What needs to happen for us to go from ESM Fold 2 to this other piece? Yeah. Is that feasible? I think that's a great question. I mean, I would say that I'm really optimistic on that. So I think, you know, on the one hand, you know, these are problems that historically, you know, people could spend kind of an entire career working on. Like, how do you figure out how to effectively optimize a drug? How do you get it, you know, get it through preclinical? How do you do the early safety. I think that, you know, when you have a new scientific paradigm, kind of, you know,

Starting point is 00:45:26 questions that were once hard kind of become simplified through the new paradigm. And so I'm very optimistic that kind of many of these core problems will be solved kind of in an emergent way through these models. And I think one great example of that is toxicity, where if you can kind of really digitally kind of simulate everything and be able to predict, you know, where a drug is going to distribute and bind across the human body. You know, like, you kind of have the beginning of a solution to that kind of problem. So I think that once you have these kind of accurate representations at the molecular level, you know, we're going to start to see really rapid progress on a lot of these core problems.
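To make the off-target idea from this discussion concrete (checking which cell types other than the intended one express the receptor a drug binds), here is a minimal Python sketch. Everything in it is a toy assumption: the atlas is a hand-written dictionary, `RECEPTOR_X` is a made-up gene name, the expression values and threshold are arbitrary, and `off_target_cell_types` is a hypothetical helper rather than any real atlas API.

```python
def off_target_cell_types(atlas, receptor, intended, threshold=1.0):
    """List cell types, other than the intended one, that express the receptor
    at or above the threshold (toy expression units)."""
    return sorted(
        cell_type
        for cell_type, expression in atlas.items()
        if cell_type != intended and expression.get(receptor, 0.0) >= threshold
    )


# Toy atlas: cell type -> {gene: expression level}. Values are invented.
toy_atlas = {
    "hepatocyte": {"RECEPTOR_X": 8.2},
    "kidney_tubule_cell": {"RECEPTOR_X": 3.1},
    "neuron": {"RECEPTOR_X": 0.0},
    "t_cell": {},
}

flagged = off_target_cell_types(toy_atlas, "RECEPTOR_X", intended="hepatocyte")
print(flagged)  # ['kidney_tubule_cell']
```

A lookup like this only flags where a target is expressed; predicting how those cells actually respond is the harder problem the speakers are describing.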

Starting point is 00:46:13 What is the most exciting use or experimentation with the models you've seen in the last week since release? Yeah, I mean, it's just been great to kind of see it get integrated in all kinds of things. I think one of the really interesting things that we've been seeing is people kind of connecting it with agentic systems to just kind of do automated design and kind of just automate that whole process. So it's really, I think, another example of how you can kind of see bringing together agentic and frontier AI with the ability to have a world model for biology and actually reason about biology. really kind of start to automate the entire design process. Are you taking, you know, how do you decide what the next step in the research agenda is? It's like world model for biology and then I could, I'm just going to be very coarse here, like I could scale it up, I could add more data, I could add, like adding data is a non-trivial thing in terms of new methods and domains. Like what is, do you take input from the larger ecosystem about, you know, how people are using it,

Starting point is 00:47:18 and what would make it more useful? Or is it really like we understand, like, the next step of structures or coverage that we're looking for? I mean, I think there's two things. So, like, we have a view on kind of the next big challenge, which I think is, you know, the virtual cell. And, you know, really being able to kind of ladder up the hierarchy of biological complexity to the cell. And... Sorry, very basic question. Yes.

Starting point is 00:47:40 Virtual cell model. Like, what is the input and output I should expect? Yeah. I mean, I think there are different views on that. But I think kind of what you ultimately want is a system that can really model each of the levels of complexity. So, you know, the proteomic layer, the genetic layer, the transcriptomic layer, and connect that to the phenotype. And you need enough generality so that you can ask the model questions about a new intervention in a

Starting point is 00:48:08 context that it hasn't been trained on and kind of get an answer from it. And, you know, the gap that we need to close as a field is being able to really make those predictions that can generalize. So that's going to require an enormous effort to generate data. Yeah, and then, I mean, in terms of what you decide to do next, I think this is like, you know, a pretty normal process of constraint management, right? I mean, it's like, I think every lab and every field across the world probably feels compute constrained.

Starting point is 00:48:40 I think that that's probably true here too, right? it's like, so, I mean, I know, like, you know, there's always questions. It's like, okay, should we double down more on advancing the protein piece? Should we do more of the cellular stuff? I think those are kind of ongoing debates in terms of how you sequence that. And then, yeah, within that, there's kind of being at the Pareto frontier about how much you want to train the different models in order to, like, and the size of the models is also dependent on the scale of the data that you have because, you know, for obvious reasons. So, yeah, I mean, I think it's, there's some of that is just where you want to be on the curves and the normal constraints. But I think that this is like

Starting point is 00:49:17 probably the same process that like any research organization goes through of like you want to go in all these different directions and you're just trying to constraint to optimize and make enough progress to do world class work at one thing at a time while planting some seeds that can blossom over the next couple years as well. Yeah. This has been the most dynamic period of technology at least I've seen over my career. I mean it's so exciting in terms of everything that's happening with AI. And every week there's something new that's changed. Are you tired or invigorated? I'm both. I feel like everyone feels. I feel like everybody's in this manic phase. Yes, it's a combination of invigorated and exhausted. Yeah, it's wonderful. And so I guess, you know, things are very

Starting point is 00:50:00 unpredictable right now. It's really hard to know what's coming. We have this almost like early signs of experimentation on the model side with agentic flows that we're starting to see in really interesting ways, models starting to help more and more with models. That's still very, very early days for that. If you're thinking back five years from now and you were to define what success was relative to your efforts, and I know things are very dynamic, things changed a lot, but you have this common thread of tooling for the biohub, you have a common thread of empowering scientists at scale. You're looking back five years from now. Is there a specific thing that you really want to make sure that you've accomplished or achieved or a primary goal?

Starting point is 00:50:37 Well, but I think we have a pretty clear view of this, like, hierarchical set of world models that we want to build around biology. And the other part of that is that we want to do the highest quality work in the world. Right. I mean, I think we're basically set up to do that between having a world-class AI research team and this collection of biohubs, which are world-class life sciences research organizations. I think that that's like fundamentally a setup that no other organization in the world has. But, you know, you can have a lot of great ingredients and that doesn't guarantee that you succeed. So, I mean, to me, like five years from now looking back, I think, you know, it's other, I'm sure other labs or efforts will try to produce, like, things that

Starting point is 00:51:21 approximate what we're trying to do. And I just think that we should be able to do something that is meaningfully better and a unique intellectual contribution to the world, right? I think that that's kind of what you, whenever you do any kind of research, that's what you're trying to do. Right? So, yeah, so if we do that, I think we'll all feel very good. I would also expect that at some point we'll just start seeing a lot more idea generation from the people using the models. But I have enough faith that that part will materialize, that for me it's more just about like making sure that we do world-class work. And I think if we do, like the rest almost will take care of itself. Very last question for you. Snapshot: it's mid-2026. What was the biggest update in your own

Starting point is 00:52:00 thinking about BioHub or the domain from the last year? Well, from the last year, I mean, you joined in the last year. I mean, I think the biggest thing that we basically rotated, and I think in the last year we basically kind of formalized that BioHub is the main focus of our philanthropy. So I think this is like a very big shift. But Alex and the team coming in, I think, has been interesting, not only because it's a world-class group, right? I mean, you guys have worked together for a while. I think also, I mean, you talked about how stuff is changing so much in the field. I think one thing that's underrated is like this is like an extremely talented group of people who also, like, know each other and work well together and like are stable and good and like I think that that also

Starting point is 00:52:46 is underestimated in terms of the compounding benefit of like people being able to like work well in a stable environment over time. So I think that that's a really important piece. But part of what we wanted to do was prior to Alex leading the effort, the previous leaders of the biohub were basically primarily biologists who were interested in technology. Right. And now I think this is the point where we really flipped that, right? Where, I mean, obviously you have a background in biology as well, but like you are primarily an AI researcher who has a background in in AI and in biology. I think that that's like a deep reflection on on kind of the way that we expect that this is going to kind of drive more value in the future. So those are

Starting point is 00:53:38 probably the biggest updates in the last year in terms of the work that we're doing. I mean, it's a new leader, not just the leader, but a team that I think has been, is like a really good. And then yeah, I mean, I think on the rest of the industry, it's like, it's on track. I mean, I think like it's kind of this crazy thing because like when you have an exponentially growing curve, I think the way that an exponential curve feels is it's growing so quickly that the kind of emotional feeling is it can't possibly keep going, right? Because like it's, because it's just like, but I mean, the nature of an exponential curve is it doesn't just keep going. It keeps accelerating, right? Exponential growth is accelerating. So I think that that has all

Starting point is 00:54:20 of these like emotions and psychology attached to it. But I think fundamentally when you look at the curve in the industry, the kind of fundamental thing is it is on track. It has remained on that curve, which I think has all these very profound implications for all of these domains. But certainly it validates and makes one feel very good about making a very big investment in the things that will play out if you stay on that track. And it seems like we are. So that I think is very good news. I think the most important aspect of what you're doing there is you're actually closing the loop with the actual biology. Because with code and research, it's closed loop systems. And so they're very fast to iterate. This is an open loop system. So you're closing a loop.

Starting point is 00:55:01 And that's really crucial to progress. Yeah. For me, one of the biggest changes with the strategy we're driving now and Alex at the helm is, you know, before we had amazing teams moving generally in the same direction and understanding like the potential collaborations and interconnectedness of our work. But now we are arms linked moving together with the single goal. It's very directed. And it's very exciting. It's a little bit scary. But it's like truly a team playing off each other and trying to make progress towards this goal.

Starting point is 00:55:42 And that has taken a lot of work, but also the maturity, our teams being able to have their work at a level of maturation where it actually does make sense to interlock. Amazing. Well, to teams being on the curve, thank you guys for doing this. Thank you for joining us. Thank you. Find us on Twitter at @NoPriorsPod.

Starting point is 00:56:04 Subscribe to our YouTube channel. If you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week. And sign up for emails or find transcripts for every episode at no dash priors.com.
