Dwarkesh Podcast - Evolution designed us to die fast; we can change that — Jacob Kimmel
Episode Date: August 21, 2025

Jacob Kimmel thinks he can find the transcription factors to reverse aging. We do a deep dive on why this might be plausible and why evolution hasn't optimized for longevity. We also talk about why drug discovery has been getting exponentially harder, and what a new platform for biological understanding to speed up progress would look like. As a bonus, we get into the nitty-gritty of gene delivery and Jacob's controversial takes on CAR-T cells. For full disclosure, I am an angel investor in NewLimit. This did not impact my decision to interview Jacob, nor the questions I asked him.

Watch on YouTube; listen on Apple Podcasts or Spotify.

SPONSORS

* Hudson River Trading uses deep learning to tackle one of the world's most complex systems: global capital allocation. They have a massive in-house GPU cluster, and they're constantly adding new racks of B200s to ensure their researchers are never constrained by compute. Explore opportunities at hudsonrivertrading.com/dwarkesh

* Google's Gemini CLI turns ideas into working applications FAST, no coding required. It built a complete podcast post-production tool in 10 minutes, including fully functional backend logic, and the entire build used less than 10% of Gemini's session context. Check it out on GitHub now!

* To sponsor a future episode, visit dwarkesh.com/advertise.

TIMESTAMPS

(00:00:00) – Three reasons evolution didn't optimize for longevity
(00:12:07) – Why didn't humans evolve their own antibiotics?
(00:25:26) – De-aging cells via epigenetic reprogramming
(00:44:43) – Viral vectors and other delivery mechanisms
(01:06:22) – Synthetic transcription factors
(01:09:31) – Can virtual cells break Eroom's Law?
(01:31:32) – Economic models for pharma

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
Today, I have the pleasure of chatting with Jacob Kimmel, who is president and co-founder of NewLimit,
where they epigenetically reprogram cells to their younger states.
Jacob, thanks so much for coming on the podcast.
Thanks so much for having me.
Looking forward to the conversation.
All right.
First question: what's the first-principles argument for why evolution just, like, discards longevity so easily?
Look, I know evolution cares about our kids, but if we have longer, healthier lifespans,
we can have more kids, right?
Or we can care for them longer.
We can care for our grandkids.
So is there some pleiotropic
effect that anti-aging medicine would have, which actually selects against you staying young for longer?
Yeah. So I think there are a couple different ways one can tackle this. One is you have to think about
what's the selective pressure that would make one live longer? Right. And encode for higher health
over longer durations. Do you have that selective pressure present? There's another which is,
are there any anti-selective pressures that are actually pushing against that? And there's a third
piece of this, which is something like the constraints of your optimizer. If we think about the genome as a
set of parameters and the optimizer as natural selection, then you've got some constraints on how
that actually works. You can only do so many mutations at a time. You have to kind of spend your
steps that update your genome in certain ways. So tackling those from a few different directions,
like what would the positive possible selection be? As you highlighted, it might be something
like, well, if I'm able to extend the lifespan of an individual, they can have more children,
they can care for those children more effectively, that genome should propagate more readily
into the population. And so one of the challenges then, if you're trying to think back in sort of a
thought experiment style of evolutionary simulation here would be, what were the conditions under
which a person would actually live long enough for that phenotype to be selected for, and how
often would that occur?
And so this brings us back to some very hypothetical questions, things like, what was the baseline
hazard rate during the majority of human and primate evolution?
The hazard rate is simply, what is the likelihood you're going to die on any given day?
And that integrates everything.
That's like diseases from aging.
That's getting eaten by a tiger.
That's falling off a cliff.
that's like scraping your foot on a rock and getting an infection and dying from that.
And so from the best evidence we have, the baseline hazard rate was very, very high.
And so even absent aging, you're unlikely to actually reach those outer limits of possible health,
where aging is one of the main limitations.
And so the number of individuals in the population that are going to make it late enough in that lifespan,
where using some of your evolutionary updates to try and actually push your lifespan upward would matter, is relatively limited.
So the amount of gradient signal flowing back to the genome then is not as high as one might intuitively think.
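To make the hazard-rate framing concrete, here is a minimal sketch in Python, with purely hypothetical hazard rates rather than real anthropological estimates, of how a constant per-year chance of death compounds into a small probability of ever reaching old age:

```python
# Illustrative only: hypothetical constant hazard rates, not real demographic data.
# Survival to age T under a constant annual hazard rate h is (1 - h) ** T.

def survival_probability(annual_hazard: float, years: int) -> float:
    """Probability of surviving `years` given a constant per-year chance of death."""
    return (1.0 - annual_hazard) ** years

for h in (0.01, 0.03, 0.05):          # hypothetical annual hazard rates
    for age in (15, 40, 65):
        p = survival_probability(h, age)
        print(f"hazard {h:.0%}/yr -> P(survive to {age}) = {p:.1%}")
```

Even a modest constant hazard rate leaves very few individuals alive at the ages where anti-aging alleles would pay off, which is the "not much gradient signal" point.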
Right. By the way, just on that, often people who are trying to forecast AI will discuss basically how hard did evolution try to optimize for intelligence?
And what were the things which optimizing for intelligence would have prevented evolution from selecting for at the same time, which would make it so that even if intelligence were a relatively easy thing to build in this universe, it would have taken evolution a long time to get to human-level intelligence.
And potentially, if intelligence really is easy,
then it might imply that we're going to get to superintelligence
and Jupiter-level intelligence, et cetera, et cetera,
the sky is the limit.
So one argument is, you know, like birth canal sizes, et cetera,
or the fact that, you know, we had to spend most of our resources
on the immune system.
But what you just hinted at is actually an independent argument:
if you have this high hazard rate,
that would imply that you can't be a kid for too long,
because, you know, the kids die all the time
and you've got to become an adult
so that you can have your kids.
Yeah, you've got to contribute resources back to the group.
You can't just be a freeloader.
You need to get calories, go out in the jungle, get some berries.
Like if you're just hanging out learning stuff for 50 years,
you're just going to die before you get to have kids yourself.
So obviously humans have bigger brains than other primates.
We also have a longer adolescence,
which helps us make use, potentially, of the extra capacity
our brain gives us.
But if you made the adolescence too long,
then you would just die before you get to have kids.
And if that's going to happen anyways,
what's the point of making the brain bigger,
aka, you know, maybe intelligence is easier than we think,
and there's a bunch of contingent reasons
that evolution didn't turn as hard on this variable as it could have.
I entirely agree with that particular thesis.
You know, I think in biology in general,
when you're trying to engineer a given property,
be it being healthier longer,
be it making something more intelligent.
And this is true even at the micro level
of trying to engineer a system
to manufacture a protein at high efficiency.
You always have to start by asking yourself,
Did evolution spend a lot of time optimizing this?
If yes, my job is going to be insanely hard.
If no, potentially there are some low-hanging fruit.
And so I think this is a good argument for why potentially intelligence wasn't strongly selected
for.
And I think actually the lifespan argument plays back into intelligence to a degree.
Interesting.
You start to ask, okay, if I have intelligence that's able to compound over time and I can
develop, you know, for instance, in some hypothetical universe, my fluid intelligence
lasts much longer into my lifespan.
If the number of people who are reaching something like 65 is very small in a
population, you're not necessarily going to select for alleles that lead to fluid intelligence
preservation late into life. This is actually part of my own pet hypothesis around some of the
interesting phenomenology in when discoveries are made throughout lifespans. So there are some famous
results where, for instance, I'm going to get the exact age a little bit wrong, but in mathematics,
most great discoveries happen roughly before 30. Why should that be true? That doesn't make sense.
You can sort of put down a bunch of societal reasons for it. Oh, maybe you sort of, you know,
become set in your ways. Your teachers have, you know, caused you to restrict your thinking by that
point. But really, that's true across centuries. That's true across many different unique cultures
around the world. That's true in both cultures from the East and cultures from the West. That seems unlikely to
me. I think a much simpler explanation is that for whatever reason, our fluid intelligence is roughly
maximized at the time where the population size during human evolution was maximal. If you had to
pick an age at which fluid intelligence was selected most strongly for, it's probably around 25 or 30.
That's probably about the age of the adults in the large populations that were being selected
for during the rest of evolution. And so I think there's a lot of reason here to think that
actually there's interplay between many features of modern humans and how long we were living
and how that dictates some of the features that occur that rise and fall throughout our lives.
So in one way, this is actually a very interesting RL problem, right? It's a long-horizon
RL problem, a 20-year horizon length, and then there's a scalar value of how many kids you have,
I guess that's a vibe, et cetera. And if you know how hard, or I don't know, but if you've heard from
your friends about how hard RL is on these models for just very intermediate goals that last
an hour or a couple hours, it's actually surprising that any signal propagates across a 20-year
horizon. By the way, on the point about fluid intelligence peaking, so not only is it the
case that in many fields achievement peaks before 30, in many cases, if you look at the greatest
scientists ever, they had many of their greatest achievements in a single year. So...
Yeah, the annus mirabilis. Yeah, exactly. Yeah, yeah, exactly. Yeah, Newton, what is it, optics,
gravity, calculus at 21.
Do you know the Alexander von Humboldt story?
No.
So Alexander von Humboldt, one of the most famous scientists in history, is kind of forgotten
now.
But he had this one expedition to South America where he climbed Mount Chimborazo at a time
when very few Europeans had done that.
And so he was able to observe various ecological layers that were repeated across latitudes
and across altitudes.
And it caused him to formulate an understanding of how selection was operating on plants
at different layers in the ecosystem.
And that one expedition was the basis of his
entire career. And so when you see something named Humboldt, just to give you a sense of how famous
this guy is, it's usually Alexander von Humboldt. It's not like this is some massive, prosperous
German family name that just happens to be really common. It's this one guy. Right, right. And so really,
it was like this singular year in which he conceived a lot of our modern understanding of botany and
selective pressure. Interesting. So that's one out of three components of the evolutionary story.
Yeah. So then the next piece of the evolutionary story is like, is there anything selecting against
longevity? Like, okay, let's just pretend everything I said was wrong. Can I still make an argument that
maybe evolution hasn't maximally optimized for our longevity? One argument that comes up,
and I'll caveat and say, I don't know how strong some of the mathematical models that people
put together here are. You can find people using the same idea to argue for and against.
But there's this notion of what's called kin selection, that if you sort of take a selfish gene
view of the world, that really this is the genome optimizing for the genome's propagation,
it's not trying to optimize for any one individual, then actually optimizing for longevity
is a pretty tricky problem because you have this nasty regularization term, which is that
if you're able to make a member of the population live longer,
but you don't also counteract the decrease in their fitness over time,
meaning you maybe extend maximum lifespan,
but you haven't totally eliminated aging.
Then the number of net calories contributed to the genome
as a function of that person's marginal year
and their own calorie consumption is less
than if you were to allow that individual to die
and actually have two 20-year-olds, for instance,
that sort of follow behind them.
And so there is a notion by which a population being laden demographically
with many aged individuals,
even if they did have fecundity persisting out some period later in life,
is actually net negative for the genome's proliferation,
and that really a genome should optimize for turnover
and population size at max fitness.
I love this idea of aging as a length regularizer.
So people might be familiar with the idea that when companies are training models,
they'll have a regularizer: you can do chain of thought,
but don't make the chain of thought too long.
And then you're saying, like, the number of calories you consume over the course of your life
is one such regularizer?
Yeah. That's interesting. Okay. And then the third point was...
The third piece is basically optimization constraints.
Yeah. So I think this is where another ML analogy is helpful, which is something like, well, actually a two-layer neural network is technically a universal approximator, but we can never actually fit them in such a way.
Yeah. And why does that occur? People will wave their hands, but it basically comes down to we don't really know how to optimize them, even if you can prove out in a formal sense that they are universal approximators.
And so I think we have similar optimization challenges, with our genome as the parameters and
evolution as the optimization algorithm. And one of those is that your mutation rate basically
bounds the step size you can take. So if you imagine that at each generation, you get some number
of inputs, you can select for some number of alleles. Well, the max number of variations in the genome
is set by your mutation rate. If you dial your mutation rate up too high, you probably get a bunch
of cancers. So you're selected against. If you have it too low, you can't really adapt to anything.
So you end up with this happy medium, but that limits your total step size. And then the number of
variants you can screen in parallel is basically limited by your population size. And so for most
of evolution, there were lots of forces constraining population size as well.
One of the dominant sources of selection on the genome is really prevention of infectious
disease. And it seems like when you study the history of early modern man, infectious
disease is actually what shaped a lot of our population demographics. And so there's a lot
of pressure pushing for those step sizes, those updates to the genome, really to be optimizing
for protection against infectious disease rather than other things. And so set aside the first
and the second of these arguments, you know,
positive selection being absent for longevity and potentially some negative selection existing.
You could, I think, construct a reasonable argument for why humans don't live forever,
why the genome hasn't optimized for that, simply based on these optimization constraints.
You have to imagine not only that the positive selection is there and the negative selection is absent,
but that when you think about sort of the weighted loss term of all the things the genome is optimizing for,
that the weight on longevity is high enough to matter.
And so even if you imagine it's there, if you simply imagine that the lambdas are dialed
toward infectious disease resilience more effectively, then you can construct
an argument for yourself. And so I think really when you start to ask, why don't we live forever?
Why didn't evolution solve this? You actually have to think about an incredibly contingent
scenario where both the positive selection is there, the negative selection is absent, and you have
a lot of our evolutionary pressure going toward longevity to solve this incredibly hard problem
in order to construct the counterfactual in which longevity is selected for and does arise
in modern man and in which we are optimal. And so I think that puts human aging and longevity and
health, really in this category of problem in which evolution has not optimized for it.
Ergo, it should be, relatively speaking, relative to a problem evolution had worked on,
easy to try and intervene and provide health.
And I think in many ways, the existence of modern medicines, which are incredibly simplistic,
we are targeting a single gene in the genome and turning it off everywhere at the same
time.
And yet the fact that these provide massive benefit to individuals is another positive
piece of evidence.
Antibiotics are an even more clear case of that because here is something that
evolution actually cares a lot about.
Right?
So it feels like antibiotics should have been...
Why didn't humans evolve their own antibiotics?
Yeah.
It's actually an excellent question that I haven't heard posed before.
So we think about where do antibiotics come from?
To your point, we could synthesize them.
They're largely just metabolites of other bacteria and fungi.
You think about the story of penicillin.
What happens?
Alexander Fleming finds some fungi growing on a dish.
And the fungi secrete this penicillin antibiotic compound.
And so there's no bacteria growing near the fungi.
And he says he has this light bulb moment of, oh, my gosh, they're probably making something
that kills bacteria.
Yeah. There's no prima facie reason that you couldn't imagine encoding an antibiotic cassette into a mammalian genome.
I think part of the challenge that you run into is that you're always in evolutionary competition. There's this notion of what's called the Red Queen hypothesis. It's an allusion to the story in Lewis Carroll's Through the Looking-Glass where the Red Queen is running really fast just to stay in place.
So when you look at sort of pathogen-host interactions or competition between bacteria and fungi that are all trying to compete for the same niche, what you find is they're evolving very rapidly in competition with one another. It's an arms
race. Every time a bacterium evolves a new evasion mechanism, the fungus that occupies the
niche will evolve some new antibiotic. And so part of why there is this competitiveness between the two
is they both have very large population sizes in terms of number of genomes per unit resource they're
consuming. There are trillions of bacteria in a drop of water that you might pick up. So there's
trillions of copies of the genome, massive analog parallel computation. And then at the same time,
they can tolerate really high mutation rates because they're prokaryotic. They don't have multiple
cells. So if one cell manages to mutate too much and it isn't viable or it grows too fast,
it doesn't really compromise the population as a whole. Whereas for metazoans, like you and I,
if even one of our cells has too many mutations, it might turn into a cancer and eventually
kill off the organism. So basically what I'm getting at, and this is a long-winded way of getting
there, is that bacteria and other types of microorganisms are very well adapted to building these
complex metabolic cascades that are necessary to make something like antibiotics. And they're
able to maintain that same mutation rate and population size in order to keep up the
competition. Even if our human genome stumbled into making an antibiotic, most pathogens probably would
have mutated around it pretty quickly. Actually, that should imply that there are, through
evolutionary history, millions of, quote unquote, naive antibiotics, which could have acted as
antibiotics, but now basically all the bacteria have evolved around them. Do we see evidence of these
historical antibiotics that some fungus came up with and a bacterium evolved around, and there's evidence
for a remnant in their DNA? I'm going a bit beyond my own knowledge here. So I want to say my
strong hypothesis would be yes. I can't point to direct evidence today. There are some examples of
this where, for instance, bacteria that fight off viruses that infect them, bacteriophages,
have things like CRISPR systems. And you can actually go and look at the spacers, the individual
guide sequences that tell the CRISPR system which genome to go after and where to cut. And you find
some of these guides that are very ancient. It seems like this bacterial genome might not have
encountered that particular pathogen for quite a while, and so you can actually get sort of an
evolutionary history of what was the warfare like, what were the various conflicts throughout
this genomic history, just by looking at those sequences. In mammals, where I do know a bit better,
we do have examples of this, where there is this co-evolution of pathogen and host.
Imagine you have some anti-pathogen gene A fighting off some virus X. Well, the virus then
updates, and the gene updates, so now you have virus X prime and anti-pathogen gene A prime.
Now, virus X prime goes away, but actually virus X still exists, and we've lost our ability to fight it.
Those examples really do happen, and so there's a prominent one in the human genome.
So we have a gene called TRIM5-alpha, and it actually binds an endogenous retrovirus that is no longer present, but was at one point actually resurrected by a bunch of researchers.
And it was demonstrated that it is the case.
We have this endogenous gene, which basically fits around the capsid of the virus, like a baseball in a glove, and prevents it from infecting.
And it turns out, if you look at the evolutionary history of that gene, and you trace it back through
monkeys, you can actually find that a previous iteration inhibited
SIV, which is the cousin of HIV in humans. And so old world monkeys
actually can't get SIV, whereas new world monkeys can and humans can, obviously.
And so it seems like what happened, and you can actually make a few mutations in
TRIM5-alpha and find that this is true, is that TRIM5-alpha once protected against an HIV-like
pathogen in the primate genomes. And then there was this challenge from this massive
endogenous retrovirus. And it was so bad that the genome lost the ability to fight
these HIV-like viruses in order to restrict this endogenous retrovirus.
And you can see it because that retrovirus integrates into our genome.
There are latent copies, remnants of this virus, all throughout
our DNA code.
And then this particular retrovirus went extinct.
Reasons unknown.
No one knows why.
But we didn't like re-update that piece of our host defense machinery to fight off HIV again.
And so we're in a situation where you can go in and take human cells and make just a couple
edits in that TRIM5-alpha gene.
And it's currently protecting against a virus which no longer exists.
And you can edit it back to actually
restrict HIV dramatically. So there are plenty of examples. You could imagine the same thing for
antibiotics. We're like, hey, this particular, you know, defense mechanism went away because
the pathogen evolved its own defense to it. Well, the pathogen might have lost that defense
long ago, and if you could sort of extract that historical antibiotic, that historical antifungal,
potentially it actually has efficacy. Isn't the mutation rate per base pair per generation,
like one in a billion or something? It's quite low. So you're saying that in our genomes we can find
some extended sequence which encodes how to bind specifically to the kind of virus that
SIV is. And the amount of evolutionary signal you would need in order to have a multiple
base pair sequence. So each nucleotide consecutively would have to mutate in order to finally get
the sequence that binds to SIV. That seems almost implausible. I mean,
I guess evolution works, so we can come up with new genes, right? But like how would that even
work out? I think a great explanation for understanding a lot of evolution and how you're able to
actually adapt to new environments, new pathogens, is that gene duplication is possible. And this explains
a whole lot. If you look at most genes in the genome, they actually arise at least at some point
in evolution from a duplication event. So that means you've got gene A, it's doing, you know,
it's performing some job, and then some new environmental concern comes along. Maybe it's a lack of a
particular source of nutrient. Maybe it's a pathogen challenging you. And maybe gene A, if it were to
dedicate all of its energies, so to speak, if you were to mutate it to solve this new problem,
could be adapted with a minimal number of mutations. But then you lose this original function.
So we have this nice feature of the genome, which is it can just copy and paste. And so
occasionally what will happen in evolution is you get a copy-paste event. Now I've got two copies
of gene A, and I can preserve my original function in the original copy. And then this new copy
can actually mutate pretty freely because it doesn't have a strong selective pressure on it.
So most mutations might be null. I've got two copies of the gene. I can have lots of mutations
in it accumulate, and nothing bad
really happens because I've got my backup copy, my original, and so you can end up with
drift. So you're saying that even though the per base pair mutation rate might be one in a
billion, if you've got 100 copies of a gene, then the sort of, like, mutation rate on a gene
or on a low-Hamming-distance sequence to the one you're aiming for might actually be quite
high, and so you can actually get the target sequence. It's not that the base rate goes up. It's not like
DNA polymerase is, you know, more erroneous or that you're just, like, doubling it. It's like,
oh, well, I've got two copies. That is true, but I don't think it's
the main mechanism. One of the main mechanisms that just makes it difficult for evolution to
solve a problem is that a mutation somewhere along the path of edits can break the gene.
Imagine there are three edits that take a host defense gene from restricting SIV to restricting
this new nasty PtERV endogenous retrovirus. Well, if one edit just breaks the gene, two edits just
break the gene, and three edits fix it, it's really hard for evolution to find a path
whereby you're actually able to make those first two edits, because they're net negative,
net negative for fitness.
And so you need some really weird contingent circumstances.
So through duplication, you can create a scenario
where those first two edits are totally tolerated.
They have no effect on fitness.
You've got your backup copy.
It's doing its job.
And so even though the mutation rate is low,
some of these edits actually aren't that large.
I forget the number of edits,
for instance, in TRIM5-alpha for this particular phenomenon
we're talking about, from memory,
but it's in like the tens.
It's not that you need massive kilobase scale rearrangements.
It's actually a fairly small number of edits.
And basically, you can just
align the sequence of this gene in New World
versus Old World monkeys and then for humans,
and you find there's a very high degree of conservation.
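As a toy illustration of that "align the sequences and count the differences" idea, here is a minimal sketch; the two sequences below are made up for demonstration and are not real TRIM5-alpha sequences:

```python
# Toy illustration of aligning two sequences and counting differences.
# seq_a and seq_b are invented, pre-aligned strings, not real gene sequences.

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length, aligned sequences differ."""
    assert len(a) == len(b), "sequences must be aligned to the same length"
    return sum(x != y for x, y in zip(a, b))

seq_a = "MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTAN"
seq_b = "MASGILLNVKEEVTCPICLELLTEPLSLDCGHSFCQACITAN"

d = hamming_distance(seq_a, seq_b)
print(f"{d} differences out of {len(seq_a)} positions "
      f"({100 * (1 - d / len(seq_a)):.0f}% identity)")
```

The point being illustrated is that a handful of substitutions against a largely conserved backbone is enough to retarget the gene.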
Conceptually, is there some phylogenetic tree of gene families
where you've got the transposons
and you've got like the gene itself,
but then you've got like the descendant genes,
which are like low Hamming distance?
I don't know.
Is there some conceptual way in which they're categorized?
You can arrange genes in the human genome
by homology to one another.
What you find is even in our current genome,
even without having the full historical record,
there are many, many genes which are likely resulting from duplication events.
One, like, trivial way that you can check this for yourself is, like, just go look at the names of genes.
And very often you'll see something where it's like gene one, gene two, gene three, or, you know, type one, type two, type three.
And if you then go look at the sequences, sometimes those names arise from like they were discovered in a common pathway and they have nothing to do with each other.
A lot of the time, it's because the sequences are actually quite darn similar.
And really what probably happened is they evolved through a duplication event and then maybe did some swapping with some other genes.
and you ended up with these quite similar,
quite homologous genes that now have specialized functions.
So it's like when evolution has a new problem to solve,
it doesn't have to start from scratch.
It starts from like,
what was the last copy of the parameters
for encoding a gene that is getting close to solving this?
Okay, let's do a copy paste on that
and then iterate and fine tune on those parameters
as opposed to having to start with like ab initio,
some random stretch of sequence somewhere in the genome has to become a gene.
Interesting.
Man, this is fascinating.
Okay, back to aging.
You'll have to cancel your evening plans.
I've got so many questions for you, and I...
Keep going.
So the second reason you gave was that there's selective pressure against people who get old but still keep living, because they're, like, slightly less fit.
They're suboptimal from a calorie input perspective.
Right.
The number of calories they can gather for the population.
And that's how people love thinking about their grandpas, you know.
Yeah, yeah.
Suboptimal as a calorie provider.
A total calorie provider right there.
Anyways, so a concern you might have about the effects of longevity treatments on your own body
is that you will fix some part of the aging process, but not the whole thing.
It seems like you're saying that you actually think this is the default way in which an anti-aging
procedure would work, because that's the reason evolution didn't optimize for it.
It's just that like, we're only fixing half of the aging process and not the whole thing.
Whereas sometimes I hear longevity proponents be like, no, we'll get the whole thing.
There's like going to be a source that explains all of aging and we'll get it.
Whereas your evolutionary argument for why evolution didn't optimize against aging relies on the fact that aging actually is not monocausal.
And evolution didn't bother to just fix one cause of aging.
Yeah, I think that's correct.
I don't think that there is a single monocausal explanation for aging.
I think there are layers of molecular regulation that explain a lot.
For instance, I have dedicated my career now to working on epigenetics and trying to change
which genes cells use because I think that explains a lot of it.
But it's not that there is like some upstream bad gene X and all we have to do is turn that off
and suddenly aging is solved.
And so I think the most likely outcome is that when we eventually develop medicines that
prolong health in each of us, it's not going to fix everything all at once.
There's not going to be a singular magic pill.
But rather, you're going to have medicines that add multiple healthy years to your life
that you can't otherwise get back.
But it's not going to fix everything at the same time.
You are still going to experience, with these first medicines,
some amount of decline over time.
And this gives you an example of if you think about evolution
as a medicine maker in this sort of anthropomorphic context,
why it might not have been selected for immediately.
What would the AI Foundation model for trading and finance look like?
It would have to be what the LLM is to NLP
or what the virtual cell is for biology.
And it would have to integrate every single kind of information
from around the world,
from order books to geopolitics.
Now, think about how insane this training objective is.
Here's this constantly changing environment
with input data that's incredibly easy to overfit to
where you're pitted against extremely sophisticated agents
who are learning from your behavior and plotting against it.
Obviously, there are very few things in the world
that are as complex as global capital allocation.
It's a system that reflects billions of live decisions in real time.
Now, as you might imagine,
training an AI to do all of this is a compute-intensive task.
That's why Hudson River Trading continually upgrades its massive in-house cluster, with fresh racks of brand-new B200s being installed as we speak and more on the way.
HRT executes about 15% of all U.S. equities trading volume, and researchers there get compensated for the massive upside that they create.
If the newest researcher on the team improves an HRT model, their contributions are recognized and rewarded right away, regardless of their tenure.
If you want to work on high stakes, unsolved problems,
unconstrained by your GPU budget,
check out HRT at hudsonrivertrading.com/dwarkesh.
All right, back to Jacob.
All right, so evolution didn't select for longevity.
What are you doing?
What's your approach at NewLimit that you think is likely to find the true cause of aging?
Yeah, so we're working on something called epigenetic reprogramming,
which very broadly is using genes called transcription factors.
I like to think about these as sort of the
orchestra conductors of the genome. They don't perform many functions directly themselves, but they bind
specific pieces of DNA, and then they tell the genome which genes to turn on, which genes to turn off.
They eventually put chemical marks on top of DNA and on the proteins that DNA wraps around. And this is one of
the answers, this particular layer of regulation, called the epigenome. It's the answer to this
fundamental biological question of how do all my cells have the same genome, but ultimately do very
different things. Your eyeball and your kidney have the same code, and yet they're performing
different functions, and that may sound a little bit simplistic, but ultimately I think it's
kind of a profound realization.
And so that epigenetic code is really what's important for cells to define their functions.
That's what's telling them which genes to evoke from your genome.
What has now become relatively apparent is that the epigenome can degrade with age.
It changes.
The particular marks that tell your cells which genes to use can shift as you get older.
This means that cells aren't able to use the right genetic programs at the right times to
respond to their environment.
You're then more susceptible to disease.
You have less resilience to the many insults that you might experience.
And our hope is that by remodeling the epigenome back toward the state it was in when you were young, right after development,
you'll be able to actually address myriad different diseases
for which one of the strong contributing factors is that cells are less functional than they were at an earlier point in your life.
So we're going after this by trying to find combinations of these transcription factors
that are able to actually remodel the epigenome, so that they can bind to just the right places in the DNA
and then shift the chemical marks back toward that state
when you were a young individual.
If you were just making these broad changes to a cell state
through these transcription factors,
which have many effects,
are there other aspects of a cell state
that are likely to get modified at the same time
in a way that would be deleterious,
or would it be a sort of straightforward effect on cell state?
Oh, how I wish it were straightforward.
No, it's very likely.
Each of these transcription factors binds
hundreds to thousands of places in the genome.
And one way of thinking about it is if you imagine the genome is sort of the base components
of cell function, then these transcription factors are kind of like the basis set in linear
algebra.
It's different combinations and different weights of each of the genes.
And so most of them are targeting pretty broad programs.
And there are no guarantees that aging actually involves moving perfectly along any of the
vectors in this particular basis set.
And so it's probably going to be a little tricky to figure out a combination that actually
takes you backward.
There's, again, no guarantee from evolution that
it's just a simple reset. And so a critical part of the process that we run through,
as we try and discover these medicinal combinations of transcription factors we can turn on,
is to ensure that they are not only making an aged cell revert to a younger state. We measure that
a couple different ways. One is simply measuring which genes those cells are using. They use
different genes as they get older. You can measure that just by sequencing all of the mRNAs,
which are really the expressed form of the genes being utilized in the genome at a given time.
You see that aged cells use different genes. Can I revert them back to a younger state?
Colloquially we call this, you know, a "looks like" assay. Can I make an old cell look like a young one
based on the genes it's using? And maybe more importantly, we drill down to the functional
level and we measure, can I actually make an aged cell perform its functions, its actual roles
within the body, the same way a young cell would? And these are the really critical things you
care about for treating diseases. Can I make a hepatocyte, a liver cell in Greek, function better
in your liver, so it's able to process metabolites like the foods you eat, and how it's able to
process toxins like alcohol and caffeine? Can I make a T-cell respond to pathogens and other
antigens that are presented within your body? So these are the ways in which we measure age. And so we
need to ensure that not only does the combination of TFs that we find actually have positive
effects along those axes, but we then want to also measure any potential detrimental effects
that arise at the same time. So there are canonical examples where you can seemingly reverse the age
of a cell, for instance, at the level of a transcriptome, but simultaneously you might be changing
that cell's type or identity. So Shinya Yamanaka, a scientist who won the Nobel in 2012 for some
work he did in about 2007, discovered that you could just take four transcription factors,
and actually just by turning on these four genes, turn an adult cell all the way back
into a young embryonic stem cell. This is a pretty amazing existence proof that shows that you can
reprogram a cell's type and a cell's age simultaneously just by turning on four genes. Out of the
20,000 genes in the genome, the tens of millions of biomolecular interactions, just four genes is
enough. That's a shocking fact. And so we actually have known for many years now that you can
reprogram the age of a cell, the challenge is that simultaneously you're doing a bunch of other
stuff, as you alluded to. You're changing its type. And that might be pathological. If you did that in the
body, it would probably cause a type of tumor called a teratoma. So we measure not only at
the level of the genes a cell is using, do you still look like the right type of cell? Are you still
a hepatocyte? Are you still a T-cell? If not, that's probably pathological. But you can also
use that same information to check for a number of other pathologies that might develop. Did I make
this T-cell hyper-inflammatory in a way that would be bad? Did I make this liver cell potentially
neoplastic, proliferating too much even when the organism is healthy and undamaged?
And you can check for each of those at the level of gene expression programs, and then likewise
functionally. Before you put these molecules in a human, you actually just functionally check
in an animal. You make an itemized list of the possible risks you might run into. Here are the ways it
might be toxic. Here are the ways it might cause cancer. Are we able to measure deterministically
and empirically that that doesn't actually occur? Okay. This is a dumb question, but it will
help me understand why an AI model is necessary to do any of this work. So you mentioned the
Yamanaka factors. From my understanding, the way he identified these four transcription factors
was that he found the 24 transcription factors that are associated with, that have high expression
in, embryonic cells, and then he just turned them all on in a somatic cell. Basically, he systematically
removed factors from this set until he found the minimal set that still induces
a cell to become a stem cell, and that just doesn't require any fancy AI models, et cetera.
Why can't we do the same things for the transcription factors that are associated with younger cells,
or express more in younger cells as opposed to older cells, and then keep eliminating from them
until we find the ones that are necessary to just make a cell young?
I wish it were so easy.
You're entirely right.
You know, Shinya Yamanaka was able to do this with a relatively small team with relatively
few resources and achieve this remarkable feat.
So it's entirely worth asking.
Why can't a similar procedure work for arbitrary problems in reprogramming cell state, whether it be trying to make an age cell act like a young one, disease cell act like a healthy one, why can't you just take 24 transcription factors and randomly sort through them?
So there were two features of Shinya's problem that I think make it amenable to that sort of interrogation that aren't present for many other types of problems.
And this is why he's such a remarkable scientist.
Most of science is problem selection.
You don't actually get better at pipetting or running experiments after a certain age, but you do get better at picking what to do.
And he's amazing at this.
So the first feature is that measuring your success criterion is trivial in the particular case he was investigating.
He's starting with somatic cells that in this case were a type of fibroblast, which literally is defined as cells that stick to glass and grow in a dish when you grind up a tissue.
So it sounds fancy, but it's a very simplistic thing.
So he's starting with fibroblasts.
You can look at them under a microscope, and you can see they're fibroblasts just based on how they look.
And then the cells he's reprogramming toward are embryonic stem cells.
So these are tiny cells.
They're mostly nucleus.
They grow really, really fast.
They look different.
They detach from a dish.
They grow up into a 3D structure.
And they express some genes that will just never be turned on in a fibroblast by definition.
So actually, how he ran the experiment was he just set up a simple reporter system.
So he took a gene that should never be on in a fibroblast, should only be on in the embryo.
And he put a little reporter behind it so that these cells would actually turn blue when you dumped a chemical on them.
And then he ran this experiment in many, many dishes with millions upon millions of cells.
The second really key feature of the problem is this notion that those cells he's converting into amplify.
They divide and grow really quickly.
So in order for you to find a successful combination, you don't actually need it to be efficient, almost at all.
The original efficiency Yamanaka published, the number of cells in the dish that convert from a somatic state back into an induced pluripotent stem cell state, is something like a basis point or a tenth of a basis point.
So like 0.01% or 0.001%.
If these cells were not growing and they were not proliferating like mad, you probably would never be able to detect that you had actually found anything successful.
It's only because success is easy to measure once you have it and even being successful in very rare cases, one in a million, amplifies and you can detect it that this, I think, was amenable to his particular approach.
So in practice, what he would do is dump these factors or this group of 24 minus some number eventually whittling it down to four.
He would dump these onto a group of cells.
And over the course of about 30 days, just a few cells in that dish, like a countable number on your fingers, would actually reprogram.
But they would proliferate like mad.
They form these big what we call colonies because it's like a single cell that just proliferates and forms a bunch of copies of itself.
They form these colonies.
You can see with your eyeballs by holding the dish up to the light and looking for opaque little dots on the bottom.
You don't need any fancy instruments.
And then you could stain them with this particular stain and they would turn blue based on the genetic reporter he had.
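A toy simulation of the whittling-down logic described here; the `forms_colonies` function is a stand-in for the real roughly 30-day reprogramming assay, simulated with a hidden 4-factor core (the actual answer was Oct4, Sox2, Klf4, and c-Myc), just to show the elimination loop:

```python
# Simplified sketch of the elimination logic behind a Yamanaka-style screen.
# forms_colonies() is a *simulated* assay: we pretend a hidden 4-factor core is
# what matters, standing in for "reporter turns blue, colonies appear on the dish."

CANDIDATES = {f"TF{i:02d}" for i in range(20)} | {"OCT4", "SOX2", "KLF4", "MYC"}  # 24 candidates
HIDDEN_CORE = {"OCT4", "SOX2", "KLF4", "MYC"}   # unknown to the "experimenter"

def forms_colonies(factors: set) -> bool:
    """Simulated assay: reprogramming works only if the hidden core is still present."""
    return HIDDEN_CORE <= factors

def minimize_factor_set(candidates: set) -> set:
    """Repeatedly drop any factor whose removal still yields colonies."""
    current = set(candidates)
    progress = True
    while progress:
        progress = False
        for factor in sorted(current):
            trial = current - {factor}
            if trial and forms_colonies(trial):
                current = trial        # this factor was dispensable
                progress = True
    return current

print(minimize_factor_set(CANDIDATES))  # -> the 4-factor core
```

The sketch only works because each "assay" is a cheap, unambiguous yes/no, which is exactly the property Jacob argues the aging problem lacks.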
So now we look at those key features of the problem.
And we pick any other problem we're interested in, say aging, so that's what I'm going to pick for explanation.
How difficult is it to measure the likelihood of success or whether you've achieved success for cell age?
Well, it turns out age is much more complicated in terms of discriminating function than actually just comparing two types of cells.
An old liver cell and a young liver cell, prima facie, actually look pretty darn similar.
It's actually quite nuanced the ways in which they're distinct.
And so there isn't a simple trivial system where you just like label your one favorite gene or you can just give the young cells cancer.
They'll grow, you know.
Yeah, yeah.
You want to see them.
Just make the old ones cancer, and then they'll grow.
Yeah, Dwarkesh, you've solved it for me.
So there's no trivial way that you can tell whether or not you've succeeded.
You actually need a pretty complex molecular measurement.
And so for us, a real key enabling technology, and I don't think our approach would really
have been possible until it emerged, is something called single-cell genomics.
So you now take a cell, rip it open, and sequence all the mRNAs it's using.
And so at the level of individual cells, you can actually measure every gene that they're
using at a given time and get this really complete picture of a cell state, everything
it's doing, lots of mutual information to other features.
And from that profile, you can train something like a model that discriminates young and
aged cells with really high performance.
It turns out there's no one gene that actually has that same characteristic.
So unlike in Yamanaka's case where a single gene on or off is like an amazing binary
classifier, you don't have that same feature of easy detection of success in aging.
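A minimal sketch of that kind of age classifier, using synthetic expression profiles in place of real single-cell RNA-seq data; the diffuse many-gene shift below is invented purely to mimic the "no single marker gene" situation being described:

```python
# Sketch: train a discriminative model on cells labeled young vs. aged.
# All data here is synthetic; a real pipeline would use normalized single-cell counts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 500

young = rng.normal(0.0, 1.0, size=(n_cells // 2, n_genes))
aged = rng.normal(0.0, 1.0, size=(n_cells // 2, n_genes))
aged[:, :50] += 0.5          # a diffuse signal spread over many genes, no single marker

X = np.vstack([young, aged])
y = np.array([0] * (n_cells // 2) + [1] * (n_cells // 2))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

The model can separate the two states even though no individual gene does, which is the contrast with Yamanaka's single-reporter readout.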
The second feature is, as you highlighted, we can't just turn these into cancer cells.
Success doesn't amplify.
And so in some ways, the bar for a medicine is higher than what Yamanaka achieved in
his laboratory discovery. You can't just have 0.001% success and then wait for the cells to grow
a whole bunch in order to treat a patient's disease or, you know, make their liver younger,
make their immune system younger, make their endothelium younger. You need to actually have it
be fairly efficient across many cells at a time. And so because of this, we don't have the same
luxury Yamanaka did of taking a relatively small number of factors and finding a success case within
there that was pretty low efficiency. We actually need to search a much broader portion of TF space
in order to be successful.
And when you start playing that game
and you think, okay, how many TFs are there?
Somewhere between 1,000 and 2,000,
depends on exactly where you draw the line
and developmental biologists love to argue
about this over beer.
But let's call it 2000 for now.
And you want to choose some combination,
let's say you guess it's like somewhere
between one and six factors might be required.
The number of possible combinations
is about 10 to the 16.
So if you do any like math on the back of a napkin,
in order to just screen through all of those,
you would need to do many orders of magnitude
more single-cell sequencing
than the entire world has done
to date cumulatively across all experiments. And so it's just not tractable to do exhaustively.
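A quick back-of-the-napkin check of that combinatorial estimate:

```python
# Combinations of 1 to 6 transcription factors chosen from ~2,000 candidates.
from math import comb

n_tfs = 2000
total = sum(comb(n_tfs, k) for k in range(1, 7))
print(f"{total:.2e} possible combinations")  # roughly 1e16-1e17, depending on the TF count
```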
And so that's where actually having models that can predict the effect of these interventions
comes in. If I can do a sparse sampling, I can test a large number of these combinations.
And I can start to learn the relationship of what a given transcription factor is going to do
to an age cell. Is it going to make it look younger? Is it going to preserve the same type?
I can learn that across combinations. I can start to learn their interaction terms.
Now I can use those models to actually predict in silico for all the combinations I haven't seen,
which are most likely to give me the state I want,
and you can actually treat that as a generative problem
and start sampling and asking which of these combinations
is most likely to take my cell to some target destination in state space.
In our case, I want to take an old cell to a young state,
but you could imagine some arbitrary mappings as well.
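A rough sketch of that model-guided loop; the TF names, the multi-hot encoding, and the "youth score" readouts below are all synthetic stand-ins for real measurements, and the surrogate model is just one arbitrary choice:

```python
# Sketch: fit a surrogate on measured combinations, then rank unmeasured ones in silico.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
tfs = [f"TF{i}" for i in range(200)]            # toy TF vocabulary

def encode(combo) -> np.ndarray:
    """Multi-hot encoding of a TF combination."""
    v = np.zeros(len(tfs))
    for t in combo:
        v[tfs.index(t)] = 1.0
    return v

# Hypothetical training data: combinations already tested in the lab, plus a
# made-up scalar readout (e.g. a "youth score" from an expression-based classifier).
measured = [tuple(rng.choice(tfs, size=3, replace=False)) for _ in range(500)]
scores = rng.normal(size=len(measured))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(np.stack([encode(c) for c in measured]), scores)

# Score a sample of unmeasured combinations and nominate the top few for the next round.
candidates = [tuple(rng.choice(tfs, size=3, replace=False)) for _ in range(5000)]
preds = model.predict(np.stack([encode(c) for c in candidates]))
for score, combo in sorted(zip(preds, candidates), reverse=True)[:5]:
    print(f"{score:+.2f}  {'+'.join(combo)}")
```

The point is the loop structure: sparse lab measurements train the model, the model ranks the untested space, and the top-ranked combinations go back into the lab.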
And so I think as you get to these more complex problems
that don't have the same features that Shinia benefited from,
which were the ability, again, to measure success really easily.
You can see it with your bare eyes.
You don't even need a microscope.
And two, amplification.
As you get into these more challenging problems,
you're going to need to be able to search a larger fraction of the space to hit that higher bar.
So we can think of these transcription factors as these basis directions,
and you can get like a little bit of this thing, a little bit of that thing, in some combination.
And evolution has designed these transcription factors to, is that your claim,
have relatively modular, self-contained effects
that work in predictable ways with other transcription factors?
And so we can use that same handle for our own ends?
Yeah, yeah, that would be very much my contention. And one piece of evidence for this is that that's the way development works. You know, it's kind of a crazy thing to think about, but you and I were both just like a single cell, and then we were a bag of undifferentiated cells that were all exactly alike. And then somehow we became humans with hundreds of different cell types all doing very different things. And when you look at how development specifies those unique fates of cells, it is through groups of these transcription factors that each identify a unique type. And in many cases, actually, the groups of transcription factors, the sets that specify very different fates,
are actually pretty similar to one another.
And so evolution has optimized for being able to just swap one TF in or swap one TF out of a combination
and get pretty different effects.
And so you have this sort of local change in sequence or gene-set space
leading to a pretty large global change in output.
And then likewise, many of these TFs again are duplicated in the genome.
And because mutations are going to be random and they're inherently small changes at the
level of sequence at a given time, evolution needs a substrate where in order to function
effectively, these small changes can give you relatively large changes in phenotype.
Otherwise, it would just take a very long time across evolutionary history for enough mutations
to accumulate in some duplicated copy of the gene for you to evolve a new TF that does
something interesting.
And so I think, due to that evolutionary constraint, we're actually in most cases in biology
in a regime where small edits lead to meaningful phenotypic changes, a relatively favorable
regime for generic gradient-like optimizers.
You know, it would be maybe a little bit overstating to say evolution is using the gradient, but there is a system kind of like, if you've heard of evolution strategies, where basically you can't take a gradient on your loss. So you make a bunch of copies of your parameters, you randomly modify them, you evaluate your loss on each copy, and then you use those evaluations to estimate a gradient in parameter space. That's kind of how I imagine evolution is working. And so you need lots of those little edits to actually lead to meaningful step sizes in terms of the ultimate output that you have.
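A minimal sketch of the evolution-strategies style update described here, on a toy objective; the loss function, step sizes, and population size are all arbitrary choices for illustration:

```python
# Evolution-strategies sketch: no gradient of the loss is available, so perturb the
# parameters, evaluate each perturbation, and use the scores to estimate an update.
import numpy as np

def loss(theta: np.ndarray) -> float:
    """Black-box objective (toy example: squared distance from an arbitrary target)."""
    return float(np.sum((theta - 3.0) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(10)
sigma, lr, population = 0.1, 0.02, 64

for step in range(200):
    noise = rng.normal(size=(population, theta.size))      # copies with random edits
    scores = np.array([loss(theta + sigma * n) for n in noise])
    advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_estimate = (noise.T @ advantages) / (population * sigma)
    theta -= lr * grad_estimate                             # descend the estimated gradient

print(f"final loss: {loss(theta):.4f}")
```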
Interesting. You're just like designing the LoRA that goes on top of...
Yeah, yeah, in a way. And to think, like, you know, why would transcription factors,
maybe this is getting a little bit too gigabrained about it, but like, why does the genome even have transcription factors?
Like, what's the point? Why not just have, every time you want a new cell type, you engineer some new cassette of genes or some new totally de novo set of promoters or something like this?
I think one possible explanation for their existence, rather than just an appreciation for their presence, is
that having transcription factors allows a very small number of base pair edits at the
substrate of the genome to lead to very large phenotypic differences. If I break a transcription
factor, I can delete a whole cell type in the body. If I retarget a transcription factor
to different genes, I can dramatically change when cells respond and have, you know,
hundreds of their downstream effector genes change their behavior in response to the environment.
And so it puts you in this regime where transcription factors are a really nice substrate to
manipulate as targets for medicines. In some ways, they might be
evolution's levers upon the broader architecture of the genome. And so by pulling on those same
levers that evolution has gifted us, there are probably many useful things we can engender upon biology.
Yeah, you're sort of hinting that if we analogize it to some code base, we're going to find
a couple lines that are, like, commented out, that are like de-aging, you know, and then you just uncomment
them. I don't know about that, but if I can give you, I'll give you like a real
cringe analogy that sometimes I deploy, but it requires a very special audience. I think you'll
probably be one who fits into it. You're flattering our listeners. Only cringe listeners will
appreciate it, but your audience will love this. I don't know about your audience, but you will.
You can kind of think about it like, you know, if you think about how attention works,
like queries, keys, values. TFs are kind of like the queries, the genome sequences they bind to are
kind of like the keys, and genes are kind of like the values. And it turns out that structure
then allows you, very efficiently in terms of editing space, to change just one of those
embedding vectors, in this case one of those sequences, and get dramatically different performance
or total outputs.
And so I do think it's kind of interesting how these structures recur throughout biology, you know,
in the same way that the attention mechanism seems to exist in some neural structures.
I think it's kind of interesting that you can very easily see how that same sort of querying
and information storage might exist in the genome.
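A toy rendering of the queries/keys/values analogy; the dimensions and vectors below are arbitrary and purely illustrative, showing only that retargeting a single "key" changes which "values" a query pulls on:

```python
# Toy attention: TFs as queries, the DNA motifs they bind as keys, downstream
# gene programs as values. Perturbing one key redirects the output, analogous
# to a small binding-site edit having a large phenotypic effect.
import numpy as np

rng = np.random.default_rng(0)
d = 8
queries = rng.normal(size=(3, d))   # "TFs"
keys = rng.normal(size=(5, d))      # "binding motifs"
values = rng.normal(size=(5, d))    # "downstream gene programs"

def attend(Q, K, V):
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ V

before = attend(queries, keys, values)
keys[2] = rng.normal(size=d)         # "mutate" a single binding motif
after = attend(queries, keys, values)
print("output shift per TF:", np.linalg.norm(after - before, axis=1).round(2))
```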
Interesting.
Yeah, a previous guest and a mutual friend, Trenton Bricken, had a paper in grad school
about how the brain implements attention.
Yeah, Eddie Chang has found, like, positional encodings probably exist in humans using Neuropixels.
Really? If you haven't read these papers. Oh, yeah. So he implants these Neuropixels probes into individuals, and then he's able to talk to them and look at them as they read sentences. And what he finds is certain representations which function as a positional encoding across sentences. So they fire at a certain frequency, and it just increases as the sentence goes and then, like, resets. And so it seems exactly like what we do when we train large language models, where you've got some sort of positional encoding function.
It's so funny that the way we're going to learn how the brain works is just by trying to first-principles
engineer intelligence in AI.
And then it just happens to be the case that each one of these things has a neural correlate.
Gemini's CLI just one-shotted an automated producer for me in one hour.
Basically, I wanted this interface where I could just paste in a raw episode transcript
and then get suggestions for Twitter clips and titles and descriptions and some other copy,
all of which cumulatively takes me about half a day to write.
Honestly, it was just extremely good.
I described the app I wanted and then asked Gemini to talk through how I would go about implementing it.
It walked through its plans.
It asked me for input where I hadn't been sufficiently clear.
And after we ironed out all the details, Gemini just literally one-shotted the full working application.
With fully functional backend logic.
Making this app literally took 10 minutes, including installing CLI.
Then I spent 50 minutes fine-tuning the UI and messing around.
And by the way, this process did not involve me actually editing or even looking at any of the code.
I would just tell Gemini how I wanted things moved around.
And the whole UI would change as Gemini rewrote the files.
Despite building and then fine-tuning an entire working application, the session context didn't even get 10% exhausted.
This is just a super easy and fast way to turn your ideas into useful applications.
You can check out Gemini CLI on GitHub to get started.
All right, back to Jacob.
If you're right that transcription factors are the modality evolution has used to have complex phenotypic effects and optimize for different things, two-part question.
One, why haven't pathogens, which have a strong interest in having complex phenotypic effects on your body, also utilized transcription factors as the way to fuck you over and steal your resources?
And two, we've been trying to design drugs for centuries.
Why aren't all the big drugs, the top-selling drugs, ones that just modulate transcription factors?
Yeah, yeah, why don't we have a million of these pills? Okay, I'll try and take those in stride; they have pretty different answers. First answer is there actually are pathogens that utilize transcription factors as part of their lifecycle. A famous example of this is HIV. HIV encodes a protein called Tat, and Tat actually activates NF-κB. Sorry, to back up a little bit: HIV, as a retrovirus, starts out as RNA, turns itself into DNA, and shoves itself into the genome of your CD4 T cells. And so then it needs this ornate machinery to actually control when it makes more HIV and when it goes
latent so it can hide and your immune system can't clear it out. And this is why HIV is so pernicious
is you can kill every single cell in the body that's actively making HIV with like a really
good drug. But then a few of them that have like lingered and hunkered down just turn back on. And so
people call this the latent reservoir. Same with Hep B, right? Well, Hep B and Hep C can both do this sort of latent behavior. And so HIV is probably the most pernicious of these. And one way it does it is that this gene called Tat actually interacts with NF-κB. NF-κB is a master transcription factor within
immune cells. Typically, if I'm going to, like, horribly reduce what it does and some immunologists
can crucify me later, it, like, increases the inflammatory response of most cells. They become
more likely to attack given pathogens around them on the margin. And so HIV will turn on NF-κB activity and then use that to drive its own transcription and its own lifecycle. I can't remember all the details now of exactly how it works, but part of this circuitry is what allows it, in some subset of cells where some of that upstream transcription factor machinery in the host might be deactivated, to go latent. And so as long as the population of cells it's infecting always has a few that are turning off the transcription factors upstream that drive its own transcription, then HIV is able to persist in this latent reservoir within human cells. So that's just one example offhand. Then there are a number of other pathogens, and unfortunately I don't have quite as much
molecular detail on some of these, but they will interface with other parts of the cell that eventually
result in transcription factor translocation to the nucleus and then transcription factors being
active. This actually segues a little bit to your second question on why aren't there more medicines
targeting TFs. In a way, I think many of our medicines ultimately downstream are leading to
changes in TF activity, but we haven't been able to directly target them due to their physical
location within cells. And so we go several layers upstream. If you think about how a cell works
in sensing its environment, it has many receptors on the surface, it has the ability to sense mechanical
tension and things like this. And ultimately, most of what these signaling pathways lead to is to tell the
cell, use some different genes than you're using right now. That's often what's occurring. And so that
ultimately leads to transcription factors being some of the final effectors in these signaling cascades.
So a lot of the drugs we have that, for instance, inhibit a particular cytokine that might bind a
receptor or they block that receptor directly or maybe they hit a certain signaling pathway. Ultimately,
the way that they're exerting their effect is then downstream of that signaling pathway. Some
transcription factor is either being turned on or not turned on, and you're using different
genes in the cell. And so we're kind of taking these like crazy bank shots because we can't
hit the TFs directly. So it sort of begs the question, like, why can't you just go after the TF
directly? Traditionally, we use what are called small molecule drugs, where they're defined just by
their size. The reason they have to be small is they need to be small enough to wiggle through
the membrane of a cell and get inside. And then you run into a challenge, which is if you want to
actually stick a small molecule between two proteins that have a pretty big interface, meaning like
they've got big swaths on the side of them that all, you know, sort of line up and form a synapse with one another, then you would need a big molecule in order to inhibit that. And it turns out that a TF binding DNA is a pretty darn big surface. And so small molecules aren't great at disrupting
that and certainly even worse at activating it. So small molecules can get all the way into the nucleus,
but they can't do much once they're there. They're just too small. And then the other classic
modalities we have are recombinant proteins. We make a protein, like a hormone, in a big vat. We grow it in some Chinese hamster ovary cells, we extract it, and we inject it into you. This is how, for instance, the human insulin we make today works. Or you make antibodies. Antibodies are produced by the immune system; they run around and find proteins that have a particular sequence, bind to them, and often just stop them from working by glomming a big thing onto the side.
So those are too big to get through the cell membrane, so then they can't actually get to a
TF or do anything directly.
So we take these bank shots.
So what changes that today, and why I think it's pretty exciting, is we now have new nucleic acid and genetic medicines, where you can, for instance, deliver RNAs to a cell. They can get through using tricks like lipid nanoparticles: you wrap them in a fat that looks kind of like a cell membrane. It can fuse with a cell and put the mRNAs in the cytosol. You can make a copy of a transcription factor there, and then it translocates to the nucleus the same way a natural one would and exerts its effect.
And likewise, there are other ways to do this using things like viral vectors.
But I think we've only very recently actually gotten the tools we need to start addressing
transcription factors as first-class targets rather than treating them as like maybe some
ancillary third-order thing that's going to happen.
Interesting. So the drugs we have can't target them directly. But your claim is that a lot of drugs actually do work by binding to the things we can target, and those then have some effect on transcription factors.
So this brings us to questions about delivery, which is the next thing I want to ask you.
You mentioned lipid nanoparticles. This is what the COVID vaccines were made of.
The ultimate question, if we're going to work on de-aging, is how do we treat every single cell in the body? Even if you identify the right transcription factors to de-age a cell, and even if they're shared across cell types, or you figure out the right ones for every single cell type.
How do you get it to every single cell in the body?
Yeah.
How do you deliver stuff?
How do you get them in there?
So I think there are many ways one could imagine solving it.
I'll sort of narrow the scope of the problem to saying,
I think delivering nucleic acid is a pretty good first order primitive.
Ultimately, the genome's nucleic acids, the RNAs that come out of it are nucleic acids.
So if you can get nucleic acid into a cell, you can drug pretty much anything in the genome effectively.
So you can reduce this problem to asking, how do I get nucleic acids wherever I want them to any cell type very specifically?
So today, there are two main modalities that people use, both of which have some downsides.
The first one that we've touched on already is lipid nanoparticles.
These are basically fat bubbles.
And by default, they get taken up by tissues which take up fat, like the liver.
And they can be used sort of like Trojan horses.
So they can release some arbitrary nucleic acid, usually RNA, maybe encoding your favorite genes,
in our case, transcription factors, into the cell types of interest.
You can play with the fats, and you can also tie stuff onto the outside of the fat.
Like you can attach a part of an antibody, for example, to make it go to different cell types in the body.
And I think the field is making a lot of progress on being able to target various different cell types with lipid nanoparticles.
So even if nothing else worked for the next several decades, I think companies like ours would have more than enough problems to solve with the cells that we can actually target.
Another prominent way people go after this is using viral vectors.
The basic idea being viruses have had a lot of evolutionary history and very large population sizes. They've evolved to get into our cells. Maybe we can learn something from them; they're even better Trojan horses. So one type of virus people use a lot is called an AAV. Those AAVs carry DNA genomes, so you can get genes, whole genes, into cells. They've got some packaging size limits. You can think of it kind of like a very small delivery truck, so you can't put everything you want into it.
They can go to certain cell types as well.
And then, on top of just where you actually get the nucleic acid to begin with, you can engineer the sequences a bit. And that basically allows you to add like a NOT gate on it. You can make it turn off the nucleic acid in certain cell types,
but you're never going to use the sequence engineering to get nucleic acid into cells
where it didn't get delivered in the first place.
So you can sort of start broad with your delivery vector
and then use sequence to narrow down
to make it more specific, but not the other way around.
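As a toy sketch of that logic (the cell types here are invented for illustration; one common real-world version of the NOT gate, as I understand it, is adding microRNA target sites that silence the cargo in tissues you want to spare):

```python
# Delivery defines the upper bound; sequence engineering can only subtract from it.
delivered = {"hepatocyte", "Kupffer cell", "cardiomyocyte"}   # where the vector physically got in
not_gate  = {"Kupffer cell"}                                   # cell types where the engineered off-switch fires

expressing = delivered - not_gate
print(expressing)                      # {'hepatocyte', 'cardiomyocyte'}

# No amount of sequence engineering adds a cell type the vector never reached:
assert "neuron" not in expressing
```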
So I think both of those methods are super promising.
Again, if nothing else emerged for decades,
we'd still have tons and tons of problems
as a therapeutic development community
to solve even using just those.
I do think I have one sort of very controversial opinion
which, you know, people can roast me for later.
You have just one?
You're trying to solve aging?
You think you have only one?
I have many controversial opinions.
One of them is that I think both of these probably in the limit will not be the way that we're delivering medicines in the year 2100.
If you think about viral vectors, no matter what, they are always going to be somewhat immunogenic.
You're always going to have your immune system trying to fight them off.
You can play tricks.
You can try and cloak them, et cetera, et cetera.
But they're always going to have some toxicity risk.
They also don't go everywhere.
It's not that we have examples of like a single viral species that infects every cell type in the body and we just need to engineer it to make it safe.
We would have to also engineer the virus to go to new cell types.
So there's some limitations there.
L&Ps likewise have some problems.
They can go to tons of cell types.
That's what largely we're working on.
We're super excited about it.
But there are some physical constraints.
They just have a certain size,
and they have to get from your bloodstream,
out of your bloodstream,
toward a given target cell,
and they have to not fuse into any of the other cells along the way.
So there's a whole gauntlet they have to run.
Ultimately, I think we're probably going to have to solve delivery
the way that our own genome solved delivery.
So we have this same problem that arose during evolution,
which is how do I patrol the body,
find arbitrary signals in the environment and then deliver some important cargo there when some
set of events happens. How do I, you know, find a specific place and only near those cell types
release my cargo? And really, the problem was solved by the immune system. So we have cell types
in our body, T cells and B cells, which are effectively engineered by evolution to run around and invaginate whatever tissues they need to. They can climb almost anywhere in the body; there's almost nowhere they can't get access. And then once they sense a particular set of signals,
and they've got a very ornate circuitry to do this.
They run basically AND-gate logic. They can release a specified payload.
And right now, the way our genome sets them up,
the payload they release is largely either enzymes
that will kill some cell that they're targeting
or kill some pathogen or some signal flares
that call in other parts of the immune system
to do the same thing.
So that's super cool.
But you can think about it as a modular system
that evolution's already gifted us.
We've got some signal and environmental recognition systems
so we can find particular areas of the body
that we want to find, and then some sort of payload delivery system. I can deliver some arbitrary
set of things. And I imagine if we were to Rip Van Winkle ourselves into 2100 and wake up,
the way we will be delivering these nucleic acid payloads is actually by engineering cells to do it
to perform this very ornate function. Those cells might actually live with you. You probably will
get engrafted with them, and they might persist with you for many years. They deliver the medicine
only when the environment within your body actually dictates that you need it. And so you actually won't be seeing a physician every time this medicine is needed; rather, you'll have a more ornate, responsive circuit.
The other exciting thing about cells is that they're big and they have big genomes.
And so you actually have a large palette to encode complex infrastructure and complex circuitry.
So you don't need to limit yourself to like the very small RNAs you can get in that might encode a gene or two,
or in our case a few transcription factors.
You don't have to limit yourself to this tiny AAV genome that's only a few kilobases.
You've got billions of base pairs to play with in terms of encoding all your logic.
So I think that's ultimately how delivery will get solved.
We've got many, many stepping stones along the way.
But if I could clone myself and work on an even riskier endeavor, that's probably what I would do.
This is actually, I mean, in a way, we treat cancer this way with CAR-T therapy, right? We take the T cells out and then we tell them, go find a cancer with this receptor and kill it. But is the reason that works that the cancer cells they're trying to target are also free-floating in the blood? Is that what they target?
Basically, could this deliver to literally every single cell in the body?
Not literally every single cell.
I'll like asterisk it there.
So for example, T cells don't go into your brain. They can, but it's generally a pathology when they get in there.
So it's not like literally every cell.
But almost every cell in your body is surveilled by the immune system.
So there are very, very few what we call immune privilege compartments in your body.
It's things like the joints of your knees and your shoulders, your eyeball, and your brain, basically.
There might be a couple of these.
I think the ear probably falls into that category.
A funny way of thinking about this is that all the gene therapy people using viruses want to deliver to the immune-privileged compartments
because their drugs are immunogenic
and they're limited to a very, very small set of diseases.
So in a way, it's like the shadow of all the diseases
you can address with viruses
is what you can address with cells.
And given the complementarity between them,
it's like, okay, you can probably cover the entire body.
And so they can't literally go everywhere,
but I think your analogy to the CART work
is very apt as well,
where you can think about that two-component system,
I've got some detection mechanism
for the environment I want to sense
to perform some function.
and then I have some sort of payload that I deliver.
CAR-Ts engineer the first of those and leave the second exactly the same as the immune system does. So they engineer the T cell to go recognize some other antigen that you wouldn't usually target, some protein on the surface of a cell, for instance.
And then deliver the payload you would usually deliver
if it was infected by a virus
or if you saw that it was foreign in some way,
whereas cancer cells usually don't actually look that foreign.
Most of their genes are the same genes
that are in your normal genome,
and that's why it's hard for the immune system to surveil them.
Interesting.
You know, it's funny that whenever we're trying to cure infectious diseases, we have to deal with: fuck, viruses have been evolving for billions of years since our oldest common ancestor, they know exactly what they're doing, and it's so hard. And then whenever we're trying to do something else, we're like, fuck, the immune system has been evolving for billions of years, it knows what it's doing, and how do we get past it?
Yeah. Yeah, the Red Queen race is like
quite sophisticated. If you want to just like throw a new tool into biology, you somehow have to get around
one side of that equation.
Right. Given the fact that this is somewhere between impossible and very far away, and it's necessary for fully curing aging, does that mean that in the short run, in the next few decades, we'll have some parts of our body which will have these amazing therapies, and then other parts which will just be stuck the way they are?
So you mentioned hepatocytes are some of the cells that you're able to actually study and deliver to, and these are our liver cells. So you're saying, look, I can get drunk as much as I want, and it's not going to have an impact on my long-run liver health, because then you'll just treat me with this therapy. But for the rest of my body, it's going to age as normal. What is the implication of the fact that delivery seems to be lagging far behind your understanding, or at some point will lag behind your understanding, of aging?
Yeah. Just to give the delivery folks
credit, they're currently ahead. There are currently no reprogramming medicines for aging, and there are medicines that deliver nucleic acid. So, like, they're still winning the race against us right now. But to your point, I hope the lines cross. I hope we out-compete them.
So I do think, actually, even if you were able to only target some subsets of cells,
it's not that you would see, like, this strange Frankensteinian benefit in health in some
aspects and lack of benefit entirely in others.
I think what we found across the history of medicine is that actually the body's an
incredibly interconnected complex system.
And if you're able to rescue function, even in one cell type and one tissue, you often have
knock-on benefits in many places that you didn't initially anticipate.
One way we can get examples of this is through transplant experiments.
So both in bone marrow and in liver, for example, we have fairly common transplant procedures that occur in humans.
And so we can compare old humans who get livers from young people or old people.
And in a way, ask a pretty controlled question.
What occurs as a function of just having a young liver?
Is it that, for example, you can eat a lot of fatty food and drink a lot and be fine?
Or is it that actually you see broader benefits?
And the latter seems to be true.
They have reduced risk of several other diseases and overall better survival as a function of having a younger liver than they do for an older one.
Suggesting that actually because these tissues are so interconnected, many of these organs like the liver, like your adipose tissue or endocrine organs, they're also sending out signals to many other places in your body, helping coordinate your health across multiple tissue systems.
Even just one tissue can benefit other tissue systems in your body at the same time.
HSCs are another example. I'll summarize; these are mostly examples taken from a wonderful book by Frederick Appelbaum, who trained with Don Thomas, the physician who pioneered human bone marrow transplants. There are many circumstances where patients got a bone marrow transplant and it actually cured another disease they had as a result, maybe unanticipated, where even just the replacement of this one special cell type, HSCs, has knock-on effects throughout the body.
You know, there were symptoms of these diseases that presented in myriad ways throughout their system, but ultimately the root cause was even just a single cell type.
There are counter examples as well
where you can go into animals
and break even just one gene
in one specific subset of T cells.
You can break a gene in there
that encodes for a transcription factor in their mitochondria called TFAM, and you actually dramatically shorten the lifespan of mice. One gene in one special type of T cells can give you that type of pathology.
And so it sort of implies
the inverse may also exist.
Is this related to why Ozempic has so many downstream positive effects
that seem even not totally related
to its effects
just on making you leaner?
Yeah, I think it's one example.
Because it is a hormone,
and your endocrine system coordinates a lot of the complex interplay between your tissues,
I don't think the story is fully written yet on exactly why GLP-1 and GIP,
broadly, incretin mimetic medicines like Ozempic,
have so many knock-on benefits,
but I think they're a great example of this phenomenon.
If someone told you, I'm going to find a single molecule,
and I'm going to drug it, and it's not only going to have benefits for weight loss,
but also for cardiovascular disease, also possibly for addictive behavior, and maybe even preventing
neurodegeneration, you would have told them they were crazy.
And yet, just by acting on the small number of cells in your body, which are receiving this
signal, the interplay and the communication between those cells and the rest of your body
seems to have many of these knock-on benefits.
So it's just one existence proof.
Very small numbers of cells in your body can have health benefits everywhere.
And so even if cellular delivery does not emerge by 2100, as I imagine it will, then I still
think that you're going to have the ability to add decades of healthy life to individuals
by reprogramming the age of individual cell types and individual tissues.
Interesting.
How big will the payload have to be?
How many transcription factors?
Yeah.
I think just a countable number.
I think some of those that we've found today that have efficacy are, you know, somewhere between one and five, and that's a small enough number that you can encapsulate it in
current mRNA medicines.
So already in the clinic today, there are medicines that deliver many different genes as RNA.
So there are medicines where, for instance, it's a vaccine as a combination of flu and COVID
proteins, and they're delivering 20 different unique transcripts all at the same time.
And so when you think about that already is a medicine that's being injected into people in
trials, the idea of delivering just a few transcription factors is seemingly quotidian.
And so, thankfully, I don't think we'll be limited by the size of the payloads that one can
deliver.
One other really cool thing about transcription factors is that the endogenous biology is very
favorable for drug development.
The expression level of transcription factors in your genome, relative to other genes, is incredibly low.
So if you just look at the rank-ordered list
of what are the most frequently expressed genes
in the genome by the count of how many MRNAs are in the cell,
transcription factors are near the bottom.
And that means you don't actually need to get that many copies
of a transcription factor into a cell
in order to have benefits.
And so what we've seen so far,
and what I imagine will continue to play out,
is that even fairly low doses of these medicines,
which are well within the realm of what folks have been taking
for now more than a decade,
are able to induce really strong efficacy. And so we're hopeful that not only will the actual size of the payload in terms of
number of base pairs not be limiting, but the dose shouldn't be limiting either.
And is it, would it have to be a chronic treatment or could it just be a one-time dose?
In principle, it could be one time. I think that would be an overstatement for today.
But I can sort of talk you through the evidence from like the first principles back to
the reality of like what's the hardest thing we have in hand. So epigenetic reprogramming is basically
how the cell types in our bodies right now are able to adopt the identities that they have. And
the existence proof that those epigenetic reprogramming events can last decades is that my tongue
doesn't spontaneously turn into a kidney. So these epigenetic marks can persist for decades throughout
a human life or, you know, hundreds of years if you want to take the example of a bowhead whale,
which uses the same mechanism. And we also know that with very targeted edits, other groups
have done this, folks like Luke Gilbert, now at the Arc Institute, who I think of as one of the great unsung scientists of our time, have been able to make a targeted edit at a single locus and then show that it persists as cells divide 400-plus times over multiple years in an incubator in the lab.
So imagine like a hot house where you're just trying as hard as you can to break this mark down,
and it can actually persist for many years.
Other companies have actually now dosed some editors similar to the ones that Luke developed in his lab in monkeys
and shown they last at least a couple years.
So in principle, the upper bound here is really long.
You could potentially have one dose and it lasts a very long time, you know, potentially decades,
as long as it took you to age the first time, maybe.
We don't have data like that today, so we don't want to overstate it.
We do have data that these positive effects can last several weeks after a dose.
And so you could imagine, even without many leaps of faith, up toward this upper bound limit of what's possible, just from the data we have in hand now, that you could get doses every month, every few months, and actually have really dramatic benefits that persist over time, rather than needing, for instance, to get an IV every day, which might not be tractable.
So we've got 1,600 transcription factors in the human genome.
Is it worth looking at non-human TFs and seeing what effects they might have, or are they unlikely to be the right search space?
I think it's less likely.
I think you have a prior that evolution has given you a reasonable basis set for navigating the states that human cells might want to occupy.
And in our case, we know that the state we're trying to access is encoded by some combination of these TFs.
It does arise in development, obviously.
We're trying to make an old cell look young, not look like some Frankenstein cell that's never been seen before.
That said, we don't have any guarantees that the way aging progresses is by following the same basis set of these transcription factor programs in the genome that are encoded during development.
So I don't think it's unreasonable to ask, would your eventual ideal reprogramming medicine necessarily be a composition of the natural TFs?
Or would it include something like TFs from other organisms as you posit or even entirely synthetic transcription factors as well?
Things like SuperSOX.
SuperSOX is a particular publication from Sergiy, I might mispronounce his last name, Velychko,
where they mutated the SOX2 gene,
and they made iPSC reprogramming more efficient.
So they could take somatic cells
and turn them into pluripotent stem cells
more effectively than you could
with just the canonical Yamanaka factors,
which are Oct4, SOX2, KLF4, and MYC.
iPSC reprogramming never happens in nature.
So there's no reason to necessarily believe
that the natural TFs are optimal.
And so even really simple optimizations,
like just mutagenizing one of the four Yamanaka factors
we already know about
or swapping some domains between a few TFs,
seem to improve things dramatically.
So I think that's a pretty good signal
that actually there's a lot of gradient to climb here
and that potentially for us,
the end-state products we're developing in 2100
are more like synthetic genes that have never existed
rather than just compositions of the natural set.
What about the effects of aging, which are...
Okay, so I don't know, your skin starts to sag
because of the effects of gravity over the course of decades.
Is that a cellular process?
How would some cellular therapy deal with that?
The best evidence is that it's probably not cellular.
So the reason your skin sags is there's a protein in your skin called elastin, which does exactly what you'd think it would based on the name.
It kind of keeps your skin elastic-y like a waistband and holds it to your face.
So you have these big polymerized fibers of elastin in your face.
And as far as we understand it, you only polymerize it and form a long fiber during development.
And in the rest of your life, you make the individual units of the polymer.
But for reasons that, as far as I can tell, no one understands, they fail to polymerize.
And you can't like make new long cords to hold your skin up to your face.
So I think the eventual solution for something like that is likely that you need to program cells to states that are extra-physiological. There might not be a cell in your body that does this; it's not just that a young skin cell from a 20-year-old is better at making these fibers. As far as we can tell, they aren't. But you could probably program a cell to be able to reinvigorate that polymerization process, to run along the fiber and repair it in places where it's damaged.
Obviously, these things get made during development, so it's totally physically feasible for this to occur.
Maybe there's even a developmental state which would be sufficient to achieve this.
I don't think anyone knows, but that would be the kind of state that one might have to engineer de novo,
even if our genome doesn't necessarily encode for it explicitly.
Interesting.
Okay. What is Eroom's Law?
Eroom's Law is a funny portmanteau created by a friend of mine, Jack Scannell, where he inverted the notion of Moore's Law, which is the doubling of compute density on silicon chips every few years. So Moore's Law has graciously given us massive increases in compute performance over several decades.
And Eroom's Law is the inverse of that,
because in biopharma, what we're actually seeing
is that there's a very consistent decrease
in the number of new molecular entities,
so new medicines that we're able to invent
per billion dollars invested.
And this trend actually starts way back in the 1950s
and persists through many different technological transitions
along the way.
So it seems to be an incredibly consistent feature
of trying to make new medicines.
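To put rough numbers on that consistency: the figure usually cited from Scannell and colleagues' Eroom's Law paper is that new drugs approved per billion inflation-adjusted R&D dollars has halved roughly every nine years since 1950; the 1950 baseline below is made up purely for illustration.

```python
halving_period_years = 9            # the commonly cited Eroom's Law halving time
baseline_drugs_per_billion = 30.0   # illustrative 1950 starting point, not a real datum

for year in range(1950, 2021, 10):
    factor = 0.5 ** ((year - 1950) / halving_period_years)
    print(year, round(baseline_drugs_per_billion * factor, 2))
# By 2010, the same (inflation-adjusted) billion dollars buys roughly 1% of the 1950 output.
```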
So in a weird way,
Eroom's Law is actually very similar
to the scaling laws you have in ML,
where you have this very consistent logarithmic relationship of you throw in more inputs and you get consistently
diminishing outputs. The difference, of course, is that this trend in ML has been used to
raise exponentially more investment and to drive more hype towards AI. Whereas in biotech,
you know, modulo NewLimit's new round, it has driven down valuations, driven down excitement and energy.
With AI, at least you can sort of internalize the extra cost and the extra benefits
because there's a general purpose model you're training.
So this year you spent $100 million training a model, next year a billion dollars, the year after that $10 billion.
But it's one general purpose model,
unlike we made money on this drug
and now we're going to use that money to invest in 10 different drugs
in 10 different bespoke ways.
Okay, anyways, I was gearing up to ask you: what would a general-purpose platform look like for biotech, where even if you had diminishing returns, at least you have this sort of less bespoke way of designing drugs?
Okay.
I'm going to slightly dodge your question first
to maybe analyze something really interesting
that you highlighted,
which is you have these two phenomena, again, ML scaling and then scaling in terms of the cost for new drug discovery,
why is it that the patterns of investment have been so different?
I think there are probably two key features that might explain this difference.
One is that the returns to the scaled output in the case of ML actually are expected to increase super-exponentially. If you actually reach AGI, it's going to be a much larger value than even a few logs back on the performance curve that people are following.
Whereas in the life sciences thus far, each of those products we're generating further and further out on the Eroom's Law curve as time moves forward hasn't necessarily scaled in its potential revenue and potential returns quite so much. And so you're seeing these increased costs not counterbalanced by increased ROI. The other piece of it that you highlighted is that
unlike building a general model where potentially by making larger investments, you can be able to
solve a broader addressable market, moving from solving very narrow tasks to eventually replacing
large fractions of white-collar intelligence. In biotech, when you're traditionally able to develop a medicine in a given indication, to say I was able to treat disease X, it doesn't necessarily enable you to then treat disease Y more readily.
Typically, where these firms, biotech firms in general, have been able to develop unique
expertise, is on making molecules to target particular genes. So I'm really good at making a
molecule that intervenes on gene X or gene Y. And it turns out that the ability to make
those molecules more rapidly isn't actually reducing the largest risk in the process. And so this
means that the ability to go from one or two outputs one year to then going to four the next is much more limited. And so this brings us then to the question of what would the general model be in biology? And I think it kind of reduces down to
how do you actually imbue those two properties that create the ML scaling law curve of hope
and bring those over to biology so that you can take the Eroom's law curve and potentially give it
the same sort of potential beneficial spin. So I think there are a few different versions of this
you could imagine, but I'll address the first point. How do you get to a place where you're actually
able to generate more revenue per medicine so that potentially the outputs you're generating are
more valuable, even if each output might cost a bit more.
Traditionally, when we've developed medicines, we go after fairly narrow indications,
meaning diseases that fairly small numbers of people get. And that narrowing of what medicines address has actually increased as we've gone forward in time. And so there's a sort of ironic situation where we've gone from addressing pretty broad categories of disease, like infectious disease, to narrower and narrower genetically
defined diseases that have small patient populations.
Because these only affect a few people, if you think about it, the value function of a medicine is, you know, how many years of healthy life does it give to how many people?
Right.
If how many people is pretty small, it just really bounds the amount of value you're able to generate.
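In its crudest form, that value function is just a product; all the numbers below are made up, and only the shape of the comparison matters.

```python
def medicine_value(healthy_years_gained_per_patient: float, patients_reached: int) -> float:
    # value ~ (years of healthy life gained per person) x (number of people who get it)
    return healthy_years_gained_per_patient * patients_reached

print(medicine_value(2.0, 50_000))        # narrow genetic indication: 1e5 life-years
print(medicine_value(0.5, 500_000_000))   # broad, everyone-ages indication: 2.5e8 life-years
```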
So you need to then be able to find medicines that treat most people.
All of us will one day get sick and die.
So arguably, the TAM for any really successful medicine could be everybody on planet Earth.
Right. So we need to find a way to be able to route toward medicines that address these
very large populations.
The second piece, then, is how do we actually build models that enable us to take the
success in one medicine we've developed and lead that to an increased probability of success on the next medicine.
Traditionally, we haven't been able to do that.
Maybe you're better at making an antibody for a gene Y because you made one for gene X five years ago,
but it turns out making an antibody isn't really the hard part of drug discovery.
Figuring out what to make an antibody to target is the hard thing about drug discovery.
What gene do I intervene upon in order to actually treat a disease in a given patient?
Most of the time, we just don't know.
And so that's why, even if a given drug firm becomes very good at making antibodies to gene X,
they have a successful approval.
When they then go to treat disease Y, they don't necessarily know what gene to go after.
And most of the risk is not in, how do I make an antibody to treat my particular target?
It's in figuring out what to target in the first place.
I'm not sure how to square this claim, that we know how to engage with the right hook, we just don't know what that hook is supposed to do in the body, I don't know if that's the way you'd describe it, with another claim that I've seen: that with small molecules we have this Goldilocks problem where they have to be small enough to percolate through the body and through cell membranes, et cetera, but big enough to interfere with, like, the protein-protein interactions that transcription factors might have or something. So there, it seems like getting the hook is the big problem.
Yeah, in this particular case, if we bound ourselves to saying we must use small molecules as our modality, then there are lots of targets which are very difficult to drug. There are many other modalities by which you can drug some of these genes. And I would say I don't have a formal way of explaining this, but if you were to write out a list of well-known targets that many, many folks
would agree are the correct genes to go after and to try and inhibit or activate in order to
treat a given set of diseases, where the only reason we don't have medicines is that we can't figure out a trick in order to be able to drug them, it's a fairly small list. It would probably
fit on a single page, whereas the number of possible indications that one could go after
and the number of possible genes that one could intervene upon, especially when you consider
their combinations, is astronomical. I think, you know, the experiment you could run here is if you lock 10 really smart drug developers in a room, and you tell them to write down some incredibly high-conviction target-disease pairs where they're sure that if they modulate this biology, these patients are going to benefit.
And all they need is some molecular hook, as you put it, in order to do this.
It's a relatively short list.
What you're not going to get is anything approximating the panoply of human pathologies that develop.
And you can actually look for this.
There are some existence proofs you can look for out in the universe, which is to say, if the only problem was that we didn't have the ability to drug something,
using current therapeutics that we can put in humans,
we should still be able to treat it in the best animal models of that disease
because we can use things like transgenic systems.
You can go in and you can engineer the genome of that animal.
And so this gives you all sorts of superpowers that you don't have in patients,
but allow you to, for instance, turn on arbitrarily complex groups of genes
in arbitrarily specific or broad groups of cells in the organism
at any time you want, at any dose you want in the animal.
And for the majority of pathologies, we just don't have many of those examples.
Okay, so then what is the answer? What is the general-purpose model where every marginal discovery increases the odds you make the next discovery, or something like that?
So there are multiple ways one might approach this problem.
The most common approach today is what people are often describing when they talk about a virtual cell. This is sort of a very nebulous idea, sometimes luminous, if you'll let me describe it in that way as well. But I think most concretely, what most people are trying to do is measure some number of molecules, or some sort of observable readout like the morphology of a cell, and then perturb it many times, turn some genes on, turn some genes off, and measure how that molecular or morphological state changes.
The notion is that there's a lot of mutual information in biology.
So if I measure something like most commonly, all the genes the cell is using at a given moment,
which you can get by RNA sequencing, that I get a decent enough picture of most of the other
complexity going on, and so that I can, for instance, take a bunch of healthy cells and a
bunch of cells that are in a diseased or age state. And I'm able then to compare those profiles
and say, okay, my disease cells use these genes, my healthy cells use these. Are there any interventions I can find and test experimentally in the lab that shift one toward the other? You're never going to be able to combinatorially scan all the possible groups of genes. Just to make that concrete with round numbers: there are something like 20,000 genes in the genome, and you can then choose however many genes in your combination you want. It's not crazy to think of hundreds at a time; that's what transcription factors control, that's how development works. So the number of possible combinations is truly astronomical. You just can't test it all. So the hope would be that by doing some sparse sampling of those pairs, your inputs are:
here's what the cell looked like beforehand, here's the particular genes I perturbed. You have
some measurement then of the state that the cell resulted in. So here's which genes went up.
Here's which went down. And then you can start to ask, once I've trained a model to predict from
the perturbations to the output on the cell state, what would happen for some arbitrary combinations
of genes. And now in silico, I can search all possible things that one might do and potentially
discover targets that take my disease cells back to something like healthy cells. So that's another version of what an all-encompassing model would look like, where you actually have compounding returns in drug discovery.
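To put numbers on the combinatorial explosion he mentioned a moment ago (20,000 genes is the usual round figure; the choice of k is arbitrary):

```python
from math import comb

GENES = 20_000
for k in (1, 2, 3, 5):
    print(k, comb(GENES, k))   # number of distinct k-gene combinations
# Even k=3 is about 1.3 trillion combinations, and real programs involve far more genes,
# which is why sparse sampling plus a predictive model is the only plausible way in.
```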
Right. And you basically described one of the models you guys are
working on at New Limit. You're training this model based on this data where you were taking
the entire transcriptome and just labeling it based on how old that cell actually is.
If you've got all this data you're collecting on how different perturbations are having different phenotypic effects on a cell, why only record, like, whether that effect correlates with more or less aging? Why can't you also label it with all the other effects that we might eventually care about and eventually get the full virtual cell? Because that's a more general-purpose model, right? Not just one that predicts whether a cell looks old or not.
Yeah, absolutely.
So we actually do both today. We can train these models where basically the inputs are a notion of what that cell looked like at the starting place.
Here's what a generic old cell looked like.
And then representations of the transcription factors themselves.
We derive those from protein foundation models.
They're language models basically trained on protein sequences. Turns out that gives you a really good base-level understanding of biology.
So the model is kind of starting from a pretty smart place.
And then you can predict a number of different targets from some learned embedding, the same way you could have multiple heads on a language model.
And so one of those for us is actually just predicting every gene the cell is expressing.
Can I just recapitulate the entire state and guess what effect these transcription factors will have on every given gene?
And you can think about that as like an objective rather than a value judgment on the cell.
I'm not asking whether or not I want this particular transcriptome.
I'm just asking what it will look like.
And then we also have something more like value judgments.
I believe that that transcriptome looks like a younger cell.
And I'm going to select on that and train a head to predict it, where I can denoise across genes and then select for younger cells.
But you could do that for arbitrary numbers of additional heads.
What are some other states you might want?
Do I want to polarize T cells to a less inflammatory state in somebody with an autoimmune disease?
Do I want to make liver cells more functional in a patient who's suffering from certain types of metabolic syndrome, be that maybe even orthogonal to the way that they age?
Do I want to go in and change the way a neuron is functioning to a different state to treat a particular type of neurodegenerative disease?
These are all questions you can ask.
They're not the ones we're going after, but that is the more general, broader vision.
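A minimal sketch of the multi-head setup as I understand it from this description; this is not NewLimit's actual architecture, and every size, layer, and name here is an assumption made up for illustration.

```python
import torch
import torch.nn as nn

N_GENES, TF_EMB, HIDDEN = 2000, 128, 256   # arbitrary illustrative sizes

class PerturbationModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk: starting cell state plus embedding of the delivered TFs -> latent code
        self.trunk = nn.Sequential(
            nn.Linear(N_GENES + TF_EMB, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.expression_head = nn.Linear(HIDDEN, N_GENES)  # "objective" head: predict every gene's level
        self.age_head = nn.Linear(HIDDEN, 1)                # "value judgment" head: young/old score

    def forward(self, baseline_expression, tf_embedding):
        z = self.trunk(torch.cat([baseline_expression, tf_embedding], dim=-1))
        return self.expression_head(z), self.age_head(z)

model = PerturbationModel()
old_cells = torch.randn(32, N_GENES)   # what the old cells looked like beforehand
tfs = torch.randn(32, TF_EMB)          # e.g. pooled protein-LM embeddings of the TF combination
predicted_expression, predicted_age = model(old_cells, tfs)
```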
This is so similar to, in LLMs, you have first imitation learning with pre-training
that builds a general purpose representation of the world.
And then you do RL about a particular objective in math or coding or whatever that you care about.
And you are describing an extremely similar procedure, where first you just learn to predict from perturbations of genes to broad effects on the cell
and that's like
that's the sort of pre-training
just like learn how cells work
and then there's another afterward
layer of these like value
judgments of okay well how would we
how would we have to perturb it to have
effect X which actually seems very similar to
how do we get the base model
to answer this math problem
or answer this coding problem
I don't know if people usually put it this way, but it actually just seems like an extremely similar setup. I mean, that makes me more optimistic on this, because, like, LLMs work, right? And RL works.
Yeah, they do. I think the conceptual analogy is very apt. You know, we don't actually use RL at
the moment, so I don't want to overstate the level of sophistication we've got. But I think the
general problem reduces down in a similar way. And so you can think about, you know, your earlier
question of what does the general model look like that enables you to actually have compounding
returns in drug discovery? Well, you might have something like this base model, which, as you said,
just predicts this objective function of how these perturbations hitting these targets are going to change which genes are turned on and off in this cell. Then there's an entirely
other task, which is, well, which genes do you want to turn on and off? And what state do I want
the cell to adopt? Our lens on that is that across many different diseases people have, age is one of
the strongest predictors of how they're going to progress, whether that disease arises. And so in
many, many circumstances, you have evidence in humans where you can say, oh, if I could make the
cell younger, maybe that's not a perfect fix, but that's going to dramatically benefit not only patients
who have a diagnosed disease, but it might actually help most of us stay healthier longer,
even subclinically, before anyone would formally say that we're sick.
Now, that's another more general function, the same way that in LLMs, you might have to
create these particular RLVF environments.
You need to have places where you can state a value function of the particular task
that you're trying to optimize for.
In drug discovery, you would then need to know, well, what are the cell states I want
to engineer for?
That's kind of the next generation of what a target might be, beyond just which genes do I
want to move up and down and which gene perturbations do I put in, you then need to know,
what cell state am I engineering for? What do I want this T-cell to do? You'll have a bunch of
labelers in Nigeria, like clicking different pictures of cells, like, oh, this one looks young, this one looks old. This one looks really great. I love that one.
Potentially. It's more like developmental biologists locked in a room, as my friend Cole Trapnell would say.
It seems like
what you're describing seems quite similar to Perturb-seq. And we've had Perturb-seq for, I don't know, since when was it done? What year was it?
There were three papers almost simultaneously in 2016.
Okay, so almost a decade. I don't know, we're still waiting, I guess, for the big breakthrough it's supposed to cause. And this is the same procedure. So why is this going to have an effect? Why has this taken so long?
Yeah, yeah, good questions. So the original procedure was created by a bunch of brilliant folks. There was a group in Ido Amit's lab at the Weizmann, a lab at the Broad where Atray Dixit, a friend of mine, helped work on this, and then Jonathan Weissman's lab at UCSF, where Britt Adamson did a lot of the early work. They all constructed this
idea where you can go in and you label a perturbation that you're delivering to a cell.
So this is typically a transgenic perturbation, meaning you're integrating some new gene
into the genome of a cell, and that turns another gene on or off. They used CRISPR, but there's
lots of ways to do it, and the concept's pretty general. And then you attach, on that new transgene,
that new gene you put into the genome of the cell, some barcode that you can read out
by DNA sequencing. So now, when you rip the cells open, you're able to not only measure every
gene they're using, but you also sequence these barcodes and you know which genes you turned on
and which are off. So you can then start to ask questions like, well, I've turned on genes A, B, and C,
what did it do to the rest of the cell? So that's the general premise of the technology.
And so it's useful to just set that up because it explains why this didn't all happen earlier.
Yeah. One, the actual readout, ripping the cells open and sequencing them, used to be pretty bad, and it used to be really expensive. And it's gotten much better over time. So the metric people often think about here is like cost per cell to sequence. It used to be measured in dollars, and now it's measured in cents, and down to fractions of a cent, because that cost curve has improved dramatically.
The cost of sequencing has likewise come down.
So even beyond the actual reagents necessary to rip the cell open
and turn its mRNAs into DNAs that are ready for the sequencer,
now the sequencer is cheaper.
The other piece is actually getting these genes in
and then figuring out which ones are there started out pretty bad.
So when we started with this technology,
it was a beautiful proof of concept,
but I don't think anyone would tell you it was 100% ready for prime time.
When you sequenced a cell, only about 50% of the time could you even tell which perturbation you put in.
Sometimes you just wouldn't detect the barcode, and you'd have to throw the cell away, or you detect the wrong barcode, and now you've mislabeled your data point.
So this might sound like a trivial sort of technical piece, but imagine you're running this experiment the old-fashioned way, where you test different groups of genes and different test tubes on a bench.
Now imagine you hired someone who every other tube labels it wrong.
So when you then collect data from your experiment, you basically have no idea what happened because you've just randomized all your data labels.
You wouldn't do much science, and you wouldn't get very far that way.
So a lot of those technologies have improved. You had a number of processes which were pretty inefficient, and when you multiplied a lot of those together, you ended up with a very small number of successful cells you could actually sequence.
They've all improved to the degree where now you can actually operate at scale.
And then groups like ours have had to do a bunch of work in order to actually enable combinatorial perturbations, turning on more than just one gene at a time, which it turns out is much, much harder for the same reason we're just alluding to.
Imagine you're having trouble figuring out which one gene you put in this cell and turned on or off.
Now imagine you have to do that five times correctly in a row.
Well, if you start out with the original sort of performance, where you could detect roughly 50% of them, then the fraction of cells that would be correctly labeled is like one over two to the N, where N is the number of genes you're trying to detect. And very quickly, more of your data is mislabeled than is labeled.
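That "one over two to the N" point, made concrete; the 50% per-barcode detection rate is the historical figure he cites, assumed here to be independent across perturbations.

```python
p_detect = 0.5   # historical chance of correctly reading any one barcode

for n in (1, 2, 3, 5):
    correct = p_detect ** n
    print(f"{n} perturbations per cell -> {correct:.1%} of cells fully and correctly labeled")
# 50%, 25%, 12.5%, ~3.1%: mislabeled or ambiguous cells quickly dominate the dataset.
```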
So there's lots of technical reasons like this that have gotten worked out over time.
And so only now are we really able to scale up, where we're able to run experiments that are in the millions of cells in just a single day at, for instance, a small company like NewLimit. There was a point even just six or seven years ago where the companies that made these reagents were publishing the very first million-cell dataset just as a proof of concept, and only they could do it as the constructors of the technology. And now two scientists in our labs can generate that in an afternoon.
If it actually is the case that this is actually
very similar to the way LLM dynamics work, then once this technology is mature and you get the
GPT-3 equivalent of the virtual cell, what you would expect to happen is there are many different companies, or at least a couple, that are, you know, doing these cheap Perturb-seq-like experiments and building their own virtual cells.
And then they're like leasing this out to other people who then have their own ideas about,
well,
we want to see if we can come up with the labels for this particular thing we care about
and test for that.
What seems to be happening right now is, at least at NewLimit, you're like, we know the end use case we're going after. It would be like if Cursor or whatever, in like 2018, said we're going to build our own LLM from scratch
so that we can enable our application rather than some foundation model company being like,
we don't care what you use it for, we're going to build this.
Does that make sense?
Like it seems like you're combining two different layers of the stack.
And it's just because nobody else is doing the other layer.
And so you're just doing both of them.
I don't know to what extent this analogy maps on, but...
Yeah, yeah, maybe to play with the analogy a bit.
Imagine that, you know, you think about New Limit as an LLM company.
If I'm going to put us in the shoes of Cursor, which, oh, how I wish. Imagine we're trying to, in 2018, create Cursor Tab, but we're not trying to create a full LLM.
Right.
I'm not, I don't know enough about the underlying mechanics to know if that would have been feasible,
but it's a much more feasible problem than trying to create, like, their most recent Cursor agent or compete with, like, modern Claude Code, right?
I think that's roughly the equivalent, where the problem we're breaking off is a subset of the more general virtual cell problem. We're trying to predict what groups of transcription factors do to the age of very specific types of cells. We only work on a few cell types at New Limit because those are some of the only cell types today where we believe we can get really effective delivery of medicines. And so we think they're just more
important because we can act on them today if we solve the problem of what TFs to use, we can make
a medicine pretty quickly. So in a way, we're carving out a region of this massive parameter space
and saying, if we can learn the distribution of effects, even just in this small region,
it's going to be really effective for us, and we can make really amazing products unlike the world has ever seen.
And over time, we can expand to this more general corpus of predicting every possible gene perturbation
and every possible cell type.
And so I think that's maybe the way the analogy maps on.
But it is true that we are vertically integrating here.
We're generating our own data in a way that's proprietary.
We think we have a much, much larger data set for this particular regime than the rest of the world combined, and that enables us to build what we think are the best models. And in many cases, what we found is that unlike with LLMs, where a lot of the data that was necessary to build these was sort of a common good, produced as a function of the internet, shared across everyone, and pretty common across all the domains everyone wants to use it for.
This biological data is still in its infancy.
It's like, imagine we're in like the early 1980s,
and we are just now thinking about trying to create some of the first web pages.
That's kind of the era we're in.
And so we're going after generating some of our own data in this very niche circumstance, building the very high-quality corpus, the Wikipedia, that you might train your now-overly-analogized LLM on, and then building the first products based on that, and then expanding from there.
And so we think that's necessary because of where we are today.
There isn't this internet-like equivalent of data that everyone can go out and reap rewards from.
Interesting.
And then this is more a question about the broader pharma industry rather than just NewLimit, which is: in the future, how are people going to be able to make money? With the GLP-1s, we've got peptides from China that are just a gray market that people can easily consume. And presumably with these future AI models, even if you have a patent on a molecule, maybe finding an isomorphic molecule or an isomorphic treatment is relatively easy. If you do come up with these crazy treatments, and pharma in general is able to come up with these crazy treatments, will they be able to make money?
The gray market piece we'll maybe put aside and say, you know, that's sort of an IP enforcement question at a geostrategic level that I'm maybe not qualified to speak to, but I do think it comes down to IP enforcement effectively.
I think for that gray market piece, another reason that sort of the traditional pharmaceutical industry will still continue to reap the majority of rewards here is that most of the payment in the United States, which provides most of the revenue for drug discovery in the world, goes through a payment system that is not just direct-to-consumer. It goes through payers. And so if you have the opportunity to either order a sketchy vial off of some website from some company in Shenzhen, or go through your doctor and get a prescription with a relatively low co-pay for tirzepatide, the real thing, I think most patients will go for tirzepatide. I think you and I probably live in a milieu of people who are much more comfortable with ordering the vials from Shenzhen than most people might be. But I don't consider that to be a tremendous concern writ large. I do think the broader point of, if you have medicines with very long-term durability, how do you reimburse them, or if the benefits are just very long-term and, you know, sort of accrue in the out years.
A challenge we have in the U.S. system is that the average person churns insurers every three
to four years.
That number fluctuates around, but that's the right order of magnitude.
And that means that if, for instance, you had a medicine which dramatically reduced the cost
of all other health care incidents, but it happened exactly five years after you got dosed
with it, no insurer is technically economically incentivized to cover that.
And so I think there are a couple models here that can make sense.
One is something called pay-for-performance, where, rather than reimbursing all of the cost of the drug up front, you actually reimburse it over time.
So say you get a medicine that just makes you generically healthier and you can measure the reduced rates of heart attack and reduced rates of obesity and various other things.
And you get this one dose and it lasts for 10 years.
Each year you would pay something like a tenth of the cost of the medicine contingent on the idea that it was actually still working for you and you had some way of measuring that.
So that's a big challenge in this industry: how would you demonstrate that any one of these medicines is still working for the patient?
In the few examples we have today, these are things like gene therapies, where you can just
measure the expression of the gene and like, okay, the drug is still there.
But it gets more complicated when you have some of these sort of longer term net benefits.
And the idea would be that then each insurer is incentivized to just pay for the time of coverage
that you're on their plan.
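As a rough illustration of that pay-for-performance idea, here's a minimal sketch; the price, the ten-year term, and the pass/fail efficacy check are all made-up placeholders, not anything a real payer or NewLimit actually uses:

```python
# Hypothetical pay-for-performance schedule: each covered year, the current
# insurer owes a pro-rated share of the drug's price, contingent on some
# agreed-upon measurement showing the medicine is still working.

def annual_payment(total_price: float, benefit_years: int, still_working: bool) -> float:
    """Pro-rated yearly reimbursement; zero if the efficacy check fails."""
    return total_price / benefit_years if still_working else 0.0

price, years = 100_000, 10  # placeholder: $100k one-time dose, 10-year benefit
# Suppose the efficacy check passes for 7 years, then the effect wears off.
checks = [True] * 7 + [False] * 3
total_paid = sum(annual_payment(price, years, ok) for ok in checks)
print(total_paid)  # 70000.0 -- payers only pay for the years the drug delivered
```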
And we already have a framework for this post-Affordable Care Act in the U.S., where, you know, pre-existing conditions no longer really exist. So patients are able to freely move between payers, and you could sort of treat the presence of one of these therapeutics, lowering this patient's overall health care costs, the same way we treat a pre-existing condition. I think this is something that the system is still overall
figuring out. So what I'm saying here is one hypothesis about what the future might look like,
but I think there are alternative, clever approaches people might think about for reimbursement.
I also think over time we're going to move more toward a direct-to-consumer model for many of these
medicines which preserve and promote health rather than just fixing disease. You're seeing what I think
are really some of the most innovative examples of this right now from Lilly around the incretin mimetics, where they actually launched LillyDirect. So for the first time, rather than going to a pharmacy, which interacts with a PBM, which interacts with your primary care physician, now you can get a prescription from your doctor, go straight to Lilly, the source of the good stuff, and order high-quality drug from them without involving some intermediary compounder in the middle that might not even make your molecules properly.
And I think as these medicines develop that have actual consumer demand,
because you feel it in your daily life, you're actually seeing a benefit from it.
It's not just something that your physician is trying to get you to take, that that model will start to dominate.
And that means that this sort of payment over time for some of these long-term benefits might be able to be abstracted away from our current payer system, where it churns every few years. And now a sort of payment-over-time plan, the same way we finance other large purchases in life, seems very feasible.
The reason I'm interested in this is that health care is already 20% of GDP. I think it's grown by notable percentages in the last few years. This is a fraction that is quickly growing. And most of this, I should have looked the numbers up, but the overwhelming majority of this is going to administering treatments that have already been invented, which is good, but nowhere near as good as spending this enormous sum of resources towards coming up with new treatments that in the future will improve the lives of people who will have these ailments. I mean, one question is just: if we're going to spend 20% of GDP on health care, how do we make it so that more of it goes towards coming up with new treatments rather than just paying nurses and doctors to keep administering stuff that kind of works now?
And two, if the cost of drugs, at least from the perspective of the payer, ends up being: you need a doctor to give you some scan before they can write you a prescription, and then they need to administer it, and they need to make sure that you're doing okay, et cetera, et cetera, then even if manufacturing this therapy might cost, you know, tens of dollars per patient, for the health care system overall it might be tens of thousands of dollars per patient. Actually, I'm curious if you agree with those orders of magnitude.
I think that's correct.
So I think the stat is something like drugs are roughly 7% of health care spend. I could be a little bit wrong on that, but the order of magnitude is right.
Right.
So basically, even if we invent de-aging technology, or especially if we invent de-aging
technology, how should we think about the way it will net out in the fraction of GDP
that we have to spend on health care?
Will that increase because now everybody's lining up at the doctor's office to get a prescription and you've got to go into the clinic every week?
Or will that decrease because the other downstream ailments from aging aren't coming about?
I think the latter is much more likely to be the case.
So there are just some quick heuristics here. I think there are many reasons that health care costs so much in the U.S. One of them is something like Baumol's cost disease, which is, you know, very unrelated to pharmaceutical discoveries, but is something that we will have to solve in the system.
Part of it's like the disintermediation of the actual customer and the actual provider.
And these are things that biotech probably isn't going to be able to solve as an industry alone.
That's probably a larger economic problem.
But when you think about how this will affect the total amount of health care that will need to be delivered: if you have more of what I like to think of as medicines for everyone, medicines that keep you healthier longer rather than medicines that only fix a problem once you're already very sick, I think you actually avoid a lot of the administration costs. Not just administration in the sense of admins at hospitals, but the cost of administering existing medicines and therapies to you goes down.
One data point on why I think that's true: something like a third of all Medicare costs are spent in the final year of life, which is shocking when you realize that the average person on Medicare is covered by it for, I don't know the exact number, but probably a decade-plus.
And so there's an incredible concentration of the actual expenses once someone is already terribly sick.
So helping prevent you from ever having to access the intensive health care system, meaning something like an inpatient hospital visit: if you can prevent even just a couple of those visits over a long period of someone's life with a medicine like an incretin mimetic, or a reprogramming medicine that keeps your liver and your immune system younger, I think on net that actually starts to drive health care spend down, because you're sort of shifting some of that burden from the administration system to the pharmaceutical system.
And the pharmaceutical system is the only piece of healthcare
where technology has made us more efficient.
As drugs go generic, actually the cost of administering
a given unit of health care is going down.
And the grand social contract is that they eventually go generic.
That's the way our current IP system works.
So I think, you know, if you were to get the question of like,
when would you like to be born as a patient?
You always want to be born as close to today as possible.
Because in terms of pharmaceuticals, for a given dollar unit of expense, you can access more pharmaceutical technology today than has ever been possible in history, even as health care costs everywhere else in the system have shot up. And so pharmaceuticals are the one place where, because of the
mechanism of things going generic and the fact that our old medicines continue to work and persist
over time, you're actually able to get more benefit per dollar.
Okay, final question. So pharma is spending billions of dollars per new drug it comes up with.
And surely they have noticed that the lack of some general platform or some general model has made it
more and more expensive and difficult to come up with new drugs.
And you say Perturb-seq has existed since 2016. And as far as you can tell, you have the largest amount of that kind of data, which we could feed into a general perturbation model. So what is the traditional pharma industry on the other coast up to? If I went to the head of R&D at Eli Lilly or Pfizer or something, do they have some different idea of the platform that needs to be built, or are they like, no, we're all in on the bespoke game, bespoke for each drug?
Yeah.
So I'll just correct one thing to make sure I'm not overstating. We have way more data for the particular, limited sub-problem we're tackling, which is overexpressing TFs in combinations. I think we have way more data than anyone there, full stop. But even more specifically, I feel very, very confident we have more data than anyone looking at trying to reprogram a cell's age. And so that's where we're way larger than the rest of the world.
When we think about just general single-cell perturbation data, various flavors,
then I think there are other groups
which have very large data sets as well.
We're still differentiated because we do everything
in human cells with the right number of chromosomes,
whereas it's very common to do things
in like cancer cell lines which have 200 chromosomes.
So like, is that human?
I don't know.
Depends on how you actually quantify these things.
So then if you're going to go ask the leaders
of some of the traditional pharmaceutical firms,
like are you trying to build a general model?
I think some of them have in-house like AI innovation teams
that are working on this.
They're really smart people there.
But I think as a general trend, you can think about some of the modern pharmas as a bit like venture capital firms, where they've over time externalized a lot of their R&D. And so they often have divisions of external innovation, which you can kind of think of as the corp dev version of venture capital. They work with the biotech ecosystem to have a number of smaller, nimble firms explore really pioneering ideas, like the types of things we're working on, and then eventually partner with them once they have assets that are later downstream.
And so I think the industry has sort of bifurcated, where smaller biotechs like ours take on most of the early discovery. The stat I'm going to get a little bit wrong from memory, but it's something like 70% of molecules approved in a given year originally come from small biotechs rather than large pharmas, even though if you look at the actual dollars of R&D spend on the balance sheet, it's largely in big pharma.
Another level of disintermediation.
Another disintermediation.
And part of the reason for that difference in cost is they're running most of the trials. Most people partner with pharma to run trials, where a lot of the costs are incurred. So it's not just that, oh, all large pharmas are horribly inefficient, or anything like that.
And so I think some of them would tell you, like, these ideas are really exciting: we have an external innovation department, or if we don't have one internally, we're collaborating with a startup that's doing something similar. And so you can kind of think of the market structure like this: you have a bunch of biotechs, which are kind of like the startups in your ecosystem, and they're working with something like an oligopsony of pharmas, where there's a limited number of buyers for this particular type of product, which is a therapeutic asset that is ready for a phase one or phase two trial.
And so there's a very liquid market for the phase one, phase two assets.
And that's the point at which these partnerships can come to fruition.
And so I think that's what a lot of those leaders would say.
Now, some of them are different. By contrast, for instance, Roche bought Genentech back in 2013. R&D there is currently run by Aviv Regev, one of the scientists I admire most in the world, who's like a thousand times smarter than me. And, you know, she's one of the people who invented this technology, and she has a big group doing this sort of work there.
So it's not like every pharma takes that view, but I think that's sort of a general trend.
Interesting.
Full disclosure, I am a small angel investor in you a little bit now, but that did not influence the decision to have Jacob on. This is super fascinating. Thanks so much for coming on the podcast.
Awesome. Thanks, Dwarkesh.
I hope you enjoyed this episode. If you did, the most helpful thing you can do is just share it
with other people who you think might enjoy it. Send it to your friends, your group chats, Twitter,
wherever else. Just let the word go forth. Other than that, super helpful if you can subscribe on
YouTube and leave a five-star review on Apple Podcasts and Spotify. Check out the sponsors in the
description below. If you want to sponsor a future episode, go to dwarkesh.com/advertise. Thank you for tuning in. I'll see you on the next one.
