Dwarkesh Podcast - Evolution designed us to die fast; we can change that — Jacob Kimmel
Episode Date: August 21, 2025

Jacob Kimmel thinks he can find the transcription factors to reverse aging. We do a deep dive on why this might be plausible and why evolution hasn't optimized for longevity. We also talk about why drug discovery has been getting exponentially harder, and what a new platform for biological understanding to speed up progress would look like. As a bonus, we get into the nitty-gritty of gene delivery and Jacob's controversial takes on CAR-T cells. For full disclosure, I am an angel investor in NewLimit. This did not impact my decision to interview Jacob, nor the questions I asked him.

Watch on YouTube; listen on Apple Podcasts or Spotify.

SPONSORS

* Hudson River Trading uses deep learning to tackle one of the world's most complex systems: global capital allocation. They have a massive in-house GPU cluster, and they're constantly adding new racks of B200s to ensure their researchers are never constrained by compute. Explore opportunities at hudsonrivertrading.com/dwarkesh

* Google's Gemini CLI turns ideas into working applications FAST, no coding required. It built a complete podcast post-production tool in 10 minutes, including fully functional backend logic, and the entire build used less than 10% of Gemini's session context. Check it out on GitHub now!

* To sponsor a future episode, visit dwarkesh.com/advertise.

TIMESTAMPS

(00:00:00) – Three reasons evolution didn't optimize for longevity
(00:12:07) – Why didn't humans evolve their own antibiotics?
(00:25:26) – De-aging cells via epigenetic reprogramming
(00:44:43) – Viral vectors and other delivery mechanisms
(01:06:22) – Synthetic transcription factors
(01:09:31) – Can virtual cells break Eroom's Law?
(01:31:32) – Economic models for pharma

Get full access to Dwarkesh Podcast at www.dwarkesh.com/subscribe
Transcript
Today, I have the pleasure of chatting with Jacob Kimmel, who is president and co-founder of NewLimit,
where they epigenetically reprogram cells to their younger states.
Jacob, thanks so much for coming on the podcast.
Thanks so much for having me.
Looking forward to the conversation.
All right.
First question: what's the first-principles argument for why evolution just, like, discards longevity so easily?
Look, I know evolution cares about our kids, but if we have longer, healthier lifespans,
we can have more kids, right?
Or we can care for them longer.
We can care for our grandkids.
So is there some pleiotropic
effect that anti-aging medicine would have, which actually selects against you staying young for longer?
Yeah. So I think there are a couple different ways one can tackle this. One is you have to think about
what's the selective pressure that would make one live longer? Right. And encode for higher health
over longer durations. Do you have that selective pressure present? There's another which is,
are there any anti-selective pressures that are actually pushing against that? And there's a third
piece of this, which is something like the constraints of your optimizer. If we think about the genome as a
set of parameters and the optimizer as natural selection, then you've got some constraints on how
that actually works. You can only do so many mutations at a time. You have to kind of spend your
steps that update your genome in certain ways. So tackling those from a few different directions,
like what would the positive possible selection be? As you highlighted, it might be something
like, well, if I'm able to extend the lifespan of an individual, they can have more children,
they can care for those children more effectively, that genome should propagate more readily
into the population. And so one of the challenges then, if you're trying to think back in sort of a
thought experiment style of evolutionary simulation here would be, what were the conditions under
which a person would actually live long enough for that phenotype to be selected for, and how
often would that occur?
And so this brings us back to some very hypothetical questions, things like, what was the baseline
hazard rate during the majority of human and primate evolution?
The hazard rate is simply, what is the likelihood you're going to die on any given day?
And that integrates everything.
That's like diseases from aging.
That's getting eaten by a tiger.
That's falling off a cliff.
that's like scraping your foot on a rock and getting an infection and dying from that.
And so from the best evidence we have, the baseline hazard rate was very, very high.
And so even absent aging, you're unlikely to actually reach those outer limits of possible health,
where aging is one of the main limitations.
And so the number of individuals in the population that are going to make it late enough in that lifespan,
where using some of your evolutionary updates to try and actually push your lifespan upward would matter, is relatively limited.
So the amount of gradient signal flowing back to the genome then is not as high as one might intuitively think.
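To make the hazard-rate framing concrete, here is a minimal sketch in Python, with purely hypothetical hazard rates rather than real anthropological estimates, of how a constant per-year chance of death compounds into a small probability of ever reaching old age:

```python
# Illustrative only: hypothetical constant hazard rates, not real demographic data.
# Survival to age T under a constant annual hazard rate h is (1 - h) ** T.

def survival_probability(annual_hazard: float, years: int) -> float:
    """Probability of surviving `years` given a constant per-year chance of death."""
    return (1.0 - annual_hazard) ** years

for h in (0.01, 0.03, 0.05):          # hypothetical annual hazard rates
    for age in (15, 40, 65):
        p = survival_probability(h, age)
        print(f"hazard {h:.0%}/yr -> P(survive to {age}) = {p:.1%}")
```

Even a modest constant hazard rate leaves very few individuals alive at the ages where anti-aging alleles would pay off, which is the "not much gradient signal" point.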
Right. By the way, just on that, often people who are trying to forecast AI will discuss basically how hard did evolution try to optimize for intelligence?
And what were the things which optimizing for intelligence would have prevented evolution from selecting for at the same time, which would make it so that even if intelligence were a relatively easy thing to build in this universe, it would have taken evolution a long time to get to human-level intelligence.
And potentially, if intelligence really is easy,
then it might imply that we're going to get to superintelligence
and Jupiter-level intelligence, et cetera, et cetera,
the sky is the limit.
So one argument is, you know, like birth canal sizes, et cetera,
or the fact that, you know, we had to spend most of our resources
on the immune system.
But what you just hinted at is actually an independent argument:
if you have this high hazard rate,
that would imply that you can't be a kid for too long,
because, you know, the kids die all the time
and you've got to become an adult
so that you can have your kids.
Yeah, you've got to contribute resources back to the group.
You can't just be a freeloader.
You need to get calories, go out in the jungle, get some berries.
Like if you're just hanging out learning stuff for 50 years,
you're just going to die before you get to have kids yourself.
So obviously humans have bigger brains than other primates.
We also have a longer adolescence,
which helps us make use, potentially, of the extra capacity
our brain gives us.
But if you made the adolescence too long,
then you would just die before you get to have kids.
And if that's going to happen anyways,
what's the point of making the brain bigger,
aka, you know, maybe intelligence is easier than we think,
and there's a bunch of contingent reasons
that evolution didn't turn as hard on this variable as it could have.
I entirely agree with that particular thesis.
You know, I think in biology in general,
when you're trying to engineer a given property,
be it being healthier longer,
be it making something more intelligent.
And this is true even at the micro level
of trying to engineer a system
to manufacture a protein at high efficiency.
You always have to start by asking yourself,
Did evolution spend a lot of time optimizing this?
If yes, my job is going to be insanely hard.
If no, potentially there are some low-hanging fruit.
And so I think this is a good argument for why potentially intelligence wasn't strongly selected
for.
And I think actually the lifespan argument plays back into intelligence to a degree.
Interesting.
You start to ask, okay, if I have intelligence that's able to compound over time and I can
develop, you know, for instance, in some hypothetical universe, my fluid intelligence
lasts much longer into my lifespan.
If the number of people who are reaching something like 65 is very small in a
population, you're not necessarily going to select for alleles that lead to fluid intelligence
preservation late into life. This is actually part of my own pet hypothesis around some of the
interesting phenomenology in when discoveries are made throughout lifespans. So there are some famous
results where, for instance, I'm going to get the exact age a little bit wrong, but in mathematics,
most great discoveries happen roughly before 30. Why should that be true? That doesn't make sense.
You can sort of put down a bunch of societal reasons for it. Oh, maybe you sort of, you know,
become set in your ways. Your teachers have, you know, caused you to restrict your thinking by that
point. But really, that's true across centuries. That's true across many different unique cultures
around the world. That's true in both cultures from the East and cultures from the West. That seems unlikely to
me. I think a much simpler explanation is that for whatever reason, our fluid intelligence is roughly
maximized at the time where the population size during human evolution was maximal. If you had to
pick an age at which fluid intelligence was selected most strongly for, it's probably around 25 or 30.
That's probably about the age of the adults in the large populations that were being selected
for during the rest of evolution. And so I think there's a lot of reason here to think that
actually there's interplay between many features of modern humans and how long we were living
and how that dictates some of the features that occur that rise and fall throughout our lives.
So in one way, this is actually a very interesting RL problem, right? It's a long-horizon
RL problem, a 20-year horizon length, and then there's a scalar value of how many kids you have,
I guess that's a vibe, et cetera. And if you know how hard, or I don't know, but if you've heard from
your friends about how hard RL is on these models for just very intermediate goals that last
an hour or a couple hours, it's actually surprising that any signal propagates across a 20-year
horizon. By the way, on the point about fluid intelligence peaking, so not only is it the
case that in many fields achievement peaks before 30, in many cases, if you look at the greatest
scientists ever, they had many of their greatest achievements in a single year. So...
Yeah, the annus mirabilis. Yeah, exactly. Yeah, yeah, exactly. Yeah, Newton, what is it, optics,
gravity, calculus at 21.
Do you know the Alexander von Humboldt story?
No.
So Alexander von Humboldt, one of the most famous scientists in history, is kind of forgotten
now.
But he had this one expedition to South America where he climbed Mount Chimborazo at a time
when very few Europeans had done that.
And so he was able to observe various ecological layers that were repeated across latitudes
and across altitudes.
And it caused him to formulate an understanding of how selection was operating on plants
at different layers in the ecosystem.
And that one expedition was the basis of his
entire career. And so when you see something named Humboldt, just to give you a sense of how famous
this guy is, it's usually Alexander von Humboldt. It's not like this is some massive, prosperous
German family name that just happens to be really common. It's this one guy. Right, right. And so really,
it was like this singular year in which he conceived a lot of our modern understanding of botany and
selective pressure. Interesting. So that's one out of three components of the evolutionary story.
Yeah. So then the next piece of the evolutionary story is like, is there anything selecting against
longevity? Like, okay, let's just pretend everything I said was wrong. Can I still make an argument that
maybe evolution hasn't maximally optimized for our longevity? One argument that comes up,
and I'll caveat and say, I don't know how strong some of the mathematical models that people
put together here are. You can find people using the same idea to argue for and against.
But there's this notion of what's called kin selection, that if you sort of take a selfish gene
view of the world, that really this is the genome optimizing for the genome's propagation,
it's not trying to optimize for any one individual, then actually optimizing for longevity
is a pretty tricky problem because you have this nasty regularization term, which is that
if you're able to make a member of the population live longer,
but you don't also counteract the decrease in their fitness over time,
meaning you maybe extend maximum lifespan,
but you haven't totally eliminated aging.
Then the number of net calories contributed to the genome
as a function of that person's marginal year
and their own calorie consumption is less
than if you were to allow that individual to die
and actually have two 20-year-olds, for instance,
that sort of follow behind them.
And so there is a notion by which a population being laden demographically
with many aged individuals,
even if they did have fecundity persisting out some period later in life,
is actually net negative for the genome's proliferation,
and that really a genome should optimize for turnover
and population size at max fitness.
I love this idea of aging as a length regularizer.
So people might be familiar with the idea that when companies are training models,
they'll have a regularizer: you can do chain of thought,
but don't make the chain of thought too long.
And then you're saying, like, the number of calories you consume over the course of your life
is one such regularizer?
Yeah. That's interesting. Okay. And then the third point was...
The third piece is basically optimization constraints.
Yeah. So I think this is where another ML analogy is helpful, which is something like, well, actually a two-layer neural network is technically a universal approximator, but we can never actually fit them in such a way.
Yeah. And why does that occur? People will wave their hands, but it basically comes down to we don't really know how to optimize them, even if you can prove out in a formal sense that they are universal approximators.
And so I think we have similar optimization challenges, with our genome as the parameters and
evolution as the optimization algorithm. And one of those is that your mutation rate basically
bounds the step size you can take. So if you imagine that at each generation, you get some number
of inputs, you can select for some number of alleles. Well, the max number of variations in the genome
is set by your mutation rate. If you dial your mutation rate up too high, you probably get a bunch
of cancers. So you're selected against. If you have it too low, you can't really adapt to anything.
So you end up with this happy medium, but that limits your total step size. And then the number of
variants you can screen in parallel is basically limited by your population size. And so for most
of evolution, there were lots of forces constraining population size as well.
One of the dominant sources of selection on the genome is really prevention of infectious
disease. And it seems like when you study the history of early modern man, infectious
disease is actually what shaped a lot of our population demographics. And so there's a lot
of pressure pushing for those step sizes, those updates to the genome, really to be optimizing
for protection against infectious disease rather than other things. And so set aside the first
and the second of these arguments, you know,
positive selection being absent for longevity and potentially some negative selection existing.
You could, I think, construct a reasonable argument for why humans don't live forever,
why the genome hasn't optimized for that, simply based on these optimization constraints.
You have to imagine not only that the positive selection is there and the negative selection is absent,
but that when you think about sort of the weighted loss term of all the things the genome is optimizing for,
that the weight on longevity is high enough to matter.
And so even if you imagine it's there, if you simply imagine that the lambdas are dialed
toward infectious disease resilience more effectively, then you can construct
an argument for yourself. And so I think really when you start to ask, why don't we live forever?
Why didn't evolution solve this? You actually have to think about an incredibly contingent
scenario where both the positive selection is there, the negative selection is absent, and you have
a lot of our evolutionary pressure going toward longevity to solve this incredibly hard problem
in order to construct the counterfactual in which longevity is selected for and does arise
in modern man and in which we are optimal. And so I think that puts human aging and longevity and
health, really in this category of problem in which evolution has not optimized for it.
Ergo, it should be, relatively speaking, relative to a problem evolution had worked on,
easy to try and intervene and provide health.
And I think in many ways, the existence of modern medicines, which are incredibly simplistic,
we are targeting a single gene in the genome and turning it off everywhere at the same
time.
And yet the fact that these provide massive benefit to individuals is another positive
piece of evidence.
Antibiotics are an even more clear case of that because here is something that
evolution actually cares a lot about.
Right?
So it feels like antibiotics should have been...
Why didn't humans evolve their own antibiotics?
Yeah.
It's actually an excellent question that I haven't heard posed before.
So we think about where do antibiotics come from?
To your point, we could synthesize them.
They're largely just metabolites of other bacteria and fungi.
You think about the story of penicillin.
What happens?
Alexander Fleming finds some fungi growing on a dish.
And the fungi secrete this penicillin antibiotic compound.
And so there's no bacteria growing near the fungi.
And he says he has this light bulb moment of, oh, my gosh, they're probably making something
that kills bacteria.
Yeah. There's no prima facie reason that you couldn't imagine encoding an antibiotic cassette into a mammalian genome.
I think part of the challenge that you run into is that you're always in evolutionary competition. There's this notion of what's called the Red Queen hypothesis. It's an allusion to the story in Lewis Carroll's Through the Looking-Glass where the Red Queen is running really fast just to stay in place.
So when you look at sort of pathogen-host interactions or competition between bacteria and fungi that are all trying to compete for the same niche, what you find is they're evolving very rapidly in competition with one another. It's an arms
race. Every time a bacterium evolves a new evasion mechanism, the fungus that occupies the
niche will evolve some new antibiotic. And so part of why there is this competitiveness between the two
is they both have very large population sizes in terms of number of genomes per unit resource they're
consuming. There are trillions of bacteria in a drop of water that you might pick up. So there's
trillions of copies of the genome, massive analog parallel computation. And then at the same time,
they can tolerate really high mutation rates because they're prokaryotic. They don't have multiple
cells. So if one cell manages to mutate too much and it isn't viable or it grows too fast,
it doesn't really compromise the population as a whole. Whereas for metazoans, like you and I,
if even one of our cells has too many mutations, it might turn into a cancer and eventually
kill off the organism. So basically what I'm getting at, and this is a long-winded way of getting
there, is that bacteria and other types of microorganisms are very well adapted to building these
complex metabolic cascades that are necessary to make something like antibiotics. And they're
able to maintain that same mutation rate and population size in order to keep up the
competition. Even if our human genome stumbled into making an antibiotic, most pathogens probably would
have mutated around it pretty quickly. Actually, that should imply that there are, through
evolutionary history, millions of, quote unquote, naive antibiotics, which could have acted as
antibiotics, but now basically all the bacteria have evolved around them. Do we see evidence of these
historical antibiotics that some fungus came up with and a bacterium evolved around, and there's evidence
for a remnant in their DNA? I'm going a bit beyond my own knowledge here. So I want to say my
strong hypothesis would be yes. I can't point to direct evidence today. There are some examples of
this where, for instance, bacteria that fight off viruses that infect them, bacteriophages,
have things like CRISPR systems. And you can actually go and look at the spacers, the individual
guide sequences that tell the CRISPR system which genome to go after and where to cut. And you find
some of these guides that are very ancient. It seems like this bacterial genome might not have
encountered that particular pathogen for quite a while, and so you can actually get sort of an
evolutionary history of what was the warfare like, what were the various conflicts throughout
this genomic history, just by looking at those sequences. In mammals, where I do know a bit better,
we do have examples of this, where there is this co-evolution of pathogen and host.
Imagine you have some anti-pathogen gene A fighting off some virus X. Well, the virus then
updates, and the gene updates, so now you have virus X prime and anti-pathogen gene A prime.
Now, virus X prime goes away, but actually virus X still exists, and we've lost our ability to fight it.
Those examples really do happen, and so there's a prominent one in the human genome.
So we have a gene called TRIM5-alpha, and it actually binds an endogenous retrovirus that is no longer present, but was at one point actually resurrected by a bunch of researchers.
And it was demonstrated that it is the case.
We have this endogenous gene, which basically fits around the capsid of the virus, like a baseball in a glove, and prevents it from infecting.
And it turns out, if you look at the evolutionary history of that gene, and you trace it back through
monkeys, you can actually find that a previous iteration inhibited
SIV, which is the cousin of HIV in humans. And so old world monkeys
actually can't get SIV, whereas new world monkeys can and humans can, obviously.
And so it seems like what happened, and you can actually make a few mutations in
TRIM5-alpha and find that this is true, is that TRIM5-alpha once protected against an HIV-like
pathogen in the primate genomes. And then there was this challenge from this massive
endogenous retrovirus. And it was so bad that the genome lost the ability to fight
these HIV-like viruses in order to restrict this endogenous retrovirus.
And you can see it because that retrovirus integrates into our genome.
There are latent copies, remnants of this virus, all throughout
our DNA code.
And then this particular retrovirus went extinct.
Reasons unknown.
No one knows why.
But we didn't like re-update that piece of our host defense machinery to fight off HIV again.
And so we're in a situation where you can go in and take human cells and make just a couple
edits in that TRIM5-alpha gene.
And it's currently protecting against a virus which no longer exists.
And you can edit it back to actually
restrict HIV dramatically. So there are plenty of examples. You could imagine the same thing for
antibiotics. We're like, hey, this particular, you know, defense mechanism went away because
the pathogen evolved its own defense to it. Well, the pathogen might have lost that defense
long ago, and if you could sort of extract that historical antibiotic, that historical antifungal,
potentially it actually has efficacy. Isn't the mutation rate per base pair per generation,
like one in a billion or something? It's quite low. So you're saying that in our genomes we can find
some extended sequence which encodes how to bind specifically to the kind of virus that
SIV is. And the amount of evolutionary signal you would need in order to have a multiple
base pair sequence. So each nucleotide consecutively would have to mutate in order to finally get
the sequence that binds to SIV. That seems almost implausible. I mean,
I guess evolution works, so we can come up with new genes, right? But like how would that even
work out? I think a great explanation for understanding a lot of evolution and how you're able to
actually adapt to new environments, new pathogens, is that gene duplication is possible. And this explains
a whole lot. If you look at most genes in the genome, they actually arise at least at some point
in evolution from a duplication event. So that means you've got gene A, it's doing, you know,
it's performing some job, and then some new environmental concern comes along. Maybe it's a lack of a
particular source of nutrient. Maybe it's a pathogen challenging you. And maybe gene A, if it were to
dedicate all of its energies, so to speak, if you were to mutate it to solve this new problem,
could be adapted with a minimal number of mutations. But then you lose this original function.
So we have this nice feature of the genome, which is it can just copy and paste. And so
occasionally what will happen in evolution is you get a copy-paste event. Now I've got two copies
of gene A, and I can preserve my original function in the original copy. And then this new copy
can actually mutate pretty freely because it doesn't have a strong selective pressure on it.
So most mutations might be null. I've got two copies of the gene. I can have lots of mutations
in it accumulate, and nothing bad
really happens because I've got my backup copy, my original, and so you can end up with
drift. So you're saying that even though the per base pair mutation rate might be one in a
billion, if you've got 100 copies of a gene, then the sort of, like, mutation rate on a gene
or on a low-Hamming-distance sequence to the one you're aiming for might actually be quite
high, and so you can actually get the target sequence. It's not that the base rate goes up. It's not like
DNA polymerase is, you know, more erroneous or that you're just, like, doubling it. It's like,
oh, well, I've got two copies. That is true, but I don't think it's
the main mechanism. One of the main mechanisms that just makes it difficult for evolution to
solve a problem is that a mutation somewhere along the path of edits can break the gene.
Imagine there are three edits that take a host defense gene from restricting SIV to restricting
this new nasty PtERV endogenous retrovirus. Well, if one edit just breaks the gene, two edits just
break the gene, and three edits fix it, it's really hard for evolution to find a path
whereby you're actually able to make those first two edits, because they're net negative,
net negative for fitness.
And so you need some really weird contingent circumstances.
So through duplication, you can create a scenario
where those first two edits are totally tolerated.
They have no effect on fitness.
You've got your backup copy.
It's doing its job.
And so even though the mutation rate is low,
some of these edits actually aren't that large.
I forget the number of edits,
for instance, in TRIM5-alpha for this particular phenomenon
we're talking about, from memory,
but it's in like the tens.
It's not that you need massive kilobase scale rearrangements.
It's actually a fairly small number of edits.
And basically, you can just
align the sequence of this gene in New World
versus Old World monkeys and then for humans,
and you find there's a very high degree of conservation.
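As a toy illustration of that "align the sequences and count the differences" idea, here is a minimal sketch; the two sequences below are made up for demonstration and are not real TRIM5-alpha sequences:

```python
# Toy illustration of aligning two sequences and counting differences.
# seq_a and seq_b are invented, pre-aligned strings, not real gene sequences.

def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length, aligned sequences differ."""
    assert len(a) == len(b), "sequences must be aligned to the same length"
    return sum(x != y for x, y in zip(a, b))

seq_a = "MASGILVNVKEEVTCPICLELLTQPLSLDCGHSFCQACLTAN"
seq_b = "MASGILLNVKEEVTCPICLELLTEPLSLDCGHSFCQACITAN"

d = hamming_distance(seq_a, seq_b)
print(f"{d} differences out of {len(seq_a)} positions "
      f"({100 * (1 - d / len(seq_a)):.0f}% identity)")
```

The point being illustrated is that a handful of substitutions against a largely conserved backbone is enough to retarget the gene.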
Conceptually, is there some phylogenetic tree of gene families
where you've got the transposons
and you've got like the gene itself,
but then you've got like the descendant genes,
which are like low Hamming distance?
I don't know.
Is there some conceptual way in which they're categorized?
You can arrange genes in the human genome
by homology to one another.
What you find is even in our current genome,
even without having the full historical record,
there are many, many genes which are likely resulting from duplication events.
One, like, trivial way that you can check this for yourself is, like, just go look at the names of genes.
And very often you'll see something where it's like gene one, gene two, gene three, or, you know, type one, type two, type three.
And if you then go look at the sequences, sometimes those names arise from like they were discovered in a common pathway and they have nothing to do with each other.
A lot of the time, it's because the sequences are actually quite darn similar.
And really what probably happened is they evolved through a duplication event and then maybe did some swapping with some other genes.
and you ended up with these quite similar,
quite homologous genes that now have specialized functions.
So it's like when evolution has a new problem to solve,
it doesn't have to start from scratch.
It starts from like,
what was the last copy of the parameters
for encoding a gene that is getting close to solving this?
Okay, let's do a copy paste on that
and then iterate and fine tune on those parameters
as opposed to having to start with like ab initio,
some random stretch of sequence somewhere in the genome has to become a gene.
Interesting.
Man, this is fascinating.
Okay, back to aging.
You'll have to cancel your evening plans.
I've got so many questions for you, and I...
Keep going.
So the second reason you gave was that there's selective pressure against people who get old but still keep living, because they're, like, slightly less fit.
They're suboptimal from a calorie input perspective.
Right.
The number of calories they can gather for the population.
And that's how people love thinking about their grandpas, you know.
Yeah, yeah.
Suboptimal as a calorie provider.
A total calorie provider right there.
Anyways, so a concern you might have about the effects of longevity treatments on your own body
is that you will fix some part of the aging process, but not the whole thing.
It seems like you're saying that you actually think this is the default way in which an anti-aging
procedure would work, because that's the reason evolution didn't optimize for it.
It's just that like, we're only fixing half of the aging process and not the whole thing.
Whereas sometimes I hear longevity proponents be like, no, we'll get the whole thing.
There's like going to be a source that explains all of aging and we'll get it.
Whereas your evolutionary argument for why evolution didn't optimize against aging relies on the fact that aging actually is not monocausal.
And evolution didn't bother to just fix one cause of aging.
Yeah, I think that's correct.
I don't think that there is a single monocausal explanation for aging.
I think there are layers of molecular regulation that explain a lot.
For instance, I have dedicated my career now to working on epigenetics and trying to change
which genes cells use because I think that explains a lot of it.
But it's not that there is like some upstream bad gene X and all we have to do is turn that off
and suddenly aging is solved.
And so I think the most likely outcome is that when we eventually develop medicines that
prolong health in each of us, it's not going to fix everything all at once.
There's not going to be a singular magic pill.
But rather, you're going to have medicines that add multiple healthy years to your life
that you can't otherwise get back.
But it's not going to fix everything at the same time.
You are still going to experience, with these first medicines,
some amount of decline over time.
And this gives you an example of if you think about evolution
as a medicine maker in this sort of anthropomorphic context,
why it might not have been selected for immediately.
What would the AI Foundation model for trading and finance look like?
It would have to be what the LLM is to NLP
or what the virtual cell is for biology.
And it would have to integrate every single kind of information
from around the world,
from order books to geopolitics.
Now, think about how insane this training objective is.
Here's this constantly changing environment
with input data that's incredibly easy to overfit to
where you're pitted against extremely sophisticated agents
who are learning from your behavior and plotting against it.
Obviously, there are very few things in the world
that are as complex as global capital allocation.
It's a system that reflects billions of live decisions in real time.
Now, as you might imagine,
training an AI to do all of this is a compute-intensive task.
That's why Hudson River Trading continually upgrades its massive in-house cluster, with fresh racks of brand-new B200s being installed as we speak and more on the way.
HRT executes about 15% of all U.S. equities trading volume, and researchers there get compensated for the massive upside that they create.
If the newest researcher on the team improves an HRT model, their contributions are recognized and rewarded right away, regardless of their tenure.
If you want to work on high stakes, unsolved problems,
unconstrained by your GPU budget,
check out HRT at hudsonrivertrading.com/dwarkesh.
All right, back to Jacob.
All right, so evolution didn't select for longevity.
What are you doing?
What's your approach at NewLimit that you think is likely to find the true cause of aging?
Yeah, so we're working on something called epigenetic reprogramming,
which very broadly is using genes called transcription factors.
I like to think about these as sort of the
orchestra conductors of the genome. They don't perform many functions directly themselves, but they bind
specific pieces of DNA, and then they tell the genome which genes to turn on, which genes to turn off.
They eventually put chemical marks on top of DNA and on the proteins that DNA wraps around. And this is one of
the answers, this particular layer of regulation, called the epigenome. It's the answer to this
fundamental biological question of how do all my cells have the same genome, but ultimately do very
different things. Your eyeball and your kidney have the same code, and yet they're performing
different functions, and that may sound a little bit simplistic, but ultimately I think it's
kind of a profound realization.
And so that epigenetic code is really what's important for cells to define their functions.
That's what's telling them which genes to evoke from your genome.
What has now become relatively apparent is that the epigenome can degrade with age.
It changes.
The particular marks that tell your cells which genes to use can shift as you get older.
This means that cells aren't able to use the right genetic programs at the right times to
respond to their environment.
You're then more susceptible to disease.
You have less resilience to the many insults that you might experience.
And our hope is that by remodeling the epigenome back toward the state it was in when you were young, right after development,
you'll be able to actually address myriad different diseases
for which one of the strong contributing factors is that cells are less functional than they were at an earlier point in your life.
So we're going after this by trying to find combinations of these transcription factors
that are able to actually remodel the epigenome, so that they can bind to just the right places in the DNA
and then shift the chemical marks back toward that state
when you were a young individual.
If you were just making these broad changes to a cell state
through these transcription factors,
which have many effects,
are there other aspects of a cell state
that are likely to get modified at the same time
in a way that would be deleterious,
or would it be a sort of straightforward effect on cell state?
Oh, how I wish it were straightforward.
No, it's very likely.
Each of these transcription factors binds
hundreds to thousands of places in the genome.
And one way of thinking about it is if you imagine the genome is sort of the base components
of cell function, then these transcription factors are kind of like the basis set in linear
algebra.
It's different combinations and different weights of each of the genes.
And so most of them are targeting pretty broad programs.
And there are no guarantees that aging actually involves moving perfectly along any of the
vectors in this particular basis set.
And so it's probably going to be a little tricky to figure out a combination that actually
takes you backward.
There's, again, no guarantee from evolution that
it's just a simple reset. And so a critical part of the process that we run through,
as we try and discover these medicinal combinations of transcription factors we can turn on,
is to ensure that they are not only making an aged cell revert to a younger state. We measure that
a couple different ways. One is simply measuring which genes those cells are using. They use
different genes as they get older. You can measure that just by sequencing all of the mRNAs,
which are really the expressed form of the genes being utilized in the genome at a given time.
You see that aged cells use different genes. Can I revert them back to a younger state?
Colloquially we call this, you know, a "looks like" assay. Can I make an old cell look like a young one
based on the genes it's using? And maybe more importantly, we drill down to the functional
level and we measure, can I actually make an aged cell perform its functions, its actual roles
within the body, the same way a young cell would? And these are the really critical things you
care about for treating diseases. Can I make a hepatocyte, a liver cell in Greek, function better
in your liver, so it's able to process metabolites like the foods you eat, and how it's able to
process toxins like alcohol and caffeine? Can I make a T-cell respond to pathogens and other
antigens that are presented within your body? So these are the ways in which we measure age. And so we
need to ensure that not only does the combination of TFs that we find actually have positive
effects along those axes, but we then want to also measure any potential detrimental effects
that arise at the same time. So there are canonical examples where you can seemingly reverse the age
of a cell, for instance, at the level of a transcriptome, but simultaneously you might be changing
that cell's type or identity. So Shinya Yamanaka, a scientist who won the Nobel in 2012 for some
work he did in about 2007, discovered that you could just take four transcription factors,
and actually just by turning on these four genes, turn an adult cell all the way back
into a young embryonic stem cell. This is a pretty amazing existence proof that shows that you can
reprogram a cell's type and a cell's age simultaneously just by turning on four genes. Out of the
20,000 genes in the genome, the tens of millions of biomolecular interactions, just four genes is
enough. That's a shocking fact. And so we actually have known for many years now that you can
reprogram the age of a cell, the challenge is that simultaneously you're doing a bunch of other
stuff, as you alluded to. You're changing its type. And that might be pathological. If you did that in the
body, it would probably cause a type of tumor called a teratoma. So we measure not only at
the level of the genes a cell is using, do you still look like the right type of cell? Are you still
a hepatocyte? Are you still a T-cell? If not, that's probably pathological. But you can also
use that same information to check for a number of other pathologies that might develop. Did I make
this T-cell hyper-inflammatory in a way that would be bad? Did I make this liver cell potentially
neoplastic, proliferating too much even when the organism is healthy and undamaged?
And you can check for each of those at the level of gene expression programs, and then likewise
functionally. Before you put these molecules in a human, you actually just functionally check
in an animal. You make an itemized list of the possible risks you might run into. Here are the ways it
might be toxic. Here are the ways it might cause cancer. Are we able to measure deterministically
and empirically that that doesn't actually occur? Okay. This is a dumb question, but it will
help me understand why an AI model is necessary to do any of this work. So you mentioned the
Yamanaka factors. From my understanding, the way he identified these four transcription factors
was that he found the 24 transcription factors that are associated with, that have high expression
in, embryonic cells, and then he just turned them all on in a somatic cell. Basically, he systematically
removed factors from this set until he found the minimal set that still induces
a cell to become a stem cell, and that just doesn't require any fancy AI models, et cetera.
Why can't we do the same things for the transcription factors that are associated with younger cells,
or express more in younger cells as opposed to older cells, and then keep eliminating from them
until we find the ones that are necessary to just make a cell young?
I wish it were so easy.
You're entirely right.
You know, Shinya Yamanaka was able to do this with a relatively small team with relatively
few resources and achieve this remarkable feat.
So it's entirely worth asking.
Why can't a similar procedure work for arbitrary problems in reprogramming cell state, whether it be trying to make an age cell act like a young one, disease cell act like a healthy one, why can't you just take 24 transcription factors and randomly sort through them?
So there were two features of Shinya's problem that I think make it amenable to that sort of interrogation that aren't present for many other types of problems.
And this is why he's such a remarkable scientist.
Most of science is problem selection.
You don't actually get better at pipetting or running experiments after a certain age, but you do get better at picking what to do.
And he's amazing at this.
So the first feature is that measuring your success criterion is trivial in the particular case he was investigating.
He's starting with somatic cells that in this case were a type of fibroblast, which literally is defined as cells that stick to glass and grow in a dish when you grind up a tissue.
So it sounds fancy, but it's a very simplistic thing.
So he's starting with fibroblasts.
You can look at them under a microscope, and you can see they're fibroblasts just based on how they look.
And then the cells he's reprogramming toward are embryonic stem cells.
So these are tiny cells.
They're mostly nucleus.
They grow really, really fast.
They look different.
They detach from a dish.
They grow up into a 3D structure.
And they express some genes that will just never be turned on in a fibroblast by definition.
So actually, how he ran the experiment was he just set up a simple reporter system.
So he took a gene that should never be on in a fibroblast, should only be on in the embryo.
And he put a little reporter behind it so that these cells would actually turn blue when you dumped a chemical on them.
And then he ran this experiment in many, many dishes with millions upon millions of cells.
The second really key feature of the problem is this notion that those cells he's converting into amplify.
They divide and grow really quickly.
So in order for you to find a successful combination, you don't actually need it to be efficient, almost at all.
The original efficiency Yamanaka published, the number of cells in the dish that convert from a somatic state back into an induced pluripotent stem cell state, is something like a basis point or a tenth of a basis point.
So like 0.01% or 0.001%.
If these cells were not growing and they were not proliferating like mad, you probably would never be able to detect that you had actually found anything successful.
It's only because success is easy to measure once you have it and even being successful in very rare cases, one in a million, amplifies and you can detect it that this, I think, was amenable to his particular approach.
So in practice, what he would do is dump these factors or this group of 24 minus some number eventually whittling it down to four.
He would dump these onto a group of cells.
And over the course of about 30 days, just a few cells in that dish, like a countable number on your fingers, would actually reprogram.
But they would proliferate like mad.
They form these big what we call colonies because it's like a single cell that just proliferates and forms a bunch of copies of itself.
They form these colonies.
You can see with your eyeballs by holding the dish up to the light and looking for opaque little dots on the bottom.
You don't need any fancy instruments.
And then you could stain them with this particular stain and they would turn blue based on the genetic reporter he had.
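A toy simulation of the whittling-down logic described here; the `forms_colonies` function is a stand-in for the real roughly 30-day reprogramming assay, simulated with a hidden 4-factor core (the actual answer was Oct4, Sox2, Klf4, and c-Myc), just to show the elimination loop:

```python
# Simplified sketch of the elimination logic behind a Yamanaka-style screen.
# forms_colonies() is a *simulated* assay: we pretend a hidden 4-factor core is
# what matters, standing in for "reporter turns blue, colonies appear on the dish."

CANDIDATES = {f"TF{i:02d}" for i in range(20)} | {"OCT4", "SOX2", "KLF4", "MYC"}  # 24 candidates
HIDDEN_CORE = {"OCT4", "SOX2", "KLF4", "MYC"}   # unknown to the "experimenter"

def forms_colonies(factors: set) -> bool:
    """Simulated assay: reprogramming works only if the hidden core is still present."""
    return HIDDEN_CORE <= factors

def minimize_factor_set(candidates: set) -> set:
    """Repeatedly drop any factor whose removal still yields colonies."""
    current = set(candidates)
    progress = True
    while progress:
        progress = False
        for factor in sorted(current):
            trial = current - {factor}
            if trial and forms_colonies(trial):
                current = trial        # this factor was dispensable
                progress = True
    return current

print(minimize_factor_set(CANDIDATES))  # -> the 4-factor core
```

The sketch only works because each "assay" is a cheap, unambiguous yes/no, which is exactly the property Jacob argues the aging problem lacks.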
So now we look at those key features of the problem.
And we pick any other problem we're interested in, say aging, so that's what I'm going to pick for explanation.
How difficult is it to measure the likelihood of success or whether you've achieved success for cell age?
Well, it turns out age is much more complicated in terms of discriminating function than actually just comparing two types of cells.
An old liver cell and a young liver cell, prima facie, actually look pretty darn similar.
It's actually quite nuanced the ways in which they're distinct.
And so there isn't a simple trivial system where you just like label your one favorite gene or you can just give the young cells cancer.
They'll grow, you know.
Yeah, yeah.
You want to see them.
Just make the old ones cancer, and then they'll grow.
Yeah, Dwarkesh, you've solved it for me.
So there's no trivial way that you can tell whether or not you've succeeded.
You actually need a pretty complex molecular measurement.
And so for us, a real key enabling technology, and I don't think our approach would really
have been possible until it emerged, is something called single-cell genomics.
So you now take a cell, rip it open, and sequence all the mRNAs it's using.
And so at the level of individual cells, you can actually measure every gene that they're
using at a given time and get this really complete picture of a cell state, everything
it's doing, lots of mutual information to other features.
And from that profile, you can train something like a model that discriminates young and
aged cells with really high performance.
It turns out there's no one gene that actually has that same characteristic.
So unlike in Yamanaka's case where a single gene on or off is like an amazing binary
classifier, you don't have that same feature of easy detection of success in aging.
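A minimal sketch of that kind of age classifier, using synthetic expression profiles in place of real single-cell RNA-seq data; the diffuse many-gene shift below is invented purely to mimic the "no single marker gene" situation being described:

```python
# Sketch: train a discriminative model on cells labeled young vs. aged.
# All data here is synthetic; a real pipeline would use normalized single-cell counts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_cells, n_genes = 2000, 500

young = rng.normal(0.0, 1.0, size=(n_cells // 2, n_genes))
aged = rng.normal(0.0, 1.0, size=(n_cells // 2, n_genes))
aged[:, :50] += 0.5          # a diffuse signal spread over many genes, no single marker

X = np.vstack([young, aged])
y = np.array([0] * (n_cells // 2) + [1] * (n_cells // 2))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

The model can separate the two states even though no individual gene does, which is the contrast with Yamanaka's single-reporter readout.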
The second feature is, as you highlighted, we can't just turn these into cancer cells.
Success doesn't amplify.
And so in some ways, the bar for a medicine is higher than what Yamanaka achieved in
his laboratory discovery. You can't just have 0.001% success and then wait for the cells to grow
a whole bunch in order to treat a patient's disease or, you know, make their liver younger,
make their immune system younger, make their endothelium younger. You need to actually have it
be fairly efficient across many cells at a time. And so because of this, we don't have the same
luxury Yamanaka did of taking a relatively small number of factors and finding a success case within
there that was pretty low efficiency. We actually need to search a much broader portion of TF space
in order to be successful.
And when you start playing that game
and you think, okay, how many TFs are there?
Somewhere between 1,000 and 2,000,
depends on exactly where you draw the line
and developmental biologists love to argue
about this over beer.
But let's call it 2000 for now.
And you want to choose some combination,
let's say you guess it's like somewhere
between one and six factors might be required.
The number of possible combinations
is about 10 to the 16.
So if you do any like math on the back of a napkin,
in order to just screen through all of those,
you would need to do many orders of magnitude
more single-cell sequencing
than the entire world has done
to date cumulatively across all experiments. And so it's just not tractable to do exhaustively.
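A quick back-of-the-napkin check of that combinatorial estimate:

```python
# Combinations of 1 to 6 transcription factors chosen from ~2,000 candidates.
from math import comb

n_tfs = 2000
total = sum(comb(n_tfs, k) for k in range(1, 7))
print(f"{total:.2e} possible combinations")  # roughly 1e16-1e17, depending on the TF count
```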
And so that's where actually having models that can predict the effect of these interventions
comes in. If I can do a sparse sampling, I can test a large number of these combinations.
And I can start to learn the relationship of what a given transcription factor is going to do
to an age cell. Is it going to make it look younger? Is it going to preserve the same type?
I can learn that across combinations. I can start to learn their interaction terms.
Now I can use those models to actually predict in silico for all the combinations I haven't seen,
which are most likely to give me the state I want,
and you can actually treat that as a generative problem
and start sampling and asking which of these combinations
is most likely to take my cell to some target destination in state space.
In our case, I want to take an old cell to a young state,
but you could imagine some arbitrary mappings as well.
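A rough sketch of that model-guided loop; the TF names, the multi-hot encoding, and the "youth score" readouts below are all synthetic stand-ins for real measurements, and the surrogate model is just one arbitrary choice:

```python
# Sketch: fit a surrogate on measured combinations, then rank unmeasured ones in silico.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
tfs = [f"TF{i}" for i in range(200)]            # toy TF vocabulary

def encode(combo) -> np.ndarray:
    """Multi-hot encoding of a TF combination."""
    v = np.zeros(len(tfs))
    for t in combo:
        v[tfs.index(t)] = 1.0
    return v

# Hypothetical training data: combinations already tested in the lab, plus a
# made-up scalar readout (e.g. a "youth score" from an expression-based classifier).
measured = [tuple(rng.choice(tfs, size=3, replace=False)) for _ in range(500)]
scores = rng.normal(size=len(measured))

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(np.stack([encode(c) for c in measured]), scores)

# Score a sample of unmeasured combinations and nominate the top few for the next round.
candidates = [tuple(rng.choice(tfs, size=3, replace=False)) for _ in range(5000)]
preds = model.predict(np.stack([encode(c) for c in candidates]))
for score, combo in sorted(zip(preds, candidates), reverse=True)[:5]:
    print(f"{score:+.2f}  {'+'.join(combo)}")
```

The point is the loop structure: sparse lab measurements train the model, the model ranks the untested space, and the top-ranked combinations go back into the lab.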
And so I think as you get to these more complex problems
that don't have the same features that Shinia benefited from,
which were the ability, again, to measure success really easily.
You can see it with your bare eyes.
You don't even need a microscope.
And two, amplification.
As you get into these more challenging problems,
you're going to need to be able to search a larger fraction of the space to hit that higher bar.
So we can think of these transcription factors as these basis directions,
and you can get like a little bit of this thing, a little bit of that thing, in some combination.
And evolution has designed these transcription factors to, is that your claim,
have relatively modular, self-contained effects
that work in predictable ways with other transcription factors?
And so we can use that same handle for our own ends?
Yeah, yeah, that would be very much my contention. And one piece of evidence for this is that that's the way development works. You know, it's kind of a crazy thing to think about, but you and I were both just like a single cell, and then we were a bag of undifferentiated cells that were all exactly alike. And then somehow we became humans with hundreds of different cell types all doing very different things. And when you look at how development specifies those unique fates of cells, it is through groups of these transcription factors that each identify a unique type. And in many cases, actually, the groups of transcription factors, the sets that specify very different fates,
are actually pretty similar to one another.
And so evolution has optimized for being able to just swap one TF in or swap one TF out of a combination
and get pretty different effects.
And so you have this sort of local change in sequence or gene-set space
leading to a pretty large global change in output.
And then likewise, many of these TFs again are duplicated in the genome.
And because mutations are going to be random and they're inherently small changes at the
level of sequence at a given time, evolution needs a substrate where in order to function
effectively, these small changes can give you relatively large changes in phenotype.
Otherwise, it would just take a very long time across evolutionary history for enough mutations
to accumulate in some duplicated copy of the gene for you to evolve a new TF that does
something interesting.
And so I think, due to that evolutionary constraint, we're actually in most cases in biology
in a regime where small edits lead to meaningful phenotypic changes, a relatively favorable
regime for generic gradient-like optimizers.
You know, it would be maybe a little bit overstating to say evolution is using the gradient, but there is a system kind of like, if you've heard of evolution strategies, where basically you can't take a gradient on your loss. So you make a bunch of copies of your parameters, you randomly modify them, you evaluate your loss on each copy, and then you use those evaluations to estimate a gradient in parameter space. That's kind of how I imagine evolution is working. And so you need lots of those little edits to actually lead to meaningful step sizes in terms of the ultimate output that you have.
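A minimal sketch of the evolution-strategies style update described here, on a toy objective; the loss function, step sizes, and population size are all arbitrary choices for illustration:

```python
# Evolution-strategies sketch: no gradient of the loss is available, so perturb the
# parameters, evaluate each perturbation, and use the scores to estimate an update.
import numpy as np

def loss(theta: np.ndarray) -> float:
    """Black-box objective (toy example: squared distance from an arbitrary target)."""
    return float(np.sum((theta - 3.0) ** 2))

rng = np.random.default_rng(0)
theta = np.zeros(10)
sigma, lr, population = 0.1, 0.02, 64

for step in range(200):
    noise = rng.normal(size=(population, theta.size))      # copies with random edits
    scores = np.array([loss(theta + sigma * n) for n in noise])
    advantages = (scores - scores.mean()) / (scores.std() + 1e-8)
    grad_estimate = (noise.T @ advantages) / (population * sigma)
    theta -= lr * grad_estimate                             # descend the estimated gradient

print(f"final loss: {loss(theta):.4f}")
```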
Interesting. You're just like designing the LoRA that goes on top of...
Yeah, yeah, in a way. And to think, like, you know, why would transcription factors,
maybe this is getting a little bit too gigabrained about it, but like, why does the genome even have transcription factors?
Like, what's the point? Why not just have, every time you want a new cell type, you engineer some new cassette of genes or some new totally de novo set of promoters or something like this?
I think one possible explanation for their existence, rather than just an appreciation for their presence, is
that having transcription factors allows a very small number of base pair edits at the
substrate of the genome to lead to very large phenotypic differences. If I break a transcription
factor, I can delete a whole cell type in the body. If I retarget a transcription factor
to different genes, I can dramatically change when cells respond and have, you know,
hundreds of their downstream effector genes change their behavior in response to the environment.
And so it puts you in this regime where transcription factors are a really nice substrate to
manipulate as targets for medicines. In some ways, they might be
evolution's levers upon the broader architecture of the genome. And so by pulling on those same
levers that evolution has gifted us, there are probably many useful things we can engender upon biology.
Yeah, you're sort of hinting that if we analogize it to some code base, we're going to find
a couple lines that are, like, commented out, that are like de-aging, you know, and then you just uncomment
them. I don't know about that, but if I can give you, I'll give you like a real
cringe analogy that sometimes I deploy, but it requires a very special audience. I think you'll
probably be one who fits into it. You're flattering our listeners. Only cringe listeners will
appreciate it, but your audience will love this. I don't know about your audience, but you will.
You can kind of think about it like, you know, if you think about how attention works,
like queries, keys, values. TFs are kind of like the queries, the genome sequences they bind to are
kind of like the keys, and genes are kind of like the values. And it turns out that structure
then allows you, very efficiently in terms of editing space, to change just one of those
embedding vectors, in this case one of those sequences, and get dramatically different performance
or total outputs.
And so I do think it's kind of interesting how these structures recur throughout biology, you know,
in the same way that the attention mechanism seems to exist in some neural structures.
I think it's kind of interesting that you can very easily see how that same sort of querying
and information storage might exist in the genome.
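A toy rendering of the queries/keys/values analogy; the dimensions and vectors below are arbitrary and purely illustrative, showing only that retargeting a single "key" changes which "values" a query pulls on:

```python
# Toy attention: TFs as queries, the DNA motifs they bind as keys, downstream
# gene programs as values. Perturbing one key redirects the output, analogous
# to a small binding-site edit having a large phenotypic effect.
import numpy as np

rng = np.random.default_rng(0)
d = 8
queries = rng.normal(size=(3, d))   # "TFs"
keys = rng.normal(size=(5, d))      # "binding motifs"
values = rng.normal(size=(5, d))    # "downstream gene programs"

def attend(Q, K, V):
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ V

before = attend(queries, keys, values)
keys[2] = rng.normal(size=d)         # "mutate" a single binding motif
after = attend(queries, keys, values)
print("output shift per TF:", np.linalg.norm(after - before, axis=1).round(2))
```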
Interesting.
Yeah, a previous guest and a mutual friend, Trenton Bricken, had a paper in grad school
about how the brain implements attention.
Yeah, Eddie Chang has found, like, positional encodings probably exist in humans using Neuropixels.
Really? If you haven't read these papers. Oh, yeah. So he implants these Neuropixels probes into individuals, and then he's able to talk to them and look at them as they read sentences. And what he finds is certain representations which function as a positional encoding across sentences. So they fire at a certain frequency, and it just increases as the sentence goes and then, like, resets. And so it seems exactly like what we do when we train large language models, where you've got some sort of positional encoding function.
It's so funny that the way we're going to learn how the brain works is just by trying to first-principles
engineer intelligence in AI.
And then it just happens to be the case that each one of these things has a neural correlate.
Gemini's CLI just one-shotted an automated producer for me in one hour.
Basically, I wanted this interface where I could just paste in a raw episode transcript
and then get suggestions for Twitter clips and titles and descriptions and some other copy,
all of which cumulatively takes me about half a day to write.
Honestly, it was just extremely good.
I described the app I wanted and then asked Gemini to talk through how I would go about implementing it.
It walked through its plans.
It asked me for input where I hadn't been sufficiently clear.
And after we ironed out all the details, Gemini just literally one-shotted the full working application.
With fully functional backend logic.
Making this app literally took 10 minutes, including installing CLI.
Then I spent 50 minutes fine-tuning the UI and messing around.
And by the way, this process did not involve me actually editing or even looking at any of the code.
I would just tell Gemini how I wanted things moved around.
And the whole UI would change as Gemini rewrote the files.
Despite building and then fine-tuning an entire working application, the session context didn't even get 10% exhausted.
This is just a super easy and fast way to turn your ideas into useful applications.
You can check out Gemini CLI on GitHub to get started.
All right, back to Jacob.
If you're right that transcription factors are the modality evolution has used to have complex phenotypic effects and optimize for different things, two-part question.
One, why haven't pathogens, which have a strong interest in having complex phenotypic effects on your body, also utilized transcription factors as the way to fuck you over and steal your resources?
And two, we've been trying to design drugs for centuries.
Why aren't all the big drugs, the top-selling drugs, ones that just modulate transcription factors?
Yeah, yeah, why don't we have a million of these pills? Okay, I'll try and take those in stride; they have pretty different answers. First answer is there actually are pathogens that utilize transcription factors as part of their lifecycle. A famous example of this is HIV. HIV encodes a protein called Tat, and Tat actually activates NF-κB. Sorry, to back up a little bit: HIV, as a retrovirus, starts out as RNA, turns itself into DNA, and shoves itself into the genome of your CD4 T cells. And so then it needs this ornate machinery to actually control when it makes more HIV and when it goes
latent so it can hide and your immune system can't clear it out. And this is why HIV is so pernicious
is you can kill every single cell in the body that's actively making HIV with like a really
good drug. But then a few of them that have like lingered and hunkered down just turn back on. And so
people call this the latent reservoir. Same with Hep B, right? Well, Hep B and Hep C can both do this sort of latent behavior. And so HIV is probably the most pernicious of these. And one way it does it is that this gene called Tat actually interacts with NF-κB. NF-κB is a master transcription factor within
immune cells. Typically, if I'm going to, like, horribly reduce what it does and some immunologists
can crucify me later, it, like, increases the inflammatory response of most cells. They become
more likely to attack given pathogens around them on the margin. And so HIV will turn on NF-κB activity and then use that to drive its own transcription and its own lifecycle. I can't remember all the details now of exactly how it works, but part of this circuitry is what allows it, in some subset of cells where some of that upstream transcription factor machinery in the host might be deactivated, to go latent. And so as long as the population of cells it's infecting always has a few that are turning off the transcription factors upstream that drive its own transcription, then HIV is able to persist in this latent reservoir within human cells. So that's just one example offhand. Then there are a number of other pathogens, and unfortunately I don't have quite as much
molecular detail on some of these, but they will interface with other parts of the cell that eventually
result in transcription factor translocation to the nucleus and then transcription factors being
active. This actually segues a little bit to your second question on why aren't there more medicines
targeting TFs. In a way, I think many of our medicines ultimately downstream are leading to
changes in TF activity, but we haven't been able to directly target them due to their physical
location within cells. And so we go several layers upstream. If you think about how a cell works
in sensing its environment, it has many receptors on the surface, it has the ability to sense mechanical
tension and things like this. And ultimately, most of what these signaling pathways lead to is to tell the
cell, use some different genes than you're using right now. That's often what's occurring. And so that
ultimately leads to transcription factors being some of the final effectors in these signaling cascades.
So a lot of the drugs we have that, for instance, inhibit a particular cytokine that might bind a
receptor or they block that receptor directly or maybe they hit a certain signaling pathway. Ultimately,
the way that they're exerting their effect is then downstream of that signaling pathway. Some
transcription factor is either being turned on or not turned on, and you're using different
genes in the cell. And so we're kind of taking these like crazy bank shots because we can't
hit the TFs directly. So it sort of begs the question, like, why can't you just go after the TF
directly? Traditionally, we use what are called small molecule drugs, where they're defined just by
their size. The reason they have to be small is they need to be small enough to wiggle through
the membrane of a cell and get inside. And then you run into a challenge, which is if you want to
actually stick a small molecule between two proteins that have a pretty big interface, meaning like
they've got big swaths on the side of them that all, you know, sort of line up and form a synapse with one another, then you would need a big molecule in order to inhibit that. And it turns out that a TF binding DNA is a pretty darn big surface. And so small molecules aren't great at disrupting
that and certainly even worse at activating it. So small molecules can get all the way into the nucleus,
but they can't do much once they're there. They're just too small. And then the other classic
modalities we have are recombinant proteins. We make a protein, like a hormone, in a big vat. We grow it in some Chinese hamster ovary cells, we extract it, and we inject it into you. This is how, for instance, the human insulin we make today works. Or you make antibodies. Antibodies are produced by the immune system; they run around and find proteins that have a particular sequence, bind to them, and often just stop them from working by glomming a big thing onto the side.
So those are too big to get through the cell membrane, so then they can't actually get to a
TF or do anything directly.
So we take these bank shots.
So what changes that today, and why I think it's pretty exciting, is we now have new nucleic acid and genetic medicines, where you can, for instance, deliver RNAs to a cell. They can get through using tricks like lipid nanoparticles: you wrap them in a fat that looks kind of like a cell membrane. It can fuse with a cell and put the mRNAs in the cytosol. You can make a copy of a transcription factor there, and then it translocates to the nucleus the same way a natural one would and exerts its effect.
And likewise, there are other ways to do this using things like viral vectors.
But I think we've only very recently actually gotten the tools we need to start addressing
transcription factors as first-class targets rather than treating them as like maybe some
ancillary third-order thing that's going to happen.
Interesting. So the drugs we have can't target them directly. But your claim is that a lot of drugs actually do work by binding to the things we can target, and those then have some effect on transcription factors.
So this brings us to questions about delivery, which is the next thing I want to ask you.
You mentioned lipid nanoparticles. This is what the COVID vaccines were made of.
The ultimate question, if we're going to work on de-aging, is how do we treat every single cell in the body? Even if you identify the right transcription factors to de-age a cell, and even if they're shared across cell types, or you figure out the right ones for every single cell type.
How do you get it to every single cell in the body?
Yeah.
How do you deliver stuff?
How do you get them in there?
So I think there are many ways one could imagine solving it.
I'll sort of narrow the scope of the problem to saying,
I think delivering nucleic acid is a pretty good first order primitive.
Ultimately, the genome's nucleic acids, the RNAs that come out of it are nucleic acids.
So if you can get nucleic acid into a cell, you can drug pretty much anything in the genome effectively.
So you can reduce this problem to asking, how do I get nucleic acids wherever I want them to any cell type very specifically?
So today, there are two main modalities that people use, both of which have some downsides.
The first one that we've touched on already is lipid nanoparticles.
These are basically fat bubbles.
And by default, they get taken up by tissues which take up fat, like the liver.
And they can be used sort of like Trojan horses.
So they can release some arbitrary nucleic acid, usually RNA, maybe encoding your favorite genes,
in our case, transcription factors, into the cell types of interest.
You can play with the fats, and you can also tie stuff onto the outside of the fat.
Like you can attach a part of an antibody, for example, to make it go to different cell types in the body.
And I think the field is making a lot of progress on being able to target various different cell types with lipid nanoparticles.
So even if nothing else worked for the next several decades, I think companies like ours would have more than enough problems to solve with the cells that we can actually target.
Another prominent way people go after this is using viral vectors.
The basic idea being viruses have had a lot of evolutionary history and very large population sizes. They've evolved to get into our cells. Maybe we can learn something from them; they're even better Trojan horses. So one type of virus people use a lot is called an AAV. Those AAVs carry DNA genomes, so you can get genes, whole genes, into cells. They've got some packaging size limits. You can think of it kind of like a very small delivery truck, so you can't put everything you want into it.
They can go to certain cell types as well.
And then, on top of just where you actually get the nucleic acid to begin with, you can engineer the sequences a bit. And that basically allows you to add like a NOT gate on it. You can make it turn off the nucleic acid in certain cell types,
but you're never going to use the sequence engineering to get nucleic acid into cells
where it didn't get delivered in the first place.
So you can sort of start broad with your delivery vector
and then use sequence to narrow down
to make it more specific, but not the other way around.
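As a toy sketch of that logic (the cell types here are invented for illustration; one common real-world version of the NOT gate, as I understand it, is adding microRNA target sites that silence the cargo in tissues you want to spare):

```python
# Delivery defines the upper bound; sequence engineering can only subtract from it.
delivered = {"hepatocyte", "Kupffer cell", "cardiomyocyte"}   # where the vector physically got in
not_gate  = {"Kupffer cell"}                                   # cell types where the engineered off-switch fires

expressing = delivered - not_gate
print(expressing)                      # {'hepatocyte', 'cardiomyocyte'}

# No amount of sequence engineering adds a cell type the vector never reached:
assert "neuron" not in expressing
```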
So I think both of those methods are super promising.
Again, if nothing else emerged for decades,
we'd still have tons and tons of problems
as a therapeutic development community
to solve even using just those.
I do think I have one sort of very controversial opinion
which, you know, people can roast me for later.
You have just one?
You're trying to solve aging?
You think you have only one?
I have many controversial opinions.
One of them is that I think both of these probably in the limit will not be the way that we're delivering medicines in the year 2100.
If you think about viral vectors, no matter what, they are always going to be somewhat immunogenic.
You're always going to have your immune system trying to fight them off.
You can play tricks.
You can try and cloak them, et cetera, et cetera.
But they're always going to have some toxicity risk.
They also don't go everywhere.
It's not that we have examples of like a single viral species that infects every cell type in the body and we just need to engineer it to make it safe.
We would have to also engineer the virus to go to new cell types.
So there's some limitations there.
L&Ps likewise have some problems.
They can go to tons of cell types.
That's what largely we're working on.
We're super excited about it.
But there are some physical constraints.
They just have a certain size,
and they have to get from your bloodstream,
out of your bloodstream,
toward a given target cell,
and they have to not fuse into any of the other cells along the way.
So there's a whole gauntlet they have to run.
Ultimately, I think we're probably going to have to solve delivery
the way that our own genome solved delivery.
So we have this same problem that arose during evolution,
which is how do I patrol the body,
find arbitrary signals in the environment and then deliver some important cargo there when some
set of events happens. How do I, you know, find a specific place and only near those cell types
release my cargo? And really, the problem was solved by the immune system. So we have cell types
in our body, T cells and B cells, which are effectively engineered by evolution to run around and invaginate whatever tissues they need to. They can climb almost anywhere in the body; there's almost nowhere they can't get access. And then once they sense a particular set of signals,
and they've got a very ornate circuitry to do this.
They run basically AND-gate logic. They can release a specified payload.
And right now, the way our genome sets them up,
the payload they release is largely either enzymes
that will kill some cell that they're targeting
or kill some pathogen or some signal flares
that call in other parts of the immune system
to do the same thing.
So that's super cool.
But you can think about it as a modular system
that evolution's already gifted us.
We've got some signal and environmental recognition systems
so we can find particular areas of the body
that we want to find, and then some sort of payload delivery system. I can deliver some arbitrary
set of things. And I imagine if we were to Rip Van Winkle ourselves into 2100 and wake up,
the way we will be delivering these nucleic acid payloads is actually by engineering cells to do it
to perform this very ornate function. Those cells might actually live with you. You probably will
get engrafted with them, and they might persist with you for many years. They deliver the medicine
only when the environment within your body actually dictates that you need it. And so you actually won't be seeing a physician every time this medicine is needed; rather, you'll have a more ornate, responsive circuit.
The other exciting thing about cells is that they're big and they have big genomes.
And so you actually have a large palette to encode complex infrastructure and complex circuitry.
So you don't need to limit yourself to like the very small RNAs you can get in that might encode a gene or two,
or in our case a few transcription factors.
You don't have to limit yourself to this tiny AAV genome that's only a few kilobases.
You've got billions of base pairs to play with in terms of encoding all your logic.
So I think that's ultimately how delivery will get solved.
We've got many, many stepping stones along the way.
But if I could clone myself and work on an even riskier endeavor, that's probably what I would do.
This is actually, I mean, in a way, we treat cancer this way with CAR-T therapy, right? We take the T cells out and then we tell them, go find a cancer with this receptor and kill it. But is the reason that works that the cancer cells they're trying to target are also free-floating in the blood? Is that what they target?
Basically, could this deliver to literally every single cell in the body?
Not literally every single cell.
I'll like asterisk it there.
So for example, T cells don't go into your brain. They can, but it's generally a pathology when they get in there.
So it's not like literally every cell.
But almost every cell in your body is surveilled by the immune system.
So there are very, very few what we call immune privilege compartments in your body.
It's things like the joints of your knees and your shoulders, your eyeball, and your brain, basically.
There might be a couple of these.
I think the ear probably falls into that category.
A funny way of thinking about this is that all the gene therapy people using viruses want to deliver to the immune-privileged compartments
because their drugs are immunogenic
and they're limited to a very, very small set of diseases.
So in a way, it's like the shadow of all the diseases
you can address with viruses
is what you can address with cells.
And given the complementarity between them,
it's like, okay, you can probably cover the entire body.
And so they can't literally go everywhere,
but I think your analogy to the CART work
is very apt as well,
where you can think about that two-component system,
I've got some detection mechanism
for the environment I want to sense
to perform some function.
and then I have some sort of payload that I deliver.
CAR-Ts engineer the first of those and leave the second exactly the same as the immune system does. So they engineer the T cell to go recognize some other antigen that you wouldn't usually target, some protein on the surface of a cell, for instance.
And then deliver the payload you would usually deliver
if it was infected by a virus
or if you saw that it was foreign in some way,
whereas cancer cells usually don't actually look that foreign.
Most of their genes are the same genes
that are in your normal genome,
and that's why it's hard for the immune system to surveil them.
Interesting.
You know, it's funny that whenever we're trying to cure infectious diseases, we have to deal with: fuck, viruses have been evolving for billions of years since our oldest common ancestor, they know exactly what they're doing, and it's so hard. And then whenever we're trying to do something else, we're like, fuck, the immune system has been evolving for billions of years, it knows what it's doing, and how do we get past it?
Yeah. Yeah, the Red Queen race is like
quite sophisticated. If you want to just like throw a new tool into biology, you somehow have to get around
one side of that equation.
Right. Given the fact that this is somewhere between impossible and very far away, and it's necessary for fully curing aging, does that mean that in the short run, in the next few decades, we'll have some parts of our body which will have these amazing therapies, and then other parts which will just be stuck the way they are?
So you mentioned hepatocytes are some of the cells that you're able to actually study and deliver to, and these are our liver cells. So you're saying, look, I can get drunk as much as I want, and it's not going to have an impact on my long-run liver health, because then you'll just treat me with this therapy. But for the rest of my body, it's going to age as normal. What is the implication of the fact that delivery seems to be lagging far behind your understanding, or at some point will lag behind your understanding, of aging?
Yeah. Just to give the delivery folks
credit, they're currently ahead. There are currently no reprogramming medicines for aging, and there are medicines that deliver nucleic acid. So, like, they're still winning the race against us right now. But to your point, I hope the lines cross. I hope we out-compete them.
So I do think, actually, even if you were able to only target some subsets of cells,
it's not that you would see, like, this strange Frankensteinian benefit in health in some
aspects and lack of benefit entirely in others.
I think what we found across the history of medicine is that actually the body's an
incredibly interconnected complex system.
And if you're able to rescue function, even in one cell type and one tissue, you often have
knock-on benefits in many places that you didn't initially anticipate.
One way we can get examples of this is through transplant experiments.
So both in bone marrow and in liver, for example, we have fairly common transplant procedures that occur in humans.
And so we can compare old humans who get livers from young people or old people.
And in a way, ask a pretty controlled question.
What occurs as a function of just having a young liver?
Is it that, for example, you can eat a lot of fatty food and drink a lot and be fine?
Or is it that actually you see broader benefits?
And the latter seems to be true.
They have reduced risk of several other diseases and overall better survival as a function of having a younger liver than they do for an older one.
Suggesting that actually because these tissues are so interconnected, many of these organs like the liver, like your adipose tissue or endocrine organs, they're also sending out signals to many other places in your body, helping coordinate your health across multiple tissue systems.
Even just one tissue can benefit other tissue systems in your body at the same time.
HSCs are another example. I'll summarize; these are mostly examples taken from a wonderful book by Frederick Appelbaum, who trained with Don Thomas, the physician who pioneered human bone marrow transplants. There are many circumstances where patients got a bone marrow transplant and it actually cured another disease they had as a result, maybe unanticipated, where even just the replacement of this one special cell type, HSCs, has knock-on effects throughout the body.
You know, there were symptoms of these diseases that presented in myriad ways throughout their system, but ultimately the root cause was even just a single cell type.
There are counter examples as well
where you can go into animals
and break even just one gene
in one specific subset of T cells.
You can break a gene in there
that encodes for a transcription factor in their mitochondria called TFAM, and you actually dramatically shorten the lifespan of mice. One gene in one special type of T cells can give you that type of pathology.
And so it sort of implies
the inverse may also exist.
Is this related to why Ozempic has so many downstream positive effects
that seem even not totally related
to its effects
just on making you leaner?
Yeah, I think it's one example.
Because it is a hormone,
and your endocrine system coordinates a lot of the complex interplay between your tissues,
I don't think the story is fully written yet on exactly why GLP-1 and GIP,
broadly, incretin mimetic medicines like Ozempic,
have so many knock-on benefits,
but I think they're a great example of this phenomenon.
If someone told you, I'm going to find a single molecule,
and I'm going to drug it, and it's not only going to have benefits for weight loss,
but also for cardiovascular disease, also possibly for addictive behavior, and maybe even preventing
neurodegeneration, you would have told them they were crazy.
And yet, just by acting on the small number of cells in your body, which are receiving this
signal, the interplay and the communication between those cells and the rest of your body
seems to have many of these knock-on benefits.
So it's just one existence proof.
Very small numbers of cells in your body can have health benefits everywhere.
And so even if cellular delivery does not emerge by 2100, as I imagine it will, then I still
think that you're going to have the ability to add decades of healthy life to individuals
by reprogramming the age of individual cell types and individual tissues.
Interesting.
How big will the payload have to be?
How many transcription factors?
Yeah.
I think just a countable number.
I think some of those that we've found today that have efficacy are, you know, somewhere between one and five, and that's a small enough number that you can encapsulate it in
current mRNA medicines.
So already in the clinic today, there are medicines that deliver many different genes as RNA.
So there are medicines where, for instance, it's a vaccine as a combination of flu and COVID
proteins, and they're delivering 20 different unique transcripts all at the same time.
And so when you think about that already is a medicine that's being injected into people in
trials, the idea of delivering just a few transcription factors is seemingly quotidian.
And so, thankfully, I don't think we'll be limited by the size of the payloads that one can
deliver.
One other really cool thing about transcription factors is that the endogenous biology is very
favorable for drug development.
The expression level of transcription factors in your genome, relative to other genes, is incredibly low.
So if you just look at the rank-ordered list
of what are the most frequently expressed genes
in the genome by the count of how many MRNAs are in the cell,
transcription factors are near the bottom.
And that means you don't actually need to get that many copies
of a transcription factor into a cell
in order to have benefits.
And so what we've seen so far,
and what I imagine will continue to play out,
is that even fairly low doses of these medicines,
which are well within the realm of what folks have been taking
for now more than a decade,
are able to induce really strong efficacy. And so we're hopeful that not only will the actual size of the payload in terms of
number of base pairs not be limiting, but the dose shouldn't be limiting either.
And is it, would it have to be a chronic treatment or could it just be a one-time dose?
In principle, it could be one time. I think that would be an overstatement for today.
But I can sort of talk you through the evidence from like the first principles back to
the reality of like what's the hardest thing we have in hand. So epigenetic reprogramming is basically
how the cell types in our bodies right now are able to adopt the identities that they have. And
the existence proof that those epigenetic reprogramming events can last decades is that my tongue
doesn't spontaneously turn into a kidney. So these epigenetic marks can persist for decades throughout
a human life or, you know, hundreds of years if you want to take the example of a bowhead whale,
which uses the same mechanism. And we also know that with very targeted edits, other groups
have done this, folks like Luke Gilbert, now at the Arc Institute, who I think of as one of the great unsung scientists of our time, have been able to make a targeted edit at a single locus and then show that it persists as cells divide 400-plus times over multiple years in an incubator in the lab.
So imagine like a hot house where you're just trying as hard as you can to break this mark down,
and it can actually persist for many years.
Other companies have actually now dosed some editors similar to the ones that Luke developed in his lab in monkeys
and shown they last at least a couple years.
So in principle, the upper bound here is really long.
You could potentially have one dose and it lasts a very long time, you know, potentially decades,
as long as it took you to age the first time, maybe.
We don't have data like that today, so we don't want to overstate it.
We do have data that these positive effects can last several weeks after a dose.
And so you could imagine, even without many leaps of faith, up toward this upper bound limit of what's possible, just from the data we have in hand now, that you could get doses every month, every few months, and actually have really dramatic benefits that persist over time, rather than needing, for instance, to get an IV every day, which might not be tractable.
So we've got 1,600 transcription factors in the human genome.
Is it worth looking at non-human TFs and seeing what effects they might have, or are they unlikely to be the right search space?
I think it's less likely.
I think you have a prior that evolution has given you a reasonable basis set for navigating the states that human cells might want to occupy.
And in our case, we know that the state we're trying to access is encoded by some combination of these TFs.
It does arise in development, obviously.
We're trying to make an old cell look young, not look like some Frankenstein cell that's never been seen before.
That said, we don't have any guarantees that the way aging progresses is by following the same basis set of these transcription factor programs in the genome that are encoded during development.
So I don't think it's unreasonable to ask, would your eventual ideal reprogramming medicine necessarily be a composition of the natural TFs?
Or would it include something like TFs from other organisms as you posit or even entirely synthetic transcription factors as well?
Things like SuperSOX.
SuperSOX is a particular publication from Sergiy, I might mispronounce his last name, Velychko,
where they mutated the SOX2 gene,
and they made iPSC reprogramming more efficient.
So they could take somatic cells
and turn them into pluripotent stem cells
more effectively than you could
with just the canonical Yamanaka factors,
which are Oct4, SOX2, KLF4, and MYC.
iPSC reprogramming never happens in nature.
So there's no reason to necessarily believe
that the natural TFs are optimal.
And so even really simple optimizations,
like just mutagenizing one of the four Yamanaka factors
we already know about
or swapping some domains between a few TFs,
seem to improve things dramatically.
So I think that's a pretty good signal
that actually there's a lot of gradient to climb here
and that potentially for us,
the end-state products we're developing in 2100
are more like synthetic genes that have never existed
rather than just compositions of the natural set.
What about the effects of aging, which are...
Okay, so I don't know, your skin starts to sag
because of the effects of gravity over the course of decades.
Is that a cellular process?
How would some cellular therapy deal with that?
The best evidence is that it's probably not cellular.
So the reason your skin sags is there's a protein in your skin called elastin, which does exactly what you'd think it would based on the name.
It kind of keeps your skin elastic-y like a waistband and holds it to your face.
So you have these big polymerized fibers of elastin in your face.
And as far as we understand it, you only polymerize it and form a long fiber during development.
And in the rest of your life, you make the individual units of the polymer.
But for reasons that, as far as I can tell, no one understands, they fail to polymerize.
And you can't like make new long cords to hold your skin up to your face.
So I think the eventual solution for something like that is likely that you need to program cells to states that are extra-physiological. There might not be a cell in your body that does this; it's not just that a young skin cell from a 20-year-old is better at making these fibers. As far as we can tell, they aren't. But you could probably program a cell to be able to reinvigorate that polymerization process, to run along the fiber and repair it in places where it's damaged.
Obviously, these things get made during development, so it's totally physically feasible for this to occur.
Maybe there's even a developmental state which would be sufficient to achieve this.
I don't think anyone knows, but that would be the kind of state that one might have to engineer de novo,
even if our genome doesn't necessarily encode for it explicitly.
Interesting.
Okay. What is Eroom's Law?
Eroom's Law is a funny portmanteau created by a friend of mine, Jack Scannell, where he inverted the notion of Moore's Law, which is the doubling of compute density on silicon chips every few years. So Moore's Law has graciously given us massive increases in compute performance over several decades.
And Eroom's Law is the inverse of that,
because in biopharma, what we're actually seeing
is that there's a very consistent decrease
in the number of new molecular entities,
so new medicines that we're able to invent
per billion dollars invested.
And this trend actually starts way back in the 1950s
and persists through many different technological transitions
along the way.
So it seems to be an incredibly consistent feature
of trying to make new medicines.
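To put rough numbers on that consistency: the figure usually cited from Scannell and colleagues' Eroom's Law paper is that new drugs approved per billion inflation-adjusted R&D dollars has halved roughly every nine years since 1950; the 1950 baseline below is made up purely for illustration.

```python
halving_period_years = 9            # the commonly cited Eroom's Law halving time
baseline_drugs_per_billion = 30.0   # illustrative 1950 starting point, not a real datum

for year in range(1950, 2021, 10):
    factor = 0.5 ** ((year - 1950) / halving_period_years)
    print(year, round(baseline_drugs_per_billion * factor, 2))
# By 2010, the same (inflation-adjusted) billion dollars buys roughly 1% of the 1950 output.
```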
So in a weird way,
Eroom's Law is actually very similar
to the scaling laws you have in ML,
where you have this very consistent logarithmic relationship of you throw in more inputs and you get consistently
diminishing outputs. The difference, of course, is that this trend in ML has been used to
raise exponentially more investment and to drive more hype towards AI. Whereas in biotech,
you know, modulo NewLimit's new round, it has driven down valuations, driven down excitement and energy.
With AI, at least you can sort of internalize the extra cost and the extra benefits
because there's a general purpose model you're training.
So this year you spent $100 million training a model, next year a billion dollars, the year after that $10 billion.
But it's one general purpose model,
unlike we made money on this drug
and now we're going to use that money to invest in 10 different drugs
in 10 different bespoke ways.
Okay, anyways, I was gearing up to ask you: what would a general-purpose platform look like for biotech, where even if you had diminishing returns, at least you have this sort of less bespoke way of designing drugs?
Okay.
I'm going to slightly dodge your question first
to maybe analyze something really interesting
that you highlighted,
which is you have these two phenomena, again, ML scaling and then scaling in terms of the cost for new drug discovery,
why is it that the patterns of investment have been so different?
I think there are probably two key features that might explain this difference.
One is that the returns to the scaled output in the case of ML actually are expected to increase super-exponentially. If you actually reach AGI, it's going to be a much larger value than even a few logs back on the performance curve that people are following.
Whereas in the life sciences thus far, each of those products we're generating further and further out on the Eroom's Law curve as time moves forward hasn't necessarily scaled in its potential revenue and potential returns quite so much. And so you're seeing these increased costs not counterbalanced by increased ROI. The other piece of it that you highlighted is that
unlike building a general model where potentially by making larger investments, you can be able to
solve a broader addressable market, moving from solving very narrow tasks to eventually replacing
large fractions of white-collar intelligence. In biotech, when you're traditionally able to develop a medicine in a given indication, to say I was able to treat disease X, it doesn't necessarily enable you to then treat disease Y more readily.
Typically, where these firms, biotech firms in general, have been able to develop unique
expertise, is on making molecules to target particular genes. So I'm really good at making a
molecule that intervenes on gene X or gene Y. And it turns out that the ability to make
those molecules more rapidly isn't actually reducing the largest risk in the process. And so this
means that the ability to go from one or two outputs one year to then going to four the next is much more limited. And so this brings us then to the question of what would the general model be in biology? And I think it kind of reduces down to
how do you actually imbue those two properties that create the ML scaling law curve of hope
and bring those over to biology so that you can take the Eroom's law curve and potentially give it
the same sort of potential beneficial spin. So I think there are a few different versions of this
you could imagine, but I'll address the first point. How do you get to a place where you're actually
able to generate more revenue per medicine so that potentially the outputs you're generating are
more valuable, even if each output might cost a bit more.
Traditionally, when we've developed medicines, we go after fairly narrow indications,
meaning diseases that fairly small numbers of people get. And that narrowing of what medicines address has actually increased as we've gone forward in time. And so there's a sort of ironic situation where we've gone from addressing pretty broad categories of disease, like infectious disease, to narrower and narrower genetically
defined diseases that have small patient populations.
Because these only affect a few people, if you think about it, the value function of a medicine is, you know, how many years of healthy life does it give to how many people?
Right.
If how many people is pretty small, it just really bounds the amount of value you're able to generate.
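In its crudest form, that value function is just a product; all the numbers below are made up, and only the shape of the comparison matters.

```python
def medicine_value(healthy_years_gained_per_patient: float, patients_reached: int) -> float:
    # value ~ (years of healthy life gained per person) x (number of people who get it)
    return healthy_years_gained_per_patient * patients_reached

print(medicine_value(2.0, 50_000))        # narrow genetic indication: 1e5 life-years
print(medicine_value(0.5, 500_000_000))   # broad, everyone-ages indication: 2.5e8 life-years
```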
So you need to then be able to find medicines that treat most people.
All of us will one day get sick and die.
So arguably, the TAM for any really successful medicine could be everybody on planet Earth.
Right. So we need to find a way to be able to route toward medicines that address these
very large populations.
The second piece, then, is how do we actually build models that enable us to take the
success in one medicine we've developed and lead that to an increased probability of success on the next medicine.
Traditionally, we haven't been able to do that.
Maybe you're better at making an antibody for a gene Y because you made one for gene X five years ago,
but it turns out making an antibody isn't really the hard part of drug discovery.
Figuring out what to make an antibody to target is the hard thing about drug discovery.
What gene do I intervene upon in order to actually treat a disease in a given patient?
Most of the time, we just don't know.
And so that's why, even if a given drug firm becomes very good at making antibodies to gene X,
they have a successful approval.
When they then go to treat disease Y, they don't necessarily know what gene to go after.
And most of the risk is not in, how do I make an antibody to treat my particular target?
It's in figuring out what to target in the first place.
I'm not sure how to square this claim, that we know how to engage with the right hook, we just don't know what that hook is supposed to do in the body, I don't know if that's the way you'd describe it, with another claim that I've seen: that with small molecules we have this Goldilocks problem where they have to be small enough to percolate through the body and through cell membranes, et cetera, but big enough to interfere with, like, the protein-protein interactions that transcription factors might have or something. So there, it seems like getting the hook is the big problem.
Yeah, in this particular case, if we bound ourselves to saying we must use small molecules as our modality, then there are lots of targets which are very difficult to drug. There are many other modalities by which you can drug some of these genes. And I would say I don't have a formal way of explaining this, but if you were to write out a list of well-known targets that many, many folks
would agree are the correct genes to go after and to try and inhibit or activate in order to
treat a given set of diseases, where the only reason we don't have medicines is that we can't figure out a trick in order to be able to drug them, it's a fairly small list. It would probably
fit on a single page, whereas the number of possible indications that one could go after
and the number of possible genes that one could intervene upon, especially when you consider
their combinations, is astronomical. I think, you know, the experiment you could run here is if you lock 10 really smart drug developers in a room, and you tell them to write down some incredibly high-conviction target-disease pairs where they're sure that if they modulate this biology, these patients are going to benefit.
And all they need is some molecular hook, as you put it, in order to do this.
It's a relatively short list.
What you're not going to get is anything approximating the panoply of human pathologies that develop.
And you can actually look for this.
There are some existence proofs you can look for out in the universe, which is to say, if the only problem was that we didn't have the ability to drug something,
using current therapeutics that we can put in humans,
we should still be able to treat it in the best animal models of that disease
because we can use things like transgenic systems.
You can go in and you can engineer the genome of that animal.
And so this gives you all sorts of superpowers that you don't have in patients,
but allow you to, for instance, turn on arbitrarily complex groups of genes
in arbitrarily specific or broad groups of cells in the organism
at any time you want, at any dose you want in the animal.
And for the majority of pathologies, we just don't have many of those examples.
Okay, so then what is the answer? What is the general-purpose model where every marginal discovery increases the odds you make the next discovery, or something like that?
So there are multiple ways one might approach this problem.
The most common approach today is what people are often describing when they talk about a virtual cell. This is sort of a very nebulous idea, sometimes luminous, if you'll let me describe it in that way as well. But I think most concretely, what most people are trying to do is measure some number of molecules, or some sort of observable readout like the morphology of a cell, and then perturb it many times, turn some genes on, turn some genes off, and measure how that molecular or morphological state changes.
The notion is that there's a lot of mutual information in biology.
So if I measure something like most commonly, all the genes the cell is using at a given moment,
which you can get by RNA sequencing, that I get a decent enough picture of most of the other
complexity going on, and so that I can, for instance, take a bunch of healthy cells and a
bunch of cells that are in a diseased or age state. And I'm able then to compare those profiles
and say, okay, my disease cells use these genes, my healthy cells use these. Are there any interventions I can find and test experimentally in the lab that shift one toward the other? You're never going to be able to combinatorially scan all the possible groups of genes. Just to make that concrete with round numbers: there are something like 20,000 genes in the genome, and you can then choose however many genes in your combination you want. It's not crazy to think of hundreds at a time; that's what transcription factors control, that's how development works. So the number of possible combinations is truly astronomical. You just can't test it all. So the hope would be that by doing some sparse sampling of those pairs, your inputs are:
here's what the cell looked like beforehand, here's the particular genes I perturbed. You have
some measurement then of the state that the cell resulted in. So here's which genes went up.
Here's which went down. And then you can start to ask, once I've trained a model to predict from
the perturbations to the output on the cell state, what would happen for some arbitrary combinations
of genes. And now in silico, I can search all possible things that one might do and potentially
discover targets that take my disease cells back to something like healthy cells. So that's another version of what an all-encompassing model would look like, where you actually have compounding returns in drug discovery.
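To put numbers on the combinatorial explosion he mentioned a moment ago (20,000 genes is the usual round figure; the choice of k is arbitrary):

```python
from math import comb

GENES = 20_000
for k in (1, 2, 3, 5):
    print(k, comb(GENES, k))   # number of distinct k-gene combinations
# Even k=3 is about 1.3 trillion combinations, and real programs involve far more genes,
# which is why sparse sampling plus a predictive model is the only plausible way in.
```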
Right. And you basically described one of the models you guys are
working on at New Limit. You're training this model based on this data where you were taking
the entire transcriptome and just labeling it based on how old that cell actually is.
If you've got all this data you're collecting on how different perturbations are having different phenotypic effects on a cell, why only record, like, whether that effect correlates with more or less aging? Why can't you also label it with all the other effects that we might eventually care about and eventually get the full virtual cell? Because that's a more general-purpose model, right? Not just one that predicts whether a cell looks old or not.
Yeah, absolutely.
So we actually do both today. We can train these models where basically the inputs are a notion of what that cell looked like at the starting place.
Here's what a generic old cell looked like.
And then representations of the transcription factors themselves.
We derive those from protein foundation models.
They're language models basically trained on protein sequences. Turns out that gives you a really good base-level understanding of biology.
So the model is kind of starting from a pretty smart place.
And then you can predict a number of different targets from some learned embedding, the same way you could have multiple heads on a language model.
And so one of those for us is actually just predicting every gene the cell is expressing.
Can I just recapitulate the entire state and guess what effect these transcription factors will have on every given gene?
And you can think about that as like an objective rather than a value judgment on the cell.
I'm not asking whether or not I want this particular transcriptome.
I'm just asking what it will look like.
And then we also have something more like value judgments.
I believe that that transcriptome looks like a younger cell.
And I'm going to select on that and train a head to predict it, where I can denoise across genes and then select for younger cells.
But you could do that for arbitrary numbers of additional heads.
What are some other states you might want?
Do I want to polarize T cells to a less inflammatory state in somebody with an autoimmune disease?
Do I want to make liver cells more functional in a patient who's suffering from certain types of metabolic syndrome, be that maybe even orthogonal to the way that they age?
Do I want to go in and change the way a neuron is functioning to a different state to treat a particular type of neurodegenerative disease?
These are all questions you can ask.
They're not the ones we're going after, but that is the more general, broader vision.
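A minimal sketch of the multi-head setup as I understand it from this description; this is not NewLimit's actual architecture, and every size, layer, and name here is an assumption made up for illustration.

```python
import torch
import torch.nn as nn

N_GENES, TF_EMB, HIDDEN = 2000, 128, 256   # arbitrary illustrative sizes

class PerturbationModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk: starting cell state plus embedding of the delivered TFs -> latent code
        self.trunk = nn.Sequential(
            nn.Linear(N_GENES + TF_EMB, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.expression_head = nn.Linear(HIDDEN, N_GENES)  # "objective" head: predict every gene's level
        self.age_head = nn.Linear(HIDDEN, 1)                # "value judgment" head: young/old score

    def forward(self, baseline_expression, tf_embedding):
        z = self.trunk(torch.cat([baseline_expression, tf_embedding], dim=-1))
        return self.expression_head(z), self.age_head(z)

model = PerturbationModel()
old_cells = torch.randn(32, N_GENES)   # what the old cells looked like beforehand
tfs = torch.randn(32, TF_EMB)          # e.g. pooled protein-LM embeddings of the TF combination
predicted_expression, predicted_age = model(old_cells, tfs)
```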
This is so similar to, in LLMs, you have first imitation learning with pre-training
that builds a general purpose representation of the world.
And then you do RL about a particular objective in math or coding or whatever that you care about.
And you are describing an extremely similar procedure, where first you just learn to predict from perturbations of genes to broad effects on the cell
and that's like
that's the sort of pre-training
just like learn how cells work
and then there's another afterward
layer of these like value
judgments of okay well how would we
how would we have to perturb it to have
effect X which actually seems very similar to
how do we get the base model
to answer this math problem
or answer this coding problem
I don't know if people usually put it this way, but it actually just seems like an extremely similar setup. I mean, that makes me more optimistic on this, because, like, LLMs work, right? And RL works.
Yeah, they do. I think the conceptual analogy is very apt. You know, we don't actually use RL at
the moment, so I don't want to overstate the level of sophistication we've got. But I think the
general problem reduces down in a similar way. And so you can think about, you know, your earlier
question of what does the general model look like that enables you to actually have compounding
returns in drug discovery? Well, you might have something like this base model, which, as you said,
just predicts this objective function of how these perturbations hitting these targets are going to change which genes are turned on and off in this cell. Then there's an entirely
other task, which is, well, which genes do you want to turn on and off? And what state do I want
the cell to adopt? Our lens on that is that across many different diseases people have, age is one of
the strongest predictors of how they're going to progress, whether that disease arises. And so in
many, many circumstances, you have evidence in humans where you can say, oh, if I could make the
cell younger, maybe that's not a perfect fix, but that's going to dramatically benefit not only patients
who have a diagnosed disease, but it might actually help most of us stay healthier longer,
even subclinically, before anyone would formally say that we're sick.
Now, that's another more general function, the same way that in LLMs, you might have to
create these particular RLVF environments.
You need to have places where you can state a value function of the particular task
that you're trying to optimize for.
In drug discovery, you would then need to know, well, what are the cell states I want
to engineer for?
That's kind of the next generation of what a target might be, beyond just which genes do I
want to move up and down and which gene perturbations do I put in, you then need to know,
what cell state am I engineering for? What do I want this T-cell to do? You'll have a bunch of
labelers in Nigeria, like clicking different pictures of cells, like, oh, this one looks young, this one looks old. This one looks really great. I love that one.
Potentially. It's more like developmental biologists locked in a room, as my friend Cole Trapnell would say.
It seems like
what you're describing seems quite similar to Perturb-seq. And we've had Perturb-seq for, I don't know, since when was it done? What year was it?
There were three papers almost simultaneously in 2016.
Okay, so almost a decade. I don't know, we're still waiting, I guess, for the big breakthrough it's supposed to cause. And this is the same procedure. So why is this going to have an effect? Why has this taken so long?
Yeah, yeah, good questions. So the original procedure was created by a bunch of brilliant folks. There was a group in Ido Amit's lab at the Weizmann, a lab at the Broad where Atray Dixit, a friend of mine, helped work on this, and then Jonathan Weissman's lab at UCSF, where Britt Adamson did a lot of the early work. They all constructed this
idea where you can go in and you label a perturbation that you're delivering to a cell.
So this is typically a transgenic perturbation, meaning you're integrating some new gene
into the genome of a cell, and that turns another gene on or off. They used CRISPR, but there's
lots of ways to do it, and the concept's pretty general. And then you attach, on that new transgene,
that new gene you put into the genome of the cell, some barcode that you can read out
by DNA sequencing. So now, when you rip the cells open, you're able to not only measure every
gene they're using, but you also sequence these barcodes and you know which genes you turned on
and which are off. So you can then start to ask questions like, well, I've turned on genes A, B, and C,
what did it do to the rest of the cell? So that's the general premise of the technology.
And so it's useful to just set that up because it explains why this didn't all happen earlier.
Yeah. One, the actual readout, ripping the cells open and sequencing them, used to be pretty bad, and it used to be really expensive. And it's gotten much better over time. So the metric people often think about here is like cost per cell to sequence. It used to be measured in dollars, and now it's measured in cents, and down to fractions of a cent, because that cost curve has improved dramatically.
The cost of sequencing has likewise come down.
So even beyond the actual reagents necessary to rip the cell open
and turn its mRNAs into DNAs that are ready for the sequencer,
now the sequencer is cheaper.
The other piece is actually getting these genes in
and then figuring out which ones are there started out pretty bad.
So when we started with this technology,
it was a beautiful proof of concept,
but I don't think anyone would tell you it was 100% ready for prime time.
When you sequenced a cell, only about 50% of the time could you even tell which perturbation you put in.
Sometimes you just wouldn't detect the barcode, and you'd have to throw the cell away, or you detect the wrong barcode, and now you've mislabeled your data point.
So this might sound like a trivial sort of technical piece, but imagine you're running this experiment the old-fashioned way, where you test different groups of genes and different test tubes on a bench.
Now imagine you hired someone who every other tube labels it wrong.
So when you then collect data from your experiment, you basically have no idea what happened because you've just randomized all your data labels.
You wouldn't do much science, and you wouldn't get very far that way.
So a lot of those technologies have improved. You had a number of processes which were pretty inefficient, and when you multiplied a lot of those together, you ended up with a very small number of successful cells you could actually sequence.
They've all improved to the degree where now you can actually operate at scale.
And then groups like ours have had to do a bunch of work in order to actually enable combinatorial perturbations, turning on more than just one gene at a time, which it turns out is much, much harder for the same reason we're just alluding to.
Imagine you're having trouble figuring out which one gene you put in this cell and turned on or off.
Now imagine you have to do that five times correctly in a row.
Well, if you start out with the original sort of performance, where you could detect roughly 50% of them, then the fraction of cells that would be correctly labeled is like one over two to the N, where N is the number of genes you're trying to detect. And very quickly, more of your data is mislabeled than is labeled.
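That "one over two to the N" point, made concrete; the 50% per-barcode detection rate is the historical figure he cites, assumed here to be independent across perturbations.

```python
p_detect = 0.5   # historical chance of correctly reading any one barcode

for n in (1, 2, 3, 5):
    correct = p_detect ** n
    print(f"{n} perturbations per cell -> {correct:.1%} of cells fully and correctly labeled")
# 50%, 25%, 12.5%, ~3.1%: mislabeled or ambiguous cells quickly dominate the dataset.
```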
So there's lots of technical reasons like this that have gotten worked out over time.
And so only now are we really able to scale up, where we're able to run experiments that are in the millions of cells in just a single day at, for instance, a small company like NewLimit. There was a point even just six or seven years ago where the companies that made these reagents were publishing the very first million-cell dataset just as a proof of concept, and only they could do it as the constructors of the technology. And now two scientists in our labs can generate that in an afternoon.
If it actually is the case that this is actually
very similar to the way LLM dynamics work, then once this technology is mature and you get the
GPT-3 equivalent of the virtual cell, what you would expect to happen is there are many different companies, or at least a couple, that are, you know, doing these cheap Perturb-seq-like experiments and building their own virtual cells.
And then they're like leasing this out to other people who then have their own ideas about,
well,
we want to see if we can come up with the labels for this particular thing we care about
and test for that.
What seems to be happening right now is, at least at NewLimit, you're like, we know the end use case we're going after. It would be like if Cursor or whatever, in like 2018, said we're going to build our own LLM from scratch
so that we can enable our application rather than some foundation model company being like,
we don't care what you use it for, we're going to build this.
Does that make sense?
Like it seems like you're combining two different layers of the stack.
And it's just because nobody else is doing the other layer.
And so you're just doing both of them.
I don't know to what extent this analogy maps on, but...
Yeah, yeah, maybe to play with the analogy a bit.
Imagine that, you know, you think about New Limit as an LLM company.
If I'm going to put us in the shoes of Cursor, which, oh, how I wish. Imagine we're trying to, in 2018, create Cursor Tab, but we're not trying to create a full LLM.
Right.
I'm not, I don't know enough about the underlying mechanics to know if that would have been feasible,
but it's a much more feasible problem than trying to create, like, their most recent Cursor agent or compete with, like, modern Claude Code, right?
I think that's roughly the equivalent, where the problem we're breaking off is a subset of the more general virtual cell problem. We're trying to predict what groups of transcription factors do to the age of very specific types of cells. We only work on a few cell types at New Limit because those are some of the only cell types today where we believe we can get really effective delivery of medicines. And so we think they're just more
important because we can act on them today if we solve the problem of what TFs to use, we can make
a medicine pretty quickly. So in a way, we're carving out a region of this massive parameter space
and saying, if we can learn the distribution of effects, even just in this small region,
it's going to be really effective for us, and we can make really amazing products unlike the world has ever seen.
And over time, we can expand to this more general corpus of predicting every possible gene perturbation
and every possible cell type.
And so I think that's maybe the way the analogy maps on.
But it is true that we are vertically integrating here.
We're generating our own data in a way that's proprietary.
We think we have a much, much larger data set for this particular regime than the rest of the world combined, and that enables us to build what we think are the best models. And in many cases, what we found is that unlike with LLMs, where a lot of the data that was necessary to build these was sort of a common good, produced as a function of the internet, shared across everyone, and pretty common across all the domains everyone wants to use it for.
This biological data is still in its infancy.
It's like, imagine we're in like the early 1980s,
and we are just now thinking about trying to create some of the first web pages.
That's kind of the era we're in.
And so we're going after generating some of our own data in this very niche circumstance, building the very high-quality corpus, the Wikipedia, that you might train your now-overly-analogized LLM on, and then building the first products based on that, and then expanding from there.
And so we think that's necessary because of where we are today.
There isn't this internet-like equivalent of data that everyone can go out and reap rewards from.
Interesting.
And then this is more a question about the broader pharma industry rather than just NewLimit, which is: in the future, how are people going to be able to make money? With the GLP-1s, we've got peptides from China that are just a gray market that people can easily consume. And presumably with these future AI models, even if you have a patent on a molecule, maybe finding an isomorphic molecule or an isomorphic treatment is relatively easy. If you do come up with these crazy treatments, and pharma in general is able to come up with these crazy treatments, will they be able to make money?
The gray market piece we'll maybe put aside and say, you know, that's sort of an IP enforcement question at a geostrategic level that I'm maybe not qualified to speak to, but I do think it comes down to IP enforcement effectively.
I think for that gray market piece, another reason that sort of the traditional pharmaceutical industry will still continue to reap the majority of rewards here is that most of the payment in the United States, which provides most of the revenue for drug discovery in the world, goes through a payment system that is not just direct-to-consumer. It goes through payers. And so if you have the opportunity to either order a sketchy vial off of some website from some company in Shenzhen, or go through your doctor and get a prescription with a relatively low co-pay for tirzepatide, the real thing, I think most patients will go for tirzepatide. I think you and I probably live in a milieu of people who are much more comfortable with ordering the vials from Shenzhen than most people might be. But I don't consider that to be a tremendous concern writ large. I do think the broader point of, if you have medicines with very long-term durability, how do you reimburse them, or if the benefits are just very long-term and, you know, sort of accrue in the out years.
A challenge we have in the U.S. system is that the average person churns insurers every three
to four years.
That number fluctuates around, but that's the right order of magnitude.
And that means that if, for instance, you had a medicine which dramatically reduced the cost
of all other health care incidents, but it happened exactly five years after you got dosed
with it, no insurer is technically economically incentivized to cover that.
And so I think there are a couple models here that can make sense.
One is something called pay-for-performance, where, rather than reimbursing all of the cost of the drug up front, you actually reimburse it over time.
So say you get a medicine that just makes you generically healthier and you can measure the reduced rates of heart attack and reduced rates of obesity and various other things.
And you get this one dose and it lasts for 10 years.
Each year you would pay something like a tenth of the cost of the medicine contingent on the idea that it was actually still working for you and you had some way of measuring that.
So that's a big challenge in this industry: how would you demonstrate that any one of these medicines is still working for the patient?
In the few examples we have today, these are things like gene therapies, where you can just
measure the expression of the gene and like, okay, the drug is still there.
But it gets more complicated when you have some of these sort of longer term net benefits.
And the idea would be that then each insurer is incentivized to just pay for the time of coverage
that you're on their plan.
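As a rough illustration of that pay-for-performance idea, here's a minimal sketch; the price, the ten-year term, and the pass/fail efficacy check are all made-up placeholders, not anything a real payer or NewLimit actually uses:

```python
# Hypothetical pay-for-performance schedule: each covered year, the current
# insurer owes a pro-rated share of the drug's price, contingent on some
# agreed-upon measurement showing the medicine is still working.

def annual_payment(total_price: float, benefit_years: int, still_working: bool) -> float:
    """Pro-rated yearly reimbursement; zero if the efficacy check fails."""
    return total_price / benefit_years if still_working else 0.0

price, years = 100_000, 10  # placeholder: $100k one-time dose, 10-year benefit
# Suppose the efficacy check passes for 7 years, then the effect wears off.
checks = [True] * 7 + [False] * 3
total_paid = sum(annual_payment(price, years, ok) for ok in checks)
print(total_paid)  # 70000.0 -- payers only pay for the years the drug delivered
```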
And we already have a framework for this post-Affordable Care Act in the U.S., where, you know, pre-existing conditions no longer really exist. So patients are able to freely move between payers, and you could sort of treat the presence of one of these therapeutics, lowering this patient's overall health care costs, the same way we treat a pre-existing condition. I think this is something that the system is still overall
figuring out. So what I'm saying here is one hypothesis about what the future might look like,
but I think there are alternative, clever approaches people might think about for reimbursement.
I also think over time we're going to move more toward a direct-to-consumer model for many of these
medicines which preserve and promote health rather than just fixing disease. You're seeing what I think
are really some of the most innovative examples of this right now from Lilly around the incretin mimetics, where they actually launched LillyDirect. So for the first time, rather than going to a pharmacy, which interacts with a PBM, which interacts with your primary care physician, now you can get a prescription from your doctor, go straight to Lilly, the source of the good stuff, and order high-quality drug from them without involving some intermediary compounder in the middle that might not even make your molecules properly.
And I think as these medicines develop that have actual consumer demand,
because you feel it in your daily life, you're actually seeing a benefit from it.
It's not just something that your physician is trying to get you to take, that that model will start to dominate.
And that means that this sort of payment over time for some of these long-term benefits might be able to be abstracted away from our current payer system, where it churns every few years. And now a sort of payment-over-time plan, the same way we finance other large purchases in life, seems very feasible.
The reason I'm interested in this is that health care is already 20% of GDP. I think it's grown by notable percentages in the last few years. This is a fraction that is quickly growing. And most of this, I should have looked the numbers up, but the overwhelming majority of this is going to administering treatments that have already been invented, which is good, but nowhere near as good as spending this enormous sum of resources towards coming up with new treatments that in the future will improve the lives of people who will have these ailments. I mean, one question is just: if we're going to spend 20% of GDP on health care, how do we make it so that more of it goes towards coming up with new treatments rather than just paying nurses and doctors to keep administering stuff that kind of works now?
And two, if the cost of drugs, at least from the perspective of the payer, ends up being: you need a doctor to give you some scan before they can write you a prescription, and then they need to administer it, and they need to make sure that you're doing okay, et cetera, et cetera, then even if manufacturing this therapy might cost, you know, tens of dollars per patient, for the health care system overall it might be tens of thousands of dollars per patient. Actually, I'm curious if you agree with those orders of magnitude.
I think that's correct.
So I think the stat is something like drugs are roughly 7% of health care spend. I could be a little bit wrong on that, but the order of magnitude is right.
Right.
So basically, even if we invent de-aging technology, or especially if we invent de-aging
technology, how should we think about the way it will net out in the fraction of GDP
that we have to spend on health care?
Will that increase because now everybody's lining up at the doctor's office to get a prescription and you've got to go into the clinic every week?
Or will that decrease because the other downstream ailments from aging aren't coming about?
I think the latter is much more likely to be the case.
So there are just some quick heuristics here. I think there are many reasons that health care costs so much in the U.S. One of them is something like Baumol's cost disease, which is, you know, very unrelated to pharmaceutical discoveries, but is something that we will have to solve in the system.
Part of it's like the disintermediation of the actual customer and the actual provider.
And these are things that biotech probably isn't going to be able to solve as an industry alone.
That's probably a larger economic problem.
But when you think about how this will affect the total amount of health care that will need to be delivered: if you have more of what I like to think of as medicines for everyone, medicines that keep you healthier longer rather than medicines that only fix a problem once you're already very sick, I think you actually avoid a lot of the administration costs. Not just administration in the sense of admins at hospitals, but the cost of administering existing medicines and therapies to you goes down.
One data point on why I think that's true: something like a third of all Medicare costs are spent in the final year of life, which is shocking when you realize that the average person on Medicare is covered by it for, I don't know the exact number, but probably a decade-plus.
And so there's an incredible concentration of the actual expenses once someone is already terribly sick.
So helping prevent you from ever having to access the intensive health care system, meaning something like an inpatient hospital visit: if you can prevent even just a couple of those visits over a long period of someone's life with a medicine like an incretin mimetic, or a reprogramming medicine that keeps your liver and your immune system younger, I think on net that actually starts to drive health care spend down, because you're sort of shifting some of that burden from the administration system to the pharmaceutical system.
And the pharmaceutical system is the only piece of healthcare
where technology has made us more efficient.
As drugs go generic, actually the cost of administering
a given unit of health care is going down.
And the grand social contract is that they eventually go generic.
That's the way our current IP system works.
So I think, you know, if you were to get the question of like,
when would you like to be born as a patient?
You always want to be born as close to today as possible.
Because in terms of pharmaceuticals, for a given dollar unit of expense, you can access more pharmaceutical technology today than has ever been possible in history, even as health care costs everywhere else in the system have shot up. And so pharmaceuticals are the one place where, because of the
mechanism of things going generic and the fact that our old medicines continue to work and persist
over time, you're actually able to get more benefit per dollar.
Okay, final question. So pharma is spending billions of dollars per new drug it comes up with.
And surely they have noticed that the lack of some general platform or some general model has made it
more and more expensive and difficult to come up with new drugs.
And you say Perturb-seq has existed since 2016. And as far as you can tell, you have the largest amount of that kind of data, which we could feed into a general perturbation model. So what is the traditional pharma industry on the other coast up to? If I went to the head of R&D at Eli Lilly or Pfizer or something, do they have some different idea of the platform that needs to be built, or are they like, no, we're all in on the bespoke game, bespoke for each drug?
Yeah.
So I'll just correct one thing to make sure I'm not overstating. We have way more data for the particular, limited sub-problem we're tackling, which is overexpressing TFs in combinations. I think we have way more data than anyone there, full stop. But even more specifically, I feel very, very confident we have more data than anyone looking at trying to reprogram a cell's age. And so that's where we're way larger than the rest of the world.
When we think about just general single-cell perturbation data, various flavors,
then I think there are other groups
which have very large data sets as well.
We're still differentiated because we do everything
in human cells with the right number of chromosomes,
whereas it's very common to do things
in like cancer cell lines which have 200 chromosomes.
So like, is that human?
I don't know.
Depends on how you actually quantify these things.
So then if you're going to go ask the leaders
of some of the traditional pharmaceutical firms,
like are you trying to build a general model?
I think some of them have in-house like AI innovation teams
that are working on this.
They're really smart people there.
But I think as a general trend, you can think about some of the modern pharmas as a bit like venture capital firms, where they've over time externalized a lot of their R&D. And so they often have divisions of external innovation, which you can kind of think of as the corp dev version of venture capital. They work with the biotech ecosystem to have a number of smaller, nimble firms explore really pioneering ideas, like the types of things we're working on, and then eventually partner with them once they have assets that are later downstream.
And so I think the industry has sort of bifurcated, where smaller biotechs like ours take on most of the early discovery. The stat I'm going to get a little bit wrong from memory, but it's something like 70% of molecules approved in a given year originally come from small biotechs rather than large pharmas, even though if you look at the actual dollars of R&D spend on the balance sheet, it's largely in big pharma.
Another level of disintermediation.
Another disintermediation.
And part of the reason for that difference in cost is they're running most of the trials. Most people partner with pharma to run trials, where a lot of the costs are incurred. So it's not just that, oh, all large pharmas are horribly inefficient, or anything like that.
And so I think some of them would tell you, like, these ideas are really exciting: we have an external innovation department, or if we don't have one internally, we're collaborating with a startup that's doing something similar. And so you can kind of think of the market structure like this: you have a bunch of biotechs, which are kind of like the startups in your ecosystem, and they're working with something like an oligopsony of pharmas, where there's a limited number of buyers for this particular type of product, which is a therapeutic asset that is ready for a phase one or phase two trial.
And so there's a very liquid market for the phase one, phase two assets.
And that's the point at which these partnerships can come to fruition.
And so I think that's what a lot of those leaders would say.
Now, some of them are different. By contrast, for instance, Roche bought Genentech back in 2013. R&D there is currently run by Aviv Regev, one of the scientists I admire most in the world, who's like a thousand times smarter than me. And, you know, she's one of the people who invented this technology, and she has a big group doing this sort of work there.
So it's not like every pharma takes that view, but I think that's sort of a general trend.
Interesting.
Full disclosure, I am a small angel investor in you a little bit now, but that did not influence the decision to have Jacob on. This is super fascinating. Thanks so much for coming on the podcast.
Awesome. Thanks, Dwarkesh.
I hope you enjoyed this episode. If you did, the most helpful thing you can do is just share it
with other people who you think might enjoy it. Send it to your friends, your group chats, Twitter,
wherever else. Just let the word go forth. Other than that, super helpful if you can subscribe on
YouTube and leave a five-star review on Apple Podcasts and Spotify. Check out the sponsors in the
description below. If you want to sponsor a future episode, go to dwarkesh.com/advertise. Thank you for tuning in. I'll see you on the next one.
