a16z Podcast - When AI and Genomics Collide
Episode Date: October 3, 2023

Today’s episode continues our coverage from a16z’s recent AI Revolution event. You’ll hear a16z Bio & Health GP Vijay Pande speak with Daphne Koller about the fascinating convergence of machine learning and genomics, two industries that have benefited from decades of investment and progress, which are now colliding head on.

Daphne is a prominent innovator at this intersection, as a long-time professor in computer science at Stanford and co-founder of Coursera, who has decided to step back into the arena with her company Insitro. In fact, Insitro is a blend of in silico and in vitro!

If you’d like to access all the talks from AI Revolution in full, visit a16z.com/airevolution.

Resources:
Find Daphne on Twitter: https://twitter.com/DaphneKoller
Find Vijay on Twitter: https://twitter.com/vijaypande
Find Insitro on Twitter: https://twitter.com/insitro

Stay Updated:
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
We built a language model for biology.
The big problem in biology is that biology is hard.
Biology is really hard.
Biology is very hard.
So why are you doing it?
What is the big win?
I mean, all of it is an AI-enabled architecture.
Every part of our technology stack is intrinsically AI-enabled.
Where do you think this confluence between AI and life sciences goes from here?
There are so very few people who actually have the language of both disciplines
that are able to bring them together.
If you are a listener to this podcast,
you're probably familiar with Moore's Law, where the number of transistors on an integrated circuit
has doubled every two years. But what you might not know is the cost of sequencing the genome
has fallen even faster. Since the Human Genome Project in 2003, the cost has fallen from about
$1 billion to less than $1,000 today. And now in 2023, these two trends meet, and one of the people
at this fascinating intersection is Daphne Koller.
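The cost comparison in the introduction can be checked with back-of-envelope arithmetic. The sketch below assumes a smooth exponential decline between the two figures quoted in the episode (about $1 billion in 2003, about $1,000 in 2023); the function name is illustrative, and real sequencing costs did not fall at a constant rate.

```python
import math


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Years per halving of cost, assuming a constant exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


# Figures as quoted in the episode (rounded, illustrative).
sequencing_halving = halving_time_years(1e9, 1e3, 2023 - 2003)

# Moore's Law as stated in the episode: a doubling every two years.
moore_doubling_years = 2.0

print(f"Sequencing cost halved roughly every {sequencing_halving:.2f} years")
print(f"Transistor count doubled roughly every {moore_doubling_years:.1f} years")
```

Under these assumptions the cost halves about once a year, which is roughly twice the pace of the two-year doubling the host attributes to Moore's Law.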
Daphne, longtime professor in computer science at Stanford
and co-founder of Coursera,
has decided to step back into the arena
with her company, Insitro,
right at this intersection of computation and biology.
In fact, the name is even a nod to this intersection
as a blend of in silico and in vitro.
And in today's episode,
you'll get to hear a16z Bio and Health General Partner
Vijay Pande
speak with Daphne about the why now,
but also how machine learning is fundamentally changing
our ability to understand genetics.
Without machine learning, without AI,
the space would be so complex and so high-dimensional
that you couldn't even make sense of it
far less bridge between those two different worlds.
So what can this unlock?
We can now, finally, for the first time,
measure biology at scale.
And this is like a truly
engineered approach to discovery.
Listen in to get a glimpse into what may be a new era of digital biology,
and how this AI wave is really not just a moment of opportunity in bits, but also atoms.
The next frontier of the impact that AI can have is when AI starts to touch the physical world.
I think this convergence is a moment in time for us to make a really big difference using tools that
exist today that did not exist even five years ago.
This episode also continues our coverage of a16z's exclusive AI Revolution event
from just a few weeks ago, which hosted some of the most influential builders across
the ecosystem, including the founders of OpenAI, Anthropic, Character AI, Roblox, and more.
So be sure to check out the full package, including all of the talks in full, at
a16z.com/airevolution.
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund.
Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
Daphne is like the OG's OG in AI.
She was a pioneer at Stanford in different areas of AI, especially in PGMs.
She left Stanford to co-found Coursera with Andrew Ng,
and actually is now the founder and CEO of Insitro,
a tech bio company using AI to develop drugs in life sciences.
So Daphne, given all the things you could be doing, why life sciences?
It's one of the really hard and really important problems.
And there are very few things that are as challenging, as exciting, as intervening in a safe and
effective way in human health.
And so it's just a thing that absolutely needs to be done if we are going to use AI for good,
which is one of the things that I, at least, really strive to do.
The second part of the answer is why now?
And what brought me back to this field back in 2016, post-Coursera, was
the realization that we can now finally, for the first time, measure biology at scale, both at
the cellular level, sometimes at subcellular level, and at the organism level via ways of
quantitating human biology. And that gives us, for the very first time, the ability to deploy
machine learning in ways where it is truly meaningful to do that, because the data sets are
large enough for really interesting machine learning methods to be deployed. And the third
part of the answer is, well, okay, but why me? And I am a big believer in leverage, that is,
places where you can have a disproportionately large impact. And because of the fact that I had
spent a large part of my Stanford career working in these two spaces simultaneously, core machine
learning on the one hand and machine learning in service of biomedical data on the other,
I actually have the ability to sort of bridge the chasm between these two very disparate
disciplines. And when I was leaving Coursera in 2016 and I looked around me, I saw that while machine
learning was changing the world, it wasn't having much of an impact in the life sciences. And I believe
one of the main reasons for that is that there are so very few people who actually have
the language of both disciplines and are able to bring them together. So I felt like I could
have impact in AI across many things, but here I could have disproportionate impact.
Well, you know, you spoke about the why now. What's your take on AI for life sciences? What's the
why now there? And what's different now than even what we could do even just, let's say, five years
ago? So I think it comes back to this ability to collect, but even more than collect, to generate
data at scale. So one of the things that we have at Insitro that is truly unique is we have a
data factory. We have put together the tools that have been developed by people who are taking
pluripotent stem cells, which are cells from you or me or anyone in this audience, and turning them
into this pluripotent state, which can make a Daphne neuron in a dish or a Daphne
hepatocyte.
And those are going to be different from the Vijay neuron and the Vijay hepatocyte because we have
different genetics and that's going to manifest in how these cells behave and how these cells
look in different measurements.
We can engineer those to introduce a disease-causing mutation and ask what does that disease-causing
mutation do to a Daphne neuron versus what does it do to a Vijay neuron, and what does this
mutation do versus that mutation.
Just a quick note for the audience:
pluripotent stem cells are unique cells that are able to undergo self-renewal.
The term pluripotency in particular speaks to these cells being able to give rise to all cell types,
including those of the ectoderm, like the skin or nervous system; the endoderm, like the liver or respiratory tract;
and the mesoderm, like bone or muscle.
All right, back to Daphne speaking to just how special this application is.
So we're able to kind of do data generation on spec,
and that is a truly unique capability,
which frankly is not that easy to do
even in other areas where AI is being deployed.
You don't get to make your own data in many cases,
but here we do,
and that creates both really important discovery opportunities
for life sciences, but also really cool and interesting machine learning problems.
Well, maybe you could dive a little deeper and give an example,
so like your paper on the POSH approach, I think,
that came out on bioRxiv.
Could you double-click on that, tell people what you did there?
Especially, like, why is AI in life sciences a big deal?
What could you hope to get?
So, first of all, let me tell you a little bit about that platform,
which is called POSH, or pooled optical screening in humans.
You take a bunch of cells and you put them with a pool of CRISPR guides that edit them,
and each cell gets a different guide.
So now you have a bunch of cells, each with a different genetic mutation.
And now they're all sitting there in a pool.
You can measure them with a microscope.
You can measure them as they move around and do their stuff.
Then you can basically fix them, and you sequence the barcode that came with the guide.
So now you can say, oh, this cell that got this guide behaved this way, and this other cell
behaved that way.
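The bookkeeping behind that last step can be sketched in a few lines. This is a toy illustration rather than Insitro's pipeline: the barcodes, gene names, and the single image-derived feature per cell are all made up, and a real screen would work with learned embeddings and proper statistics.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical barcode -> targeted gene lookup (toy values, not real guides).
BARCODE_TO_GENE = {"AAC": "GENE_A", "GGT": "GENE_B", "TTA": "control"}

# Each toy record: (barcode sequenced from the cell, one image-derived feature).
cells = [
    ("AAC", 1.9), ("AAC", 2.1),
    ("GGT", 1.0), ("GGT", 1.2),
    ("TTA", 1.0), ("TTA", 1.1), ("TTA", 0.9),
]


def mean_feature_by_gene(records, barcode_to_gene):
    """Group cells from the shared pool by the gene their guide targeted."""
    grouped = defaultdict(list)
    for barcode, feature in records:
        grouped[barcode_to_gene[barcode]].append(feature)
    return {gene: mean(values) for gene, values in grouped.items()}


def effect_vs_control(means, control="control"):
    """Shift of each perturbed group relative to the non-targeting control."""
    baseline = means[control]
    return {gene: m - baseline for gene, m in means.items() if gene != control}


means = mean_feature_by_gene(cells, BARCODE_TO_GENE)
effects = effect_vs_control(means)
for gene, shift in sorted(effects.items()):
    print(f"{gene}: shift vs control = {shift:+.2f}")
```

Because every cell shares one dish, the control cells in the same pool serve as the baseline, which is the property of pooling the speaker goes on to describe.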
And I can tell you that one of the really challenging things about cells is because they're
live, if you put different cells in different wells, then they each have a slightly different
environment, and you get subtle differences, and it's really hard to reconcile.
When they're all in a pool, you eliminate all of those artifacts, and all of a sudden you have
the ability to measure a genome-wide
CRISPR screen, basically: 20,000 genes in the genome, each a different genetic intervention modifying the same cellular
background in the same dish. And you're measuring that
at genome-wide scale in like 10 or 12 plates in two weeks. Now, imagine doing that rinse-repeat
and doing genome-wide scale on this genetic background or in this cell type. And so you can really
start to decipher the genotype-phenotype connection and the way in which individual genetics
makes a difference on cellular phenotypes, which we then translate to what we believe they
will have in terms of clinical impact. And that is the beginning of an understanding of what it is
that we want to modify in order to have meaningful therapeutic interventions. And this is like
a truly engineered approach to discovery. Another quick reminder for those of you, like me,
where biology class was a little further back than you'd like to admit.
Genotype refers to the genetic makeup of an organism,
while phenotype refers to the observable characteristics,
like hair color or blood type that are yielded from a genotype,
but also an organism's environment.
This is really important because phenotype or the observable impact
isn't only due to genetics,
as sometimes even the same genes can produce different results.
One such example of this is in honey
bees, in which members of a colony can have the same genome but vary significantly in phenotype,
like the size, shape, or behavior of queen bees. And as Daphne said, new technologies are bringing us
closer to understanding this very important connection. Well, the biology part is really critical
because now you get the data, and we all know how important that is. But I think one of the things
that I found really intriguing is the creation of a latent space for human biology, and especially
being able to tell the difference between disease and non-disease or even different disease phenotypes.
So how does that come about? And especially, you know, how is AI driving that?
So actually, I'm going to go back one step further because you said, of course, one of the things
we need to do is get the data. And I should have mentioned that it's impossible to run this
instrument without AI being built into it because you can't even segment the cells.
You can't call the barcodes. I mean, all of it is an AI-enabled architecture.
Every part of our technology stack is intrinsically AI-enabled. But then, to your point,
Vijay, now you have a whole bunch of cellular images, and what do you do with them? And so the first thing
we do is we build this latent space. We built a language model for biology. Now everyone's an expert
in language models. I used to have to explain this to people. It's like, oh, language of biology.
No one knew what I was talking about. But now it's like, I'm just saying, look, it's just like
GPT, but for cells. So we have the language of, you know, cells and what cells look like
or the transcriptional or gene expression profiles of cells. And you measure hundreds of millions of cells
in different states, and now,
because we have this latent space,
then just like the large language models for natural language,
with a small amount of data, you can start asking,
okay, how does a disease, like a disease-causing gene, move you from one place to the other?
How does a treatment move you hopefully back in the opposite direction,
from the disease state back to the healthy state?
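One simple way to picture "moving" in a latent space is with centroids and a projection. The sketch below is an assumption-laden toy, not the method described in the episode: the 2-D embeddings are invented, and the `reversal` score (how far treated cells move back along the healthy-to-disease axis) is just one illustrative metric.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def subtract(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Toy 2-D embeddings standing in for a learned latent space (made-up values).
healthy = [[0.0, 0.0], [0.2, -0.2]]
diseased = [[1.0, 1.0], [1.2, 0.8]]
treated = [[0.3, 0.1], [0.5, 0.1]]

# Direction the disease moves cells: healthy centroid -> diseased centroid.
disease_axis = subtract(centroid(diseased), centroid(healthy))

# Direction the treatment moves cells: diseased centroid -> treated centroid.
treatment_shift = subtract(centroid(treated), centroid(diseased))

# Fraction of the disease shift that the treatment undoes along that axis:
# 0 means no movement back, 1 means fully back to the healthy centroid.
reversal = -dot(treatment_shift, disease_axis) / dot(disease_axis, disease_axis)
print(f"Treatment reverses about {reversal:.0%} of the disease shift")
```

On these toy numbers the treated cells sit about three quarters of the way back toward the healthy centroid.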
And that's super powerful. Kind of like other language models,
it keeps getting better the more data you feed it.
And let me just say, this is not just for cellular data.
The other source of data that we use is clinical data.
So we do the same thing with histopathology.
There's so much more in histopathology than your pathologist typically looks at.
In MRI data, your radiologist doesn't see more than like a small percentage
of what's there in your radiology images, but also not just imaging.
There's also other modalities where there's an equal amount of information left on the table.
And over time, we're learning these languages
of different biological modalities
and the ability to translate between them.
I think this concept of foundation model for biology
is particularly exciting because 10 years ago,
you could have ML that was predictive.
You just needed maybe 100 actives.
And the problem is like if you have 100 cases,
examples of a drug that works,
you don't need to design a drug.
And so these low-shot, zero-shot approaches
that come from a foundation model are really night and day.
So how far does this go?
I mean, the big problem in biology is that biology is hard.
Biology is really hard.
Biology is very hard. So why are you doing it? What is the big win? Where does this go, let's say,
by the end of the decade? Like, what could you hope to do that we couldn't do before?
We want to come up with an almost systematic recipe for how you go from a decision
that I want to work on ALS or I want to work on fatty liver disease through a sequence of steps
towards something that results in a meaningful intervention in the right patient population.
The hope is that by the end of this decade, we will
have built this process. We will have run through it a number of times. We will have delivered
some medicines to patients in our first tranche of indications. But then we will have learned
enough from that so that we can now say, okay, and here's how we're going to do it here and here, and
here. And because it's not only machine learning that moves forward over time, it's also the
biological tools that we're relying on. I mean, it used to be that there wasn't any CRISPR. There was
just siRNA; actually, there wasn't even that. And then there's like CRISPR base editing, and now
there's CRISPR prime editing that replaces entire regions of the genome.
So the tools that we're building on also get better and better over time,
which unlocks more and more diseases that we could tackle in a meaningful way.
Well, let's step back for a second because I think this may not be clear for everyone,
like why biology is so hard.
And one of the biggest reasons is that we can do tons of experiments on mice.
You know, I have to feel like it's a great time to be a rich mouse.
You could be cured of any disease, right?
Like all these diseases can be cured in mice.
But, you know, it's obviously unethical to experiment on people.
And that's one of the big reasons why trials fail, right?
Because when you go into a clinical trial, you spent all this money to get there,
you're spending all this money, hundreds of millions of dollars, in the trial,
and it turns out mice are different than people, and it fails.
Yeah.
So, like, how can AI help that?
So first of all, this notion of, you know, we can cure lots of mice
is something that really drove our discovery strategy at Insitro,
which is that all of our work is done in human and human-derived systems
that incorporate at least some subset of human cells working together.
So that's one piece, and the nice thing about it is that it allows you to intervene in those systems
and ask the what-if questions, the counterfactuals.
Like, what if I had this person's biology, but in a world where this gene was inactive versus active
or the other way around.
So that's great, but obviously you want to cure people, not cells or even organoids.
And so the other source of data that we bring in is data from people, from clinical
records. And what we end up doing is kind of bridging between the two using machine learning.
So using machine learning on the cellular data, using machine learning on the human data,
bridging between those in representation space, but also in genetic space.
Because genetics is kind of like this thread that ties the two together.
And without machine learning, without AI, the space would be so complex and so high
dimensional that you couldn't even make sense of it, far less bridge between those two
different worlds. Yeah. Well, that makes sense. I'm curious to change gears a bit and talk a bit about
company building. And one of the interesting things that you've done is that you've brought
together people that are biology experts with people that are ML/AI experts. And how do you
build that culture? What does that look like, especially since they're from fairly different parts
of the universe? So, first of all, it may not have been obvious to everybody, but the company
name, Insitro, is actually a blend of in silico and in vitro, in silico being in the computer
and in vitro being in the lab,
those elements of bringing those two strands together
are so deeply woven even into our logo.
And how do you build that is really hard
because you take your average machine learning scientist
and your average life scientist,
even if they're very well-intentioned,
you put them into the room together,
they might as well be talking Thai and Swahili to each other.
The languages are different,
the ways in which you think are totally different.
So how do you create a shared language, a shared vision?
And so there's a few approaches that we use.
First of all, we hire some number of people
(you can't get enough of them, unfortunately) who are in the middle,
who are able to be translators and talk to both sides and kind of bring them together.
And then I think the other really important part is that you create a culture
and you hire very rigorously to that culture of people who are genuinely interested in engaging with, you know, the other side.
And we have a list of company values.
And the final value, which is one that I hold particularly dear
(it's last not because it's least important, but because they're ordered from what we do to how we do it),
is that we engage with each other openly, constructively, and with respect.
Openly means an openness to asking really naive questions when you don't understand.
And to accepting really naive suggestions from somebody else,
because sometimes the best ideas come from an orthogonal mindset.
It's something that I experienced even as a kid.
So my parents are scientists.
And my mom warned me that no matter what I did, even though this was what I was doing as a kid, because I was programming as a kid, I should not get into programming and computer science.
No one's ever going to make money by selling software.
So I think maybe then, you know, especially for this audience here that are coming from the AI side, especially as AI gets into areas that are sort of not just in the world of bits, but in the world of atoms.
Yeah.
Any advice for how to bridge those gaps?
I mean, first of all, having a deep respect for atoms is really
important. Yeah, I mean, my closest friends are atoms. I think we're all atoms,
but I think having an appreciation for the complexity of atoms and the fact that especially
when your atoms are part of live systems, they behave in unexpected, unpredictable,
idiosyncratic ways that sometimes cause a lot of pain. And I can tell you that when you do
biological experiments, one of the strongest signals when you apply machine learning to them is
who the technician was who actually did the experiment.
You could read that very clearly off the cells
because they behave a little bit differently.
They pipette a little bit differently.
They treat the cells a little bit differently.
It's amazing how hard it is to clean that up,
which is one of the reasons why we spend so much of our time building robots
because they do the same thing over and over again.
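A quick way to see whether an operator leaves a signature in the data is to ask how much of a measurement's variance is explained by who ran the experiment. The sketch below uses invented numbers and hypothetical technician labels, and computes a plain between-group share of variance (eta squared); it is an illustration of the batch-effect idea, not a description of Insitro's tooling.

```python
from collections import defaultdict
from statistics import mean


def variance_explained_by_group(values, groups):
    """Share of total variance attributable to group membership (eta squared).

    Assumes the values are not all identical, so total variance is non-zero.
    """
    overall = mean(values)
    total = sum((v - overall) ** 2 for v in values)
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)
    between = sum(
        len(vs) * (mean(vs) - overall) ** 2 for vs in by_group.values()
    )
    return between / total


# Toy measurements of the same assay run by two hypothetical technicians.
measurements = [1.0, 1.2, 1.1, 2.0, 2.2, 2.1]
technicians = ["tech_1", "tech_1", "tech_1", "tech_2", "tech_2", "tech_2"]

share = variance_explained_by_group(measurements, technicians)
print(f"Technician identity explains {share:.0%} of the variance")
```

A share close to 1, as in this toy data, would mean the operator matters more than the biology being measured, which is the kind of artifact that consistent automation is meant to reduce.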
So I think having a lot of respect for atoms,
but also I would say an appreciation
for the fact that the next frontier of the impact that AI can have
is when AI starts to touch the physical world.
And we've all seen just how much harder that is.
We've all seen how hard it is, astonishingly,
to build a self-driving car compared to building a chatbot, right?
So having an appreciation for that complexity,
but also an appreciation for the magnitude of the impact
if you can actually nail it.
So one last topic, then we'll go into closing,
but you're talking about life sciences in terms of health care and drug design,
but there's a lot more to biology than just drugs, right?
Where do you think this confluence between AI and life sciences goes from here?
So I actually think that there is this incredible opportunity at this point,
at this intersection between the two fields.
And I think about it from a little bit of a historical perspective. Think back on the history of science.
At certain times in our history, there have been
eras where a particular scientific discipline has made incredible amounts of progress in a relatively
short amount of time because there was kind of like a click where we started to see the world
in a different way or there was a tool that wasn't available before. So if you think back to,
you know, the late 1800s, that was chemistry, where we suddenly realized, you know, we couldn't
really turn lead into gold and there was this thing called the periodic table and there were
electrons and it really shifted chemistry. And then in the early 1900s, obviously that discipline
was physics and the connection between energy and matter and between space and time completely
shifted our understanding of the universe. In the 1950s, that discipline was computing, where we
got these machines that perform calculations that up until that point only a human was able to
perform. And then in the 1990s, there was this interesting bifurcation. On the one side, there was
data science that drew on computers, but also had elements of neuroscience and optimization and
statistics and ultimately gave us modern day machine learning and AI. And then the other side was
what I think of as quantitative biology, which was the first time where we actually started
to measure biology at a scale that was more than, like, track three genes across an experiment that
took five years. And that was the first microarray data and the first human genome and so on and so forth.
And I think this is the time when those last two disciplines are actually going to merge.
And they're giving us an era of what I think of as digital biology,
which is the ability to measure biology at unprecedented fidelity and scale,
interpret the unbelievable masses of data at
different biological scales and in different systems using the tools of machine learning and data science.
And then bring that back to engineer biology using tools like CRISPR,
genome editing and so on, so that we can make biology do things that it would otherwise not want to do.
Well, like what?
So I think there's obviously, as we said, applications in human health, but I think there's applications in agriculture.
I don't think we need to tell anybody anymore, although there are still some people who might need
to hear it, about the impact of climate change on our world and the fact that we need to have
crops that are much more resistant to drought and severe weather.
And to feed 10 billion people?
To feed 10 billion people.
I think there are opportunities in the environment to maybe do better carbon sequestration using plants or algae or who knows what.
I actually wish I knew more about that, because in my alternate life I would have done that.
Well, there's still time, right?
Well, there's still time.
So are you funding me for that?
You come up with the deck, we'll talk.
Okay.
Yeah, yeah.
So there's that.
I think there's, you know, biomaterials and so on.
There's so many opportunities at this intersection
that I would encourage any of you in this audience
who are looking for something truly aspirational and exciting to do.
I think this convergence is a moment in time
for us to make a really big difference in the world
that we live in using tools that exist today
that did not exist even five years ago.
Yeah, with that, I think that's the opportunity at hand.
Maybe we'll wrap up there.
Let's thank Daphne one more time.
If you like this episode, if you made it this far, help us grow the show.
Share with a friend, or if you're feeling really ambitious, you can leave us a review at
ratethispodcast.com/a16z.
You know, candidly, producing a podcast can sometimes feel like you're just talking into a void.
And so if you did like this episode, if you liked any of our episodes, please let us know.
I'll see you next time.
Thank you.