Behind The Tech with Kevin Scott - Daphne Koller, PhD: CEO and founder of insitro
Episode Date: August 25, 2020. Former Stanford University professor Daphne Koller's cutting-edge work combines machine learning and biology to transform pharmaceutical drug development. Her work at insitro is part of an emerging field, digital biology, that is having an impact on multiple areas, including biomaterial design, agriculture, and human health. This MacArthur Fellow was also named one of TIME Magazine's 100 most influential people.
Transcript
I think one of the very, very thin silver linings around this very dire situation that we find ourselves in is that there is, I hope, a growing appreciation among the general public for what science is able to do for us today and how much of that ability rests on decades of basic
science work by many, many people.
Hi, everyone. Welcome to Behind the Tech. I'm your host, Kevin Scott, Chief Technology Officer for Microsoft.
In this podcast, we're going to get behind the tech.
We'll talk with some of the people who made our modern tech world possible
and understand what motivated them to create what they did.
So join me to maybe learn a little bit about the history of computing
and get a few behind-the-scenes insights into what's happening today.
Stick around.
Hello, welcome to Behind the Tech.
I'm Christina Warren, Senior Cloud Advocate at Microsoft.
And I'm Kevin Scott.
Today, our guest is Daphne Koller, who's the CEO and founder of insitro, which is a company that works at the convergence of biology and
machine learning. Yeah, I'm guessing everyone has a newfound appreciation for how important
biomedicine and biotechnology is at this time with the COVID-19 pandemic still raging around us.
And I think Daphne is doing some of the most interesting
work right now in the field that is, as we've seen with several of our other guests, like this
really powerful combination of biology and machine learning and high-performance computing and
laboratory automation. Like what they're doing is really wonderful work. And
Daphne is just sort of a brilliant computer scientist and has had many, many different
chapters in her career that are inspirational. That's so true, Kevin. I cannot wait to hear
this conversation. Yeah, so let's get started.
Our guest on the show today is Daphne Koller. Daphne is a machine learning pioneer.
She's CEO and founder of insitro, a company that applies machine learning to pharmaceutical development.
Daphne was a computer science professor at Stanford, co-founder and co-CEO of Coursera, and is a MacArthur Fellow.
She was also named one of Time Magazine's 100 Most Influential People.
And I have had a copy of her book, Probabilistic Graphical Models, on my bookshelf for more years than I know.
So welcome to the show, Daphne.
Thank you. Glad to be here.
And I would be very impressed if you actually read the book. I read the book many, many years ago when I was at Google,
where your work was very influential in how we thought about doing some of our very early
and what we thought at the time was very sophisticated machine learning work in the ad system. I do have to admit that when I picked the book up
for the first time, I was a compiler and programming language person, so it was not an easy
read. It is definitely a bit of a tome. I think at this point, it serves well as a doorstop if you need one. It's very, very big.
So, in any case, I'm so delighted you could be with us here today. So, we typically start off these conversations with a bit about your background. So, I'm curious how it is that you got interested in science and
technology in the first place. So I was interested in science ever since I was a kid. My family had
this series of books that were the time-life series on everything from asteroids to how plants grow, and I just used to sit and read them for fun.
I didn't get interested in technology until my freshman year of high school when my parents came
here. My dad came on sabbatical to Stanford, and for the first time I was in a school where there
was a computer center. This was a long time ago, so I'm going to date myself at this point but these were TRS-80
computers that were time shared across two people and I got to learn to program and I found it
an amazing experience where you could actually tell a computer what to do and it did it which
didn't work for me in any other context but but here it actually did. And so I got
interested in computing at that point. And that I think led to basically my college choices and
so on. And ultimately, I think the combining the two in the work that I do now, but that's many
years later. So on those TRS-80s, it sounds like your experience is very similar to mine, actually.
In high school, I did a bunch of coding on TRS-80s.
So was your language of choice there the BASIC interpreter, or were you using something else?
No, at that point, it was BASIC.
This was the only language that was available when I was here in high school.
But I very quickly migrated beyond that to Pascal and then C.
That's really interesting. That is almost exactly the same language, although there
was some assembly language in there because I wanted to code games and you had to supplement
the basic with a little bit of assembly language if you wanted to make things move around on the screen.
So, you know, this thing that you said about being attracted to programming as a kid because the computer would listen to you,
like I think is very interesting.
It is one of those things that I think can give kids agency. And, you know, I know
that you, you know, both as an educator at Stanford and as one of the co-founders of Coursera, you've
thought a lot about how to educate both kids and adults. How important do you think that sense of agency is in getting kids interested in computing?
I think it's very important.
I think it's difficult for us as adults to appreciate just how powerless most kids feel.
Certainly the ones from less advantaged backgrounds, but even the others. And I think
giving them an avenue where they can really dictate, if you will, what happens is super exciting for
them. And I think we are not giving them enough of that in how we currently teach technology, because
we've moved far away from programming
in how one teaches computers in most schools.
And if you actually came back to that and said,
hey, look what you can build, and it actually works,
I think it's an incredible feeling of empowerment for kids.
Yeah, one of the things that I've struggled with
with my own kids is trying to get them interested in programming.
So I'm not trying to force them to learn anything that they don't want to.
Like we try to expose them to a bunch of things.
And it took a while to figure out how to find an entry point into coding that was interesting for them. And the thing for my son
was Roblox, which is this game that he plays on his tablet obsessively. And as soon as he figured
out that there was a way for him to create his own stages in Roblox, that was the thing that
enticed him to want to program. I think it's become harder to get kids interested in programming because
the programs that are already out there are really sophisticated and fancy. And what kids can create
is always going to pale by comparison to what is already out there, which is not a problem that you
and I had when we were starting with computers. You didn't have a tablet on which you could play amazing games.
And so what we created seemed kind of cool.
And now when kids create, it seems not quite as cool as the games they have on their phone.
And so the question is, is there a way in which we can give kids that same sense of excitement
about what they're creating so that it does seem cool and interesting.
And I don't think we've paid enough attention to that.
Yeah.
And it's interesting that you bring that up because we have talked with a bunch of other
guests on this podcast where it's also true that the programming tools that you have available
to you now are vastly more powerful than the ones that we had when we were first learning to program.
But it could very well be, and I 100% agree with what you're saying, that the gap between the sophistication of the software has maybe grown even further apart from the power of the tools for entry-level, for kids.
Yeah, no, that's right.
And my daughter, she was kind of interested
in machine learning for a while,
but so I said, well, why don't you try your hand
at one of those Kaggle competitions?
And the problem is that the Kaggle competitions,
they're full of really sophisticated,
top-notch programmers looking
to build a reputation so that they can go get jobs because of their machine learning street cred.
And kids like my daughter have no chance of even getting into the top 100. So it was kind of a bit
of a demoralizing experience, in the sense that nothing she could do would ever rank up there.
And so I'm not sure it ended up serving a positive purpose.
And so I wonder if it would make sense to have a Kaggle for Kids or something that would let
kids compete in a playing field that was more even and would get them excited about going on
to the next level. I think that's a fantastic idea. I mean, and we understand how to do this with sports, right? So like there are all these sports leagues and sport opportunities for kids where you can get them into a team where they can learn the sport and get all of this physical activity, but they're not like completely and utterly outmatched either with their teammates or the teams that they're playing.
So it just seems perfectly reasonable to me that we could figure out how to do this with some of these coding competitions.
You would think.
I mean, if you weren't already busy founding, running a company, I'd say that sounds like a good thing to go do.
This would be my second or third alter ego if I were able to wrangle that.
Yeah, indeed. So you learned to program when you were in high school. And when you went to college,
did you choose computer science? So I actually had an interesting early path in that regard, because I was actually young when I was in high school.
And when I came back from the United States after that sabbatical, I actually started college in parallel with high school, because I'd always found high school to be not as inspiring as I would have hoped.
And I found a lot more flexibility in the college curriculum, so I
started to study math and computer science while I was still finishing up high school.
And I think it was the timeliness of that, and that I had just come back from the United
States where I'd learned to program, that probably influenced my career choice. And who knows, if
I'd waited three or four or five years
like most kids do, then maybe I would have picked something else. But the nice thing that I found
about computer science and even more so over time is that it's actually an entry point into multiple
other fields because especially today, but even back then, most fields can benefit from computational methods and using computer science, whether it's algorithmic thinking or now even more so machine learning.
All of them find that technology a really useful and often very divergent way of approaching the field. And so that allowed me in my career to touch on
so many different areas from things that are more core tech, like robots and computer vision,
to things that are a little bit more distal, but at least today still considered more core,
like natural language processing, to things that are a little bit less viewed as part of core computer science.
I did a lot of game theory, for instance, and economics early in my career.
And now I'm doing a ton of science and medicine.
And it's not that I've become a biologist.
I am still a computer scientist, but the tools are just so useful in all these different disciplines that as a computer scientist, I'm not only able to do interesting biology.
I'm able to do it in a way that is often very different from how someone trained purely in biology would approach it.
What was the first thing, either as a high school student or when you were in college, that you did that wasn't the core CS stuff, like operating systems, compilers, algorithms, data structures, where you sort of realized, like, oh, wow, this computer science stuff that I've learned is like a superpower that lets me do a whole bunch
of things? So I think the first one was really the integration between game theory and computer
science and the meeting point in distributed systems. That was actually what my master's thesis was about, back when I started my PhD, where
I was at that point trying to explore it from both sides. Can an understanding of
the multi-agent incentive function, if you will, from the game theory perspective,
help us build better distributed systems? And conversely, if game theory was a
really interesting framework for decision making, could we use the algorithmic tools that
computer science gave us to help us find better solutions to game-theoretic problems that were
not even necessarily within the scope of computer science? So can we help people make better decisions in the multi-agent setting by turning it into
a computational problem so it wouldn't be this kind of bespoke, somewhat obscure mathematical
analysis that only game theorists could do, but actually a tool that's useful in decision
making?
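To make that concrete, here is a minimal sketch, in Python, of what turning a game-theoretic question into a computational one can look like: a brute-force search for the pure-strategy Nash equilibria of a small two-player game. The payoff matrices are made up for illustration and are not from the thesis work being described.

```python
import numpy as np

# Hypothetical payoff matrices for a 2x2 game (a prisoner's-dilemma-like setup).
# Rows are player 1's actions, columns are player 2's actions.
A = np.array([[3, 0],
              [5, 1]])   # player 1's payoffs
B = np.array([[3, 5],
              [0, 1]])   # player 2's payoffs

def pure_nash_equilibria(A, B):
    """Return all (i, j) action pairs where neither player gains by deviating unilaterally."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            best_for_p1 = A[i, j] >= A[:, j].max()   # player 1 can't do better in column j
            best_for_p2 = B[i, j] >= B[i, :].max()   # player 2 can't do better in row i
            if best_for_p1 and best_for_p2:
                equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(A, B))  # -> [(1, 1)]: both players "defect" in this example
```

Brute force only works for tiny games, but it shows the shift in framing: the equilibrium stops being a bespoke mathematical derivation and becomes something an algorithm can search for.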
And so that was kind of the entry point for me that went actually from decision making in multi-agent systems to decision making in single agent systems to modeling of the world that would enable decision making to then learning those models from data, which is what took me to machine learning.
So that was actually that trajectory. And did you have anyone
or like any interesting way
that you were getting into these tangential fields?
So like game theory, for instance,
like was this something where
you had an influential mentor
who put you onto it? Was this you just independently getting curious
and reading a whole bunch of stuff? What's your approach to learning these disparate things?
I wish I could tell you that it was a systematic thorough exploration where I
took a broad perspective and tried to figure out what was interesting and most useful.
Often it's a bit of serendipity and just affinity to a particular space and maybe just a sense of
there's something here that could be exciting. On the game theory side, I just happened to take
as part of my undergraduate degree, which was a dual degree in math and computer science, there was a game theory class. And so I took that and I got really intrigued by the truly elegant
mathematics that underlie it. And I said, wow, this could be a really cool way of thinking about
interactions in a computer system. So that was purely serendipity. My move into biology was much later. Interestingly, my father was a biologist,
and so I had always actually steered away from biology, partly, I think, because like most kids,
you don't want to do what your parents do. But also because at that time, when I did take a biology class, it was incredibly descriptive.
It was like a catalog of, you know,
obscure Latin names of plants or pieces of cells.
And it was all about memorizing that this does this to that.
And I was just completely uninterested in doing that because it seemed like
there were very few principles in play. It was all about the details, and I'm not good at details, especially
memorizing them. So I didn't really get into that, and did my entire, both high school and
undergraduate career focusing on things that were much more interpretable in terms of
principles and systems, so math and physics and computer science. And the reason I got into biology
and medicine was actually when I came back to Stanford as a faculty member and started to do machine learning and realized just how
boring and uninspiring the data sets were that we machine learning people had to work with at the
time machine learning was getting off the ground. I mean, one of the flagship data sets
was something called 20 Newsgroups, which is exactly what it sounds like. It's articles from 20 very boring newsgroups,
and you had to classify which news article came from which group. And it was not interesting
technically, and it certainly wasn't very aspirational. And so I started to look around
what other interesting data sets were around. And specifically, my focus at the time was on data sets that were more richly
structured, relational data sets, if you will, where there's multiple types of entities, multiple
types of relationships, and looking for data sets that had those characteristics that were available.
And most of those were trapped behind the doors of companies that
weren't very excited about making those available to outside researchers. And biology at that point
was like, oh, well, look, there's genes and cells and proteins and people. And I actually started
early work, interestingly enough given today, on the epidemiology of tuberculosis, on
tracking infection chains and figuring out if you could sort of pinpoint where an infection started,
something that seems very timely today. But at that point, the data sets there were also pretty
small, but they were much more interesting than the 20 news groups. And so I started to work first on things like the TB epidemiology, and then on some of the earliest data sets that measured
the expression or activity level of different genes and different types of cells. And that,
of course, was a network problem galore, because you really had to figure out that this gene did
this thing to this other gene. And so that really created a much more interesting technology challenge.
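As a rough illustration of why gene activity data is a "network problem," here is a toy sketch that infers a crude gene-gene association graph from a simulated expression matrix by thresholding pairwise correlations. Real analyses, including the probabilistic graphical models Daphne's book covers, are far more principled; the data, planted dependencies, and threshold here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 6

# Simulated expression matrix (cells x genes) with two planted dependencies.
expr = rng.normal(size=(n_cells, n_genes))
expr[:, 1] += 0.8 * expr[:, 0]          # gene 1 tracks gene 0
expr[:, 4] -= 0.7 * expr[:, 3]          # gene 4 is negatively coupled to gene 3

corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
threshold = 0.4
edges = [(i, j, round(float(corr[i, j]), 2))
         for i in range(n_genes)
         for j in range(i + 1, n_genes)
         if abs(corr[i, j]) > threshold]

print(edges)  # roughly [(0, 1, 0.6...), (3, 4, -0.5...)]: the planted gene-gene links recovered
```

The interesting technical challenge she describes starts where this sketch stops: distinguishing direct influence from indirect correlation, which is exactly what graphical-model approaches try to do.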
And then from there, I actually started to get interested in the biology in its own right,
because it was not only more interesting, but it was also much more aspirational in terms of
what you could do with it; it could actually help people.
And then that grew to be a much more significant driving force for me over time: the wish to
do good, not just good science, but also good for the world.
Right.
And so I want to dive deep in just a minute into this, both the biology and this notion of how it is
that we technologists can be doing more good in the world. I want to take a moment and
double-click on this point that you just made, which is this very strong correlation between
what people will do research on and what people will study in machine learning and the available data.
Because I know your former colleague, Fei-Fei Li, like helped put together ImageNet,
which catalyzed a whole bunch of really interesting developments in computer vision.
And it's just true, like the data sets that you have available to play around with will dictate the character of your research.
In a way, you were doing something extraordinary at the time by realizing, all right, well, I'm going to go find the interesting data.
And I do think as part of this notion of we should be directing our efforts towards things that will do public good is partially about making sure that people
have the compute resources and tools and whatnot. But it's also about making that data available,
data sets that are relevant to the problems we want solved.
Absolutely. I mean, when you think about an aspiring young researcher, or even an undergrad,
or even one of those high school students that we talked
about earlier, having a data set that is interesting, that offers potential for them to do something
innovative and cool, that is processed and easily accessible, and where there's
at least initially a set of well-defined problems that they can tackle
while they explore the data to come up with potentially new problems. That, I think, is such
an important entry point for people into the field. And we all understand that that's not where
the real action is ultimately. If you're going to become a leading researcher in the field,
part of what you
need to do is really develop your own way of finding data sets. Although to be fair, there's
some incredibly talented people who just continue methods development on data sets that other people
have already created. And I think that's a very worthwhile path as well. But so for either of those, giving people an easily accessible first entry point
into a field is just absolutely critical as opposed to what I've seen in a lot of early
stage machine learning projects where they tell people, oh, go around, figure out what problem
you think would be interesting for you to solve, and then figure out how to get data for it, and then figure out what machine learning algorithm is good for it.
I mean, that is such an insurmountable mound of stuff for someone to tackle the first time they're getting into the field. And that's especially true for people who are not quite as privileged as others
in terms of what we give them as a starting point.
Or they go work on the same old data sets as everyone else.
I think Fei-Fei's work on ImageNet
was absolutely transformative.
But at this point, I'd like people to start thinking about other forms of data that they could get practice on.
And there haven't been enough Fei-Feis to go and create those data sets in other places.
Well, and you can sort of see it even in how we reward folks. You know, like maybe this is sort of
a controversial thing to say,
but like I was a little bit shocked
that Fei-Fei wasn't on the same roster
of folks who got the Turing Award for deep learning
because the ImageNet stuff was like potentially,
I mean, it was absolutely a precondition
for the stuff that Hinton and LeCun and Bengio did.
Whether or not folks actually agree with that is almost beside the point; the point is that we don't recognize this data collection and building these data assets as much as we do the fancy algorithms.
I think there's always been a lot of appreciation in the machine learning community as a whole
for technical firepower, for yet another improvement on algorithms or models that admittedly is an amazing contribution.
And obviously a lot of those developments have been what's opened the door to the performance
that we see today.
But there's been less appreciation, I think, for the intellectual endeavor of doing work
that is more applied. And I think people
often don't understand the amount of intellectual endeavor and thought that goes into questions such as: what is the right problem within this big sea of a space like biology or earth science
or whatever? What are the questions that are both technically tractable and yet can be transformative
to what the field is trying to accomplish? And that is an incredible intellectual
exercise, followed by the second intellectual exercise, well, if this is the problem that we
aim to solve, how do we get the data to actually solve it? And can we acquire it? Can we clean it?
Do these data sets have issues that we need to address? Or do we need to go and collect data de novo?
That too is an incredible intellectual exercise and often a very time-consuming feat.
And I agree with you that those efforts are not always as recognized as some of the sort of mathematical or machine learning sort of flashier efforts.
Yeah.
Which is a really good segue into what you're doing right now,
which is applying machine learning to a very, very worthy set of problems.
And I'm guessing you started well before we were in this pandemic moment that we're
in right now. But what you're doing, I'm guessing, is more relevant now than it was even six months
ago. No, absolutely. And I think one of the very, very thin silver linings around this
very dire situation that we find ourselves in is that there is, I hope, a growing appreciation
among the general public for what science is able to do for us today and how much of that ability
rests on decades of basic science work by many, many people, much of which is publicly funded
work at academic institutions. Without that level of progress that we've made, the concept of,
say, creating a vaccine in 12 months would have been completely ludicrous, you know, a few years ago.
Or the work that's being done on repurposing existing drugs to, even if not cure the disease, at least slow its progression or help ameliorate some of the more significant inflammatory consequences.
There's thousands of drugs out there.
You can't do thousands of clinical trials for each of them.
So a lot of the work that we've done on interpreting cell-based assays and understanding the immune system and understanding things like cytokine storms and such,
those are all key building blocks for the fact that we actually have at this point two drugs
and hopefully more coming that at least are somewhat helpful in addressing this disease.
And so I'm really hoping that people are paying attention, that science matters.
It really matters.
And you should be supporting science and listening to science in the good days, because when the bad days come, it's going to be too late to sort of suddenly realize that you need science. So, sorry, that was my little soapbox right there.
But I think everything you said, I could not agree with more strongly. And,
you know, like one of the things that I'm hoping for, like this is my desired silver lining
potentially for this moment that we're in is like, I think we are making very rapid progress
towards vaccines and therapeutics and better understanding exactly the mechanism of this
miserable little virus. But I'm hoping like we will, in this moment, see how much science can
accomplish when we point it at a task like this. And hopefully we will decide that that is a worthy
set of things to invest much more in than we have been over the past,
I would say, decade, because the last decade has been transformative in science
in many of the same ways that it has been transformative in machine learning,
but coming into it from the other side.
I think we have a chance of making significant headway against other diseases that are currently still scourges that are incredibly damaging and shorten people's lives, reduce their quality of life.
And I think with the right investment and the right focus, we could actually make a difference.
So tell me a little bit about what you're doing at insitro.
So the premise for what we're doing really emerges from what I said a moment ago,
which is that this last decade has been transformative in parallel on two fields that very rarely talk to each other. We've already talked about the advancement on the machine learning side
and the ability to build incredibly high accuracy predictive models in a slew of different problem
domains if you have enough quality data. On the other side, the biologists and bioengineers have
developed a set of tools over the last decade or so,
that each of which have been transformative in their own rights.
But together, they create, I think, a perfect storm of large data creation, enabling large
data creation on the biology side, which when you feed it into the machine learning piece
can all of a sudden
give rise to unique insights. And so some of those tools are actually pretty special and
incredible, honestly. So one of those is what we call induced pluripotent stem cells,
and the "we" here is the community, not we at insitro. It's the ability to take
skin cells or blood cells from any one of us and then, by what is almost magic, revert them to
the state that they're in when you're an embryo in which they can turn into any lineage of your body. So you can take a skin cell from us,
revert it to stem cell status,
and then make a Daphne neuron.
And that's amazing because that Daphne neuron
carries my genetics.
And if there are diseases that manifest
in a neuronal tissue,
you will be able to potentially examine, assay those cells and say, oh, wait, this is what
makes a healthy neuron different from one that carries a larger genetic burden of disease.
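As a small, made-up illustration of that "assay and compare" idea, the sketch below simulates high-dimensional measurements from healthy-donor neurons and from neurons carrying a disease mutation, then asks which features separate the two groups. The cell counts, feature counts, and effect sizes are all invented for illustration; real cell-based assays and analyses are far richer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_features = 100                          # stand-in for hundreds of per-cell measurements

# Simulated assay profiles for neurons from healthy donors vs. mutation carriers.
healthy = rng.normal(0.0, 1.0, size=(80, n_features))
disease = rng.normal(0.0, 1.0, size=(80, n_features))
disease[:, :5] += 0.8                     # pretend the first 5 features truly shift with the mutation

t_stats, p_values = stats.ttest_ind(healthy, disease, axis=0)
hits = np.where(p_values < 0.05 / n_features)[0]   # crude Bonferroni-corrected cutoff
print("features that separate healthy from disease neurons:", hits)
```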
And so that's one tool that has arisen. A different one that is also remarkable is the whole
CRISPR revolution and the ability to modify the genetics
of those cells so that you could actually create fake disease, not fake disease because it's real
disease, but introduce it into a cell to see what a really highly penetrant mutation looks like in a
cell. And then commensurate with that, there's been the ability
to measure cells in many, many, many different ways where you can collect hundreds of thousands
of measurements from each of those cells. So you can really get a broad perspective on
what those cells look like rather than coming in with, I know I need to measure this one thing.
And you can do this all at an incredible scale.
So on the one side, you have all this capability for data production.
And on the other side, you have all this capability for data interpretation.
And I think those two threads are converging into a field that I'm calling digital biology, where we suddenly have the ability to measure biology
quantitatively at an unprecedented scale, interpret what we see, and then take that back and write
biology, whether it's using CRISPR or some other intervention to make the biological system do
something other than what it would
normally have done. So that to me is a field that's emerging and will have repercussions that
span from, you know, environmental science, biofuel, bacteria or algae that do all sorts of
funky things like suck carbon dioxide out of the environment, better crops, but also importantly
for what we do, better human health. And so I think we're part of this wave that's starting
to emerge. And what we do is take this convergence and point it in the direction of making better drugs that can potentially actually be disease
modifying rather than, as in many existing drugs, just often just make people feel better but don't
really change the course of their disease. And so this technology that you're talking about,
will it be used to make the drugs or to examine the effect of potential drugs or both?
Both.
So it actually starts with understanding where you even want to develop drugs for.
So we have a lot of problems with current-day drug failures: depending on which statistic you believe, the success rate
of a drug discovery effort from beginning to end is somewhere around 5%. So think about that. It's
a 95% failure rate. And a lot of this is because we just don't understand the biology. We don't know where to develop drugs towards. What is the right target and what is the
right cell type and what is the right patient population? So it starts with predicting using
machine learning what viable targets are in the context of a given disease in the given target
population. And then from there, okay, how do we design drugs more rapidly
so that we don't have to wait five years or sometimes much longer for a drug to emerge?
And so really we want to close that arc of going all the way from the biology to the actual drug.
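A heavily simplified, hypothetical sketch of what "predicting viable targets" can look like as a supervised learning problem is below. The features, labels, and model are all stand-ins invented for this example; nothing here reflects insitro's actual pipeline, only the framing of target selection as a prediction problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_genes = 500

# Hypothetical per-gene features: genetic association score, expression in the
# disease-relevant cell type, and an essentiality score from cell experiments.
X = rng.random((n_genes, 3))

# Made-up labels standing in for past targets that worked (1) or failed (0).
y = (0.7 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * rng.random(n_genes) > 0.6).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]          # predicted probability each gene is a viable target
top10 = np.argsort(scores)[::-1][:10]
print("highest-ranked candidate genes:", top10)
```

In practice the hard part is everything around this: choosing features and labels that actually reflect disease biology in the right cell type and patient population, which is the point she makes next about data.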
There's so much obvious potential for this thing that you're calling digital biology. And like,
there are a bunch of very promising companies and a bunch of like very brilliant researchers
who are doing work in this area. So I'm curious if you have any thoughts on what are the obstacles standing in our way of going faster?
Is it educating the right people?
Is it we need more data?
We need more compute resources.
We need breakthroughs in particular areas.
So, like, how do we make all of this go faster?
So, I think yes to everything that you said.
With the possible exception of more compute power, I don't think that's currently the rate-limiting aspect.
That's great.
You are then an unusual member of the machine learning community.
Well, I mean, maybe I'm being overly optimistic,
but there is just, you know,
you can currently turn on the tap
and pay your cloud provider,
whoever that is, more money,
but it's not like that's the place
that is currently blocking us.
What's currently blocking, I think,
is working my way backwards through your list
is having not only more data, but having
the right data, data that really helps inform the answers to the questions that are really going to
transform the space. Creation of biological data is challenging. Those are living beings that you're manipulating. And as
such, there are all sorts of funky things that can go wrong that those of us who were trained as
engineers with man-made artifacts are not familiar with. You know, cells behave differently for reasons that we do
not understand. They clump. They get infected with these things called mycoplasma that ruin your whole
experiment and infect other cells. There's just so much stuff that can go wrong in a biological
experiment where you manipulate living beings that you need to be really good at it, and you need to be very, very careful in how you do the experiment, but equally
careful in figuring out what experiment it is that you want to do, because experiments take
time. There's only so much you can do to accelerate a cell and get it to grow faster. And even
more so when you're dealing with a larger living organism, be it a model system or a human. So
the experiments are much more high stakes because it's not just a matter of, okay,
let's push a button and launch another 10,000 of those in the cloud. And then I think working our way backwards in order to really answer those
questions in the right way, which is what are the experiments that we need to perform? The ones that
are going to be truly meaningful, transformative, feasible from an experimental perspective, and at the same time feed into the machine learning
in the right way. You need to have at least a group that speaks both languages, that understands the
biology in terms of what's useful and also what's possible, and at the same time, a group of people on the computer science side
who understand what the technology can do
and where to find within that sort of stew
of the broader field of, say, biology or even drug discovery,
problems that are both impactful and tractable.
And those people who speak both languages are very few and far between.
There's maybe a few more of them coming up as educational institutions become more cognizant
of the need to train interdisciplinary people.
But those people are very hard to find.
If you talk to your average computer science person
or machine learning engineer,
and you put them in a room with your average biologist or medical doctor,
they could, and even if they come in with all of the good intentions of wanting to collaborate,
they have not only completely different languages, they have completely different mindsets.
So coming back to some of the earlier points that we made, biology still, even today, is a lot about the details. And the reason for that
is that the exceptions, those little nitpicky things that don't line up with everything else
that you've seen, are often the starting point for new discovery. So people kind of want to look for
those, whereas engineers really care about, let's find the principles that cover 95% of what we see,
because that's going to be good enough for us to go and build systems. And so that mindset,
those two mindsets are so at odds with each other in many ways that getting people to really
communicate in a way that is collaborative and constructive is really hard. And if I can point
to the one thing that we've done at insitro that I'm proudest of, it is that we've built a
community of people that span a broad spectrum of disciplines in that range and are actually working as a single team. And that's just very unusual.
Yeah, that'd be fascinating.
I'm just sort of curious, like what is, you know,
and this may require going out on a limb
you don't want to go out on,
but, you know, one of the things that's made computing
so much more powerful over the past five decades, like the entire course of modern computing history, is that we have this way of building abstractions that compose where we don't have to understand all of the little nitpicky things. I mean, it's useful to have a model for the nitpicky things when your abstractions fail so that you can go investigate things and figure out what went wrong.
But by and large, you're sort of trusting a bunch of very powerful, very high-level abstractions when you go do your job as a computer scientist or a software engineer today. You know, everything from, like, I can just sort of push a button
and a virtual machine materializes on a server,
in the data center somewhere, in the cloud,
and, like, I don't have to worry about all of the
just colossal amount of complexity that makes that happen.
Is there an equivalent mechanism at play in modern biology?
You know, it's interesting that you bring that up, and maybe it's not that surprising because
we were both trained as computer scientists, but one of the things that I love about modern
biology is that we're getting there. So, there's an emerging set of building blocks that are relatively well-defined in terms of what I'm going to call their API, which is obviously not a word a biologist would ever use, but they have a well-defined kind of input-output functionality. And these include things like CRISPR for genome editing,
where you can basically say,
okay, this is what I want to do to edit the cell.
And then I do that, and there's a set of steps that we need to do,
and then an edited cell comes out.
So that's the glass half-full side of it,
that there are these building blocks that are emerging and you can
start to compose them and do more interesting things with larger and larger, more complex
programs, if you will, that are written in terms of those building blocks. The bad news is that
each of those building blocks is in turn based on a system that is not a nice, predictable, well-understood system like a computer.
It's something that involves living cells.
And so everyone, I think, has heard about the risk of, say,
I'm taking a very simple example, off-target effects of CRISPR editing.
And the fact that which off-target effects you get depends on many things that we don't understand.
Not only which cell type it is, but the specific individual from whom it came gives rise to
somewhat different consequences.
The state that the cell was in at the time that the experiment was started.
So you can think of these as, on the one hand, composable
building blocks that you can start to sort of create systems with, but each of them is incredibly
variable in its response. So it creates a distribution of outcomes that we really don't
understand. And we need to design these experiments in a way that is robust enough that it's hopefully useful even despite that variability, and put in what we as computer scientists would call QA pieces that measure as many of the pieces along the way as we possibly can in order to figure out
what emerged from each of those building blocks so that we can trace the repercussions down the
line. And it's very hard. So when you ask what it is that makes this hard, it's that you have to bring that systems mindset of QA and tracking
and putting in incredibly stringent sort of constraints
on each of those building blocks in the same way that you do
when you build an Intel microchip fab, for instance,
to a discipline that really hasn't done as much of that,
but in a way that is cognizant of all of the sources of variability
and errors that might occur in a biological system.
So that confluence is really hard to put together.
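In software terms, one way to picture those "composable building blocks with variable outcomes" is to treat each wet-lab step as a stochastic function and to log a QC record after every step so failures can be traced later. The sketch below is purely illustrative: the step names, numbers, and thresholds are invented, not a description of any real pipeline.

```python
import random
random.seed(7)

def reprogram(sample):
    # Convert a donor's skin or blood cells to stem cells; efficiency varies run to run.
    sample["reprogram_efficiency"] = random.uniform(0.1, 0.9)
    return sample

def crispr_edit(sample):
    # Introduce a disease mutation; on-target and off-target behavior vary per run.
    sample["on_target_rate"] = random.uniform(0.5, 1.0)
    sample["off_target_events"] = random.randint(0, 5)
    return sample

def differentiate(sample):
    # Turn stem cells into neurons; purity of the resulting population varies.
    sample["neuron_purity"] = random.uniform(0.3, 0.95)
    return sample

sample = {"donor": "donor_001", "qc_log": []}
for step in (reprogram, crispr_edit, differentiate):
    sample = step(sample)
    # Record a QC entry after every building block so problems can be traced later.
    sample["qc_log"].append({
        "after": step.__name__,
        "edit_ok": sample.get("on_target_rate", 1.0) > 0.8,
        "purity_ok": sample.get("neuron_purity", 1.0) > 0.6,
    })

print(sample)   # every measurement plus a QC record after each step
```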
Well, and it strikes me that this bag of techniques
that you are bringing from your background,
so probabilistic modeling and machine learning,
they're the best possible contemporary set of techniques
for dealing with some of these uncertainties.
Whereas if you had to go in and like describe these systems
with a set of partial differential equations,
you'd be lost from the outset.
I completely agree.
I mean, unfortunately, our ability to describe biological systems
using rigid mathematical, deterministic mathematical tools
fails once you go beyond the atomic level.
And even there, I mean, when you think about something
that is relatively circumscribed, like a single protein folding,
you can do some of the differential equation modeling.
But even there, we've seen that techniques that take a step back and say, you know what,
let me not try and construct detailed mechanistic models, but instead let's give the machine
enough data to learn from and it'll pick up
patterns that might be useful. That's what made the AlphaFold success from DeepMind work:
they took a machine learning approach. Now the critical piece, of course, and this
comes back to our conversation a moment ago, is that they had
enough data on folded proteins to train on. And getting enough high quality data is
what it's all about in this new world of bringing machine learning into the space. And that's why
we built insitro the way we did.
Awesome. Well, we're just about out of time, and I wanted to ask you before we wrapped up, what do you do for fun?
So you have what sounds to me like an incredibly fun job, but there must be something outside of work.
Well, so first of all, I am grateful to have a job that is as much fun as this in the sense that I get to read all of the coolest papers in biology and
all of the coolest papers in automation and in machine learning and figure out how to put them
together in new ways and do it towards a goal that I think is just truly important, which is
how do we make people healthier. And I'm going to go on a soapbox for just a
moment and talk about the fact that I think part of our goal here, you know,
when we were put on this earth, was to try and leave the world a little bit better than it was
when we came into it. And we should be doing that. And for
those of us who had the privilege of being born to relatively affluent, well-educated families,
where we didn't need to struggle for where our next meal is coming from, that burden
is actually even higher. We should be thinking about how we can give back. So anyway, sorry, that was another soapbox.
No, that's so important.
But that being said, the thing that I most liked to do for fun pre-coronavirus was to travel and see parts of the world that are different from the little cocoon where we live.
I've been to 65 different countries so far.
Six different continents.
Have not yet been to Antarctica.
That's definitely on the bucket list.
And I find it to be a wonderful experience, both in visiting other cultures and seeing how different people live,
but also I love being out in nature and the outdoors and hiking and scuba diving and sailing and doing all that.
So that is the thing I used to do for fun.
I have no idea when the next time I'll be able to do that is, unfortunately, at this point in time. So the other things that I like to do are just
spending time with my family and, you know, going on local hikes in nature, which are not perhaps
as dramatic as visiting Iceland or the Great Barrier Reef or this incredible lake in Palau that has jellyfish that don't sting and you can swim in
them. But at least it's being outdoors in the fresh air. And I'm lucky enough to live in a
part of the world that has some beautiful scenery, even locally. So I go for hikes a lot these days.
Yeah. Well, hiking in Northern California is not bad at all.
Nope. Can't complain too much relative to what the situation could be.
But I do wish we could get back on a plane at some point and visit some of those amazing places elsewhere in the world.
Well, I'm hoping that probably not as soon as we want, but sooner than we would ever have been able to do at any other point in human history.
Science will be able to give us enough safety around coronavirus that hopefully you'll be able
to travel soon. I won't make any predictions about when soon is, but let's hope for soon.
Very much hope so. And I think if we do get to that point in the near term, and by near I mean within the next 12 to 18 months, I hope people will appreciate the miracle that it is and the many decades of work by so many people that needed to happen in order to make that possible.
Yeah, and I think that is the perfect place to stop. So thank you so much for being on the show
today. This was a fascinating, fun conversation, and I'm glad we got to talk to you today.
So am I. Thank you very much.
Awesome.
So that was Kevin's conversation with Daphne Koller, CEO and founder of insitro. And oh my
gosh, that was so interesting. There were so many amazing parts of that conversation. I'm not even
honestly someone who's that into biology. And there are so many things that I'm going to think
more about and that I want to kind of pull
on more strings based on that conversation. That was amazing. Yeah, I think one of the really
great things about Daphne and one of the things that has made her such a great scientist and
entrepreneur is that she thinks about everything that she does extremely deeply.
She has this wide-ranging curiosity, which I think is one of the best superpowers. You combine that with persistence, and you find yourself in all of these situations
where you are making connections across disciplines and doing a whole bunch of things that maybe you wouldn't
be able to imagine if you were a more narrowly focused person or had a more narrowly focused
set of interests. And like she said so many things in that conversation that I'm like, wow,
I really need to go think about this more deeply myself. Like just one of the casual things that she said
was this need for like maybe, you know, an equivalent of little league sports or like a
kid's Kaggle competition so that you can find the right competitive and social dynamic for kids
getting themselves onboarded into machine learning. It's a great idea.
Yeah, it really is.
Somebody needs to go do that now.
No, I'm in total agreement. Yeah, because we have Little League and we have other sorts of
competitions. And when kids get older, there are some more science type of competitions,
but to have something gamifying things when you're younger around machine learning would be brilliant.
That's a brilliant idea. And I loved, you know, kind of her origin story, you know, the fact that
she was writing her thesis on, you know, game theory and distributed systems and multi-agent
incentive systems. Like, I was just like, this is brilliant. You know, these are things that you,
that to your point, you would need
certain curiosity and just wide-ranging interest and persistence to really want to pursue.
One of the big takeaways I kind of got from this was something she said, you know, about how much
science matters. And what are your thoughts about that, especially in the moment that we're living
in right now? Well, I think the thing that she tried to draw our attention to several times is that we,
for this pandemic, and I think in general, like we're more dependent now on science to solve some
of the really big problems that we are facing as a society or some of the challenges that we have to overcome in order to live our best lives and to have the future that we all want.
And the thing to remember is none of this is sort of overnight.
Like science requires years of substantial investment in a wide variety of things that build this foundation, so that when you get to a
moment like the one that we have right now, you have all of the things that you need to go
tackle these problems. So if you don't do these long-term investments and these foundational
pieces in educating scientists and giving them the ability to go do this work that builds
this solid, solid foundation and like carries the whole field forward, you really can get yourself
into a situation where a crisis comes along and like you just don't have any way for science to
help solve it. And so I think that's the thing that we all really need to remember, and hopefully it will redouble our
resolve to go make even bigger investments in those foundations for the future.
No, I think you're 100% correct. We need to continue to make these investments. And I love
that there are people like Daphne who are taking these two different fields, you know, taking
computer science and machine learning as well as biology and bringing them together, so that hopefully we find the right problems, the ones that will
produce positive benefit for all of humanity. And, you know, to use Daphne's words,
I think we were put here to try to leave the world a little bit better than we found it.
Absolutely. Absolutely.
All right. Well, that's all for us today. Thank you again to Daphne Koller.
And we are so glad that you joined us.
We learned so much.
And we hope that all of you at home got a little nugget
to impress all of your friends at your next socially distanced gathering.
I know I definitely did.
I'm definitely going to be dropping things
like digital biology in conversation now.
And remember to reach out to us anytime
at behindthetech at microsoft.com.
Stay safe and be well.
See you next time.