Behind The Tech with Kevin Scott - Daphne Koller, PhD: CEO and founder of insitro
Episode Date: August 25, 2020. Former Stanford University professor Daphne Koller's cutting-edge work combines machine learning and biology to transform pharmaceutical drug development. Her work at insitro is part of an emerging field, digital biology, that is having an impact on multiple areas, including biomaterial design, agriculture, and human health. This MacArthur Fellow was also named one of TIME Magazine's 100 most influential people.
Transcript
I think one of the very, very thin silver linings around this very dire situation that we find ourselves in is that there is, I hope, a growing appreciation among the general public for what science is able to do for us today and how much of that ability rests on decades of basic
science work by many, many people.
Hi, everyone. Welcome to Behind the Tech. I'm your host, Kevin Scott, Chief Technology Officer for Microsoft.
In this podcast, we're going to get behind the tech.
We'll talk with some of the people who made our modern tech world possible
and understand what motivated them to create what they did.
So join me to maybe learn a little bit about the history of computing
and get a few behind-the-scenes insights into what's happening today.
Stick around.
Hello, welcome to Behind the Tech.
I'm Christina Warren, Senior Cloud Advocate at Microsoft.
And I'm Kevin Scott.
Today, our guest is Daphne Koller, who's the CEO and founder of insitro, which is a company that works at the convergence of biology and
machine learning. Yeah, I'm guessing everyone has a newfound appreciation for how important
biomedicine and biotechnology is at this time with the COVID-19 pandemic still raging around us.
And I think Daphne is doing some of the most interesting
work right now in the field that is, as we've seen with several of our other guests, like this
really powerful combination of biology and machine learning and high-performance computing and
laboratory automation. Like what they're doing is really wonderful work. And
Daphne is just sort of a brilliant computer scientist and has had many, many different
chapters in her career that are inspirational. That's so true, Kevin. I cannot wait to hear
this conversation. Yeah, so let's get started.
Our guest on the show today is Daphne Koller. Daphne is a machine learning pioneer.
She's CEO and founder of insitro, a company that applies machine learning to pharmaceutical development.
Daphne was a computer science professor at Stanford, co-founder and co-CEO of Coursera, and is a MacArthur Fellow.
She was also named one of Time Magazine's 100 Most Influential People.
And I have had a copy of her book, Probabilistic Graphical Models, on my bookshelf for more years than I know.
So welcome to the show, Daphne.
Thank you. Glad to be here.
And I would be very impressed if you actually read the book. I read the book many, many years ago when I was at Google,
where your work was very influential in how we thought about doing some of our very early
and what we thought at the time was very sophisticated machine learning work in the ad system. I do have to admit that when I picked the book up
for the first time, I was a compiler and programming language person, so it was not an easy
read. It is definitely a bit of a tome. I think at this point, it serves well as a doorstop if you need one. It's very, very big.
So, in any case, I'm so delighted you could be with us here today. So, we typically start off these conversations with a bit about your background. So, I'm curious how it is that you got interested in science and
technology in the first place. So I was interested in science ever since I was a kid. My family had
this series of books that were the time-life series on everything from asteroids to how plants grow, and I just used to sit and read them for fun.
I didn't get interested in technology until my freshman year of high school when my parents came
here. My dad came on sabbatical to Stanford, and for the first time I was in a school where there
was a computer center. This was a long time ago, so I'm going to date myself at this point but these were TRS-80
computers that were time shared across two people and I got to learn to program and I found it
an amazing experience where you could actually tell a computer what to do and it did it which
didn't work for me in any other context but but here it actually did. And so I got
interested in computing at that point. And that I think led to basically my college choices and
so on. And ultimately, I think the combining the two in the work that I do now, but that's many
years later. So on those TRS-80s, it sounds like your experience is very similar to mine, actually.
In high school, I did a bunch of coding on TRS-80s.
So was your language of choice there the BASIC interpreter, or were you using something else?
No, at that point, it was BASIC.
This was the only language that was available when I was here in high school.
But I very quickly migrated beyond that to Pascal and then C.
That's really interesting. That is almost exactly the same language, although there
was some assembly language in there because I wanted to code games and you had to supplement
the basic with a little bit of assembly language if you wanted to make things move around on the screen.
So, you know, this thing that you said about being attracted to programming as a kid because the computer would listen to you,
like I think is very interesting.
It is one of those things that I think can give kids agency. And, you know, I know
that you, you know, both as an educator at Stanford and as one of the co-founders of Coursera, you've
thought a lot about how to educate both kids and adults. How important do you think that sense of agency is in getting kids interested in computing?
I think it's very important.
I think it's difficult for us as adults to appreciate just how powerless most kids feel.
Certainly the ones from less advantaged backgrounds, but even the others. And I think
giving them an avenue where they can really dictate, if you will, what happens is super exciting for
them. And I think we are not giving them enough of that in how we currently teach technology, because
we've moved far away from programming
in how one teaches computers in most schools.
And if you actually came back to that and said,
hey, look what you can build, and it actually works,
I think it's an incredible feeling of empowerment for kids.
Yeah, one of the things that I've struggled with
with my own kids is trying to get them interested in programming.
So I'm not trying to force them to learn anything that they don't want to.
Like we try to expose them to a bunch of things.
And it took a while to figure out how to find an entry point into coding that was interesting for them. And the thing for my son
was Roblox, which is this game that he plays on his tablet obsessively. And as soon as he figured
out that there was a way for him to create his own stages in Roblox, that was the thing that
enticed him to want to program. I think it's become harder to get kids interested in programming because
the programs that are already out there are really sophisticated and fancy. And what kids can create
is always going to pale by comparison to what is already out there, which is not a problem that you
and I had when we were starting with computers. You didn't have a tablet on which you could play amazing games.
And so what we created seemed kind of cool.
And now when kids create, it seems not quite as cool as the games they have on their phone.
And so the question is, is there a way in which we can give kids that same sense of excitement
about what they're creating so that it does seem cool and interesting.
And I don't think we've paid enough attention to that.
Yeah.
And it's interesting that you bring that up because we have talked with a bunch of other
guests on this podcast where it's also true that the programming tools that you have available
to you now are vastly more powerful than the ones that we had when we were first learning to program.
But it could very well be, and I 100% agree with what you're saying, that the gap between the sophistication of the software has maybe grown even further apart from the power of the tools for entry-level, for kids.
Yeah, no, that's right.
And my daughter, she was kind of interested
in machine learning for a while,
but so I said, well, why don't you try your hand
at one of those Kaggle competitions?
And the problem is that the Kaggle competitions,
they're full of really sophisticated,
top-notch programmers looking
to build a reputation so that they can go get jobs because of their machine learning street cred.
And kids like my daughter have no chance of even getting into the top 100. So it was kind of a bit
of a demoralizing experience, in the sense that nothing she could do would ever rank up there.
And so I'm not sure it ended up serving a positive purpose.
And so I wonder if it would make sense to have a Kaggle for Kids or something that would let
kids compete in a playing field that was more even and would get them excited about going on
to the next level. I think that's a fantastic idea. I mean, and we understand how to do this with sports, right? So like there are all these sports leagues and sport opportunities for kids where you can get them into a team where they can learn the sport and get all of this physical activity, but they're not like completely and utterly outmatched either with their teammates or the teams that they're playing.
So it just seems perfectly reasonable to me that we could figure out how to do this with some of these coding competitions.
You would think.
I mean, if you weren't already busy founding, running a company, I'd say that sounds like a good thing to go do.
This would be my second or third alter ego if I were able to wrangle that.
Yeah, indeed. So you learned to program when you were in high school. And when you went to college,
did you choose computer science? So I actually had an interesting early path in that regard, because I was actually young when I was in high school.
And when I came back from the United States after that sabbatical, I actually started college in parallel with high school, because I'd always found high school to be not as inspiring as I would have hoped.
And I found a lot more flexibility in the college curriculum, so I
started to study math and computer science while I was still finishing up high school.
And I think it was the timeliness of that, and that I had just come back from the United
States where I'd learned to program, that probably influenced my career choice. And who knows, if
I'd waited three or four or five years
like most kids do, then maybe I would have picked something else. But the nice thing that I found
about computer science and even more so over time is that it's actually an entry point into multiple
other fields because especially today, but even back then, most fields can benefit from computational methods and using computer science, whether it's algorithmic thinking or now even more so machine learning.
All of them find that technology a really useful and often very divergent way of approaching the field. And so that allowed me in my career to touch on
so many different areas from things that are more core tech, like robots and computer vision,
to things that are a little bit more distal, but at least today still considered more core,
like natural language processing, to things that are a little bit less viewed as part of core computer science.
I did a lot of game theory, for instance, and economics early in my career.
And now I'm doing a ton of science and medicine.
And it's not that I've become a biologist.
I am still a computer scientist, but the tools are just so useful in all these different disciplines that as a computer scientist, I'm not only able to do interesting biology.
I'm able to do it in a way that is often very different from how someone trained purely in biology would approach it.
What was the first thing, either as a high school student or when you were in college, that you did that wasn't the core CS stuff, like operating systems, compilers, algorithms, data structures, where you sort of realized, like, oh, wow, this computer science stuff that I've learned is like a superpower that lets me do a whole bunch
of things? So I think the first one was really the integration between game theory and computer
science and the meeting point in distributed systems. That was actually what my master's thesis was about, back when I started my PhD, where
I was at that point trying to explore it from both sides. Can an understanding of
the multi-agent incentive function, if you will, from the game theory perspective,
help us build better distributed systems? And conversely, if game theory was a
really interesting framework for decision making, could we use the algorithmic tools that
computer science gave us to help us find better solutions to game-theoretic problems that were
not even necessarily within the scope of computer science? So can we help people make better decisions in the multi-agent setting by turning it into
a computational problem so it wouldn't be this kind of bespoke, somewhat obscure mathematical
analysis that only game theorists could do, but actually a tool that's useful in decision
making?
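To make that concrete, here is a minimal sketch, in Python, of what turning a game-theoretic question into a computational one can look like: a brute-force search for the pure-strategy Nash equilibria of a small two-player game. The payoff matrices are made up for illustration and are not from the thesis work being described.

```python
import numpy as np

# Hypothetical payoff matrices for a 2x2 game (a prisoner's-dilemma-like setup).
# Rows are player 1's actions, columns are player 2's actions.
A = np.array([[3, 0],
              [5, 1]])   # player 1's payoffs
B = np.array([[3, 5],
              [0, 1]])   # player 2's payoffs

def pure_nash_equilibria(A, B):
    """Return all (i, j) action pairs where neither player gains by deviating unilaterally."""
    equilibria = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            best_for_p1 = A[i, j] >= A[:, j].max()   # player 1 can't do better in column j
            best_for_p2 = B[i, j] >= B[i, :].max()   # player 2 can't do better in row i
            if best_for_p1 and best_for_p2:
                equilibria.append((i, j))
    return equilibria

print(pure_nash_equilibria(A, B))  # -> [(1, 1)]: both players "defect" in this example
```

Brute force only works for tiny games, but it shows the shift in framing: the equilibrium stops being a bespoke mathematical derivation and becomes something an algorithm can search for.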
And so that was kind of the entry point for me that went actually from decision making in multi-agent systems to decision making in single agent systems to modeling of the world that would enable decision making to then learning those models from data, which is what took me to machine learning.
So that was actually that trajectory. And did you have anyone
or like any interesting way
that you were getting into these tangential fields?
So like game theory, for instance,
like was this something where
you had an influential mentor
who put you onto it? Was this you just independently getting curious
and reading a whole bunch of stuff? What's your approach to learning these disparate things?
I wish I could tell you that it was a systematic thorough exploration where I
took a broad perspective and tried to figure out what was interesting and most useful.
Often it's a bit of serendipity and just affinity to a particular space and maybe just a sense of
there's something here that could be exciting. On the game theory side, I just happened to take
as part of my undergraduate degree, which was a dual degree in math and computer science, there was a game theory class. And so I took that and I got really intrigued by the truly elegant
mathematics that underlie it. And I said, wow, this could be a really cool way of thinking about
interactions in a computer system. So that was purely serendipity. My move into biology was much later. Interestingly, my father was a biologist,
and so I had always actually steered away from biology, partly, I think, because like most kids,
you don't want to do what your parents do. But also because at that time, when I did take a biology class, it was incredibly descriptive.
It was like a catalog of, you know,
obscure Latin names of plants or pieces of cells.
And it was all about memorizing that this does this to that.
And I was just completely uninterested in doing that because it seemed like
there were very few principles in play. It was all about the details, and I'm not good at details, especially
memorizing them. So I didn't really get into that, and did my entire, both high school and
undergraduate career focusing on things that were much more interpretable in terms of
principles and systems, so math and physics and computer science. And the reason I got into biology
and medicine was actually when I came back to Stanford as a faculty member and started to do machine learning and realized just how
boring and uninspiring the data sets were that we machine learning people had to work with at the
time machine learning was getting off the ground. I mean, one of the flagship data sets
was something called 20 Newsgroups, which is exactly what it sounds like. It's articles from 20 very boring newsgroups,
and you had to classify which news article came from which group. And it was not interesting
technically, and it certainly wasn't very aspirational. And so I started to look around
what other interesting data sets were around. And specifically, my focus at the time was on data sets that were more richly
structured, relational data sets, if you will, where there's multiple types of entities, multiple
types of relationships, and looking for data sets that had those characteristics that were available.
And most of those were trapped behind the doors of companies that
weren't very excited about making those available to outside researchers. And biology at that point
was like, oh, well, look, there's genes and cells and proteins and people. And I actually started
early work, interestingly enough given today, on the epidemiology of tuberculosis, on
tracking infection chains and figuring out if you could sort of pinpoint where an infection started,
something that seems very timely today. But at that point, the data sets there were also pretty
small, but they were much more interesting than the 20 news groups. And so I started to work first on things like the TB epidemiology, and then on some of the earliest data sets that measured
the expression or activity level of different genes and different types of cells. And that,
of course, was a network problem galore, because you really had to figure out that this gene did
this thing to this other gene. And so that really created a much more interesting technology challenge.
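As a rough illustration of why gene activity data is a "network problem," here is a toy sketch that infers a crude gene-gene association graph from a simulated expression matrix by thresholding pairwise correlations. Real analyses, including the probabilistic graphical models Daphne's book covers, are far more principled; the data, planted dependencies, and threshold here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_genes = 200, 6

# Simulated expression matrix (cells x genes) with two planted dependencies.
expr = rng.normal(size=(n_cells, n_genes))
expr[:, 1] += 0.8 * expr[:, 0]          # gene 1 tracks gene 0
expr[:, 4] -= 0.7 * expr[:, 3]          # gene 4 is negatively coupled to gene 3

corr = np.corrcoef(expr, rowvar=False)  # gene-by-gene correlation matrix
threshold = 0.4
edges = [(i, j, round(float(corr[i, j]), 2))
         for i in range(n_genes)
         for j in range(i + 1, n_genes)
         if abs(corr[i, j]) > threshold]

print(edges)  # roughly [(0, 1, 0.6...), (3, 4, -0.5...)]: the planted gene-gene links recovered
```

The interesting technical challenge she describes starts where this sketch stops: distinguishing direct influence from indirect correlation, which is exactly what graphical-model approaches try to do.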
And then from there, I actually started to get interested in the biology in its own right,
because it was not only more interesting, but it was also much more aspirational in terms of
what you could do with it; it could actually help people.
And then that grew to be a much more significant driving force for me over time: the wish to
do good, not just good science, but also good for the world.
Right.
And so I want to dive deep in just a minute into this, both the biology and this notion of how it is
that we technologists can be doing more good in the world. I want to take a moment and
double-click on this point that you just made, which is this very strong correlation between
what people will do research on and what people will study in machine learning and the available data.
Because I know your former colleague, Fei-Fei Li, like helped put together ImageNet,
which catalyzed a whole bunch of really interesting developments in computer vision.
And it's just true, like the data sets that you have available to play around with will dictate the character of your research.
In a way, you were doing something extraordinary at the time by realizing, all right, well, I'm going to go find the interesting data.
And I do think as part of this notion of we should be directing our efforts towards things that will do public good is partially about making sure that people
have the compute resources and tools and whatnot. But it's also about making that data available,
data sets that are relevant to the problems we want solved.
Absolutely. I mean, when you think about an aspiring young researcher, or even an undergrad,
or even one of those high school students that we talked
about earlier, having a data set that is interesting, that offers potential for them to do something
innovative and cool, that is processed and easily accessible, and where there's
at least initially a set of well-defined problems that they can tackle
while they explore the data to come up with potentially new problems. That, I think, is such
an important entry point for people into the field. And we all understand that that's not where
the real action is ultimately. If you're going to become a leading researcher in the field,
part of what you
need to do is really develop your own way of finding data sets. Although to be fair, there's
some incredibly talented people who just continue methods development on data sets that other people
have already created. And I think that's a very worthwhile path as well. But so for either of those, giving people an easily accessible first entry point
into a field is just absolutely critical as opposed to what I've seen in a lot of early
stage machine learning projects where they tell people, oh, go around, figure out what problem
you think would be interesting for you to solve, and then figure out how to get data for it, and then figure out what machine learning algorithm is good for it.
I mean, that is such an insurmountable mound of stuff for someone to tackle the first time they're getting into the field. And that's especially true for people who are not quite as privileged as others
in terms of what we give them as a starting point.
Or they go work on the same old data sets as everyone else.
I think Fei-Fei's work on ImageNet
was absolutely transformative.
But at this point, I'd like people to start thinking about other forms of data that they could get practice on.
And there haven't been enough Fei-Feis to go and create those data sets in other places.
Well, and you can sort of see it even in how we reward folks. You know, like maybe this is sort of
a controversial thing to say,
but like I was a little bit shocked
that Fei-Fei wasn't on the same roster
of folks who got the Turing Award for deep learning
because the ImageNet stuff was like potentially,
I mean, it was absolutely a precondition
for the stuff that Hinton and LeCun and Bengio did.
Whether or not folks actually agree with that is almost beside the point; the point is that we don't recognize this data collection and building these data assets as much as we do the fancy algorithms.
I think there's always been a lot of appreciation in the machine learning community as a whole
for technical firepower, for yet another improvement on algorithms or models that admittedly is an amazing contribution.
And obviously a lot of those developments have been what's opened the door to the performance
that we see today.
But there's been less appreciation, I think, for the intellectual endeavor of doing work
that is more applied. And I think people
often don't understand the amount of intellectual endeavor and thought that goes into questions such as: what is the right problem within this big sea of a space like biology or earth science
or whatever? What are the questions that are both technically tractable and yet can be transformative
to what the field is trying to accomplish? And that is an incredible intellectual
exercise, followed by the second intellectual exercise, well, if this is the problem that we
aim to solve, how do we get the data to actually solve it? And can we acquire it? Can we clean it?
Do these data sets have issues that we need to address? Or do we need to go and collect data de novo?
That too is an incredible intellectual exercise and often a very time-consuming feat.
And I agree with you that those efforts are not always as recognized as some of the sort of mathematical or machine learning sort of flashier efforts.
Yeah.
Which is a really good segue into what you're doing right now,
which is applying machine learning to a very, very worthy set of problems.
And I'm guessing you started well before we were in this pandemic moment that we're
in right now. But what you're doing, I'm guessing, is more relevant now than it was even six months
ago. No, absolutely. And I think one of the very, very thin silver linings around this
very dire situation that we find ourselves in is that there is, I hope, a growing appreciation
among the general public for what science is able to do for us today and how much of that ability
rests on decades of basic science work by many, many people, much of which is publicly funded
work at academic institutions. Without that level of progress that we've made, the concept of,
say, creating a vaccine in 12 months would have been completely ludicrous, you know, a few years ago.
Or the work that's being done on repurposing existing drugs to, even if not cure the disease, at least slow its progression or help ameliorate some of the more significant inflammatory consequences.
There's thousands of drugs out there.
You can't do thousands of clinical trials for each of them.
So a lot of the work that we've done on interpreting cell-based assays and understanding the immune system and understanding things like cytokine storms and such,
those are all key building blocks for the fact that we actually have at this point two drugs
and hopefully more coming that at least are somewhat helpful in addressing this disease.
And so I'm really hoping that people are paying attention, that science matters.
It really matters.
And you should be supporting science and listening to science in the good days, because when the bad days come, it's going to be too late to sort of suddenly realize that you need science. So, sorry, that was my little soapbox right there.
But I think everything you said, I could not agree with more strongly. And,
you know, like one of the things that I'm hoping for, like this is my desired silver lining
potentially for this moment that we're in is like, I think we are making very rapid progress
towards vaccines and therapeutics and better understanding exactly the mechanism of this
miserable little virus. But I'm hoping like we will, in this moment, see how much science can
accomplish when we point it at a task like this. And hopefully we will decide that that is a worthy
set of things to invest much more in than we have been over the past,
I would say, decade, because the last decade has been transformative in science
in many of the same ways that it has been transformative in machine learning,
but coming into it from the other side.
I think we have a chance of making significant headway against other diseases that are currently still scourges that are incredibly damaging and shorten people's lives, reduce their quality of life.
And I think with the right investment and the right focus, we could actually make a difference.
So tell me a little bit about what you're doing at insitro.
So the premise for what we're doing really emerges from what I said a moment ago,
which is that this last decade has been transformative in parallel on two fields that very rarely talk to each other. We've already talked about the advancement on the machine learning side
and the ability to build incredibly high accuracy predictive models in a slew of different problem
domains if you have enough quality data. On the other side, the biologists and bioengineers have
developed a set of tools over the last decade or so,
that each of which have been transformative in their own rights.
But together, they create, I think, a perfect storm of large data creation, enabling large
data creation on the biology side, which when you feed it into the machine learning piece
can all of a sudden
give rise to unique insights. And so some of those tools are actually pretty special and
incredible, honestly. So one of those is what we call induced pluripotent stem cells,
and the "we" here is the community, not we at insitro. It's the ability to take
skin cells or blood cells from any one of us and then, by what is almost magic, revert them to
the state that they're in when you're an embryo in which they can turn into any lineage of your body. So you can take a skin cell from us,
revert it to stem cell status,
and then make a Daphne neuron.
And that's amazing because that Daphne neuron
carries my genetics.
And if there are diseases that manifest
in a neuronal tissue,
you will be able to potentially examine, assay those cells and say, oh, wait, this is what
makes a healthy neuron different from one that carries a larger genetic burden of disease.
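As a small, made-up illustration of that "assay and compare" idea, the sketch below simulates high-dimensional measurements from healthy-donor neurons and from neurons carrying a disease mutation, then asks which features separate the two groups. The cell counts, feature counts, and effect sizes are all invented for illustration; real cell-based assays and analyses are far richer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_features = 100                          # stand-in for hundreds of per-cell measurements

# Simulated assay profiles for neurons from healthy donors vs. mutation carriers.
healthy = rng.normal(0.0, 1.0, size=(80, n_features))
disease = rng.normal(0.0, 1.0, size=(80, n_features))
disease[:, :5] += 0.8                     # pretend the first 5 features truly shift with the mutation

t_stats, p_values = stats.ttest_ind(healthy, disease, axis=0)
hits = np.where(p_values < 0.05 / n_features)[0]   # crude Bonferroni-corrected cutoff
print("features that separate healthy from disease neurons:", hits)
```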
And so that's one tool that has arisen. A different one that is also remarkable is the whole
CRISPR revolution and the ability to modify the genetics
of those cells so that you could actually create fake disease, not fake disease because it's real
disease, but introduce it into a cell to see what a really highly penetrant mutation looks like in a
cell. And then commensurate with that, there's been the ability
to measure cells in many, many, many different ways where you can collect hundreds of thousands
of measurements from each of those cells. So you can really get a broad perspective on
what those cells look like rather than coming in with, I know I need to measure this one thing.
And you can do this all at an incredible scale.
So on the one side, you have all this capability for data production.
And on the other side, you have all this capability for data interpretation.
And I think those two threads are converging into a field that I'm calling digital biology, where we suddenly have the ability to measure biology
quantitatively at an unprecedented scale, interpret what we see, and then take that back and write
biology, whether it's using CRISPR or some other intervention to make the biological system do
something other than what it would
normally have done. So that to me is a field that's emerging and will have repercussions that
span from, you know, environmental science, biofuel, bacteria or algae that do all sorts of
funky things like suck carbon dioxide out of the environment, better crops, but also importantly
for what we do, better human health. And so I think we're part of this wave that's starting
to emerge. And what we do is take this convergence and point it in the direction of making better drugs that can potentially actually be disease
modifying rather than, as in many existing drugs, just often just make people feel better but don't
really change the course of their disease. And so this technology that you're talking about,
will it be used to make the drugs or to examine the effect of potential drugs or both?
Both.
So it actually starts with understanding where you even want to develop drugs for.
So we have a lot of problems with current-day drug failures: depending on which statistic you believe, the success rate
of a drug discovery effort from beginning to end is somewhere around 5%. So think about that. It's
a 95% failure rate. And a lot of this is because we just don't understand the biology. We don't know where to develop drugs towards. What is the right target and what is the
right cell type and what is the right patient population? So it starts with predicting using
machine learning what viable targets are in the context of a given disease in the given target
population. And then from there, okay, how do we design drugs more rapidly
so that we don't have to wait five years or sometimes much longer for a drug to emerge?
And so really we want to close that arc of going all the way from the biology to the actual drug.
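A heavily simplified, hypothetical sketch of what "predicting viable targets" can look like as a supervised learning problem is below. The features, labels, and model are all stand-ins invented for this example; nothing here reflects insitro's actual pipeline, only the framing of target selection as a prediction problem.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_genes = 500

# Hypothetical per-gene features: genetic association score, expression in the
# disease-relevant cell type, and an essentiality score from cell experiments.
X = rng.random((n_genes, 3))

# Made-up labels standing in for past targets that worked (1) or failed (0).
y = (0.7 * X[:, 0] + 0.3 * X[:, 1] + 0.2 * rng.random(n_genes) > 0.6).astype(int)

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]          # predicted probability each gene is a viable target
top10 = np.argsort(scores)[::-1][:10]
print("highest-ranked candidate genes:", top10)
```

In practice the hard part is everything around this: choosing features and labels that actually reflect disease biology in the right cell type and patient population, which is the point she makes next about data.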
There's so much obvious potential for this thing that you're calling digital biology. And like,
there are a bunch of very promising companies and a bunch of like very brilliant researchers
who are doing work in this area. So I'm curious if you have any thoughts on what are the obstacles standing in our way of going faster?
Is it educating the right people?
Is it we need more data?
We need more compute resources.
We need breakthroughs in particular areas.
So, like, how do we make all of this go faster?
So, I think yes to everything that you said.
With the possible exception of more compute power, I don't think that's currently the rate-limiting aspect.
That's great.
You are then an unusual member of the machine learning community.
Well, I mean, maybe I'm being overly optimistic,
but there is just, you know,
you can currently turn on the tap
and pay your cloud provider,
whoever that is, more money,
but it's not like that's the place
that is currently blocking us.
What's currently blocking, I think,
is working my way backwards through your list
is having not only more data, but having
the right data, data that really helps inform the answers to the questions that are really going to
transform the space. Creation of biological data is challenging. Those are living beings that you're manipulating. And as
such, there are all sorts of funky things that can go wrong that those of us who were trained as
engineers with man-made artifacts are not familiar with. You know, cells behave differently for reasons that we do
not understand. They clump. They get infected with these things called mycoplasma that ruin your whole
experiment and infect other cells. There's just so much stuff that can go wrong in a biological
experiment where you manipulate living beings that you need to be really good at it, and you need to be very, very careful in how you do the experiment, but equally
careful in figuring out what experiment it is that you want to do, because experiments take
time. There's only so much you can do to accelerate a cell and get it to grow faster. And even
more so when you're dealing with a larger living organism, be it a model system or a human. So
the experiments are much more high stakes because it's not just a matter of, okay,
let's push a button and launch another 10,000 of those in the cloud. And then I think working our way backwards in order to really answer those
questions in the right way, which is what are the experiments that we need to perform? The ones that
are going to be truly meaningful, transformative, feasible from an experimental perspective, and at the same time feed into the machine learning
in the right way. You need to have at least a group that speaks both languages, that understands the
biology in terms of what's useful and also what's possible, and at the same time, a group of people on the computer science side
who understand what the technology can do
and where to find within that sort of stew
of the broader field of, say, biology or even drug discovery,
problems that are both impactful and tractable.
And those people who speak both languages are very few and far between.
There's maybe a few more of them coming up as educational institutions become more cognizant
of the need to train interdisciplinary people.
But those people are very hard to find.
If you talk to your average computer science person
or machine learning engineer,
and you put them in a room with your average biologist or medical doctor,
they could, and even if they come in with all of the good intentions of wanting to collaborate,
they have not only completely different languages, they have completely different mindsets.
So coming back to some of the earlier points that we made, biology still, even today, is a lot about the details. And the reason for that
is that the exceptions, those little nitpicky things that don't line up with everything else
that you've seen, are often the starting point for new discovery. So people kind of want to look for
those, whereas engineers really care about, let's find the principles that cover 95% of what we see,
because that's going to be good enough for us to go and build systems. And so that mindset,
those two mindsets are so at odds with each other in many ways that getting people to really
communicate in a way that is collaborative and constructive is really hard. And if I can point
to the one thing that we've done at insitro that I'm proudest of, it is that we've built a
community of people that span a broad spectrum of disciplines in that range and are actually working as a single team. And that's just very unusual.
Yeah, that'd be fascinating.
I'm just sort of curious, like what is, you know,
and this may require going out on a limb
you don't want to go out on,
but, you know, one of the things that's made computing
so much more powerful over the past five decades, like the entire course of modern computing history, is that we have this way of building abstractions that compose where we don't have to understand all of the little nitpicky things. I mean, it's useful to have a model for the nitpicky things when your abstractions fail so that you can go investigate things and figure out what went wrong.
But by and large, you're sort of trusting a bunch of very powerful, very high-level abstractions when you go do your job as a computer scientist or a software engineer today. You know, everything from, like, I can just sort of push a button
and a virtual machine materializes on a server,
in the data center somewhere, in the cloud,
and, like, I don't have to worry about all of the
just colossal amount of complexity that makes that happen.
Is there an equivalent mechanism at play in modern biology?
You know, it's interesting that you bring that up, and maybe it's not that surprising because
we were both trained as computer scientists, but one of the things that I love about modern
biology is that we're getting there. So, there's an emerging set of building blocks that are relatively well-defined in terms of what I'm going to call their API, which is obviously not a word a biologist would ever use, but they have a well-defined kind of input-output functionality. And these include things like CRISPR for genome editing,
where you can basically say,
okay, this is what I want to do to edit the cell.
And then I do that, and there's a set of steps that we need to do,
and then an edited cell comes out.
So that's the glass half-full side of it,
that there are these building blocks that are emerging and you can
start to compose them and do more interesting things with larger and larger, more complex
programs, if you will, that are written in terms of those building blocks. The bad news is that
each of those building blocks is in turn based on a system that is not a nice, predictable, well-understood system like a computer.
It's something that involves living cells.
And so everyone, I think, has heard about the risk of, say,
I'm taking a very simple example, off-target effects of CRISPR editing.
And the fact that which off-target effects you get depends on many things that we don't understand.
Not only which cell type it is, but the specific individual from whom it came gives rise to
somewhat different consequences.
The state that the cell was in at the time that the experiment was started.
So you can think of these as, on the one hand, composable
building blocks that you can start to sort of create systems with, but each of them is incredibly
variable in its response. So it creates a distribution of outcomes that we really don't
understand. And we need to design these experiments in a way that is robust enough that it's hopefully useful even despite that variability, and put in what we as computer scientists would call QA pieces that measure as many of the pieces along the way as we possibly can in order to figure out
what emerged from each of those building blocks so that we can trace the repercussions down the
line. And it's very hard. So when you ask what it is that makes this hard, it's that you have to bring that systems mindset of QA and tracking
and putting in incredibly stringent sort of constraints
on each of those building blocks in the same way that you do
when you build an Intel microchip fab, for instance,
to a discipline that really hasn't done as much of that,
but in a way that is cognizant of all of the sources of variability
and errors that might occur in a biological system.
So that confluence is really hard to put together.
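In software terms, one way to picture those "composable building blocks with variable outcomes" is to treat each wet-lab step as a stochastic function and to log a QC record after every step so failures can be traced later. The sketch below is purely illustrative: the step names, numbers, and thresholds are invented, not a description of any real pipeline.

```python
import random
random.seed(7)

def reprogram(sample):
    # Convert a donor's skin or blood cells to stem cells; efficiency varies run to run.
    sample["reprogram_efficiency"] = random.uniform(0.1, 0.9)
    return sample

def crispr_edit(sample):
    # Introduce a disease mutation; on-target and off-target behavior vary per run.
    sample["on_target_rate"] = random.uniform(0.5, 1.0)
    sample["off_target_events"] = random.randint(0, 5)
    return sample

def differentiate(sample):
    # Turn stem cells into neurons; purity of the resulting population varies.
    sample["neuron_purity"] = random.uniform(0.3, 0.95)
    return sample

sample = {"donor": "donor_001", "qc_log": []}
for step in (reprogram, crispr_edit, differentiate):
    sample = step(sample)
    # Record a QC entry after every building block so problems can be traced later.
    sample["qc_log"].append({
        "after": step.__name__,
        "edit_ok": sample.get("on_target_rate", 1.0) > 0.8,
        "purity_ok": sample.get("neuron_purity", 1.0) > 0.6,
    })

print(sample)   # every measurement plus a QC record after each step
```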
Well, and it strikes me that this bag of techniques
that you are bringing from your background,
so probabilistic modeling and machine learning,
they're the best possible contemporary set of techniques
for dealing with some of these uncertainties.
Whereas if you had to go in and like describe these systems
with a set of partial differential equations,
you'd be lost from the outset.
I completely agree.
I mean, unfortunately, our ability to describe biological systems
using rigid mathematical, deterministic mathematical tools
fails once you go beyond the atomic level.
And even there, I mean, when you think about something
that is relatively circumscribed, like a single protein folding,
you can do some of the differential equation modeling.
But even there, we've seen that techniques that take a step back and say, you know what,
let me not try and construct detailed mechanistic models, but instead let's give the machine
enough data to learn from and it'll pick up
patterns that might be useful. That's what made the AlphaFold success from DeepMind work:
they took a machine learning approach. Now the critical piece, of course, and this
comes back to our conversation a moment ago, is that they had
enough data on folded proteins to train on. And getting enough high quality data is
what it's all about in this new world of bringing machine learning into the space. And that's why
we built insitro the way we did.
Awesome. Well, we're just about out of time, and I wanted to ask you before we wrapped up, what do you do for fun?
So you have what sounds to me like an incredibly fun job, but there must be something outside of work.
Well, so first of all, I am grateful to have a job that is as much fun as this in the sense that I get to read all of the coolest papers in biology and
all of the coolest papers in automation and in machine learning and figure out how to put them
together in new ways and do it towards a goal that I think is just truly important, which is
how do we make people healthier. And I'm going to go on a soapbox for just a
moment and talk about the fact that I think part of our goal here, you know,
when we were put on this earth, was to try and leave the world a little bit better than it was
when we came into it. And we should be doing that. And for
those of us who had the privilege of being born to relatively affluent, well-educated families,
where we didn't need to struggle for where our next meal is coming from, that burden
is actually even higher. We should be thinking about how we can give back. So anyway, sorry, that was another soapbox.
No, that's so important.
But that being said, the thing that I most liked to do for fun pre-coronavirus was to travel and see parts of the world that are different from the little cocoon where we live.
I've been to 65 different countries so far.
Six different continents.
Have not yet been to Antarctica.
That's definitely on the bucket list.
And I find it to be a wonderful experience, both in visiting other cultures and seeing how different people live,
but also I love being out in nature and the outdoors and hiking and scuba diving and sailing and doing all that.
So that is the thing I used to do for fun.
I have no idea when the next time I'll be able to do that is, unfortunately, at this point in time. So the other things that I like to do are just
spending time with my family and, you know, going on local hikes in nature, which are not perhaps
as dramatic as visiting Iceland or the Great Barrier Reef or this incredible lake in Palau that has jellyfish that don't sting and you can swim in
them. But at least it's being outdoors in the fresh air. And I'm lucky enough to live in a
part of the world that has some beautiful scenery, even locally. So I go for hikes a lot these days.
Yeah. Well, hiking in Northern California is not bad at all.
Nope. Can't complain too much relative to what the situation could be.
But I do wish we could get back on a plane at some point and visit some of those amazing places elsewhere in the world.
Well, I'm hoping that probably not as soon as we want, but sooner than we would ever have been able to do at any other point in human history.
Science will be able to give us enough safety around coronavirus that hopefully you'll be able
to travel soon. I won't make any predictions about when soon is, but let's hope for soon.
Very much hope so. And I think if we do get to that point in the near term, and by near I mean within the next 12 to 18 months, I hope people will appreciate the miracle that it is and the many decades of work by so many people that needed to happen in order to make that possible.
Yeah, and I think that is the perfect place to stop. So thank you so much for being on the show
today. This was a fascinating, fun conversation, and I'm glad we got to talk to you today.
So am I. Thank you very much.
Awesome.
So that was Kevin's conversation with Daphne Koller, CEO and founder of insitro. And oh my
gosh, that was so interesting. There were so many amazing parts of that conversation. I'm not even
honestly someone who's that into biology. And there are so many things that I'm going to think
more about and that I want to kind of pull
on more strings based on that conversation. That was amazing. Yeah, I think one of the really
great things about Daphne and one of the things that has made her such a great scientist and
entrepreneur is that she thinks about everything that she does extremely deeply.
She has this wide-ranging curiosity, which I think is one of the best superpowers. You combine that with persistence, and you find yourself in all of these situations
where you are making connections across disciplines and doing a whole bunch of things that maybe you wouldn't
be able to imagine if you were a more narrowly focused person or had a more narrowly focused
set of interests. And like she said so many things in that conversation that I'm like, wow,
I really need to go think about this more deeply myself. Like just one of the casual things that she said
was this need for like maybe, you know, an equivalent of little league sports or like a
kid's Kaggle competition so that you can find the right competitive and social dynamic for kids
getting themselves onboarded into machine learning. It's a great idea.
Yeah, it really is.
Somebody needs to go do that now.
No, I'm in total agreement. Yeah, because we have Little League and we have other sorts of
competitions. And when kids get older, there are some more science type of competitions,
but to have something gamifying things when you're younger around machine learning would be brilliant.
That's a brilliant idea. And I loved, you know, kind of her origin story, you know, the fact that
she was writing her thesis on, you know, game theory and distributed systems and multi-agent
incentive systems. Like, I was just like, this is brilliant. You know, these are things that you,
that to your point, you would need
certain curiosity and just wide-ranging interest and persistence to really want to pursue.
One of the big takeaways I kind of got from this was something she said, you know, about how much
science matters. And what are your thoughts about that, especially in the moment that we're living
in right now? Well, I think the thing that she tried to draw our attention to several times is that we,
for this pandemic, and I think in general, like we're more dependent now on science to solve some
of the really big problems that we are facing as a society or some of the challenges that we have to overcome in order to live our best lives and to have the future that we all want.
And the thing to remember is none of this is sort of overnight.
Like science requires years of substantial investment in a wide variety of things that build this foundation, so that when you get to a
moment like the one that we have right now, you have all of the things that you need to go
tackle these problems. So if you don't do these long-term investments and these foundational
pieces in educating scientists and giving them the ability to go do this work that builds
this solid, solid foundation and like carries the whole field forward, you really can get yourself
into a situation where a crisis comes along and like you just don't have any way for science to
help solve it. And so I think that's the thing that we all really need to remember, and hopefully it will redouble our
resolve to go make even bigger investments in those foundations for the future.
No, I think you're 100% correct. We need to continue to make these investments. And I love
that there are people like Daphne who are taking these two different fields, you know, taking
computer science and machine learning as well as biology and bringing them together, so that hopefully we find the right problems, the ones that will
produce positive benefit for all of humanity. And, you know, to use Daphne's words,
I think we were put here to try to leave the world a little bit better than we found it.
Absolutely. Absolutely.
All right. Well, that's all for us today. Thank you again to Daphne Koller.
And we are so glad that you joined us.
We learned so much.
And we hope that all of you at home got a little nugget
to impress all of your friends at your next socially distanced gathering.
I know I definitely did.
I'm definitely going to be dropping things
like digital biology in conversation now.
And remember to reach out to us anytime
at behindthetech at microsoft.com.
Stay safe and be well.
See you next time.