a16z Podcast: Deep Learning for the Life Sciences
Episode Date: June 6, 2019, with Vijay Pande (@vijaypande) and Bharath Ramsundar. Deep learning has arrived in the life sciences: every week, it seems, a new published study comes out... with code on top. In this episode, a16z General Partner Vijay Pande and Bharath Ramsundar talk about how AI/ML is unlocking the field in a new way, in a conversation around their book, Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More (also co-authored with Peter Eastman and Patrick Walters). So -- why now? ML is old, bio is certainly old. What is it about deep learning's evolution that is allowing it to finally make a major impact in the life sciences? What is the practical toolkit you need, the right kinds of problems to attack, and the right questions to ask? How is the hacker ethos coming to the world of biology? And what might “open source biology” look like in the future?
Transcript
Hi, and welcome to the A16Z podcast. I'm Hannah. Deep learning has come to the life sciences.
Lately, it seems every week a published study comes out with code on top. In this episode,
a16z general partner on the bio fund Vijay Pande and Bharath Ramsundar talk about how AI and ML are unlocking
the field in a new way in a conversation around their recently published book, Deep Learning for
the Life Sciences, written along with co-authors Peter Eastman and Patrick Walters.
The book aims to give developers and scientists a toolkit on how to use deep learning.
for genomics, chemistry, biophysics, microscopy, medical analysis, and other areas.
So why now? What is it about ML's development that is allowing it to finally make an impact in this
field? And what is the practical toolkit, the right problems to attack, the right questions to ask?
Above and beyond that, as this deep learning toolkit becomes more and more accessible, biology is
becoming democratized through ML. So how is the hacker ethos coming to the world of biology,
and what might open source biology truly look like?
So, Bart, we spend a lot of time thinking about deep learning and life sciences.
It's a great time, I think, for people to become practitioners in this space, especially
for people who maybe have never done machine learning before, coming from the life sciences side,
or maybe people from the machine learning side to get into life sciences.
But maybe even a place to kick it off is, you know, what's special about now?
Like, why should people be thinking about this?
The challenge of programming biology has been that we don't know biology, and we make
up theoretical models and the computers are wrong and, you know, biologists and chemists
understandably get grumpy and say, why are you wasting my time?
But with machine learning, the advantage is that we can actually learn from the raw data.
And all of a sudden we have this powerful new tool there.
It can find things that we didn't know before.
And this is why now is the time to get into it, really to enable that next wave of, you know,
breakthroughs in the core science.
The part that still blows me away is just how fast this field is moving.
And it feels like it's a combination of having the open source code on places like GitHub
and arXiv.
And there's like a paper a week that's impactful when it used to be maybe a paper a quarter
or a paper a year.
And the fact that code is coming with a paper, it's just layering on top.
I mean, that seems to me to be sort of the critical thing that's different now.
Yeah.
I think when you can clone a repo off GitHub, you all of a sudden have new insights, just
because you're using a new language.
And now that thousands of people are getting into it, I think all of a sudden you'll find
lots of semi-self-taught biologists who are really starting to find new, interesting things.
And that is why it's exciting.
It's like the hacker ethos, but kind of coming into the bio world, which has typically been
much more buttoned down.
Now, I think anyone who can clone a repo can start really making a difference.
I think that's going to be where the real long-term impact arises from these types of
efforts.
You don't need a journal subscription to get arXiv or to get the code, which
alone is kind of amazing.
It wasn't that long ago where a lot of academic software was sold,
you know, and it may be sold for $500, which is very material.
That's one piece.
You connect that to the concept of now AI or ML can unlock things in biology.
Then biology is becoming democratized, is kind of sort of your point, right?
And so let's talk about that because we're still learning biology collectively.
What is it about deep learning in biology now?
Because biology is old, machine learning is old, like what's new now?
Deep learning has this question all over the place.
Why does it work now?
The first neural nets kind of popped out in the 1950s.
And I think it's really a combination of things.
I think that part of it is the hardware, really.
The hardware, the software, the growth of kind of rapid linear algebra stacks that have made it accessible.
I think also an underappreciated part of it is, you know, the growth of the cloud and the internet, really.
You know, neural nets are about as janky now as they used to be in the 80s.
The difference is that I can now pull up a blog post where someone says,
oh, these things are janky, here are the 17 things I did.
I can copy, paste that into my code, and all of a sudden, I'm a neural net expert.
It's not quite that easy.
It turns into a tradecraft, almost, that you can learn by just working through it.
That's why the deep learning toolkit has become accessible.
Then you get to biology, and the question is, why biology, why now?
And actually, I think the question's a little deeper.
I think that it's really about, I think, representation learning.
So we have now reached this point where I think we can learn representations of molecules that are useful.
This has been something that, you know, in the science of chemistry, we've been doing a long time.
There's been all sorts of, you know, hand-encoded representations of parts of molecular behavior that we think are important.
But I think now using the new technology from, you know, image processing, from word processing, we can begin to learn molecular representations.
And, you know, to be fair, I actually don't think we've really broken through there.
If you look at, you know, what's happening in images or text, they're five years ahead of us.
Well, let me break in here because just for the listeners to give a sense for why representation is important, one of my pet examples, is that if I gave anybody, like, say, two, five-digit numbers to add, it'd be trivial.
If I gave you those same five-digit numbers in Roman numerals, and you want to add them, the representation there would make this like insane.
And what would you do?
Well, you would convert into an appropriate representation where the operations are
trivial or obvious. And then the operation is done, and maybe you re-encode back to the other
representation. So this is the problem: when you have a picture, representations are
obvious because it's pixels, right? And computers love pixels. And maybe even for DNA. Like DNA is like
a one-dimensional image. And so you have bases, but those are kind of like pixels. We used to joke early
days that we would just take a photograph of a small molecule and then use all the other stuff. But
that's kind of insane too. And so with the right representation, things become
transparent and obvious; with the wrong representation, things become hard.
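The Roman-numeral example above can be made concrete with a small sketch (my own illustration, not from the book; the conversion tables are just the standard Roman numeral rules): adding in the Roman representation is miserable, so you convert to integers, do the trivial operation, and re-encode.

```python
# Standard Roman numeral tables for the illustration.
SYMBOLS = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}
VALUES = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
          (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
          (5, "V"), (4, "IV"), (1, "I")]

def roman_to_int(s):
    """Decode a Roman numeral: a smaller symbol before a larger one subtracts."""
    total = 0
    for i, ch in enumerate(s):
        v = SYMBOLS[ch]
        total += -v if i + 1 < len(s) and SYMBOLS[s[i + 1]] > v else v
    return total

def int_to_roman(n):
    """Encode an integer greedily, largest value first."""
    out = []
    for v, sym in VALUES:
        while n >= v:
            out.append(sym)
            n -= v
    return "".join(out)

# Adding directly on Roman numerals is painful; convert, add, convert back.
a, b = "MCMXCIV", "XLII"  # 1994 and 42
print(int_to_roman(roman_to_int(a) + roman_to_int(b)))  # MMXXXVI (2036)
```

All the work is in the representation change; the operation itself is a one-character `+`, which is exactly the point being made about featurization.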
This is really at the heart of machine learning.
It's that there's something about the world that I want to compute on.
But computers only accept, you know, very limited forms of input.
Zeros and ones, text strings, like simple structures.
Whereas, like, if you take a molecule,
a molecule is like a frighteningly complex entity.
So one thing that we often don't realize is that until 100 years ago,
like we barely had any idea what a molecule was.
It's this alarmingly strange concept that although we see little diagrams in 10th grade chemistry or whatever, that isn't what a molecule is.
It's a much weirder, weirder quantum object, dynamic, kind of shifting, flowing.
We barely understand it even now.
So then you just really start asking the question of, like, what is water, for example?
Is it the three characters, H2O?
Is it, you know, two hydrogens and oxygen?
Is it some quantum construct?
Is it like this dynamic vibrating thing?
Is it this bulk mass?
There's so many layers to kind of the science of it.
So what you really want to do is like you've got to pick one.
And this is where it gets really hard, right?
Like if I'm thirsty, what I care about in water is a glass of water.
If I'm trying to answer deep questions about, you know, the structure of Neptune,
I might want a slightly different representation of water.
The power of the new deep learning techniques is we don't necessarily have to pick a representation.
We don't have to say water is X or water is Y.
Instead, you say, let's do some math, and let's take that math and let the machine really learn the form of water that it needs to answer the question at hand.
So one form of mathematical construct is thinking of a molecule as a graph.
And if you do this, you can begin to do these graph deep learning algorithms that can really extract meaningful structure from the molecule itself.
We've learned finally that here's a general enough mathematical form we can use to extract meaningful insights about molecules
or these critical biological chemical entities that we can then use to answer real questions in the real world.
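To make the molecule-as-a-graph idea tangible, here is a toy sketch (my own illustration, not code from the book or any library): water as a graph of atoms and bonds, with one round of the neighbor aggregation that graph convolutions are built on. The tiny one-hot atom vocabulary is invented for the example.

```python
# Water (H2O) as a graph: atoms are nodes, bonds are edges.
water = {"atoms": ["O", "H", "H"], "bonds": [(0, 1), (0, 2)]}

# A minimal one-hot atom vocabulary, an assumption for this sketch.
VOCAB = {"H": [1, 0, 0], "C": [0, 1, 0], "O": [0, 0, 1]}

def graph_conv_step(mol):
    """One round of neighbor aggregation, the core move of a graph convolution:
    each atom's new feature vector is its own plus the sum of its neighbors'."""
    feats = [VOCAB[a][:] for a in mol["atoms"]]
    new = [f[:] for f in feats]
    for i, j in mol["bonds"]:
        for k in range(len(feats[i])):
            new[i][k] += feats[j][k]
            new[j][k] += feats[i][k]
    return new

print(graph_conv_step(water))
# [[2, 0, 1], [1, 0, 1], [1, 0, 1]] -- the oxygen row now "sees" its two hydrogens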
What I think is interesting here in particular is that so much has been developed on images
and there's a lot of biology that's images.
And so we could just spend the whole time talking about images and it could be microscopy or radiology
and tons of good stuff there.
But there's a lot of biology that's more than images.
and molecules a good example.
And for a long time, it seemed like deep learning
was being so successful in images
that that's all it really did.
And if you could sort of take your square peg
and put it in whatever holes you've got, it would work.
What you're talking about for graphs
is kind of an interesting evolution of this
because a graph in an image
are different types of representations.
But, you know, at a technical level,
convolutional networks for images
or graph convolutions for graphs
are kind of borrowing a concept
at a higher level.
The biology version of machine learning is starting to sort of grow up and starting to not just be a direct copy of what was done with images and in other areas, but now starting to be its own thing.
A five-year-old can really point out the critical points in an image, but you almost need a PhD to understand the critical points of a protein.
So you have this dual kind of weight or burden of understanding.
So it's taken a while for the biological machine learning approach to really mature.
Because we've had to spend so much time even figuring out the basics.
But now we're finally at this point where it feels like we are diverging a little bit from the core trunk of, you know, what people have done for images or text.
In another five years, I'm going to be blown away by what this thing does.
It's going to understand more deeply.
So we kind of have this sort of connection between democratization of ML, ML into biology, democratization into biology.
But I don't think we're there yet.
I think for ML, I think there really is a sense of democratization.
You could code on your phone and do some useful things
or certainly on a laptop, a cheap laptop.
I mean, but for biology, what is missing?
One is data.
And there's a fair bit of data.
In the book, we talk about the PDB, we talk about other data sets,
and there are publicly available data sets.
But somehow that doesn't get you into the big leagues.
So like if in this vision of democratizing biology,
what's left to be done?
In some ways, the democratization of ML is a teeny bit of an illusion even.
It's because the core constructs were mathematically invented,
that there is this convolutional neural net or its cousins, the LSTM,
or the other forms of core mathematical breakthroughs that have been designed,
that you can take these building blocks and just apply them straight out.
In biology, as you pointed out earlier, I think we don't have those core building blocks just yet.
We don't know what the Lego pieces are that would enable a newcomer to really start to do, you know, breakthrough work.
We're closer than we were.
I think we've had the beginnings of a toolbox, but we're not there yet.
Well, let's think about what happened on the ML side as inspiration for the bio side.
How much is it driven through academia, how much driven through companies?
Because what I'm getting at is that there's a lot of bio in academia.
I don't know if we're seeing that being open sourced by companies.
We're getting to this really weird set of influences where,
in order for companies to gain influence, they need to open source.
So this is why, you know, 10 years ago, I can't imagine that Google would have open sourced
TensorFlow.
It would have been core proprietary technology.
But now they know that if they don't do that, developers will shift to some other platform.
Right.
By some other company.
Exactly.
So it's weird that the competitive market forces are driving democratization.
Well, so PyTorch is basically sort of Facebook-based, right, and TensorFlow is from Google.
So let's say Google kept TensorFlow proprietary, what would be so bad for them if they did that?
What if everybody outside used PyTorch?
This actually, I think there's like a really neat kind of analogy to the financial sector.
A lot of financial banks have masses of functional programs that they keep under the hood, under the covers.
If you look at Jane Street or, I believe, Standard Chartered or a few of these other big institutions,
lots and lots of functional code hiding behind those walls.
But that really hasn't really infiltrated further out.
And this actually, I think, in the long run, weakens them because it's harder to train.
It's harder to find new talent.
It's more specialized.
A lot of the codebase at Google is proprietary; like, the original MapReduce was never put out there.
And this, I think, has actually caused them a little bit of a problem in that new developers coming in have to spend months and months and months getting up to speed of the Google stack.
Whereas, if you look at TensorFlow, it doesn't take any time at all.
someone could walk in and basically be able to write.
They've been using it for months to years.
Exactly.
And I think at the scale that big tech is at, this is just like, it's a powerful market
advantage.
You're almost outsourcing their education process.
Yeah, and I guess if they don't put it out, someone else will, and then they'll learn
on their platform.
Yes, but then maybe what is the missing part in biology?
We've got pharma, you know, a huge force there, but they have very specific goals.
A lot of agricultural companies, but it's much more distant, though.
Yeah.
It's dramatically hard to actually take an existing organization and turn it into an AI
machine learning organization.
And so one thing I've honestly been surprised by is that, you know, when I've seen companies
or organizations I know try to incorporate AI into their drug discovery process, it ends
up taking them years and years and years because they're fighting all these upstream battles,
to get their, you know, old computing system to upgrade to the right version of, you know,
their numerical library so they could even install TensorFlow. And then they had all these things
about who can actually, say, upgrade the core software, whether it's this department,
like how much do you need to talk to the biologists, to the chemists? The fact is that
pharma and, you know, existing big companies are not built this way. That's not their core expertise.
Whereas if you look at, you know, Facebook or Google,
they've been doing machine learning for almost two decades now,
like from the first AdWords model.
So in some sense, like they had to change very little about their culture.
Like, you know, there's a slight difference instead of like this function, use that function, but whatever.
But the core culture was there.
Exactly.
And I think the culture, the people, changing that is going to be dramatically hard.
So which is why I think it will really take, I think, 10 years and a generation of students who have been trained in the new way to come in and shift.
Yeah, well, Google was a startup too, right?
I think, you know, the thesis was that and is that startups will be able to build a new culture.
And I think the key thing that I think we're seeing sort of boots on the ground is that
that culture has to be not, here's your data scientists or machine learning people in one room
and your biologists in another room; they have to be the same team.
What's intriguing to me is just the size of the bio market.
Biology is health care, it's agriculture, it's food, it could be the future of manufacturing.
There are so many different places that biology plays a role to date and will play a role.
But it just means that I think to the point we're talking about,
these companies just are being built right now.
There's, I think, this whole host of challenges here because biology is hard.
And building kind of like that selective understanding of like, you know,
of the 10 best practices that existed, five are actually still best practices.
The other five, we need to toss out the window and stick in a deep learning model.
That kind of very painstaking process of experimentation and understanding, that I think is like where the really hard innovation is happening.
And that's going to take time.
You're never going to be able to replace like a world-class biologist with any machine learning program.
A world-class biologist is typically freaking brilliant.
And they often bring a set of understanding that no programmer or no computer scientists can.
Now, the flip side holds true.
And I think that merger, as you said, that's where, like, there's power for magic, right?
One really interesting factoid I heard from an entrepreneur in the space is that, you know, the best biologists that they could hire had a market rate that was lower than an introductory, intermediate, you know, front-end dev.
Yep.
And, you know, of course, front-end is very hard engineering.
I don't want to put that down.
But there are so many fewer of these biologists, so there's almost this market imbalance of how is it possible that, you know, you can take really a world-class
biologists of whom there's maybe a couple hundred in the world and not have them be valued
properly by the market.
So do you even out those pay scales in one company?
Do you like have two awkward pay ladders that coexist and create tension in your company?
These are the types of, like, really hard operational questions that almost have nothing to do
with the science, but at the heart of it they do.
Maybe it's interesting to talk about like how we can help people get there.
Yeah.
So what's like the training they should be doing?
Maybe we could even go like super nuts and bolts.
So I got my laptop.
What do I do?
So, I mean, like, I guess there's a couple key packages we install.
It's like TensorFlow, maybe DeepChem, something like that.
Python is often already installed, let's say, on a Mac.
Is that it?
And then we start going through papers and books and code?
I think the first place really is to...
You need to form an understanding of, like, what are the problems even that you can think
about?
I think if you're not trained as a biologist, and even if you are, you might not see that
that intersection of these are the problems where biological machine learning can or cannot
work. And that I think is really what the book tries to teach you, as in like, what's the
frame of thinking? What's the lens through which you look at this world and say, oh, that is data
coming out of a microscope, I should spend 30 minutes spinning up a convnet and iterating on that.
This is a really gnarly thing about how I prepare my, like, you know, C. elegans samples.
I don't think deep learning is going to help me here. And I think it's really
that blend of knowledge that the book tries to give you. It's like a guidebook. When you see a new
problem, you ask, is this a machine learning problem? If so, let me use these muscles. If it's not
a machine learning problem, well, I know that I need to talk to someone who does know these things.
And that's what we try to give. Andrew Ng has a great rule of thumb. If a human can do it in a
second, deep learning can probably figure it out. So start with something like, say, microscopy.
You have an image coming in, and an expert can probably eyeball and say, interesting, not interesting.
So there's this binary choice, and there's some arcane black box that was trained within the expert's head and experience.
That's actually the sort of thing machine learning is, like, made to solve.
So really ask yourself, like, when you see something like that, is there some type of perceptual input coming in?
Image, sound, text, and increasingly molecules, a weird new form of perception, almost
magnetic or quantum. But you have perceptual input coming in. And is there a simple right-wrong,
left-right, you know, intensity-type answer that you want from it? If you do, that's really
a machine learning problem at its heart. So that's one type of machine learning. And I think the
benefit there of what a human can do in a second, deep learning can do, especially since
in principle on the cloud you could spin up 10,000 servers. Suddenly you've got
10,000 people working to solve a problem, and then they go back to something
else. That's just something you can't do with people. Or you've got 10,000 people working 24-7
as necessary. Can't do that with people. But there's another type of machine learning, which is
to do things people can't. Or maybe more specific, do things individual people can't, but maybe
crowds could. So like we see this in radiology, right, where the machine learning can have
accuracies greater than an individual akin to what, let's say, the consensus would be, which
would be the gold standard. That's maybe the real exciting part, sort of the so-called superhuman
intelligence. Where are the boundaries of possibilities there?
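The "perceptual input in, simple binary answer out" framing above can be sketched with an entirely synthetic toy: fake "microscopy" images where the interesting ones contain a bright blob, and the simplest possible learned decision rule, a threshold on mean intensity, standing in for what a convnet would learn. Every name and number here is invented for illustration.

```python
import random
random.seed(0)

# Toy "microscopy" images: 8x8 grids of intensities. "Interesting" ones contain
# a bright 2x2 blob; the expert's one-second judgment is just a labeled example.
def make_image(interesting):
    img = [[random.random() * 0.4 for _ in range(8)] for _ in range(8)]
    if interesting:
        r, c = random.randint(0, 6), random.randint(0, 6)
        for dr in range(2):
            for dc in range(2):
                img[r + dr][c + dc] = 0.9
    return img

mean = lambda img: sum(sum(row) for row in img) / 64
train = [(make_image(y), y) for y in [0, 1] * 200]

# The simplest possible learned rule: a threshold on mean intensity halfway
# between the class means (a crude stand-in for what a convnet would learn).
m0 = sum(mean(i) for i, y in train if y == 0) / 200
m1 = sum(mean(i) for i, y in train if y == 1) / 200
threshold = (m0 + m1) / 2

test = [(make_image(y), y) for y in [0, 1] * 100]
acc = sum((mean(i) > threshold) == bool(y) for i, y in test) / len(test)
print(round(acc, 2))
```

The point is the shape of the problem, perceptual input and a binary label learned from the expert's snap judgments, not the particular rule; a real model would learn far richer features than a global mean.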
One of the biggest problems really with deep learning is that you have some like strange and
crazy prediction. Now, I think that there's a fallacy that people fall into of trusting the
machine too easily because 90% of the time that's going to be garbage. And I think really
kind of the challenge of picking out these bits of superhuman insight is to know how to shave
off the trash predictions.
Yeah.
Is 90% an exaggeration, or is it really 90%?
I like nice round numbers, so that might have just been something I picked out.
But there's like this great example, I think, in medicine.
So there's scans coming in, and the deep learning algorithm was doing, like, amazing at predicting
it.
And then, like, they dug into it, and it turned out that the scans came from three centers.
One of them had, like, some type of center label that was, like, the trauma center
or something. The others were the non-trauma centers.
The deep learning algorithm, like a kindergartner told to do this,
learned to identify the trauma flag and upweight those.
So if you did this, like, naive statistics of blending them all together,
you'd look amazing.
But really, it's looking for a sticker.
Yeah.
I mean, there's tons of examples like that,
one with the pathologist with the ruler in there,
and it's becoming a ruler detector and so on.
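The trauma-center and ruler stories are instances of shortcut learning, and the failure mode is easy to reproduce with synthetic data (a toy of my own, not the actual studies): give the training set a "sticker" feature that perfectly tracks the label, and a model that reads only the sticker aces training while collapsing to a coin flip on data from a hospital without the shortcut.

```python
import random
random.seed(0)

# Synthetic "scans": (real_signal, sticker, label). In the training hospital,
# a trauma-center sticker perfectly tracks the label; the real signal is only
# 70% informative.
def make_scan(sticker_tracks_label):
    label = random.randint(0, 1)
    signal = label if random.random() < 0.7 else 1 - label
    sticker = label if sticker_tracks_label else random.randint(0, 1)
    return signal, sticker, label

train = [make_scan(True) for _ in range(10_000)]
test = [make_scan(False) for _ in range(10_000)]  # new hospital: no shortcut

acc = lambda data, model: sum(model(s, st) == y for s, st, y in data) / len(data)
sticker_model = lambda signal, sticker: sticker  # the lazy shortcut learner
signal_model = lambda signal, sticker: signal    # the honest learner

print(acc(train, sticker_model))  # 1.0 -- suspiciously perfect
print(acc(test, sticker_model))   # ~0.5 -- collapses to a coin flip
print(acc(test, signal_model))    # ~0.7 -- modest, but it transfers
```

The suspiciously perfect training score is exactly the "too good to be true" signal discussed below: held-out data from a different site is what exposes the sticker.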
Like, you know, this AUC, like a sense of accuracy of close to 1.0.
We all got to be very suspicious of that
because just running a second experiment wouldn't predict
the first experiment with that type of accuracy.
Anything that's too good to be true probably is.
Yeah, I think then you get into the really subtle challenges,
which is that, you know, the algorithm tells me this molecule
should be non-toxic to a human
and should have effect on this, you know, indication.
Do I trust it?
Is it possible that there's a false pattern learned there?
Humans make these types of mistakes all the time, right?
Like, if you have any type of, like, actual biotech,
you know that there's going to be molecules made
or theses that are disproven.
So you're getting into the hard core of learning, which is that, is this real?
The reality is we don't have answers to these.
We're really kind of trending into the edge of machine learning today, which is that,
is this a causal mechanism?
Does A cause B?
Is it a spurious correlation?
And now we're getting to that place where humans aren't necessarily better.
We talk about some techniques for interpreting,
for looking at kind of what informed the decision of the deep learning algorithm.
And we do provide a few kind of tips and tricks
to start thinking about it.
But the reality is that's kind of the hard part
of machine learning. It's the edge.
The interpreting chapter is one of my favorite ones
because it's often sort of become
so-called common wisdom that machine learning
is a black box. But in fact, it doesn't
have to be and there's lots of things to do and we
are quite prescriptive there. So the
interpretability I think also is frankly
what's going to make human beings
more at peace with this. And this isn't
anything unique to machine learning. Like if you had
some guru who's like just spouting
off stuff and said, you know,
buy this stock X and short stock Y and put all your life savings into it, you probably would
be thinking, okay, well, maybe, but why? So I think this is just human nature and there's no
reason why our interaction with machines would be any different. What I think is interesting
is human beings are notoriously bad at causality. Like we kind of attribute things to be causal
when they're not causal at all. We do that in our lives from like, you know, why did that person
give me that cup of coffee to why did that drug fail? Like all these different reasons.
There's two big misconceptions about machine learning.
One is lack of interpretability.
The second one is correlation doesn't mean causation, which is true, but somehow people
take that to mean it's impossible to compute causality.
And that's the part that I think people have to really be educated on because there are now
numerous theories of causality, and you could use probabilistic graphical models, PGMs.
There's lots of ways to go after causality.
The whole trick, though, is you need time series data.
What's beautiful about biology, or at least in health care, is that we've got time series
data in many cases. So now perhaps finally there's the ability to really understand causality
in a way that human beings couldn't because we're so bad at it and machines are good at
it and we've got the data. Can you think of like a place where, you know, in your experience
the algorithms have succeeded in teasing out a causal structure that people missed? Yeah. So, you know,
I think in healthcare, we always think about what is leading to various changes like, you know,
this drug having adverse effects, this diet having, possibly,
positive or negative effects; all of these things are being understood, in the sort of the category
of real world evidence, which is a big deal in pharma these days.
And if you think about it, a clinical trial is really a poor man's surrogate for not
understanding causality.
Because if we don't understand causality, you've got to do this thing where it's double
blind, we start from scratch, and I'm following it in time, and we see it.
If you understood causality, you might be able to just get a lot of results from just mining
the data itself.
As a great example, you can't do clinical trials for all pairs of drugs.
I mean, just doing for a single drug is ridiculously expensive and important,
but all pairs of drugs would never happen.
But people take pairs of drugs all the time.
And so finding their adverse effects from real-world data is probably the only way to do it.
And we can actually get causality.
There's tons of interesting journal medicine papers saying, aha, we found this from doing data analyses.
I think that's just starting out.
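The trial-versus-mining-the-data point can be illustrated with a simulated confounder (a toy sketch of my own, not a real analysis): disease severity drives both who takes a drug and the outcome, so a naive comparison makes a harmless drug look harmful, while stratifying on the confounder makes the spurious effect vanish.

```python
import random
random.seed(0)

# Simulated observational data: disease severity (the confounder) drives both
# who takes the drug and the outcome; the drug itself has NO effect.
rows = []
for _ in range(50_000):
    severity = random.random()
    drug = 1 if random.random() < severity else 0   # sicker patients get the drug
    outcome = severity + random.gauss(0, 0.1)       # higher number = worse outcome
    rows.append((severity, drug, outcome))

def effect(subset):
    treated = [o for s, d, o in subset if d == 1]
    control = [o for s, d, o in subset if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Naive mining: drug takers do worse, so the drug "looks" harmful.
naive = effect(rows)

# Stratify on the confounder: within severity bands, the effect vanishes.
bands = [effect([r for r in rows if b / 10 <= r[0] < (b + 1) / 10])
         for b in range(10)]
adjusted = sum(bands) / len(bands)
print(round(naive, 2), round(adjusted, 2))
```

A randomized trial sidesteps this by breaking the severity-to-drug arrow by design; causal methods on observational data instead have to measure and adjust for the confounder, as the stratification does here.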
Honestly, I think that AI, bio-AI drug discovery, needs to take a page
from the self-driving car companies. In the neighboring self-driving car world,
simulators are all the rage, really because it's that same notion of
causality, almost like there's a structure to the world: pedestrians walk out,
chickens, alligators, whatever crazy things. I saw a picture of this; it happens.
Yeah, it happens. So I think there they've built this amazing infrastructure of being
able to run these repeated experiments, almost randomized clinical trials, but informed
by real data. We don't yet have that infrastructure in bio-world. And I know there's a couple of
exciting startups who are starting to kind of move towards that direction, but I think it's when
we can really probe the causality at scale. And then in addition to just probing it, when the
simulator is wrong, use the new data point that came in and have the simulator, you know,
learn to fix itself. That's when you get to this really amazing feedback loop that could really
revolutionize biology. So, you know, we talked about some basic nuts and bolts
about how to get started and the framing of questions,
which is a key part.
So let's say people, they're set up,
they've got their question, where do they go from there?
I mean, in a sense, we're talking about something
closer to open source biology.
And to the extent that biology is programmable,
and synthetic biology is, I think, very much,
you know, it's been around for a while,
but I think it's really starting to crest.
How do these pieces come together
such that we could finally get to this sort of open source
biology, democratization of biology?
The big part of this is really,
the growth of community.
There are people behind kind of all these, you know, GitHub pages that you see.
There are real, decentralized, powerful organizations: if you look at the Linux Foundation,
if you look at, say, the Bitcoin Core Foundation, there are networks of open source
contributors, really, that form this like brain trust.
It's very diffuse.
It's not centralized in the Stanford, Harvard Med Department or whatever.
And I think what we're going to see is the advent of similar decentralized brain trust
in the bio world,
as in a network of experts who are kind of spread across the world
and who kind of contribute through these code patches.
And that, I think, is not at all new to the software world.
We've seen that for decades.
It's totally new to biology.
It's alien.
You'd be surprised how much skepticism there can be at the idea
that a non-Harvard-trained, say, biologist can come up with a deep insight.
We know that to be a fact, right?
There are multiple PhDs' worth of work in just, like, the Linux kernel,
and that community really doesn't care to get that stamp of approval.
So I think we're going to see the similar parallel kind of knowledge base that grows organically.
But it takes time because you're talking about the building of another kind of almost educational structure,
which is this new and exciting direction.
Here's the challenge I worry about the most, which is that, like, if you're building a Linux kernel,
you can test whether it works or doesn't work relatively easily.
Even as it is, there's this huge reproducibility crisis in biology.
So how does one sort of become immune from that or at least not tainted by that?
How do you know what to trust?
And this is a really, really interesting question.
And this is kind of shading a little bit into the crypto world, right?
You could think about an experiment where you have a molecule and you don't know what's going to happen to it, but maybe you create a prediction market about the future of this molecule.
You could then begin to create these historical records of predictions.
We all know there are expert drug pickers at big pharma who can eyeball a molecule and say, that one is going through, that one is failing.
And five years later you're like, well, shit, okay, yes, I was right.
There are the beginnings of infrastructure for these feedback mechanisms, but it's a really hard problem.
Yeah.
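The feedback mechanism described here, recording probability forecasts and checking them against outcomes once they resolve, can be sketched in a few lines. A standard way to score such forecasts is the Brier score; all forecaster names and numbers below are made up for illustration:

```python
# Toy sketch: track probability forecasts for drug candidates and score
# forecasters with the Brier score once outcomes are known.
# Lower score = better-calibrated predictions. All data here is hypothetical.

def brier_score(forecasts):
    """Mean squared error between predicted probability and outcome (0 or 1)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (predicted probability of success, actual outcome) pairs per forecaster
records = {
    "forecaster_a": [(0.9, 1), (0.2, 0), (0.7, 1)],  # confident and mostly right
    "forecaster_b": [(0.5, 1), (0.5, 0), (0.5, 1)],  # hedges everything at 50/50
}

for name, forecasts in records.items():
    print(name, round(brier_score(forecasts), 3))
```

Over time, such a ledger of scored predictions is exactly the "historical record" that would let a decentralized community decide whose calls to trust.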
I'm trying to think, though, what that would be.
The cute thing is, you could imagine a simple question like: is this drug soluble?
Someone might run a cheap software calculation.
Someone might do the experiment.
And there are different levels of cost for different levels of certainty.
You're essentially describing a decentralized research facility.
Maybe the question is who would use it?
This is, I think, the really hard part, because biopharma tends to be, for good reasons, a little more risk-averse than many other industries.
But I actually think that in the long run, this could be really interesting.
Because if you have multiple assets in a company, you could unbundle the assets.
Then you could start to get a much more granular understanding of which assets actually do well and which don't.
And if you make it okay for people to place a bet on these assets, all of a sudden it's de-risked, because if you're a big pharma and you say, I don't really believe that Alzheimer's molecule does what is claimed, but I'll put 15% odds on it going through, then I'll just invest 15% of what I would have in another world.
Well, the trick, especially since what we're talking about now is also the world of financial instruments, is that you have to know how to price the risk of an asset.
And so it could be that one of the first interesting applications of deep learning and machine learning is to use all the available data to give a maximum likelihood estimate of what we think this asset is going to be worth.
The ML prices the asset, and then people can go from there.
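The pricing idea in this exchange, scale your investment by an estimated probability of success (whether that probability comes from an expert or an ML model), reduces to a one-line expected-value calculation. The 15% figure comes from the conversation; the function name and dollar amount are invented for illustration:

```python
# Toy sketch of probability-weighted bet sizing: scale an investment by an
# estimated probability of success. Numbers are illustrative only.

def risk_adjusted_investment(full_investment, p_success):
    """Invest in proportion to the estimated odds the asset pays off."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be a probability in [0, 1]")
    return full_investment * p_success

# "15% odds that molecule goes through" -> commit 15% of the full bet
print(risk_adjusted_investment(100_000_000, 0.15))
```

A real pricing model would of course also account for payoff size, time horizon, and correlation across assets; this only captures the proportional-odds intuition from the conversation.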
It's kind of a fun world where the financial world, the machine learning world, and biology come together to decentralize and democratize it.
I think there are opportunities to allow for more risk, for the long tail to be played out.
You wouldn't have as many interesting hypotheses dead in the water because they weren't de-risked enough for a big bet.
So the big takeaway for me here is that there is that possible world.
I forget if this is the way you learned how to program, but the way many of us did, I learned when I was like 11 on, actually, a TI-99/4A,
and I was just playing around with it,
and I learned so much because I could just get my hands right in it.
And I think my hope for the book is that it's the equivalent in biology,
that people can get their hands in it.
I don't know where they're going to go with it.
It would be super cool if they go where we're describing,
and that's one of many possible futures.
But I think that's what we're hopefully able to give people:
we are opening up the sandbox. Here's what we've learned
in these very exclusive academic institutions.
Let's throw the gates open and say, here's as much as we know, distilled down as best we can,
and do what you will with it.
Open source means no permission needed, so go to town and hopefully do something good for the world. That's kind of the dream.
That sounds fantastic.
Well, thank you so much for joining us.
Thank you for having me.