Microsoft Research Podcast - NeurIPS 2024: AI for Science with Chris Bishop
Episode Date: December 13, 2024
In this special edition of the podcast, Technical Fellow and Microsoft Research AI for Science Director Chris Bishop joins guest host Eliza Strickland in the Microsoft Booth at the 38th annual Conference on Neural Information Processing Systems (NeurIPS) in Vancouver, British Columbia, to talk about deep learning's potential to improve the speed and scale at which scientific advancements can be made.
Transcript
Welcome to the Microsoft Research Podcast,
where Microsoft's leading researchers
bring you to the cutting edge.
This series of conversations showcases
the technical advances being pursued at Microsoft,
through the insights and experiences of the people driving them.
I'm Eliza Strickland, a senior editor at IEEE Spectrum,
and your guest host for a special edition of the podcast.
Joining me today in the Microsoft booth at the 38th Annual Conference on Neural Information Processing Systems, or NeurIPS, is Chris Bishop.
Chris is a Microsoft Technical Fellow and the Director of Microsoft Research AI for
Science.
Chris is with me for one of our two on-site conversations that we're having here at the
conference.
Chris, welcome to the podcast.
Thanks, Eliza, really great to join you.
How did your long career in machine learning
lead you to this focus on AI for science?
And were there any pivotal moments
when you started to think that,
hey, this deep learning thing,
it's going to change the way
that scientific discovery happens?
Oh, that's such a great question.
I think this is like my career coming full circle, really.
I started out studying physics at Oxford,
and then I did a PhD in quantum field theory.
And then I moved into the fusion program.
I wanted to do something of practical value,
so I worked on nuclear fusion for about seven or eight years
doing theoretical physics.
And then that was about the time that Geoff Hinton published
his back-prop paper, and it really caught my imagination
as an exciting approach to artificial intelligence
that might actually yield some progress. So that was kind of 35 years ago and I moved into the field
of machine learning and actually the way I made that transition was by applying neural networks to
fusion. I was working at the JET experiment, which was the world's largest fusion experiment, sort of big data in its day, and so I had to, first of all, teach myself to program
because I was a pencil and paper theoretician up to that point,
persuade my boss to buy me a workstation,
and then started to play with these neural nets.
So right from the get-go, I was applying machine learning
35 years ago to data from science experiments.
And that was a great on-ramp for me.
And then eventually I just got so distracted,
I decided I wanted to build my career in machine learning,
spent a few years as a research professor,
and then joined Microsoft 27 years ago
when Microsoft opened its first research lab
outside the US in Cambridge, UK,
and I've been there very happily ever since.
Went on to become lab director,
but about three or four years ago, I realized that not only was deep learning transforming
so many different things, but I felt it was especially relevant to scientific discovery.
And so I had an opportunity to pitch to our chief technology officer to go start a new
team.
And he was very excited by this.
So just over two and a half years ago, we set up Microsoft Research AI for Science.
And it's a global team,
and it sort of does what it says on the tin.
So you've said that AI could usher in a fifth paradigm
of scientific discovery,
which builds upon the ideas of Turing Award winner Jim Gray,
who described four stages in the evolution of science.
Can you briefly explain the four prior paradigms and then
tell us about what makes this stage different? Yeah, sure. So it was a nice insight by Jim. He
said, well, of course, the first paradigm of scientific discovery was really the empirical
one. I tend to think of some cave dweller picking up a big rock and a small rock and letting go
of them at the same time and thinking the big rock will hit the ground first and discovering
they land together. And this is interesting. They've discovered a sort of pattern of regularity in
nature. And even today, the first paradigm is in a sense the prime paradigm. It's the most important
one because at the end of the day, it's experimental results that determine the truth, if you like. So
that's the first paradigm and it continues to be of critical importance today. And then the second paradigm really emerged
in the 17th century, when Newton discovered
the laws of motion and the law of gravity.
And not only did he discover the equations,
but this sort of remarkable fact that nature
can even be described by equations, right?
It's not obvious that this would be true,
but it turns out that the world around us
can be described by very simple equations
that you can write on a T-shirt. And so in the 19th century, James Maxwell discovered some simple equations that describe the
whole of electricity and magnetism, electromagnetic waves and so on. And then very importantly,
the beginning of the 20th century, we had this remarkable breakthrough in quantum physics.
So again, down at the molecular, the atomic level, the world is described with exquisite precision by Schrodinger's equation.
And so this was the second paradigm, the theoretical one: the world is described with incredible precision, over huge ranges of length and time, by very simple equations. But of course, there's a catch, which is that those equations are very hard to solve. And so the third paradigm really began, I guess, in the 1950s and '60s with the development of digital computers.
And actually, the very first use of digital computers was to simulate physics.
And it's been at the core of digital computing right up to the present day.
And so what you're doing there is using a computer, with a numerical algorithm, to solve those very simple equations, but solve them in a practical setting.
And so I'll refer to that as simulation. That's the third paradigm. And that's proven to
be tremendously powerful. If you look up the weather forecast on your phone today,
it's done by numerical weather forecasting, solving, in those cases, Navier-Stokes
equations using big numerical simulators. What Jim Gray observed though, really
emerging at the beginning of the 21st century,
was what he called the fourth paradigm, or data-intensive scientific discovery.
So this is the era of big data. Think of particle physics at the CERN accelerator, for example,
generating colossal amounts of data in real time.
And that data can then be processed and filtered. We can do statistics on it. But of course,
we're going to do machine learning on that data. And so machine learning feeds off large data.
And so the fourth paradigm really is dominated today by machine learning. And again, that remains
tremendously important. What I noticed, though, is that there's, again, another framework. We call
it the fifth paradigm. Again, it goes back to those fundamental equations, but again it's driven by computation, and it's the idea that we can train machine learning systems not using the empirical data of the fourth paradigm, but instead using the results of simulation, so the output of the third paradigm. So think of it this way. You want to predict the property of some molecule, let's say.
You could, in principle, solve Schrodinger's equation on a digital computer.
It'd be very expensive.
And let's say you want to screen hundreds of millions of molecules.
That's going to get far too costly.
So instead what you can do is have a mindset shift.
You can think of that simulator not as a tool to predict the molecule's properties
directly, but instead as a way of generating synthetic training data. And then you use that
training data to train a deep learning system to give what I like to call an emulator,
an emulator of the simulator. Once it's trained, that emulator is fast. It's usually three to four
orders of magnitude faster than the simulator.
So if you're going to do something over and over again, that three to four order of magnitude
acceleration is tremendously disruptive.
And what's really interesting is we see that fifth paradigm occur in many, many different
places.
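As a rough illustration of the "emulator of the simulator" idea described here, the following is a minimal Python sketch. The run_simulator function is a hypothetical stand-in for an expensive physics code, and the small network is purely illustrative, not anything the team actually uses.

```python
# Minimal sketch: train a fast "emulator" on data generated by a slow simulator.
# run_simulator() stands in for an expensive physics code (hypothetical).
import numpy as np
from sklearn.neural_network import MLPRegressor

def run_simulator(x: np.ndarray) -> np.ndarray:
    # Placeholder for the expensive third-paradigm simulation
    # (here just a cheap nonlinear function for illustration).
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

# 1. Use the simulator (offline, expensive) to generate synthetic training data.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(5000, 2))
y_train = run_simulator(X_train)

# 2. Train the emulator: a small neural network that learns the simulator's behaviour.
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
emulator.fit(X_train, y_train)

# 3. At deployment time the emulator replaces the simulator and is orders of
#    magnitude cheaper to evaluate, e.g. when screening many candidate inputs.
X_query = rng.uniform(-1, 1, size=(10, 2))
print(emulator.predict(X_query))
```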
The idea goes back a long way.
Actually the last project that I worked on before I left the fusion program
was to do what was the world's first ever real-time control of a tokamak fusion plasma
using a neural net. And the computers of the day with the processors were just far too slow,
long before GPUs and so on. And so it wasn't possible to solve the equations, in that case what's called the Grad-Shafranov equation. Again, a simple differential equation you could write on a T-shirt, but solving it was
expensive on a computer.
We were about a million times too slow to solve it directly in real time.
And so instead, we generated lots and lots of solutions.
We used those solutions to train a very simple neural network, not a deep network, just a
simple two-layer network back in the day. And then we implemented that in special hardware and did real-time feedback control. So
that was an example of the fifth paradigm from a quarter of a century ago. But of course,
deep learning just tremendously expands the range of applicability. So today we're using
the fifth paradigm in many, many different scenarios. And time and time again, we see these four orders of magnitude acceleration.
So I think it's worthy of thinking of that as a new paradigm because it's so pervasive and so ubiquitous.
So how do you identify fields of science and particular problems that are amenable to this kind of AI assistance?
Is it all about availability of data or the need for that kind of speed up?
So there are lots of factors that go into this. And when I think about AI for science,
actually the space of opportunity is colossal because science is really just understanding
more about the world around us. And so the range of possibilities is daunting, really.
So in choosing what to work on, I think there are several factors. Yes, of
course, data is important, but very interestingly, we can use experimental
data or we can generate synthetic data by running simulators. So we're a big fan
of the fifth paradigm. But I think another factor, and this is particularly
at Microsoft, is thinking about how can we have real-world impact at scale?
Because that's our job, is to make the world a better place and to do so at a
planetary scale. And so we've settled, for the most part, on working at the molecular level. If you think about it, the number of different ways of combining atoms together to make new stable configurations of atoms is gargantuan. I mean, the number of just small molecules, small organic molecules, that are potential drug candidates is about 10 to the power 60. It's about the same as the number of atoms in the solar system. The number of proteins may be the fourth power of the number of atoms in the universe, or something crazy. So you've got this
gargantuan space to search and within that space for sure there'll be all
sorts of interesting molecules, materials, new drugs, new therapies, new materials for carbon capture, new kinds of batteries, new photovoltaics.
The list is endless because everything around us is made of atoms, including our own bodies.
So the potential just in the molecular space is gargantuan.
So that's why we focus there.
It's a big focus.
It's a broad focus.
So let's take one of these sort of case studies then.
In a project on drug discovery,
you worked with the Global Health Drug Discovery Institute on molecules that would interact with tuberculosis and coronaviruses, I think.
And you found, I think, candidate molecules in five months instead of several years.
Can you talk about what models you used in this work
and how they
helped you get this vastly sped up process?
Sure, yes. We're very proud of this project. We're working with the Gates Foundation
and the Global Health Drug Discovery Institute to look particularly at diseases that
affect low-income countries like tuberculosis. And in terms of the models we use, I think
we're all familiar with the large language model. We train it on a sequence of words or a sequence of word tokens, and it's trained to predict the next token.
We can do a similar thing, but instead of learning the language of humans, we can learn the language of nature.
So in particular, what we're looking for here is a small organic molecule that we could synthesize in a laboratory
that will bind with a particular target protein.
It's called ClpP.
And by interfering with that protein, we can arrest the process of tuberculosis.
So the goal is to search that space of 10 to the 60 molecules
and find a new one that has the right properties.
Now, the way we do this is to train something that's essentially a transformer.
So it looks like a language model.
But the language it's trained on is a thing called SMILES strings.
It's an idea that's been around in chemistry for a long time.
It's just a way of taking a three-dimensional molecule and representing it as a one-dimensional
sequence of characters.
So this is perfect for feeding into a language model.
So we take a transformer and we train it on a large database of small organic molecules
that are sort of typical of the
kinds of things you might see in the space of drug molecules.
Once that's been trained, we can now run it generatively and it will output new molecules.
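For readers unfamiliar with SMILES, here is a minimal sketch of the representation being described, using the open-source RDKit library; the molecule and the character-level tokenization are purely illustrative.

```python
# Minimal sketch of the SMILES idea: a molecule as a one-dimensional string of
# characters, which is what makes it suitable input for a language model.
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"        # aspirin, written as a SMILES string
mol = Chem.MolFromSmiles(smiles)        # parse into an RDKit molecule object
canonical = Chem.MolToSmiles(mol)       # write it back out in canonical form
print(canonical)

# A language model sees the string simply as a sequence of tokens, e.g. characters:
tokens = list(canonical)
vocab = sorted(set(tokens))
token_ids = [vocab.index(t) for t in tokens]
print(token_ids)  # the kind of integer sequence an autoregressive model is trained on
```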
Now we don't just want to generate molecules at random because that doesn't help.
We want to generate molecules that bind to this particular binding site on this particular
protein.
So the next step is we have to tell the model about the protein and the protein binding
site.
And we do that by giving it information about the protein. Well, we do tell it about the whole protein, but we especially give it information about the three-dimensional geometry of the binding site.
So we tell it about the locations of the atoms in the binding site.
And we do this in a way that satisfies certain physics constraints,
sort of equivariance properties, it's called.
So if you think about a molecule, if I rotate the molecule in space,
the positions of all the atoms change in a complicated way. But it's the same molecule; it has the same energy and other properties and so on.
So we need the right kind of representation.
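A small illustration of the point about rotations: the sketch below, with made-up coordinates, shows that rotating a molecule changes every atomic position while a rotation-invariant description, such as the set of pairwise distances, stays the same.

```python
# Rotating a molecule moves every atom's coordinates, but quantities like
# pairwise distances (and hence the energy they determine) are unchanged.
import numpy as np
from scipy.spatial.transform import Rotation

coords = np.array([[0.0, 0.0, 0.0],     # toy 3-atom "molecule"
                   [1.1, 0.0, 0.0],
                   [0.0, 1.5, 0.2]])

R = Rotation.from_euler("xyz", [30, 45, 60], degrees=True).as_matrix()
rotated = coords @ R.T                  # same molecule, new atomic positions

def pairwise_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

# Positions change, but the rotation-invariant description does not.
print(np.allclose(pairwise_distances(coords), pairwise_distances(rotated)))  # True
```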
That's then fed into this transformer using a technique called cross-attention. So internally, the transformer uses self-attention to look at the history of tokens, but it can now use cross-attention to look at another model that understands the proteins.
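As a loose sketch of the cross-attention mechanism being described, not the team's actual architecture: the generated molecule's token states act as queries that attend to an encoding of the protein binding site. The dimensions and tensors below are illustrative only.

```python
# Cross-attention conditioning: SMILES decoder token states (queries) attend to
# an encoding of the protein binding site (keys/values).
import torch
import torch.nn as nn

d_model = 128
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

batch = 2
smiles_states = torch.randn(batch, 40, d_model)   # 40 SMILES tokens from self-attention layers
pocket_states = torch.randn(batch, 25, d_model)   # 25 binding-site atoms from a protein encoder

# Queries come from the molecule being generated; keys/values from the binding
# site, so each new SMILES token can "look at" the pocket geometry.
conditioned, attn_weights = cross_attn(query=smiles_states,
                                       key=pocket_states,
                                       value=pocket_states)
print(conditioned.shape)  # (2, 40, 128): pocket-aware token representations
```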
But even that's not enough, because in discovering drugs and exploring this gargantuan space, looking for these needles in a haystack, what typically happens is you find a hit, a molecule that binds, but now you want to optimize it. You want to make lots of small variations of that molecule in order to make it better and better at binding. So the third piece of the architecture is another module, a thing called a variational autoencoder, that again uses deep learning. But this time,
you can take as input an organic molecule that is already known, a hit that's already known to bind
to the site. And that, again, is fed in through cross-attention. And now the SMILES autoregressive model can generate a molecule that's an improvement on the starting molecule
and knows about the protein binding. And so what we do is we start off with the state-of-the-art
molecule. And the best example we found is one that has more than two orders of magnitude
stronger binding affinity to the binding pocket,
which is a tremendous advance.
It's the state-of-the-art in addressing tuberculosis.
And of course, the exciting thing is
that this is tested in the laboratory.
So this is not just a computer experiment
and some sort of benchmark or whatever.
We sent the description of the molecule to the laboratories at GHDDI. They synthesized the molecule, characterized it, measured binding properties, and said, well, hey, this is a new state
of the art for this target protein. So we're continuing to work with them to further refine
this. There are obviously quite a few more steps, if you know about the drug discovery process.
There are a lot of hurdles you have to get through, including, of course, very important
clinical trials before you have something that can actually be used in humans. But we're already hugely excited about the fact
that we were able to make such a big advance in such a short amount of time compared
to the usual drug discovery process. And while you were looking for that molecule that had the
proper characteristics, were you also determining whether it could be manufactured easily,
like trying to think about practical realities of bringing this thing out of the computer and
into the lab? Great question. I mean, you're hinting there at the fact that the discovery
process, of course, is a long pipeline. You start with the protein, you have to find a molecule that
binds, you then refine the molecule. Now you have to look at ADMET, you know, the absorption and
metabolism and excretion and so on of the molecule.
But also make sure that it's not toxic.
But then you need to be able to synthesize it.
It's no good if nobody can make this molecule.
So you have to look at that.
So actually, in the AI for Science team, we look at all of these aspects of that drug discovery process.
And we find particular areas, especially where there's sort of low-hanging fruit, where we can see that deep learning can make a big impact.
It doesn't necessarily help much to take a very easy, fast piece of the pipeline and go work on that.
You want to understand what are the bottlenecks, and can we really unlock those with deep learning?
So we're very interested in that whole process.
It's a fascinating problem. You've got a gargantuan search space,
and yet you have so many different constraints that need to be met.
And deep learning just feels like the perfect tool to go after this problem.
When you talk to the scientists
that you collaborate with,
is AI changing the kinds of questions
that they're able to ask or want to ask?
Oh, for sure.
And it's really empowering.
It's enabling those working in the drug discovery space, I think, to think in a much more expansive way.
If you think about just the kind of acceleration that I talked about from the fifth paradigm, if you've got a four order of
magnitude acceleration, okay, it may not sound like much of a dent in the 10 to the power 60 space,
but now when you're exploring variants of molecules and so on, the ability to explore that space
orders of magnitude faster allows you to think much more creatively, allows you to think in a more expansive way about how much of that space you can explore and how
efficiently you can explore it.
So I think it really is opening up new horizons.
And certainly we have an exciting partnership with Novartis.
We've been working with them for the last five years and they've been deploying some
of our techniques and models in practice for their drug discovery pipeline.
And we get a lot of great feedback from them about how exciting they're finding these techniques to use in practice,
because it is changing the way they go about doing the drug discovery process.
To jump to one other case study, we don't have to go into great detail on it,
but I'm very curious about your project Aurora, this foundation model for state-of-the-art weather
forecasting that I believe is 5,000 times faster
than traditional physics-based methods.
Can you talk a little bit about how that project is evolving,
how you imagine these AI forecasting models working
with traditional forecasting models perhaps,
or replacing them?
Yes, so I said most of what we do is down
at the molecular level, so this is one of the exceptions. This is really at the global level, the planetary level.
Again it's a beautiful example of the fifth paradigm because the way
forecasting has been done for a number of decades now and the way most
forecasting is done at the moment is through what's called numerical weather
prediction. So again you have these simple equations; it's no longer Schrodinger's equation of atomic physics. It's
now Navier-Stokes equations of fluid flows and a whole bunch of other equations that describe
moisture in the atmosphere and the weather and so on. And those equations are solved on a
supercomputer. And again, we can think of that numerical simulator now, not just as the way
you're going to do the forecasting, but actually as the way to generate training data for a deep learning emulator.
So several groups have been exploring this over the last couple of years.
And again, we see this very robust three to four order of magnitude acceleration.
But what's really interesting about Aurora is that it's the world's first foundation model of the atmosphere.
So instead of just building an emulator of a particular numerical weather
simulator, which is already very interesting, we trained Aurora on a much more diverse set
of data, really trying to force it not just to emulate a particular simulator, but really, as it were, understand or model the fundamental equations of fluid flows in the
Earth's atmosphere. And then the reason
we want to do this is because we now want to take that foundation model and fine-tune it to other
downstream applications where there's much less data. So one example would be pollution flow. So
obviously the flow of pollution around the atmosphere is extremely important,
but the data is far more sparse. There are far fewer sensors for pollution than there are for, sort of, wind and rain and temperature
and so on. And so we were able to achieve state-of-the-art performance in modeling
the flow of pollution by leveraging huge data and building this foundation model
and then using relatively little data on pollution monitoring to build that
downstream fine-tuned model. So it's a beautiful example of a foundation model.
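As a generic sketch of the pretrain-then-fine-tune pattern Chris describes, not Aurora's actual code: a pretrained backbone is frozen and a small task head is trained on a sparse downstream dataset. The PretrainedBackbone class and the data below are hypothetical stand-ins.

```python
# Minimal sketch: take a pretrained foundation-model backbone and fine-tune a
# small head on a sparse downstream dataset (e.g. pollution observations).
import torch
import torch.nn as nn

class PretrainedBackbone(nn.Module):
    def __init__(self, d_in=64, d_hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_hidden))
    def forward(self, x):
        return self.net(x)

backbone = PretrainedBackbone()          # imagine weights loaded from large-scale pretraining
for p in backbone.parameters():          # freeze the backbone: little downstream data available
    p.requires_grad = False

pollution_head = nn.Linear(256, 1)       # small task-specific head trained from scratch
optimizer = torch.optim.Adam(pollution_head.parameters(), lr=1e-3)

# Tiny illustrative fine-tuning loop on (fabricated) pollution observations.
x = torch.randn(32, 64)                  # atmospheric state features
y = torch.randn(32, 1)                   # pollution measurements
for _ in range(100):
    pred = pollution_head(backbone(x))
    loss = nn.functional.mse_loss(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())
```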
That is a cool example. And finally, just to wrap up, what have
you seen or heard at NeurIPS that's gotten you excited? What kind of trends are in the air?
What's the buzz? Oh, that's a great question. I mean, it's such a huge conference. There's something
like 17,000 people or so here this year I've heard. I think, you know, one of the things that's
happened so far that's actually given me an enormous amount of energy wasn't just a technical talk. It was actually an event we had on the first
day called Women in Machine Learning. And I was a mentor on one of the mentorship tables.
And I found it very energizing just to meet so many people, early career stage people,
who were very excited about AI for Science and realizing that, you know, it's not just that
I think AI for science is important. A lot of people are moving into this field now. It is
a big frontier for AI. I'm a little biased, perhaps. I think that it's the most important
application area. Intellectually, it's very exciting because we get to deal with science
as well as machine learning. But also, if you think about it, science is really about learning more about
the world. And once we learn more about the world, we can then develop agriculture, we can develop
the steam engine, we can develop silicon chips, we can change the world, we can save lives and make
the world a better place. And so I think it's the most fundamental undertaking we have in AI for
Science. And the thing I loved about the Women in Machine Learning event is that the AI for Science table was just completely swamped with all of these people at early stages of their career, either already working in this field and doing PhDs or wanting to get into it.
That was very exciting.
That is really exciting and inspiring and gives me a lot of hope.
Well, Chris Bishop, thank you so much for joining us today and thanks for a great conversation.
Thank you. I really appreciate it.
And to our listeners, thanks for tuning in.
If you want to learn more about research at Microsoft,
you can check out the Microsoft Research
website at microsoft.com
slash research. Until next time.