Microsoft Research Podcast - NeurIPS 2024: AI for Science with Chris Bishop
Episode Date: December 13, 2024
In this special edition of the podcast, Technical Fellow and Microsoft Research AI for Science Director Chris Bishop joins guest host Eliza Strickland in the Microsoft Booth at the 38th annual Conference on Neural Information Processing Systems (NeurIPS) in Vancouver, British Columbia, to talk about deep learning's potential to improve the speed and scale at which scientific advancements can be made.
Transcript
Welcome to the Microsoft Research Podcast,
where Microsoft's leading researchers
bring you to the cutting edge.
This series of conversations showcases
the technical advances being pursued at Microsoft,
through the insights and experiences of the people driving them.
I'm Eliza Strickland, a senior editor at IEEE Spectrum,
and your guest host for a special edition of the podcast.
Joining me today in the Microsoft booth at the 38th Annual Conference on Neural Information Processing Systems, or NeurIPS, is Chris Bishop.
Chris is a Microsoft Technical Fellow and the Director of Microsoft Research AI for
Science.
Chris is with me for one of our two on-site conversations that we're having here at the
conference.
Chris, welcome to the podcast.
Thanks, Eliza, really great to join you.
How did your long career in machine learning
lead you to this focus on AI for science?
And were there any pivotal moments
when you started to think that,
hey, this deep learning thing,
it's going to change the way
that scientific discovery happens?
Oh, that's such a great question.
I think this is like my career coming full circle, really.
I started out studying physics at Oxford,
and then I did a PhD in quantum field theory.
And then I moved into the fusion program.
I wanted to do something of practical value,
so I worked on nuclear fusion for about seven or eight years
doing theoretical physics.
And then that was about the time that Geoff Hinton published
his back-prop paper, and it really caught my imagination
as an exciting approach to artificial intelligence
that might actually yield some progress. So that was kind of 35 years ago and I moved into the field
of machine learning and actually the way I made that transition was by applying neural networks to
fusion. I was working at the JET experiment, which was the world's largest fusion experiment, sort of big data in its day, and so I had to, first of all, teach myself to program
because I was a pencil and paper theoretician up to that point,
persuade my boss to buy me a workstation,
and then started to play with these neural nets.
So right from the get-go, I was applying machine learning
35 years ago to data from science experiments.
And that was a great on-ramp for me.
And then eventually I just got so distracted,
I decided I wanted to build my career in machine learning,
spent a few years as a research professor,
and then joined Microsoft 27 years ago
when Microsoft opened its first research lab
outside the US in Cambridge, UK,
and I've been there very happily ever since.
Went on to become lab director,
but about three or four years ago, I realized that not only was deep learning transforming
so many different things, but I felt it was especially relevant to scientific discovery.
And so I had an opportunity to pitch to our chief technology officer to go start a new
team.
And he was very excited by this.
So just over two and a half years ago, we set up Microsoft Research AI for Science.
And it's a global team,
and it sort of does what it says on the tin.
So you've said that AI could usher in a fifth paradigm
of scientific discovery,
which builds upon the ideas of Turing Award winner Jim Gray,
who described four stages in the evolution of science.
Can you briefly explain the four prior paradigms and then
tell us about what makes this stage different? Yeah, sure. So it was a nice insight by Jim. He
said, well, of course, the first paradigm of scientific discovery was really the empirical
one. I tend to think of some cave dweller picking up a big rock and a small rock and letting go
of them at the same time and thinking the big rock will hit the ground first and discovering
they land together. And this is interesting. They've discovered a sort of pattern of regularity in
nature. And even today, the first paradigm is in a sense the prime paradigm. It's the most important
one because at the end of the day, it's experimental results that determine the truth, if you like. So
that's the first paradigm and it continues to be of critical importance today. And then the second paradigm really emerged
in the 17th century, when Newton discovered
the laws of motion and the law of gravity.
And not only did he discover the equations,
but this sort of remarkable fact that nature
can even be described by equations, right?
It's not obvious that this would be true,
but it turns out that the world around us
can be described by very simple equations
that you can write on a T-shirt. And so in the 19th century, James Maxwell discovered some simple equations that describe the
whole of electricity and magnetism, electromagnetic waves and so on. And then very importantly,
the beginning of the 20th century, we had this remarkable breakthrough in quantum physics.
So again, down at the molecular, the atomic level, the world is described with exquisite precision by Schrodinger's equation.
And so this was the second paradigm, the theoretical one: the world is described with incredible precision, over huge ranges of length and time, by very simple equations. But of course, there's a catch, which is that those equations are very hard to solve. And so the third paradigm really began, I guess, in the 1950s and '60s with the development of digital computers.
And actually, the very first use of digital computers was to simulate physics.
And it's been at the core of digital computing right up to the present day.
And so what you're doing there is using a computer, with a numerical algorithm, to solve those very simple equations, but solve them in a practical setting.
And so I'll refer to that as simulation. That's the third paradigm. And that's proven to
be tremendously powerful. If you look up the weather forecast on your phone today,
it's done by numerical weather forecasting, solving, in those cases, Navier-Stokes
equations using big numerical simulators. What Jim Gray observed though, really
emerging at the beginning of the 21st century,
was what he called the fourth paradigm, or data-intensive scientific discovery.
So this is the era of big data. Think of particle physics at the CERN accelerator, for example,
generating colossal amounts of data in real time.
And that data can then be processed and filtered. We can do statistics on it. But of course,
we're going to do machine learning on that data. And so machine learning feeds off large data.
And so the fourth paradigm really is dominated today by machine learning. And again, that remains
tremendously important. What I noticed, though, is that there's, again, another framework. We call
it the fifth paradigm. Again, it goes back to those fundamental equations, but again it's driven by computation, and it's the idea that we can train machine learning systems not using the empirical data of the fourth paradigm, but instead using the results of simulation, so the output of the third paradigm. So think of it this way. You want to predict the property of some molecule, let's say.
You could, in principle, solve Schrodinger's equation on a digital computer.
It'd be very expensive.
And let's say you want to screen hundreds of millions of molecules.
That's going to get far too costly.
So instead what you can do is have a mindset shift.
You can think of that simulator not as a tool to predict the molecule's properties
directly, but instead as a way of generating synthetic training data. And then you use that
training data to train a deep learning system to give what I like to call an emulator,
an emulator of the simulator. Once it's trained, that emulator is fast. It's usually three to four
orders of magnitude faster than the simulator.
So if you're going to do something over and over again, that three to four order of magnitude
acceleration is tremendously disruptive.
And what's really interesting is we see that fifth paradigm occur in many, many different
places.
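As a rough illustration of the "emulator of the simulator" idea described here, the following is a minimal Python sketch. The run_simulator function is a hypothetical stand-in for an expensive physics code, and the small network is purely illustrative, not anything the team actually uses.

```python
# Minimal sketch: train a fast "emulator" on data generated by a slow simulator.
# run_simulator() stands in for an expensive physics code (hypothetical).
import numpy as np
from sklearn.neural_network import MLPRegressor

def run_simulator(x: np.ndarray) -> np.ndarray:
    # Placeholder for the expensive third-paradigm simulation
    # (here just a cheap nonlinear function for illustration).
    return np.sin(3 * x[:, 0]) + 0.5 * x[:, 1] ** 2

# 1. Use the simulator (offline, expensive) to generate synthetic training data.
rng = np.random.default_rng(0)
X_train = rng.uniform(-1, 1, size=(5000, 2))
y_train = run_simulator(X_train)

# 2. Train the emulator: a small neural network that learns the simulator's behaviour.
emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
emulator.fit(X_train, y_train)

# 3. At deployment time the emulator replaces the simulator and is orders of
#    magnitude cheaper to evaluate, e.g. when screening many candidate inputs.
X_query = rng.uniform(-1, 1, size=(10, 2))
print(emulator.predict(X_query))
```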
The idea goes back a long way.
Actually the last project that I worked on before I left the fusion program
was to do what was the world's first ever real-time control of a tokamak fusion plasma
using a neural net. And the computers of the day with the processors were just far too slow,
long before GPUs and so on. And so it wasn't possible to solve the equations, in that case what's called the Grad-Shafranov equation. Again, a simple differential equation you could write on a T-shirt, but solving it was
expensive on a computer.
We were about a million times too slow to solve it directly in real time.
And so instead, we generated lots and lots of solutions.
We used those solutions to train a very simple neural network, not a deep network, just a
simple two-layer network back in the day. And then we implemented that in special hardware and did real-time feedback control. So
that was an example of the fifth paradigm from a quarter of a century ago. But of course,
deep learning just tremendously expands the range of applicability. So today we're using
the fifth paradigm in many, many different scenarios. And time and time again, we see these four orders of magnitude acceleration.
So I think it's worthy of thinking of that as a new paradigm because it's so pervasive and so ubiquitous.
So how do you identify fields of science and particular problems that are amenable to this kind of AI assistance?
Is it all about availability of data or the need for that kind of speed up?
So there are lots of factors that go into this. And when I think about AI for science,
actually the space of opportunity is colossal because science is really just understanding
more about the world around us. And so the range of possibilities is daunting, really.
So in choosing what to work on, I think there are several factors. Yes, of
course, data is important, but very interestingly, we can use experimental
data or we can generate synthetic data by running simulators. So we're a big fan
of the fifth paradigm. But I think another factor, and this is particularly
at Microsoft, is thinking about how can we have real-world impact at scale?
Because that's our job, is to make the world a better place and to do so at a
planetary scale. And so we've settled, for the most part, on working at the molecular level. If you think about it, the number of different ways of combining atoms together to make new stable configurations of atoms is gargantuan. I mean, the number of just small molecules, small organic molecules, that are potential drug candidates is about 10 to the power 60. It's about the same as the number of atoms in the solar system. The number of proteins may be the fourth power of the number of atoms in the universe, or something crazy. So you've got this
gargantuan space to search and within that space for sure there'll be all
sorts of interesting molecules, materials, new drugs, new therapies, new materials for carbon capture, new kinds of batteries, new photovoltaics.
The list is endless because everything around us is made of atoms, including our own bodies.
So the potential just in the molecular space is gargantuan.
So that's why we focus there.
It's a big focus.
It's a broad focus.
So let's take one of these sort of case studies then.
In a project on drug discovery,
you worked with the Global Health Drug Discovery Institute on molecules that would interact with tuberculosis and coronaviruses, I think.
And you found, I think, candidate molecules in five months instead of several years.
Can you talk about what models you used in this work
and how they
helped you get this vastly sped up process?
Sure, yes. We're very proud of this project. We're working with the Gates Foundation
and the Global Health Drug Discovery Institute to look particularly at diseases that
affect low-income countries like tuberculosis. And in terms of the models we use, I think
we're all familiar with the large language model. We train it on a sequence of words or a sequence of word tokens, and it's trained to predict the next token.
We can do a similar thing, but instead of learning the language of humans, we can learn the language of nature.
So in particular, what we're looking for here is a small organic molecule that we could synthesize in a laboratory
that will bind with a particular target protein.
It's called ClpP.
And by interfering with that protein, we can arrest the process of tuberculosis.
So the goal is to search that space of 10 to the 60 molecules
and find a new one that has the right properties.
Now, the way we do this is to train something that's essentially a transformer.
So it looks like a language model.
But the language it's trained on is a thing called SMILES strings.
It's an idea that's been around in chemistry for a long time.
It's just a way of taking a three-dimensional molecule and representing it as a one-dimensional
sequence of characters.
So this is perfect for feeding into a language model.
So we take a transformer and we train it on a large database of small organic molecules
that are sort of typical of the
kinds of things you might see in the space of drug molecules.
Once that's been trained, we can now run it generatively and it will output new molecules.
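For readers unfamiliar with SMILES, here is a minimal sketch of the representation being described, using the open-source RDKit library; the molecule and the character-level tokenization are purely illustrative.

```python
# Minimal sketch of the SMILES idea: a molecule as a one-dimensional string of
# characters, which is what makes it suitable input for a language model.
from rdkit import Chem

smiles = "CC(=O)Oc1ccccc1C(=O)O"        # aspirin, written as a SMILES string
mol = Chem.MolFromSmiles(smiles)        # parse into an RDKit molecule object
canonical = Chem.MolToSmiles(mol)       # write it back out in canonical form
print(canonical)

# A language model sees the string simply as a sequence of tokens, e.g. characters:
tokens = list(canonical)
vocab = sorted(set(tokens))
token_ids = [vocab.index(t) for t in tokens]
print(token_ids)  # the kind of integer sequence an autoregressive model is trained on
```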
Now we don't just want to generate molecules at random because that doesn't help.
We want to generate molecules that bind to this particular binding site on this particular
protein.
So the next step is we have to tell the model about the protein and the protein binding
site.
And we do that by giving it information about the protein. Well, we do tell it about the whole protein, but we especially give it information about the three-dimensional geometry of the binding site.
So we tell it about the locations of the atoms in the binding site.
And we do this in a way that satisfies certain physics constraints,
sort of equivariance properties, it's called.
So if you think about a molecule, if I rotate the molecule in space,
the positions of all the atoms change in a complicated way. But it's the same molecule; it has the same energy and other properties and so on.
So we need the right kind of representation.
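A small illustration of the point about rotations: the sketch below, with made-up coordinates, shows that rotating a molecule changes every atomic position while a rotation-invariant description, such as the set of pairwise distances, stays the same.

```python
# Rotating a molecule moves every atom's coordinates, but quantities like
# pairwise distances (and hence the energy they determine) are unchanged.
import numpy as np
from scipy.spatial.transform import Rotation

coords = np.array([[0.0, 0.0, 0.0],     # toy 3-atom "molecule"
                   [1.1, 0.0, 0.0],
                   [0.0, 1.5, 0.2]])

R = Rotation.from_euler("xyz", [30, 45, 60], degrees=True).as_matrix()
rotated = coords @ R.T                  # same molecule, new atomic positions

def pairwise_distances(x):
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

# Positions change, but the rotation-invariant description does not.
print(np.allclose(pairwise_distances(coords), pairwise_distances(rotated)))  # True
```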
That's then fed into this transformer using a technique called cross-attention. So internally, the transformer uses self-attention to look at the history of tokens, but it can now use cross-attention to look at another model that understands the proteins.
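As a loose sketch of the cross-attention mechanism being described, not the team's actual architecture: the generated molecule's token states act as queries that attend to an encoding of the protein binding site. The dimensions and tensors below are illustrative only.

```python
# Cross-attention conditioning: SMILES decoder token states (queries) attend to
# an encoding of the protein binding site (keys/values).
import torch
import torch.nn as nn

d_model = 128
cross_attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=8, batch_first=True)

batch = 2
smiles_states = torch.randn(batch, 40, d_model)   # 40 SMILES tokens from self-attention layers
pocket_states = torch.randn(batch, 25, d_model)   # 25 binding-site atoms from a protein encoder

# Queries come from the molecule being generated; keys/values from the binding
# site, so each new SMILES token can "look at" the pocket geometry.
conditioned, attn_weights = cross_attn(query=smiles_states,
                                       key=pocket_states,
                                       value=pocket_states)
print(conditioned.shape)  # (2, 40, 128): pocket-aware token representations
```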
But even that's not enough, because in discovering drugs and exploring this gargantuan space, looking for these needles in a haystack, what typically happens is you find a hit, a molecule that binds, but now you want to optimize it. You want to make lots of small variations of that molecule in order to make it better and better at binding. So the third piece of the architecture is another module, a thing called a variational autoencoder, that again uses deep learning. But this time,
you can take as input an organic molecule that is already known, a hit that's already known to bind
to the site. And that, again, is fed in through cross-attention. And now the SMILES autoregressive model can generate a molecule that's an improvement on the starting molecule
and knows about the protein binding. And so what we do is we start off with the state-of-the-art
molecule. And the best example we found is one that has more than two orders of magnitude
stronger binding affinity to the binding pocket,
which is a tremendous advance.
It's the state-of-the-art in addressing tuberculosis.
And of course, the exciting thing is
that this is tested in the laboratory.
So this is not just a computer experiment
and some sort of benchmark or whatever.
We sent the description of the molecule to the laboratories at GHDDI. They synthesized the molecule, characterized it, measured binding properties, and said, well, hey, this is a new state
of the art for this target protein. So we're continuing to work with them to further refine
this. There are obviously quite a few more steps, if you know about the drug discovery process.
There are a lot of hurdles you have to get through, including, of course, very important
clinical trials before you have something that can actually be used in humans. But we're already hugely excited about the fact
that we were able to make such a big advance in such a short amount of time compared
to the usual drug discovery process. And while you were looking for that molecule that had the
proper characteristics, were you also determining whether it could be manufactured easily,
like trying to think about practical realities of bringing this thing out of the computer and
into the lab? Great question. I mean, you're hinting there at the fact that the discovery
process, of course, is a long pipeline. You start with the protein, you have to find a molecule that
binds, you then refine the molecule. Now you have to look at ADMET, you know, the absorption and
metabolism and excretion and so on of the molecule.
But also make sure that it's not toxic.
But then you need to be able to synthesize it.
It's no good if nobody can make this molecule.
So you have to look at that.
So actually, in the AI for Science team, we look at all of these aspects of that drug discovery process.
And we find particular areas, especially where there's sort of low-hanging fruit, where we can see that deep learning can make a big impact.
It doesn't necessarily help much to take a very easy, fast piece of the pipeline and go work on that.
You want to understand what are the bottlenecks, and can we really unlock those with deep learning?
So we're very interested in that whole process.
It's a fascinating problem. You've got a gargantuan search space,
and yet you have so many different constraints that need to be met.
And deep learning just feels like the perfect tool to go after this problem.
When you talk to the scientists
that you collaborate with,
is AI changing the kinds of questions
that they're able to ask or want to ask?
Oh, for sure.
And it's really empowering.
It's enabling those working in the drug discovery space, I think, to think in a much more expansive way.
If you think about just the kind of acceleration that I talked about from the fifth paradigm, if you've got a four order of
magnitude acceleration, okay, it may not sound like much of a dent in the 10 to the power 60 space,
but now when you're exploring variants of molecules and so on, the ability to explore that space
orders of magnitude faster allows you to think much more creatively, allows you to think in a more expansive way about how much of that space you can explore and how
efficiently you can explore it.
So I think it really is opening up new horizons.
And certainly we have an exciting partnership with Novartis.
We've been working with them for the last five years and they've been deploying some
of our techniques and models in practice for their drug discovery pipeline.
And we get a lot of great feedback from them about how exciting they're finding these techniques to use in practice,
because it is changing the way they go about doing the drug discovery process.
To jump to one other case study, we don't have to go into great detail on it,
but I'm very curious about your project Aurora, this foundation model for state-of-the-art weather
forecasting that I believe is 5,000 times faster
than traditional physics-based methods.
Can you talk a little bit about how that project is evolving,
how you imagine these AI forecasting models working
with traditional forecasting models perhaps,
or replacing them?
Yes, so I said most of what we do is down
at the molecular level, so this is one of the exceptions. This is really at the global level, the planetary level.
Again it's a beautiful example of the fifth paradigm because the way
forecasting has been done for a number of decades now and the way most
forecasting is done at the moment is through what's called numerical weather
prediction. So again you have these simple equations; it's no longer Schrodinger's equation of atomic physics. It's
now Navier-Stokes equations of fluid flows and a whole bunch of other equations that describe
moisture in the atmosphere and the weather and so on. And those equations are solved on a
supercomputer. And again, we can think of that numerical simulator now, not just as the way
you're going to do the forecasting, but actually as the way to generate training data for a deep learning emulator.
So several groups have been exploring this over the last couple of years.
And again, we see this very robust three to four order of magnitude acceleration.
But what's really interesting about Aurora is that it's the world's first foundation model of the atmosphere.
So instead of just building an emulator of a particular numerical weather
simulator, which is already very interesting, we trained Aurora on a much more diverse set
of data, really trying to force it not just to emulate a particular simulator, but really, as it were, understand or model the fundamental equations of fluid flows in the
Earth's atmosphere. And then the reason
we want to do this is because we now want to take that foundation model and fine-tune it to other
downstream applications where there's much less data. So one example would be pollution flow. So
obviously the flow of pollution around the atmosphere is extremely important,
but the data is far more sparse. There are far fewer sensors for pollution than there are for, sort of, wind and rain and temperature
and so on. And so we were able to achieve state-of-the-art performance in modeling
the flow of pollution by leveraging huge data and building this foundation model
and then using relatively little data on pollution monitoring to build that
downstream fine-tuned model. So it's a beautiful example of a foundation model.
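As a generic sketch of the pretrain-then-fine-tune pattern Chris describes, not Aurora's actual code: a pretrained backbone is frozen and a small task head is trained on a sparse downstream dataset. The PretrainedBackbone class and the data below are hypothetical stand-ins.

```python
# Minimal sketch: take a pretrained foundation-model backbone and fine-tune a
# small head on a sparse downstream dataset (e.g. pollution observations).
import torch
import torch.nn as nn

class PretrainedBackbone(nn.Module):
    def __init__(self, d_in=64, d_hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_hidden))
    def forward(self, x):
        return self.net(x)

backbone = PretrainedBackbone()          # imagine weights loaded from large-scale pretraining
for p in backbone.parameters():          # freeze the backbone: little downstream data available
    p.requires_grad = False

pollution_head = nn.Linear(256, 1)       # small task-specific head trained from scratch
optimizer = torch.optim.Adam(pollution_head.parameters(), lr=1e-3)

# Tiny illustrative fine-tuning loop on (fabricated) pollution observations.
x = torch.randn(32, 64)                  # atmospheric state features
y = torch.randn(32, 1)                   # pollution measurements
for _ in range(100):
    pred = pollution_head(backbone(x))
    loss = nn.functional.mse_loss(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())
```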
That is a cool example. And finally, just to wrap up, what have
you seen or heard at NeurIPS that's gotten you excited? What kind of trends are in the air?
What's the buzz? Oh, that's a great question. I mean, it's such a huge conference. There's something
like 17,000 people or so here this year I've heard. I think, you know, one of the things that's
happened so far that's actually given me an enormous amount of energy wasn't just a technical talk. It was actually an event we had on the first
day called Women in Machine Learning. And I was a mentor on one of the mentorship tables.
And I found it very energizing just to meet so many people, early career stage people,
who were very excited about AI for Science and realizing that, you know, it's not just that
I think AI for science is important. A lot of people are moving into this field now. It is
a big frontier for AI. I'm a little biased, perhaps. I think that it's the most important
application area. Intellectually, it's very exciting because we get to deal with science
as well as machine learning. But also, if you think about it, science is really about learning more about
the world. And once we learn more about the world, we can then develop agriculture, we can develop
the steam engine, we can develop silicon chips, we can change the world, we can save lives and make
the world a better place. And so I think it's the most fundamental undertaking we have in AI for
Science. And the thing I loved about the Women in Machine Learning event is that the AI for Science table was just completely swamped with all of these people at early stages of their career, either already working in this field and doing PhDs or wanting to get into it.
That was very exciting.
That is really exciting and inspiring and gives me a lot of hope.
Well, Chris Bishop, thank you so much for joining us today and thanks for a great conversation.
Thank you. I really appreciate it.
And to our listeners, thanks for tuning in.
If you want to learn more about research at Microsoft,
you can check out the Microsoft Research
website at microsoft.com
slash research. Until next time.