Microsoft Research Podcast - Abstracts: December 12, 2023

Starting point is 00:00:00 Welcome to Abstracts, a Microsoft Research podcast that puts the spotlight on world-class research in brief. I'm Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot or a podcast abstract of their new and noteworthy papers.

Starting point is 00:00:24 Today, I'm talking to Dr. Tao Qin, a Senior Principal Research Manager, and Dr. Lijun Wu, a Senior Researcher, both from Microsoft Research. Drs. Qin and Wu are co-authors of a paper titled FABind: Fast and Accurate Protein-Ligand Binding. And this paper, which was accepted for the 2023 Conference on Neural Information Processing Systems, or NeurIPS, is available now on arXiv. Tao Qin, Lijun Wu, thanks for joining us on Abstracts. Thanks. Thank you. It's great to be here and to share our latest research. So Tao, let's start off with you. In a couple sentences, tell us what issue or problem your research addresses, and more importantly, why people should care about it.

Starting point is 00:01:12 We work on the problem of molecular docking, a computational modeling method used to predict the preferred orientation of one molecule when it binds to a second molecule to form a stable complex. So, it aims to predict the binding pose of a ligand in the active site of a receptor and estimate the ligand-receptor binding affinity. This problem is very important for drug discovery and development. Accurately predicting binding poses can provide insights into how a drug candidate might bind to its biological target and whether it is likely to have the desired therapeutic effect.

Starting point is 00:01:55 To make an analogy, just like a lock and a key, the protein target is the lock, while the ligand is the key. We should carefully design the structure of the key so that it fits perfectly into the lock. Similarly, the molecular structure should be accurately constructed so that it binds well to the protein. Then the protein function would be activated or inhibited. Molecular docking is used extensively in the early stages of drug design and discovery

Starting point is 00:02:26 to screen a large library of hundreds of thousands of compounds to identify promising lead compounds. It helps eliminate poor candidates and focus experimental resources on those most likely to bind to the target protein well. So clearly, improving the accuracy and also the speed of docking methods, like what we have done in this work, could accelerate the development of new life-saving drugs. So Lijun, tell us how your approach builds on and/or differs from what's been done previously in this field.

Starting point is 00:03:03 Sure, thanks. So conventional protein-ligand docking methods usually take a sampling-and-scoring approach. What that means is they first use some sampling methods to generate multiple protein-ligand docking poses as candidates. And then they use some scoring functions to evaluate these candidates and select from them

Starting point is 00:03:25 to choose the best ones. One example is DiffDock, a very recent work developed by MIT, which is a very strong model that uses a diffusion algorithm to do the sampling in this kind of way. These sampling-and-scoring methods are accurate with good predictions, but of course, they are very slow. So this is a very big limitation, because the sampling process usually takes a lot of time. Some other methods, such as EquiBind or TankBind, treat

Starting point is 00:03:56 the docking prediction as a regression task, which is to use deep networks to directly predict the coordinates of the atoms in the molecule. Obviously, this kind of method is much faster than the sampling methods, but the prediction accuracy is usually worse. So that motivates our FABind, which aims to provide a method for the docking problem that is both fast and accurate. FABind keeps its prediction fast by modeling in a regression way. And also, we use some novel designs

Starting point is 00:04:28 to improve its prediction accuracy. So Lijun, let's stay with you for a minute. Regarding your research strategy on this, how would you describe your methodology, and how did you go about conducting this research? OK, sure. So when we're talking about the detailed method, we actually build an end-to-end deep learning framework,

Starting point is 00:04:48 FABind here. So for protein-ligand docking, FABind divides the docking task into a pocket prediction process and a pose prediction process. But importantly, we unify these two processes within a single deep learning model, which is a novel equivariant graph neural network. Here the pocket means a local part of the whole protein, which consists of some specific amino

Starting point is 00:05:13 acids that can bind to the molecule in the structure space. So simply speaking, this novel graph neural network is a stack of identical graph neural network layers. The graph neural layer is carefully designed by us, and we use the first graph layer for the pocket prediction and the later layers to do the pose prediction. And for each layer, there are some message passing operations we designed. The first one is an independent message passing, which is to update the information within the protein or the molecule itself. And the second one is the cross-attention message passing, which is to update the information between the whole protein and also the whole molecule.

Starting point is 00:05:57 So each can then have a global view of the other. And the last one is an interfacial message passing, which passes messages between the nodes of the protein and the molecule that are close to each other. Besides, there are also some smaller techniques that we use to get an accurate docking model. For example, we use a scheduled training technique to bridge the gap between the training and the inference stages. And also, we combine direct coordinate prediction and distance map refinement as our optimization method. Well, listen,

Starting point is 00:06:33 I want to stay with you even more because you're talking about the technical specifications of your research methodology. Let's talk about results. What were your major findings on the performance of FABind? Yeah, the results are very promising. So first we need to care about the docking performance, which is the accuracy of the docking pose prediction.

Starting point is 00:06:54 We compare our FABind to different baselines, such as EquiBind, TankBind, and also, as I mentioned before, a very strong model, DiffDock, developed by MIT. The results show that our docking prediction accuracy is very good: we achieve performance that is very competitive with DiffDock. But specifically, we need to point out that speed is very important. Compared to DiffDock, we are about 170 times faster. So this is very promising. Besides, the interesting thing is that we found our FABind

Starting point is 00:07:32 can achieve very, very strong performance on unseen protein targets, which means that for protein structures we have never seen during training, we can still achieve very good performance. There, our FABind achieves significantly better performance, with about 10% to 40% accuracy improvement over DiffDock. This demonstrates that the practical effectiveness of our work is very promising, since such new proteins are the most important ones that we need to care about for a new disease. Tao, this is all fascinating, but talk about real-world significance for this work.

Starting point is 00:08:10 Who does it help most and how? Yeah, as Lijun has introduced, FABind significantly outperforms earlier methods in terms of speed while maintaining competitive accuracy. This fast prediction capability is extremely important in real-world applications where high-throughput virtual screening for compound selection is often required for drug discovery. So an efficient virtual screening process can significantly accelerate the drug discovery process.

Starting point is 00:08:41 Furthermore, our method demonstrates great performance on unseen or new proteins, which indicates that our FABind possesses a strong generalization ability. This is very important. Consider the case of SARS-CoV-2, for example, where our knowledge of the protein target was very limited at the beginning of the pandemic. So if we have a robust docking model that can generalize to new proteins, we could conduct a large-scale virtual screening and confidently select potentially effective ligands. This would greatly speed up the development of new treatments. So downstream from the drug discovery science, benefits would accrue to people who have diseases

Starting point is 00:09:31 and need treatment for those things. Yes, exactly. Okay. Well, Tao, let's get an elevator pitch in here, sort of one takeaway, a golden nugget that you'd like our listeners to take away from this work. If there was one thing you wanted them to take away from the work, what would it be? Yeah, thanks for a great question.

Starting point is 00:09:53 So I think the one-sentence takeaway is that if researchers are using molecular docking and are seeking an AI-based approach, our FABind method definitely should be on their list to consider, especially given the exceptional predictive accuracy and the high computational efficiency of our method. Finally, Tao, what are the big questions and problems that remain in this area, and what's next on your research agenda? Actually, there are multiple unaddressed questions along this direction.

Starting point is 00:10:28 So I think those are all opportunities for further exploration. So here I just give three examples. First, our method currently tackles rigid docking, where the target protein structure is assumed to be fixed, leaving only the ligand structure to be predicted. However, in a more realistic scenario, the protein is dynamic during molecular binding, so exploring flexible docking becomes an essential aspect. Second, our approach assumes that the target protein has only one binding pocket.

Starting point is 00:11:03 In reality, a target protein may have multiple binding pockets. So this situation will be more challenging. So how to address such a significant challenge is also worth exploring. Third, in the field of drug design, sometimes we need to find a drug compound that can bind with multiple target proteins.

Starting point is 00:11:26 In this work, we only consider a single target protein. So the accurate prediction of docking for multiple target proteins poses a great challenge. Well, Tao Qin and Lijun Wu, thank you for joining us today. And to our listeners, thanks for tuning in. If you're interested in learning more about this work, you can find a link to the paper at aka.ms/abstracts, or you can find it on arXiv. See you next time on Abstracts. Thank you.
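
Illustrative sketch (not from the episode or the paper): the three message passing operations described in the segments from 00:05:13 to 00:05:57 (independent, cross-attention, and interfacial) can be pictured roughly as below. This is a toy in plain Python, assuming simple list-based node features, mean pooling, an unlearned dot-product attention, and a hypothetical distance cutoff of 4.0. It has no learned weights and no equivariant coordinate updates, so it should not be read as FABind's actual layer. All function names here are made up for illustration.

```python
import math

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mean_vec(vecs):
    # Column-wise mean of a non-empty list of equal-length vectors.
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_step(feats):
    # Stand-in for message passing inside one graph (protein OR ligand):
    # every node adds the mean feature of its own graph.
    pooled = mean_vec(feats)
    return [add(f, pooled) for f in feats]

def cross_attention_step(queries, keys):
    # Every node of one graph attends over ALL nodes of the other graph,
    # giving it a global view of its binding partner.
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in keys])
        context = [sum(w * k[i] for w, k in zip(weights, keys))
                   for i in range(len(q))]
        out.append(add(q, context))
    return out

def interfacial_step(feats_a, coords_a, feats_b, coords_b, cutoff):
    # Messages flow only between protein and ligand nodes that are close
    # in space (within the assumed cutoff); far nodes are left unchanged.
    out = []
    for f, c in zip(feats_a, coords_a):
        near = [fb for fb, cb in zip(feats_b, coords_b)
                if math.dist(c, cb) <= cutoff]
        out.append(add(f, mean_vec(near)) if near else list(f))
    return out

def toy_layer(prot_f, prot_x, lig_f, lig_x, cutoff=4.0):
    # One toy layer: the three operations applied in the order described.
    prot_f, lig_f = independent_step(prot_f), independent_step(lig_f)
    prot_f, lig_f = (cross_attention_step(prot_f, lig_f),
                     cross_attention_step(lig_f, prot_f))
    prot_f, lig_f = (interfacial_step(prot_f, prot_x, lig_f, lig_x, cutoff),
                     interfacial_step(lig_f, lig_x, prot_f, prot_x, cutoff))
    return prot_f, lig_f

if __name__ == "__main__":
    protein_feats = [[1.0, 0.0], [0.0, 1.0]]
    protein_xyz = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    ligand_feats = [[1.0, 1.0]]
    ligand_xyz = [(1.0, 0.0, 0.0)]
    print(toy_layer(protein_feats, protein_xyz, ligand_feats, ligand_xyz))
```

In this toy, stacking several calls to toy_layer would loosely mirror the idea of stacked identical layers mentioned in the episode. A real model would replace the pooling and attention here with learned, equivariant updates.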
