Microsoft Research Podcast - Abstracts: December 12, 2023
Episode Date: December 12, 2023
Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations... about new and noteworthy achievements. In this episode, Senior Principal Research Manager Tao Qin and Senior Researcher Lijun Wu discuss “FABind: Fast and Accurate Protein-Ligand Binding.” The paper, accepted at the 2023 Conference on Neural Information Processing Systems (NeurIPS), introduces a new method for predicting the binding structures of proteins and ligands during drug development. The method demonstrates improved speed and accuracy over current methods.
Learn more:
FABind: Fast and Accurate Protein-Ligand Binding
FABind code on GitHub
Transcript
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
I'm Dr. Gretchen Huizenga.
In this series,
members of the research community at Microsoft give us
a quick snapshot or a podcast abstract
of their new and noteworthy papers.
Today, I'm talking to Dr. Tao Qin, a Senior Principal Research Manager,
and Dr. Lijun Wu, a Senior Researcher, both from Microsoft Research.
Dr. Qin and Dr. Wu are co-authors of a paper titled "FABind: Fast and Accurate Protein-Ligand Binding."
And this paper, which was accepted at the 2023 Conference on Neural Information Processing Systems, or NeurIPS, is available now on arXiv.
Tao Qin, Lijun Wu, thanks for joining us on Abstracts.
Thanks.
Thank you. It's great to be here and to share our latest research.
So Tao, let's start off with you. In a couple sentences, tell us what issue or problem your research addresses, and more importantly, why people should care about it.
We work on the problem of molecular docking, a computational modeling method used to predict the preferred orientation of one molecule when it binds to a second molecule to form a stable
complex.
So, it aims to predict the binding pose of a ligand in the active site of a receptor
and estimate the ligand-receptor binding affinity.
This problem is very important for drug discovery and development.
Accurately predicting binding poses can provide insights into how a drug
candidate might bind to its biological target and whether it is likely to have the desired
therapeutic effect.
To make an analogy, it's just like a lock and a key: the protein target is the lock, while
the ligand is the key.
We should carefully design the structure of the key so that it can perfectly fit into
the lock.
Similarly, the molecular structure should be accurately constructed so that it
binds well to the protein.
Then the protein function would be activated or inhibited.
Molecular docking is used extensively in the early stages of drug design and discovery
to screen a large library of hundreds of thousands of compounds and identify promising lead compounds.
It helps eliminate poor candidates and focus experiments on those most likely
to bind to the target protein well.
So clearly, improving the accuracy and also the speed of docking methods,
like what we have done in this work,
could accelerate the development of new life-saving drugs.
So Lijun, tell us how your approach builds on
and/or differs from what's been done previously in this field.
Sure, thanks.
So conventional protein-ligand docking methods
usually take a sampling-and-scoring approach.
What that means is,
they first use some sampling method
to generate multiple protein-ligand docking poses as candidates.
And then they use some scoring function
to evaluate these candidates
and choose the best ones.
For example, DiffDock, a very recent work developed by MIT, is a very strong model that uses
a diffusion algorithm to do the sampling in this way.
These sampling-and-scoring methods are accurate, with
good predictions.
But of course,
they are very slow. This is a big limitation, because the sampling process usually
takes a lot of time. Some other methods, such as EquiBind or TankBind, treat
docking prediction as a regression task, which uses deep networks to directly
predict the coordinates of the atoms in the molecule.
Obviously, this kind of method is much faster than the sampling method, but the prediction
accuracy is usually worse.
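To make the contrast concrete, here is a toy Python sketch of the two paradigms. It is not DiffDock, EquiBind, TankBind, or FABind code: a "pose" is reduced to a single 3D point, and the sampler, scoring function, and predictor are hypothetical stand-ins. What it illustrates is the cost structure: sampling-and-scoring needs one scoring call per candidate, while regression needs one model call per complex.

```python
import random
from typing import Callable, List, Tuple

# Toy stand-in for a docking pose: just a ligand centroid (x, y, z).
Pose = Tuple[float, float, float]


def sample_and_score(sampler: Callable[[random.Random], Pose],
                     score: Callable[[Pose], float],
                     n_samples: int,
                     seed: int = 0) -> Tuple[Pose, int]:
    """Sampling-and-scoring: draw n_samples candidate poses, keep the best.

    Returns the best pose and the number of scoring calls made, which
    grows linearly with n_samples.
    """
    rng = random.Random(seed)
    candidates: List[Pose] = [sampler(rng) for _ in range(n_samples)]
    best = max(candidates, key=score)
    return best, len(candidates)


def regress(predictor: Callable[[], Pose]) -> Tuple[Pose, int]:
    """Regression: one forward pass directly outputs the pose coordinates."""
    return predictor(), 1


# Hypothetical toy problem: a "true" pose and a score that prefers poses near it.
TRUE_POSE: Pose = (1.0, 2.0, 3.0)


def toy_score(p: Pose) -> float:
    return -sum((a - b) ** 2 for a, b in zip(p, TRUE_POSE))


def toy_sampler(rng: random.Random) -> Pose:
    return (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-5, 5))


best, n_calls = sample_and_score(toy_sampler, toy_score, n_samples=1000)
pred, n_calls_reg = regress(lambda: (1.2, 1.9, 3.1))  # stand-in for a trained network
print(f"sampling: {n_calls} scoring calls, regression: {n_calls_reg} model call")
```

The trade-off described above falls out of this structure: more samples usually yield a better best-of-N pose but cost proportionally more compute, whereas the regression path is a single call whose accuracy depends entirely on the predictor.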
Therefore, our FABind aims to provide a method that is both fast and accurate for
the docking problem.
FABind keeps its fast prediction by modeling in a regression way.
And we also use novel designs
to improve its prediction accuracy.
So Lijun, let's stay with you for a minute.
Regarding your research strategy on this,
how would you describe your methodology,
and how did you go about conducting this research?
OK, sure.
So when we're talking about the detailed method,
we actually build an end-to-end deep learning framework,
FABind here.
So for the protein-ligand docking,
FABind divides the docking task into a pocket prediction process
and a pose prediction process.
But importantly, we unify these two processes
within a single deep learning model, which
is a novel equivariant graph neural network.
Here the pocket means a local part of the whole protein: the specific amino
acids that can bind to the molecule in structure space.
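For a concrete picture of "pocket": one common way to derive pocket labels from a known complex is a distance cutoff, where a residue belongs to the pocket if it sits close enough to some ligand atom. The sketch below assumes one coordinate per residue and an 8 Å cutoff; both are illustrative choices, not values confirmed for FABind. The hypothetical `pocket_center` helper shows the kind of local reference frame a later pose stage could work in.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def pocket_residues(residue_coords: Sequence[Coord],
                    ligand_coords: Sequence[Coord],
                    cutoff: float = 8.0) -> List[int]:
    """Indices of residues lying within `cutoff` angstroms of any ligand atom.

    Each residue is represented by one coordinate (e.g. its C-alpha atom);
    the 8.0 default is an illustrative assumption.
    """
    return [i for i, r in enumerate(residue_coords)
            if any(math.dist(r, a) <= cutoff for a in ligand_coords)]


def pocket_center(residue_coords: Sequence[Coord], pocket: Sequence[int]) -> Coord:
    """Mean position of the pocket residues."""
    n = len(pocket)
    xs, ys, zs = zip(*(residue_coords[i] for i in pocket))
    return (sum(xs) / n, sum(ys) / n, sum(zs) / n)


residues = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]
pocket = pocket_residues(residues, ligand)
print(pocket, pocket_center(residues, pocket))
```

Here the third residue is 19 Å from the ligand atom, so it falls outside the pocket, and the pocket center is the mean of the two remaining residues.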
Simply speaking, this novel graph neural network is a stack of identical graph
neural network layers.
Each layer is carefully designed by us, and we use the first layer for
pocket prediction and the later layers for pose prediction.
For each layer, we designed several message passing operations.
The first is independent message passing, which updates the information within the protein or the molecule itself.
The second is cross-attention message passing, which updates the information between the whole protein and the whole molecule,
so that each has a global view of the other. The last is interfacial message passing,
which passes information between close nodes of the protein and the molecule.
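The three message passing operations can be illustrated on scalar node features. This is a deliberately simplified sketch: FABind's layers use learned weights, richer features, and equivariant coordinate updates, none of which appear here, and the 5 Å interfacial cutoff is a hypothetical value. What it keeps is the difference in who exchanges information with whom: bonded neighbours within one graph, every node of the other graph, or only spatially close nodes of the other graph.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def independent_mp(h: Sequence[float], edges: Sequence[Tuple[int, int]]) -> List[float]:
    """Within one graph (protein OR ligand): add the mean of bonded neighbours."""
    nbrs: Dict[int, List[int]] = {i: [] for i in range(len(h))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return [h[i] + (sum(h[j] for j in nbrs[i]) / len(nbrs[i]) if nbrs[i] else 0.0)
            for i in range(len(h))]


def cross_attention_mp(h_query: Sequence[float], h_other: Sequence[float]) -> List[float]:
    """Every node attends to ALL nodes of the other graph (global view)."""
    out = []
    for q in h_query:
        logits = [q * k for k in h_other]
        m = max(logits)  # subtract the max for numerical stability
        w = [math.exp(s - m) for s in logits]
        z = sum(w)
        out.append(q + sum(wi / z * k for wi, k in zip(w, h_other)))
    return out


def interfacial_mp(h_a: Sequence[float], x_a: Sequence[Coord],
                   h_b: Sequence[float], x_b: Sequence[Coord],
                   cutoff: float = 5.0) -> List[float]:
    """Each node of graph A aggregates only from spatially close nodes of graph B."""
    out = []
    for h, x in zip(h_a, x_a):
        close = [hb for hb, xb in zip(h_b, x_b) if math.dist(x, xb) <= cutoff]
        out.append(h + (sum(close) / len(close) if close else 0.0))
    return out


print(independent_mp([1.0, 2.0], [(0, 1)]))       # a ligand with one bond
print(cross_attention_mp([0.0], [1.0, 3.0]))      # q = 0 gives equal attention weights
print(interfacial_mp([1.0], [(0.0, 0.0, 0.0)],
                     [10.0, 100.0], [(1.0, 0.0, 0.0), (50.0, 0.0, 0.0)]))
```

In the last call only the protein node 1 Å away contributes; the one 50 Å away is ignored, which is exactly what distinguishes interfacial from cross-attention message passing.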
Besides these, there are some smaller techniques we use to get an accurate docking model.
For example, we use a scheduled training technique to bridge the gap between the training and
inference stages. We also combine direct coordinate prediction
and distance map refinement
as our optimization method.
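Neither technique is spelled out in the conversation, so the following is a hedged sketch under stated assumptions. For scheduled training, it assumes the train/inference gap is that training can use the true pocket while inference only has a predicted one, and it uses a hypothetical linear schedule that shifts from the former to the latter. For the distance map, it assumes refinement means nudging the directly predicted ligand coordinates by gradient descent until their distances to pocket nodes match a target distance map; the learning rate and step count are arbitrary.

```python
import math
import random
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def predicted_pocket_prob(step: int, total_steps: int) -> float:
    """Hypothetical linear schedule: 0.0 at the start, 1.0 by the end of training."""
    return min(1.0, max(0.0, step / total_steps))


def choose_pocket(step: int, total_steps: int, true_pocket, predicted_pocket,
                  rng: random.Random):
    """Early on, mostly train with the true pocket; later, mostly with the predicted one."""
    if rng.random() < predicted_pocket_prob(step, total_steps):
        return predicted_pocket
    return true_pocket


def distance_map_error(lig: Sequence[Coord], pocket: Sequence[Coord],
                       target: Sequence[Sequence[float]]) -> float:
    """Sum of squared differences between current and target ligand-pocket distances."""
    return sum((math.dist(x, p) - target[i][j]) ** 2
               for i, x in enumerate(lig) for j, p in enumerate(pocket))


def refine(lig: Sequence[Coord], pocket: Sequence[Coord],
           target: Sequence[Sequence[float]],
           lr: float = 0.02, steps: int = 300) -> List[Coord]:
    """Gradient descent on distance_map_error, starting from direct coordinate predictions."""
    xs = [list(x) for x in lig]
    for _ in range(steps):
        for i, x in enumerate(xs):
            grad = [0.0, 0.0, 0.0]
            for j, p in enumerate(pocket):
                r = max(math.dist(x, p), 1e-8)  # avoid division by zero
                coef = 2.0 * (r - target[i][j]) / r
                for k in range(3):
                    grad[k] += coef * (x[k] - p[k])
            for k in range(3):
                x[k] -= lr * grad[k]
    return [tuple(x) for x in xs]


pocket = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
true_lig = [(1.0, 1.0, 1.0)]
target = [[math.dist(x, p) for p in pocket] for x in true_lig]  # stand-in for a predicted map
direct = [(2.0, 1.5, 0.5)]  # stand-in for a direct coordinate prediction
refined = refine(direct, pocket, target)
print(distance_map_error(direct, pocket, target), distance_map_error(refined, pocket, target))
```

Combining the two predictions this way lets the fast direct output serve as the starting point, while the distance map supplies pairwise geometric constraints that a single set of coordinates may violate.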
Well, listen,
I want to stay with you even more
because you're talking about the technical specifications
of your research methodology.
Let's talk about results.
What were your major findings on the performance of FABind?
Yeah, the results are very promising.
So first, we need to care about the docking performance, which is the accuracy of the
docking pose prediction.
We compare our FABind to different baselines, such as EquiBind, TankBind, and also, as I mentioned
before, a very strong model, DiffDock, developed by MIT.
The results show that our docking prediction accuracy is very good.
It achieves very competitive performance compared to DiffDock.
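The conversation doesn't say how docking accuracy is measured. A widely used convention in this literature is the ligand RMSD between predicted and crystal poses, with the share of complexes under 2 Å reported as accuracy. The sketch below assumes that convention and a fixed atom correspondence (no symmetry correction or re-alignment).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation over matched ligand atoms (same atom order assumed)."""
    if len(pred) != len(ref) or not ref:
        raise ValueError("pred and ref must be non-empty and the same length")
    return math.sqrt(sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref)) / len(ref))


def success_rate(rmsds: Sequence[float], threshold: float = 2.0) -> float:
    """Fraction of complexes whose predicted pose is within `threshold` angstroms."""
    return sum(r < threshold for r in rmsds) / len(rmsds)


print(ligand_rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))  # one atom, 5 Å off
print(success_rate([0.5, 1.5, 3.0, 6.0]))                # two of four under 2 Å
```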
But specifically, we need to emphasize that speed is very important.
Compared to DiffDock, FABind is about 170 times faster.
So this is very promising.
Besides, the interesting thing is that we found FABind
can achieve very, very strong performance on unseen protein targets,
meaning protein structures that we never saw during training.
On these, FABind achieves
significantly better performance than DiffDock, with about 10% to 40% accuracy improvement.
This performance demonstrates that the practical effectiveness of our work is very promising,
since such new proteins are the most important ones we need to care about for a new disease.
Tao, this is all fascinating,
but talk about real-world significance for this work.
Who does it help most and how?
Yeah, as Lijun has introduced,
FABind significantly outperforms earlier methods in terms of speed
while maintaining competitive accuracy.
This fast prediction capability is extremely important in real-world applications where
high-throughput virtual screening for compound selection is often required for drug discovery.
So an efficient virtual screening process can significantly accelerate the drug discovery
process.
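In a virtual screening campaign, the docking model is run once per compound and only a short list moves on to experiments. The sketch below shows that ranking step plus the back-of-the-envelope arithmetic for why per-complex speed matters. The scores, compound IDs, library size, and 17-second baseline timing are made-up placeholders, not FABind measurements; only the roughly 170x speedup factor comes from the conversation.

```python
import heapq
from typing import Dict, List


def top_candidates(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k compound IDs with the highest predicted binding score."""
    return [cid for cid, _ in heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])]


def screening_hours(n_compounds: int, seconds_per_complex: float) -> float:
    """Wall-clock hours to dock a library serially at a fixed per-complex cost."""
    return n_compounds * seconds_per_complex / 3600.0


# Hypothetical predicted scores (higher = stronger predicted binding).
scores = {"cmpd-A": 6.2, "cmpd-B": 8.1, "cmpd-C": 5.0}
print(top_candidates(scores, 2))

# Hypothetical timing: 100,000 compounds, a baseline of 17 s per complex,
# and a model about 170x faster (17 / 170 = 0.1 s per complex).
baseline = screening_hours(100_000, 17.0)
faster = screening_hours(100_000, 17.0 / 170)
print(round(baseline, 1), round(faster, 2))
```

Under these placeholder numbers, a screen that would take weeks of serial compute shrinks to a few hours, which is the practical sense in which a faster docking model accelerates the discovery pipeline.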
Furthermore, our method demonstrates great performance on unseen or new
proteins, which indicates that FABind possesses a strong generalization ability. This is
very important. Consider the case of SARS-CoV-2, for example, where our knowledge of the protein
target was very limited at the beginning of the pandemic.
So if we have a robust docking model that can generalize to new proteins,
we could conduct a large-scale virtual screening and confidently select potentially effective ligands.
This would greatly speed up the development of new treatments.
So downstream from the drug discovery science,
benefits would accrue to people who have diseases
and need treatment for those things.
Yes, exactly.
Okay. Well, Tao, let's get an elevator pitch in here,
sort of one takeaway, a golden nugget
that you'd like our listeners to take away from this work.
If there was one thing you wanted them to take away from the work,
what would it be?
Yeah, thanks for a great question.
So I think the one-sentence takeaway is that
if researchers are using molecular docking
and seeking an AI-based approach,
our FABind method should definitely be on their
list for consideration, especially given its exceptional predictive accuracy and
high computational efficiency.
Finally, Tao, what are the big questions
and problems that remain in this area, and what's next on your research agenda?
Actually, there are multiple unaddressed questions along this direction.
So I think those are all opportunities for further exploration.
So here I just give three examples.
First, our method currently tackles rigid docking,
where the target protein structure is assumed to be fixed,
leaving only the ligand structure to be predicted.
However, in a more realistic scenario, the protein is dynamic during molecular binding,
so therefore, exploring flexible docking becomes an essential aspect.
Second, our approach assumes that the target protein has only one binding pocket.
In reality, a target protein may have multiple binding pockets.
This situation is more challenging,
and how to address it
also calls for exploration.
Third, in the field of drug design,
sometimes we need to find a drug compound that can bind
with multiple target proteins.
In this work, we only consider a single target protein.
So accurately predicting docking for multiple target proteins poses a great challenge.
Well, Tao Qin and Lijun Wu, thank you for joining us today.
And to our listeners, thanks for tuning in.
If you're interested in learning more about this work,
you can find a link to the paper at aka.ms/abstracts,
or you can find it on arXiv.
See you next time on Abstracts. Thank you.