Microsoft Research Podcast - Abstracts: November 14, 2024
Episode Date: November 14, 2024
The efficient simulation of molecules has the potential to change how the world understands biological systems and designs new drugs and biomaterials. Tong Wang discusses AI2BMD, an AI-based system designed to simulate large biomolecules with speed and accuracy.
Transcript
Welcome to Abstracts,
a Microsoft Research podcast that puts
the spotlight on world-class research in brief.
In this series, members of the research community at Microsoft
give us a quick snapshot, or
a podcast abstract, of their new and noteworthy papers.
I'm Bonnie Kruft, Partner and Deputy Director of Microsoft Research AI for Science
and your host for today. Joining me is Tong Wang, a Senior Researcher at Microsoft.
Tong is the lead author of a paper called "Ab initio characterization of protein molecular
dynamics with AI2BMD," which has just been published by the top scientific journal Nature.
Tong, thanks so much for joining us today on Abstracts.
Thank you, Bonnie.
Microsoft Research is one of the earliest institutions to apply AI in biomolecular simulation research. Why did the AI for Science team choose this direction? And with this work
specifically, AI2BMD, what problem are you and your co-authors addressing, and why should
people know about it?
So, as Richard Feynman famously said, everything that living things do
can be understood in terms of the jiggling and wiggling of atoms. Studying the mechanisms
behind biological processes and developing biomaterials and drugs requires a computational approach that can
accurately characterize the dynamic motions of biomolecules. When we review the computational
research on biomolecular structure, we get two key messages. First, in recent years,
predicting the crystal, or static, protein structures with methods
powered by AI has achieved great success and won the Nobel Prize in Chemistry just
last month.
However, characterizing the dynamic structures of proteins is more meaningful for the biology,
drug, and medicine fields, but it's much more challenging. Second, molecular dynamics
simulation, or MD, is one of the most widely used approaches to study protein dynamics, and it can be
roughly divided into classical molecular dynamics simulation and quantum molecular dynamics simulation.
Both approaches have been developed for more than half a century
and have won the Nobel Prize. Classical MD is fast but less accurate, while quantum MD is very
accurate but computationally prohibitive for protein studies. However, we need both
accuracy and efficiency to detect the biomechanisms. Thus, applying AI in
biomolecular simulation can become the third way, achieving both ab initio, or first-principles,
accuracy and high efficiency. In the winter of 2020, we foresaw the trend that AI could make a difference in biomolecular simulations.
Thus, we chose this direction.
It took four years from the idea to the launch of AI2BMD, and there were many important milestones along the way.
First, talk about how your work builds on and/or differs from what's been done previously in this field,
and then give our audience a sense of the key moments and challenges along the AI2BMD research journey.
First, I'd like to say applying AI in biomolecular simulation
is a novel research field. For AI-powered MD simulation of large biomolecules,
there is no existing dataset, no well-designed machine learning model for the interactions between the atoms and molecules,
no clear technical roadmap, and no mature AI-based simulation system. So we face various new
challenges every day. Second, there are some other works exploring this area at the same time. I think
a significant difference between AI2BMD and other works
is that other works require generating new data and training the deep learning models for any new
proteins. So it is a protein-specific solution. In contrast, AI2BMD proposes a generalizable
solution for a wide range of proteins. To achieve it, as you mentioned,
there are some key milestones during the four-year journey. The first one is that we proposed
the generalizable protein fragmentation approach that divides proteins into the commonly used 20 kinds of dipeptides. Thus, we don't need to generate data
for various proteins. Instead, we only need to sample the conformational space of such
dipeptides. So, we built the protein unit dataset that contains about 20 million samples with ab initio accuracy. Then we proposed ViSNet, the graph
neural network for molecular geometry modeling, as the machine learning potential for AI2BMD.
Furthermore, we designed the AI2BMD simulation system by efficiently leveraging CPUs and GPUs
at the same time, achieving a simulation speedup of hundreds of times
compared with one year before, and accelerating the AI-driven simulation to only 10 to 100
milliseconds per simulation step.
Finally, we examined AI2BMD on energy, force, free energy, J-coupling, and many kinds of property calculations for tens of proteins, and also applied AI2BMD in the drug development competition.
All of this was done by a great team with science and engineering expertise and with the great leadership and support from the AI for Science lab.
Tell us about how you conducted this research. What was your methodology?
As we are exploring an interdisciplinary research topic, our team consists of experts and students
with biology, chemistry, physics, math, computer science, and engineering backgrounds.
Teamwork across different expertise is key to AI2BMD research.
Furthermore, we collaborated and consulted with many senior experts in the molecular dynamics simulation field,
and they provided various insightful and constructive suggestions
to our research. Another aspect of the methodology I'd like to emphasize is learning from negative
results. Negative results happened most of the time during the study. What we do is constantly analyze the negative results and adjust our algorithm
and model accordingly. There's no perfect solution for a research topic, and we are always on the way.
AI2BMD got some upgrades this year, and as we mentioned at the top of the episode,
the work around the latest system was published in the scientific journal Nature.
So tell us, Tong, what is new about the latest AI2BMD system?
Good question.
We posted a preliminary version of the AI2BMD manuscript on bioRxiv last summer.
I'd like to share three important upgrades from the past one and a half years.
The first is a speedup of hundreds of times in simulation for AI2BMD,
which has become one of the fastest AI-driven MD simulation systems
and can now perform much longer simulations than before.
The second is that AI2BMD was applied to many protein property calculations, such as enthalpy, heat capacity, folding free energy, pKa, and so on.
Furthermore, we have been closely collaborating with the Global Health Drug Discovery Institute, GHDDI, a nonprofit research institute funded and supported
by the Gates Foundation, to leverage AI2BMD and other AI capabilities to accelerate the
drug discovery processes.
What significance does AI2BMD hold for research in both biology and AI?
And also, what impact does it have outside
of the lab in terms of societal and individual benefits?
Good question. For biology, AI2BMD
provides a much more accurate approach than those used in the past several decades to simulate protein dynamic motions and to study bioactivities. For AI,
AI2BMD proves AI can make a big difference in the study of dynamic protein structures, beyond AI for
static protein structure prediction. Spurred by AI2BMD and other works, I can foresee a coming age of AI-driven
biomolecular simulation, providing binding free energy calculation with quantum simulation
accuracy for the complex of a drug and its target protein in drug discovery, detecting more flexible biomolecular conformational changes that
molecular mechanics cannot capture, and opening more opportunities for enzyme engineering
and vaccine and antibody design.
AI is having a profound influence on the speed and breadth
of scientific discovery, and we're excited to see more and more talented people joining us
in this space. What do you want our audience to take away from this work, particularly those
already working in the AI for Science space or looking to enter it?
Good question. I'd like to
share three points from my research experience. First is to aim high. Exploring a disruptive research topic is better than doing 10 incremental works.
Over the years of research, our organization has always encouraged us to do the big things.
Second is persistence. I remember a computer scientist once said that about 90% of the time during research is failure and frustration.
The rate is even higher when exploring a new research direction. In the AI2BMD study,
when we suffered from research bottlenecks that could not be tackled for several months,
when we received critical comments from reviewers, and when some team members
wanted to give up and leave, I always encouraged everyone to persist, and we would make it.
More importantly, the foundation of persistence is to ensure your research direction is meaningful
and constantly adjust your methodology from
failures and critical feedback.
The third one is real-world applications. Our aim is to leverage AI for advancing science.
Proposing scientific problems is the first step, then developing AI tools and evaluating them on benchmarks, and,
more importantly, examining their usefulness in real-world applications and further developing
your AI algorithms.
In this way, you can close the loop of AI for science research.
And finally, Tong, what unanswered questions or unsolved
problems remain in this area, and what's next on the agenda for the AI2BMD team?
Well, I think AI2BMD is a starting point for the coming age of AI-driven MD for biomolecules.
There are lots of new scientific questions and challenges coming out in this new field.
For example, how to expand the simulated molecules from proteins to other kinds of biomolecules,
how to describe the biochemical reactions during the simulations,
how to further improve the simulation efficiency and robustness,
and how to apply it for more real-world scenarios.
We warmly welcome people from both academia and industry to work together with
us and make joint efforts to push the frontier of this new field forward.
Well, Tong, thank you for joining us today.
And to our listeners, thanks for tuning in. If you want
to read the full paper on AI2BMD, you can find a link at aka.ms/abstracts,
or you can read it on the Nature website. See you next time on Abstracts!