Microsoft Research Podcast - Abstracts: November 14, 2024

Episode Date: November 14, 2024

The efficient simulation of molecules has the potential to change how the world understands biological systems and designs new drugs and biomaterials. Tong Wang discusses AI2BMD, an AI-based system designed to simulate large biomolecules with speed and accuracy.

Transcript
Starting point is 00:00:00 Welcome to Abstracts, a Microsoft Research podcast that puts the spotlight on world-class research in brief. In this series, members of the research community at Microsoft give us a quick snapshot, or a podcast abstract, of their new and noteworthy papers. I'm Bonnie Kruft, Partner and Deputy Director of Microsoft Research AI for Science
Starting point is 00:00:27 and your host for today. Joining me is Tong Wang, a Senior Researcher at Microsoft. Tong is the lead author of a paper called "Ab initio characterization of protein molecular dynamics with AI2BMD," which has just been published by the top scientific journal Nature. Tong, thanks so much for joining us today on Abstracts. Thank you, Bonnie. Microsoft Research is one of the earliest institutions to apply AI in biomolecular simulation research. Why did the AI for Science team choose this direction? And with this work specifically, AI2BMD, what problem are you and your co-authors addressing and why should
Starting point is 00:01:05 people know about it? So, as Richard Feynman famously said, everything that living things do can be understood in terms of the jigglings and wigglings of atoms. Studying the mechanisms behind biological processes and developing biomaterials and drugs require a computational approach that can accurately characterize the dynamic motions of biomolecules. When we review the computational research on biomolecular structure, we can get two key messages. First, in recent years, predicting the crystal, or static, protein structures with methods powered by AI has achieved great success and just won the Nobel Prize in Chemistry last month.
Starting point is 00:01:54 However, characterizing the dynamic structures of proteins is more meaningful for the biology, drug, and medicine fields, but it's much more challenging. Second, molecular dynamics simulation, or MD, is one of the most widely used approaches to study protein dynamics, and it can be roughly divided into classical molecular dynamics simulation and quantum molecular dynamics simulation. Both approaches have been developed for more than half a century and have won Nobel Prizes. Classical MD is fast but less accurate, while quantum MD is very accurate but computationally prohibitive for protein studies. However, we need both the accuracy and the efficiency to detect the biomechanisms. Thus, applying AI in
Starting point is 00:02:48 biomolecular simulation can become the third way to achieve both ab initio, or first-principles, accuracy and high efficiency. In the winter of 2020, we foresaw the trend that AI can make a difference in biomolecular simulations. Thus, we chose this direction. It took four years from the idea to the launch of AI2BMD, and there were many important milestones along the way. First, talk about how your work builds on and/or differs from what's been done previously in this field, and then give our audience a sense of the key moments and challenges along the AI2BMD research journey. First, I'd like to say applying AI in biomolecular simulation is a novel research field. For AI-powered MD simulation of large biomolecules, there is no existing dataset, no well-designed machine learning model for the interactions between the atoms and molecules.
Starting point is 00:03:47 No clear technical roadmap. No mature AI-based simulation system. So we face various new challenges every day. Second, there are some other works exploring this area at the same time. I think a significant difference between AI2BMD and other works is that other works require generating new data and training deep learning models for each new protein. So it is a protein-specific solution. In contrast, AI2BMD proposes a generalizable solution for a wide range of proteins. To achieve it, as you mentioned, there were some key milestones during the four-year journey. The first one is that we proposed a generalizable protein fragmentation approach that divides proteins into the commonly used 20 kinds of dipeptides. Thus, we don't need to generate data
Starting point is 00:04:48 for various proteins. Instead, we only need to sample the conformational space of such dipeptides. So, we built the protein unit dataset, which contains about 20 million samples with ab initio accuracy. Then we proposed ViSNet, the graph neural network for molecular geometry modeling, as the machine learning potential for AI2BMD. Furthermore, we designed the AI2BMD simulation system to efficiently leverage CPUs and GPUs at the same time, achieving a simulation speedup of hundreds of times over one year before and running the AI-driven simulation at only 10 to 100 milliseconds per simulation step. Finally, we examined AI2BMD on energy, force, free energy, J coupling, and many kinds of property calculations for tens of proteins, and also applied AI2BMD in the drug development competition.
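The fragmentation idea described above can be illustrated with a minimal sketch. It is not the AI2BMD implementation: the function names are hypothetical, the input is a one-letter sequence rather than 3D atomic coordinates, and the capping of each unit and the interactions between units are left out. It only shows why proteins of any length or composition reduce to the same small vocabulary of unit types, so that data sampled for those types can be reused across proteins.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def fragment_into_units(sequence):
    """Split a sequence into one unit per residue: (position, residue type).

    Hypothetical helper for illustration only. A real system would work on
    atomic coordinates and cap each unit; this tracks only the unit type.
    """
    units = []
    for position, residue in enumerate(sequence.upper()):
        if residue not in STANDARD_AMINO_ACIDS:
            raise ValueError(f"non-standard residue {residue!r} at {position}")
        units.append((position, residue))
    return units


def unit_type_counts(sequence):
    """Count how many units of each type a protein decomposes into."""
    return Counter(residue for _, residue in fragment_into_units(sequence))


if __name__ == "__main__":
    # Two made-up sequences with different composition (assumed examples).
    for name, seq in [("toy_a", "MKTAYIAKQR"), ("toy_b", "GSHMLEDPVW")]:
        counts = unit_type_counts(seq)
        print(name, len(seq), "residues ->", len(counts), "unit types")
```

However long the input sequence is, the set of unit types never grows beyond 20, which is the property that makes a single shared training set plausible.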
Starting point is 00:05:57 All of this was done by a great team with science and engineering expertise, with great leadership and support from the AI for Science lab. Tell us about how you conducted this research. What was your methodology? Since we are exploring an interdisciplinary research topic, our team consists of experts and students with biology, chemistry, physics, math, computer science, and engineering backgrounds. Teamwork across different expertise is key to AI2BMD research. Furthermore, we collaborated and consulted with many senior experts in the molecular dynamics simulation field, and they provided various insightful and constructive suggestions for our research. Another aspect of the methodology I'd like to emphasize is learning from negative
Starting point is 00:06:55 results. Negative results happened most of the time during the study. What we do is constantly analyze the negative results and adjust our algorithm and model accordingly. There's no perfect solution for a research topic, and we are always on the way. AI2BMD got some upgrades this year, and as we mentioned at the top of the episode, the work around the latest system was published in the scientific journal Nature. So tell us, Tong, what is new about the latest AI2BMD system? Good question. We posted a preliminary version of the AI2BMD manuscript on bioRxiv last summer. I'd like to share three important upgrades over the past one and a half years.
Starting point is 00:07:45 The first is a simulation speedup of hundreds of times for AI2BMD, which makes it one of the fastest AI-driven MD simulation systems and makes much longer simulations possible than before. The second is that AI2BMD was applied to many protein property calculations, such as enthalpy, heat capacity, folding free energy, pKa, and so on. Furthermore, we have been closely collaborating with the Global Health Drug Discovery Institute, GHDDI, a non-profit research institute funded and supported by the Gates Foundation, to leverage AI2BMD and other AI capabilities to accelerate the drug discovery process. What significance does AI2BMD hold for research in both biology and AI? And also, what impact does it have outside
Starting point is 00:08:46 of the lab in terms of societal and individual benefits? Good question. For biology, AI2BMD provides a much more accurate approach than those used in the past several decades to simulate protein dynamic motions and to study bioactivities. For AI, AI2BMD proves that AI can make a big difference in the study of dynamic protein structures, beyond AI for static protein structure prediction. Spurred by AI2BMD and other works, I can foresee a coming age of AI-driven biomolecular simulation: providing binding free energy calculation with quantum simulation accuracy for the complex of a drug and its target protein in drug discovery, detecting more flexible biomolecular conformational changes that molecular mechanics cannot capture, and opening more opportunities for enzyme engineering and vaccine and antibody design. AI is having a profound influence on the speed and breadth
Starting point is 00:10:00 of scientific discovery, and we're excited to see more and more talented people joining us in this space. What do you want our audience to take away from this work, particularly those already working in the AI for Science space or looking to enter it? Good question. I'd like to share three points from my research experience. The first is to aim high. Exploring a disruptive research topic is better than doing 10 incremental works. In the years of research, our organization has always encouraged us to do the big things. The second is persistence. I remember a computer scientist once said that about 90 percent of the time during research is failure and frustration. The rate is even higher when exploring a new research direction. In the AI2BMD study, when we suffered from research bottlenecks that could not be tackled for several months,
Starting point is 00:11:01 when we received critical comments from reviewers, when some team members wanted to give up and leave, I always encouraged everyone to persist, and we will make it. More importantly, the foundation of persistence is to ensure your research direction is meaningful and to constantly adjust your methodology based on failures and critical feedback. The third one is real-world applications. Our aim is to leverage AI for advancing science. Proposing scientific problems is the first step, then developing AI tools and evaluating them on benchmarks, and, more importantly, examining their usefulness in real-world applications, and further developing
Starting point is 00:11:54 your AI algorithms. In this way, you can close the loop of AI for science research. And finally, Tong, what unanswered questions or unsolved problems remain in this area, and what's next on the agenda for the AI2BMD team? Well, I think AI2BMD is a starting point for the coming age of AI-driven MD for biomolecules. There are lots of new scientific questions and challenges coming out in this new field. For example, how to expand the simulated molecules from proteins to other kinds of biomolecules, how to describe the biochemical reactions during the simulations,
Starting point is 00:12:38 how to further improve the simulation efficiency and robustness, and how to apply it to more real-world scenarios. We warmly welcome anyone from academia and industry to work with us in a joint effort to push the frontier of this new field forward. Well, Tong, thank you for joining us today. And to our listeners, thanks for tuning in. If you want to read the full paper on AI2BMD, you can find a link at aka.ms/abstracts, or you can read it on the Nature website. See you next time on Abstracts. Thank you.
