Microsoft Research Podcast - Ideas: AI for materials discovery with Tian Xie and Ziheng Lu

Starting point is 00:00:00 So the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018 when I was giving a talk about property prediction models, one of the first questions people asked is, instead of going from material structure to properties, can you kind of inversely generate materials directly from their property conditions? So in a way, this is kind of like a dream for material scientists because the end goal is really about finding materials whose properties satisfy your application. Previously a lot of people

Starting point is 00:00:40 are using these atomistic simulators and these generative models alone. But if you think about it, now that we have these two foundation models together, it really can make things different. You have a very good idea generator and you have a very good goalkeeper and you put them together. They form a loop and now you can use this loop to design your materials really quickly. You're listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code.

Starting point is 00:01:15 In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward. I'm your guest host, Lindsay Kalter. Today, I'm talking to Microsoft Principal Research Manager Tian Xie and Microsoft Principal Researcher Ziheng Lu. Tian's doing fascinating work with MatterGen, an AI tool for generating new materials guided by specific design requirements. Ziheng is one of the visionaries behind MatterSim, which puts those new materials to the test through advanced simulations. Together, they're redefining what's possible in material science. Tian and Ziheng, welcome to the podcast. Very excited to be here.

Starting point is 00:01:58 Thanks, Lindsay. Very excited. Before we dig into the specifics of MatterGen and MatterSim, let's give our audience a sense of how you as researchers arrived at this moment. Material science, especially at the intersection with computer science, is such a cutting-edge and transformative field. What first drew each of you to this space? What, if any, moment or experience made you realize this was where you wanted to innovate? Tian, do you want to start? So I started working on AI for materials back in 2015 when I started my PhD.

Starting point is 00:02:33 So I come as a chemist and material scientist, but I was kind of figuring out what I wanted to do during my PhD. So there is actually one moment that really drove me into the field. That was AlphaGo. AlphaGo came out in 2016, when it was able to beat the world champion in Go. I was extremely impressed by that because I kind of learned how to play Go in my childhood. I know how hard it is and how much effort those professional Go players have spent in learning Go. So I kind of have the feeling that if AI can surpass

Starting point is 00:03:16 the world-leading Go players, one day it will be able to surpass material scientists in their ability to design novel materials. So that's why I ended up deciding to focus my entire PhD on working on AI for materials. And I have been working on that since then. So it was actually very interesting because it was a very small field back then. And it's great to see how much progress has been made in the past 10 years and how much bigger a field it is now compared with 10 years ago. That's very interesting, Tian. So actually I think I started like two years before you as a PhD student. I was trained as a computational material scientist solely, not

Starting point is 00:04:06 really an AI expert. But at that time, computational material science did not really work that well. It worked, but not that well. So after like two or three years, I went back to experiments for like another two or three years because I think experiments are always the gold standard, right? And I worked on experiments for a few years. And then about three years ago, I went back to this field of computation, especially because of AI. At that time, I think GPT and these large models, the ones we're currently using, were not there, but we already had their prior forms like BERT.

Starting point is 00:04:48 So we see the very large potential of AI. We know that these large AIs might work. So one idea is really to use AI to learn the entire space of materials and really grasp the physics there. And that really drove me to this field. And that's why I'm here working on this. We're going to get into what MatterGen and MatterSim mean for material science, the potential, the challenges, and open questions. But first, give us an overview of what each of

Starting point is 00:05:20 these tools are, how they do what they do, and as this show is about big ideas, the idea driving the work. Ziheng, let's have you go first. So MatterSim is a tool to do in silico characterization of materials. If you think about working on materials, you have several steps. You first need to synthesize a material, and then you need to characterize it. Basically, you need to know what properties, what structures, whatever stuff about these materials. So for MatterSim, what we want to do is to really move the characterization process, a lot of these processes, into computation. So the idea behind MatterSim is to really learn the fundamentals of physics. So we learn the energies and forces and stresses

Starting point is 00:06:07 from the atomic structures and the charge densities, all of these things. And then with these, we can really simulate any sort of material using our computational machines. And then with these, we can really characterize a lot of these materials' properties using our computers. That is very fast, much faster than doing experiments, so that we can accelerate materials design. So just in a word, basically you input your material into your computer, a structure

Starting point is 00:06:41 into your computer, and MatterSim will try to simulate this material like what you do in a furnace or in your XRD, and then you get your properties out of that. And a lot of times, it's much faster than doing experiments. All right, thank you very much. Tian, why don't you tell us about MatterGen? Yeah, thank you. So actually, Ziheng, having you start by explaining MatterSim

Starting point is 00:07:03 makes it much easier for me to explain MatterGen. So MatterGen actually represents a new way to design materials with generative AI. Material discovery is like finding needles in a haystack. You're looking for a material with a very specific property for a material application. For example, like finding a room-temperature superconductor or finding a solid that can conduct lithium ions very well inside a battery. So it's like finding one very specific material from a million candidates. So the conventional way of doing material discovery is via screening,

Starting point is 00:07:57 where you're going to go over millions of candidates to find the one that you're looking for, and MatterSim is able to significantly accelerate that process by making the simulation much faster. But it's still very inefficient because you need to go through these million candidates, right? So with MatterGen, you can kind of directly generate materials given the prompts of the design requirements for the application. So this means that you can discover useful materials much more efficiently, and it also allows you to explore a much larger space beyond the set of known materials. Thank you, Tian. Can you tell us a little bit about how MatterGen and MatterSim work together? So you can really think about MatterSim and MatterGen as accelerating different parts of the material discovery process. MatterSim is trying to accelerate the simulation of material properties, while MatterGen is trying to accelerate the search for novel material

Starting point is 00:08:54 candidates. It means that they can really work together as a flywheel and you can compound the acceleration from both models. They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems. So we're really looking forward to seeing how they can work together iteratively as a tool to design novel materials for a broad range of applications. I think that's a very good general introduction of how they work together. I think I can provide an example of how they really fit together.

Starting point is 00:09:31 Say you want a material with a specific bulk modulus or lithium-ion conductivity or thermal conductivity for your CPU chips. So basically what you want to do is start with a pool of material structures, like some structures from a database, and then you compute or characterize your wanted property for that stack of materials. And then what you do is, you've got these property and structure pairs, and you input these pairs into MatterGen, and MatterGen will be able to give you a lot more of these structures that are very likely to be real. But the number will be very large. For example, for the bulk modulus, I don't remember the number we generated in our work. Was it, like, thousands, tens of thousands?

Starting point is 00:10:25 Thousands, tens of thousands. Yeah, that would be a very large pool, even with MatterGen. So then the next step would be, how would you like to screen that? You cannot really just send all of those structures to a lab to synthesize. It's too much, right?
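The funnel being described in this exchange (generate a large pool of candidates conditioned on a target property, then narrow it computationally before anything goes to a lab) can be pictured with a minimal Python sketch. Everything here is a hypothetical stand-in: `generate_candidates` and `simulate_property` are toy functions invented for illustration, not the MatterGen or MatterSim interfaces, and the numbers are random.

```python
import random

TARGET_GPA = 200.0  # bulk-modulus target, borrowed from the example in this episode


def generate_candidates(target, n, rng):
    """Hypothetical stand-in for a property-conditioned generator.

    Returns n candidates whose hidden property is scattered around the target.
    """
    return [
        {"id": f"cand-{i}", "hidden_property": rng.gauss(target, 40.0)}
        for i in range(n)
    ]


def simulate_property(candidate, rng):
    """Hypothetical stand-in for a simulator: a noisy estimate of the property."""
    return candidate["hidden_property"] + rng.gauss(0.0, 5.0)


def screen(candidates, target, top_k, rng):
    """Rank candidates by how close the simulated property is to the target."""
    scored = [(abs(simulate_property(c, rng) - target), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0])  # sort on the deviation only
    return [c for _, c in scored[:top_k]]


def mean_abs_deviation(cands):
    """Average distance of the hidden property from the target."""
    return sum(abs(c["hidden_property"] - TARGET_GPA) for c in cands) / len(cands)


rng = random.Random(0)
pool = generate_candidates(TARGET_GPA, 5000, rng)   # "thousands" of ideas
shortlist = screen(pool, TARGET_GPA, 10, rng)       # a handful for the lab

print(len(pool), "generated ->", len(shortlist), "shortlisted")
```

The point of the sketch is only the shape of the workflow: the expensive step (synthesis) sees ten candidates rather than five thousand, and the shortlist is chosen using simulated values rather than the hidden ones.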

Starting point is 00:10:41 That's when MatterSim comes in again. MatterSim comes in and screens all those structures and sees which ones are the most likely to be synthesized and which ones have the property closest to what you wanted. And then after screening, you probably get five or ten top candidates, and then you send them to a lab. Boom, everything goes down. That's it. I'm wondering if there's any prior research or advancements that you drew from in creating MatterGen and MatterSim. Were there any specific breakthroughs that influenced your approaches at all? Thanks, Lindsay. I think I will take that question first. So, interestingly, for MatterSim, a very fundamental idea was drawn from Chi Chen,

Starting point is 00:11:30 who was a previous lab mate of mine and now also works for Microsoft at Microsoft Quantum. He made this fantastic model named M3GNet, which is a prior form of a lot of these large-scale models for atomistic simulations. That model, M3GNet, actually resolves the near-ground-state prediction problem. I mean, the near-ground-state problem sounds like a fancy but not realistic term, but what that actually means is that it can simulate materials at near zero Kelvin, so basically at very low temperatures. So at that time we were thinking,

Starting point is 00:12:13 since the models are now able to simulate materials at their near ground states, which is not a very large space, and if you look at other larger models like GPT, those models are large enough to simulate the entire human language, then it should be possible to extend the capability of such prior models to a very large space, to learn the entire space of materials. I mean, the entire space really means the entire periodic table and all the temperatures and pressures people can actually reach. Yeah, I still remember a lot of the amazing work from Chi Chen back when we were working on property prediction models. The problem of generating materials from properties is actually a pretty old one.

Starting point is 00:13:16 I still remember back in 2018 when I was working on CGCNN and giving a talk about property prediction models, one of the first questions people asked is, okay, can you invert this process? Instead of going from material structure to properties, can you kind of inversely generate materials directly from their property conditions? So in a way, this is kind of like a dream for material scientists. Some people even call it the holy grail, because the end goal is really about finding materials

Starting point is 00:13:54 whose properties satisfy your application. So I've been kind of thinking about this problem for a while, and there has also been a lot of work over the past few years in the community to build generative models for materials. A lot of people tried before, like, 2020, using ideas like VAEs or GANs, but it's hard to represent materials in this type of generative model architecture, and many of those models generated relatively poor candidates. So I thought it was a hard problem. I had kind of known it for a while, but there were no good solutions back then. So I started to focus more on this problem during my postdoc, when I studied that in 2020 and kept working on that in 2021. At the beginning, I wasn't really sure exactly what approach to take because it's kind of an open question, and I really tried a lot of random ideas. So one day, actually, in my group back then with Tommi Jaakkola and Regina Barzilay at MIT

Starting point is 00:15:02 CSAIL, we kind of got to know this method called the diffusion model. It was a very early stage of diffusion models back then, but they had already begun to show very promising signs, kind of achieving state of the art in many problems like 3D point cloud generation and 3D molecular conformer generation. So the work that really inspired me a lot is two works on molecular conformer generation. One is ConfGF and one is GeoDiff.

Starting point is 00:15:37 So they kind of inspired me to focus more on diffusion models, and that actually led to CDVAE. So it's interesting that we kind of spent like a couple of weeks trying all these diffusion ideas. And without that much work, it actually worked pretty much out of the box. And at that time, CDVAE achieved much better performance than any previous model in materials generation. And we were super happy with that. So after CDVAE, I joined Microsoft, now working with more people together on this problem

Starting point is 00:16:20 of a generative model for materials. So we kind of know what the limitations of CDVAE are. It can do unconditional material generation well, meaning it can generate novel material structures, but it is very hard to use CDVAE to do property-guided generation. So basically it uses an architecture called a variational autoencoder, where you have a latent space. So the way that you do property-guided generation there was to do a kind of gradient update inside the latent space. But because the latent space wasn't learned very well, you actually cannot do good property-guided generation. We only managed to do energy-guided generation, but it wasn't successful in going beyond energy.

Starting point is 00:17:11 So that brought us to really thinking, right? How can we make property-guided generation much better? So I remember one day, actually, my colleague, Daniel Zügner, showed me this blog, which basically explains this idea of classifier-free guidance, which is the powerhouse behind the text-to-image generative models. And so yeah, then we began to think about, okay, can we actually make the diffusion model work with classifier-free guidance? That led us to remove the kind of

Starting point is 00:17:47 the variational autoencoder component from CDVAE and begin to work on a pure diffusion architecture. Then there was a lot of development around that, but it turns out that classifier-free guidance is really the key to making property-guided generation work. And then we combined it with a lot more effort in improving the architecture and also generating more data. And also trying out these different downstream tasks that job of explaining how MatterGen and MatterSim work together and how MatterGen can offer a lot in terms of reducing the amount of time and work that goes into finding new materials. Tian, how does the process of using MatterGen to generate materials translate into real-world applications? Yeah, that's a fantastic question.

Starting point is 00:18:45 So one way that I think about MatterGen, right, is that you can think about it as like a copilot for material scientists, right? So it can help you to come up with kind of potentially good hypotheses for the material design problems that you're working on. So say you're trying to design a battery, right? So you may have some ideas about, okay, what candidates you want to make, but this is kind of based on your own experience, right? Your depth of experience as a researcher. But MatterGen is able to kind of learn from a very broad set of data.

Starting point is 00:19:20 So therefore, it may be able to come up with some good suggestions, even surprising suggestions for you, so that you can kind of try them out, right? Both with computation or even one day in a wet lab, where you experimentally synthesize them. But I also want to note that, in a way, this is still an early stage in generative AI for materials, meaning that I don't expect all the candidates MatterGen generates will suit your needs, right? So you still need to kind of look into them with expertise or with some kind of computational screening. But I think in the future, as these models keep improving,

Starting point is 00:20:00 they will become a key component, right? In the design process of many of the materials we're seeing today, like designing new batteries, new solar cells, or even computer chips, like Ziheng mentioned earlier. I want to pivot a little bit to the MatterSim side of things. I know identifying new combinations of compounds is key to meeting changing needs for things like sustainable materials, but testing them is equally important to developing materials that can be put to use. Ziheng, how does MatterSim handle the uncertainty

Starting point is 00:20:36 of how materials behave under various conditions, and how do you ensure that the predictions remain robust despite the inherent complexity of molecular systems? That's a very, very good question. So uncertainty quantification is key to making sure all these predictions and simulations are trustworthy. And that's actually one of the questions we get almost every time after a presentation. So people will ask, well, especially those experimentalists would ask, well, I've been using your model, how do I know those predictions are true under the very complex conditions I'm using in my experiments? So to understand how we deal with uncertainty, we need to know how MatterSim really functions in predicting an arbitrary property, especially under the condition you want, like the temperature and pressure. That would be quite complex, right?

Starting point is 00:21:36 So in the ideal case, we would hope that by using MatterSim, you can directly simulate the properties you want using molecular dynamics combined with statistical mechanics. So if so, it will be easy to really quantify the uncertainty because there are just two parts, the error from the model and the error from the simulation, the statistical mechanics. So the error from the model can be measured with an ensemble. So basically you start with different random seeds when you train the model, and then when you predict your property, you use several models from the ensemble, and then you get different numbers. If the variance of those numbers is very large, you'll see the prediction is not that trustworthy.
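The ensemble idea described here (train several models that differ only in their random seed, then read the spread of their predictions as the model's uncertainty) can be illustrated with a small Python sketch. It is a toy under loud assumptions: the "model" is a one-parameter line fit, the seed drives a bootstrap resample as a stand-in for seed-dependent neural-network training, and none of the names come from MatterSim.

```python
import random
import statistics


def train_model(seed, data):
    """Toy 'training': least-squares slope through the origin on a resample.

    The seed picks the bootstrap resample, standing in for the seed-dependent
    initialization and shuffling of a real neural-network training run.
    """
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]
    slope = sum(x * y for x, y in sample) / sum(x * x for x, _ in sample)
    return lambda x: slope * x


def predict_with_uncertainty(ensemble, x):
    """Return the ensemble mean and the spread (standard deviation) at x."""
    predictions = [model(x) for model in ensemble]
    return statistics.mean(predictions), statistics.stdev(predictions)


# Synthetic training data: y = 2x plus a little noise, x between 1 and 10.
data_rng = random.Random(42)
data = [(float(x), 2.0 * x + data_rng.gauss(0.0, 0.3)) for x in range(1, 11)]

ensemble = [train_model(seed, data) for seed in range(5)]

mean_near, std_near = predict_with_uncertainty(ensemble, 5.0)   # inside the data
mean_far, std_far = predict_with_uncertainty(ensemble, 50.0)    # far outside it

print(f"x=5:  {mean_near:.2f} +/- {std_near:.3f}")
print(f"x=50: {mean_far:.2f} +/- {std_far:.3f}")
```

In this toy the members agree closely near the training data and disagree more far from it, which is the signal the speaker describes: a large spread marks a prediction that should not be trusted without further checks.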

Starting point is 00:22:25 But a lot of times you will see the variance is very small. So basically an ensemble of several different models will give you almost exactly the same number. Then you're quite sure that the number is somehow very useful. So that's one level of the way we want to get our property. But sometimes it's very hard to really directly simulate the property you want. For example, for catalytic processes, it's very hard to imagine how you really get those coefficients. It's very hard. The process is just too complicated. So for that process, what we do is to really use what we call embeddings learned

Starting point is 00:23:09 from the entire material space. So basically, that's a vector we learned for any arbitrary material. And then, starting from that, we build a very shallow layer of a neural network to predict the property. But that also means you need to bring in some of your experimental or simulation data from your side.
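The "embedding plus shallow layer" approach can be sketched as a frozen feature vector per material with a small trainable readout fitted on the user's own labeled data. The Python below is a simplified illustration, not MatterSim code: `embed` is a hypothetical function that returns a made-up vector, the labels are synthetic, and the "shallow layer" is reduced to a single linear layer trained by plain gradient descent.

```python
import random


def embed(material_id):
    """Hypothetical frozen embedding: a fixed pseudo-random 4-vector per ID."""
    rng = random.Random(material_id)
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]


def predict(weights, bias, vec):
    """Linear readout on top of the frozen embedding."""
    return sum(w * v for w, v in zip(weights, vec)) + bias


def mse(weights, bias, features, labels):
    """Mean squared error of the readout over a labeled set."""
    errors = [predict(weights, bias, f) - y for f, y in zip(features, labels)]
    return sum(e * e for e in errors) / len(errors)


def fit_head(features, labels, lr=0.1, epochs=200):
    """Train only the readout; the embeddings themselves are never updated."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for vec, target in zip(features, labels):
            err = predict(weights, bias, vec) - target
            for j, v in enumerate(vec):
                grad_w[j] += 2.0 * err * v / n
            grad_b += 2.0 * err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Synthetic "user data": 30 materials with a property that depends on the embedding.
noise = random.Random(7)
features = [embed(i) for i in range(30)]
labels = [predict([1.5, -2.0, 0.5, 1.0], 3.0, f) + noise.gauss(0.0, 0.1) for f in features]

initial_mse = mse([0.0] * 4, 0.0, features, labels)
weights, bias = fit_head(features, labels)
final_mse = mse(weights, bias, features, labels)
print(f"MSE before: {initial_mse:.3f}, after: {final_mse:.3f}")
```

The design choice this mirrors is that only the small head is trained, so a modest amount of experimental or simulation data can be enough; the ensemble trick for uncertainty would apply to the head in the same way.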

Starting point is 00:23:28 And for that way of predicting a property, measuring the uncertainty is similar, right? We don't really have the statistical error anymore. What we have is only the model error. So you can still stick to the ensemble, and then it will work, right? So to be short, MatterSim can provide you with an uncertainty that tells you whether the prediction can be trusted or not. So in many ways, MatterSim is the realist in the equation,

Starting point is 00:24:03 and it's there to sort of be a gatekeeper for MatterGen, which is the idea generator. I really like the analogy. As is the case with many AI models, the development of MatterGen and MatterSim relies on massive amounts of data. And here you use a simulation to create the needed training data. Can you talk about that process

Starting point is 00:24:27 and why you've chosen that approach, Tian? So one advantage here is that we can really use a large-scale simulation to generate data. And so we have a lot of compute here at Microsoft on our Azure platform. So how we generate the data is that we use a method

Starting point is 00:24:45 called density functional theory, DFT, which is a quantum mechanical method. And we use a simulation workflow built on top of DFT to simulate the stability of materials. So what we do is that we curate a huge amount of material structures from multiple different sources of open data, mostly including the Materials Project and the Alexandria database.

Starting point is 00:25:14 And in total, there are around 3 million material candidates coming from these two databases. But not all of these structures are stable. So therefore, we try to use DFT to compute their stability and try to filter down the candidates such that we are making sure that our training data only contains stable structures, which was used to train the base model of MatterGen. So I want to note that actually we also use MatterSim as part of the workflow because MatterSim can be used to prescreen unstable candidates so that we don't need to use DFT to compute all of them. I think at the end we compute around 1 million DFT calculations,

Starting point is 00:26:05 where two-thirds of them are already filtered out by MatterSim, which saves us a lot of compute in generating our training data. Tian, you had a very good description of how we really get those ground state structures for the MatterGen model. Actually, we've also been using MatterGen for MatterSim to really get the training data. So if you think about the simulation space of materials, it's extremely large. So we would think of it in a way that it has three axes. So basically the elements, the temperature and the pressure.

Starting point is 00:26:44 So if you think about existing databases, they have pretty good coverage of the element space. Basically, if you think about the Materials Project, NOMAD, they really have this very good coverage of lithium oxide, lithium sulfide, hydrogen sulfide, whatever, those different ground state structures. But they don't really tell you how these materials behave under certain temperatures and pressures, especially under those extreme conditions like 1600 Kelvin, which you really use to synthesize your materials.

Starting point is 00:27:22 That's where we really focus on generating the data for MatterSim. So it's really easy to think about how we generate the data, right? You put your wanted material into a pressure cooker, basically molecular dynamics. You can simulate the material's behavior under the temperature and pressure. So that's it.

Starting point is 00:27:42 Sounds easy, right? But that's not true because what we want is not one single material. What we want is the entire material space. So that makes the effort almost impossible because the space is just so large. So that's where we really developed this active learning pipeline. So basically what we do is we generate a lot of these structures for different elements and temperatures and pressures, really a lot. And then what we do is we ask the active learning, or the uncertainty measurement, to really say whether the model knows about this structure already. So if the model thinks, well, I think

Starting point is 00:28:27 I know the structure already, then we don't really calculate this structure using density functional theory, as Tian just said. So this will really save us like 99% of the effort in generating the data. So in the end, by combining this molecular dynamics, basically the pressure cooker, together with active learning, we gathered around 17 million data points for MatterSim. So that was used to train the model. And now it can cover the entire periodic table and a lot of temperatures and pressures. Thank you, Ziheng. I'm sure this is not news to either one of you, given that you're both at the forefront of these efforts,

Starting point is 00:29:14 but there are a growing number of tools aimed at advancing material science. So what is it about MatterGen and MatterSim in their approach or capabilities that distinguish them? Yeah, I think I can start. So I think, in the past year, there has been a huge interest in building generative AI tools for materials. So we have seen lots and lots of innovations from the community published in top conferences like NeurIPS, ICLR, ICML, etc. So I think what distinguishes MatterGen, in my point of view, are two things.

Starting point is 00:29:52 First is that we train with a very big data set that we curated very, very carefully. And we also spent quite a lot of time refining our diffusion architecture, which means that our model is capable of generating very high quality, highly stable and novel materials. We have some kind of bar plot in our paper that showcases the advantage of our performance. I think that's one key aspect. And I think the second aspect, which in my point of view is even more important, is that it has the ability to do property-guided generation. Many of the tools that we saw in the community, they are more focused on the problem of

Starting point is 00:30:39 crystal structure prediction, which MatterGen can also do. But we focus more on really property-guided generation because we think this is one of the key problems that material scientists really care about. So the ability to do a very broad range of property-guided generation, and we have both computational and now experimental results to validate that, I think that's the second strong point for MatterGen. Ziheng, do you want to add to that? Yeah, thank you. So on the MatterSim side, I think it's really the diverse conditions it can handle that make a difference. We've been talking about how the training data we collected

Starting point is 00:31:25 really covers the entire periodic table. Also, more importantly, the temperatures from 0 Kelvin to 5,000 Kelvin and the pressures from 0 gigapascals to 1,000 gigapascals. That really covers what humans can control nowadays. I mean, it's very hard to go beyond that. If you know anyone who can go beyond that, let me know. So that really makes MatterSim different, like it can handle the realistic conditions.

Starting point is 00:31:53 I think beyond that, I would say the combo between MatterGen and MatterSim really makes this set of tools really different. So previously, a lot of people were using these atomistic simulators and these generative models alone. But if you think about it, now that we have these two foundation models together, it really can make things different, right? So we have the predictor, we have the generator. You have a very good idea generator, and you have a very good goalkeeper, and you put them together. They form a loop, and now you can use this loop to design your materials really quickly. So I would say, to me, now when I think about it, it's really the combo that makes this set of tools different. I know that I've

Starting point is 00:32:39 spoken with both of you recently about how there's so much excitement around this. And it's clear that we're on the precipice of what both of you have called a paradigm shift. And Microsoft places a very strong emphasis on ensuring that its innovations are grounded in reality and capable of addressing real-world problems. So with that in mind, how do you balance the excitement of scientific exploration with the practical challenges of implementation? Tian, do you want to take this? Yeah, I think this is a very, very important point because as there is so much hype around AI

Starting point is 00:33:21 that is happening right now, we must be very, very careful about the claims that we are making so that people will not have unrealistic expectations about what these models can do. So for MatterGen, we're pretty careful about that. We're trying to say that this is an early stage of generative AI in materials, where this model will be improved over time quite significantly, but you should not say, oh, all the materials generated by MatterGen are going to be amazing.

Starting point is 00:33:59 That's not what is happening today. So we try to be very careful to understand how far MatterGen is already capable of designing materials with real-world impact. So therefore, we went all the way to synthesizing one material that was generated by MatterGen. So this material we generated is called tantalum chromium oxide. So this is a new material. It has not been discovered before, and it was generated by MatterGen by conditioning on a bulk modulus equal to 200 gigapascals. Bulk modulus is a measure of how resistant the material is to compression. So we ended up measuring the synthesized material experimentally. And the measured bulk modulus is 169 gigapascals, which is within 20% of the target.
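The "within 20%" figure can be checked directly from the two numbers quoted here, a 200-gigapascal target and a 169-gigapascal measurement. The short Python check below uses only those two values; the variable names are illustrative.

```python
target_gpa = 200.0     # bulk modulus MatterGen was conditioned on
measured_gpa = 169.0   # bulk modulus measured on the synthesized sample

# Relative error with respect to the target value.
relative_error = abs(measured_gpa - target_gpa) / target_gpa

print(f"relative error: {relative_error:.1%}")  # 31 / 200 = 15.5%
```

So the measured value is about 15.5% below the target, which is consistent with the "within 20%" statement.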

Starting point is 00:34:58 So this is a very good proof of concept, in our point of view, to show that, oh, you can actually give it a prompt, right, and MatterGen can generate a material, and then the material actually has a property that is very close to your target. But it's still a proof of concept, and we're still working to see how MatterGen can design materials that are much more useful and with a much broader range of applications. And I'm sure that there will be more challenges

Starting point is 00:35:31 we are seeing along the way, but we're looking forward to further working with our experimental partners to kind of push this further and also working with MatterSim, right, to see how these two tools can be used to design really useful materials and bring this into real-world impact. Yeah, Tian, I think that's very well said. It's not only for MatterGen. For MatterSim, we're also very careful.

Starting point is 00:35:57 So we really want to make sure that people understand how these models really behave under their instructions and understand what they can do and what they cannot do. So I think one thing that we really care about is that in the next few, maybe one or two, years we want to really work with our experimental partners to make these realistic materials, like in different areas, so that we ourselves can better understand the limitations and at the same time explore the forefront of material science

Starting point is 00:36:31 to make this excitement become true. Ziheng, could you give us a concrete example of what exactly MatterSim is capable of doing? Now MatterSim can really do whatever you have on a potential energy surface. So what that means is anything that can be simulated with the energies, forces, and stresses alone. So to give you an example of what we can compute, the first example would be the stability of a material. So basically you input

Starting point is 00:37:06 the structure, and from the energies of the relaxed structures you can really tell whether the material is likely to be stable, like the composition, right? So another example would be the thermal conductivity. Thermal conductivity is a fundamental property of materials that tells you how fast heat can transfer through the material. So MatterSim can really simulate how fast heat can go through your diamond, your graphene, your copper. So basically, those are two examples, and they are based on energies and forces alone. But there are things MatterSim cannot do, at least for now. For example, you cannot really do anything related to electronic structures. So you cannot really compute the light absorption of a

Starting point is 00:37:55 semi-transparent material. That would be a no-no for now. It's clear from speaking with researchers, both from MatterSim and MatterGen, that despite these very rapid advancements in technology, you take very seriously the responsibility to consider the broader implications of creating entirely new materials and simulating their properties, particularly in terms of things like safety, sustainability, and societal impact. Yeah, that's a fantastic question. So it's extremely important that we make sure that these AI tools are not misused. A potential misuse, as you just mentioned, is that people begin to use these AI tools, MatterGen and MatterSim, to design harmful materials. There was actually an extensive discussion over how generative AI tools that were originally intended for drug design can then be misused to create bioweapons. So at Microsoft, we take this very, very seriously, because we believe that when we create new

Starting point is 00:39:12 technologies, we must also ensure that the technology is used responsibly. So we have an extensive process to ensure that all of our models respect those ethical considerations. In the meantime, as you mentioned, there is sustainability and the societal impact, right? So there's a huge amount that AI tools like MatterGen can do for sustainability, because a lot of the sustainability challenges are really, at the end, materials design challenges, right? So therefore, I think MatterGen and MatterSim can really help with that, in helping us to alleviate climate change and having a positive societal impact for the broader society.

Starting point is 00:40:10 And Ziheng, how about from a simulation standpoint? Yeah, I think Tian gave a very good description. So at Microsoft, we're really careful about these ethical considerations. So I would add a little bit more on the bright side of things. So MatterSim really carries out these simulations at atomic scale. So one thing you can think about is really the educational purpose. So back in my bachelor and PhD period, I would sit at the table and really grab a pen, really deal with those very complex equations, and get those statistics using my pen. It's really painful. But now with MatterSim, these simulation tools at atomic level, what you can do is really simulate the reactions, the movement of atoms at atomic scale, in real time. You can really see the chemical reactions and see the statistics.
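
To make the idea of simulating atomic motion from energies and forces concrete, here is a minimal sketch of a molecular dynamics loop. It is a toy under stated assumptions, not MatterSim's code or API: the energy and force come from a hypothetical one-dimensional harmonic "bond" in arbitrary units, where a tool like MatterSim would instead predict them with a learned model. The integrator is the standard velocity Verlet scheme.

```python
def harmonic_energy_force(x, k=1.0, x0=1.0):
    """Toy potential energy surface: a harmonic bond with rest length x0.

    Stand-in for a learned model; returns (energy, force) at bond length x.
    """
    energy = 0.5 * k * (x - x0) ** 2
    force = -k * (x - x0)  # force is the negative gradient of the energy
    return energy, force


def velocity_verlet(x, v, mass, dt, n_steps, energy_force):
    """Integrate the motion and record (position, velocity, total energy)."""
    _, f = energy_force(x)
    trajectory = []
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass       # first half-step velocity update
        x += dt * v                    # full-step position update
        e_pot, f = energy_force(x)     # new energy and force at new position
        v += 0.5 * dt * f / mass       # second half-step velocity update
        trajectory.append((x, v, e_pot + 0.5 * mass * v * v))
    return trajectory


# Start with the bond stretched by 0.2 and at rest (arbitrary units).
traj = velocity_verlet(x=1.2, v=0.0, mass=1.0, dt=0.01, n_steps=2000,
                       energy_force=harmonic_energy_force)
initial_energy = 0.5 * 1.0 * 0.2 ** 2
max_drift = max(abs(e - initial_energy) for _, _, e in traj)
print(f"max total-energy drift: {max_drift:.2e}")
```

The bond length oscillates around its rest value while the total energy stays nearly constant, which is the kind of direct feedback a student would otherwise have to work out by hand.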

Starting point is 00:41:14 So you can really get the feeling, a very direct feeling, of how the system works instead of just working on those toy systems with your pen. I think it's going to be a very good educational tool using MatterSim. Yeah. Also MatterGen. MatterGen is a generative tool, generating those ID distributions. It will be a perfect example to show the students how the Boltzmann distribution works.
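
For readers who want to see what a Boltzmann distribution looks like in practice, here is a minimal classroom-style sketch. It does not show how MatterGen works internally. It simply samples a hypothetical three-level system with the Metropolis algorithm and compares the visit frequencies with the exact probabilities p_i = exp(-E_i / kT) / Z. The energy levels, temperature, and function names are all assumptions chosen for illustration.

```python
import math
import random


def boltzmann_probabilities(energies, kT):
    """Exact Boltzmann probabilities p_i = exp(-E_i / kT) / Z."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)  # partition function
    return [w / z for w in weights]


def metropolis_sample(energies, kT, n_steps, rng):
    """Metropolis sampling over discrete states; returns visit frequencies."""
    n = len(energies)
    state = 0
    counts = [0] * n
    for _ in range(n_steps):
        proposal = rng.randrange(n)  # symmetric proposal: uniform over states
        delta = energies[proposal] - energies[state]
        # Always accept downhill moves; accept uphill with Boltzmann factor.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            state = proposal
        counts[state] += 1
    return [c / n_steps for c in counts]


energies_ev = [0.00, 0.05, 0.10]  # toy energy levels in eV (hypothetical)
kT_ev = 0.0259                    # roughly room temperature, about 300 K
rng = random.Random(0)            # fixed seed so the run is repeatable

exact = boltzmann_probabilities(energies_ev, kT_ev)
sampled = metropolis_sample(energies_ev, kT_ev, 200_000, rng)
for e, p, s in zip(energies_ev, exact, sampled):
    print(f"E = {e:.2f} eV  exact = {p:.3f}  sampled = {s:.3f}")
```

Lower-energy states are visited more often, and the sampled frequencies approach the exact values as the number of steps grows.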

Starting point is 00:41:40 I think Tian, you will agree with that, right? A hundred percent. Yeah. I really, really liked the example that you mentioned about the educational purposes. I still remember when I was taking a materials simulation class, right? So everything is DFT. You kind of need to wait for an hour, right, to get some simulation, and maybe then you make some animation. Now you can do this in real time.

Starting point is 00:42:05 This is a huge step forward for our young researchers to gain a sense about how atoms interact at an atomic level. And the results are really true, not really just toy models. I think it's going to be very exciting stuff. And Tian, I'm directing this question to you, even though Ziheng, I'm sure you can chime in as well. But Tian, I know that you and I have previously discussed this specifically. I know that you said back in, you know, 2017, 2018, that you knew an AI-based approach to material science was possible, but that even you were surprised by how far the technology has come so fast in aiding this area. What is the status of these tools right now?

Starting point is 00:43:01 Are they in use? And if so, who are they available to? And what's next for them? Yes, this is a fantastic question. So I think for generative AI tools like MatterGen, as I said many times earlier, it's still in the early stages. MatterGen is the first tool with which we managed to show that generative AI can enable very broad property-guided generation, and we have managed to have experimental validation to show it's possible. But it will take more work to show that it can actually design batteries, can design solar cells, can design really useful materials in these broader domains. So this is exactly why we are now taking a pretty open approach with MatterGen.

Starting point is 00:43:54 We make our code, our training data, and model weights available to the general public. We're really hoping the community can apply our tools to the problems that they care about and even build on top of that. So in terms of what's next, I always like to use what happened with generative AI for drugs to predict how generative AI will impact materials. Three years ago, there was a lot of research around generative models for drugs, first coming from the machine learning community, right? So then all the big drug companies began to take notice, and then researchers

Starting point is 00:44:34 in these drug companies began to use these tools in actual drug design processes. A colleague, Marwin Segler, who works with Novartis in the Microsoft and Novartis collaboration, has been telling me that at the beginning, all the chemists in the drug companies were very suspicious, right? The molecules generated by these generative models all look a bit weird, so they don't believe this will work. But once these chemists see one or two examples that actually turn out to perform pretty well in the experimental results, then they

Starting point is 00:45:12 begin to build more trust in these generative models. And today, these generative tools are part of the standard drug discovery pipeline that is widely used in all the drug companies. So I think generative AI for materials is going to go through a very similar period.

Starting point is 00:45:36 People will have doubts. People will have suspicions at the beginning. But I think in three years, right, it will become a standard tool for how people design new solar cells, new batteries, and many other different applications. Great. Ziheng, do you have anything to add to that? So actually for MatterSim, we released a model,

Starting point is 00:46:00 I think back in December last year. I mean, both the weights and the model, right? So we're really grateful for how much the community has contributed to the repo. And now, I mean, we really welcome the community to contribute more to both MatterSim and MatterGen via our open source code bases. So I mean, the community effort is really important.

Starting point is 00:46:27 Well, it has been fascinating to pick your brains. And as we close, I know that you're both capable of quite a bit, which you have demonstrated. I know that asking you to predict the future is a big ask, so I won't explicitly ask that. But just as a fun thought exercise, let's fast forward 20 years and look back.

Starting point is 00:46:50 How have MatterGen and MatterSim and the big ideas behind them impacted the world? And how are people better off because of how you and your teams have worked to make them a reality? Tian, you wanna start? Yeah, I think one of the biggest challenges our human society is going to face in the next 20 years is going to be climate change. And there are so many material design problems people need

Starting point is 00:47:16 to solve in order to properly handle climate change, like finding new materials that can absorb CO2 from the atmosphere to create a carbon capture industry, or having battery materials that are able to do large-scale grid energy storage so that we can fully utilize all the wind power and the solar power, etc. So if you want me to make one prediction, I really believe that these AI tools like MatterGen and MatterSim are going to play an essential role in humanity's effort to solve it. I'd like to see that we have already solved climate change. We have large-scale energy storage systems that were designed by AI, so that we have basically removed all the fossil fuels from our energy production. And for the rest of the carbon emissions, which are very hard to remove, we will have a carbon capture industry with materials designed by AI

Starting point is 00:48:35 that absorb this CO2 from the atmosphere. It's hard to predict exactly what will happen, but I think AI will play a key role in defining what our society will look like in 20 years. Tian, very well said. So I think instead of really describing the future, I would really quote a science fiction scene in Iron Man.

Starting point is 00:48:58 So basically in 20 years, I will say when we want to really get a new material, we will just sit in an office and say, well, Jarvis, can you design us a new material that really fits my newest MK7 suit? That will be the end. And it will run automatically, and we get this autolab running and all those MatterGen, MatterSim, these AI models running.

Starting point is 00:49:17 And then probably in a few hours, in a few days, we get the material. Well, I think I speak for many people from several industries when I say that I cannot wait to see what is on the horizon for these projects. Tian and Ziheng, thank you so much for joining us on Ideas. It's been a pleasure. Thank you so much. Thank you.

How do you generate and test materials that don’t exist yet? Researchers Tian Xie and Ziheng Lu share the story behind MatterGen and MatterSim, AI tools poised to transform materials discovery and help drive advances in energy, manufacturing, and sustainability.