Microsoft Research Podcast - Ideas: AI for materials discovery with Tian Xie and Ziheng Lu
Episode Date: January 16, 2025
How do you generate and test materials that don’t exist yet? Researchers Tian Xie and Ziheng Lu share the story behind MatterGen and MatterSim, AI tools poised to transform materials discovery and help drive advances in energy, manufacturing, and sustainability.
Transcript
So the problem of generating materials from properties is actually a pretty old one.
I still remember back in 2018 when I was giving a talk about property prediction models,
one of the first questions people asked is,
instead of going from material structure to properties,
can you kind of inversely generate materials directly from their
property conditions? So in a way, this is kind of like a dream for material
scientists, because the end goal is really about finding materials
whose properties satisfy your application.
Previously, a lot of people
were using these atomistic simulators and these generative models alone. But if you think about it,
now that we have these two foundation models together,
it really can make things different.
You have a very good idea generator and you have
a very good goalkeeper, and you put them together.
They form a loop and now you can
use this loop to design your materials really quickly.
You're listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code.
In this series, we'll explore the technologies that are shaping our future and the big ideas that propel them forward.
I'm your guest host, Lindsay Kalter. Today, I'm talking to Microsoft Principal Research Manager Tian Xie and Microsoft Principal Researcher Ziheng Lu. Tian's doing
fascinating work with MatterGen, an AI tool for generating new materials guided by specific
design requirements. Ziheng is one of the visionaries behind MatterSim, which puts those new materials to the test
through advanced simulations.
Together, they're redefining what's possible in material science.
Tian and Ziheng, welcome to the podcast.
Very excited to be here.
Thanks, Lindsay.
Very excited.
Before we dig into the specifics of MatterGen and MatterSim, let's give our audience a sense of how you as researchers arrived at this moment.
Materials science, especially at its intersection with
computer science, is such a cutting-edge and transformative field.
What first drew each of you to this space?
What moment or experience, if any, made you realize this was where you wanted to innovate?
Tian, do you want to start?
So I started working on AI for materials back in 2015, when I started my PhD.
So I come to this as a chemist and materials scientist, but I was still figuring out what I wanted to
do during my PhD.
There was actually one moment that really drove me into the field. That was
AlphaGo. AlphaGo came out in 2016, when it was able to beat the world champion in
Go. I was extremely impressed by that because I learned how to play Go in my childhood. I know how hard it is
and how much effort those professional Go players
have spent in learning Go.
So I had the feeling that if AI can surpass
the world-leading Go players,
one day it will be able to surpass material scientists
in their ability to design novel materials.
So that's why I ended up deciding to focus my entire PhD on AI for materials.
And I have been working on that since then. It was actually very interesting because it was a
very small field back then. And it's great to see how much progress has been made in the past 10 years and
how much bigger the field is now compared with 10 years ago.
That's very interesting, Tian. So
actually I think I started about two years before you. As a PhD student, I was trained solely as a computational materials scientist, not
really an AI expert.
But at that time, computational materials science did not really work that well.
It worked, but not that well.
So after two or three years, I went back to experiments for another two or three
years, because I think experiments are always the gold standard,
right? I worked on experiments for a few years. And then about three years ago,
I went back to the field of computation, especially because of AI. At that time,
I think GPT and the large models we're currently using were not there yet, but we already had their prior forms, like BERT.
So we see the very large potential of AI.
We know that these large AIs might work.
So one idea is really to use AI to learn the entire space
of materials and really grasp the physics there.
And that really drove me to this field. And
that's why I'm here working on this.
We're going to get into what MatterGen and MatterSim mean for material science,
the potential, the challenges, and open questions. But first, give us an overview of what each of
these tools are, how they do what they do, and as this show is about big ideas,
the idea driving the work. Ziheng, let's have you go first.
So MatterSim is a tool to do in silico characterization of materials. If you think
about working on materials, you have several steps. You first need to synthesize a material, and then you need
to characterize it. Basically, you need to know its properties, its structure, whatever else there is to know about the material.
So for MatterSim, what we want to do is to really move the characterization process,
a lot of these processes, into computation.
So the idea behind MatterSim is to really learn the fundamentals of physics. We learn the energies, forces, and stresses
from the atomic structures and the charge densities,
all of these things.
And then with these, we can really simulate
any sort of material using our computational machines.
And then with these, we can really characterize
a lot of these materials' properties using our
computer. That is very fast; it's much faster than doing experiments, so we can accelerate
materials design. So in a word, basically you input your material into your computer, a structure
into your computer, and MatterSim will try to simulate this material
like what you do in a furnace or with your XRD,
and then you get your properties out of that.
And a lot of times, it's much faster than doing experiments.
All right, thank you very much.
Tian, why don't you tell us about MatterGen?
Yeah, thank you.
So actually, Ziheng, starting with your explanation of MatterSim
makes it much easier for me to explain MatterGen.
So MatterGen actually represents a new way to design materials with generative AI.
Material discovery is like finding needles in a haystack.
You're looking for a material with a very specific property for a material application.
For example, like finding a room temperature superconductor
or finding a solid that can conduct a lithium ion very well inside a battery.
So it's like finding one very specific material from a million candidates.
So the conventional way of doing material discovery is via screening,
where you go over millions of candidates to find the one that you're looking for. MatterSim is able to significantly accelerate that process by making the simulation much faster.
But it's still very inefficient because you need to go through these millions of candidates, right? With MatterGen, you can directly generate materials given prompts of the design requirements for the application. This means that
you can discover useful materials much more efficiently,
and it also allows you to explore a much larger space beyond
the set of known materials.
Thank you, Tian. Can you tell us a little bit about how MatterGen
and MatterSim work together?
So you can really think about MatterSim and MatterGen as accelerating
different parts of the materials discovery process. MatterSim is trying to accelerate the simulation
of material properties, while MatterGen is trying to accelerate the search for novel material
candidates. It means that they can really work together as a flywheel, and you can compound
the acceleration from both models. They are also both foundation AI models,
meaning they can both be used for a broad range of materials design problems.
So we're really looking forward to seeing how they can work together
iteratively as a tool to design
novel materials for a broad range of applications.
I think that's a very good general introduction of how they work together.
I think I can provide an example of how they really fit together.
Say you want a material with a specific bulk modulus, or lithium-ion conductivity,
or thermal conductivity for your CPU chips.
So basically, what you want to do is start with a pool of material structures, like some
structures from a database, and then you compute or characterize your wanted property for that
set of materials. Then you've got these property and structure pairs, and you input these pairs into MatterGen. MatterGen will be able to
give you a lot more structures that are very likely to be real, but the number
will be very large. For example, for the bulk modulus, I don't remember the number we generated
in our work. Was it thousands, tens of thousands?
Thousands, tens of thousands.
Yeah, that would be a very large pool,
even with MatterGen.
So then the next step would be,
how would you like to screen that?
You cannot really just send all of those structures
to a lab to synthesize.
It's too much, right?
That's when MatterSim comes in again. MatterSim comes in and screens
all those structures again and sees which ones are the most likely to be synthesized and which ones
have the closest property to what you wanted. After screening, you probably get the top five or ten
candidates, and then you send them to a lab. Boom, everything goes down. That's it.
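The generate-then-screen loop described above can be sketched in a few lines of Python. This is only an illustrative sketch: `design_loop`, `toy_generate`, `toy_simulate`, and `toy_is_stable` are hypothetical names invented here, and the toy functions are stand-ins for a generative model and a simulator, not the MatterGen or MatterSim APIs.

```python
import random
from typing import Callable, List

Structure = List[float]  # toy stand-in for a crystal structure


def design_loop(
    generate: Callable[[float, int], List[Structure]],
    simulate: Callable[[Structure], float],
    is_stable: Callable[[Structure], bool],
    target: float,
    n_generate: int = 1000,
    top_k: int = 5,
) -> List[Structure]:
    """Generate candidates for a target property, then screen them to a shortlist."""
    candidates = generate(target, n_generate)            # idea generator
    stable = [c for c in candidates if is_stable(c)]     # drop unlikely-to-exist candidates
    ranked = sorted(stable, key=lambda c: abs(simulate(c) - target))  # closest property first
    return ranked[:top_k]                                # shortlist to send to a lab


# Hypothetical toy stand-ins so the sketch runs end to end.
def toy_generate(target: float, n: int) -> List[Structure]:
    rng = random.Random(0)  # fixed seed for repeatability
    return [[rng.gauss(target, 30.0), rng.random()] for _ in range(n)]


def toy_simulate(structure: Structure) -> float:
    return structure[0]  # pretend the first number is the simulated bulk modulus


def toy_is_stable(structure: Structure) -> bool:
    return structure[1] > 0.5  # pretend the second number is a stability score


shortlist = design_loop(toy_generate, toy_simulate, toy_is_stable, target=200.0)
```

The point of the sketch is the shape of the loop: a large generated pool goes in, and only a handful of stable candidates closest to the target come out.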
I'm wondering if there's any prior research or advancements that you drew from in creating
MatterGen and MatterSim. Were there any specific breakthroughs that influenced your approaches
at all?
Thanks, Lindsay. I think I will take that question first. So, interestingly, for MatterSim,
a very fundamental idea was drawn from Chi Chen,
who was a previous lab mate of mine
and now also works for Microsoft, at Microsoft Quantum.
He made this fantastic model named M3GNet,
which is a prior form of a lot of these large-scale models for
atomistic simulations. That model, M3GNet, actually resolves the near-ground-state
prediction problem. I mean, the near-ground-state problem sounds like a fancy but not
realistic term, but what it actually means is that the model can simulate materials at near
zero Kelvin states, so basically at very low temperatures. So at that time we were thinking,
since the models are now able to simulate materials at their near-ground states, that's not
a very large space. But if you also look at other large models like GPT, those models are large enough to model the entire human language.
So it's possible to really extend the capability of these prior models to a very large space, to learn the entire space of materials.
I mean, the entire space really means the entire periodic table,
all the temperatures and the pressures people can actually grasp.
Yeah, I still remember a lot of the amazing work from Chi Chen
back when we were working on property prediction models.
The problem of generating materials from properties is actually a pretty old one.
I still remember back in 2018 when I was working on CGCNN and giving a talk about property prediction models,
one of the first questions people asked is,
okay, can you inverse this process?
Instead of going from material structure to properties,
can you kind of inversely generate materials directly
from their property conditions?
So in a way, this is kind of like a dream for material scientists.
Some people even call it a holy grail, because the end goal is really about finding materials
whose properties satisfy your application. So I've been thinking about this problem for a
while, and there has also been a lot of work over the past few years in the community to build generative models for materials. A lot of people tried before 2020,
using ideas like VAEs or GANs, but it's hard to represent materials in these types of
generative model architectures, and many of those models generated relatively poor candidates.
So I thought it was a hard problem. I had known about it for a while, but there were no good solutions back then.
So I started to focus more on this problem during my postdoc, which I started in 2020, and I kept working on it in 2021. At the beginning, I wasn't really
sure exactly what approach to take because it was kind of an open question, and I really tried a lot
of random ideas. So one day, actually, in my group back then with Tommi Jaakkola and Regina Barzilay at MIT
CSAIL, we got to know this method called the diffusion model.
It was a very early stage for diffusion models back then,
but they had already begun to show very promising signs,
achieving state of the art in many problems
like 3D point cloud generation
and 3D molecular conformer generation.
So the work that really inspired me a lot was two works on molecular conformer generation.
One is ConfGF and one is GeoDiff.
They inspired me to focus more on diffusion models, which actually led to CDVAE.
It's interesting that we spent a couple of weeks
trying all these diffusion ideas,
and without that much work, it actually worked quite well out of the box.
And at that time, CDVAE achieved much better performance
than any previous model in materials generation.
And we were super happy with that.
So after CDVAE, I joined Microsoft, now working with more people together on this problem
of generative models for materials.
So we knew what the limitations of CDVAE
were: it can do unconditional material generation well, meaning it can generate novel
material structures, but it is very hard to use CDVAE to do property-guided generation. Basically,
it uses an architecture called a variational autoencoder, where you have a latent space.
The way you do property-guided generation there is to do a kind of gradient update inside the latent space.
But because the latent space wasn't learned very well, you cannot do good property-guided generation. We only managed to do energy-guided generation,
but it wasn't successful in going beyond energy.
So that got us really thinking, right?
How can we make property-guided generation much better?
So I remember one day, actually, my colleague
Daniel Zügner showed
me this blog, which basically explains the idea of classifier-free guidance, which is
the powerhouse behind text-to-image generative models.
And so, yeah, then we began to think about, okay, can we actually make the diffusion model
work with classifier-free guidance? That led us to remove
the variational autoencoder component from CDVAE and begin to work on a pure diffusion
architecture.
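For readers unfamiliar with classifier-free guidance, its commonly cited general form (from the diffusion literature, not necessarily the exact formulation used in MatterGen) combines a conditional and an unconditional noise prediction from the same network. The symbols here are assumptions for illustration: x_t is the noisy sample at step t, c is the condition (for example a target property), and w is a guidance weight.

```latex
\tilde{\epsilon}_\theta(x_t, c) = (1 + w)\,\epsilon_\theta(x_t, c) - w\,\epsilon_\theta(x_t)
```

With w = 0 this reduces to ordinary conditional sampling; larger w pushes samples more strongly toward the condition.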
But then there was a lot of development around that.
It turns out that classifier-free guidance is really the key to making property-guided generation work. We then combined it with a lot more effort in improving the architecture and generating more data,
and also trying out these different downstream tasks ...
... job of explaining how MatterGen and MatterSim
work together and how MatterGen can offer a lot in terms of reducing the amount of time and work
that goes into finding new materials. Tian, how does the process of using MatterGen to generate
materials translate into real world applications? Yeah, that's a fantastic question.
So one way that I think about MatterGen, right,
is that you can think about it as like a copilot for material scientists, right?
It can help you come up with potentially good hypotheses
for the materials design problems that you're looking at.
So say you're trying to design a battery, right? You may have some ideas about what candidates you want to make, but this is kind
of based on your own experience, right?
The depth of your experience as a researcher.
But MatterGen is able to learn from a very broad set of data.
So therefore, it may be able to come up with some good suggestions, even surprising
suggestions, for you so that you can try them out, right? Both with computation or even,
one day, in the wet lab, where you experimentally synthesize them. But I also want to note that, in a way,
this is still an early stage of generative AI for materials, meaning that I don't expect all the candidates
MatterGen generates will suit your needs, right?
So you still need to look into them with expertise
or with some kind of computational screening.
But I think in the future, as these models keep improving,
they will become a key component, right,
in the design process of many of the materials we're seeing today,
like designing new batteries, new solar cells, or even computer chips,
like Ziheng mentioned earlier.
I want to pivot a little bit to the MatterSim side of things.
I know identifying new combinations of compounds is key to meeting
changing needs for things like sustainable materials, but testing them is equally important
to developing materials that can be put to use. Ziheng, how does MatterSim handle the uncertainty
of how materials behave under various conditions, and how do you ensure that the predictions remain robust despite the
inherent complexity of molecular systems?
That's a very, very good question. So uncertainty
quantification is key to making sure all these predictions and simulations are trustworthy.
And that's actually one of the questions we get almost every time after a presentation. People, especially experimentalists, will ask, well, I've been
using your model; how do I know those predictions are true under the very complex conditions
I'm using in my experiments?
So to understand how we deal with uncertainty, we need to know how MatterSim really functions in predicting an arbitrary property, especially under the conditions you want, like the temperature and pressure.
That would be quite complex, right?
So in the ideal case, we would hope that by using MatterSim, you can directly simulate the properties you want using molecular dynamics combined with statistical mechanics.
If so, it will be easy to quantify the uncertainty because there are just two
parts: the error from the model and the error from the simulation, the statistical mechanics.
The error from the model can be measured with an ensemble.
Basically, you start with different random seeds when you train the model,
and then when you predict your property, you use several models from the ensemble,
and you get different numbers.
If the variance of the numbers is very large, you'd say the prediction is not that trustworthy.
But a lot of times you will see the variance is very small.
So basically, an ensemble of several different models will give you almost exactly the same
number.
Then you're quite sure that the number is very useful.
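The ensemble idea can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: `ensemble_predict` and `make_toy_model` are hypothetical names, the toy models are simple functions rather than trained networks, and the spread is reported as a standard deviation (the square root of the variance mentioned above).

```python
from statistics import mean, pstdev
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: List[Model], structure: Sequence[float]) -> Tuple[float, float]:
    """Return the mean prediction and the spread across ensemble members."""
    predictions = [model(structure) for model in models]
    return mean(predictions), pstdev(predictions)


def make_toy_model(bias: float) -> Model:
    # Stand-in for a model trained from a different random seed:
    # each member differs only by a small constant offset.
    def model(structure: Sequence[float]) -> float:
        return sum(structure) + bias
    return model


models = [make_toy_model(b) for b in (-0.1, 0.0, 0.1)]
value, spread = ensemble_predict(models, [1.0, 2.0, 3.0])
```

A small `spread` relative to `value` means the members agree and the prediction is more trustworthy; a large one is a warning sign.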
So that's one level of the way we want to get our property.
But sometimes it's very hard to really directly simulate the property you want.
For example, for catalytic processes, it's very hard to imagine how you really get those coefficients.
It's very hard. The process is just too complicated. So for that kind of process, what we do is really use what we call embeddings learned
from the entire material space.
Basically, that is a vector we learned
for any arbitrary material.
And then, starting from that, we build a very shallow layer
of a neural network to predict the property.
But that also means you need to bring in some
of your experimental or simulation
data from your side.
And for that way of predicting a property, measuring the uncertainty is still like
the two levels, right?
We don't really have the statistical error anymore; what we have is only the
model error.
So you can still stick to the ensemble, and it will work, right?
So in short, MatterSim can provide you with an uncertainty
to tell you whether the prediction can be trusted or not.
So in many ways, MatterSim is the realist in the equation,
and it's there to sort of be a gatekeeper for MatterGen,
which is the idea generator.
I really like the analogy.
As is the case with many AI models,
the development of MatterGen and MatterSim
relies on massive amounts of data.
And here you use a simulation to create the needed training data.
Can you talk about that process
and why you've chosen that approach, Tian?
So one advantage here is that
we can really use a large-scale simulation
to generate data.
And so we have a lot of compute here
at Microsoft on our Azure platform.
So how we generate the data
is that we use a method
called density functional theory, DFT,
which is a quantum mechanical method.
And we use a simulation workflow built on top of DFT
to simulate the stability of materials.
So what we do is curate a huge number
of material structures
from multiple different sources of open data,
mostly the Materials Project and the Alexandria database.
In total, there are around 3 million material candidates
coming from these two databases.
But not all of these structures are stable. So therefore,
we use DFT to compute their stability and filter down the candidates so that our training data contains only stable structures, which were used to train the base model of MatterGen.
So I want to note that we actually also use MatterSim as part of the workflow, because MatterSim
can be used to prescreen unstable candidates so that we don't need to use DFT to compute
all of them.
I think in the end we ran around 1 million DFT calculations,
where two-thirds of them were already filtered out by MatterSim, which saved us a lot of compute
in generating our training data.
Tian, you had a very good description of how we really get those
ground-state structures for the MatterGen model. Actually, we've also been using MatterGen for MatterSim
to really get the training data.
If you think about the simulation space of materials,
it's extremely large.
We think of it as having three axes:
basically the elements, the temperature, and the pressure.
So if you think about existing databases, they have pretty good coverage of the element
space.
Basically, if you think about the Materials Project or NOMAD, they really have very good coverage
of lithium oxide, lithium sulfide, hydrogen sulfide, whatever, those different
ground-state structures.
But they don't really tell you how these materials behave under a certain temperature
and pressure, especially under extreme conditions like 1,600 Kelvin, which you really use to
synthesize your materials.
That's where we really focused when generating the data for MatterSim.
So it's really easy to think about
how we generate the data, right?
You put your wanted material into a pressure cooker,
basically molecular dynamics.
You can simulate the material's behavior
under the temperature and pressure.
So that's it.
Sounds easy, right?
But that's not true, because
what we want is not one single material. What we want is the entire material space. That
makes the effort almost impossible, because the space is just so large. So that's where we really
developed this active learning pipeline. Basically, what we do
is generate a lot of these structures for different elements, temperatures, and pressures,
really a lot. And then we ask the active learning, or the uncertainty
measurement, to say whether the model already knows about a structure. If the model thinks, well, I think
I know this structure already, then we don't calculate it using density
functional theory, as Tian just said. This really saves us like 99% of the effort in generating
the data. So in the end, by combining molecular dynamics,
basically the pressure cooker, with active learning, we gathered around 17 million data points for
MatterSim. That was used to train the model, and now it can cover the entire periodic table and a lot
of temperatures and pressures.
Thank you, Ziheng.
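The active learning step Ziheng describes, where only structures the model is unsure about get sent to density functional theory, might look roughly like the following Python sketch. The names `select_for_dft` and `toy_ensemble`, the use of ensemble disagreement as the uncertainty measure, and the threshold value are all assumptions made for illustration, not details of the actual pipeline.

```python
from statistics import pstdev
from typing import Callable, List

Model = Callable[[float], float]


def select_for_dft(structures: List[float], ensemble: List[Model], threshold: float) -> List[float]:
    """Keep only structures on which the ensemble members disagree."""
    selected = []
    for structure in structures:
        predictions = [model(structure) for model in ensemble]
        if pstdev(predictions) > threshold:  # model does not "know" this one yet
            selected.append(structure)
    return selected


# Toy ensemble: members agree for inputs up to 1.0 and diverge beyond it,
# mimicking a model that is confident only inside its training range.
toy_ensemble: List[Model] = [
    (lambda x, k=k: x + k * max(0.0, x - 1.0)) for k in (-1.0, 0.0, 1.0)
]

to_label = select_for_dft([0.2, 0.8, 3.0], toy_ensemble, threshold=0.5)
```

In this toy run only the out-of-range structure is selected, which is the mechanism behind skipping the expensive calculation for structures the model already handles well.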
I'm sure this is not news to either one of you,
given that you're both at the forefront of these efforts,
but there are a growing number of tools aimed at advancing material science.
So what is it about MatterGen and MatterSim in their approach or capabilities that distinguish them?
Yeah, I think I can start.
So I think
in the past year, there has been huge interest in building generative AI tools for
materials. We have seen lots and lots of innovations from the community published in top conferences like NeurIPS, ICLR, ICML, etc.
So I think what distinguishes MatterGen,
from my point of view, are two things.
First is that we trained with a very big dataset
that we curated very, very carefully.
And we also spent quite a lot of time
refining our diffusion architecture, which means that our model is capable of generating very high-quality, highly stable, and novel materials.
We have a bar plot in our paper that showcases the advantage in performance.
I think that's one key aspect. And I think the second aspect,
which from my point of view is even more important, is that it has the ability to do property-guided
generation. Many of the tools that we saw in the community are more focused on the problem of
crystal structure prediction, which MatterGen can also do. But we focus more on property-guided generation
because we think this is one of the key problems that material scientists really care about.
So the ability to do a very broad range of property-guided generation, and we have both
computational and now experimental results
to validate those, I think that's the second strong point for MatterGen.
Ziheng, do you want to add to that?
Yeah, thank you. So on the MatterSim side, I think it's really
the diverse conditions it can handle that make a difference. We've been talking about how
the training data we collected
really covers the entire periodic table.
Also, more importantly, the temperatures from 0 Kelvin
to 5,000 Kelvin and the pressures from 0 gigapascal
to 1,000 gigapascal.
That really covers what humans can control nowadays.
I mean, it's very hard to go beyond that.
If you know anyone who can go beyond that, let me know.
So that really makes MatterSim different, in that it can handle realistic conditions.
I think beyond that, I would say the combo of MatterGen and MatterSim really makes this set of tools different.
Previously, a lot of people were using an atomistic simulator or a generative model alone.
But if you think about it, now that we have these two foundation models together,
it really can make things different, right?
So we have the predictor, we have the generator. You have a very good idea generator,
and you have a very good goalkeeper, and you put them together.
They form a loop, and now you can use this loop to design your materials really quickly. So I would say, to me, now when I
think about it, it's really the combo that makes this set of tools different.
I know that I've
spoken with both of you recently about how there's so much excitement around this. And it's clear
that we're on the precipice of this, as both of you have called it a paradigm shift. And Microsoft
places a very strong emphasis on ensuring that its innovations are grounded in reality and capable
of addressing real world problems. So with that in mind, how do you balance the excitement of scientific exploration
with the practical challenges of implementation?
Tian, do you want to take this?
Yeah, I think this is a very, very important point,
because with so much hype around AI
happening right now,
we must be very, very careful about the claims
that we are making so that people will not have unrealistic expectations of what these
models can do.
So for MatterGen, we're pretty careful about that.
We're trying to say that this is an early stage of generative AI in materials,
where this model will be improved over time quite significantly,
but you should not say, oh, all the materials generated by MatterGen are going to be amazing.
That's not what is happening today. So we try to be very careful to understand how far MatterGen
is already capable of designing materials with real-world impact. Therefore, we went all the way
to synthesize one material that was generated by MatterGen. This material is called tantalum chromium oxide. It is a new material.
It has not been discovered before, and it was generated by MatterGen by conditioning on a bulk
modulus equal to 200 gigapascal. Bulk modulus measures how resistant the material is to compression. So we ended up measuring the synthesized
material experimentally,
and the measured bulk modulus is 169 gigapascal,
which is within 20% error.
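As a quick check of the figure above, taking the 200 gigapascal target as the reference value (an assumption about how the error is defined), the relative error works out as:

```latex
\frac{\lvert 169 - 200 \rvert}{200} = \frac{31}{200} = 0.155 = 15.5\%
```

which is indeed within the stated 20%.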
So this is a very good proof of concept, from our point of view,
to show that, oh, you can actually give it a
prompt, right, and MatterGen can generate a material, and
the material actually has a property that is very close to your target. But
it's still a proof of concept, and we're still working to see how MatterGen can
design materials that are much more useful
and with a much broader range of applications.
And I'm sure that there will be more challenges
we'll see along the way,
but we're looking forward to working further
with our experimental partners to push this further,
and also working with MatterSim, right,
to see how these two tools can be used to design
really useful materials and bring this into real-world impact.
Yeah, Tian, I think that's very well said.
It's not only for MatterGen; for MatterSim, we're also very careful.
So we really want to make sure that people understand
how these models really behave under their instructions
and understand
what they can and cannot do. So I think one thing that we really care
about is that in the next one or two years, we want to really work
with our experimental partners to make these realistic materials in
different areas, so that even we can better understand the limitations
and at the same time explore the forefront of materials science
to make this excitement come true.
Ziheng, could you give us a concrete example
of what exactly MatterSim is capable of doing?
Now MatterSim can really do whatever you have on a potential energy surface.
What that means is anything that can be simulated with the energies, forces, and stresses
alone.
To give you an example,
the first example would be the stability of a material. Basically, you input
the structure, and from the energies of the relaxed structures you can tell whether the
material is likely to be stable at that composition, right? Another example would be
thermal conductivity. Thermal conductivity is a fundamental property of materials that tells you how fast heat can transfer through the material.
MatterSim can really simulate how fast heat goes through your diamond, your graphene, your copper.
So basically, those are two examples.
These examples are based on energies and forces alone. But there are things MatterSim cannot do, at least for now.
For example, you cannot really do anything related to electronic structures.
So you cannot compute the light absorption of a
semi-transparent material. That would be a no-no for now.
It's clear from speaking
with researchers, both from MatterSim and MatterGen, that despite these very rapid advancements in technology, you take very seriously the responsibility to consider the broader implications of creating entirely new materials and simulating their properties, particularly in terms of things like safety, sustainability, and societal impact?
Yeah, that's a fantastic question.
So it's extremely important that we are making sure that these AI tools are not misused. A potential misuse, as you just mentioned, is that people
begin to use these AI tools, MatterGen, MatterSim, to design harmful materials. There was actually
an extensive discussion over how generative AI tools that were originally purposed for drug design can then be misused to create bioweapons.
So at Microsoft, we take this very, very seriously because we believe that when we create new
technologies, we must also ensure that the technology is used responsibly.
So we have an extensive process to ensure that all of our models respect those ethical considerations.
In the meantime, as you mentioned, maybe sustainability and the societal impact, right?
So there's a huge amount that AI tools like MatterGen can do for sustainability
because a lot of the sustainability challenges
are really, in the end, materials design challenges, right?
So therefore, I think MatterGen and MatterSim can really help with that, in helping
us to alleviate climate change and having positive societal impact for the broader society.
And Ziheng, how about from a simulation standpoint?
Yeah, I think Tian gave a very good description. So at Microsoft, we're really careful about these
ethical considerations. So I would add a little bit more on the bright side of things. So
MatterSim really carries out these simulations at atomic scale. So one thing you can
think about is really the educational purpose. So back in my bachelor's and PhD
period, I would sit at the table and really grab a pen, really deal with those
very complex equations and get those statistics using my pen. It's really painful. But now
with MatterSim, these simulation tools at atomic level, what you can do is to really simulate the reactions, the movement of atoms at atomic scale in real time.
You can really see the chemical reactions and see the statistics.
So you can really get the feeling, a very direct feeling, of how the system works instead
of just working on those toy systems with your pen.
I think it's going to be a very good educational tool using MatterSim.
Yeah.
Also MatterGen.
MatterGen is like a generative tool, generating those ID distributions.
It will be a perfect example to show the students how the Boltzmann distribution
works.
I think Tian, you will agree with that, right?
A hundred percent.
Yeah.
I really, really liked the example that you mentioned about the educational purposes.
I still remember when I was taking a materials simulation class, right?
So everything is DFT. You kind of need to wait for an hour, right,
to get some simulation, and then maybe you make some animation.
Now you can do this in real time.
This is a huge step forward for our young researchers to gain a sense of how
atoms interact at an atomic level. And the results are actually real,
not just toy models.
I think it's going to be very exciting stuff.
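The Boltzmann distribution mentioned above says that a state with energy E_i is occupied with probability proportional to exp(-E_i / (k_B T)). A minimal sketch with made-up energy levels follows; the function name and the energy values are illustrative assumptions and have nothing to do with the MatterGen or MatterSim code.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def boltzmann_probabilities(energies_ev, temperature_k):
    """Return the occupation probability of each state at temperature T."""
    beta = 1.0 / (K_B_EV_PER_K * temperature_k)
    e_min = min(energies_ev)  # shift by the minimum for numerical stability
    weights = [math.exp(-beta * (e - e_min)) for e in energies_ev]
    partition = sum(weights)  # normalization (partition function)
    return [w / partition for w in weights]


# Hypothetical energy levels in eV, purely for illustration.
levels = [0.00, 0.05, 0.10]
p_300 = boltzmann_probabilities(levels, 300.0)
p_1000 = boltzmann_probabilities(levels, 1000.0)

for temperature, probs in ((300.0, p_300), (1000.0, p_1000)):
    formatted = ", ".join(f"{p:.3f}" for p in probs)
    print(f"T = {temperature:6.1f} K -> [{formatted}]")
```

Raising the temperature shifts probability toward the higher-energy states, which is the behavior a student can watch directly in a real-time atomistic simulation.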
And Tian, I'm directing this question to you, even though Ziheng, I'm sure you can chime in as well.
But Tian, I know that you and I have previously discussed this specifically. I know that you said back in, you know, 2017, 2018, that you knew an AI-based
approach to material science was possible, but that even you were surprised by how far the
technology has come so fast in aiding this area. What is the status of these tools right now?
Are they in use? And if so, who are they available to? And what's next for them?
Yes, this is a fantastic question. So I think for generative AI tools like
MatterGen, as I said many times earlier, it's still in the early stages. MatterGen is the first
tool with which we managed to show that generative AI can enable very broad property-guided generation,
and we have managed to have experimental validation to show it's possible.
But it would take more work to show, okay, it can actually design batteries, can design solar cells,
can design really useful materials in these broader domains.
So this is kind of exactly why we are now taking a pretty open approach with MatterGen.
We make our code, our training data, and model weights available to the general public.
We're really hoping the community can really apply our tools to the problems that they care
about and even build on
top of that. So in terms of what's next, I always like to use what happened with generative AI for
drugs to kind of predict how generative AI will impact materials. Three years ago, there was a
lot of research around generative models for drugs, first coming from
the machine learning community, right?
So then all the big drug companies began to take notice, and then researchers
in these drug companies began to use these tools in actual drug design processes.
My colleague Marwin Segler, because he works together with Novartis in the Microsoft
and Novartis collaboration, has been basically telling me that at the beginning, all the chemists
in the drug companies were very suspicious, right? The molecules generated by these
generative models all look a bit weird, so they don't believe this will work. But once these chemists see one or two examples
that actually turn out to perform pretty well in the experimental results, then they
begin to build more trust in these generative models.
And today, these generative tools are part of the standard drug discovery
pipeline that is widely used in all the drug companies.
So I think generative AI for materials is going to go through a very similar period.
People will have doubts.
People will have suspicions at the beginning.
But I think in three years, right, it will become a standard tool
for how people are going to design new solar cells,
design new batteries,
and many other different applications.
Great. Ziheng, do you have anything to add to that?
So actually for MatterSim, we released the model,
I think back in December last year.
I mean, both the weights and the model, right?
So we're really grateful for how much the community
has contributed to the repo.
And now, I mean, we really welcome the community
to contribute more to both MatterSim and MatterGen
via our open-source code bases.
So I mean, the community effort is really important.
Well, it has been fascinating to pick your brains.
And as we close,
I know that you're both capable of quite a bit,
which you have demonstrated.
I know that asking you to predict the future is a big ask,
so I won't explicitly ask that.
But just as a fun thought exercise,
let's fast forward 20 years and look back.
How have MatterGen and MatterSim
and the big ideas behind them impacted the world?
And how are people better off
because of how you and your teams
have worked to make them a reality?
Tian, you wanna start?
Yeah, I think one of the biggest challenges our human society is going to face in the next 20
years is going to be climate change. And there are so many material design problems people need
to solve in order to properly handle climate change, like finding new materials that can
absorb CO2 from the atmosphere to create a carbon capture industry
or have battery materials that are able to do large-scale energy grid storage so that we can
fully utilize all the wind power and the solar power, etc. So if you want me to make one prediction, I really believe that these AI tools like MatterGen and MatterSim are going to play an essential role in that. I'd like to see that we have already solved climate change.
We have large-scale energy storage systems that were designed by AI, so that basically we have removed all the fossil fuels from our energy production. And for the rest of the carbon emissions
that are very hard to remove,
we will have a carbon capture industry
with materials designed by AI
that absorb this CO2 from the atmosphere.
It's hard to predict exactly what will happen,
but I think AI will play a key role
in defining what our society will
look like in 20 years.
Tian, very well said.
So I think instead of really describing the future, I would really quote a science fiction
scene in Iron Man.
So basically in 20 years, I will say when we want to really get a new material, we will
just sit in an office and say, well, Jarvis, can you design us a new material
that really fits my newest MK7 suit?
That will be the end.
And it will run automatically,
and we get this autolab running
and all those MatterGen, MatterSim,
these AI models running.
And then probably in a few hours,
in a few days, we get the material.
Well, I think I speak for many people from several industries when I say that I cannot
wait to see what is on the horizon for these projects.
Tian and Ziheng, thank you so much for joining us on Ideas.
It's been a pleasure.
Thank you so much.
Thank you.