Catalyst with Shayle Kann - Can AI revolutionize materials discovery?
Episode Date: September 19, 2024

AI is working its way across climate tech, helping companies discover giant lodes of ore, catch battery defects, and monitor energy infrastructure. Could it help us find revolutionary new materials, too? Turns out, it’s complicated.

In this episode, Shayle talks to Ekin Dogus Cubuk, or Dogus, a researcher focused on materials at Google DeepMind. DeepMind is one of several players, including Microsoft, trying to discover new materials that could be used in things like better battery chemistries, powerful carbon-capture sorbents, and room-temperature superconductors. But so far, Dogus says AI-powered approaches haven’t actually yielded any commercially-deployable materials.

Shayle and Dogus cover topics like:
Existing approaches to materials discovery, like experimentation and density functional theory, and how AI could complement those techniques
Why AI may actually require a lot more lab work – and larger datasets – before it becomes useful for material discovery
The types of material properties that AI may be especially useful for, such as optical or electric qualities

Recommended resources
Latitude Media: Armed with AI, Microsoft found a new battery material in just two weeks
Google DeepMind: Millions of new materials discovered with deep learning

Catalyst is brought to you by Kraken, the advanced operating system for energy. Kraken is helping utilities offer excellent customer service and develop innovative products and tariffs through the connection and optimization of smart home energy assets. Already licensed by major players across the globe, including Origin Energy, E.ON, and EDF, Kraken can help you create a smarter, greener grid. Visit kraken.tech.

Catalyst is brought to you by Anza, a revolutionary platform enabling solar and energy storage equipment buyers and developers to save time, increase profits, and reduce risk. Instantly see pricing, product, and counterparty data and comparison tools. Learn more at go.anzarenewables.com/latitude.

Catalyst is brought to you by Antenna Group, the global leader in integrated marketing, public relations, creative, and public affairs for energy and climate brands. If you're a startup, investor, or enterprise that's trying to make a name for yourself, Antenna Group's team of industry insiders is ready to help tell your story and accelerate your growth engine. Learn more at antennagroup.com.
Transcript
Latitude Media, podcast at the frontier of climate technology.
I'm Shayle Kann, and this is Catalyst.
There is a certain amount that we know as humans,
and maybe we can use computation to predict a bit outside of that circle, sphere.
But the farther we get from the sphere, the less good our approximations will be.
Will materials discovery be the killer app for AI in climate tech,
or is it a lot harder than we think it is?
I'm Shayle Kann. I invest in revolutionary climate technologies at Energy Impact Partners.
Welcome. Well, as the AI boom of the past couple of years has taken off, the way that I've
thought about the intersection between AI and what I do, which is climate tech, is that there are
kind of two distinct components. The first is the impact of the growth of AI, or really the
growth of AI data centers and compute on energy and therefore on climate. That one we've talked about
a bunch here. The other one, though, which we haven't talked about as much, is the actual application
of AI for climate tech. And to be honest, one of the reasons that we haven't talked about it so much
is that I'm still kind of searching for what I think is a real tangible opportunity that would
drive big impact. Of course, the world abounds with ways to use AI to do things more efficiently,
but to really move the needle on gigatons of emissions, I think is trickier.
But here's one category I've been pretty curious about, which is AI for materials discovery.
Undoubtedly, a big part of the technical challenge of getting to net zero is a materials challenge.
And one of the areas where, at least on the surface, you can pretty easily imagine AI creating a step-function improvement
is in doing the complex and currently quite slow work of discovering new materials.
So is this our AI climate killer app?
Let's find out.
For this one, I spoke to Dogus Cubuk, who is a research scientist studying materials discovery at Google DeepMind.
Here's Dogus.
Dogus, welcome.
Hi, Shayle.
Thanks for having me.
I'm really excited to talk to you about AI for materials discovery.
I want to start by talking pre-AI.
So obviously, in the history of humanity, we've discovered many, many new materials.
We've commercialized many of them.
Just talk to me about before AI, just walk me through the process of new materials discovery broadly.
Yeah, that's a great question.
And it goes really back far, right?
So if you think about the invention of money, for example, I think one of the timelines people talk about is when we invented gold, or like gold money.
And if you think about how that happened, it turns out a lot of the ores have gold and silver mixed, so they're alloys.
And I think the big innovation there was when some humans found a way of extracting the gold and silver from each other.
And once you had pure gold, that was used as money.
You know, it's interesting to think about those times.
But in more recent times, I feel like one thing that's very relevant to this conversation is that a lot of materials discovery has been by random trial and error.
And it's been very serendipitous.
Actually, the more I look into this, the more I realize that almost all fields involve some kind of important
serendipitous discovery.
So one of the fun examples we often talk about is, you know, for light bulbs around 1905 or something,
they were realizing that tungsten is a good material as a filament.
But tungsten wasn't ductile enough to be kind of like wrapped up as a coil.
And then apparently by mistake, one time it was dropped
into a pool of liquid mercury,
and turns out when mercury and tungsten react,
it becomes more ductile.
So then came ductile tungsten.
So we can talk more about this,
but I think history is just full of examples like this.
Like if you think about the invention of the lithium-ion battery,
I think one of the stories is at Exxon,
they were looking at discovering superconductors,
and they had this reason to think that lithium-ion intercalation
could be interesting for studying superconductivity,
but as they were intercalating lithium,
they realized it's actually really good at storing energy.
So, and this is one of the reasons I'm really excited about trying to see if AI and simulations can be useful here,
because a lot of the discoveries have just been randomly trying kind of relevant materials.
Maybe another example I can give you is, you know, in 1949, I think,
when Bardeen and his collaborator first invented the transistor, the solid-state transistor.
And you can look at their diary, and you notice that they tried so many different materials
for all the different parts of it because they just didn't know what would work.
So at the time, they were hoping silicon would work, but turns out silicon didn't work for them.
So they ended up switching to germanium, and that worked.
And then I think they had a problem with the glue, so they had to change the glue.
They had a problem with the metal electrode in the device.
So what I'm trying to say is that Bardeen was maybe one of the best solid-state physicists who ever lived.
But even he was just randomly trying materials to make this transistor work.
Yeah, it's funny.
When you talk about the like accidentally dropping tungsten into a bed of, what is it, liquid mercury,
it makes me think sometimes of like I have a two and a half year old son.
I should just like put him in a chemistry lab, you know, with a bunch of materials in a bunch of different places
and just give him enough time and eventually he's going to accidentally do something
that's going to discover some amazing new material.
Obviously, I'm a better father than that.
But the transistor one is, I think, a good one to talk about because, okay, sure, in the arc
of human history, many of the important discoveries have been made purely accidentally.
But certainly in the past few decades, I would presume we've developed a body of knowledge
about the characteristics and properties of various materials.
And so if you're trying to do a material discovery, solve a material discovery problem,
I don't know, 10 years ago, probably you're not just doing totally random trial
and error, right? Like, what is the depth of our, what was the depth of our knowledge and our ability
to iterate on different designs of materials and so on that went beyond the purely random,
again, prior to AI?
Yeah, that's a great question. So, and obviously, right, even the examples I was
giving, they were not purely random. Like, for example, Bardeen knew that the semiconductor would be
something like silicon or germanium. Like, they both have, you
know, four electrons in their last shell.
Like, there was a lot of physical understanding.
They knew about the surface state of silicon.
So it's quite different than, I think, as you said, like a random kid going in,
although your kid is probably very smart, a random kid just randomly trying stuff.
But here comes, I think, a very interesting philosophical contradiction.
And I think this is true for science, but also for machine learning.
The better you know a system, the more you can continue optimizing the system.
But it doesn't necessarily mean that knowledge
will help you discover something different.
And I think this is probably why a lot of important discoveries are serendipitous,
because, like, let's say you're in a company and your company is really an expert on material A.
So you're right, like you've developed so many important sets of expertise that you can really optimize A,
but most likely those skills don't help you discover C, which is very different or even B.
And this, I think, is definitely an issue with machine learning.
So, you know, in machine learning, we know that we do really well on the training set distribution,
like on the kinds of things you trained on.
And the farther you get from the training set distribution, the worse your predictions are.
And this is also true for science, right?
Like, the closer things are to the textbooks, the better our theories are at predicting them.
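To make the point about the training set distribution concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the "true law" is a simple quadratic, the training range is 0 to 1, and the model is a straight-line fit. It is not a method discussed in the episode; it only shows a model that looks fine near its training data and gets worse the farther away it is asked to predict.

```python
# Toy illustration: a model fit on a narrow range degrades away from it.
# The "true law" (y = x**2) and the training range [0, 1] are assumptions.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

def true_law(x):
    """Stand-in for the real physics the model is trying to capture."""
    return x ** 2

# Training data only covers x between 0.0 and 1.0.
train_x = [i / 10 for i in range(11)]
train_y = [true_law(x) for x in train_x]
a, b = fit_line(train_x, train_y)

def abs_error(x):
    """Absolute gap between the fitted line and the true law at x."""
    return abs((a * x + b) - true_law(x))

# One point inside the training range, two progressively farther outside.
for x in (0.5, 1.5, 3.0):
    print(f"x = {x}: absolute error = {abs_error(x):.3f}")
```

The error printed for the point inside the training range is small, and it grows as the query point moves away from that range.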
And this is partly why I think material discovery has become so difficult in
the commercial space, because, like, if you think about plastics,
we're still mostly using things that we discovered 70 years ago, 80 years ago,
and we've gotten so good at, you know, manufacturing them, optimizing them.
So now for someone to come up and say, oh, I discovered a completely different one, it's quite difficult.
And for this reason, yeah, it just lends itself to us optimizing known materials and not
necessarily discovering completely new ones that might be better. And it's probably also why
a lot of the materials we're using today in many technologies are quite simple.
Like if you think about the transistor, it's just like pure silicon, right?
If you think about, I think, MRI machines, the superconductors they're using
are quite a bit simpler and older than, you know, the new cuprates and stuff.
So, yeah, it's quite common that I think we're having difficulty discovering new materials,
even if we're pretty good at modeling some of the older materials.
Okay, so to my layman's ear, I guess what I'm hearing is that
what we have gotten pretty good at historically is taking a sort of incremental step in material
discovery. We know a system, we know a category of material and so on. We can optimize the hell out of it.
Maybe this is what we've done with plastics over 80 years. What we have had a harder and harder
time doing, in part because maybe we've discovered the low-hanging fruit, is finding entirely novel
materials. And so that's maybe a good segue to talking about the new world of AI and whether and to what
extent it has a role to play in helping to crack that code. Because the fundamental thing I wonder
about is, okay, presumably the thing that makes it difficult to discover an entirely novel,
you know, option C or whatever you called it before is that the possibility space is virtually
endless. It's just a huge number of possibilities of things that you could do. And so the question
is, is our ability to do this kind of computation that AI is introducing? Does that make that easier
in the sense that you can just run a million combinations if theoretically you can simulate
the properties of materials? Or does it actually make it just as hard for exactly the reason you
describe, which is, you know, we have a corpus of data we're going to train these models on, but that
corpus of data is grounded in what we already know.
And so definitionally, it's going to be hard for it to find the next thing.
Yeah, that's a great question.
And, you know, it's not just AI.
So there are two things that are happening, right, in recent decades.
So simulations are becoming more and more commonplace.
And that's probably very correlated with why AI is becoming very commonplace, because
our computing infrastructure is growing and compute is getting cheaper.
So now we're getting to a point where we're better at training neural networks.
We're better at using them.
and we're better at simulating atoms and materials,
and we're doing it for cheaper.
But I think exactly as you said,
both simulations like density functional theory
and machine learning have kind of the same bias
as the regular pre-simulation science, theoretical science,
and maybe it's even worse,
because humans as biased as they might be when doing science,
they also clearly have this ability to extrapolate.
Like humans have found ways of discovering things that were beyond their theories.
There's been like these paradigm shifts.
And AI hasn't really done this yet.
Even today's best AI models seem to be really good at kind of doing the textbook stuff, you know, like high school, college.
But then when you think about being more creative and, you know, trying to shift the paradigm, it's been more difficult.
Okay, so that's the pessimistic part.
But I think the optimistic part is that even the less creative parts of science
actually could really benefit from becoming more efficient.
So let me give you an example on that.
So if you think about high temperature superconductivity,
as you know, this is different than conventional superconductivity,
but it can have a much higher transition temperature.
And we still don't know as physicists
where high temperature superconductivity comes from.
It's like a crazy thing.
You know, it's been around for 50 years, 40 years.
We don't know why it happens.
But we can still optimize it.
So the first high-temperature superconductor
that was discovered is called LBCO.
And that's important because, like, L is lanthanum, B is barium,
and then copper oxide is the CO.
So I think when Müller and his collaborator first discovered this at IBM,
I think people thought that Müller was crazy
for considering cuprates for superconductors,
because all other superconductors were BCS and they were different.
But if you look at Müller's Nobel Prize speech,
he actually talks about how he used the old understanding, the conventional superconductivity,
to be able to consider cuprates as an example.
But we now know that it's actually not a great transfer because cuprates are actually quite different
than conventional superconductors.
Okay.
So LBCO turns out to be quite interesting, but not good enough.
So then what people did is, even though they don't know why it's a superconductor,
they started replacing elements with similar elements.
And then the first one that really made it, and the reason I'm saying it made it is
because it was at a temperature above that of liquid nitrogen,
was YBCO.
So what you notice is just the lanthanum was replaced with yttrium.
And so humans here were able to find a good enough superconductor,
even though they didn't understand why it was a superconductor.
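The substitution strategy described here, swapping one element for a chemically similar one without knowing the underlying mechanism, can be sketched in a few lines of Python. This is only an illustration: the `SIMILAR` table is a hypothetical, hand-picked lookup rather than a vetted chemical similarity measure, and the sketch ignores stoichiometry, stability, and every property one would actually need to check.

```python
# Hypothetical lookup of "similar" elements; the entries are illustrative
# assumptions, not a vetted chemical similarity table.
SIMILAR = {
    "La": ["Y", "Nd"],
    "Ba": ["Sr", "Ca"],
}

def substitutions(elements):
    """Yield every element set that differs from `elements` by one swap."""
    for i, element in enumerate(elements):
        for replacement in SIMILAR.get(element, []):
            yield elements[:i] + (replacement,) + elements[i + 1:]

# Element set of LBCO (lanthanum, barium, copper, oxygen); ratios ignored.
parent = ("La", "Ba", "Cu", "O")
candidates = list(substitutions(parent))

for candidate in candidates:
    print("-".join(candidate))
```

One of the generated candidates is the Y-Ba-Cu-O element set, which corresponds to the YBCO step in the story. A real workflow would still have to rank such candidates by simulation or experiment.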
Okay.
So I think what computation and machine learning can give us here is,
even if they can't do the paradigm shift and go from cuprates to a completely different superconductor,
they can at least help us do this optimization,
exploitation part to go from LBCO to YBCO faster.
And soon when we start talking about our own work, you'll see clear examples of this.
So would I be right in that example to think, okay, we have some high-temperature
superconductors.
And one of the things that as a non-physicist has always bugged me about that terminology,
like high-temperature superconductors are still very cold, need to be extremely cold, right?
The Holy Grail, of course, is a room-temperature superconductor, which we have not yet discovered.
Would it be right to think, okay, maybe the type of thing that computation and machine learning might be good at is optimizing and tweaking the recipes that we've got for high-temperature superconductors, but probably less likely, at least today, to discover the room-temperature superconductor, because that probably requires some completely orthogonal type of thinking?
Yeah, I think that's exactly right.
So if you look at last year, the last few years, the most promising discoveries on the computational side,
they've been looking at hydrides, which might have conventional superconductivity,
and at high enough pressure, they might have high enough temperature.
So there's been some really good work coming out of the Pickard group and a few other groups,
where they use simulations to study these.
And the hope, exactly like you said, is maybe take a known kind of superconductivity and optimize it to get as close to room temperature as possible.
Maybe that's a good segue to giving some examples of the types of things that in recent years, since the boom in AI, like, what has been proven to work so far? What have we discovered collectively using AI that perhaps either we wouldn't have otherwise or would have taken a whole lot longer and more work and effort? Like, tangibly, what have we shown?
You know, I think not a whole lot.
And, you know, I would actually even make the question a bit larger and ask, like, what has
simulation given us that was not possible?
Because, you know, we have to realize that simulations have started becoming a thing in material
science since the early 2000s or even late 1990s.
And so it's been several decades at this point.
And it's important to ask what has that given us as an actual material in a product
that we didn't have before.
And I think that's the crucial question.
So there's one example that often gets talked about.
I think one of the cathode materials, maybe from the Ceder group and the Materials Project,
I think is in Duracell batteries.
So there's one example that's known.
This is like a bit out of material science,
but in topological insulators,
the first three-dimensional topological insulators were proposed in a DFT paper.
but otherwise, yeah, so it's actually not that great.
I mean, materials discovery in general is quite hard, right, for the reasons we mentioned.
So it's usually just like experimentalist randomly trying stuff.
But the good news is, I think, that simulations and machine learning have been making progress,
not yet in putting materials in devices, products, but at least in making useful predictions.
We talked about this briefly, but I guess I want to understand it a little bit better.
You know, folks are going to be familiar with AI in the form of LLMs and things like that, the whole generative AI
world. And one of the benefits that that world has is that the corpus of training data for those models
is enormous. You're like training LLMs on the internet, basically. How does it look when you're trying
to do simulations for the purpose of material science? Like how big is the data upon which you can
train? Is it big enough? And do we need to be generating an enormous amount of synthetic data in order
to sufficiently train these models? Like, is that a real constraint here? Yeah, that's a great
question. So, you know, if you look at ICSD, which is the inorganic crystal structure database,
there are a bit more than 200,000 inorganic crystals there. So that kind of tells you that it's
quite a bit smaller than the Internet scale. One good news for us is, as you said, we can simulate
data, and the simulations come from our physical approximations of quantum mechanics. So they tend
to be somewhat informative. And with density functional theory simulations, we can do a lot
more than 200,000. So, you know, if you look at our GNoME paper, we had results for several
million training points, and people have been actually pushing that. So now there are many
different groups that have results for like 50 million points. So that's one thing. But then the
question is how many experimental data points is worth how many computational data points.
And but this is actually not that different than the internet scale. So when LLMs are trained on
the internet, the data
is not very high quality, it's just sentences.
And there aren't necessarily very good labels for them.
The sentences could be written on the internet.
It's not very high quality.
But as you mentioned, what can be quite effective is you pre-train on the internet
and then you fine-tune on specific tasks.
And that might be pretty similar to us.
Like, if we end up having these hundreds of millions of points from computation
and then a few points, like 100,000 points from experiment,
maybe then we can get some good results.
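The pre-train-then-fine-tune idea can be shown with a deliberately tiny Python sketch. All of it is assumed for illustration: the "experimental" law is a made-up line, the "simulated" data gets the slope right but the offset wrong, and the model is a one-variable linear fit trained by plain gradient descent. It is not the setup used by any group mentioned in the episode; it only shows how a handful of trusted points can correct a systematic bias learned from plentiful but approximate data.

```python
# Toy pre-train / fine-tune sketch in pure Python.
# Assumptions (all hypothetical): the "experimental" law is y = 2x + 1 and
# the "simulation" gets the slope right but the offset wrong (y = 2x + 0.5).

def train(w, b, xs, ys, steps, lr=0.5):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mean_abs_error(w, b, xs, ys):
    """Average absolute prediction error of the line on (xs, ys)."""
    return sum(abs(w * x + b - y) for x, y in zip(xs, ys)) / len(xs)

# Plentiful but systematically biased "computational" data.
sim_x = [i / 999 for i in range(1000)]
sim_y = [2.0 * x + 0.5 for x in sim_x]

# Scarce but trustworthy "experimental" data.
exp_x = [0.0, 0.25, 0.5, 0.75, 1.0]
exp_y = [2.0 * x + 1.0 for x in exp_x]

# Held-out experimental points used only for evaluation.
test_x = [0.1, 0.3, 0.6, 0.9]
test_y = [2.0 * x + 1.0 for x in test_x]

# Step 1: pre-train on the large simulated set, starting from zero.
w, b = train(0.0, 0.0, sim_x, sim_y, steps=500)
err_pretrained = mean_abs_error(w, b, test_x, test_y)

# Step 2: fine-tune the same parameters on the few experimental points.
w, b = train(w, b, exp_x, exp_y, steps=50)
err_finetuned = mean_abs_error(w, b, test_x, test_y)

print(f"error after pre-training only: {err_pretrained:.3f}")
print(f"error after fine-tuning:       {err_finetuned:.3f}")
```

In this toy case the pre-trained model inherits the simulation's offset error, and a short fine-tuning pass on five experimental points removes most of it. Real materials data is noisier and far higher-dimensional, so the benefit is not guaranteed to be this clean.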
Part of the big issue is, so, 200,000 crystals on ICSD I mentioned, but for many of them, we actually don't know the properties.
So, like, what's their band gap?
What's their electronic conductivity?
And that becomes an even smaller set.
Like, then you may have like 1,000, 2,000 data points.
And it's a real problem, I think, yeah.
Yeah, that really drives home why this is challenging.
You have 1,000 to 2,000 data points where you actually understand the properties of the thing that you're training your model on.
And those numbers, even I know, that's tiny.
I guess the other question, I think people can imagine, right, like the sort of world
of material discovery, particularly the way you describe it historically, right, when this stuff
has happened semi-accidentally, is a lot of, like, physical trial and error.
You're like doing something in a lab.
People are doing something in a lab and seeing what happens and then measuring those results
and then inferring something and moving on.
And then you can imagine the world in which
AI, ML, et cetera, replaces the lab work because you're able to utilize computation to figure out
what's going to happen if you were to do that stuff in the lab. Do you see that as being a
realistic future? Like, are we going to replace lab work? Or, I mean, I could make a case,
I guess, based on what you just said, that it's kind of the opposite. Because at least for a while,
in order to get sufficient training data where we know the properties of the things, like
there's a chicken or egg problem, and actually maybe you have to do a lot more lab work up front
to get the data to train the model that then replaces the lab work?
Yeah, I mean, that's a great question.
So I think I can't imagine a future where we completely eliminate lab work, because, first of all,
we don't know if quantum mechanical simulations will ever become good enough to correctly predict
experiments, you know, all the time.
But also, like, again, going back to the philosophical perspective, there's a
certain amount that we know as humans.
And maybe we can use computation to predict a bit outside of that circle, sphere.
But the farther we get from the sphere, the less good our approximations will be.
So this is that issue, right, where the things we know well, we can predict well.
But the things we really want to predict are the things we don't know well.
So from that perspective, experiments and the real universe will always be needed to, you know,
validate our predictions, to train the next model.
But yeah, I mean, so there's been these efforts.
So you might have heard about the TRI effort.
I think there are a few other efforts coming up where, exactly as you proposed,
they're trying to create a lot of experimental synthesis data so that you can kind of like
bootstrap and start using machine learning and computation.
Because currently part of the issue is we don't even have a good large data set
where you can train or validate your synthesis predictions on.
This is sort of an aside, and I don't know if you'll have an opinion on this, but one thing I'm curious about, so as I'm sure you're aware, there are lots of startups now emerging who are saying we're going to do AI for materials discovery. And we've got some kind of a black box machine. You input what you need out of a new material, and we're going to tell you what material you should use, obviously, with more in between there. Oftentimes I find, you know, so I live in climate tech world. And so there's a bunch of applications
where a novel material could have a big impact on climate change.
I find a lot of them, the startups at least, they start by saying,
we're going to find a metal organic framework for carbon capture.
That seems to be a very common example, weirdly common.
And I'm curious why that would be, and whether it tells you something about the types of problems
that these models actually can attack early on.
So this is actually something we thought about a lot.
You know, part of the issue is, like, let's say you're trying to discover a battery material.
Like, let's say you discover an amazing electrolyte, solid electrolyte.
One issue you might face is that by itself isn't a battery.
And then you have to put it in a battery and then, like, will it work with the cathode, the anode, the interface, with the manufacturing line?
So I'm wondering if one of the reasons MOFs have become quite popular for these startups is it might be like a standalone material as a product.
Like, I guess you could take the MOF, put it in a room, and it will capture some amount of carbon.
And you don't have to worry as much about, like, the other parts.
But, you know, I mean, I think that's a good question, because you don't currently see MOFs as being, like, very commercially impactful.
So maybe they're also betting on the fact that in the future it might be.
Yeah, MOFs are one of those categories, like graphene, where you're like, you know, for a long time, it's been the Holy Grail of lots of different things.
And you can imagine a million different applications and people have tried. Maybe now
the time is nigh for MOFs to really take off.
But I think that other point is actually a really interesting one.
Like, in battery world, nothing exists in a vacuum.
So you can't create a novel material and then be done with it.
You have to figure out not only the material,
but then its interaction with all the other materials,
which are also in flux,
and that's part of what makes batteries so difficult.
So maybe that tells you that, like, the types of problems
that these models can attack early on are the ones that are self-contained.
If you solve that problem with that material,
that's all you really need, and you don't need to deal with all this other interaction and stuff like that.
Yeah, and that's exactly right.
And one other potential factor is,
and I see this often at Google,
when somebody outside of material science
gets interested in trying to contribute
to material science,
a common reason is they worry about the climate
and they want to help climate.
So it's less often that I see
a non-material scientist come to me
and say, how can I improve
the ionic conductivity in essence?
But more often I hear them say, how can I help with the carbon capture problem?
And this is another reason, maybe, that the startups gain kind of like more interest and more people
are excited to work there because they're potentially going to contribute to carbon capture.
Are there any particular areas that you're most excited about, like domains or materials
requirements?
Like what do you think?
Where might we see?
You know, as you said, we haven't really yet proven a whole lot about the ability
of AI and these new methods to discover new materials.
Like, where might we?
Where are you most optimistic?
Yeah, so there are...
Okay, so different applications, I feel like, have different issues.
So maybe I can cover a few and say why some of them might be more promising.
So, you know, if you think about something like optical properties or electronic properties,
I think one of the limitations might be that DFT itself isn't as good at predicting electronic
properties as it is at predicting structural properties. So DFT tends to be better at predicting
the formation energy, like the kind of stability of a material, but not necessarily the band gap.
And the band gap is crucial for understanding the optical properties, like how the light and
the electrons will interact. So that's one reason, for example, if people are using the current
state of DFT, they might be less successful at discovering optical applications than something else.
Can you define DFT for folks who are not familiar?
Oh, yeah. So DFT stands for density functional theory, and it's been really, really impactful in material science.
So basically what happened is people were trying to figure out how to simulate the quantum mechanical aspects of materials.
Because what's really interesting in material science, for me, is that the properties really depend on how atoms interact, and atoms interact at the quantum mechanical scale.
And, you know, there have been many methods proposed over the past century, but
it seems like DFT has really caught on as efficient enough, fast enough, but also
accurate enough sometimes as a simulation tool.
So now if you look at citations, I think Walter Kohn, who got the Nobel Prize for it,
has like an incredible number of citations because everyone uses DFT these days to try to
simulate materials.
Okay, so I interrupted you, but so DFT is a technique, essentially.
And so you're saying there are certain things that DFT is
better at than others. What is that going to lead us to in terms of where we might use DFT
to make a globally significant discovery?
That's right.
So if you think about batteries, there are many aspects of batteries that seem like a better
fit to DFT.
So, for example, you'd like your battery materials to be stable, and you'd like,
for example, the electrolyte to conduct the ions.
Like in a lithium-ion battery, lithium should go through it fast.
So for predictions of this type, DFT seems a bit better.
And it's probably not a surprise that the one example I gave you earlier was for a battery with DROC cell.
And a lot of DFT practitioners study batteries at some point.
So I have, for example, for my PhD, I studied silicon as an anode material.
So that's one example.
If you think about catalysis, I think there's a lot of excitement around catalysis because it's like a very important application.
But one issue maybe is that, in heterogeneous catalysis, the surface is very messy
and it's dynamic over its use.
So if you don't really know what's happening at the surface,
you might not be able to predict what's going to happen as a function of the structure.
So that's one challenge with catalysis.
Superconductivity is very exciting, of course,
both from a scientific perspective and maybe from like a climate and technological perspective.
But superconductivity often involves very complex quantum mechanical interactions.
So it's, you know, yet to be seen if DFT can be useful there.
So yeah, I think every different vertical has these different issues, and it's not clear,
with machine learning support, which one will actually be helped.
Are we going to see a watershed moment in this space in the same sense that GPT-3 was for LLMs?
Like, is there something like that that could or will happen, or is it going to be more steady
progress, perhaps faster than historically, but like more consistent as opposed to step function?
Yeah, so a very good comparison point, I think, is AlphaFold.
I think when AlphaFold came out, people saw it as a watershed moment.
And I think part of what was good for AlphaFold is that there was this competition that people really cared about.
There was this problem people really cared about: protein folding.
And doing well on that, and much better than previous methods, kind of made it clear that it is a very useful tool.
And it's objective.
Like you could objectively measure whether you were better at it.
That's right.
In material science, I think one of the issues is
the experimental data is actually quite noisy.
So, you know, this is something that you might hear often,
that simulations and DFT aren't very accurate.
And that's true, but maybe one thing that people don't notice
is that the experimental uncertainty is actually usually
at the same level as the computational error.
And the reason this is really bad is because this means that
even if you want to improve your simulations,
the experimental labels are noisier
than where you want to get to.
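To see why noisy labels hide progress, here is a minimal sketch. It assumes the simulation error and the experimental noise are independent and add in quadrature, which is a simplification and not something stated in the conversation. All numbers and the function name are made up for illustration.

```python
import math

def observed_error(model_error, label_noise):
    """Error measured against noisy experimental labels.

    Assumption: model error and label noise are independent,
    so their RMS values add in quadrature.
    """
    return math.sqrt(model_error ** 2 + label_noise ** 2)

# Hypothetical values in arbitrary units, not real measurements.
label_noise = 0.10                              # experimental uncertainty
before = observed_error(0.10, label_noise)      # simulation as noisy as experiment
after = observed_error(0.05, label_noise)       # simulation error cut in half

true_gain = 1 - 0.05 / 0.10                     # the real improvement: 50%
observed_gain = 1 - after / before              # what the benchmark shows

print(f"true improvement:     {true_gain:.0%}")
print(f"observed improvement: {observed_gain:.0%}")
```

Under these assumptions, halving the simulation error shows up as a much smaller gain when scored against the noisy labels, and no amount of improvement can push the measured error below the label noise.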
One thing that is clear is for CASP to be such an impactful data set,
a lot of experimentalists have spent a lot of important effort
in trying to get kind of consistent and useful data.
And I think maybe because now machine learning really needs this high-precision, large data set,
there are these bigger efforts trying to create a CASP-like database,
but it's not there yet.
Well, I guess to wrap up then, I mean, I've asked you to talk a lot about the field.
Curious to hear what you're working on and what you're most excited about in terms of your work at DeepMind.
Yeah, so last year we published our paper, GNoME, and that paper was mainly about seeing if machine learning can be used to discover materials stable at zero Kelvin.
So ideally, we'd like to discover materials that are stable at room conditions, but that's quite far from the
level of the field, especially back then.
And what we realized is even for zero Kelvin stability prediction,
which is like a simpler task,
there weren't that many predictions that are from DFT.
So there were about at most 48,000,
and only about 28,000 or so had come from computation,
and 20,000 had come from previous experiments.
So we saw that machine learning can really speed up this process.
Part of the reason is, you know,
you've seen with LLMs and with vision models
that the more training data you put into LLMs,
the better results you get.
And how much better your results get is actually predictable.
It's kind of like a power law.
This goes back to a paper from Baidu Research
from back in 2016, I think.
And it seems to apply to all kinds of deep learning,
including quantum mechanics and material science.
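A power law of this kind can be sketched as error = a * N^(-b), where N is the number of training examples. The snippet below is only an illustration: the data points are synthetic, and the function names and the constants 2.0 and 0.3 are hypothetical, not values from any paper. It fits a and b with a straight-line fit in log-log space and then extrapolates to a larger dataset.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * size**(-b), done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope   # (a, b)

def predict_error(a, b, size):
    """Extrapolate the fitted curve to a new training-set size."""
    return a * size ** (-b)

# Synthetic points generated from a known curve (a=2.0, b=0.3), for illustration only.
sizes = [1_000, 10_000, 100_000]
errors = [2.0 * n ** -0.3 for n in sizes]

a, b = fit_power_law(sizes, errors)
print(f"fitted a={a:.3f}, b={b:.3f}")
print(f"predicted error at 1,000,000 examples: {predict_error(a, b, 1_000_000):.4f}")
```

Real learning curves are noisier than this and tend to flatten at very small and very large dataset sizes, so a fit like this is only a rough guide within the range where the power law holds.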
So we realized that as we make our model better and better,
its predictive ability improves to a point
that it can actually now discover crystals
that are stable at zero Kelvin.
So that's what we did last year.
And as we said in that paper,
one of our next goals is to predict
not just zero Kelvin stability,
but finite temperature stability.
And this is much harder
because at finite temperature,
there are entropy effects.
And we're interested in, you know,
finding not just materials that are stable,
but also exciting.
So like superconductors,
battery materials,
kind of like materials
that really will impact the technology.
And that's another thread
that we're going towards.
And finally,
one other thing that we really care about is
taking DFT and making
it more predictive. So DFT
has been around for decades, but
it's mainly been like a theoretically
developed tool. So the equations
that describe it are, you know, as
simple as a really good theoretical physicist
can model. But
these models can actually be a lot more
complicated because we have data and we have
machine learning now. So we'd also like to improve
DFT, which is something else we're working on.
All right, Dogus, this was
a lot of fun.
I'm still lost in half of this stuff,
but I feel like I have a better understanding
of the overall state of affairs,
which is really what I was hoping to get out of this.
So really appreciate it. Thanks for the time.
Awesome. Yeah, it was super fun. Thank you.
Ekin Dogus Cubuk is a research scientist
at Google DeepMind focused on materials discovery.
This show is a production of Latitude Media.
You can head over to latitudemedia.com
for links to today's topics.
Latitude is supported by Prelude Ventures.
Prelude backs visionaries
accelerating climate innovation that will reshape the global economy for the betterment of people
and planet. Learn more at preludeventures.com. This episode was produced by Daniel Woldorff,
mixing by Roy Campanella and Sean Marquand, theme song by Sean Marquand. I'm Shayle Kann,
and this is Catalyst.
