Catalyst with Shayle Kann - Can AI revolutionize materials discovery?

Starting point is 00:00:01 Latitude Media, podcast at the frontier of climate technology. I'm Shayle Kann, and this is Catalyst. There is a certain amount that we know as humans, and maybe we can use computation to predict a bit outside of that circle, sphere. But the farther we get from the sphere, the less good our approximations will be. Will materials discovery be the killer app for AI in climate tech, or is it a lot harder than we think it is? When utilities need flexible capacity they can count on, they turn to EnergyHub.

Starting point is 00:00:45 EnergyHub works with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility, built for the moments when utilities can't afford uncertainty. EnergyHub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale. Predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com. Trillions of dollars are flowing into clean and critical infrastructure,

Starting point is 00:01:17 but those investments aren't driven by technology alone. They're shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, and host of a brand new podcast, Critical Capital. Each episode, I talk with people deploying capital, shaping policy, and building the clean economy. Tune in as we unpack how progress is actually made.

Starting point is 00:01:39 Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts. Catalyst is supported by FischTank PR, an award-winning PR firm focused on climate and energy tech, renewables, and sustainability. FischTank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight, and gets results, you'll like FischTank PR. To learn more about FischTank's approach, visit fischtankpr.com. That's F-I-S-C-H-tankpr.com. I'm Shayle Kann. I invest in revolutionary climate technologies at Energy Impact Partners. Welcome. Well, as the AI boom of the past couple of years has taken off, the way that I've

Starting point is 00:02:25 thought about the intersection between AI and what I do, which is climate tech, is that there are kind of two distinct components. The first is the impact of the growth of AI, or really the growth of AI data centers and compute, on energy and therefore on climate. That one we've talked about a bunch here. The other one, though, which we haven't talked about as much, is the actual applications of AI for climate tech. And to be honest, one of the reasons that we haven't talked about it so much is that I'm still kind of searching for what I think is a real tangible opportunity that would drive big impact. Of course, the world abounds with ways to use AI to do things more efficiently, but to really move the needle on gigatons of emissions, I think, is trickier.

Starting point is 00:03:09 But here's one category I've been pretty curious about, which is AI for materials discovery. Undoubtedly, a big part of the technical challenge of getting to net zero is a materials challenge. And one of the areas where, at least on the surface, you can pretty easily imagine AI creating a step-function improvement is in doing the complex and currently quite slow work of discovering new materials. So is this our AI climate killer app? Let's find out. For this one, I spoke to Dogus Cubuk, who is a research scientist studying materials discovery at Google DeepMind. Here's Dogus.

Starting point is 00:03:48 Dogus, welcome. Hi, Shayle. Thanks for having me. I'm really excited to talk to you about AI for materials discovery. I want to start by talking pre-AI. So obviously, in the history of humanity, we've discovered many, many new materials. We've commercialized many of them. Just talk to me about before AI, just walk me through the process of new materials discovery broadly.

Starting point is 00:04:11 Yeah, that's a great question. And it goes really far back, right? So if you think about the invention of money, for example, I think one of the timelines people talk about is when we invented gold, or like gold money. And if you think about how that happened, it turns out a lot of the ores have gold and silver mixed, so they're alloys. And I think the big innovation there was when some humans found a way of separating the gold and silver from each other. And once you had pure gold, that was used as money. You know, it's interesting to think about those times. But in more recent times, I feel like one thing that's very relevant to this conversation is that a lot of materials discovery has been by random trial and error.

Starting point is 00:04:57 And it's been very serendipitous. Actually, the more I look into this, the more I realize that almost all fields involve some kind of important serendipitous discovery. So one of the fun examples we often talk about is, you know, for light bulbs around 1905 or something, they were realizing that tungsten is a good material as a filament. But tungsten wasn't ductile enough to be kind of like wrapped up as a coil. And then apparently by mistake, one time it was dropped into a pool of liquid mercury,

Starting point is 00:05:30 and turns out when mercury and tungsten react, it becomes more ductile. So then came ductile tungsten. So we can talk more about this, but I think history is just full of examples like this. Like if you think about the invention of the lithium-ion battery, I think one of the stories is at Exxon, they were looking at discovering superconductors,

Starting point is 00:05:54 and they had this reason to think that lithium-ion intercalation could be interesting for studying superconductivity, but as they were intercalating lithium, they realized it's actually really good at storing energy. And this is one of the reasons I'm really excited about trying to see if AI and simulations can be useful here, because a lot of the discoveries have just been randomly trying kind of relevant materials. Maybe another example I can give you is, you know, in 1949, I think, when Bardeen and his collaborator first invented the transistor, the solid-state transistor.

Starting point is 00:06:31 And you can look at their diary, and you notice that they tried so many different materials for all the different parts of it because they just didn't know what would work. So at the time, they were hoping silicon would work, but turns out silicon didn't work for them. So they ended up switching to germanium, and that worked. And then I think they had a problem with the glue, so they had to change the glue. They had a problem with the metal electrode in the device. So what I'm trying to say is that Bardeen was maybe one of the best solid-state physicists who ever lived. But even he was just randomly trying materials to make this transistor work.

Starting point is 00:07:02 Yeah, it's funny. When you talk about the like accidentally dropping tungsten into a bed of, what is it, liquid mercury, it makes me think sometimes of like I have a two and a half year old son. I should just like put him in a chemistry lab, you know, with a bunch of materials in a bunch of different places and just give him enough time and eventually he's going to accidentally do something that's going to discover some amazing new material. Obviously, I'm a better father than that. But the transistor one is, I think, a good one to talk about because, okay, sure, in the arc

Starting point is 00:07:31 of human history, many of the important discoveries have been made purely accidentally. But certainly in the past few decades, I would presume we've developed a body of knowledge about the characteristics and properties of various materials. And so if you're trying to do a material discovery, solve a material discovery problem, I don't know, 10 years ago, probably you're not just doing totally random trial and error, right? Like, what was the depth of our knowledge and our ability to iterate on different designs of materials and so on that went beyond the purely random, again, prior to AI? Yeah, that's a great question. So, and obviously, right, even the examples I was

Starting point is 00:08:12 giving, they were not purely random. Like, for example, Bardeen knew that the semiconductor would be something like silicon or germanium. Like, they both have, you know, four electrons in their last shell. Like, there was a lot of physical understanding. They knew about the surface state of silicon. So it's quite different than, I think, as you said, like a random kid going in, although your kid is probably very smart, a random kid just randomly trying stuff. But here comes, I think, a very interesting philosophical contradiction.

Starting point is 00:08:40 And I think this is true for science, but also for machine learning. The better you know a system, the more you can continue optimizing the system. But it doesn't necessarily mean that knowledge will help you discover something different. And I think this is probably why a lot of important discoveries are serendipitous, because, like, let's say you're in a company and your company is really an expert on material A. So you're right, like you've developed so many important sets of expertise that you can really optimize A, but most likely those skills don't help you discover C, which is very different, or even B.

Starting point is 00:09:19 And this, I think, is definitely an issue with machine learning. So, you know, in machine learning, we know that we do really well on the training set distribution, like on the kinds of things you trained on. And the farther you get from the training set distribution, the worse your predictions are. And this is also true for science, right? Like, the closer things are to the textbooks, the better our theories are at predicting them. And this is partly why I think material discovery has become so difficult in the commercial space, because

Starting point is 00:09:51 like if you think about plastics, we're still mostly using things that we discovered 70 years ago, 80 years ago, and we've gotten so good at, you know, manufacturing them, optimizing them. So now for someone to come up and say, oh, I discovered a completely different one, it's quite difficult.

Starting point is 00:10:08 And for this reason, yeah, it just lends itself to us optimizing known materials and not necessarily discovering completely new ones that might be better. And it's probably also why a lot of the materials we're using today in many technologies are quite simple. Like if you think about the transistor, it's just like pure silicon, right?

Starting point is 00:10:25 If you think about, I think, in MRI machines, the superconductors they're using are quite a bit simpler and older than, you know, the new cuprates and stuff. So, yeah, it's quite common that I think we're having difficulty discovering new materials, even if we're pretty good at modeling some of the older materials. Okay, so to my layman's ear, I guess what I'm hearing is that what we have gotten pretty good at historically is taking a sort of incremental step in material discovery. We know a system, we know a category of material and so on. We can optimize the hell out of it. Maybe this is what we've done with plastics over 80 years. What we have had a harder and harder

Starting point is 00:11:07 time doing, in part because maybe we've discovered the low-hanging fruit, is finding entirely novel materials. And so that's maybe a good segue to talking about the new world of AI and whether and to what extent it has a role to play in helping to crack that code. Because the fundamental thing I wonder about is, okay, presumably the thing that makes it difficult to discover an entirely novel, you know, option C or whatever you called it before, is that the possibility space is virtually endless. It's just a huge number of possibilities of things that you could do. And so the question is, does our ability to do this kind of computation that AI is introducing make that easier, in the sense that you can just run a million combinations if theoretically you can simulate

Starting point is 00:11:58 the properties of materials? Or does it actually make it just as hard for exactly the reason you describe, which is, you know, we have a corpus of data we're going to train these models on, but that corpus of data is grounded in what we already know. And so definitionally, it's going to be hard for it to find the next thing. Yeah, that's a great question. And, you know, it's not just AI. So there are two things that are happening right in the recent decades. So simulations are becoming more and more commonplace.

Starting point is 00:12:24 And that's probably very correlated with why AI is becoming very commonplace, because our computing infrastructure is growing and compute is getting cheaper. So now we're getting to a point where we're better at training neural networks. We're better at using them, and we're better at simulating atoms and materials, and we're doing it for cheaper. But I think exactly as you said, both simulations like density functional theory

Starting point is 00:12:48 and machine learning have kind of the same bias as the regular pre-simulation science, theoretical science, and maybe it's even worse, because humans as biased as they might be when doing science, they also clearly have this ability to extrapolate. Like humans have found ways of discovering things that were beyond their theories. There's been like these paradigm shifts. And AI hasn't really done this yet.

Starting point is 00:13:17 Even today's best AI models seem to be really good at kind of doing the textbook stuff, you know, like high school, college. But then when you think about being more creative and, you know, trying to shift the paradigm, it's been more difficult. Okay, so that's the pessimistic part. But I think the optimistic part is even the less creative parts of science, actually could really benefit from becoming more efficient. So let me give you an example on that. So if you think about high temperature superconductivity, as you know, this is different than conventional superconductivity,

Starting point is 00:13:48 but it can have a much higher transition temperature. And we still don't know as physicists where high-temperature superconductivity comes from. It's like a crazy thing. You know, it's been around for 50 years, 40 years. We don't know why it happens. But we can still optimize it. So the first high-temperature superconductor

Starting point is 00:14:07 that was discovered is called LBCO. And that's important because L is lanthanum, B is barium, and then copper oxide is the CO. So I think when Müller and his collaborator first discovered this at IBM, I think people thought that Müller was crazy for considering cuprates for superconductors because all other superconductors were BCS and they were different. But if you look at Müller's Nobel Prize speech,

Starting point is 00:14:34 he actually talks about how he used the old understanding, the conventional superconductivity, to be able to consider cuprates as an example. But we now know that it's actually not a great transfer because cuprates are actually quite different than conventional superconductors. Okay. So LBCO turns out to be quite interesting, but not good enough. So then what people did is, even though they don't know why it's a superconductor, they started replacing elements with similar elements.

Starting point is 00:15:00 And then the first one really made it, and the reason I'm saying it made it is because it was at a temperature above that of liquid nitrogen. And that was YBCO. So what you notice is that just the lanthanum was replaced with a similar element. And so humans here were able to find a good enough superconductor, even though they didn't understand why it was a superconductor. Okay. So I think what computation and machine learning can give us here is,

Starting point is 00:15:27 even if they can't do the paradigm shift and go from cuprates to a completely different superconductor, they can at least help us do this optimization, exploitation part to go from LBCO to YBCO faster. And soon, when we start talking about our own work, you'll see clear examples of this. So would I be right in that example to think, okay, we have some high-temperature superconductors. And one of the things that as a non-physicist has always bugged me about that terminology, like high-temperature superconductors are still very cold, need to be extremely cold, right?

Starting point is 00:15:59 The Holy Grail, of course, is a room-temperature superconductor, which we have not yet discovered. Would it be right to think, okay, maybe the type of thing that computation and machine learning might be good at is optimizing and tweaking the recipes that we've got for high-temperature superconductors, but probably less likely, at least today, to discover the room-temperature superconductor, because that probably requires some completely orthogonal type of thinking? Yeah, I think that's exactly right. So if you look at the last few years, the most promising discoveries on the computational side have been looking at hydrides, which might have conventional superconductivity, and at high enough pressure, they might have a high enough temperature. So there's been some really good work coming out of the Pickard group and a few other groups, where they use simulations to study these.

Starting point is 00:16:51 And the hope, exactly like you said, is maybe to take a known kind of superconductivity and optimize it to get as close to room temperature as possible. Maybe that's a good segue to giving some examples of the types of things that in recent years, since the boom in AI, like, what has been proven to work so far? What have we discovered collectively using AI that perhaps either we wouldn't have otherwise or would have taken a whole lot longer and more work and effort? Like, tangibly, what have we shown? You know, I think not a whole lot. And, you know, I would actually even make the question a bit larger and ask, like, what has simulation given us that was not possible? Because, you know, we have to realize that simulations have started becoming a thing in material science since the early 2000s or even late 1990s. And so it's been several decades at this point.

Starting point is 00:17:45 And it's important to ask what has that given us as an actual material in a product that we didn't have before. And I think that's the crucial question. So there's one example that often gets talked about. I think one of the cathode materials, maybe from the Ceder group and the Materials Project, I think is in Duracell batteries. So there's one example that's known. This is like a bit out of material science,

Starting point is 00:18:11 but in topological insulators, the first three-dimensional topological insulators were proposed in a DFT paper. But otherwise, yeah, so it's actually not that great. I mean, materials discovery in general is quite hard, right, for the reasons we mentioned. So it's usually just like experimentalists randomly trying stuff. But the good news is, I think, simulations and machine learning have been making progress, not yet in putting materials in devices and products, but at least in making useful predictions.

Starting point is 00:21:00 We talked about this briefly, but I guess I want to understand it a little bit better. You know, folks are going to be familiar with AI in the form of LLMs and things like that, all the generative AI world. And one of the benefits that that world has is that the corpus of training data for those models is enormous. You're like training LLMs on the internet, basically. How does it look when you're trying to do simulations for the purpose of material science? Like how big is the data upon which you can train? Is it big enough? And do we need to be generating an enormous amount of synthetic data in order to sufficiently train these models? Like, is that a real constraint here? Yeah, that's a great question. So, you know, if you look at ICSD, which is the Inorganic Crystal Structure Database,

Starting point is 00:21:46 there are a bit more than 200,000 inorganic crystals there. So that kind of tells you that it's quite a bit smaller than the internet scale. One good news for us is, as you said, we can simulate data, and the simulations come from our physical approximations of quantum mechanics. So they tend to be somewhat informative. And with density functional theory simulations, we can do a lot more than 200,000. So, you know, if you look at our GNoME paper, we had results for several million training points, and people have been actually pushing that. So now there are many different groups that have results for like 50 million points. So that's one thing. But then the question is how many experimental data points are worth how many computational data points.

Starting point is 00:22:32 But this is actually not that different from the internet scale. So when LLMs are trained on the internet, the data is not very high quality, it's just sentences. And there aren't necessarily very good labels for them. The sentences could be written on the internet. It's not very high quality. But as you mentioned, what can be quite effective is you pre-train on the internet and then you fine-tune on specific tasks.

Starting point is 00:22:56 And that might be pretty similar to us. Like, if we end up having these hundreds of millions of points from computation and then a few points, like 100,000 points, from experiment, maybe then we can get some good results. Part of the big issue is, so 200,000 crystals on ICSD I mentioned, but for many of them, we actually don't know the properties. So, like, what's their band gap? What's their electronic conductivity? And that becomes an even smaller set.
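Editor's illustration of the "many simulated points, few experimental points" idea discussed here: a minimal Python sketch, not a method described in the episode. It "pre-trains" a straight-line model on plentiful toy simulation data that carries a systematic offset, then corrects the model with a constant shift learned from three toy experimental points, which is only a crude stand-in for fine-tuning. The functions `experiment` and `simulation`, the size of the offset, and the sample points are all assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def experiment(x):
    # Toy "real" property (assumed for illustration only).
    return 2.0 * x + 1.0


def simulation(x):
    # Toy simulation: right trend, but with an assumed systematic offset.
    return experiment(x) - 0.5


# Step 1: "pre-train" on many cheap simulated points.
sim_x = [i / 100 for i in range(1001)]
slope, intercept = fit_line(sim_x, [simulation(x) for x in sim_x])


def pretrained(x):
    return slope * x + intercept


# Step 2: use a handful of expensive experimental points to learn a correction.
exp_x = [1.0, 4.0, 9.0]
offset = sum(experiment(x) - pretrained(x) for x in exp_x) / len(exp_x)


def calibrated(x):
    return pretrained(x) + offset


error_before = abs(pretrained(5.0) - experiment(5.0))
error_after = abs(calibrated(5.0) - experiment(5.0))
print(f"error before calibration: {error_before:.3f}")
print(f"error after calibration:  {error_after:.3f}")
```

In this toy setup the correction works only because the simulation error is a constant shift; real simulation errors vary by material and property, which is part of why the exchange rate between experimental and computational data points is an open question.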

Starting point is 00:23:22 Like, then you may have like 1,000, 2,000 data points. And it's a real problem, I think, yeah. Yeah, that really drives home why this is challenging. You have 1,000 to 2,000 data points where you actually understand the properties of a thing that you're training your data on. And that seems those numbers, even I know, that's tiny. I guess the other question, I think people can imagine, right, like the sort of world of material discovery, particularly the way you describe it historically, right, when this stuff has happened semi-accidentally, is a lot of, like, physical trial and error.

Starting point is 00:23:54 You're like doing something in a lab. People are doing something in a lab and seeing what happens and then measuring those results and then inferring something and moving on. And then you can imagine the world in which. AI, ML, et cetera, replaces the lab work because you're able to utilize computation to figure out what's going to happen if you're to do that stuff in the lab. Do you see that as being a realistic future? Like, are we going to replace lab work? Or in, I mean, I could make a case, I guess, based on what you just said, that's kind of the opposite. Because at least for a while,

Starting point is 00:24:28 in order to get sufficient training data where we know the properties of the things, like there's a chicken or egg problem, and actually maybe you have to do a lot more lab work up front to get the data to train the model that then replaces the lab work? Yeah, I mean, that's a great question. So I think I can't imagine a future where we completely eliminate lab work, because, first of all, we don't know if quantum mechanical simulations will ever become good enough to correctly predict experiments, you know, all the time. But also, like, again, going back to the philosophical perspective, there's a certain

Starting point is 00:25:03 amount that we know as humans. And maybe we can use computation to predict a bit outside of that circle, sphere. But the farther we get from the sphere, the less good our approximations will be. So this is that issue, right, where the things we know well, we can predict well. But the things we really want to predict are the things we don't know well. So from that perspective, experiments and the real universe will always be needed to, you know, validate our predictions and to train the next model. But yeah, I mean, so there's been these efforts.
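Editor's illustration of the "sphere" point above, that predictions degrade as you move away from what the model was fit on: a minimal Python sketch, not anything described in the episode. It fits a straight line to a toy curved property on a narrow training range and compares the error inside and outside that range. The function `true_property`, the training range, and the query points are all assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def true_property(x):
    # Toy stand-in for a real material property (assumed for illustration).
    return x ** 2


# "What we know": training points only cover the range 0.0 to 1.0.
train_x = [i / 10 for i in range(11)]
slope, intercept = fit_line(train_x, [true_property(x) for x in train_x])


def abs_error(x):
    """Absolute error of the fitted line at x."""
    return abs(slope * x + intercept - true_property(x))


# One query inside the training range, two progressively farther outside it.
for x in (0.5, 1.5, 3.0):
    print(f"x = {x}: absolute error = {abs_error(x):.2f}")
```

The simple model is a reasonable approximation inside the range it was fit on, and its error grows quickly for the queries outside it, which is the same pattern described here for both textbook theory and machine learning models.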

Starting point is 00:25:39 So you might have heard about the TRI effort. I think there are a few other efforts coming up where, exactly as you proposed, they're trying to create a lot of experimental synthesis data so that you can kind of like bootstrap and start using machine learning and computation. Because currently part of the issue is we don't even have a good large data set where you can train or validate your synthesis predictions on. This is sort of an aside, and I don't know if you'll have an opinion on this, but one thing I'm curious about, so as I'm sure you're aware, there are lots of startups now emerging who are saying we're going to do AI for materials discovery. And we've got some kind of a black box machine. You input what you need out of a new material, and we're going to tell you what material you should use, obviously, with more in between there. Oftentimes I find, you know, so I live in climate tech world. And so there's a bunch of applications where a novel material could have a big impact on climate change.

Starting point is 00:26:35 I find a lot of them, the startups at least, they start by saying, we're going to find a metal organic framework for carbon capture. That seems to be a very common example, weirdly common. And I'm curious why that would be, and whether it tells you something about the types of problems that these models actually can attack early on. So this is actually something we thought about a lot. You know, part of the issue is, like, let's say you're trying to discover a battery material. Like, let's say you discover an amazing electrolyte, solid electrolyte.

Starting point is 00:27:09 One issue you might face is that by itself isn't a battery. And then you have to put it in a battery and then, like, will it work with the cathode, the anode, the interface, with the manufacturing line? So I'm wondering if one of the reasons MOFs have become quite popular for these startups is it might be like a standalone material as a product. Like, I guess you could take the MOF, put it in a room, and it will capture some amount of carbon. And you don't have to worry as much about, like, the other parts. But, you know, I mean, I think that's a good question, because you don't currently see MOFs as being, like, very commercially impactful. So maybe they're also betting on the fact that in the future it might be. Yeah, MOFs are one of those categories, like graphene, where you're like, you know, for a long time, it's been the Holy Grail of lots of different things.

Starting point is 00:27:55 And you can imagine a million different applications and people have tried. Maybe now the time is nigh for MOFs to really take off. But I think that other point is actually a really interesting one. Like, in battery world, nothing exists in a vacuum. So you can't create a novel material and then be done with it. You have to figure out not only the material, but then its interaction with all the other materials, which are also in flux,

Starting point is 00:28:17 and that's part of what makes batteries so difficult. So maybe that tells you that, like, the types of problems that these models can attack early on are the ones that are self-contained. If you solve that problem with that material, that's all you really need and you don't need to deal with all this other interaction and stuff like that.

Starting point is 00:28:34 Yeah, and that's exactly right. And one other potential factor is, and I see this often at Google, when somebody outside of material science gets interested in trying to contribute to material science, a common reason is they worry about the climate and they want to help climate.

Starting point is 00:28:52 So it's less often that I see a non-material scientist come to me and say, how can I improve the ionic conductivity in essence? But more often I hear them say, how can I help with the carbon capture problem? And that's another reason, maybe, the startups gain kind of like more interest and more people are excited to work there, because they're potentially going to contribute to carbon capture. Are there any particular areas that you're most excited about, like domains or materials

Starting point is 00:29:18 requirements? Like what do you think? Where might we see? You know, as you said, we haven't really yet proven a whole lot about the ability of AI in these new methods to discover new materials? Like, where might we? Where are you most optimistic? Yeah, so there are...

Starting point is 00:29:36 Okay, so different applications, I feel like, have different issues. So maybe I can cover a few and say why some of them might be more promising. So, you know, if you think about something like optical properties or electronic properties, I think one of the limitations might be that DFT itself isn't as good at predicting electronic properties as it is at predicting structural properties. So DFT tends to be better at predicting the formation energy, like the kind of stability of a material, but not necessarily the band gap. And the band gap is crucial for understanding the optical properties, like how the light and the electrons will interact. So that's one reason, for example, if people are using the current

Starting point is 00:30:19 state of DFT, they might be less successful at discovering optical applications than something else. Can you define DFT for folks who are not familiar? Oh, yeah. So DFT stands for density functional theory, and it's been really, really impactful in materials science. So basically what happened is people were trying to figure out how to simulate the quantum mechanical aspects of materials. Because what's really interesting in materials science, for me, is that the properties really depend on how atoms interact, and atoms interact at the quantum mechanical scale. And, you know, there have been many methods proposed over the century, but it seems like DFT has really taken off as like efficient enough, fast enough, but also accurate enough sometimes as a simulation tool.

Starting point is 00:31:05 So now if you look at citations, I think Walter Kohn, who got the Nobel Prize for it, has like an incredible number of citations because everyone uses DFT these days to try to simulate materials. Okay, so I interrupted you, but so DFT is a technique, essentially. And so you're saying there's certain things that DFT is better at than others, what is that going to lead us to in terms of where we might use DFT to make a significantly, globally significant discovery? That's right.

Starting point is 00:31:34 So if you think about batteries, there are many aspects of batteries that seem like a better fit to DFT. So, for example, you'd like your battery materials to be stable, and you'd like them to, for example, the electrolyte to conduct the ions. Like in a lithium-ion battery, lithium should go through it fast. So for predictions of this type, DFT seems a bit better. And it's probably not a surprise that the one example I gave you earlier was for a battery with DROC cell. And a lot of DFT practitioners study batteries at some point.

Starting point is 00:32:09 So, for example, for my PhD, I studied silicon as an anode material. So that's one example. If you think about catalysis, I think there's a lot of excitement around catalysis because it's like a very important application. But one issue maybe is that the surface, like in heterogeneous catalysis, the surface is very messy, and it's dynamic over its use. So if you don't really know what's happening at the surface, you might not be able to predict what's going to happen as a function of the structure. So that's like one challenge with catalysis.

Starting point is 00:32:41 Superconductivity is very exciting, of course, both from a scientific perspective and maybe from like a climate and technological perspective. But superconductivity often involves very complex quantum mechanical interactions. So it's, you know, yet to be seen if DFT can be useful there. So yeah, I think every different vertical has these different issues, and it's not clear, with machine learning support, which one will actually be helpful. Are we going to see a watershed moment in this space in the same sense that GPT-3 was for LLMs?

Starting point is 00:33:10 Like, is there something like that that could or will happen, or is it going to be more steady progress, perhaps faster than historically, but like more consistent as opposed to step function? Yeah, so a very good, I think, comparison point is AlphaFold. I think when AlphaFold came out, people saw it as a watershed moment. And I think part of what was good for AlphaFold is that there was this competition that people really cared about. There was this problem people really cared about, protein folding. And doing well on that, and much better than previous methods, kind of made it clear that it is a very useful tool. And it's objective.

Starting point is 00:33:48 Like you could objectively measure whether you were better at it. That's right. In materials science, I think one of the issues is the experimental data is actually quite noisy. So, you know, this is something that you might hear often, that simulations and DFT aren't very accurate. And that's true, but maybe one thing that people don't notice is the experimental uncertainty is actually usually

Starting point is 00:34:12 at the same level of computational error. And the reason this is really bad is because this means that even if you want to improve your simulations, the experimental labels are noisier than where you want to get to. One thing that is clear is for CASP to be such an impactful data set, a lot of experimentalists have spent a lot of important effort in trying to get kind of consistent and useful data.
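
To make the label-noise point concrete, here is a minimal sketch in Python. It is not from the episode: the function name, the numbers, and the units are hypothetical, and it assumes the model error and the experimental noise are independent and zero-mean, so their variances add. Under that assumption, the error you measure against noisy labels can never drop below the label noise, however good the simulation becomes.

```python
import math


def measured_rmse(model_error: float, label_noise: float) -> float:
    """RMSE observed when a model is scored against noisy experimental labels.

    Assumes (for illustration only) that the model's true error and the
    experimental label noise are independent and zero-mean, so the two
    variances simply add.
    """
    return math.sqrt(model_error**2 + label_noise**2)


if __name__ == "__main__":
    label_noise = 0.1  # hypothetical experimental uncertainty, arbitrary units
    # Shrink the true model error and watch the measured error flatten out.
    for model_error in (0.2, 0.1, 0.05, 0.0):
        observed = measured_rmse(model_error, label_noise)
        print(f"true model error {model_error:.2f} -> measured {observed:.3f}")
```

In this toy setting, halving the true model error from 0.1 to 0.05 barely moves the measured value, which is one way to read the remark that the labels are noisier than where you want to get to.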

Starting point is 00:34:43 And I think maybe because now machine learning really needs this high-precision, large data set, there are these bigger efforts trying to create a CASP-like database, but it's not there yet. Well, I guess to wrap up then, I mean, I've asked you to talk a lot about the field. Curious to hear what you're working on and what you're most excited about in terms of your work at DeepMind. Yeah, so last year we published our paper, GNoME, and that paper was mainly about seeing if machine learning can be used to discover materials stable at zero Kelvin. So ideally, we'd like to discover materials that are stable at room conditions, but that's quite a bit far from the kind of level of the field, especially back then.

Starting point is 00:35:27 And what we realized is even for zero Kelvin stability prediction, which is like a simpler task, there weren't that many predictions that are from DFT. So there were about at most 48,000, and only about 28,000 or so had come from computation, and 20,000 had come from previous experiments. So we saw that machine learning can really speed up this process. Part of the reason is, you know,

Starting point is 00:35:51 you've seen with LLMs and with vision models that the more training data you put into LLMs, the better results you get. And how much better your results are is actually predictable. It's kind of like a power law. This goes back to a paper from Baidu Research from back in 2016, I think. And it seems to apply to all kinds of deep learning,

Starting point is 00:36:11 including quantum mechanics and materials science. So we realized that as we make our model better and better, its predictive ability improves to a point that it can actually now discover crystals that are stable at zero Kelvin. So that's what we did last year. And as we said in that paper, one of our next goals is to predict

Starting point is 00:36:30 not just zero Kelvin stability, but finite temperature stability. And this is much harder because at finite temperature, there are entropy effects. And we're interested in, you know, finding not just materials that are stable, but also exciting.
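
As a hedged aside on why entropy makes the finite-temperature problem harder, the standard thermodynamic relation can be written out. The symbols are not from the episode: G is the Gibbs free energy, H the enthalpy, S the entropy, T the temperature, and E_DFT a DFT total energy. The zero-temperature limit shown here neglects zero-point vibrational energy.

```latex
G(T) = H - T\,S,
\qquad
G(T \to 0) = H \approx E_{\mathrm{DFT}}
```

At zero Kelvin the entropy term drops out, so comparing energies is enough to judge stability. At room conditions the term T S has to be estimated as well, for example from vibrations or from disorder in how atoms are arranged.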

Starting point is 00:36:42 So like superconductors, battery materials, kind of like materials that really will impact the technology. And that's another thread that we're going towards. And finally, one other thing that we really care about is

Starting point is 00:36:54 taking DFT and making it more predictive. So DFT has been around for decades, but it's mainly been like a theoretically developed tool. So the equations that describe it are, you know, as simple as a really good theoretical physicist can model. But

Starting point is 00:37:09 these models can actually be a lot more complicated because we have data and we have machine learning now. So we'd also like to improve DFT, which is something else we're working on. All right, Dogus, this was a lot of fun. I'm still lost in half of this stuff, but I feel like I have a better understanding

Starting point is 00:37:25 of the overall state of affairs, which is really what I was hoping to get out of this. So really appreciate it. Thanks for the time. Awesome. Yeah, it was super fun. Thank you. Ekin Dogus Cubuk is a research scientist at Google DeepMind focused on materials discovery. This show is a production of Latitude Media. You can head over to latitudemedia.com

Starting point is 00:37:45 for links to today's topics. Latitude is supported by Prelude Ventures. Prelude backs visionaries accelerating climate innovation that will reshape the global economy for the betterment of people and planet. Learn more at preludeventures.com. This episode was produced by Daniel Woldorff, mixing by Roy Campanella and Sean Marquand, theme song by Sean Marquand. I'm Shayle Kann, and this is Catalyst.
