Catalyst with Shayle Kann - Inside a $300 million bet on AI for physical R&D

Starting point is 00:00:02 Latitude Media, covering the new frontiers of the energy transition. I'm Shayle Kann, and this is Catalyst. I have to say there's a difference between winning gold medals in Math Olympiads and scientific discovery. Like, you can practice for Math Olympiads by studying previous years' problems. You can't really practice how to discover the next big theory, but they were getting better at reasoning on complex problems. Coming up, can AI discover a room temperature superconductor? Volume 2. When utilities need flexible capacity they can count on, they turn to Energy Hub. Energy Hub works

Starting point is 00:00:54 with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts of flexibility, built for the moments when utilities can't afford uncertainty. Energy Hub builds and operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs, batteries, thermostats, and more through a single platform built for utility scale. Predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com. Trillions of dollars are flowing into clean and critical infrastructure, but those investments aren't driven by technology alone. They're shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, and host of a brand

Starting point is 00:01:36 new podcast, Critical Capital. Each episode, I talk with people deploying capital, shaping policy, and building the clean economy. Tune in as we unpack how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts. Catalyst is supported by FischTank PR, an award-winning PR firm focused on climate and energy tech, renewables, and sustainability. FischTank is known for generating prominent and effective media coverage for the brands they work with. If you want a PR partner that's thoughtful, shoots straight, and gets results, you'll like FischTank PR. To learn more about FischTank's approach, visit fischtankpr.com.

Starting point is 00:02:15 That's F-I-S-C-H-tankpr.com. I'm Shayle Kann. I lead the early-stage venture strategy at Energy Impact Partners. Welcome. So a year ago, a little over a year ago, I had Dogus Cubuk on this podcast to talk about using AI for materials discovery, which has all sorts of interesting applications in the spaces that we talk about here. At that time, Dogus had been leading efforts in that area for Google DeepMind for some time. And I thought of him as being both

Starting point is 00:02:47 very knowledgeable in the space, obviously, also pretty sober about it. Fast forward a year, Dogus left Google DeepMind earlier this year, and along with Liam Fedus, who was one of the co-creators of ChatGPT, started a company called Periodic Labs, which raised, wait for it, a $300 million seed round, led by Andreessen Horowitz. Periodic is doing AI for materials discovery. And not just that, also physics and chemistry. And they're also very much hardware in the loop. The way I like to frame it is that they're building two kinds of frontier lab at once. There's a frontier AI lab and a frontier scientific lab, the type of lab that we used to talk about. And then they're trying to make those two things work together to make breakthrough discoveries. Notably, one thing we talked about last time was

Starting point is 00:03:33 how the AI materials discovery companies at the time tended to start by going after, often, discovery of something like metal-organic frameworks, or MOFs, for carbon capture, which I think of as less of a breakthrough opportunity, really, from a global scale, whereas the big, perhaps biggest, breakthrough would be the discovery of a room temperature superconductor. Well, Periodic makes no promises, but they're very publicly working on high temperature and maybe room temperature superconductors. Based on that last conversation, to be honest, I wouldn't have predicted this.

Starting point is 00:04:08 So it was time to have Dogus back on and hear what changed and how. Here's Dogus. Dogus, welcome back. It's great to be back. It's great to see you. A lot has changed since the last time we talked. So I looked back. So the last conversation that we had was just over a year ago.

Starting point is 00:04:25 It was September 2024. And I was having you explain to me the wild world of AI for materials discovery in particular, and the work that you'd been doing at Google DeepMind, but also just like the broader landscape. And I'll tell you my takeaway from that conversation, which you could tell me if I had the wrong takeaway at the time, but my takeaway was promising field, pretty unclear if and when this new wave of LLMs and all the reinforcement learning,

Starting point is 00:04:56 all the things that have shown up in the past few years, pretty unclear if and when that would generate a real, meaningful breakthrough discovery in materials. And we talked through a bunch of the reasons why it's challenging, training data maybe chief amongst them. But I came away with a pretty, I think, like a sober view of the path there. Okay, so fast forward a year and you left Google DeepMind and started a company to do that amongst other things.

Starting point is 00:05:24 So I guess the first question that I have for you is, what changed in the last 12 months to give you conviction that like now is the time? Great question. So when we talked, I was doing research in the field of computational materials science and machine learning. You know, specifically, we were using graph neural networks. We were using density functional theory and we were trying to discover materials. One thing that changed since our discussion was the LLMs have improved even further. So at the time, I wasn't using LLMs much at all,

Starting point is 00:06:00 but I think right around when we were talking, o1 came out, right? The reasoning models started showing up. And that was a huge update for me because you might remember that one of my big concerns is machine learning works best on the training set distribution. But in science and technology, we almost only care about out-of-domain generalization. So what o1 showed is if you spend test-time compute, you can get better results. So that was very exciting to me because there was one way of investing resources that was beyond the training set. So, okay, if I can try to translate that into layperson terms, the reasoning model, like OpenAI's o1 model, unlocked a door, kind of,

Starting point is 00:06:54 that maybe allows you to break this challenge of the limited training data set that you have in materials discovery. That was what we spent a lot of time talking about a year ago, was like, you know, you can compare the corpus of data that an LLM trains on to do language, which is enormous. It trains on the internet, basically, versus the corpus of data that you were dealing with

Starting point is 00:07:18 in trying to discover novel materials, and it was thousands of data points, not tens of billions or whatever. And so that presumably hasn't changed, at least yet. But you're saying that the reasoning models have gotten good enough that they are able to sort of get around that challenge via reasoning or possibly generating their own synthetic data. Like, what is it that allows them to break that? So I'm not saying that they're good enough already, but that was one step in the positive direction. And another thing, you know, we've seen is they've gotten really good at math.

Starting point is 00:07:53 So, you know, since last time we talked, they started winning gold medals in math Olympiads. And they're doing similarly well on the coding, really well on physics Olympiads. And I have to say there's a difference between those things and scientific discovery. Like, you can practice for math Olympiads by studying previous years' problems. You can't really practice how to discover the next big theory. But it did show you that they were getting better

Starting point is 00:08:26 at reasoning on complex problems. So then what else do we need? So I think the biggest thing we need is to have our own lab because once you have a very intelligent reasoning LLM, you still can't discover things unless you make trials, right? Just like humans, the LLMs will be wrong often when they try to predict things outside of their training set. But you try many things, and then at some point, you get a really cool discovery.

Starting point is 00:08:55 And this is, you know, as we talked about history last time, this is quite common in solid-state chemistry, solid-state physics, where a lot of discoveries happen somewhat by accident, but of course with a lot of background, understanding of the physical system, and a lot of trial and error. So, okay, so this is what you're doing at Periodic, right? You're sort of combining the digital domain with the physical domain. You have a lab in both senses.

Starting point is 00:09:20 It's a frontier lab in an LLM sense and a frontier lab in a laboratory sense in the traditional sense of the word. And you're sort of merging the two. I'm curious in practice, like, how you imagine that feedback loop working. So is it a traditional, you develop a theory, you run an experiment, you generate data from that experiment, but in this case, you feed the experiment back into your customized LLM as an additional set of training data, and then that's the way that the loop works, or is it more complicated than that? Yeah, exactly.

Starting point is 00:09:51 I mean, it's pretty simple, I think, as you said. So the LLM can propose, for example, synthesis recipes, or it can propose simulations to run. And because the LLMs are pretty good at tool use, it can actually do it itself, and then you get some results back. So the results from experiment could be some characterization data, results from the simulation can be some, you know,

Starting point is 00:10:11 trace or some simulation you did. And now the LLM can go through it with the context of its previous training, maybe the context of relevant papers, textbooks, but also now the results that it just got that no one else has ever seen. And then now it can kind of tweak the experiment, tweak the simulation for the next step. Right. You said one thing in there that I guess is worth pointing out. Like you're trying to automate this as much as possible. The LLM might run the experiment. Yeah, absolutely. I mean, one of the other advances that's been happening recently that I think

Starting point is 00:10:42 made Periodic possible is the high-throughput experiments have been getting better. You know, there are many examples of this now across academia and industry, where these robots, which have become quite commoditized actually, are just mixing powders or mixing liquids and then sending it to characterization. I think one thing that isn't as advanced right now, but we feel like we can do pretty soon, is automated characterization itself. So you mix powders, you put it in some characterization tool, you get the result out. What is the actual output?
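The propose, run, read-results, tweak loop described here can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not Periodic's system: `propose_recipe` is a hypothetical stand-in for the LLM, `run_experiment` is a hypothetical stand-in for automated synthesis plus characterization, and the single `anneal_temp_c` parameter and its toy response curve are invented for the example.

```python
import random


def propose_recipe(history, rng):
    """Hypothetical stand-in for the LLM: tweak the best recipe seen so far."""
    if not history:
        return {"anneal_temp_c": 800.0}  # arbitrary starting guess
    best = max(history, key=lambda h: h["score"])
    step = rng.uniform(-50.0, 50.0)
    return {"anneal_temp_c": best["recipe"]["anneal_temp_c"] + step}


def run_experiment(recipe):
    """Hypothetical stand-in for synthesis plus characterization.

    Toy response: the score peaks when annealing at 950 C.
    """
    return -abs(recipe["anneal_temp_c"] - 950.0)


def closed_loop(n_rounds, seed=0):
    """Propose a recipe, run it, record the result, and repeat."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_rounds):
        recipe = propose_recipe(history, rng)
        score = run_experiment(recipe)
        # Every result, good or bad, becomes context for the next proposal.
        history.append({"recipe": recipe, "score": score})
    return history


if __name__ == "__main__":
    results = closed_loop(n_rounds=20)
    print(max(results, key=lambda h: h["score"]))
```

The point of the sketch is the shape of the loop: each new proposal is conditioned on every result so far, including results no one outside the lab has seen.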

Starting point is 00:11:19 I think this is pretty difficult right now for AI tools, but we feel like we can improve that pretty quickly. I want to talk about one specific application that I know you're going after that we actually did talk about a year ago. But also, I want to talk about it as a way to see whether one of the other things you described as a fundamental challenge has changed, which was, as I understood it, AI being pretty good at the next incremental discovery, but not necessarily good at the breakthrough discoveries. You said through history, usually that's done accidentally, or often it's done accidentally, because you can't, like, reason your way to this massive breakthrough discovery. So let's talk about superconductive materials.

Starting point is 00:12:02 So, right, we talked about this last time where all these companies that existed at the time that were doing AI for materials discovery were starting on things like discovering a novel MOF for carbon capture or whatever. But we said, like, the thing that would be the real breakthrough, the big thing, would be a room temperature superconductor. And you guys have since launching been very public about, like, superconductive materials, maybe not room temperature. I'm curious for you to tell me how likely you think that is, but, like, high temperature

Starting point is 00:12:28 superconductors is on the roadmap. So why, first of all? And then second of all, like this question of, do you think you have a path to the truly breakthrough, what would the path be to a truly breakthrough discovery as opposed to finding something that is a material that is superconductive at an ever so slightly higher temperature than the best that we've got today? Yeah. So to answer your first question, I think it's still true that it would be difficult to just reason your way into a much better superconductor. I actually would guess that there's a law out there that we haven't discovered yet that says that

Starting point is 00:13:08 you can't just look at your training set that's different than what you're trying to discover and just predict it. You know, there's been rules that we discovered from the 1800s on where, like, you connect energy to work. So thermodynamics is the first example. There's more recently Landauer's limit, which shows that you have to spend a certain amount of energy to delete information, which can be used to describe the Maxwell's demon contradiction. I bet there's something similar for how hard it is to discover things that are outside of your training domain. Okay, so I don't think that's been fixed since last time we talked.
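For readers who want a number attached to the Landauer reference: the limit says that erasing one bit of information dissipates at least k_B * T * ln(2) of energy, where k_B is the Boltzmann constant and T is the absolute temperature. A minimal Python sketch follows; the function name and the 300 K example temperature are illustrative choices, not something from the conversation.

```python
import math

# Boltzmann constant in joules per kelvin (exact under the 2019 SI definition).
BOLTZMANN_J_PER_K = 1.380649e-23


def landauer_limit_joules(temperature_k, bits=1):
    """Minimum energy dissipated when erasing `bits` bits at `temperature_k` kelvin."""
    return bits * BOLTZMANN_J_PER_K * temperature_k * math.log(2)


if __name__ == "__main__":
    # Roughly 2.87e-21 J per bit at 300 K.
    print(f"{landauer_limit_joules(300.0):.3e} J per bit at 300 K")
```

The value is tiny compared with what real hardware spends per bit, which is why the limit works as a statement of principle rather than a practical engineering constraint today.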

Starting point is 00:13:44 But because we have a lab internally, we can just try things and try them at large scale and often, and hopefully as intelligent as possible. So even though we won't reason our way into a much better superconductor, we'll be able to push our trials in the direction that's most promising or, you know, most promising for us given our training set at the time. So, yeah, I think that hasn't changed. And I think there's reason to be hopeful because, you know, in the big scheme of things, it's a pretty new field.

Starting point is 00:14:17 I mean, if you look at cuprates, they were from 1985. There's been a lot of advances, you know, more recently. So, yeah, we're very excited. One reason we chose superconductivity is if you find a good superconductor, that's impactful immediately, right? Like, last time we talked about how long it can take to translate materials improvements into products. One nice thing about superconductors is if somebody discovers a room temperature

Starting point is 00:14:42 superconductor today, even before it makes it into a product, it is huge impact, right? Like, first of all, it changes how we think about the universe. Second, it helps us do physics experiments that weren't possible before. And, you know, whenever you think about like a sci-fi-ish technology, like quantum computing, fusion, superconductors come up because it's kind of what we need. It's kind of like one of the most exciting macro-scale quantum phenomena. So that's one reason we picked it because it kind of is exciting as soon as we succeed.

Starting point is 00:15:16 The other reason is it requires all sorts of improvements to get there. You know, when we think about OpenAI and DeepMind, I remember back in 2016, people used to make fun of these institutions for prioritizing AGI so much because they were saying we're going to do AGI. But what happened is they developed so many other tools on the way to AGI that were useful in themselves. But today they have these LLMs that, you know, you might consider AI or like something really impressive.

Starting point is 00:15:44 Superconductivity is a bit like that. To discover an exciting superconductor, we probably have to develop so many capabilities on the way there that are by themselves very useful. For example, automated synthesis, automatic characterization, being able to model or predict high temperature superconductivity because we don't have a theory for it yet. So it's kind of like a nice goal that unites people and requires a lot of other useful things to happen on the way. And it's one of those things that physicists find really exciting.

Starting point is 00:16:16 So the physicists in our company are really excited by this mission, but also computer scientists find it very exciting. It's just one of those things that I think both sides can really appreciate. So those are some of the reasons that we picked it. Virtual power plants are becoming a reliable way for utilities to manage capacity. But enrolling devices is just the start. What really matters is confidence, knowing those resources will perform when dispatched and being able to prove it, from the control room to the living room.

Starting point is 00:16:46 Energy Hub's platform handles the full picture, from near real-time forecasting, locational dispatch, and the kind of rigorous verification that holds up when regulators, grid operators, or leadership ask, did it deliver? Easy enrollment creates momentum, proven performance builds trust. That's why more than 170 utilities rely on Energy Hub to manage over 2.5 million devices, delivering 3.4 gigawatts of flexible capacity. See what that looks like at energyhub.com. We're living through a profound economic shift, and energy sits at the center of all of it. Trillions of dollars are flowing into power plants, transmission lines, battery factories, data centers,

Starting point is 00:17:26 but the future of energy isn't shaped by technology alone. It's shaped by markets, by policy, by capital, and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, the capital platform for the clean economy. Join me for my brand new show, Critical Capital, as I talk with people deploying capital, shaping policy and building projects. Together, we unpack how risk is priced, how incentives are structured,

Starting point is 00:17:51 and how progress is actually made. Listen to Critical Capital on Spotify, Apple, or wherever you get your podcasts. Are you tired of overpaying for big-name PR firms, but not really knowing what they're delivering? Is your comms team wasting time reviewing lengthy messaging briefs and decks, instead of engaging journalists or producing content.

Starting point is 00:18:10 Are you wondering why your competitors are getting press and you aren't? FischTank PR is an award-winning climate and energy tech, renewables, and sustainability-focused PR firm dedicated to elevating the work of both early stage and established companies. Whether you need to position yourself as a thought leader in between project announcements or translate complex ideas and technologies into tangible, compelling stories that resonate with the media, FischTank can help. Check out fischtankpr.com. That's F-I-S-C-H-T-A-N-K-P-R.com.

Starting point is 00:18:42 I guess back to this question of how do you distinguish between the incremental innovation, which to be clear, if you develop or discover a superconductor at a higher temperature than anything that we've got today, that's meaningful. But it's probably orders of magnitude less meaningful than if you discover a room temperature superconductor. And I presume that the scientific challenge is commensurately distinct between those two. And, you know, the way that LLMs work, as I understand it, at least in part, is on these

Starting point is 00:19:15 reward functions. And so are you setting your AI system a goal of, you know, find a room temperature superconductor? And then everything flows back from there. Here are the steps and all the things we have to fix to get to room temperature. Or do you say, improve this characteristic such that we can incrementally, you know, build our way there? In other words, are you going to find 10 super-c... I think of it as sort of a different thing from, but like the alternative version of this is what happens in fusion, nuclear fusion, where everybody is sort of chasing this same goal of Q is greater than one,

Starting point is 00:19:54 right? Like energy, energy break even. And everybody is getting incrementally closer and closer and eventually NIF breaks it or somebody breaks it. Is it going to look like that, or is it going to look like we've discovered nothing until we discover the room temperature superconductor? So, as you said, I think there are many different ways of improving superconductors without getting a room temperature superconductor. So one of them could be having a significant increase in Tc. But another one could be a really high critical magnetic field, which turns out might even be more important for fusion applications than Tc itself.

Starting point is 00:20:28 Another one can be more mundane, like something like mechanical properties, like a superconductor that also is ductile and, you know, we can make it into devices. So we wouldn't rule out, you know, all of these very exciting developments just for, like, a room temperature Tc. But how do you set the reward function for your model? What are you optimizing it for then? I mean, I think that's an empirical question. I think one thing I should say is it's quite nice because it's hard to reward hack. You know, one of these issues with RL and training LLMs is you might worry about reward hacking.

Starting point is 00:21:02 And in simulations, again, reward hacking can be a problem, even in DFT. But for real-life experimental measurement of Tc, it's much harder to reward hack, which we love. So, like, if our reward was increasing Tc, that just seems like a nicer, unhackable reward. But in terms of, like, what specifically will get us there, you know, we're not sure yet. I mean, it's an empirical question. We can probably try all of them. I'll list the things you propose and we'll try all of them. I guess that gets to the other question, which is like, what does the human in the loop look like here?
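As a rough illustration of the "reward is measured Tc" idea, a reward function might look like the Python sketch below. Everything in it is an assumption made for the example: the function name, the use of `None` to mean "no superconducting transition observed", and the choice to reward only improvement over a running best. The conversation says the actual reward design is still an open, empirical question.

```python
def measured_tc_reward(measured_tc_k, best_known_tc_k):
    """Toy reward based only on a lab-measured critical temperature (kelvin).

    Hypothetical convention: `measured_tc_k` is None when the sample shows
    no superconducting transition at all.
    """
    if measured_tc_k is None:
        return 0.0
    # Credit only the improvement over the best measurement so far.
    return max(0.0, measured_tc_k - best_known_tc_k)


if __name__ == "__main__":
    print(measured_tc_reward(95.0, 90.0))  # improvement of 5 K
    print(measured_tc_reward(80.0, 90.0))  # worse than the best: no reward
    print(measured_tc_reward(None, 90.0))  # no transition: no reward
```

Because the input is a physical measurement rather than a simulated or self-reported quantity, the only way to raise this reward is to make a sample that actually superconducts at a higher temperature, which is the sense in which it is harder to hack.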

Starting point is 00:21:38 Right. And again, as you said, like, if we haven't solved the sort of AI is good at incremental innovation and not orthogonal breakthrough innovation thing, but humans are historically, at least better at it, is it like folks on your team developing a theory of something and that gets fed through the model and you get the results and you feed it back in, you see whether it's a promising category. Like, is the germ of the original idea of what to look for coming from a human? Or is it coming predominantly from the model,

Starting point is 00:22:13 and then the humans have to interpret and send it off in various directions? Yeah, I mean, that's a great question. You know, we're not really prioritizing full automation anyway. So if we get better results with humans doing part of it, that's great. Like, this is also actually a question for a lab, right? Like, do we want to automate every single aspect of the lab? At some point, you end up needing humanoids for that. And I think that's not, like, Liam, my co-founder and I,

Starting point is 00:22:41 we are trying to be very pragmatic about it. Like, our goal is to get the best result possible on the things we care about. And, you know, how much of the automation comes from the ML, how much of it comes from more traditional tools and how much gets done by humans, I think that's kind of, again, an empirical question. So, yeah, we're not, like, I think, as you said, it does seem like today, there are things that ML/AI is better at than humans. But one of those things is not hypothesis generation.

Starting point is 00:23:10 So, I mean, there are two options: we either have to improve these LLMs on hypothesis generation, which is possible. Or the other option is we have humans providing some of the hypotheses and then AI doing the execution. I guess the other question here is cost. I mean, you guys raised a $300 million seed round. So that implies on the outside that your cost structure will look similar to other companies that basically are going to use just an enormous amount of compute. And so like a lot of that cost comes from compute. In your context, I could imagine maybe that being true, but also maybe that not being true because you, again, you just don't have the same corpus of data. You can't

Starting point is 00:23:51 build a 10 billion parameter model right now because the data isn't there to do it. And so instead, that cost is going to go more toward the robotic lab and all that kind of stuff. How should I be thinking about how much compute you'll use and where that cost comes from? Yeah. So honestly, compute is very expensive. And we are going to train LLMs. We are going to use GPUs to run simulations. So that does end up being a large part of the cost.

Starting point is 00:24:19 Yeah, it's funny. If you asked me this question 10 years ago, I would have thought that the biggest part of the cost must come from the labs, because like physical is real, you're building this lab, you're buying instruments. But it turns out the GPUs are so expensive and training LLMs is so expensive. So when we were thinking about how much to raise, we kind of laid it out in terms of the GPU cost, the lab cost, and this was kind of a minimum number we felt like was viable. And, yeah, we'll see, the GPUs have been getting more expensive recently. I guess we'll see how the market dynamics continue.

Starting point is 00:24:53 To what extent do you end up building a generalized model or models versus models designed for a specific domain, even a specific scientific domain, right? Like you guys are doing materials discovery obviously, but physics and chemistry and these things are all intertwined. But like, is the same model going to be equally capable across all domains? Is that the intent? Or is that just not how they're supposed to be architected?

Starting point is 00:25:19 That's right. And it's actually something we're very excited about. You know, one thing I've been kind of noticing is like in the past, say, three, four years, I had a chance to collaborate with very, you know, world-class best in their field scientists. And even when you work with them, you realize that while their expertise on a few domains is, you know, incredible, maybe best in history, there's just so much more to know in chemistry and physics that they may not know all the other aspects of it. So this is why I brought up superconductivity, because you might actually need to be really good at,

Starting point is 00:25:53 you know, solid-state chemistry and synthesis of difficult novel materials, just because, you know, you don't know which chemistry the superconductor is going to come from. Some of these ideas you might have may not be as stable thermodynamically, so you need to be intelligent about how to kinetically force it into that phase you want. But at the same time, in addition to solid-state chemistry, you need to be incredible at condensed matter physics, right? Because, like, there are so many different kinds of superconductivity. We don't understand most of them very well. And there's nobody in the world who knows both of those equally well.

Starting point is 00:26:26 Or, like, sufficiently well. And turns out this is true for many different aspects. Like if you need to use robots for high-throughput synthesis, again, like there are only so many people who understand robots and how to use them for synthesis. So I think this was different in the 1800s probably. Like there was probably a time when a physicist could contribute and be one of the best in the world on many fields of physics.

Starting point is 00:26:49 But it's definitely not true today. And this is one of the reasons I think we are very excited about LLMs, because when you talk to them, they seem like they have a pretty good understanding of solid-state chemistry and solid-state physics at the same time already. And we're, you know, trying to improve them further in the physical sciences specifically because that's what we are really interested in. And then we're hoping that they'll be good at multiples of these.

Starting point is 00:27:15 And then a really exciting prospect with that is a lot of the exciting discoveries happen to lie in between fields, right? That's why it's sometimes easier to be interdisciplinary. And there's so many of these surface areas between these different fields. I guess science is kind of like a fractal in the way it's hierarchically organized. And there's so much surface area that humans have exploited, of course. But then there's probably so much left to exploit. And we're excited about an LLM that can basically do that

Starting point is 00:27:47 at a scale that humans couldn't yet. How good are the LLMs today, or the best in class of what you guys have, at generating synthetic data in this domain? Another way to ask this question is, you know, if you fast forward three years, you're fully up and running and you're operating, how much of the valuable insight you will generate do you think will come from the physical data coming out of your lab versus the synthetic data that the LLMs create on top of that? Yeah, that's a great question.

Starting point is 00:28:22 I obviously don't know the answer, but it's great to brainstorm about that. Because on the one hand, the lab data will be kind of our additional data that other LLMs may not have. And you might think then the lab data will only be as valuable as the results in it. But on the other hand, what's interesting about scientific data is it's not just a few bits of numbers, right? Like, for example, there are certain experiments you can run where the result you get from it is just, say, three floating point numbers. But the implications of those could be tremendous, right? It's not just going to be like a few bytes. It will actually be potentially an incredible amount of understanding just from a few experiments. And this has been how it is

Starting point is 00:29:10 in human history, right? Like there are certain experiments that told us so much about how we understand the universe. And the way to do this with synthetic data can, of course, be, you know, you run simulations that relate to that experiment. And when you get the experimental result, that actually validates or refutes so much of the simulations you ran. And then that is a lot of information in itself. So, you know, it's a very interesting question. And I think there are some actually differences about how you think about synthetic data when it comes to an LLM that's good at science.

Starting point is 00:29:42 And exactly, I mean, this is one of the reasons I really want to work on this because this opens up questions for LLMs and LLM training that may be different than what the frontier labs are thinking about right now. You know, if they're only thinking about math and logic and kind of what's on the internet, like accounting tasks, that's a bit different than if you're trying to do experimental physics, experimental chemistry. It just seems like a very exciting question to explore.

Starting point is 00:30:09 I want to talk a little bit about how you build a business out of this. I mean, you mentioned the superconductor example, and you said, like, there's a lot of value in this long before this novel superconductor goes into a product, but ultimately it kind of has to go to a product of some sort for you guys. And I think we talked about this a little bit last time, too, there's this question, okay, so if your job, your core job, at Periodic, is to discover new things that are going to be valuable in the world, say you do it. To my mind, there's sort of a binary decision you have to make at that point.

Starting point is 00:30:44 Do you try to sell the discovery, license the technology, license the IP, to somebody else who's going to go produce it and turn it into a product, or do you produce it? Do you sell the product? Do you sell the discovery or do you sell the product? Do you have a prior on which direction you want to go here? Great question. And, you know, I think the two options can be correct depending on the context, depending on the timelines. But honestly, it also depends on where we are in the company. So at the very beginning, you can imagine our LLMs will be very impactful for other companies doing physical R&D. Like, you know, already today, most people.

Starting point is 00:31:28 Yeah, exactly. I mean, there's a lot of interest in being able to use these LLMs. Sometimes the data restrictions don't allow it because you don't necessarily want to put your data on an LLM, you know, on the web. Sometimes the other issue is you haven't trained the LLM on your data, so it's actually not as good as it could have been. That kind of improvement could be really impactful because we've seen how impactful LLMs can be

Starting point is 00:31:54 in other fields where they have access to the data. So there's a lot of, I think, headroom for impact there. But in the longer run, you can also imagine a case where we as a field get really good at designing materials intentionally. You know, that hasn't been the case. But if you look at drug design, there was a time when designing drugs wasn't very profitable. And I think people will look at it and say, this is not a good business.

Starting point is 00:32:20 But what happened with Genentech is the field got so good at designing drugs that at some point it became very valuable itself. The machine learning field has been making huge improvements in materials science that, you know, were kind of hard to predict. So it will be interesting to see how far that goes and whether material discovery by itself becomes a very exciting business similar to drug discovery. But for us, we already see this big need and a big potential for impact by providing these LLMs to do physical R&D.

Starting point is 00:32:56 Yeah, almost like, this is going to be the wrong analogy, but it's partially right. Like almost like an AWS, like you're going to have the infrastructure. In this case, the infrastructure is your custom-designed LLM that is smart about physics and chemistry and all these domains and also your physical lab and them being interconnected with each other. And so you have all this infrastructure and scale in that infrastructure that you can use to go convince whatever large company that's doing R&D that they should just be outsourcing it to you rather than rebuilding the whole same thing in-house, which is not exactly what the cloud providers are. But, you know, there's enough of a relationship there. So that feels right. But it is, I suspect, yeah, I guess this is what you're saying.

Starting point is 00:33:41 I suspect a smaller ultimate opportunity than the one where you proactively discover a bunch of novel materials that change the world, and then however you monetize them, you prove you're able to do so repeatedly, and then you're Genentech and, you know, it's a whole different category. Yeah, and you know,

Starting point is 00:34:02 one other thing you see is people are so excited about this. You know, they want to see LLMs not just conquer the digital world but also, you know, really impact the physical world and impact the atoms, basically. So I feel like this has to be done. And the team has been very excited. It's really amazing.

Starting point is 00:34:22 We're hosting weekly seminars where the physicists will teach the computer scientists about the physics and the computer scientists will teach the physicists about LLMs. And of course, there are a lot of people in between, right? Like it's actually, again, like a fractal. So yeah, I think there's been a lot of excitement about seeing if these technologies can be used, not just for the digital world, but also for constructing the atoms around us. I guess maybe the last question: you said at the beginning, the thing that changed between a year ago and now, in part, was advancements in the big LLMs, right? The o1 model and so on.

Starting point is 00:34:58 Is there a next, like, what could change, what could OpenAI release in a year or two years from now that would be a big leapfrog for you? Are you branching off now from what the big LLMs are going to do, and all advancements are going to come from Periodic? Or is there something else that they could offer that is a step function change in your capability to discover new materials or whatever? Yeah, great question. I think we actually basically rise with the tide, right? As LLMs get better, there are so many advantages of that for other applications.

Starting point is 00:35:38 For example, the LLMs are getting very good at coding. And that's not surprising, because programming is a kind of closed environment. You can just simulate in your computer and get valuable feedback and then quickly improve. But as computers get better at coding, that's huge for science because then you can run simulations more efficiently. The simulations themselves can improve, similarly with tool use experiments. So I think as LLMs improve in general, it's going to help a lot with science applications. There are maybe longer-term things that can happen. One of them could be, you know, things like hypothesis generation or more out-of-domain generalization.

Starting point is 00:36:21 But then a question there is, will that come from the status quo, like how LLMs are being trained now, or will it actually come from labs that try to improve scientific reasoning for these LLMs? Because then maybe hypothesis generation emerges naturally or out-of-domain generalization emerges naturally, because that's what you're kind of trying to get at with your reward. So I think that will be a very exciting question to see, maybe next time we chat.

Starting point is 00:36:47 Love it. All right. Thank you so much for taking some time again. Congrats on Periodic. Super excited to see what you guys discover and for when your room temperature superconductor is shooting electricity all around the world around me. Okay, but thanks on it.

Starting point is 00:37:01 It was a great chat, Shayle. Dogus Cubuk is the co-founder of Periodic Labs and a former researcher at Google DeepMind. This show is a production of Latitude Media. You can head over to latitudemedia.com for links to today's topics. Latitude is supported by Prelude Ventures. This episode was produced by Daniel Waldorf, mixing and theme song by Sean Marquand. Stephen Lacey is our executive editor. I'm Shayle Kann, and this is Catalyst.
