Catalyst with Shayle Kann - Inside a $300 million bet on AI for physical R&D
Episode Date: November 6, 2025
A big problem with using artificial intelligence to discover new materials? It struggles to predict beyond its training data. That means AI might be better at optimizing known materials than discovering entirely new ones, like a room-temperature superconductor or carbon-capture sorbents. But since we last covered the topic in September 2024, a few things have changed. OpenAI released its powerful o1 reasoning model. Large language models have also gotten better at math, physics, and coding. And lab automation (robots mixing liquids and powders, running characterization tests) has improved, allowing for a higher volume of experiments. So, can these improvements overcome AI’s training data problem?
In this episode, Shayle talks to Ekin Dogus Cubuk, cofounder of Periodic Labs, which raised a $300 million seed round in September. Last year, Dogus took a more cautious view on using AI for materials discovery. Now, though, he’s convinced there’s a clearer path forward for physical science research and development, especially materials discovery.
Shayle and Dogus cover topics like:
Creating experimental and synthetic data to overcome AI’s limitations in predicting beyond its training set
Why we should focus on breakthrough discoveries over easier, incremental wins
The different roles humans and AI play in the discovery process
Periodic’s focus on automated experimental labs using AI-generated hypotheses
Resources:
Catalyst: Can AI revolutionize materials discovery?
Latitude Media: This ‘superintelligence platform’ just raised $200m in seed funding
Latitude Media: Can AI get us closer to fusion?
The New York Times: Top A.I. Researchers Leave OpenAI, Google and Meta for New Start-Up
Credits: Hosted by Shayle Kann. Produced and edited by Daniel Woldorff. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor. Catalyst is brought to you by EnergyHub. 
EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform, by visiting energyhub.com. Catalyst is brought to you by Bloom Energy. AI data centers can’t wait years for grid power—and with Bloom Energy’s fuel cells, they don’t have to. Bloom Energy delivers affordable, always-on, ultra-reliable onsite power, built for chipmakers, hyperscalers, and data center leaders looking to power their operations at AI speed. Learn more by visiting BloomEnergy.com. Catalyst is supported by Third Way. Third Way’s new PACE study surveyed over 200 clean energy professionals to pinpoint the non-cost barriers delaying clean energy deployment today and offers practical solutions to help get projects over the finish line. Read Third Way's full report, and learn more about their PACE initiative, at www.thirdway.org/pace.
Transcript
Latitude Media, covering the new frontiers of the energy transition.
I'm Shayle Kann, and this is Catalyst.
I have to say there's a difference between winning gold medals in Math Olympiads and scientific discovery.
Like, you can practice for math Olympiads by studying previous years' problems.
You can't really practice how to discover the next big theory,
but they were getting better at reasoning on complex problems.
Coming up: can AI discover a room-temperature superconductor? Volume 2.
When utilities need flexible capacity they can count on, they turn to Energy Hub. Energy Hub works
with more than 170 utilities, coordinating over 2.5 million devices to manage 3.4 gigawatts
of flexibility, built for the moments when utilities can't afford uncertainty. Energy Hub builds and
operates virtual power plants that utilities actually stake their grid planning on, coordinating EVs,
batteries, thermostats, and more through a single platform built for utility scale.
Predictive, verifiable, and designed to perform when it counts. Learn more at energyhub.com.
Trillions of dollars are flowing into clean and critical infrastructure, but those investments
aren't driven by technology alone. They're shaped by markets, by policy, by capital,
and by the institutions that connect them. I'm Alfred Johnson, CEO of Crux, and host of a brand
new podcast, Critical Capital. Each episode, I talk with people deploying capital, shaping policy,
and building the clean economy. Tune in as we unpack how progress is actually made. Listen to
Critical Capital on Spotify, Apple, or wherever you get your podcasts. Catalyst is supported by Fischtank PR,
an award-winning PR firm focused on climate and energy tech, renewables, and sustainability.
Fischtank is known for generating prominent and effective media coverage for the brands they work with.
If you want a PR partner that's thoughtful, shoots straight, and gets results, you'll like
Fischtank PR.
To learn more about Fischtank's approach, visit fischtankpr.com.
That's F-I-S-C-H-T-A-N-K-P-R dot com.
I'm Shayle Kann.
I lead the early-stage venture strategy at Energy Impact Partners.
Welcome.
So a year ago, a little over a year ago, I had Dogus Cubuk on this podcast to talk about using AI for
materials discovery, which has all sorts of
interesting applications in the spaces that we talk about here. At that time, Dogus had been
leading efforts in that area for Google DeepMind for some time. And I thought of him as being both
very knowledgeable in the space, obviously, also pretty sober about it. Fast forward a year,
Dogus left Google DeepMind earlier this year, and along with Liam Fedus, who was one of the
co-creators of ChatGPT, started a company called Periodic Labs, which raised, wait for it, a $300 million
seed round, led by Andreessen Horowitz. Periodic is doing AI for materials discovery. And not just that,
also physics and chemistry. And they're also very much hardware in the loop. The way I like to frame it is
that they're building two kinds of frontier lab at once. There's a frontier AI lab and a frontier
scientific lab, the type of lab that we used to talk about. And then they're trying to make those
two things work together to make breakthrough discoveries. Notably, one thing we talked about last time was
how the AI materials discovery companies at the time tended to start by going after
something like metal-organic frameworks, or MOFs, for carbon capture,
which I think of as less of a breakthrough opportunity, really, from a global scale,
whereas the big, perhaps the biggest, breakthrough to pursue
would be the discovery of a room-temperature superconductor.
Well, Periodic makes no promises,
but they're very publicly working on high-temperature and maybe room-temperature superconductors.
Based on that last conversation, to be honest, I wouldn't have predicted this.
So it was time to have Dogus back on and hear what changed and how.
Here's Dogus.
Dogus, welcome back.
It's great to be back.
It's great to see you.
A lot has changed since the last time we talked.
So I looked back.
So the last conversation that we had was just over a year ago.
It was September 2024.
And I was having you explain to me the wild world of AI for materials discovery in particular,
and the work that you'd been doing at Google DeepMind,
but also just like the broader landscape.
And I'll tell you my takeaway from that conversation,
which you could tell me if I had the wrong takeaway at the time,
but my takeaway was promising field, pretty unclear
if and when this new wave of LLMs and all the reinforcement learning,
all the things that have shown up in the past few years,
pretty unclear if and when that would generate a real, meaningful
breakthrough discovery in materials.
And we talked through a bunch of the reasons why it's challenging, training data
maybe chief amongst them.
But I came away with, I think, a pretty sober view of the path there.
Okay, so fast forward a year and you left Google DeepMind and started a company to do that
amongst other things.
So I guess the first question that I have for you is, what changed in the last 12 months
to give you conviction that like now is the time?
Great question.
So when we talked, I was doing research in the field of computational materials science and machine learning.
You know, specifically, we were using graph neural networks.
We were using density functional theory and we were trying to discover materials.
One thing that changed since our discussion was the LLMs have improved even further.
So at the time, I wasn't using LLMs much at all,
but I think right around when we were talking, o1 came out, right?
The reasoning models started showing up.
And that was a huge update for me because you might remember that one of my big concerns was that machine learning works best on the training-set distribution.
But in science and technology, we almost only care about out-of-domain generalization.
So what o1 showed is that if you spend test-time compute, you can get better results.
So that was very exciting to me because it was one way of investing resources that went beyond the training set.
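A minimal sketch of why spending more compute at test time can help, using one simple, well-known form of it: best-of-N sampling with a verifier. This is a swapped-in illustration, not a description of how o1 works internally. It assumes N independent attempts that each succeed with probability p, plus a verifier that reliably recognizes a correct answer; under those assumptions the chance of at least one correct answer is 1 - (1 - p)^N. The function name `best_of_n_success` is hypothetical.

```python
def best_of_n_success(p_single: float, n: int) -> float:
    """Probability that at least one of n independent attempts is correct.

    Assumes each attempt succeeds independently with probability p_single and
    that a perfect verifier can pick out the correct attempt (a simplification).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p_single) ** n


if __name__ == "__main__":
    # Illustrative numbers only: a 20% per-attempt success rate.
    for n in (1, 4, 16):
        print(f"n={n:2d}: P(at least one correct) = {best_of_n_success(0.2, n):.3f}")
```

With a 20% per-attempt success rate, this rises from 0.20 at N = 1 to about 0.97 at N = 16. The strong assumption is the perfect verifier; for physical discovery, the verifier is ultimately an experiment.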
So, okay, if I can try to translate that into layperson terms, the reasoning model, like OpenAI's o1 model,
introduced, unlocked a door, kind of,
that maybe allows you to break this challenge
of the limited training data set that you have in materials discovery.
That was what we spent a lot of time talking about a year ago,
was like, you know, you can compare the corpus of data
that an LLM trains on to do language,
which is enormous.
It trains on the internet, basically,
versus the corpus of data that you were dealing with
in trying to discover novel materials, and it was thousands of data points, not tens of billions
or whatever.
And so that presumably hasn't changed, at least yet.
But you're saying that the reasoning models have gotten good enough that they are able to
sort of get around that challenge via reasoning or possibly generating their own synthetic data.
Like, what is it that allows them to break that?
So I'm not saying that they're good enough already, but that was one step in the positive direction.
And another thing, you know, we've seen is they've gotten really good at math.
So, you know, since last time we talked, they started winning gold medals in math
Olympiads.
And they're doing similarly well on coding, and really well on physics Olympiads.
And I have to say there's a difference between those things and scientific discovery.
Like, you can practice for math Olympiads by studying previous years' problems.
You can't really practice
how to discover the next big theory.
But it did show you that they were getting better
at reasoning on complex problems.
So then what else do we need?
So I think the biggest thing we need is to have our own lab
because once you have a very intelligent reasoning LLM,
you still can't discover things unless you make trials, right?
Just like humans, the LLMs will be wrong often
when they try to predict things outside of their training set.
But you try many things, and then at some point, you get a really cool discovery.
And this is, you know, as we talked about history last time,
this is quite common in solid-state chemistry and
solid-state physics, where a lot of discoveries happen somewhat by accident,
but of course with a lot of background, understanding of the physical system,
and a lot of trial and error.
So, okay, so this is what you're doing at Periodic, right?
You're sort of combining the digital domain with the physical domain.
You have a lab in both senses.
It's a frontier lab in an LLM sense and a frontier lab in a laboratory sense in the traditional sense of the word.
And you're sort of merging the two.
I'm curious in practice, like, how you imagine that feedback loop working.
So is it a traditional, you develop a theory, you run an experiment, you generate data from that experiment, but in this case, you feed the experiment back into your customized LLM.
as an additional set of training data,
and then that's the way that the loop works,
or is it more complicated than that?
Yeah, exactly.
I mean, it's pretty simple, I think, as you said.
So the LLM can propose, for example, synthesis recipes,
or it can propose simulations to run.
And because the LLMs are pretty good at tool use,
it can actually do it itself,
and then you get some results back.
So the results from experiment could be some characterization data,
results from the simulation can be some, you know,
trace or some simulation you did.
And now the LLM
can go through it with the context of its previous training, maybe the context of relevant papers,
textbooks, but also now the results that it just got that no one else has ever seen.
And then now it can kind of tweak the experiment, tweak the simulation for the next step.
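A minimal sketch of the propose, run, and feed-back loop described above, under a toy setting. The "LLM" is replaced by a simple proposer that perturbs the best recipe seen so far, and the "experiment" is a made-up scoring function of two hypothetical recipe parameters (`anneal_temp_c`, `dopant_frac`). None of these names or numbers come from Periodic; they only show the shape of the loop.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    recipe: dict      # hypothetical synthesis parameters
    result: float     # hypothetical measured score (higher is better)


def toy_experiment(recipe: dict) -> float:
    """Stand-in for synthesis plus characterization; peaks at made-up optimum values."""
    temp_term = (recipe["anneal_temp_c"] - 850.0) / 100.0
    dopant_term = (recipe["dopant_frac"] - 0.15) / 0.05
    return -(temp_term ** 2) - dopant_term ** 2


def propose(history: list, rng: random.Random) -> dict:
    """Stand-in for the LLM: start at random, then perturb the best recipe so far."""
    if not history:
        return {"anneal_temp_c": rng.uniform(500.0, 1200.0),
                "dopant_frac": rng.uniform(0.0, 0.3)}
    best = max(history, key=lambda t: t.result).recipe
    return {
        "anneal_temp_c": best["anneal_temp_c"] + rng.gauss(0.0, 50.0),
        # clamp the dopant fraction to the allowed composition range [0, 0.3]
        "dopant_frac": min(max(best["dopant_frac"] + rng.gauss(0.0, 0.02), 0.0), 0.3),
    }


def closed_loop(n_rounds: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    history = []
    for _ in range(n_rounds):
        recipe = propose(history, rng)          # hypothesis
        result = toy_experiment(recipe)         # experiment + characterization
        history.append(Trial(recipe, result))   # new data feeds the next proposal
    return history


if __name__ == "__main__":
    trials = closed_loop(50)
    best = max(trials, key=lambda t: t.result)
    print(f"first score {trials[0].result:.3f}, best score {best.result:.3f}")
```

Perturbing the current best recipe is about the simplest search policy there is. The sketch only shows that every new result changes the next proposal.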
Right. You said one thing in there that I guess is worth pointing out.
Like you're trying to automate this as much as possible. The LLM might run the experiment.
Yeah, absolutely. I mean, one of the other advances that's been happening recently that I think
made Periodic possible is that high-throughput experiments have been getting better.
You know, there are many examples of this now across academia and industry, where these robots
that became quite commoditized actually, just mixing powders or mixing liquids and then sending
it to characterization.
I think one thing that isn't as advanced right now, but we feel like we can do pretty soon,
is automated characterization itself.
So you mix powders, you put it in some characterization tool, you get the result out.
What is the actual output?
I think this is pretty difficult right now for AI tools, but we feel like we can improve that pretty quickly.
I want to talk about one specific application that I know you're going after that we actually did talk about a year ago.
But also, I want to talk about it as a way to see whether one of the other things you described as a fundamental challenge
has changed, which was, as I understood it, AI being pretty good at the next incremental
discovery, but not necessarily good at the breakthrough discoveries. Through history, you said,
usually that's done accidentally, or often it's done accidentally, because
you can't reason your way to this massive breakthrough discovery.
So let's talk about superconductive materials.
So, right, we talked about this last time, where all these companies that existed at the time
that were doing AI for materials discovery were starting on things like discovering a novel
MOF for carbon capture or whatever.
But we said, like, the thing that would be the real breakthrough, the big thing, would be a
room temperature superconductor.
And you guys have since launching been very public about, like, superconductive materials,
maybe not room temperature.
I'm curious for you to tell me how likely you think that is, but, like, high-temperature
superconductors are on the roadmap.
So why, first of all?
And then second of all, like this question of, do you think you have a
path to a truly breakthrough... what would the path be to a truly breakthrough discovery, as opposed to
finding a material that is superconductive at an ever so slightly higher temperature
than the best that we've got today?
Yeah. So to answer your first question, I think it's still true
that it would be difficult to just reason your way into a much better superconductor.
I actually would guess that there's a law out there that we haven't discovered yet that says that
you can't just look at a training set that's different from what you're trying to discover and just predict it.
You know, there have been laws we discovered from the 1800s on, where, like, you connect energy to work.
So thermodynamics is the first example.
More recently, there's Landauer's limit, which shows that you have to spend a certain amount of energy to delete information,
which can be used to resolve the Maxwell's demon paradox.
I bet there's something similar for how hard it is
to discover things that are outside of your training domain.
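For reference, Landauer's limit sets the minimum energy to erase one bit of information at temperature T at k_B T ln 2. A quick calculation at an assumed room temperature of 300 K shows how small that bound is. The function name is illustrative; the two constants are the exact SI values.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23       # exact SI value of k_B
ELECTRON_CHARGE_C = 1.602176634e-19    # exact SI value, used to convert J to eV


def landauer_limit_joules(temperature_k: float) -> float:
    """Minimum energy to erase one bit: k_B * T * ln(2)."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive")
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)


if __name__ == "__main__":
    e = landauer_limit_joules(300.0)
    print(f"{e:.3e} J per bit = {e / ELECTRON_CHARGE_C:.4f} eV per bit")
```

At 300 K this comes to about 2.87 × 10^-21 J, or roughly 0.018 eV, per erased bit.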
Okay, so I don't think that's been fixed since last time we talked.
But because we have a lab internally, we can just try things and try them at large scale and
often, and hopefully as intelligent as possible.
So even though we won't reason our way into a much better superconductor, we'll be able
to push our trials in the direction that's most promising or, you know, most promising for
us given our training set at the time.
So, yeah, I think that hasn't changed.
And I think there's reason to be hopeful because, you know, in the big scheme of things,
it's a pretty new field.
I mean, if you look at cuprates, they were from 1985.
There's been a lot of advances, you know, more recently.
So, yeah, we're very excited.
One reason we chose superconductivity is if you find a good superconductor, that's impactful
immediately, right?
Like, last time we talked about how long it can take to translate materials improvements
into products.
One nice thing about superconductors is if somebody discovers a room temperature
superconductor today, even before it makes it into a product, it has a huge impact, right?
Like, first of all, it changes how we think about the universe.
Second, it helps us do physics experiments that weren't possible before.
And, you know, whenever you think about sci-fi-ish technologies, like quantum computing or
fusion, superconductors come up because they're kind of what we need.
It's kind of one of the most exciting
macroscale quantum phenomena.
So that's one reason we picked it because it kind of is exciting as soon as we succeed.
The other reason is it requires all sorts of improvements to get there.
You know, when we think about OpenAI and DeepMind, I remember back in 2016,
people used to make fun of these institutions for prioritizing AGI so much, because they were
saying we're going to do AGI.
But what happened is they developed so many other tools on the way to AGI
that were useful in themselves.
And today they have these LLMs that, you know, you might consider AGI, or like something
really impressive.
Superconductivity is a bit like that.
To discover an exciting superconductor, we probably have to develop so many capabilities
on the way there that are by themselves very useful.
For example, automated synthesis, automated characterization, and being able to model or predict
high-temperature superconductivity, because we don't have a theory for it yet.
So it's kind of like a nice goal that
unites people and requires a lot of other useful things to happen on the way.
And it's one of those things that physicists find really exciting.
So the physicists in our company are really excited by this mission,
but also computer scientists find it very exciting.
It's just one of those things that I think both sides can really appreciate.
So those are some of the reasons that we picked it.
I guess back to this question of how do you distinguish between the incremental
innovation, which to be clear, if you develop or discover a superconductor at a higher
temperature than anything that we've got today, that's meaningful.
But it's probably orders of magnitude less meaningful than if you discover a room
temperature superconductor.
And I presume that the scientific challenge
is commensurately distinct between those two.
And, you know, the way that LLMs work, as I understand it, at least in part, is on these
reward functions.
And so are you setting your AI system a goal of, you know, find a room temperature superconductor?
And then everything flows back from there.
Here are the steps and all the things we have to fix to get to room temperature.
Or do you say, improve this characteristic such that we can incrementally,
you know, build our way there? In other words, are you going to find 10 super... I think of it
as sort of a different thing, but, like, the alternative version of this is what happens in
fusion, nuclear fusion, where everybody is sort of chasing this same goal of Q greater than one,
right? Like, energy break-even. And everybody is getting incrementally closer and closer, and
eventually NIF breaks it or somebody breaks it. Is it going to look like that, or is it going to
look like we've discovered nothing until we discover the room-temperature superconductor?
So, as you said, I think there are many different ways of improving superconductors
without getting a room temperature superconductor.
So one of them could be a significant increase in Tc.
But another one could be a really high critical magnetic field, which it turns out might even
be more important for fusion applications than Tc itself.
Another one can be more mundane, like mechanical properties, like a superconductor
that is also ductile and, you know, that we can make into devices.
So we wouldn't rule out, you know, all of these very exciting developments just for, like, a room-temperature Tc.
But how do you set the reward function for your model?
What are you optimizing it for then?
I mean, I think that's an empirical question.
I think one thing I should say is it's quite nice because it's hard to reward hack.
You know, one of the issues with RL and training LLMs is you might worry about reward hacking.
And in simulations, again, reward hacking can be a problem, even in DFT.
But for real-life experimental measurement of Tc, it's much harder to reward hack, which we love.
So, like, if our reward was increasing Tc, that just seems like a nicer, unhackable reward.
But in terms of, like, what specifically will get us there, you know, we're not sure yet.
I mean, it's an empirical question.
We can probably try all of them.
All the things you propose, we'll try all of them.
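A toy illustration of why a measured Tc makes a harder-to-hack reward than a simulated proxy, using three made-up candidates plus one failed synthesis. If one candidate's simulated Tc is inflated by a simulator artifact, optimizing against the simulation picks it, while optimizing against the measurement does not. The `Candidate` type, the numbers, and the shape of `tc_reward` (improvement over a baseline, zero when there is no valid measurement) are all hypothetical and are not Periodic's actual reward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    name: str
    simulated_tc_k: float            # proxy from a simulation; can be systematically wrong
    measured_tc_k: Optional[float]   # lab measurement; None if the sample failed


def tc_reward(c: Candidate, baseline_tc_k: float) -> float:
    """Reward only the measured improvement over a baseline; no measurement, no reward."""
    if c.measured_tc_k is None:
        return 0.0
    return max(0.0, c.measured_tc_k - baseline_tc_k)


# Made-up numbers for illustration only.
CANDIDATES = [
    Candidate("A", simulated_tc_k=40.0, measured_tc_k=38.0),
    Candidate("B", simulated_tc_k=95.0, measured_tc_k=5.0),    # simulator artifact
    Candidate("C", simulated_tc_k=60.0, measured_tc_k=55.0),
    Candidate("D", simulated_tc_k=70.0, measured_tc_k=None),   # synthesis failed
]

if __name__ == "__main__":
    by_proxy = max(CANDIDATES, key=lambda c: c.simulated_tc_k)
    by_measurement = max(CANDIDATES, key=lambda c: tc_reward(c, baseline_tc_k=30.0))
    print("proxy picks", by_proxy.name, "| measured reward picks", by_measurement.name)
```

The design choice is simply that the reward never reads the model's own predictions, only what the instrument reports.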
I guess that gets to the other question, which is like, what does the human in the loop look like here?
Right.
And again, as you said, if we haven't solved the sort of "AI is good at incremental innovation
and not orthogonal breakthrough innovation" thing, but humans are, historically at least, better at it,
is it like folks on your team developing a theory of something and that gets fed through the model
and you get the results and you feed it back in,
you see whether it's a promising category.
Like, is the germ of the original idea of what to look for coming from a human?
Or is it coming predominantly from the model,
and then the humans have to interpret and send it off in various directions?
Yeah, I mean, that's a great question.
You know, we're not really prioritizing full automation anyway.
So if we get better results with humans doing part of it, that's great.
Like, this is also actually a question for a lab,
right? Like, do we want to automate every single aspect of the lab?
At some point, you end up needing humanoids for that.
And I think that's not, like... Liam, my co-founder, and I,
we are trying to be very pragmatic about it.
Like, our goal is to get the best result possible on the things we care about.
And, you know, how much of the automation comes from the ML,
how much of it comes from more traditional tools
and how much what gets done by humans,
I think that's kind of, again, an empirical question.
So, yeah, we're not, like... I think, as you said, it does seem like today there are things that ML and AI are better at than humans.
But one of those things is not hypothesis generation.
So, I mean, there are two options: we either have to improve these LLMs' hypothesis generation, which is possible,
or we have humans providing some of the hypotheses and then AI doing the execution.
I guess the other question here is cost.
I mean, you guys raised a $300 million
seed round. So that implies from the outside that your cost structure will look similar to other
companies that basically are going to use just an enormous amount of compute. And so like a lot of that
cost comes from compute. In your context, I could imagine maybe that being true, but also maybe
that not being true because you, again, you just don't have the same corpus of data. You can't
build a 10 billion parameter model right now because the data isn't there to do it. And so instead,
that cost is going to go more toward the robotic lab and all that kind of stuff.
How should I be thinking about how much compute you'll use and where those costs come from?
Yeah.
So honestly, compute is very expensive.
And we are going to train LLMs.
We are going to use GPUs to run simulations.
So that does end up being a large part of the cost.
Yeah, it's funny.
If you had asked me this question 10 years ago, I would have thought that the biggest part of the cost must come from the labs,
because, like, physical is real: you're building this lab, you're buying instruments.
But it turns out GPUs are so expensive and training LLMs is so expensive.
So when we were thinking about how much to raise, we kind of laid it out in terms of the
GPU cost and the lab cost, and this was kind of the minimum number we felt was viable.
And, yeah, we've seen GPUs getting more expensive recently.
I guess we'll see how the market dynamics continue.
To what extent do you end up building a
generalized model or models versus models designed for a specific domain, even a specific
scientific domain, right?
Like, you guys are doing materials discovery, obviously, but physics and chemistry and
these things are all intertwined.
But, like, is the same model going to be equally capable across all domains?
Is that the intent?
Or is that just not how they're supposed to be architected?
That's right.
And it's actually something we're very excited about.
You know, one thing I've been kind of noticing is like in the past, say, three, four
years, I had a chance to collaborate with very, you know, world-class best in their field scientists.
And even when you work with them, you realize that while their expertise on a few domains is,
you know, incredible, maybe best in history, there's just so much more to know in chemistry
and physics that they may not know all the other aspects of it.
So this is why I brought up superconductivity, because you might actually need to be really good at,
you know, solid-state chemistry and synthesis of difficult novel materials,
just because, you know, you don't know which chemistry the superconductor is going to come from.
Some of these ideas you might have may not be as stable thermodynamically,
so you need to be intelligent about how to kinetically force it into that phase you want.
But at the same time, in addition to solid-state chemistry, you need to be incredible at condensed matter physics, right?
Because, like, there are so many different kinds of superconductivity.
We don't understand most of them very well.
And there's nobody in the world who knows both of those equally well,
or, like, sufficiently well.
And turns out this is true for many different aspects.
Like, if you need to use robots for high-throughput synthesis,
again, like there are only so many people who understand robots
and how to use them for synthesis.
So I think this was probably different in the 1800s.
Like there was probably a time when a physicist could contribute
and be one of the best in the world on many fields of physics.
But it's definitely not true today.
And this is one of the reasons I think we are very excited about LLMs,
because when you talk to them, they seem like they have a pretty good understanding of
solid-state chemistry and solid-state physics at the same time already.
And we're, you know, trying to improve them further in the physical sciences specifically,
because that's what we are really interested in.
And then we're hoping that they'll be good at multiple of these.
And then a really exciting prospect with that is a lot of the exciting discoveries happen
to lie in between fields, right?
That's why it's sometimes easier to be interdisciplinary.
And there's so many of these surface areas between these different fields.
I guess science is kind of like a fractal in the way it's hierarchically organized.
And there's so much surface area that humans have exploited, of course.
But then there's probably so much left to exploit.
And we're excited about an LLM that can basically do that
at a scale that humans couldn't yet.
How good are the LLMs today, or the best
in class of what you guys have, at generating synthetic data in this domain?
Another way to ask this question is, you know, if you fast forward three years, you're fully
up and running and you're operating, how much of the valuable insight you generate
do you think will come from the physical data coming out of your lab versus the synthetic
data that the LLMs create on top of that?
Yeah, that's a great question.
I obviously don't know the answer, but it's great to brainstorm about that.
Because on the one hand, the lab data will be kind of our additional data that other LLMs may not have.
And you might think then the lab data will only be as valuable as the results in it.
But on the other hand, what's interesting about scientific data is it's not just a few bits of numbers, right?
Like, for example, there are certain experiments you can run where the result you get
from it is just, say, three floating point numbers. But the implications of those could be
tremendous, right? It's not just going to be like a few bytes. It will actually be potentially
an incredible amount of understanding just from a few experiments. And this has been how it is
in human history, right? Like, there are certain experiments that told us so much about
the universe. And the way to do this with synthetic data can, of course, be,
you know, you run simulations that relate to that experiment. And when you get the experimental
result, that actually validates or refutes so much of the simulations you ran.
And then that is a lot of information in itself.
So, you know, it's a very interesting question.
And I think there are actually some differences in how you think about synthetic data
when it comes to an LLM that's good at science.
And exactly, I mean, this is one of the reasons I really want to work on this
because this opens up questions for LLMs and LLM training that may be different
than what the frontier labs are thinking about right now.
You know, if they're only thinking about math and logic
and kind of what's on the internet, like accounting tasks,
that's a bit different than if you're trying to do experimental physics,
experimental chemistry.
It just seems like a very exciting question to explore.
I want to talk a little bit about how you build a business out of this.
I mean, you mentioned the superconductor example,
and you said, like, there's a lot of value in this long before this novel
superconductor goes into a product, but ultimately it kind of has to go to a product of some sort for you guys. And I think
we talked about this a little bit last time, too, there's this question, okay, so if your job,
your core job at Periodic, is to discover new things that are going to be valuable in the
world, say you do it. To my mind, there's sort of a binary decision you have to make at that point.
Do you try to sell the discovery, license the technology, license the IP, to somebody else who's
going to go produce it and turn it into a product, or do you produce it? Do you sell the product? Do you
sell the discovery or do you sell the product? Do you have a prior on which direction you want to go here?
Great question. And, you know, I think the two options can be correct depending on the context,
depending on the timelines. But honestly, it also depends on where we are in the company.
So at the very beginning, you can imagine our LLMs will be very impactful
for other companies doing physical R&D.
Like, you know, already today, most people...
Yeah, exactly.
I mean, there's a lot of interest in being able to use these LLMs.
Sometimes the data restrictions don't allow it
because you don't necessarily want to put your data on an LLM, you know, on the web.
Sometimes the other issue is you haven't trained the LLM on your data,
so it's actually not as good as it could have been.
That kind of improvement could be really impactful
because we've seen how impactful LLMs can be
in other fields where they have access to the data.
So there's a lot of, I think, headroom for impact there.
But in the longer run, you can also imagine a case
where we as a field get really good at designing materials intentionally.
You know, that hasn't been the case.
But if you look at drug design,
there was a time when designing drugs wasn't very profitable.
And I think people would have looked at it and said, this is not a good business.
But what happened with Genentech is the field got so good at designing drugs
that at some point it became very valuable itself.
The machine learning field has been making huge improvements in material science that, you know,
was kind of hard to predict.
So it will be interesting to see how far that goes and whether material discovery by itself
becomes a very exciting business similar to drug discovery.
But for us, we already see this big need and a big potential for impact by providing these
LLMs to do physical R&D.
Yeah, almost like this is going to be the wrong analogy, but it's partially right.
Like almost like an AWS, like you're going to have the infrastructure.
In this case, the infrastructure is your custom-designed LLM that is smart about physics and chemistry
and all these domains and also your physical lab and them being interconnected with each other.
And so you have all this infrastructure and scale in that infrastructure that you can use to go convince whatever large company that's doing R&D that they should just be outsourcing it to you rather than rebuilding the whole same thing in-house, which is not exactly what the cloud providers do.
But, you know, there's enough of a relationship there.
So that feels right.
But it is, I suspect, yeah, I guess this is what you're saying.
I suspect a smaller ultimate opportunity
than the scenario where you proactively discover
a bunch of novel materials that change the world
and then, however you monetize them,
you prove you're able to do so repeatedly
and then you're Genentech and, you know,
it's a whole different category.
Yeah, and you know,
one other thing you see is people are so excited about this.
You know, they want to see LLMs
not just conquer the digital world
but also, you know, really impact the physical
world and impact the atoms, basically.
So I feel like this has to be done.
And the team has been very excited.
It's really amazing.
We're hosting weekly seminars where the physicists will teach the computer scientists
about the physics and the computer scientists will teach the physicists about LLMs.
And of course, there are a lot of people in between, right?
Like it's actually, again, like a fractal.
So yeah, I think there's been a lot of excitement about seeing if these technologies can be used,
not just for the digital world, but also for the atoms around us.
I guess maybe the last question: you said at the beginning that the thing that changed between a year ago and now, in part, was advancements in the big LLMs, right?
The O1 model and so on.
Is there a next, like, what could OpenAI release a year or two years from now that would be a big leapfrog for you?
Or are you branching off now from what the big LLMs are going to do, such that all
the advancements are going to come from Periodic?
Or is there something else that they could offer that is a step function change in your
capability to discover new materials or whatever?
Yeah, great question.
I think we actually basically rise with the tide, right?
As LLMs get better, there are so many advantages of that to other applications.
For example, the LLMs are getting very good at coding.
And that's not surprising, because programming is a kind of closed environment.
You can just simulate in your computer and get valuable feedback and then quickly improve.
But as computers get better at coding, that's huge for science because then you can run simulations more efficiently.
The simulations themselves can improve, and similarly with tool use for experiments.
So I think as LLMs improve in general, it's going to help a lot with science applications.
There are maybe longer-term things that can happen.
One of them could be, you know, things like hypothesis generation or more out-of-domain generalization.
But then a question there is, will that come from the status quo, like how LLMs are being trained now,
or will it actually come from labs that try to improve scientific reasoning for these LLMs?
Because then maybe hypothesis generation emerges naturally
or out-of-domain generalization emerges naturally
because that's what you're kind of trying to get at
with your reward.
So I think that will be a very exciting question
to revisit, maybe next time we chat.
Love it.
All right.
Thank you so much for taking some time again.
Congrats on Periodic.
Super excited to see what you guys discover
and for when your room temperature superconductor
is shooting electricity all around the world.
Okay, thanks a lot.
It was a great chat, Shayle.
Dogus Cubuk is the co-founder of Periodic Labs and a former researcher at Google DeepMind.
This show is a production of Latitude Media.
You can head over to latitudemedia.com for links to today's topics.
Latitude is supported by Prelude Ventures.
This episode was produced by Daniel Woldorff, mixing and theme song by Sean Marquand.
Stephen Lacey is our executive editor.
I'm Shayle Kann, and this is Catalyst.
