a16z Podcast - Building an AI Physicist: ChatGPT Co-Creator’s Next Venture

Episode Date: September 30, 2025

Scaling laws took us from GPT-1 to GPT-5 Pro. But in order to crack physics, we'll need a different approach. In this episode, a16z General Partner Anjney Midha talks to Liam Fedus, former VP of post-training research and co-creator of ChatGPT at OpenAI, and Ekin Dogus Cubuk, former head of materials science and chemistry research at Google DeepMind, about their new startup Periodic Labs and their plan to automate discovery in the hard sciences.

Follow Liam on X: https://x.com/LiamFedus
Follow Dogus on X: https://x.com/ekindogus
Learn more about Periodic: https://periodic.com/
Follow our host: https://twitter.com/eriktorenberg
Stay updated: find a16z on X and LinkedIn; listen to the a16z Podcast on Spotify and Apple Podcasts.

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
Ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies and we're saying, okay, if you care about advancing science, we need to have experiment in the loop. The applications of building an AI physicist, for lack of a better word, that can design the real world are so broad. You can apply them to advanced manufacturing. You can apply them to material science, to chemistry.
Any process where there's R&D with the physical world required, it seems like, will benefit from breakthroughs that Periodic is working on. For example, if you could find a 200 Kelvin superconductor, even before we make any product with it, to be able to see such quantum effects at such high temperatures, I think it would be such an update to people's view of how they see the universe. What if AI could move from talking about science to doing science?
Today's conversation features Anjney Midha, general partner at a16z, with Liam Fedus and Dogus Cubuk, co-founders of Periodic Labs, a frontier research lab building experiment-in-the-loop AI for physics and chemistry. They unpack why real-world reward functions matter, how mid-training and high-compute RL fit together, and why superconductivity and magnetism are the first beelines towards an AI physicist. They also get into noisy data sets and negative results, what happens when ML researchers sit shoulder-to-shoulder with bench scientists, and the near-term payoff: co-pilot tools
for advanced industries, from semiconductors to space and manufacturing. Let's get into it. So Liam, you were the co-creator of ChatGPT. Dogus, you were running some of the physics teams at DeepMind. Let's talk about how you guys met, and what was the moment where you realized that you guys had to leave both of those labs to start Periodic? I believe we met eight years ago at Google Brain
flipping over a large tire. Yep. At the Google gym. You've got to give us more on that. So the gym was one of the gems at the Google facilities, and I think that's where Dogus and I met. There's just this massive tire that, like, a single person basically can't flip by themselves. And so Dogus was trying to flip it, and he was like, I think the two of us could do it.
And why were you trying to flip this tire? You know what? Why not? But I tried doing it, I couldn't do it. And then I was like, who's the strongest person I can find? And it was either Barrett or Liam. And it worked. We just flipped it.
And was that the moment where you guys both realized you had physics backgrounds? How did that happen? How did you go from flipping tires to flipping experiments? Yeah, I mean, I don't know if Liam remembers this, but we would catch up, you know, over the years. And we would often end up talking about either quantum materials or, like, superconductivity. It was very common, but I never thought that we'd end up working on physics together. So Liam was working on LLMs and they were going really well. And I was not using LLMs, but I was noticing that LLMs were becoming more and more impactful in
my work. So one way it was becoming impactful is, when I was trying to remember some things about chemistry, physics, I could just talk to the chatbot and actually learn a lot of stuff I'd forgotten. Another way was, of course, coding. Like, we were writing simulations, and the LLM was so helpful in writing these simulations for us. So then the question was like, can we use LLMs kind of more as first-class citizens in physics research? Yeah, I think kind of leading up to this decision to leave, Dogus and I were just, you know, connecting and talking about these different tech trees. We're looking at the improvements on language models, on reasoning. We're seeing what high-compute reinforcement learning could do. And on the material science side, we're seeing
scaling laws within physics, within chemistry, both with respect to simulation and with respect to experiment, and it's like the same kind of principles at play as in ML. And I think to both of us, and to a lot of people in the field, the goal of this technology is to accelerate science, accelerate physical R&D. You know, chatbots were like a great milestone along the way, but we really want to see the technology out in the world. And we felt like this was just the right place to begin. Physics is very verifiable.
It's a great reward function, a fairly fast iteration loop, you have simulators for large classes of physical systems, and we felt like, in order to create this AI scientist, this is like the beginning of this path. So we built that conviction and decided to found Periodic. Well, let's take a second to talk about what Periodic is and what it does. So Periodic Labs is a frontier AI research lab that's trying to use LLMs to advance physics and chemistry. We feel like having experiment in the loop, tightly coupled with simulations and LLMs, is extremely important.
So we're building up a lab that will generate high-throughput, high-quality data. And we will use LLMs and simulations in conjunction with the experiments to try to iterate. Science, by its nature, is an iterative endeavor. And we feel like LLMs, using all these tools that are available to humans, can do a great job in accelerating physical R&D. I'd say the objective is: let's replace the reward function from the math graders and code graders that we're using today. So, like, math graders, you know, to give an example: you have a prompt, what is 2 plus 2?
You know, the ground truth is 4. You can put a lot of optimization pressure against problems like that that are programmatically checkable. And what we're doing, by having the lab, is we create a physically grounded reward function. That becomes the basis we're optimizing against. And so if a simulator has some deficiencies or some issues, we always error-correct, because for us, the ground truth is the experiment, like the RL environment. Nature is our RL environment in our setting.
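To make the contrast concrete, here is a minimal sketch of the kind of programmatically checkable grader Liam is describing, using the toy 2 plus 2 example. The task format and function names are illustrative assumptions, not Periodic's actual training stack.

```python
# Hedged sketch of a programmatically checkable reward, in the spirit of
# "what is 2 + 2, the ground truth is 4". Names and task format are illustrative.

def exact_match_reward(model_answer: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the ground truth, else 0.0."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

tasks = [
    {"prompt": "What is 2 + 2?", "answer": "4"},
    {"prompt": "What is 7 * 6?", "answer": "42"},
]

def score_rollouts(generate_answer, tasks) -> float:
    """Average reward over a batch of tasks; `generate_answer` stands in for the model."""
    rewards = [exact_match_reward(generate_answer(t["prompt"]), t["answer"]) for t in tasks]
    return sum(rewards) / len(rewards)

# A physically grounded version would swap the string comparison for a quantity
# measured by a simulation or an experiment, which is the point being made above.
```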
Let's just take a second, for folks who might not be familiar, to explain what you guys mean by a lab that will verify RL in the real world. Can you talk a little bit about how experiments work? How are AI models trained today? And how is that different from how they're going to be trained and developed and post-trained and deployed at Periodic? And it might be helpful to talk about how you created ChatGPT. So ChatGPT: originally, the technology evolved very rapidly over the last few years. When we were first creating it, it was a very standard RLHF pipeline. So you have a pre-trained model, and it's sort of like this raw substrate. And what you're trying to do is take this auto-completion model and turn it into something useful. The way we did it at that point was we would have supervised data. So given some input,
we would say, this is a desired output. So if we're trying to get it to act as an assistant, you know, we create some tuples like that. Then you run reinforcement learning, but now you're learning against a reward function that's trained against human preferences. So humans will say, well, given this input, I would prefer completion A to completion B, and you do that over and over again, and you can create a reward function that can then be optimized against. That is sort of the basis of how we created ChatGPT.
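For readers who want the mechanics behind "I would prefer completion A to completion B", here is a small sketch of a Bradley-Terry-style pairwise preference loss, a standard way such comparisons are turned into a trainable reward model. This is a generic illustration, not the actual ChatGPT training code.

```python
# Sketch of the pairwise preference objective behind RLHF reward models:
# push the reward of the human-preferred completion above the rejected one.
# Generic Bradley-Terry-style loss; illustrative, not production code.
import numpy as np

def preference_loss(reward_preferred: np.ndarray, reward_rejected: np.ndarray) -> float:
    """Mean of -log sigmoid(r_preferred - r_rejected) over a batch of comparisons."""
    diff = reward_preferred - reward_rejected
    return float(np.mean(np.logaddexp(0.0, -diff)))  # numerically stable -log(sigmoid(diff))

# Example: scores a reward model assigned to preferred vs. rejected completions.
r_preferred = np.array([1.3, 0.2, 2.1])
r_rejected = np.array([0.4, 0.5, 1.0])
print(preference_loss(r_preferred, r_rejected))  # lower means the pairs are ranked correctly
```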
But then there's a huge gap between the original model and what we have today. And I think part of that is reasoning, but also part of that is just much better, more precise reward functions. So the reward functions that we were using originally couldn't determine whether you were mathematically correct or not. So early versions of ChatGPT were mathematically not particularly strong. And it sort of results from the reward function: what did you optimize against? The reward function basically encoded: be a friendly assistant, try to help people get to their thing. But it had no sense of, is this mathematically correct or not? Is this code valid or not?
And we made huge advances on the correctness of the reward function. But this is all digital. We're creating tasks based on the internet, textbooks, papers. And this is great, this lays a foundation. But ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies and we're saying, okay, if you care about advancing science, we need to have experiment in the loop, and that becomes our reward function for our agents. So as Dogus is saying, our agents are doing the same type of things you would use for coding or to help answer a query. But now, instead of just giving tools like here's Python, here's a browser, now we have tools like quantum mechanics. So, simulate different systems,
but ultimately we're going to a lab, and then that becomes the basis of what the system is optimizing against. So that's sort of just like the natural end state of these systems. People in AI often say lab, and often what they're referring to is quite different from what you guys mean by a lab.
What's the difference? That's right. So as you mentioned, so far the LLMs have gotten really good at logic and math. There's like verifiable rewards. What is the next frontier in terms of, you know, inquiry after logic and math? I'd say it's physics. And then when you say physics, there are different energy scales. So there's astrophysics, studying galaxies; there's fusion, nuclear physics;
but then there's the energy scale of physics that's more relevant to our life. And that's quantum mechanics, like Schrodinger's equation. This is where, you know, biology happens, chemistry around us happens, materials happen. So we felt like our first lab should be basically probing that quantum mechanical energy scale. And for us, that would be physics at the level of solid-state physics, material science, and chemistry. One of the more fundamental ways of making things around us is powder synthesis. So you take powders of existing materials, you mix them, and you heat them up to a certain temperature, and it becomes a new material.
So this is one of our labs. We're going to have a powder synthesis lab, and it turns out this is one of those methods where robots can do it, like very cheap, simple methods. I don't know if you saw this coffee-making robot in the SF airport. You know, a robot that's basically at that level can mix powders and put them in a furnace. And it's a very rich field. So you can actually, using this method, discover new superconductors, magnets, you know, all kinds of materials that are very important for technologies around us. But at the core of it, it's just quantum mechanics.
And we feel like teaching these LLMs to be foundation models, but for quantum mechanics, will be the next frontier for LLMs. Why haven't the models that are currently out in the world and deployed been able to do this? Great question. I think, as you mentioned earlier, science is by its nature iterative, right? Like, even the smartest humans tried many times before they discovered the things they discovered. And I think maybe this is one of the confusing points about LLMs. LLMs can be very smart, but if they're not iterating on science, they won't discover science. You know, to be honest, humans won't either.
Starting point is 00:11:27 Like you put a human in a room without any chance to iterate on something. They won't discover anything important. So we feel like the important thing to teach these LLMs is the method of scientific inquiry. So you do simulations, you do theoretical calculations, you do experiments, you get results. And the results are probably incorrect or not what you want at first, but you iterate on it. And we feel like that hasn't been done yet. So this is what we want to do. But we feel like you have to do it with the real physics, not just the simulation.
So this is why we have our own lab, where the LLM will have the opportunity to iterate on its understanding of quantum mechanics. Fundamentally, machine learning models are good at what you train them to do. And that's sort of like the nature of it. And so if a model is acting badly, you're like, well, did you train it to do that task? Kind of building on Dogus's point, there's sort of like an epistemic uncertainty, this reducible uncertainty, that you aren't really reducing or collapsing unless you're actually running an experiment. So, for instance, one of the engineers on our team was looking at a reported value
Starting point is 00:12:32 of some physical property in the literature, and it spanned many orders of magnitude. So if I train a system on that, these systems aren't magic, the best they can do is replicate that distribution, but it's really no closer to a deeper understanding of the universe, physics, chemistry. Then another point is it's very uncommon
to publish negative results. All of the results are basically positive, and a valid negative result is very valuable. A negative result could be discarded because it's sloppy science, but there are valid negative results, and that's a learning signal. And this is something that our lab will produce as well. So I think these three things: noisy data, no negative results, and you need the ability to act in order to actually do science, which is an iterative endeavor. Those are like the core thesis of why we need a lab. And what might be the core way to measure Periodic's progress against that goal, in your guys' minds?
One simple one is, let's say, high-temperature superconductivity. What is the highest-temperature superconductor we've synthesized? Today, the best number at ambient pressure is 135 Kelvin or so. So we'll know very easily if we're doing well if we can go beyond that number. So that's pretty fundamental. On the more applied side, you know, there's processing of materials and its effect on the material's properties. So we can just measure these properties directly.
Let's say it's the ductility, the toughness, the strength of the material. And as we measure it, the LLM will get a very clear signal. It's hard to hack, you know, unlike these other LLM training techniques. It's really what you see in real life that is the signal going to the LLM. Yeah, effectively. So can you design the world around you? So you're like, I need something with this property. Can this system discover and produce that, both from, like,
a fundamental scientific discovery perspective, but also in industry? So, like, someone's working in space or defense or semiconductors, and, like, yeah, we're having these issues, we're trying to achieve this property of this material or this layer. Can the system accelerate the development of those technologies? So it's very grounded. That's how we'll know it's working. It feels like the applications of building an AI
physicist, for lack of a better word, that can design the real world are so broad. You can apply them to advanced manufacturing. You can apply them to material science, to chemistry, to anything; any process where there's R&D with the physical world required, it seems like, will benefit from breakthroughs that Periodic is working on. Why hasn't it been done before? And what is it about this moment in history that makes it the right time to attack this problem? Maybe one comment on what makes it so difficult. I mean, I think part of it is the team. So in our view, this has been enabled by frontier technology in the last couple of years.
And so Dogus and I have been so focused on basically putting together this N-of-one team. This group of physicists, chemists, simulation experts, and some of the best machine learning researchers in the world have never been part of one concerted effort. And we feel that in order to actually achieve this, you need all of this expertise, you need these pillars to do this. So when you guys went about designing the team, you know, after you left OpenAI and DeepMind, what was the primary heuristic that you used to guide yourselves in figuring out who you wanted on the team? So in terms of expertise, we wanted to have the LLM expertise covered, the experimental expertise, and simulation. And for each of these, we wanted to have basically world-class talent. And of course, for each team, there's actually a lot of sub-teams; it's like a fractal, right? That expertise is very fractal-like. So for the experimental side, we want to cover solid-state chemistry, solid-state physics, automation, and kind of the more facilities side, the more operational aspects of experiments. On the simulation side, there's the more theoretical physics part, there's the more coding aspect of simulations. And on the LLM side, of course, there's mid-training, RL, infra. And yeah, for each of these, we try to get basically the best people who have innovated in these sub-pillars. Yeah. So I think it's like there was not a team to do it. The technology that we think is necessary to do it has really just emerged in the last couple of years. And this data isn't, like, on a Reddit forum or something. You need to actually go produce experimental data, simulation data. It's siloed across all of these advanced industries, and many of them, while there's a desire, they may not have knowledge of some of the most recent techniques that have been driving this recent wave in AI.
There was a moment in time when models, or papers, like the GPT-3 paper, for example, said language models are few-shot learners and proposed the idea of scaling laws. And then there was a follow-up paper, if you guys remember, from OpenAI, that was called, I think, Scaling Laws for Generative Modeling, that just showed that as long as you scaled up the amount of compute and data in the right combination, you could very predictably improve the performance of these models. And the theory was that if you just kept doing that, you know, ad infinitum, there would be a bunch of emergent capabilities. These models would be able to reason about all kinds of problems out of domain, out of distribution. How would you square the circle with that school of thought, that, you know, naively, the current pre-training and post-training pipelines at most of the frontier labs will just eventually crack physics as well? Why is this idea of physical verification so necessary, and is that school of reasoning wrong? Yeah, excellent question.
Scaling laws empirically seem to continue to hold, so that's not in question. But I think there's a question of what is the y-axis. And that test distribution is very different from, like, what we're talking about. That test distribution, let's say you're pre-training on the internet, might be, you know, a representative set from the internet. And you'll have these sort of predictable scaling properties. But that's not going to capture that you have a very different set of scaling properties with respect to different distributions.
So to try to make this a little bit more concrete: let's say hypothetically we're training a coding model, and we have unit tests to provide some reward signal. So the model writes some PR. We check that the unit tests go from failing to passing, and we say this was successful, we're going to reinforce these things. You might say you start optimizing this,
and now the system is becoming ever more capable of writing code for its own development. And you have this acceleration, you have this kind of takeoff scenario. Code is one of the most promising areas for this because there's abundant data online. You have this feedback loop where the system itself can begin to improve itself.
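As an aside, the hypothetical unit-test reward described above can be sketched in a few lines: run the test suite before and after the model's patch and reward only the failing-to-passing transition. The paths and the use of pytest here are assumptions for illustration, not a real pipeline.

```python
# Hedged sketch of a unit-test-based reward for a coding agent: reward 1.0 only
# if the suite was failing before the patch and passes after it.
# Paths and the pytest invocation are illustrative assumptions.
import subprocess

def tests_pass(repo_dir: str) -> bool:
    """Run the repo's test suite; a zero exit code means all tests passed."""
    result = subprocess.run(["python", "-m", "pytest", "-q"], cwd=repo_dir)
    return result.returncode == 0

def unit_test_reward(repo_before_patch: str, repo_after_patch: str) -> float:
    """1.0 if the agent's patch turned a failing suite into a passing one, else 0.0."""
    failing_before = not tests_pass(repo_before_patch)
    passing_after = tests_pass(repo_after_patch)
    return 1.0 if (failing_before and passing_after) else 0.0
```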
And it's a very promising technique. And we're all seeing the benefits of advanced coding models, and it's accelerating quickly. However, that model is not going to then cure cancer. The knowledge simply doesn't exist. You need to optimize against the distribution you care about. So that model, while it's going to be a very valuable tool as a software engineer, and it may help a cancer researcher do their analysis, it simply doesn't have the data, the knowledge, or the expertise of iterating against that environment. And I think that's just sort of like the fundamental belief we have. Yeah, I mean, so actually Liam and I worked on this a bit when we were looking at the scaling laws for vision models. And, you know, this also came up a lot in the CLIP paper from OpenAI. The in-domain generalization and the out-of-domain generalization are monotonically correlated, but it's not necessarily linear. And so what that means is you can keep improving your model, and it will improve as a power law in-domain. And for out-of-domain tasks, by which I mean, as Liam said, the things that you're trying to do are a bit different than what's in your training set, it will also improve as a power law, but the slope of that power law may not be good enough. So you might need to, you know, spend centuries before you get to the result you want. We saw this in one of our papers, for example: we published a paper where we saw that as you increased the size of your training set, the IID performance, the in-domain performance, improves as a power law. Out-of-domain performance also improves as a power law, but depending on what the out-of-domain is, like how far you are from the training distribution, the power law might have such a small slope that it's basically useless. So this is one of the reasons we feel like the best way to make progress is to make your target as close to your in-domain training set as possible. And the best way of doing this is to basically iterate on changing your training set to be more like what you want to do. So this is one answer. The other one is actually maybe even simpler: the experimental data we want actually doesn't exist. So, for example, if you want to, say, learn on the experimental data in the literature for synthesis, it turns out the formation enthalpy labels, which is like the energy it takes to basically assemble the atoms in the shape you want, are so noisy that if you train a machine learning model on them, it's not predictive enough to predict the next one. And one of the reasons for this is, as Liam mentioned, people don't usually publish negative results. And negative results are usually very context-dependent. So what's a negative result for someone might be positive if they do things differently. So, yeah, not only is there this domain-shift problem, where what you're trying to do might be different than your training set, so the power law won't have the large slope you want, but the other problem is that for some of these things we want to do, there's no data for it. For example, for superconductivity, there are a lot of data sets you can look at, but the noise floor on them is so high that training on them usually doesn't help. Dogus, me, the entire team are deep believers in scaling up and in scaling laws. But it's just: do a beeline for the thing you care about. And in our case, we care about advancing science, advancing physical R&D. That's sort of like the thesis.
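A small numeric sketch of the point Dogus makes about in-domain versus out-of-domain power laws: if error scales as a power law in dataset size but with a much shallower exponent off-distribution, the data needed to reach a target error can differ by many orders of magnitude. The exponents below are made up purely for illustration.

```python
# Illustrative power-law scaling: error ~ a * N^(-alpha).
# A shallow out-of-domain exponent makes the required dataset size explode.
# All numbers are made up for illustration.

def samples_needed(target_error: float, a: float, alpha: float) -> float:
    """Invert error = a * N^(-alpha) to get the N needed for a desired error."""
    return (a / target_error) ** (1.0 / alpha)

a, target = 1.0, 0.05
in_domain_alpha = 0.30      # healthy slope on the training distribution
out_of_domain_alpha = 0.05  # shallow slope far from the training distribution

print(f"in-domain:     {samples_needed(target, a, in_domain_alpha):.2e} samples")
print(f"out-of-domain: {samples_needed(target, a, out_of_domain_alpha):.2e} samples")
# ~2e4 samples versus ~1e26 samples: both improve monotonically,
# but one slope is practically useless, which is the "spend centuries" point.
```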
Is there a tension between being super bitter-lesson-pilled and just throwing more compute at the problem, and the, I guess, domain-specific pipelines that the lab you guys just described will have to focus on? In the case of Periodic, I think you mentioned the first beelines you guys are making are towards superconductivity and magnetism, right? What is it about those domains that makes them good candidates for the first few pipelines that Periodic is working on? And are they just stops along the way to an AI physicist that generalizes across all kinds of domains, or is there a danger of them being essentially off-ramps that don't result in sort of the AI scientific superintelligence that is the North Star for what you guys are doing? Yeah, I feel like, for example, the high-temperature superconductivity goal is actually a goal that has so many sub-goals in it. It's a bit like when DeepMind and OpenAI started and said, we're going to do AGI. But what they meant was they had to do so many things before they got to these cool results. Like, for us, if we want to get a high-temperature superconductor, we probably
Starting point is 00:24:22 need to get good at autonomous synthesis, autonomous characterization. We need to get good at characterizing different aspects of the material, using the LLM to run the simulations correctly. So it's a North Star and there's so many goals on the way that
would be very, I think, impactful for the community. That's one reason. Another reason is I feel like high-temperature superconductivity is such a fundamentally interesting question. For example, if you could find a 200 Kelvin superconductor, even before we make any product with it, that in itself says so much about the universe that we didn't know yet. You know, to be able to see such quantum effects at such high temperatures, I think it would be such an update to people's view of how they see the universe. So we feel like it'll be really impactful for humanity even before we make a product
out of it. I think that's one of the reasons. A technical reason also is that superconductivity is a phase transition. So it's pretty robust to some of these details that we cannot simulate yet. So for example, when you make the material, the superconducting temperature usually is more dominated by its fundamental crystal properties than by the defects or microstructure, whereas there are certain other materials properties where, even if the crystal has the property you want, there are so many other factors that you cannot simulate that would prevent you from seeing that property. So superconductivity has this nice philosophical upside to it,
has this technical upside to it. And it really rallies both the physicists and the non-physicists: there are people who have studied physics for 40 years and are really excited about superconductivity, and there are people who never studied physics but are very excited about superconductivity. It's quite rare to find a topic that unites the whole team. Yeah. I mean, it's like Dogus said: in order to do this, there are so many foundational pieces to solve. And
our tactic is, in order to actually get to this goal of an AI scientist, you need to make contact, do the full loop somewhere. If you say you're doing this in just very vague terms, you sort of just end up back on arXiv papers and textbooks. And so it's really important for us to do the loop, but then create this repeatable process. Like, how do you go from subdomain to subdomain? And there are really interesting questions about how well the ML systems generalize between these things. What is the generalization of a system from, like, superconductivity data to magnetism data, for instance? And maybe that looks very different than its ability to generalize to fluid mechanics. And I think there are fundamental arguments to make there. But the goal is to create this repeatable system, prove it, and then just go through the different domains that way. So I can see the argument for why cracking room-temperature superconductivity from an experimental basis is extraordinarily valuable for humanity.
But you guys are building a startup. And to use an analogy for why you need a clear short- or medium-term path along the way to a North Star that is both commercially viable and net positive to society: what we've seen, for example, with other frontier labs that are working on automating white-collar work or software knowledge work is that, you know, there's this North Star of an AI researcher, but along the way there were a bunch of sub-goals, and a concrete application that opened up a ton of commercial value
and benefits for users on the way to that AI researcher was the idea of AI programming. Software engineering has become probably the first major domain that's caused people to really update their priors about how useful AI models are beyond kind of consumer applications. And in terms of productivity, the impact has been extraordinary just in a few short months. So if the traditional frontier labs' North Star was an AI researcher, and the path along the way to get there was programming, AI programming, what is that for Periodic?
Starting point is 00:28:24 Basically, co-pilots for engineers, researchers in advanced industries. So maybe perhaps just being in Silicon Valley, we really think about like computer-oriented work. Everything is digital. Everything is bits. But there's so many industries. Like we were kind of talking about a few like space, defense, semiconductors, where they're dealing with iteration of materials of physics. And that's part of their workflow. Like how are they designing these new technologies, these new devices?
Starting point is 00:28:55 And in the absence of data, in the absence of good systems, they don't really have particularly good tools. That is our opportunity. And these are massive R&D budgets. So, yeah, while high-temp superconductivity is a great North Star, we very much understand that technology and capital are intertwined. We're going to be able to maximally accelerate science if this is a wildly successful commercial entity.
And to do so, we want to accelerate advanced manufacturing in all these different industries, become like an intelligence layer for all these teams to accelerate their workflow and start reducing their iteration time, get them to better solutions more quickly, accelerate their researchers and their engineers. Let's click a little bit deeper on that in practice, sort of a day in the life of a Periodic team member. Let's say half the team, is this roughly right?
About half the team are ML scientists with machine learning backgrounds, and the remaining half are physical scientists with physics or chemistry backgrounds. How do you start by uniting the cultures, right? How do you take somebody whose primary career so far has been experiments in wet labs doing physics and chemistry and give them an intuition for ML, and vice versa? Because, you know, you guys are both physicists who then had the career trajectory where you also had the chance to be at frontier AI labs and were part of training systems that are now considered
hallmark machine learning systems, like ChatGPT. But for others who might be coming from one domain, how do you get the team to build an intuition for the other? Yeah, so this is a great question. I mean, it's actually crucial for us to make sure these teams work very closely with each other. So one of the things we're seeing is the physicists and the chemists need to figure out how to teach the LLM how to reason about these things. Because I think the frontier AI labs have figured out how to train them on math and logic, but not yet on physics and chemistry.
So one thing we're seeing that's been really, I think, productive is the physicists and chemists are thinking about what are the steps we should include in the mid-training, in the RL training, that will teach the LLM how to reason correctly about quantum mechanics, how to reason correctly about these physical systems. Another one, of course, is the LLM researchers are learning quite a bit about the physics, the simulation tools, the goals. So they've been working together really well. We have weekly teaching sessions where the LLM researchers teach, you know, how the RL loops work, how the data cleaning works,
and then the physicists and chemists are teaching about different aspects of the science, the history of science. That's also very important. So we feel like that's been going really well. And, you know, one way we are looking at this is: the things we have to teach the LLM to be able to discover, say, a superconductor include being able to read the literature really well, like read all the papers, the textbooks, find the relevant parts, and then being able to run simulations, theoretical calculations, and then take action, run experiments. You know, we feel like this is quite similar to the physical R&D researchers in these companies.
They have to read the literature, read maybe internal documents or external documents, and then run simulations, run theoretical calculations, and then actually attempt the thing experimentally and learn from that. So we feel like all the progress we're making towards our internal superconductivity or physics goals actually is making our LLMs much better at serving our customers, who are doing very similar workflows. Yeah, I think just culture: no stupid questions. You can ask just like the dumbest, like, physics question,
Starting point is 00:32:40 the dumbest ML question. And, I mean, there's a few faculty as part of our company and they're actually excellent teachers. So, I mean, these like learning sessions have been really fantastic. And another thing I noticed is computer scientists often think in terms of like APIs. So scientists will say something, and they're always trying to map it. You're like, okay, well, what's the input, what's the output, what's the target? How do I map that back?
And it's always just like this translation. And I think we also have built up, as part of the team, people on these different edges. So if you have a simplex of, you know, pure ML/LLM, pure experimentalist, pure simulation, there are people who kind of live on the inside of it as well. And so they've been excellent bridges for translating between these different groups of people. So it's like active learning to learn the other spaces, creating APIs, and then these kind of bridge-connector people, I think Dogus being an excellent example of that.
Is it a requirement for somebody who wants to join Periodic to have an advanced degree in physics or chemistry? Absolutely not. You know, one of the jokes we're making is, who was the NBA player who was saying, I'm much closer to LeBron James than you are to me? We were saying the opposite of that to candidates, because the amount that even our best physicist doesn't know about physics is much bigger than the amount that they know about physics. So for this new candidate, even if they have no background in physics, how much they have to learn about what we're trying to do is actually not that different from how much the best physicist has to learn, because there's so much chemistry to learn, so much material science to learn. And I think this is one of the interesting aspects of science today. In the past, in the 1800s, there were these physicists that could do so many different things at the frontier. Today, we've reached a point where our intellectual knowledge is so large that a leading thinker can usually only advance in one very specific field. And maybe this is actually holding us back, because, say, to discover an amazing superconductor, as we keep going back to this example, you have to know so much about chemistry, physics, synthesis, characterization. And unfortunately, I don't think any human knows enough about all of these. So we have to collaborate. So I think our team is kind of like a small example of this, where we have, as Liam said, a lot of different points in that simplex. And for any person, they have so much to learn, but that's true for basically every other scientist. So, for example, I supposedly come from the physics side of it, but I've been learning so much more physics, because we now have people from different areas of chemistry in the team, different areas of physics. And I think it's true for the LLM researchers as well. I mean, there are aspects of LLMs that they probably didn't know about until they started working with other researchers on our team. So I think it's great, and it's like a small example of what we're trying to do with the LLM, because we're trying to teach this LLM all these different things that we're learning as researchers. It's like a really fun experience, I think, yeah. And what are you finding
makes a great researcher at Periodic that's different from what might make a great researcher at OpenAI or Anthropic or DeepMind? I would say there's very, very high overlap, but probably one of the biggest determinants is: do you care about this mission? Is accelerating science, to you, the big goal? And I think looking at the team right now, it's just an incredibly mission-driven set of folks who are like, yeah, this is the North Star, let's do that. If someone really wants to improve some megacorp's products, yeah, you'd probably
be better off at that megacorp iterating and improving their products. But if you care about scientific discovery, I think Periodic Labs is the best place to do that. How big is the team today? We're roughly 30, I believe. And as you think about taking a lot of the research that's going on at the company and deploying that out in the real world, the kinds of customers that we've talked about, space, defense, advanced manufacturing, these are mission-critical industries that are known for being essential to whatever part of the economy they're part of, but often they're not
the fastest to adopt new technology. How do you think about deploying the kinds of frontier agents that we've talked about, that are great at science, great at physics, in companies or organizations that might not be anywhere close to as sophisticated as you are in AI or ML? Do you have a working thesis for how to make sure that the arc of progress is not bottlenecked on deployment? It sounds like you have a fairly good thesis on how to unblock the arc of scientific progress
on the research side. But when it comes to deployment, what might be a working theory that you guys are optimistic about that would help get the systems that Periodic is building out into the real world? Well, maybe one thing that we've noticed in our conversations
Starting point is 00:37:52 They understand that the technology is shifting really quickly, and they're looking at how they're doing their work, and it's not changing as quickly as they think it should be. Some industries also are losing, like, kind of key expertise in different fields, and they're losing these, like, senior engineers, senior researchers, and they're like, okay, how do we, like, preserve that? But one thesis is, you know, kind of thinking about these, like, APIs and thinking about what are the evaluations,
Starting point is 00:38:20 what are the biggest bottlenecks for these companies, looking at some of the problems they face, and we can map that to our systems, and we say, well, we think we can dramatically accelerate this. And so it's not coming in and saying, hey, we're going to transform your fabline on day one. We're going to transform how you're doing everything. Forget everything.
Starting point is 00:38:39 It's like, no, we're going to solve a really critical problem, well-scoped, very clear evaluations. We kind of co-draft that with them. And just show them, like, how powerful this technology can be when you optimize against the thing you care about. So, you know, nothing particularly, like, surprising here, but, you know, sort of like a land and expand type method, as you might expect. But really looking for who are the biggest promoters within that company? What are the biggest problems? Make sure you're solving a very real thing for them and intersect that with where is our technical capability the highest?
You know, you were on a call this morning with one of the customers in your pipeline. We don't need to name who, but what were some of the things you heard as their most urgent problems that they'd like for Periodic to solve? So one of them was simulations. You know, they spend a lot of time training people on some of these simulations. They're just critical for their development. And being able to automate those simulations, I think, would be quite enabling. The design process,
and then kind of some of the small things, like matching the formats, being able to feed the simulation results into the design pipeline. All of these seem quite important, and then being able to bring the data together in the same place. What else? Well, I think there's a really fundamental question. So a lot of these companies will rely on retrieval. So that's sort of like a super lightweight thing. Someone shows up with a neural net and they're like, great, we'll just retrieve over all of your data, and then that's your solution.
However, as we've seen with things like ChatGPT and other things, when you pre-train on the data, when you actually encode the knowledge into the weights, it's not just a retrieval system; you have a richer, deeper understanding of the material. And I think this is a big fundamental challenge.
So, for instance, for this customer, they can give privileges to their employees and have retrieval acting on their behalf. The system acts as the user, and so you can match those same kind of privileges for access. But if you start doing pre-training or mid-training on different parts, it's like, well, if you pre-trained on every piece of data, that might only be accessible to, say, the CEO of that company. So then you have to figure out how do you bucket that knowledge and create different types of systems. But I think right now, after talking with the user, they don't seem to have a great
solution for distilling all of the knowledge into a single model or into a set of models. So, like, going beyond retrieval to, you know, proper training. And then I think also the supervised training they're doing is really akin to the early days of ChatGPT, where it's input-output, you have a few examples. And kind of transforming to this new way of thinking: it's like,
no, high-compute reinforcement learning is really effective, this is how you should think about the strategies it's using, this is how you create effective tool use towards those problems, and this is how you optimize it effectively. Could you describe for folks who may not be familiar with it, what do you mean by mid-training? Because people are familiar with pre-training. They're familiar with post-training. But in the Periodic context, what does mid-training mean? Yeah, sorry for the lingo. So I think this term came up years ago where it's like, well, we had pre-training, we had post-training, but sometimes you need to put in a little bit more knowledge. So before search worked really well, there was an issue of freshness. So we had pre-trained models, and they have a knowledge cutoff.
So there's like a scrape of the internet at that point, but users want more real-time knowledge. So it's like, how do you get that in there? Enter mid-training. Mid-training is basically: you're taking new data, new knowledge that's not in the model, and you continue pre-training. And this differs from standard post-training, where post-training typically is more reinforcement learning, supervised learning. And the mechanism, or the goal of it, is just to put a lot of knowledge into the model that didn't exist there before. So that's mid-training in a nutshell.
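For the ML-curious, here is a minimal sketch of what "continue pre-training" on new knowledge can look like in practice, assuming a Hugging Face-style causal language model. The checkpoint name and the two-line corpus are placeholders, not Periodic's data or pipeline.

```python
# Minimal sketch of mid-training as continued pre-training on new domain text.
# The base checkpoint and corpus below are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_checkpoint = "gpt2"  # stand-in for whatever pre-trained model you start from
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForCausalLM.from_pretrained(base_checkpoint)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

domain_corpus = [
    "Hypothetical synthesis note: mix precursor powders, anneal at 900 C, quench ...",
    "Hypothetical simulation log: relaxed structure, formation energy -1.2 eV/atom ...",
]

model.train()
for text in domain_corpus:  # in reality, many passes over a large curated corpus
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # For causal-LM continued pre-training, the labels are the input ids themselves.
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```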
And in the Periodic context, does that mean essentially going and injecting a ton of custom data from an experimental implementation at a particular customer, in a particular industry? What is the atomic unit of mid-training that you guys think will improve the capabilities of the models on problems that they're just terrible at today? I mean, it's all the knowledge. So you can have very low-level descriptions of physical objects, like crystal structures, for instance. You can also have higher-level semantic descriptions, like,
well, this is how I made material X, Y, Z. And trying to get all this data into the model is really valuable. So it's like simulation data, experimental data; none of this exists. And basically putting that knowledge into the model and making sure that these distributions are connected in some way. And what I mean by that is, if you just sort of mix together distribution A, B, and C, there's no guarantee of generalization. What you want to hope to see from these systems is that the inclusion of this other data set is improving performance on the other data sets. And so these are sort of just machine learning techniques, or machine learning problems to solve, but basically: make it an expert in physics and chemistry
and where it was deficient before. You guys both know that I spent some time running evals on a bunch of these models at the Stanford physics lab earlier this year, and the results were that the models are terrible at scientific analysis. Because they weren't trained to do so. Because they weren't trained to do so.
But on the other hand, many of the existing research teams working on the general models are investing in trying to make these better. Is there something about the way you're building Periodic that gets to draft off of all of that progress in the base models, or do you have to start everything from scratch and therefore not be able to be composable with advancements happening in the mainline models today? Yeah, I mean, we benefit from all the different advances. So one of them is the LLMs are getting better, and we definitely benefit from that, because we take a pre-trained model and then mid-train it, you know, with high compute. Another one is the physical simulation tools are getting better. So DeepMind, Meta, Microsoft, academic groups, they're open-sourcing new ways of simulating, new ways of using machine learning to predict properties. So we get to basically utilize all of those. And it seems like machine learning has made such an impact in the physics and chemistry fields that we expect these improvements to continue. I think another thing is, when we think about tools for agents, we think of, like, here's a browser, here's Python, but increasingly people think about tools as other neural nets, as other agents. And so if you look at a lot of physics code, it's not particularly deep. This isn't competition programming. This is kind of like hacky scripts. But you can rely on some of the best systems for, you know, whatever they spike on.
So neural nets as tools for these agents is something that immediately accelerates our work. So you don't have to replicate everything. There's a historical pattern that a lot of the fundamental research in the physical sciences that we're talking about here, physics, chemistry, biology, has historically been done at university labs. Is there a role at all that you think the university ecosystem will play in Periodic's future, or do you think these are just completely divergent paths? Absolutely. I mean, so much of the simulation tooling we use has been developed in academia.
A lot of it is in Europe, for example, a lot of the novel synthesis methods. So we definitely benefit from a lot of this very deep technical progress. Like, for example, all the physical simulation tools are these, you know, complicated Fortran codes that our team, for example, doesn't really know how to develop very efficiently. But we feel like there's definitely a very deep connection between academia and industry labs. So, for example, recently a lot of the large-scale simulations have been done in industry labs like Microsoft, DeepMind, and Meta, but a lot of those tools have actually been developed in academia and then passed on. So there's actually really nice synergy there. I'd add a few other things too. So, like you found when you were evaluating models on their ability to do scientific analysis, they were deficient. This was probably, I mean, not a direct goal for those teams training those models. So I think academia and these collaborations will help us inform: what are the important tasks? How do you do this analysis? What skills do we want to put in the model? A skill could be a full analysis, or a skill could be a smaller primitive as part of a larger analysis. But also, secondarily, it's how do you think. So one of the physicists was looking at the reasoning strategies of one of our models, and he's like, it's all wrong, it's all wrong. And we're like, what do you mean? He's like, no, this should be thinking higher level. It should be thinking in terms of symmetries. That's the kind of thing that encodes the thinking strategies that will be more effective. And of course, your reinforcement learning environment needs to reward those types of strategies, but given that some of the most premier scientists are using these strategies, they're likely effective. And these are the types of things where an industry-academic partnership can just be so powerful, because industry is simply blind to these types of analyses, these tools, as well as just this way of thinking. Yeah. And there's a way of connecting that to the tooling question as well, because, you know, language is very important, but then in the human brain, we also see all the visual processing, the geometric processing. So it's plausible that while these LLMs will keep getting better and better, they'll actually benefit from having geometric reasoning that's separate. So today we can do that with equivariant graph neural networks. We can do it with diffusion models, which are kind of geometric tools by construction, and the LLM can call them, so then it can have both the language aspect, which is very good for, say, a synthesis recipe, but also the geometric aspects, which are very good for representing atoms, for design geometries in general.
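A hedged sketch of the "neural nets as tools" idea Dogus describes: the agent's harness exposes a geometric model or a lab submission endpoint behind ordinary function calls, alongside tools like Python and a browser. Every name and return value below is a hypothetical placeholder.

```python
# Sketch of exposing a geometric model or an automated lab to an LLM agent as
# just another tool. The tool names and return values are hypothetical.
from typing import Callable, Dict

def predict_formation_energy(structure: str) -> float:
    """Placeholder for a geometric model (e.g., an equivariant GNN) scoring a structure."""
    return -1.0  # dummy value; a real tool would call the trained model here

def run_powder_synthesis(recipe: str) -> str:
    """Placeholder for submitting a recipe to an automated lab and returning a status."""
    return "synthesis queued (hypothetical)"

TOOLS: Dict[str, Callable[[str], object]] = {
    "predict_formation_energy": predict_formation_energy,
    "run_powder_synthesis": run_powder_synthesis,
}

def dispatch(tool_name: str, argument: str):
    """The agent emits (tool_name, argument); the harness routes it like any other tool call."""
    return TOOLS[tool_name](argument)

print(dispatch("predict_formation_energy", "hypothetical perovskite structure"))
```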
So how are you thinking about deepening Periodic's ties with academic labs? Yeah, this is very important for us. So we have two major initiatives in this direction. One of them is we're starting an advisory board. This will be expertise spanning from superconductivity to solid-state chemistry to physics. And we want to make sure, you know, we're in touch with these kind of long-term research directions. A lot of important government funding goes to these groups, and we want to have a tight coupling between what's important for them and us. So this, you know, includes superconductivity expertise, such as Z.-X. Shen from Stanford on the experimental side
and Steve Kivelson on the theory side. We also have synthesis expertise on the advisory board, from Mercouri Kanatzidis at Northwestern University, and Chris Wolverton on the high-throughput DFT side. And then we have Kostya Novoselov from Manchester University, who is really well known for discovering graphene. So he'll be able to advise us on these novel exotic electronic states and materials. And our second initiative is going to be through a grant program. We really want to enable some of this amazing work going on in academia, and some of that work isn't a good fit for industry.
You know, it's best done in academia. So we want to accept grant proposals, and we want to enable and support the kind of work that's going to help the community, especially in relation to LLM agents in synthesis, materials discovery, physics modeling. So maybe after this show you can include the link. Yeah, we'll include them in the show notes. Grants are open starting today? Absolutely. Great.
So for people who might be interested in joining Periodic, what are you guys looking for? First off, someone deeply curious, someone who really wants to understand the machine learning, the science, at a deeper level, who wants to make contact with reality, who wants to advance science. This has to be a driving thing, but also pragmatic. What we're trying to do is incredibly challenging, and someone who has a very careful process and is solution-oriented, they get to goals quickly. And really someone world-class along some dimension; we're looking across all these different pillars, so machine learning, experimentalists, simulation,
and people who can bring some sort of innovation on how do you create a creative ML system, how do you bring new types of tools or new types of thinking to some of these state-of-the-art models, someone who can advance simulations and make them more robust and more reliable with experiment. Yeah, and maybe one more thing I'd add is Liam and I have been really looking for a sense of urgency in candidates, because we want these technologies not in 10 years. You know, we don't want these LLMs to start improving science in 10 years; we want them ASAP. So if a candidate feels a sense of urgency for improving these physical systems, discovering these amazing materials, innovating on superconductivity, they would be a good fit. Yeah. If you match all these, please reach out.
All right. It sounds like we've got to amp up the speed and the scale of stuff happening at Periodic. And we'll put the career links in the show notes. Thanks for coming, guys. Thanks for listening to the a16z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com/a16z. We've got more great conversations coming your way. See you next time. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments
in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
