The a16z Show - Building an AI Physicist: ChatGPT Co-Creator’s Next Venture

Starting point is 00:00:00 Ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies and we're saying, okay, if you care about advancing science, we need to have experiment in the loop. The applications of building an AI physicist, for lack of a better word, that can design the real world are so broad. You can apply them to advanced manufacturing. You can apply them to materials science, to chemistry.

Starting point is 00:00:25 Any process where there's R&D with the physical world required, it seems like, will benefit from breakthroughs that Periodic is working on. For example, if you could find a 200 Kelvin superconductor, even before we make any product with it, to be able to see such quantum effects at such high temperatures, I think it would be such an update to people's view of how they see the universe. What if AI could move from talking about science to doing science?

Starting point is 00:00:51 Today's conversation features Anjney Midha, general partner at a16z, with Liam Fedus and Dogus Cubuk, co-founders of Periodic Labs, a frontier research lab building experiment-in-the-loop AI for physics and chemistry. They unpack why real-world reward functions matter, how mid-training and high-compute RL fit together, and why superconductivity and magnetism are the first beelines towards an AI physicist. They also get into noisy data sets and negative results, what happens when ML researchers sit shoulder-to-shoulder with bench scientists, and the near-term payoff, co-pilot tools for advanced industries from semiconductors

Starting point is 00:01:30 to space and manufacturing. Let's get into it. So Liam, you were the co-creator of ChatGPT. Dogus, you were running some of the physics teams at DeepMind. Let's talk about how you guys met. And what was the moment where you realized that you guys had to leave both of those labs to start Periodic?

Starting point is 00:01:53 I believe we met eight years ago at Google Brain flipping over a large tire. Yep. At the Google. You got to give us more on that. So Google Rails was one of the gems at the Google facilities. And I think that's where Dogus and I met, and there's just this massive tire that a single person basically can't flip by themselves.

Starting point is 00:02:19 And so Dogus was trying to flip it. And he's like, I think the two of us could do it. And why were you trying to flip this tire? You know, why not? But I tried doing it. I couldn't do it. And then I was like, who's the strongest person I can find? And it was either Barrett or Liam.

Starting point is 00:02:36 And it worked. We did flip it. Right. And was that the moment where you guys both realized you had physics backgrounds? How did that happen? How did you go from flipping tires to flipping experiments? Yeah. I mean, so I don't know if Liam remembers this, but we would catch up, you know, over the years.

Starting point is 00:02:56 And we would often end up talking about either quantum mechanics or like superconductivity. This was like very common. But I never thought we would end up working on physics together. So Liam was working on LLMs and they were going really well. And I was not using LLMs, but I was noticing that LLMs were becoming more and more impactful in my work. So one way it was becoming impactful is when I was trying to remember some things about chemistry, physics, I could just talk to the chatbot and actually learned a lot of stuff I forgot. Another way was, of course, coding.

Starting point is 00:03:25 Like we were writing simulations and the LLM was so helpful in writing these simulations for us. So then the question was like, can we use LLMs kind of more as a first-class citizen in the physics research? Yeah, I think kind of leading up to this decision to leave, Dogus and I were just, you know, connecting and talking about these different tech trees. We're looking at the improvements on language models, on reasoning. We're seeing what high-compute reinforcement learning could do. And on the materials science side, we're seeing scaling laws within physics, within chemistry, both with respect to simulations, with respect to experiment. It's like the same kind of principles at play in ML. And I think to both of us and to a lot of people in the field, the goal of this technology

Starting point is 00:04:07 is to accelerate science, accelerate physical R&D. Chatbots were like a great milestone along the way, but we really want to see technology out in the world. And we felt like this was just the right place to begin. Physics is very verifiable. It's a great reward function, fairly fast iteration loop. You have simulators for large classes of physical systems. And we felt like in order to create this AI scientist,

Starting point is 00:04:34 this is like the beginning of this path. So built that conviction and decided to found Periodic. Well, let's take a second to talk about what Periodic is and what does it do. So Periodic Labs is a frontier AI research lab that's trying to use LLMs to advance physics and chemistry. We feel like having experiment in the loop tightly coupled with simulations and LLMs is extremely important. So we're building up a lab that will generate high-throughput, high-quality data. And we will use LLMs and simulations in conjunction with the experiments to try to iterate.

Starting point is 00:05:13 Science, by its nature, is an iterative direction. And we feel like LLMs using all these tools that are available to humans can do a great job in accelerating physical R&D. I'd say the objective is, let's replace the reward function from math graders and code graders that we're using today. So like math graders, you know, to give an example, you have a prompt, what is 2 plus 2? You know the ground truth is 4. You can put a lot of optimization pressure against problems like that that are programmatically

Starting point is 00:05:46 checkable. And what we're doing, by having the lab, is we create a, you know, full-on, physically grounded reward function. That becomes the basis on which we're optimizing against. And so if a simulator has some deficiencies or some issues, we always error correct because for us, the ground truth is the experiment, like the RL environment. Nature is our RL environment in our setting. Let's just take a second for folks who might not be familiar to explain what you guys mean by a lab that will verify RL in the real world? Can you talk a little bit about how experiments work? How are AI models trained today? And how are those different from how they're going to

Starting point is 00:06:30 be trained and developed and post-trained and deployed at Periodic? And it might be helpful to talk about how you created ChatGPT. So ChatGPT originally, the technology evolved very rapidly over the last few years. When we were first creating it, it was a very standard RLHF pipeline. So you have a pre-trained model, and it's sort of like this raw substrate. And what you're trying to do is take this auto-completion model and turn it into something useful. The way we did it at that point was we would have supervised data. So given some input, we would say this is a desired output. So if we're trying to get it to act as an assistant, you know, we create some tuples like that. Then you run reinforcement learning, but now you're learning against a reward function that's

Starting point is 00:07:19 trained against human preferences. So humans will say, well, given this input, I would prefer completion A to completion B. And you do that over and over again, and you can create a reward function that can then be optimized against. That is sort of the basis of how we created ChatGPT. But then there's a huge gap between the original model and what we have today. And I think part of that is reasoning, but also part of that is just much better, more precise reward functions. So the reward functions that we were using originally couldn't determine whether you were mathematically correct or not. So early versions of ChatGPT were mathematically not particularly strong. And it sort of results from the reward function. What did you optimize against? The reward function basically encoded

Starting point is 00:08:08 be a friendly assistant, try to help people get to their thing, but it had no sense of, is this mathematically correct or not? Is this code valid or not? And we made huge advances over the correctness of a reward function. But this is all digital. We're creating tasks based on the internet, textbooks, papers, and this is great. This lays a foundation, but ultimately, science is driven against experiment in the real world. And so that's what we're doing with Periodic Labs. We're taking these precursor technologies and we're saying, okay, if you care about advancing science, we need to have experiment in the loop and that becomes our reward function for our agents. So as Dogus is saying, our agents are doing the same type of

Starting point is 00:08:55 things you would use for coding or to help answer a query. But now instead of just giving tools like here's Python, here's a browser, now we have tools like quantum mechanics. So simulate different systems, but ultimately we're going to a lab, and then that becomes like the basis of what is the system optimizing against. So that's sort of just like the natural, like, end state of these systems. People in AI often say lab. Often what they're referring to is quite different from what you guys mean by a lab. What's the difference? That's right. So as Liam mentioned, so far, the LLMs have gotten really good at logic and math. There's like verifiable rewards. What is like the next frontier in terms of, you know, inquiry after logic and math, I'd say it's physics. And then when

Starting point is 00:09:40 you say physics, there are different energy scales. So there's astrophysics, studying galaxies, there's fusion, nuclear physics. But then there's the energy scale of physics that's more relevant to our life. And that's the quantum mechanics, like Schrödinger's equation. This is where, you know, biology happens, chemistry around us happens, materials happen. So we felt like our first lab should be basically probing that quantum mechanical energy scale. And for us, that would be physics at the level of solid-state physics, materials science, and chemistry. One of the more fundamental ways of making things around us is powder synthesis. So you take powders of existing materials, you mix them and you heat them up to certain

Starting point is 00:10:19 temperature, and it becomes a new material. So that's one of our labs. We're going to have a powder synthesis lab. And turns out this is one of those methods where robots can do it, like very cheap, simple methods. I don't know if you saw this coffee-making robot in the SF airport. A robot that's basically at that level can mix powders and put it in a furnace. And there's a very rich field.

Starting point is 00:10:42 So you can actually, using this method, discover new superconductors, magnets, all kinds of materials that are very important for technologies around us. But at the core of it, it's just quantum mechanics. And we feel like teaching these LLMs to be foundation models, but for quantum mechanics, will be the next frontier for LLMs. Why haven't the models that are currently out in the world and deployed been able to do this? Great question. I think, as you mentioned earlier, science is by its nature iterative, right? Like, even the smartest humans tried many times before they discovered the things they discovered. And I think maybe this is one of the confusing points about LLMs.

Starting point is 00:11:20 LLMs can be very smart. But if they're not iterating on science, they won't discover science. You know, to be honest, humans won't either. Like you put a human in a room without any chance to iterate on something. They won't discover anything important. So we feel like the important thing to teach these LLMs is the method of scientific inquiry. So you do simulations, you do theoretical calculations, you do experiments, you get results. And the results are probably incorrect or not where you want at first, but you iterate on it.

Starting point is 00:11:48 And we feel like that hasn't been done yet. So this is what we want to do. But we feel like you have to do it with the real physics, not just the simulation. So this is why we have our own lab, where the LLM will have the opportunity to iterate on its understanding of quantum mechanics. Fundamentally, machine learning models are good at what you train them to do. And that's sort of like the nature of it. And so if a model is acting badly, you're like, well, did you train it to do that task? Kind of building on Dogus's point, there's sort of like an epistemic uncertainty, this like reducible uncertainty that you aren't really building or collapsing

Starting point is 00:12:26 unless you're actually running an experiment. So, for instance, one of the engineers on our team was looking at reported values of some physical property in the literature, and they spanned many orders of magnitude. So if I train a system on that, these systems aren't magic; the best they can do is replicate that distribution,

Starting point is 00:12:44 but it's really no closer to a deeper understanding of the universe, physics, chemistry. Then another point is it's very uncommon to publish negative results. All of the results are basically positive, and a valid negative result is very valuable. A negative result could be discarded because it's sloppy science. But there are valid negative results and that's a learning signal. And this is something that our lab will produce as well. So I think these three things,
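The point above about literature values spanning orders of magnitude can be illustrated with a minimal Python sketch. The three `reported` values are hypothetical, chosen only for illustration, and the log-space squared-error setup is an assumption rather than anything Periodic describes. It shows that, given only conflicting reports for the same material, the best single prediction is their average, and the remaining error is set by the spread of the reports rather than by model quality.

```python
import math
from statistics import mean, pstdev

# Hypothetical reported values of one property for the same material
# (illustrative numbers only, spanning four orders of magnitude).
reported = [1e-3, 1e-1, 1e1]

# Work in log10 space because the reports span orders of magnitude.
logs = [math.log10(v) for v in reported]


def rmse(guess_log: float) -> float:
    """Root-mean-square error of one log-space guess against all reports."""
    return math.sqrt(mean([(x - guess_log) ** 2 for x in logs]))


# Under squared error, the mean of the labels is the best single guess.
best_guess_log = mean(logs)

# The error left over at the best guess is the noise floor of the labels.
noise_floor = rmse(best_guess_log)

print(f"best guess: 10^{best_guess_log:.2f}, noise floor: {noise_floor:.2f} decades")
```

No amount of extra model capacity lowers `noise_floor` here; only a new, trusted measurement does, which is the role the lab plays in the argument.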

Starting point is 00:13:12 there's just like noisy data, no negative results. And you need the ability to act in order to actually do science, which is an iterative endeavor. Those are like the core thesis of why we need a lab. And what might be the core way to measure progress against that goal in your guys' minds? One simple one is, let's say, high-temperature superconductivity. What is the highest temperature superconductor we synthesized? Today, the best number for ambient pressure is 135 Kelvin or so. So we'll know very easily if we're doing well, if we can go beyond that number. So that's pretty fundamental. On the more applied side, you know, there's processing of materials and its effect on the material's properties.

Starting point is 00:13:54 So we can just measure these properties directly. Let's say it's the ductility, it's the toughness, strength of the material. And as we measure it, the LLM will get a very clear signal. It's hard to hack, you know, unlike these other LLM training techniques. It's like really what you see in real life is the signal that's going to the LLM. Yeah, effectively. So can you design the world around you? So you're like, I need something with this property.

Starting point is 00:14:19 Can this system discover and produce that? Both from like a fundamental scientific discovery perspective, but also in industry. So like someone's working in space or defense or semiconductors and like, yeah, we're having these issues. We're trying to achieve this property of this material or this layer. Can the system accelerate the development of those technologies? So it's very grounded. That's how we'll know it's working. It feels like the applications of

Starting point is 00:14:51 solving, building an AI physicist, for lack of a better word, that can design the real world, are so broad. You can apply them to advanced manufacturing. You can apply them to materials science, to chemistry, to any process where there's R&D with the physical world required, it seems like, will benefit from breakthroughs that Periodic is working on. Why hasn't it been done before? And what is it about this moment in history that makes it the right time to attack this problem? Maybe one comment is it's difficult. What makes it so difficult? I mean, I think part of it is the team.

Starting point is 00:15:26 So in our view, this has been enabled by frontier technology in the last couple of years. And so Dogus and I have been so focused on basically putting together like this N-of-one team. Like this group of physicists, chemists, simulation experts and some of the best machine learning researchers in the world have never been part of one concerted effort. And we feel in order to actually achieve this, you need all this expertise. You need these pillars to do this. So when you guys went about designing the team, you know, after you left OpenAI and DeepMind, what was the primary heuristic that you used to guide yourself in figuring out who you wanted on the team? So in terms of expertise, we wanted to have LLM expertise covered, the experimental expertise, and simulation. And for each

Starting point is 00:16:15 of these, we wanted to have, basically, world-class talent. And of course, for each team, there's actually a lot of sub-teams. It's like a fractal, right? The expertise is very fractal-like. So for the experimental side, we want to cover solid-state chemistry, solid-state physics, automation, and kind of the more facilities, like the more operational aspects of experiments. On the simulation side, there's the more kind of theoretical physics parts. There's the more kind of coding aspects of simulations. And on the LLM side, of course, there's mid-training, RL, infra. And yeah, for each of these, we try to get basically the best people who have innovated in these sub-pillars. Yeah.

Starting point is 00:16:56 So I think it's like there was not a team to do it. The technology that we think is necessary to do it has really just emerged in the last couple of years. And this data isn't like on a Reddit forum or something. Like you need to actually go produce experimental data, simulation data. It's siloed across all of these advanced industries. And many of them, while there's a desire, they may not have knowledge of some of the most recent techniques that have been driving this recent wave in AI. There was a moment in time when papers like the GPT-3 paper, for example, you know, said language models are few-shot learners and proposed the idea of scaling laws.

Starting point is 00:17:39 And then there was a follow-up paper, if you guys remember, from OpenAI that was called, I think, Scaling Laws for Generative Modeling, that just showed that as long as you scaled up the amount of compute and data in the right combination, you could very predictably improve the performance of these models. And the theory was that if you just kept doing that, you know, ad infinitum, there would be a bunch of emergent capabilities.

Starting point is 00:18:06 These models would be able to reason about all kinds of problems out of domain, out of distribution. How would you square the circle with that school of thought, which would argue that the current pre-training and post-training pipelines at most of the frontier labs will just eventually crack physics as well? Why is this idea of physical verification so necessary, and is that school of reasoning wrong?

Starting point is 00:18:39 Yeah. Excellent question. Scaling laws empirically seem to continue to hold. So that's not in question. But I think there's a question of what is this y-axis? And that test distribution is very different from like what we're talking about. That test distribution, let's say you're pre-training on the internet, might be, you know, a representative set from the internet. And you'll have these sort of predictable scaling properties. But that's not going to capture that you have a very different set of

Starting point is 00:19:15 scaling properties with respect to different distributions. So I'll try to make this a little bit more concrete. Let's say hypothetically we're training a coding model. And we have unit tests to provide some reward signal. So the model writes some PR. We check that the unit tests go from failing to passing. And we say this was successful. We're going to reinforce these things.
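The unit-test reward described above can be sketched in a few lines of Python. This is a minimal illustration, not any lab's actual pipeline: `unit_test_reward`, `passes`, and the toy `buggy_add` / `fixed_add` functions are hypothetical names, and a real system would run a repository's test suite in a sandbox rather than call functions in-process.

```python
from typing import Callable, List

# A candidate implementation and a test that checks it (both hypothetical).
Candidate = Callable[[int, int], int]
Test = Callable[[Candidate], bool]


def passes(candidate: Candidate, tests: List[Test]) -> bool:
    """Return True only if every test passes; an exception counts as a failure."""
    for test in tests:
        try:
            if not test(candidate):
                return False
        except Exception:
            return False
    return True


def unit_test_reward(before: Candidate, after: Candidate, tests: List[Test]) -> float:
    """Reward 1.0 only when the tests go from failing to passing, else 0.0."""
    failed_before = not passes(before, tests)
    passed_after = passes(after, tests)
    return 1.0 if failed_before and passed_after else 0.0


def buggy_add(a: int, b: int) -> int:
    return a - b  # the bug the "PR" is supposed to fix


def fixed_add(a: int, b: int) -> int:
    return a + b


tests: List[Test] = [
    lambda f: f(2, 2) == 4,
    lambda f: f(-1, 1) == 0,
]

reward_fixed = unit_test_reward(buggy_add, fixed_add, tests)
reward_unchanged = unit_test_reward(buggy_add, buggy_add, tests)
print(reward_fixed, reward_unchanged)
```

The reward only knows about the tests it is given, which is the limitation the discussion turns to next: optimizing against this signal says nothing about distributions the tests never touch.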

Starting point is 00:19:37 You might say you start optimizing this. And now the system is becoming ever more capable of writing code for its own development. And you have this acceleration, you have this kind of takeoff scenario. Code is one of the most promising areas for this because there's an abundance of data online. You have this feedback loop where the system itself can begin to improve itself. And it's a very promising technique. And we're all seeing the benefits of advanced coding models and it's accelerating quickly.

Starting point is 00:20:10 However, that model is not going to then cure cancer. The knowledge simply doesn't exist. You need to optimize against the distribution you care about. So that model, while it's going to be a very valuable tool as a software engineer, and it may help a cancer researcher do their analysis, simply doesn't have the data, the knowledge, or the expertise iterating against that environment. And I think that's just sort of like the fundamental belief we have. Yeah, I mean, so actually I worked on this a bit,

Starting point is 00:20:42 when we were looking at the scaling laws for vision models. And, you know, this also came up a lot in the CLIP paper from OpenAI. Like the in-domain generalization and the out-of-domain generalization are monotonically correlated, but it's not linear necessarily. And so what that means is you can keep improving your model and it will improve as a power law in-domain. And for out-of-domain tasks, by which I mean, as Liam said, the things that you're trying to do that are a bit different than what's in your training set, it will also improve as a power law, but the slope of that power law may not be good enough.

Starting point is 00:21:14 So you might need to, you know, spend centuries before you get to the result you want. We saw this in the GNoME paper, for example. We published a paper where we saw that as you increased the size of your training set, the IID performance, the in-domain performance, improves as a power law. Out-of-domain performance also improves as a power law, but depending on what the out-of-domain task is, like how far you are from the training distribution, the power law might have such a small slope that it is basically useless. So this is one of the reasons we feel like the best way to make progress is to make your target as close to your in-domain training set as possible.
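A worked example makes the point about power-law slopes concrete. The sketch below assumes error follows `error(N) = scale * N**(-exponent)` in the training set size `N`, and inverts it to ask how much data a target error requires. The exponents 0.5 and 0.05 and the target of 0.1 are hypothetical numbers picked for illustration; they are not taken from the GNoME or CLIP papers.

```python
def samples_needed(target_error: float, scale: float, exponent: float) -> float:
    """Invert error(N) = scale * N**(-exponent) to get the N that hits target_error."""
    return (scale / target_error) ** (1.0 / exponent)


# Hypothetical exponents: a steep in-domain slope and a shallow out-of-domain one.
in_domain_exponent = 0.5
out_of_domain_exponent = 0.05

target = 0.1  # same target error for both settings

n_in = samples_needed(target, scale=1.0, exponent=in_domain_exponent)
n_out = samples_needed(target, scale=1.0, exponent=out_of_domain_exponent)

print(f"in-domain: {n_in:.3g} samples, out-of-domain: {n_out:.3g} samples")
```

Under these assumed numbers the in-domain case needs on the order of a hundred samples while the out-of-domain case needs on the order of 10^20. Both curves are power laws, but only one is practical, which is the argument for moving the training distribution toward the target instead of waiting on scale.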

Starting point is 00:21:49 And the best way of doing this is to basically iterate on changing your training set to be more like what you want to do. So this is one answer. The other one is actually maybe even simpler. The experimental data we want actually doesn't exist. So, for example, say you want to learn on the experimental data in the literature for synthesis. Turns out the noise in the formation enthalpy labels, which is like the energy it takes to basically assemble atoms in the shape you want, is so high that if you train a machine learning model on it, it's not predictive enough to predict the next one.

Starting point is 00:22:26 And one of the reasons for this is, as Liam mentioned, people don't usually publish negative results. And negative results are usually very context dependent. So what's a negative result for someone might be positive if they do things differently. So, yeah, not only is there this domain shift problem where what you're trying to do might be different than your training sets, so the power law won't have the large enough slope you want. But the other problem is for some of these things we want to do, there's no data for it.

Starting point is 00:22:52 For example, for superconductivity, there are a lot of data sets you can look at. But the noise floor on them is so high that training on them usually doesn't help. Dogus, me, the entire team, are deep believers in scaling up and scaling laws. But it's just do a beeline for the thing you care about. And in our case, we care about advancing science, advancing physical R&D. That's sort of like the thesis. Is there a tension between being super bitter-lesson-pilled and just throwing more compute at the problem and the, I guess, domain-specific pipelines that the lab you guys just described will have to focus on?

Starting point is 00:23:28 In the case of Periodic, I think you mentioned, the first beelines you guys are making are towards superconductivity and magnetism. What is it about those domains that make them good candidates for the first few pipelines that Periodic is working on, and are they just stops along the way to an AI physicist that generalizes

Starting point is 00:23:49 across all kinds of domains, or is there a danger of them being essentially off-ramps that don't result in the sort of AI scientific superintelligence that is the North Star for what you guys are doing? Yeah, I feel like for example the high-temperature superconductivity

Starting point is 00:24:07 goal is actually a goal that has so many sub-goals in it. It's a bit like when DeepMind or OpenAI started and said, we're going to do AGI. But what they meant was they had to do so many things before they got to these cool results. Like for us, if we want to get a high-temperature superconductor, we probably need to get good at autonomous synthesis, autonomous characterization. We need to get good at characterizing different aspects of the material, using the LLM to run the simulations correctly.

Starting point is 00:24:32 So it's a North Star, and there's so many goals on the way that would be very I think, impactful for the community. That's one reason. Another reason is I feel like high temperature superconductivity is such a fundamentally interesting question. For example, if you could find a 200 Kelvin superconductor, even before we make any product with it, that in itself says so much about the universe that we didn't know yet. You know, to be able to see such quantum effects at such high temperatures, I think would be such an update to people's view of how they see the universe.

Starting point is 00:25:04 So we feel like it'll be really impactful for humanity even before we make a product out of it. I think that's one of the reasons. A technical reason also is superconductivity is a phase transition. So it's pretty robust to some of these details that we cannot simulate yet. So for example, when you make the material, the superconducting temperature usually is more dominated by its kind of crystal fundamental property than the defects or microstructure, whereas there are certain other materials properties where even if the crystal has the property you want, there's so many other factors that you cannot simulate that would prevent you from seeing that property.

Starting point is 00:25:39 So superconductivity has this, like, nice philosophical upside to it, has this technical upside to it. And it really rallies both the physicists and the non-physicists. Like, there are people who studied physics for 40 years and are really excited about superconductivity. And there are people who've never studied physics, but are very excited about superconductivity. It's like quite rare to find a topic that unites the whole team. Yeah.

Starting point is 00:26:01 I mean, it's like, in order to do this, there are so many foundational pieces to solve. And our tactic is, in order to actually get to this goal of AI scientists, you need to make contact, do the full loop somewhere. If you say you're doing this in just like very vague terms, you sort of just end up back on arXiv papers and textbooks. And so it's really important for us to do the loop, but then create this repeatable process. Like how do you go from subdomain to subdomain? And there's really interesting questions about how well do the ML systems generalize between these things? What is the generalization of a system between like superconductivity

Starting point is 00:26:42 data to magnetism data, for instance? And maybe that looks very different than its ability to generalize to fluid mechanics. And I think there's like fundamental arguments to make there. But the goal is to create this repeatable system, prove it, and then just go through the different domains that way. So I can see the argument for why cracking room-temperature superconductivity from an experimental basis is extraordinarily valuable for humanity. But you guys are building a startup. And to use an analogy for why you need to have a clear medium-term path, or short- to medium-term path, along the way to a North Star that is both commercially viable

Starting point is 00:27:27 and net positive for society, what we've seen, for example, with other frontier labs that are working on automating white-collar work or software knowledge work is that, you know, there's this North Star of an AI researcher. But along the way, there were a bunch of subgoals and so on, and a concrete kind of application that opened up a ton of commercial value and benefits for users on the way to that AI researcher was the idea of AI programming. Software engineering has become probably the first major domain that's caused people to really update their priors

Starting point is 00:28:03 about how useful AI models are beyond kind of consumer applications. And in terms of productivity, their impact has been extraordinary just in a few short months. So if the traditional frontier labs' North Star was an AI researcher, and the path along the way to get there was programming, AI programming, what is that for Periodic? Basically, co-pilots for engineers, researchers in advanced industries. So maybe perhaps just being in Silicon Valley, we really think about like, computer-oriented work. Everything is digital. Everything is bits. But there's so many industries. We were kind of talking about a few, like, space, defense, semiconductors, where they're dealing

Starting point is 00:28:47 with iteration of materials, of physics. And that's part of their workflow. Like, how are they designing these new technologies, these new devices? And in the absence of data, in the absence of good systems, they don't really have particularly good tools. That is our opportunity. And these are massive R&D budgets. So, yeah, while high-temp superconductivity is a great North Star, we very much understand that technology and capital are intertwined. We're going to be able to maximally accelerate science if this is a wildly successful commercial entity.

Starting point is 00:29:24 And to do so, we want to accelerate advanced manufacturing in all these different industries, become like an intelligence layer for all these teams to accelerate their workflow and start reducing their iteration time, get them to better solutions more quickly, accelerate their researchers and their engineers. Let's click a little bit deeper on that in practice, sort of a day in the life of a Periodic team member, where let's say half the team, is this roughly right? About half the team are ML scientists with machine learning backgrounds, and the remaining half are physical scientists with physics or chemistry backgrounds. How do you start by uniting the cultures, right? How do you take somebody whose primary career so far and work

Starting point is 00:30:04 has been experiments in a lab, in wet labs, doing physics and chemistry, and give them an intuition for ML and vice versa? Because you guys are both physicists who then had the career trajectory where you also had the chance to be at frontier AI labs and were part of training systems that are now considered sort of landmark, hallmark machine learning systems like ChatGPT, like GNoME, but for others who might be coming from one domain,

Starting point is 00:30:35 how do you get the team to build an intuition for the other? Yeah, so this is a great question. I mean, it's actually crucial for us to make sure these teams work very closely with each other. So one of the things we're seeing is the physicists and the chemists need to figure out how to teach the LLM how to reason about these things. Because I think the frontier AI labs have figured out

Starting point is 00:30:56 how to train them on math and logic, but not yet on physics and chemistry. So one thing we're seeing that's been really, really, I think, productive is the physicists and chemists are thinking about what are the steps we should include in the mid-training, in the RL training, that will teach the LLM how to reason correctly about quantum mechanics, how to reason correctly about these physical systems. Another one, of course, is the LLM researchers are learning quite a bit about the physics, the simulation tools, the goals. So they've been working together really well.

Starting point is 00:31:27 We have weekly teaching sessions where the LLM researchers teach, you know, how the RL and LLM loops work, how the data cleaning works, and then the physicists and chemists are teaching about different aspects of the science, the history of science. That's also very important. So we feel like that's been going really well. And, you know, one way we are looking at this is the things we have to teach the LLM to be able to discover, say, a superconductor, include being able to read the literature really well, like read all the papers, the textbooks, find the relevant parts, and then being able

Starting point is 00:31:58 to run simulations, theoretical calculations, and then take action, run experiments. You know, we feel like this is quite similar to the physical R&D researchers in these companies. They have to read the literature, read maybe internal documents or external documents, and then run simulations, run theoretical calculations, and then actually attempt the thing experimentally, learn from that. So we feel like all the progress we're making towards our internal superconductivity or physics goals actually is making our LLMs much better at serving our customers who are doing very similar workflows. Yeah, I think just culture, no stupid questions.

Starting point is 00:32:37 You can ask just like the dumbest, like, physics question, the dumbest ML question. And I mean, there's a few faculty as part of our company and they're actually excellent teachers. So, I mean, these like learning sessions have been really fantastic. And another thing I notice is computer scientists often think in terms of like APIs. So scientists will say something and they're always trying to map it. They're like, okay, well, what's the input, what's the output, what's the target, how do I map that back? And it's always just like this translation.

Starting point is 00:33:08 And I think we also have built up as part of the team, there's people like on these different edges. So like if you have a simplex of like, you know, pure ML, LLM, pure experimentalist, pure simulation, there's people who like kind of live in the inside as well. And so they've been like excellent bridges for translating between these different groups of people. So it's like active learning to like learn the other spaces, creating APIs, and then these kind of bridge connector people. I think Doge being an excellent example of that. Is it a requirement for somebody who wants to join Periodic to have an advanced degree

Starting point is 00:33:47 in physics or chemistry? Absolutely not. One of the jokes we were making is, who was the NBA player who was saying that I'm much closer to LeBron James than you are to me? We were saying the opposite of that to candidates. Because the amount that even our best physicist doesn't know about physics is much bigger than the amount that they know about physics. So for this new candidate, even if they have no background in physics, how much they have to learn about what we're trying to do is actually not that different from how much the best physicist has to learn. Because there's so much chemistry to learn, so much material science to learn. And I think this is one of the interesting aspects of science today. In the past, in the 1800s, there were these physicists that could do so many different things at the frontier.

Starting point is 00:34:28 Today, we've reached a point where our intellectual knowledge is so large that a leading thinker can usually only advance in one very specific field. And maybe this is actually holding us back because, say, to discover an amazing superconductor, as you keep going back to this example, you have to know so much about chemistry, physics, synthesis, characterization. And unfortunately, I don't think any human knows enough about all of these. So we have to collaborate. So I think our team is kind of like a small example.

Starting point is 00:34:58 of this where we have, as Liam said, a lot of different points in that simplex. And for any person, they have so much to learn, but that's true for basically every other scientist. So, for example, I supposedly come from the physics side of it, but I've been learning so much more physics, because we now have people from different areas of chemistry in the team, different areas of physics. And I think it's true for LLM researchers as well. I mean, they come in, there are aspects of LLMs that they probably didn't know until they started working with other researchers on our team. So I think it's a great, and it's like a small example of what we're trying to do with the LLM because we're trying to teach this LLM all these different things that we're learning

Starting point is 00:35:36 as researchers. It's like a really fun experience, I think, yeah. And what are you finding makes a great researcher at Periodic that's different from what might make a great researcher at OpenAI or Anthropic or DeepMind? I would say there's very high overlap, but probably one of the biggest determinants is do you care about this mission? Is accelerating science, to you, is that like the big goal? And I think looking at the team right now, it's just an incredibly mission-driven set of folks who are like, yeah, this is the North Star. Let's do that. If someone really wants to improve some megacorp's products, yeah, you'd probably be better off at that megacorp, iterating and improving their products. But if you care about scientific discovery, I think Periodic Labs is the

Starting point is 00:36:25 best place to do that. How big is the team today? We're roughly 30, I believe. Yeah. And as you think about taking a lot of the research that's going on at the company and deploying that out in the real world, the kinds of customers that we've talked about, space, defense, advanced manufacturing, these are mission-critical industries that are known for being, you know, essential to whatever part of the economy they're part of. But often they're not the most, they're not the fastest to adopt new technology. How do you think about deploying the kinds of frontier agents that we've talked about that are great at science, great at physics, in companies or organizations that might not be anywhere close to as sophisticated as you are in AI or ML? Is there, do you have a working thesis for how to make sure that the arc of progress is not bottlenecked

Starting point is 00:37:25 on deployment? It sounds like you give a fairly good thesis on how to unblock the arc of scientific progress on the research side. But when it comes to deployment, what might be a working theory that you guys are optimistic about that would help get the systems that Periodic is building out into the real world? Well, maybe one thing that we've noticed in our conversations with all these companies is they all are looking for their AI strategy. They understand that like the technology is shifting really quickly.

Starting point is 00:37:55 and they're looking at how they're doing their work, and it's not changing as quickly as they think it should be. Some industries also are losing, like, kind of key expertise in different fields, and they're losing these, like, senior engineers, senior researchers, and they're like, okay, how do we, like, preserve that? But one thesis is, understand, you know, kind of thinking about these, like, APIs and thinking about what are the evaluations, what are the biggest bottlenecks for these companies,

Starting point is 00:38:22 looking at some of the problems they face, and we can map that to our systems. And we say, well, we think we can dramatically accelerate this. And so it's not coming in and saying, hey, we're going to transform your fab line on day one. We're going to transform how you're doing everything. Forget everything. It's like, no, we're going to solve a really critical problem, well-scoped, very clear evaluations. You kind of co-draft that with them.

Starting point is 00:38:46 And just show them, like, how powerful this technology can be when you optimize against the thing you care about. So, you know, nothing particularly like surprising here, but, you know, sort of like a land-and-expand type method, as you might expect. But really looking for who are the biggest promoters within that company and what are the biggest problems, make sure you're solving a very real thing for them and intersect that with where is our technical capability the highest? You know, you were on a call this morning with one of the customers in your pipeline. We don't need to name who, but what were some of the things you heard as their most urgent problems that they'd like for Periodic to solve? So one of them was simulations. You know, they spent a lot of time training people on some of these simulations.

Starting point is 00:39:32 They need to use them, they're just critical for their development. And being able to automate those simulations, I think, would be quite enabling. The design process. And then kind of like some of the small things like matching the formats, being able to feed, you know, the simulation results into the design pipeline, all of these seem quite important, and then being able to treat the data together in the same place. What else?

Starting point is 00:39:58 Well, I think there's a really fundamental question. So a lot of these companies will rely on retrieval. So that's sort of like a super lightweight thing. Someone shows up with a neural net and they're like, great, we'll just retrieve over all of your data and then that's your solution. However, as we've seen with things like ChatGPT and other things, when you pre-train on the data,

Starting point is 00:40:17 when you actually encode the knowledge into the weights, it's not just a retrieval system, you have a richer, deeper understanding of the material. And I think this is a big fundamental challenge. So, for instance, for this customer, they can give privileges to their employees and have retrieval acting on their behalf. Like, the system acts as the user. And so you can match those same kind of like privileges for access. But if you start doing pre-training or mid-training on different parts, it's like, well,

Starting point is 00:40:50 if you pre-trained on every piece of data, that might only be accessible to, say, like the CEO of that company. So then you have to figure out how do you sort of bucket that knowledge and create different types of systems. But I think right now, like we're,

Starting point is 00:41:05 after talking with the user, they don't seem to have a great solution for sort of distilling all of the knowledge into like a single model or into a set of models. So like going beyond retrieval to proper training.
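
The access-control point in this exchange can be made concrete with a minimal sketch. Everything named here (`Document`, `User`, `retrieve_as_user`, the group labels, the toy keyword-overlap ranking) is a hypothetical illustration, not anything described by the speakers. It shows why retrieval can inherit a user's privileges: the permission filter runs at query time, before ranking. Knowledge trained into model weights has no comparable per-query filter, which is the bucketing problem raised here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document (assumed ACL model)


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


def retrieve_as_user(user, query, corpus, top_k=3):
    """Return up to top_k documents the user may read, ranked by keyword overlap."""
    terms = set(query.lower().split())
    # The system "acts as the user": drop anything the user cannot read first.
    visible = [doc for doc in corpus if doc.allowed_groups & user.groups]
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in visible]
    ranked = sorted((pair for pair in scored if pair[0] > 0),
                    key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


# Toy corpus with made-up documents and groups.
corpus = [
    Document("sim-guide", "how to run the thermal simulation", frozenset({"engineering"})),
    Document("board-memo", "thermal simulation budget for the board", frozenset({"executive"})),
]
engineer = User("engineer", frozenset({"engineering"}))
ceo = User("ceo", frozenset({"engineering", "executive"}))

engineer_hits = [doc.doc_id for doc in retrieve_as_user(engineer, "thermal simulation", corpus)]
ceo_hits = [doc.doc_id for doc in retrieve_as_user(ceo, "thermal simulation", corpus)]
```

In this sketch the engineer's query never sees the board memo, while the same query issued for the CEO does. A model mid-trained on both documents would carry both in its weights regardless of who is asking.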

Starting point is 00:41:22 And then I think also the supervised training they're doing is really akin to like the early days of ChatGPT where it's like input, output, you have a few examples. And kind of transforming this new way of thinking was like, no, high-compute reinforcement learning is really effective. This is how you should think about the strategies that it's using. This is how you create effective tool use towards those problems. And this is how you optimize it effectively. Could you describe for folks who may not be familiar with it, what do you mean by mid-training?

Starting point is 00:41:55 Because people are familiar with pre-training, they're familiar with post-training, but in the Periodic context, what does mid-training mean? Yeah, sorry for the lingo. So I think this term came up years ago where it's like, well, we had pre-training, we had post-training. But sometimes you need to put in a little bit more knowledge. So before search worked really well, there was an issue of freshness. So we had pre-trained models, and they have a knowledge cutoff.

Starting point is 00:42:17 So there's like a scrape of the internet at that point, but users want more real-time knowledge. So it's like, how do you get that in there? Enter mid-training. Mid-training is basically you're taking new data, new knowledge that's not in the model, and you continue pre-training. And this differs from standard post-training, where post-training typically is more reinforcement learning, supervised learning. And the mechanism is basically, or the goal of it is just put a lot of knowledge into the model

Starting point is 00:42:46 that didn't exist there before. So that's mid-training in a nutshell. And in the Periodic context, does that mean essentially going and injecting a ton of custom sort of data from an experimental implementation in a particular customer or particular industry? What are the sort of lines, the atomic units of mid-training that you guys think will improve the capabilities of the models

Starting point is 00:43:22 So it's like you can have very low-level descriptions of physical objects, so like crystal structures, for instance. You can also have higher-level semantic descriptions of like, well, this is how I made material X, Y, Z. And trying to get all this data into the model is really valuable. So it's like simulation data, experimental data, none of this exists. and basically putting that knowledge into the model and making sure that these distributions are connected in some way.

Starting point is 00:43:54 And what I mean by that is if you just sort of mix together distribution A, B, and C, there's no guarantee of generalization. What you want to hope to see from these systems is the inclusion of this other data set is improving performance on the other data sets. And so these are sort of just like machine learning techniques or machine learning problems to solve. but basically just make it an expert in physics and chemistry and where it was deficient before.
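
The mixing-and-generalization point can be sketched in a few lines. This is only an illustration under stated assumptions: the source names, the sampling weights, the helper names `sample_mixture` and `transfer_gains`, and the eval scores are all made up for the example and do not describe Periodic's pipeline. The first helper draws a mid-training batch from a weighted mix of sources; the second does the bookkeeping for the check described here, whether adding one data source moved scores on the other eval sets.

```python
import random


def sample_mixture(sources, weights, n, seed=0):
    """Draw n (source_name, example) pairs from a weighted mix of data sources."""
    rng = random.Random(seed)
    names = list(sources)
    picks = rng.choices(names, weights=[weights[name] for name in names], k=n)
    return [(name, rng.choice(sources[name])) for name in picks]


def transfer_gains(scores_before, scores_after, added_source):
    """Score change on every eval set other than the one matching the added source."""
    return {
        name: scores_after[name] - scores_before[name]
        for name in scores_before
        if name != added_source
    }


# Placeholder corpora; the strings stand in for real records.
sources = {
    "crystal_structures": ["structure record 1", "structure record 2"],
    "synthesis_notes": ["synthesis note 1", "synthesis note 2"],
    "simulation_outputs": ["simulation log 1", "simulation log 2"],
}
weights = {"crystal_structures": 0.5, "synthesis_notes": 0.3, "simulation_outputs": 0.2}
batch = sample_mixture(sources, weights, n=8)

# Made-up eval scores, only to show the bookkeeping, not real results.
before = {"crystal_structures": 0.40, "synthesis_notes": 0.50}
after = {"crystal_structures": 0.45, "synthesis_notes": 0.50, "simulation_outputs": 0.30}
gains = transfer_gains(before, after, added_source="simulation_outputs")
```

A positive entry in `gains` would suggest the added source helped on a different distribution; entries near zero would suggest the distributions were mixed without being connected.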

Starting point is 00:44:21 You guys both know that I spent some time running evals on a bunch of these models at the Stanford Physics Lab earlier this year and the results were that the models are terrible at scientific analysis. Because they weren't trained to do so. Because they weren't trained to do so. But on the other hand, you know, many of the existing research teams working on the general models are investing in trying to make these better.

Starting point is 00:44:44 Is there something about the way you're building Periodic that gets to draft off of all of that progress in the base models, or do you have to start everything from scratch and therefore not be able to be composable with advancements happening in the mainline models today? Yeah, I mean, we benefit from all the different advances. So one of them is the LLMs are getting better. And we definitely benefit from that because we take a pre-trained model and then mid-train it, you know, high-compute RL. Another one is the physical simulation tools are getting better. So DeepMind, Meta, Microsoft, academic groups. They're open sourcing new ways of simulating, new ways of using machine learning to predict

Starting point is 00:45:20 properties. So we get to basically utilize all of those. And it seems like machine learning has made such an impact in the physics and chemistry fields that we expect these improvements to continue. I think another thing is when we think about tools for agents, we think of like, here's a browser, here's a Python, but increasingly people think about tools as other neural nets, as other agents. And so if you look at a lot of like physics code, it's not particularly deep.

Starting point is 00:45:54 This isn't competition programming. This is like kind of like hacky scripts. But you can rely on some of the best systems for, you know, wherever they spike on. So neural net as a tool to these agents is something that immediately accelerates our work. So you don't have to replicate everything. There's a historical pattern that a lot of the fundamental research in the physical sciences that we're talking about here, physics, chemistry, biology, has historically been done at university labs. Is there a role at all that the university ecosystem you think will play in Periodic's future,

Starting point is 00:46:34 or do you think these are just completely divergent paths? Absolutely. I mean, so much of the simulation tooling we use has been developed in academia. Much of it is in Europe, for example, a lot of the novel synthesis methods. So we definitely benefit from a lot of these different, very deep technical progress. Like, for example, all the physical simulation tools are these, you know, complicated Fortran codes that in our team, for example, we don't really like know how to develop very efficiently. But we feel like there's definitely a very deep connection between academia and industry.

Starting point is 00:47:12 So, for example, recently, a lot of the large-scale simulations have been done in industry labs like Microsoft, DeepMind, and Meta, but a lot of those tools have been actually developed in academia and then passed on. So there's actually a really nice synergy there. I think I'd add a few other things too. So like you found when you were evaluating models on their ability to do scientific analysis, they were deficient. This was probably, I mean, not a direct goal for those teams training those models.

Starting point is 00:47:42 So I think academia and these collaborations, say, will help us inform what are the important tasks? How do you do this analysis? What skills do we want to put in the model? A skill could be a full analysis or a skill could be like a smaller primitive as part of a larger analysis. But also secondarily, it's how do you think? So one of the physicists was looking at the reasoning strategies of one of our models. He's like, it's all wrong. It's all wrong. And we're like, what do you mean?

Starting point is 00:48:11 He's like, no, this should be thinking higher level. It should be thinking in terms of symmetries. This is the book that encodes like the thinking strategies. That will be more effective. And of course, your reinforcement learning environment needs to reward those types of strategies. But given some of the most premier scientists are using these strategies, they're likely effective. And these are types of things where it's like an industry academic partnership can just be so powerful because industry just simply is blind to these types of analyses.

Starting point is 00:48:41 these tools, as well as just this way of thinking. Yeah. And there's a way of connecting that to the tooling question as well, because, you know, language is very important. But then in the human brain, we also see all the visual processing, like geometric. So it's plausible that while these LLMs will keep getting better and better, they'll actually benefit from having a geometric reasoning that's separate. So today we can do that with equivariant graph neural networks.

Starting point is 00:49:06 We can do it with diffusion models that are kind of geometric tools by construction, and the LLM can call them, so then it can have both the language aspect, which is very good for, say, synthesis recipes, but also the geometric aspects, which is very good for representing atoms, just design geometries in general. So how are you thinking about deepening Periodic's ties with academic labs? Yeah, this is very important for us. So we have two major initiatives in this direction. One of them is we're starting an advisory board.

Starting point is 00:49:33 This will be kind of expertise spanning from superconductivity to solid-state chemistry to physics. And we want to make sure, you know, we're in touch with these kinds of long-term research directions. A lot of important government funding goes to these groups, and we want to have a tight coupling between what's important for them and us. So this, you know, includes superconductivity expertise, such as Z.X. Shen from Stanford on the experimental side and Steve Kivelson from the theory side. We also have synthesis expertise on the advisory board from Mercouri Kanatzidis from Northwestern University and Chris Wolverton on the high-throughput DFT side.

Starting point is 00:50:12 And then we have Kostya from Manchester University, who is really well known for discovering graphene. So he'll be able to advise us on these novel exotic electronic states and materials. And our second initiative is going to be through a grant program. We really want to enable some of this amazing work going on in academia, and some of their work isn't a good fit for industry. You know, it's best done in academia. So we want to accept grant proposals and we want to enable and support the kind of work that's going to help the community,

Starting point is 00:50:46 especially in relation to LLMs, agents in synthesis, material discovery, physics modeling. So maybe after this show, you can include the link. Yeah, we'll include them in the show notes if grants are open starting today. Absolutely. Great. So for people who might be interested in joining Periodic, what are you guys looking for? First off, someone deeply curious, someone who really wants to understand

Starting point is 00:51:10 the machine learning, the science at a deeper level, who wants to make contact with reality, who wants to advance science. Like, this has to be a driving thing. But also pragmatic. What we're trying to do is incredibly challenging, and someone who has like a very careful process

Starting point is 00:51:28 and they're solution-oriented, they get to goals quickly, and really someone world-class along some dimension. We're looking across all these different pillars, so machine learning, experimentalist, simulation, and people who can bring some sort of innovation on how do you create a creative ML system? How do you bring new types of tools or new types of thinking to some of these state-of-the-art models?

Starting point is 00:52:00 Someone who can advance simulations and make them more robust and more reliable with experiment. Yeah, and maybe one more thing I'd add is, Liam and I have been really looking for a sense of urgency in candidates, because we want these technologies not in 10 years. You know, we don't want these LLMs to start improving science in 10 years, but we want them ASAP. So if the candidate feels like a sense of urgency for improving these physical systems, discovering these amazing materials, innovating on superconductivity, they would be a good fit. Yeah. If you match all these, please reach out.

Starting point is 00:52:31 All right. Sounds like we got to amp up the speed, the scale of stuff happening at Periodic. And we'll put the career links in the show notes. Thanks for coming, guys. Thanks for listening to the a16z podcast. If you enjoyed the episode, let us know by leaving a review at ratethispodcast.com slash a16z. We've got more great conversations coming your way.

Starting point is 00:52:52 See you next time. As a reminder, the content here is for informational purposes only. Should not be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com forward slash disclosures.
