No Priors: Artificial Intelligence | Technology | Startups - Chai-2: The AI Model Accelerating Drug Discovery with Chai Discovery Co-Founders Jack Dent and Joshua Meier
Episode Date: July 3, 2025

AI has already fueled breakthroughs in biotechnology—but now, further advances in AI are poised to fuel pharmaceutical discoveries as well. Sarah Guo sits down with Joshua Meier and Jack Dent, co-founders of Chai Discovery, whose newly launched Chai-2 designs bespoke antibodies that bind to their targets at a jaw-dropping 20% rate. Jack and Joshua talk about the implications of Chai-2’s success rate at discovering antibodies for the pharmaceutical industry, how structure prediction is pivotal in making the model work, and future potential for using the model to optimize other molecular properties. Plus, they talk about what they believe bioscientists should be learning to best utilize Chai-2’s technology.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @_jackdent | @joshim5

Chapters:
00:00 – Joshua Meier and Jack Dent Introduction
01:09 – Genesis of Chai Discovery
06:12 – Chai-2 Model
10:13 – Criteria for Specifying Targets for Chai-2
13:12 – How the Chai-2 Model Works
16:12 – Emergent Vocabulary from Chai-2
18:15 – Hopes for Chai-2’s Impact
20:33 – Reception of the Chai-2 Model
22:16 – Future of Wet Lab Screening and Biotech
27:08 – Optimizing Other Molecule Properties
31:37 – Where Chai Invests From Here
36:20 – What Bioscientists Should Learn for Chai-2
40:23 – How Jack and Josh Oriented to the Biotech Space
43:38 – Platform Investment and Chai-2
46:53 – Scaling Chai Discovery
48:21 – Hiring at Chai Discovery
49:09 – Conclusion
Transcript
Hi, listeners. Welcome back to No Priors.
Today, I'm excited to speak with Josh Meier and Jack Dent, two of the co-founders at Chai Discovery
and former bio-AI and engineering leaders at Meta AI and Stripe.
This week, Chai released their industry-leading Chai-2 zero-shot antibody discovery
platform, which at its core is a generative model that can design antibodies that bind to
specified targets with 100-fold the hit rate of prior computational approaches. We'll talk about
their product, The Next Frontier for Chai, why they're bullish on biotech, and why the most
effective antibody engineers will soon be working as expert prompt engineers. Jack, Josh,
congrats on the Chai-2 launch. Thanks for doing this. Welcome. Thanks for having us, Sarah. We're
excited to be here. Good to be here. Josh, I'll start by just asking, you know, you and several of
the scientists on the team have been working on AI drug discovery for about a decade now in different
settings. I have also been looking at this area for over a decade. We haven't yet seen successes
of drugs to market that were designed, you know, with these AI computational techniques. What
made you believe? Why start the company when you guys did? That's a great question. So many of us
have been working in the space for a while and, you know, didn't start a company because it was really a
research idea, I think, until very recently. You know, there were signs of life that someday this was
going to work, but it wasn't really on the timeline of a company, right? You can't really start a company
thinking that 10 years from now things are going to work.
You also don't want to start a company after it's already working and kind of miss the boat.
So the sweet spot is like, okay, we have like maybe one, two years that we have to really get
this off the ground.
And when we started the company, we made a bet that it was going to work. There were really a
couple of things that fueled that decision.
The first one was we made a bet that structure prediction, protein folding was going to get
a lot better.
So obviously, protein folding was considered solved a couple of years ago, around 2020.
You had the breakthroughs of AlphaFold 2, being able to predict protein structures with
experimental accuracy, but it was just a single protein structure at a time. So we can take a single
protein sequence, and we can see what that protein looks like. That's very useful for basic biology,
so we can understand what the proteins we're looking at look like. But if you think about
drug discovery, which is where, you know, we're really focused at Chai Discovery. In drug discovery,
you need to understand how multiple molecules interact with one another. So you need to understand
how a small molecule drug is going to modulate a protein or how an antibody protein is going
to modulate an antigen protein. So we started to see early signs of life that that was
going to be possible. And again, we made a bet that we would be able to take this to the next level
with the kinds of breakthroughs that we were seeing around diffusion models and around language
models. The previous generation of structure prediction models would really just predict, you know,
like one conformation of a protein at a time. It's kind of like one view of a protein. It's like the early
image models, before diffusion models. You weren't really able to look at the diversity
of generations that could come out. And we thought the same thing would impact drug discovery
and protein folding as well. So that's a bit of color on how we decided to start the company
and we did. And maybe lastly, I should say, almost every AI bio company before us has had some
kind of very tight lab integration with what they are doing. And it's almost too tight. I think the
lab integration is great. We do a lot of lab experiments at Chai. But the thing that was missing was
could you actually have some kind of portable AI platform, something that would actually be
generalizable and could be applied to lots of different areas? If you could do that, it means that
your impact could really be taken to the next level. We can take Chai 2, the model that we've just
released, and we can deploy it to hundreds of different projects, thousands of different projects.
Chai-1, which we open sourced, is already being applied throughout the industry to tons of different
problems. We don't even know everything it's being applied to because it's open sourced.
But that was something that was also really important to us if we were going to kind of
see this transformation of biology from a science into more of an engineering discipline,
which is ultimately the goal of the company.
Yeah, I want to come back to what you said about lab integration as we talk more about
the technical approach here.
But, Jack, you and I met in the context of, you know, you being a beloved engineering and product
leader at Stripe coming from the engineering side and looking for, like, the most interesting
problems to work on in AI.
Why did you decide to work on this versus some of the other things we
were talking about, like codegen and such?
Yes, as you know, Sarah, I spent quite some time thinking about my next steps and what I wanted
to do with my life after the period I was at Stripe.
And I give a lot of credit to Josh, actually, for this. You know, we were good friends
going back even to college.
We were pset buddies at Harvard, in many of the same classes together.
I was maxing out the CS curriculum.
Josh was also doing that somehow for the chemistry and physics and all the other scientific
curricula as well.
But we had landed in a lot of the same classes.
And as we went to our separate ways after college,
we really just made a point of keeping in touch every, you know, three, six months.
And Josh would always talk to me about his research.
Once it became clear that the research that Josh and others were doing in this space
was really no longer just a toy but was really going to impact and change the entire industry,
that idea became infectious, right?
It sort of becomes impossible to unsee the future once you have that glimpse. And although
we didn't know until very recently that any of this was going to work, and of course
there's still a lot left to prove, once you start to grasp the implications of the fact that
over the next few years we are going to have the ability as a human race to engineer molecules
with atomic precision, it's almost hard to work on anything else with your life. The
impacts for society broadly, and human health, but not just health. There are a ton of
other areas which this will touch, which we can get into. But that is just a platform shift in an
entire industry. And so put that together with the fact that the kind of the belief or conviction
that you might just be able to get it working. And I think it was impossible to say no to working
on this in many ways. So this is a breakthrough result in Chai-2. Can you give us a sort of layperson's
explanation of what the result was and the model itself and what you think is the most valuable
part? Sure. Chai-2 is our latest series of models, which are state-of-the-art across a number of
different tasks, but specifically the one we're most excited about is design. And what we've shown
is that we can design a class of molecules known as antibodies, which are some of the most
therapeutically interesting molecules as well. These account for close to 50% of all recent drug
approvals. And seven of the top 10 best-selling drugs out there are actually antibodies. And so
what we've shown with Chai 2 is really the ability to design antibodies against targets that one
wants to go after, in just a small, what we call a 24-well plate, in just 20 attempts. What this means
is that we take a target, run our models, ask the models to design an antibody, we then
ship that antibody to the lab, we have about a two-week validation cycle in the lab, and two weeks
later, we see that roughly close to 20% of these antibodies actually bind their targets
in the intended way.
So, Chai-2 is a major breakthrough for the field.
When we set out on this project, we were actually only targeting a success rate of 1%.
That was the company-wide goal for the entire year.
And the reason we set that goal of 1% is that previous attempts at this problem were maybe successful around 0.1% of the time, or even lower.
And those are the computational techniques.
If you look at the traditional lab-based high-throughput screening techniques, people are really screening between millions or billions of compounds just to find one molecule that sticks.
There's a reason we call it drug discovery.
It's a discovery problem.
It's a search problem.
and so people are really just sort of panning for gold in these massive yeast or phage libraries
or alternatively you might inject a mouse or a llama, you might wait a couple of weeks for them
to get really sick, you might then bleed them, take their plasma, take the antibodies out
and isolate them. And this is actually what we did for COVID. We took some
humans who had already gotten COVID, took their antibodies out of them, and tried to find one which
then neutralized the virus. So you can imagine it's not an ideal,
or the most efficient, or the most principled process.
And so what we've shown with Chai-2 is that we've been able to increase these success rates
in discovering antibodies computationally by multiple orders of magnitude compared to the prior
state of the art computationally, and by many, many, many orders of magnitude compared to the
traditional lab-based alternatives.
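To put those orders of magnitude in perspective, here is a back-of-the-envelope calculation: with a per-design hit rate p, the number of independent designs you would need to test for a 95% chance of at least one binder follows from the geometric distribution. This is an illustrative sketch only, with round numbers taken from the conversation and an independence assumption that real designs do not strictly satisfy.

```python
import math

def designs_for_hit(p: float, confidence: float = 0.95) -> int:
    """Number of independent designs needed for at least a `confidence`
    probability of finding one binder, given per-design hit rate p."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Illustrative per-design hit rates based on the numbers discussed above:
for label, p in [("prior computational (~0.1%)", 0.001),
                 ("Chai-2 (~20%)", 0.20)]:
    print(f"{label}: ~{designs_for_hit(p)} designs for a 95% chance of a hit")
```

At a 0.1% hit rate you would need roughly three thousand designs for a 95% chance of one binder; at 20%, about fourteen, which is why a 24-well plate becomes enough.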
And what this means is pretty profound for the industry, in
our view. You know, there are two ways to look at this. There's, of course, the faster,
better, cheaper, you know, this is going to allow us to make drugs against targets and get
them turned around faster. But I think the thing that we're really excited about and what's,
I think, more important is the entire class of targets that this will unlock in the future,
which have just been inaccessible to previous methods. And I think in general, in the biotech industry,
everybody's a little glum right now.
XBI hasn't done that well over the last five years.
I think we're in one of the worst markets in biotech over the last few decades.
But I think with Chai-2, we're starting to see, I think, those first early signs of a real
platform shift in biotech, the sort that comes around only so often.
You know, we had one in the '70s, with all sorts of new techniques then.
But the idea that in the next five, ten years, there are going to be entire new
classes of molecules that we're going to be able to discover,
and entire new targets that we're going to unlock, and entire markets that we can open up, and therapeutics that we can get to patients to really cure diseases that have had no cure before.
That's just an incredibly exciting prospect for us.
I want to come back to impact because I think the ramifications here are really huge.
But if we just go and think first about problem design: you, I think, looked at 52 problems.
Why that many?
And how do you specify a target?
I'm picturing something like "bind to epitope X," but I'm sure there are
other requirements you'd want to have as drug designers.
It's a great question, Sarah.
So in the Chai-2 paper, we look at over 50 targets.
Most of the existing papers in this area of doing AI for drug discovery are usually looking
at like one, two or three targets.
But again, it was important for us if we were seeing this as an engineering problem to make
sure that this is going to be generalizable.
It's like, imagine you had a new LLM paper and you said, oh, I solved like one problem
in the USAMO contest. Like, really, really cool.
It's like, no, you need a real benchmark, and you need to actually have that benchmark at scale.
You need to have enough problems to convince yourself that the system is working.
So that's why whenever we do these experiments, you know, sometimes we'll try one or two
to try just to make sure there's not like a huge bug and, you know, make sure not everything fails.
But, you know, even if everything fails in one or two, you know, if the hit rate's 50%, you could have just gotten unlucky.
So that's one of the reasons why we decided to do a big benchmark here, really convince ourselves things are working.
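The statistical argument for a large benchmark over a one-or-two-target pilot can be sketched with a quick binomial calculation (illustrative numbers only, assuming each target is an independent trial):

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def p_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# If a model truly works on 50% of targets, a 2-target pilot still
# fails on BOTH targets a quarter of the time:
print(p_at_most(0, 2, 0.5))  # 0.25

# Across 50 targets, seeing 10 or fewer successes would be vanishingly
# unlikely, so a large benchmark cleanly separates luck from signal:
print(p_at_most(10, 50, 0.5))
```

This is why a two-target result, pass or fail, tells you very little, while 50 targets give a tight estimate of the true hit rate.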
The way we selected the 50 problems, the biology people would laugh at this, and engineering people
would love it. We actually just went to the vendor catalogs to see what was in stock,
because we wanted to turn around this experiment quickly. We ordered all of these designs at the
same time. So we actually wrote a scraper that would go and see what was in stock. We would go and
pick out the protein. We would go look at what that protein sequence was. Now we need to make sure
this is held out from training as well, right? So we would take that protein sequence. We would go
compare it to a database called SAbDab. It's a set of antibody structures
from the Protein Data Bank. And we'd make sure that none
of these sequences were in there and that none of these sequences were actually even close to
anything in there. We removed things that had more than 70% sequence identity. So really things
that are like a bit different than what we could have trained on, then selected those, made our
designs, and then we shipped everything off to the lab. So we actually think it's possible that the 50%
is actually a lower bound, because we might have just messed things up because of how we set up
this experiment. We did not think about the biology. These are not necessarily things that are even
that useful for therapeutics. Some of these already even have drug programs against them. We were just
doing this really from a model assessment perspective. Let's understand how well the model is
working. Let's convince ourselves. Let's convince the community that Chai-2 is working. And then in
terms of applying this to problems, I think, you know, now we've got like hundreds of people that
want to go and like try the model tomorrow and apply it to the various drug programs that they're
working on. So that was really how we came up with those 50 tasks. Let's benchmark this and treat it
as an engineering problem. We have a broad audience for No Priors that ranges from
business people to engineers, machine learning researchers, and some scientists
in other fields.
Like, what intuition can you give listeners for how the model works under the hood?
Like, especially for anybody who might start with some familiarity with, like,
structure prediction models.
Yeah, well, structure prediction is really a key part in making these models work.
And it's actually the first thing we did when we started the company is we sprinted to
build a state-of-the-art structure prediction engine.
We actually open-sourced the first version of that.
It's called Chai-1.
And, again, like scientists around the world are using that now.
But structure prediction basically gives you an atomic-level microscope.
and it allows you to see where atoms are placed in 3D space.
So once you can do that and you have this microscope,
then the next question is, well, can we start moving those atoms around, right?
We can now start to make changes in a sequence,
and then we can see the ramifications of those changes in 3D space.
So the actual design model, you can think of it as you prompt it with some information,
like here's a target that we want to go and design an antibody against.
And then the model will try to place these atoms in 3D
space in order to satisfy that constraint.
We tell the model, here's the target, and I want you to make a molecule that, you know,
binds to that location, and then the model will go and generate both a sequence and a structure
that kind of fits into that.
So that's like the high-level intuition for this.
Yeah, one piece of intuition around that is that you can almost think about structure prediction
as the ImageNet moment for the field, where with structure prediction, we are asking a model
to go from sequence to a predicted structure, and it's sort of like a classification task.
And then design, where you're trying to design binders, that is much more like a generative
task. That's sort of like Midjourney for molecules, whereas with structure prediction, you are
looking to predict the placement of atoms in 3D space. With design, you're taking an existing
placement of atoms and you're trying to craft a new set of atoms that is complementary to
that original set. So one analogy that people like to use is that of a lock and a key. And that
when designing a protein or a drug, you have some target, which is your lock, and you're trying
to design a key using a generative model that fits that lock. And the way that the models
work is actually pretty interesting. They reason quite literally by
placing individual atoms in 3D space. And often they're getting the resolution of these
structures, the error down to less than the width of one atom when we look at the error across
the entire structure. So when we talk about atomic level microscope, you can see now why that
might be important for design, because how can you hope to be able to design the key if you
can't see the lock? Yeah, that's completely wild from a precision of prediction perspective.
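The sub-atomic error Jack describes is typically quantified as RMSD, root-mean-square deviation over matched atom positions. A minimal sketch of the standard metric, with toy coordinates; this is an illustration, not Chai's actual evaluation code, and production evaluation also involves optimally superimposing the two structures first:

```python
import math

Coord = tuple[float, float, float]

def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Root-mean-square deviation between two matched sets of 3D atom
    coordinates, assuming the structures are already superimposed."""
    assert len(coords_a) == len(coords_b) and coords_a
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy example: every atom off by 0.5 Angstroms in x, well under typical
# atomic dimensions (a carbon-carbon bond is about 1.5 Angstroms long).
predicted = [(0.5, 0.0, 0.0), (1.5, 1.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(predicted, reference))  # 0.5
```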
You know, if we analogize to LLMs, you know, you have learned grammar, syntax, semantics,
capabilities that emerge in the model that you can measure.
Is there anything that would be analogous in terms of emergent vocabulary or concepts that
you think Chai-2 has?
Yeah, I think this whole point about the atomic level microscope is actually that point, right?
There is something really, I don't know, I think deep.
We still don't fully understand it about like why these models work.
Again, we didn't even know this was possible.
Obviously, we tried it so we thought that there was a chance.
And I think it just tells you something about, you know, maybe the signature of how proteins interact with one another is really embedded in the data, right?
And we're generalizing to a new setting.
So it's not like the model has seen, you know, specific binders against the target.
And then we're just trying to do some in-domain generalization and walk through that space.
That's actually quite an impactful application as well.
And that's already being done throughout the biotech industry.
Our team published work on that years ago already.
But I think this really new frontier about generalizing to new space,
It tells us that, again, like the model is learning something really fundamental about
how the molecules interact with one another.
Again, it's able to generalize to problems that look very different in terms of how we
would actually organize them in biology.
Think about the whole rules about, you know, what we consider, like, a protein family
being different.
These targets that we tested on are, again, to a biologist, very, quote, unquote,
dissimilar from what we saw during training.
But it doesn't seem like the model thinks that way.
We actually even have a slide in our paper, in the supplement, where we actually look
at an even harder subset. So not looking at things that are, you know, up to 70% sequence similarity
with the model, but actually pushing all the way down to 25%. So really looking at tasks that are
very different from what we saw in training, and the success rate was basically the same. Like the model didn't care.
And again, I think that indicates something very profound about what the models are learning here.
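The held-out filtering described in this section (dropping any benchmark target above a sequence-identity cutoff, 70% in the main set and 25% in the harder subset) can be sketched as follows. This is a stand-in for illustration: `naive_identity` uses a crude string-matching ratio, whereas real pipelines compute percent identity from a proper sequence alignment with tools like MMseqs2 or BLAST.

```python
from difflib import SequenceMatcher

def naive_identity(seq_a: str, seq_b: str) -> float:
    """Crude similarity score in percent. A stand-in only: real pipelines
    compute percent identity from a proper sequence alignment."""
    return SequenceMatcher(None, seq_a, seq_b).ratio() * 100

def held_out(candidate: str, training_seqs: list[str], cutoff: float = 70.0) -> bool:
    """Keep a benchmark target only if it is below `cutoff`% identity
    to everything the model could have trained on."""
    return all(naive_identity(candidate, t) < cutoff for t in training_seqs)

# Toy sequences, not real proteins:
training = ["MKTAYIAKQRQISFVKSHFSRQ", "MKLVNNALLFAA"]
print(held_out("GGGGGGGGGGGG", training))            # unrelated: kept
print(held_out("MKTAYIAKQRQISFVKSHFSRQ", training))  # in training: dropped
```

Lowering `cutoff` from 70 to 25 gives the stricter subset the speakers describe.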
I mean, my assumption is the same here, where obviously the fastest path to immediate impact
is going to be, you know, antibodies in the clinic, or whatever other therapeutics Chai
and its partners work on. But it does raise a question: if the model has learned something
that fundamentally, like, the biology research community doesn't yet know from a principles perspective,
then we will also learn those rules from these models, or whatever the principles are of
structure and interaction. So I think that's super exciting. Yeah, totally agree.
How would you characterize the overall hoped-for impact of Chai-2 in terms of bringing
it to industry or your own programs? It's a great question, Sarah. So there's maybe two main areas
that we can break it down into.
The first one is, again, that we've turned it into an engineering problem.
Instead of spending months or sometimes even years trying to discover some molecule,
you know, now we can actually do it way faster, because the screening, if you will,
is happening on the computer instead of in the lab.
But the second area that we're actually even more excited about is how do we actually
solve problems, which just weren't even reachable with traditional methods?
The model is not perfect.
You know, it worked on 50% of the targets that we tried.
Maybe it would have been more, right, per the caveats
that we talked about before.
But, you know, we're at 50% of cases.
The failure mode of the models is going to be different than the failure mode in the lab today.
And I think that's really going to be the sweet spot to focus in on.
What are the areas that were not possible, you know, a few months ago, where now we'll be
able to actually generate potential molecules really quickly against them.
So those are the two areas.
You know, things that you can do today, let's do them a lot faster, and a lot cheaper.
But I think really the breakthrough opportunities are things that just weren't possible before.
Yeah, one other thing that we've announced is that we will be
opening up access to both academic groups and industry partners. I think when you think about
how this space is just going to evolve in the next few years and the amount of opportunity
that's out there, given this platform shift, there is way too much opportunity for any one
company to capture alone. And drug discovery itself is just an incredibly resource-intensive
process. And I think it would probably be a conceit to assume that we could go after and
pursue every target and run every program ourselves, even if we wanted to. And so when we
think about impact and think about what is going to move the needle for the company, of course,
but also for the world, we think that the way to do that is to go out and bring this to life
with a really exciting set of partners. And so we've opened up access. There's an access page on
our website, which people can go to and fill out. We're currently working through them; we've been
inundated with requests. But my hope is that we can really enable quite a few use cases with this
and do that quite quickly. What has the reception been like so far? What is the biggest objection?
Because this is a significant challenge to the ideas of high throughput screening or even
like the workflow that even innovative pharma and biotechs have today. Yeah, it's a great question.
Usually when these kinds of papers come out, again, people have tried to do this many times. The
critique is often, you know, does this really work? You know, you show this on maybe COVID,
for example. Is this going to work for a case where we have less training data? Are the molecules
going to be high quality? Do we really, you know, kind of believe the data? So I think the approach
we took, benchmarking this at scale, has really helped a lot with that reception. Like,
I think people really appreciated that approach, which has been great. Some of the questions people
have is, okay, like, I can already discover drugs. So, you know, so now I have AI that can do it a lot
faster. But does that actually change the kinds of molecules I can work on? And it goes back to what
we just discussed before. I think there are other folks that are responding to that saying,
no, the transformation is here: how about those projects that didn't work for you,
or where you're really struggling today? Now you've got another tool in the toolkit. And you kind of
have to use this tool now, or you might be left behind. So I think that it's been really
interesting to see the community kind of digesting this. Of course, a lot of the AI folks are
really excited, right? Like we're getting artificial antibodies before we're getting, you know,
maybe other breakthroughs we would have expected earlier. But it's overall been really exciting to see
that reception. I mean, our inboxes are just blowing up. Hundreds of
people have reached out to us for early access within hours of launching. We just announced. So I think we're
still kind of digesting all that. We're a small team. So we're prioritizing early access to
the right people. But we're really excited to kind of get the models out there and for them to
start solving some really hard problems in the drug discovery space. Is there an important future
for, like, large-scale wet lab screening? Does it just become a data
collection exercise to fill out the distribution for Chai models? Are there areas where you
will, you think, need that in 10 years, 20? Yeah, I think if you just take the models and then
you sample more, you probably will get a better result. So we tested only 20 molecules per
target in the paper, up to 20 molecules. You know, if you were to do 10 times that, a hundred
times that, orders of magnitude more, you'd probably just get into spaces with better and better molecules.
So, you know, the machine learning model is probabilistic. It's like using ChatGPT: if you're trying
to solve a math problem and then you look at the top one response or if you look at the top
10,000 responses, you're going to get a better result if you look at the top 10,000. You can't
really do that with a product experience on ChatGPT. I'm not going to look through 10,000 math
responses. I won't even know which one is correct. The cool thing with the lab actually is we actually
could just test all 10,000 of those in the lab. So I don't know if you have to, but that's definitely
something that is, I think, going to be tested out with these models. And I think the future of
high-throughput screening and how it kind of interacts with the models, I think that
question is still open, but I expect that, you know, people will be creative and will find ways
to actually take the best of AI and marry that with the best of biology to kind of push the
bounds forward. And just to add one thing to that, there is a whole host of really amazing
CROs and other players with incredible expertise running those traditional methods. And to Josh's
point, we have many, many companies asking us, can you run this not just 20 times, but can you run
this 100,000 times, even if it's going to work in 20, because I just might find something
better, right? And that something better can result in a better drug. That could be the difference
between getting a patient an antibody which requires an injection versus something which allows
subcutaneous dosing, for example. And so I think with these tools, you can sample the search space
sort of ad infinitum, and that marrying of traditional techniques and models will actually hopefully
get us into areas of this space, where we can just find better products for patients.
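The best-of-N argument in this exchange can be made concrete: if each design binds independently with probability p, the chance that at least one of n designs works is 1 - (1 - p)^n. A quick sketch with the hit rate quoted above; the independence assumption is an idealization, and as the speakers note, the point of sampling 100,000 designs is less about finding any binder than about finding a better one:

```python
def p_at_least_one(p_single: float, n: int) -> float:
    """Chance that at least one of n independent samples succeeds."""
    return 1 - (1 - p_single) ** n

# With a ~20% per-design hit rate, 20 designs already almost guarantee
# some binder; sampling 10x or 100x more is about finding BETTER ones.
for n in (20, 200, 2000):
    print(n, round(p_at_least_one(0.20, n), 6))
```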
I want to ask one more question generally about like predictions for biotech, and then I want
to talk about the future of chai as well. What do you think biotech looks like 25 years from now?
I realize that's a ludicrous question to anybody working in AI where you're like, hey, I didn't
know this is going to work at all last year. As I mentioned before, there is a lot of doom and gloom
in the biotech industry right now due to macro factors with rates where they are and the long-time
investment cycles that are required to make biotech viable, there is just a real pessimism in
the industry right now. It's sort of the worst market in a couple of decades. And I think that
it's moments like this, breakthroughs like this, which give us these flashes of light and these
reasons for just immense optimism about the future of this industry, not just in terms of
improving timelines and reducing costs, but also in terms of fundamentally enabling those new
products. And so if we think ahead over the next 25 years, you know, we've gone from a less than 0.1%
success rate to a close to 20% success rate in a year. Well, who's to say that in another year
that can't be a 50-plus or even a close-to-100% success rate? If you see our miniprotein
results: we are, I think, close to 70% on those, with picomolar affinities, like really, really
tight binders, for every single target that we tested. So all five targets that we tested worked,
and 70% of the designs that we ordered worked. I think there's no reason that for other
classes of molecules, those success rates can't be that high as well. And I think once you have
that, you really enter this era where you sort of have a computer-aided design
suite for molecules, in the way that, you know, we have maybe SolidWorks for mechanical engineering
or Photoshop for creatives. And that entire software suite will exist for biology.
I think the implications of that, the ability to design, program, understand the interactions
between atoms and molecules at the most fundamental level are pretty vast and should just give us a lot
of hope and excitement about what's about to happen. We were just talking last night
about maybe we should be getting baseball caps saying "bullish on biotech" on
them. Because I think this is one of those special moments. We've heard
from many others writing into the company that this has really shifted their opinion.
If you think about going from antibodies to, you know, obviously better success rates, and then
also other therapeutics, is there a difficulty hierarchy we should have in our minds, or is it
just, like, unexplored space in terms of enzymes and peptides, small molecules,
other domains?
Yeah, it's actually a lot more than just success rates.
There's lots of properties that need to be optimized for a molecule.
You know, finding a drug is like looking for a needle in a haystack.
And I think we've really passed through massive swaths of that sequence space with Chai-2,
right, by really focusing in on the things that bind.
That's why, like, a lot of the search space has to be searched in the lab today.
And we're going deeper into other properties as well.
Let's make sure that these antibodies can be manufactured well.
Let's make sure that they can be really stable.
So there's lots of other properties that we're excited about.
So stay tuned for that.
And then the other thing is actually there are next generation antibody formats even.
So what we predict will happen is people probably won't be as interested in the clinic for things like monoclonal antibodies.
These are antibodies that are hitting, for example, like a specific epitope on a protein.
But now if we can make antibodies much faster and more easily, you can imagine a future where if I want to hit a target, let me choose two different parts of that target, make two different proteins that are hitting them.
Like, basically, two different primitive antibodies.
And let me bring them together.
This is called a bi-paratopic, two paratopes,
so basically two different antibody interactions.
And that kind of stuff is going to become a lot easier to do today.
I think these days there's a lot of trade-offs that get made in biotech
about, like, you know, risk on your target, risk on your discovery process,
how hard is it going to be to make a molecule?
And I think AI is going to raise the bar across the board.
I think about the "bullish on biotech," you know, movement that
Jack is announcing here as well.
If we think about what that could even represent, there's right now a lot of risk in biotech.
There's a lot of crowding on the same kinds of targets.
The risk actually starts to go down in terms of discovering some of the stuff.
Maybe there's still clinical risk if you try something that's like totally new that people
haven't done before.
But we've just like opened up, I think, the aperture of opportunities that that can be pursued
here.
And that's something that I think is really exciting.
So there's still a lot more work to do for us to validate that like all this is going to be
possible. But I think just the pace at which the field is moving, just gives us a lot of
optimism for what's going to be possible next. And maybe I can just share one anecdote about
why we are so optimistic. We had a partner come to us as we were in the process of building these
models. We didn't even really know. We hadn't had back our first few batches of data so we didn't
know if it was going to really work yet. But this partner had been working on this problem for
a few years. They had a team of, I think, five to ten people working on it. They estimated that
fully loaded. All of those, those people might have set the company back with the experiments
that they had done as well, maybe $5, $10 million. And it was a problem where they wanted to
build a molecule that cross-reacts against two different species. So both a human form and a
cyno or a monkey form of this protein, such that when they put this molecule into animal testing,
if, you know, they didn't want it to fail because the monkey has a slightly different version of
that protein than the human.
does. So they were really struggling to get this to work for whatever reason. And we put this into the
model and just prompted the model to design for these two targets at the same time, not just one
target. So you can imagine that this is a slightly more sophisticated challenge than just designing
against one. We actually ordered only 14 sequences to the lab. And I think four of those
were hits to the human. One of those was a hit to the cyno. One of them actually overlapped
and hit both. That one now allows us to move forward with that program and gives us a whole
host of diversity around that molecule that one can explore as well. First of all,
that's very cool. And second, I think it's interesting that a lot of industry observers would say,
like, the bottleneck in pharma and the expense in pharma is clinical, not discovery. And like,
I think you're pointing to the fact that, well, we can design for the clinic, right? And actually,
it's intuitive, but the argument from people bearish on biotech, or
concerned about the ability to make progress in programs and reduce cost for
any given successful drug, is: well, you know, if discovery had less risk, as Josh was pointing
out, which is like a huge claim, then the entire industry is more efficient, right, and more
effective. That's the hope. Yeah. And I think we've got a lot of reason to be
optimistic. I also don't want to oversimplify things. You know, there's lots of other things that go
into making a drug. There's capital markets that go into this. You know, there's tons of clinical
risk. This is really just the tip of the iceberg. But we're really excited about the progress that
this could represent. I want to ask strategically, like where Chai invests from here. So you talked
about other attributes that you want to be able to design in Chai models. But if we just look
at this generically as like an AI model company, where do you think the defensibility is?
There are two key areas of investment for the company.
I think, firstly, what comes out of these models, these just aren't drugs yet.
They're hits, they're antibody hits, but there's a lot more work to be done to actually turn
these into viable molecules that we can put into humans.
We have early data, which we put in our preprint, to suggest that a lot of the properties
that one might want from a drug, these molecules actually have.
But we need to do a lot more characterization and assays to convince ourselves
that we can do that.
And then I think there's also the next stage beyond that, which is actually designing entire
drug candidates zero-shot, right out of the
models.
And I think a few months ago, we might have said this was a pretty futuristic idea.
And nobody in the company was really talking much about this.
But I think once you see these results and grapple with the implications, the fact that we can
get antibody hits in just 20 attempts, there's no reason that we can't generate
entire drug candidates in that same number of attempts.
So I think there's going to be some key investments there. And really,
the model right now is a model;
it's not really a product.
Well, it is a product, and it's certainly useful today.
But the product can get a lot better with more investment into just making
sure we can optimize all the therapeutic properties that people care about.
And then, of course, there's the entire interface and software layer around that
to make this really easy to use.
and the real platform that goes around supporting that.
So, you know, how do you, if you want to hit two targets, design a molecule that hits both?
How do you specify that in the software?
This is going to be a sufficiently advanced piece of software.
It's going to become as advanced as a Photoshop over time.
And as we build that out, I think we're going to need to make some really core investments into just the engineering and the product to ensure
that we are building software that we ourselves and others will really love to use.
Yeah, one thing to add on to that as well: we released Chai-1 open source.
We thought of it as a model, and I think Chai-2 is a lot more than a model, right?
It's become a product.
It's actually a bigger pipeline that comes together to even make this happen.
And it also becomes trickier to use these models.
Protein folding, you put in your sequences, you get out a structure.
Design is a different story, right?
Actually specifying the prompt on its own is a challenge.
We did that programmatically in the paper to go and assess this thing at scale,
but a scientist who wants to use this to initiate a drug discovery program
is probably not using a script to come up with that prompt;
they're probably going to be really thoughtful about it.
And I think that's why investing in the product layer here is really important.
And not to mention, it's only going to get more complicated from here, right,
as we start to support more advanced drug modalities,
as various properties come online.
We actually show some early evidence of this in the white paper.
For example, you know, you might want to actually optimize for multiple proteins at the same time.
Sometimes, you know, actually it's a good time to be a sick mouse.
In order to have a human drug, it usually needs to work in animals as well.
And sometimes drug programs actually get stuck there.
It's like, okay, guys, like we either have a mouse drug or a human drug.
It's really hard to get both.
And there are actually some cases where people have to discover two different drugs.
They have what they call a surrogate antibody.
I'm going to, like, make the mouse version.
I'm going to study that, convince the FDA that, like, this mechanism works.
But you're even taking risk there.
You're like, maybe this molecule works, like, slightly differently.
We literally show that example in the paper of optimizing, though we don't do mouse.
We actually do monkey.
So, like, monkey and human together.
But you can throw other species into that as well.
Sometimes we've got the opposite problem.
I want to hit this protein.
I don't want to hit this other protein.
We've got some early evidence as of late that that's possible as well.
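Chai's actual prompt format isn't described in this conversation, so purely as a hypothetical illustration, a multi-objective design spec of the kind Josh is describing (bind these targets, avoid that one) might look something like this; every field name here is invented:

```python
from dataclasses import dataclass, field

@dataclass
class DesignSpec:
    """Hypothetical multi-objective prompt for an antibody design model.

    Field names are illustrative only; they do not reflect any real
    Chai interface.
    """
    epitope: str                                 # region of the target to bind
    bind: list = field(default_factory=list)     # must bind all of these
    avoid: list = field(default_factory=list)    # must NOT bind any of these

    def validate(self):
        # A target can't be both required and forbidden.
        overlap = set(self.bind) & set(self.avoid)
        if overlap:
            raise ValueError(f"targets in both lists: {overlap}")
        return self

# Cross-reactive design: hit the human and cyno forms, spare a close paralog.
spec = DesignSpec(
    epitope="membrane-proximal loop",
    bind=["TARGET_HUMAN", "TARGET_CYNO"],
    avoid=["PARALOG_HUMAN"],
).validate()
print(spec.bind)
```

The point of the sketch is only that the prompt becomes a structured, multi-constraint object rather than a single sequence, which is exactly why the product layer matters.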
And these sorts of things, you know, the prompts are just a lot more complicated.
And it means that you need to have, like, the right product.
And what happens when you start doing those experiments in the lab?
We want the models to learn from that and then help us really be like a co-pilot
in driving, like, the next stage of those designs as well.
You know, all of this is, again, it's more than just the models.
It's really thinking about those workflows as well.
And it's even about just getting that word out to people and having them think about
this as a new tool in their stack.
What happens if you're an antibody engineer and you've been doing things in a certain way
for the past 30 years?
And now there's a new paradigm in discovering drugs.
Like, that itself is actually a problem
that a company needs to solve.
So these are all different areas that we're investing in right now.
That actually raises a question I was going to ask you: if you are an antibody
engineer or a biologist today, what advice would you give, you know, let's say they believe you
about how much is going to change and that this, like, CAD for biology, a software suite,
is coming into existence?
Like, what should they learn, be good at, like, go study?
Well, number one, get access to Chai-2.
Number two, you know, figure out how to get your prompts right and actually take full advantage
of it. And then I think number three, you know, start dreaming about the new possibilities.
You know, it's interesting. We've talked to a lot of antibody engineers since starting the
company. And we've been alluding sometimes to, you know, what we're doing here, you know,
sometimes you do the market research question. You ask, you know, suppose you had, like, a 1%
success rate for antibodies, what would you use that for? The conversations are changing now
that, first of all, it's not 1%, it's 10%, and, like, people see that it's working.
I think that creativity is really being unlocked, even ourselves, right?
I think when people are thinking about the answer to that question, there's always some
big doubt in your mind. It's like, ah, it's a hypothetical question, you know, your neurons
are not activating in the same way as actually doing something with it. It was the same thing with
LLMs. It's like, imagine asking someone five, 10 years ago, oh, you know, if we could predict
the next word in a sentence perfectly, like, what would you do with that? It's actually
very hard to imagine until you start playing with the models, even our team internally. You
know, now, even without sending it to the lab, you know, we can, again, choose some targets,
choose some prompt, generate stuff against it. You start to look at the generations that are coming
out of the model and you're like, oh, wait, I can actually solve this problem by like choosing
the right epitope on a target, choosing the right part of the target. Like, these two targets are
different. Like, sure, we have an engine where the model can optimize for one or optimize
for both, you know, or selectivity-optimize for one and not the other. But you can actually get
a lot of that by choosing your prompt in a smart way. So let me hit a part of that protein that is
quite different between the two things or quite similar between the two things.
These are the sorts of realizations that in retrospect are quite obvious, but they don't really
hit you until you actually start to like use a product like this yourself.
So I think people are just, once they get their hands on this, I think they will, they'll start
to dream of the new possibilities.
I think it just really raises the bar.
You know, the people who are most excited about that are often these antibody engineers
and these biologists.
A lot of the work that they're doing today is painstaking,
and they're not the biggest fans of these slow feedback loops and these intractable problems,
because many of them that we speak to are just really motivated to solve a particular task.
And so, you give them... I'm an engineer; you give me a tool which means I have to write less code?
I love that.
I can now think more about system design and architecture and more complex products and all these
other things, but it's really going to raise the bar for a lot of these people.
And I think people are only really now starting, as Josh said, to think through all the possibilities.
I was on a call a few weeks ago where people were saying, when do you think this is going to happen?
They said, oh, not for three to five years, this is a really futuristic idea.
And then a couple weeks later, you show them what we have and they sort of fall off their chair.
And so there's going to be a sort of joint effort with us alongside these real domain experts to actually figure out these key application areas.
because biology is so vast and so complicated that actually there is so much knowledge that
so many of the practitioners, the specialists have, that no one company will just ever possess,
which is why we're so excited to go out and be partnering with people to really bring this to life.
I want to ask a couple questions more, just specifically about company building before we run out of time.
And maybe, Jack, I will start with you, our amazing engineer.
And then you guys also have, like, a very software-oriented team working on biology
problems. Some of those people come from, you know, long-term research in that space in particular.
But for yourself, Jack, like, as you said, you're a software person. How do you get up to speed on the
bio area to go do leading work? Well, I think it's two things. First of all, ramping up on any new
field is always just a total fight. You have to get to the frontier, to have read the right
papers, and to be knowledgeable about the areas that you need to learn. You just have to sort of put
your shoulder down and push through. And there are waves of excitement and misery in that
experience, but you can get there fast if you really set your mind to it. And I'd say the
second part is that surrounding yourself with just the most incredible team is the best thing
that you can do, far beyond anything that you can learn by yourself. And we have certainly
the most special group of people that I've ever worked with within the company, our co-founders
Matt McPartlon and Jacques Boitreaud, who are just rare talents.
And then, you know, the entire team beyond that, some of the former heads of AI at other drug discovery companies, some of the top open source contributors. The team is so multi-talented. It's small, around a dozen people, but mighty. And I think, as we've seen in other areas of AI, small but mighty teams can go a really, really long way these days. And so, you know, I think there are actually surprisingly few people on our team even with a computer science degree. Josh himself
got a chemistry degree, Alex got his PhD in physics, a whole host of others. But
this work is so interdisciplinary that it really requires that breadth of knowledge across
biology, chemistry, physics, artificial intelligence, computer science, engineering.
It really takes a village, and everybody is learning from each other every day because
of just how vast that subject matter is that one has to have a command of.
I think we've also benefited from such immense focus as well.
Everyone has been so passionate about trying to solve this problem.
And I think I really credit that as a huge reason why we were able to achieve it.
And we've also got a team that because of that focus is very engineering-centric as well.
So if you look at the whole team, you know, we have a very research-oriented team right now.
But everyone is a stellar engineer as well and takes that very seriously.
So it's not everyone solving, you know, their favorite pet problem.
We are all going after the same problem and solving that together.
And even just 10 people solving a problem together,
there's a lot of code being written every day.
You have to be very thoughtful about how that all comes together and interacts.
And I think especially in our next phase of growth for the company
as we start to invest more and more in product and the velocity around that
and getting this into folks' hands,
that's just going to become even more important.
How do we make sure that the latest research breakthroughs that we're shipping internally
are actually making their way into partners'
hands? That's something that, again, we are very thoughtful about at Chai and take very seriously.
Yeah, I also remember Jack in our office at Conviction, like, debating the merits of dev containers
with some of your scientist teammates at the very beginning of the company. And both of you from the
beginning, you know, talked a lot about platform investment. And so I actually think that's like a little
bit sort of unconventional in terms of such a research-oriented team to say, like, we need to make
this platform investment. Can you talk a little bit about that?
Yeah. So I've gone through the experience of going from zero to 100 on large engineering products before. I worked on Stripe Link, which was a multi-year project, and again on Stripe Capital, with engineering teams scaling from zero to 25, 50 people by the time we were done there. Same for Link, maybe more. And I think you just learn that unless somebody is really taking
care to keep the entire system in their head and is an effective technical steward of the
architecture, things just evolve, and the sort of entropy of the software takes over and slows
down your rate of progress to zero because nobody can get work done anymore. And so somebody
needs to keep the entire system in their head, and the interaction between all those components,
and make sure that people who are working on individual subcomponents of your codebase
can minimize the amount of context that they need to load into their heads to understand
how to accomplish their task. So these are just the principles of really, you know, it's pretty
basic. It's just simplicity and modularity, but making sure that's a practice, a kind of
cultural practice, that everybody's on the same page about investing in that,
and that, you know, people aren't cutting corners. They see it as their responsibility to lay the
groundwork for the next person. And this is doubly hard to do in deep learning codebases because often
if you introduce a bug or write a regression, you won't know for weeks that that has shown up.
It's sort of terrifying because, you know, you could spend a million dollars on a training run
with a bug that crept in four weeks ago. We've literally had to do this in Chai's history:
we've had to go and bisect Git history, launching training runs in a sort of binary search
to identify a small enough range of pull requests, then go to that pull
request and identify the bug.
And I think it's those sorts of experiences, and the cost of finding
that bug (I'm not sure if it was millions of dollars, but it was certainly tens of thousands
of dollars of compute time) to go back and find that thing.
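The bisection Jack describes is, at its core, binary search over an ordered commit history. Here is a minimal sketch; the `is_bad` callback is a stand-in for the expensive step of checking out a commit, launching a short training run, and inspecting its metrics:

```python
def bisect_commits(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    `commits` is ordered oldest -> newest, and the history is assumed
    to flip exactly once from good to bad. `is_bad` stands in for an
    expensive check, e.g. a short training run evaluated on its loss.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression came after mid
    return commits[hi]

# Toy usage: commit "f" introduced the regression.
history = list("abcdefgh")
print(bisect_commits(history, lambda c: c >= "f"))  # -> f
```

In practice `git bisect run` automates exactly this loop; the costly part is that each `is_bad` evaluation here is hours of GPU time rather than a quick test suite.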
It's experiences like that which I think make rigor such an important practice in the company,
engineering rigor. And so, being rigorous about things... I think, you know, some people are
surprised to learn that even though we do deep learning, we are pretty rigorous about writing
unit tests for everything. But I think these basic software engineering practices are actually
sort of lacking from most research code bases. And so bringing in some of those basic principles
has allowed us to move very fast and not just fast in the short term, but should give us a mechanism
to compound on that investment over time.
And it's overall very aligned with your mission of turning biology from science to engineering, right?
It makes sense that it would go through the core of the company's practice, too.
I have two more questions before we run out of time here.
The first is, you know, you talked about the expense of, like, training experiments.
Like, what's your decision framework for, like, how quickly to scale compute or, you know,
parallelize experiments here?
Yeah, we've tried to set up the company in a pretty scrappy way.
Actually, when we were getting started, we should have talked about this earlier as well.
You know, it wasn't even clear the company would be based in San Francisco when we started.
And, you know, back then, we hadn't really raised capital for the company
yet.
We were kind of, like, using free compute credits from the cloud providers.
I think for us, it's just about being, again, laser focused on solving the problem
and just, like, really making the case, like, why are we doing something?
And I think that, you know, if that's reasonable, we'll go and invest in it.
Again, on an engineering problem, if it's, you know, kind of clear you're seeing signs
of life, you're seeing some scaling law, whatever it is, like, let's go as fast as possible to
make that work. But let's also not get distracted, like, scaling something out if we are not
convinced that it's going to work. So I think that, you know, kind of scrappy culture
on the, you know, where-are-we-spending side. It kind of goes hand in hand with making really
fast progress because it means we have a high bar for like where we're spending our time.
Everyone on the team works extremely hard. You know, there's people in the office,
like, you know, at all times of day, all times of night. And it's pretty beautiful
to see that. So we work hard, but I think we also work really smart. And I think you have to do that
to make progress with how fast the field is moving right now. You now see signs of life. You're
very bullish on biotech. That also means, like, given that you are going to try to scale
to support, you know, demand from the industry and your own efforts, who are you looking to hire now?
We're really hiring across all functions right now. So we've made some really big breakthroughs
here on the AI research side.
And as we take that to the next level and try to get Chai-2 in front of the right
partners, we're hiring for product engineering, for antibody engineering, for business
development, account executives.
Like there's a, there's a whole host of roles that are open on our site right now.
And again, this work is extremely interdisciplinary.
And we really want to build this in a thoughtful way so that we can make, you know,
Chai-2 as useful as possible for the industry.
Well, thanks for doing this, guys, and congratulations on, you know, progressing the frontier of AI discovery.
Thanks so much for having us on, Sarah. It's been really fun.
Thank you, Sarah.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel. If you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.