No Priors: Artificial Intelligence | Technology | Startups - Chai-2: The AI Model Accelerating Drug Discovery with Chai Discovery Co-Founders Jack Dent and Joshua Meier
Episode Date: July 3, 2025

AI has already fueled breakthroughs in biotechnology—but now, further advances in AI are poised to fuel pharmaceutical discoveries as well. Sarah Guo sits down with Joshua Meier and Jack Dent, co-founders of Chai Discovery, whose newly launched Chai-2 designs bespoke antibodies that bind to their targets at a jaw-dropping 20% rate. Jack and Joshua talk about the implications of Chai-2’s success rate at discovering antibodies for the pharmaceutical industry, how structure prediction is pivotal in making the model work, and future potential for using the model to optimize other molecular properties. Plus, they talk about what they believe bioscientists should be learning to best utilize Chai-2’s technology.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @_jackdent | @joshim5

Chapters:
00:00 – Joshua Meier and Jack Dent Introduction
01:09 – Genesis of Chai Discovery
06:12 – Chai-2 Model
10:13 – Criteria for Specifying Targets for Chai-2
13:12 – How the Chai-2 Model Works
16:12 – Emergent Vocabulary from Chai-2
18:15 – Hopes for Chai-2’s Impact
20:33 – Reception of the Chai-2 Model
22:16 – Future of Wet Lab Screening and Biotech
27:08 – Optimizing Other Molecule Properties
31:37 – Where Chai Invests From Here
36:20 – What Bioscientists Should Learn for Chai-2
40:23 – How Jack and Josh Oriented to the Biotech Space
43:38 – Platform Investment and Chai-2
46:53 – Scaling Chai Discovery
48:21 – Hiring at Chai Discovery
49:09 – Conclusion
Transcript
Hi, listeners. Welcome back to No Priors.
Today, I'm excited to speak with Josh Meier and Jack Dent, two of the co-founders at Chai Discovery
and former bio-AI and engineering leaders at Meta AI and Stripe.
This week, Chai released their industry-leading Chai-2 zero-shot antibody discovery
platform, which at its core is a generative model that can design antibodies that bind to
specified targets with 100-fold the hit rate of prior computational approaches. We'll talk about
their product, The Next Frontier for Chai, why they're bullish on biotech, and why the most
effective antibody engineers will soon be working as expert prompt engineers. Jack, Josh,
congrats on the Chai-2 launch. Thanks for doing this. Welcome. Thanks for having us, Sarah. We're
excited to be here. Good to be here. Josh, I'll start by just asking, you know, you and several of
the scientists on the team have been working on AI drug discovery for about a decade now in different
settings. I have also been looking at this area for over a decade. We haven't yet seen successes
of drugs to market that were designed, you know, with these AI computational techniques. What
made you believe? Why start the company when you guys did? That's a great question. So many of us
have been working in the space for a while and, you know, didn't start a company because it was really a
research idea, I think, until very recently. You know, there were signs of life that someday this was
going to work, but it wasn't really on the timeline of a company, right? You can't really start a company
thinking that 10 years from now things are going to work.
You also don't want to start a company after it's already working and kind of miss the boat.
So the sweet spot is like, okay, we have like maybe one, two years that we have to really get
this off the ground.
And when we started the company, we made a bet that it was going to work. There were really a
couple of things that fueled that decision.
The first one was we made a bet that structure prediction, protein folding was going to get
a lot better.
So obviously, protein folding was considered solved a couple of years ago, around 2020.
You had the breakthroughs of AlphaFold 2, being able to predict protein structures with
experimental accuracy, but it was just a single protein structure at a time. So we can take a single
protein sequence, and we can see what that protein looks like. That's very useful for basic biology,
so we can understand what the proteins we're looking at look like. But if you think about
drug discovery, which is where, you know, we're really focused at Chai Discovery. In drug discovery,
you need to understand how multiple molecules interact with one another. So you need to understand
how a small molecule drug is going to modulate a protein or how an antibody protein is going
to modulate an antigen protein. So we started to see early signs of life that that was
going to be possible. And again, we made a bet that we would be able to take this to the next level
with the kinds of breakthroughs that we were seeing around diffusion models and around language
models. The previous generation of structure prediction models would really just predict, you know,
like one conformation of a protein at a time. It's kind of like one view of a protein. It's like the early
image models, before diffusion models. You weren't really able to look at the diversity
of generations that could come out. And we thought the same thing would impact drug discovery
and protein folding as well. So that's a bit of color on how we decided to start the company
and we did. And maybe lastly, I should say, almost every AI bio company before us has had some
kind of very tight lab integration with what they are doing. And it's almost too tight. I think the
lab integration is great. We do a lot of lab experiments at Chai. But the thing that was missing was
could you actually have some kind of portable AI platform, something that would actually be
generalizable and could be applied to lots of different areas? If you could do that, it means that
your impact could really be taken to the next level. We can take Chai 2, the model that we've just
released, and we can deploy it to hundreds of different projects, thousands of different projects.
Chai-1, which we open sourced, is already being applied throughout the industry to tons of different
problems. We don't even know everything it's being applied to because it's open sourced.
But that was something that was also really important to us if we were going to kind of
see this transformation of biology from a science into more of an engineering discipline,
which is ultimately the goal of the company.
Yeah, I want to come back to what you said about lab integration as we talk more about
the technical approach here.
But, Jack, you and I met in the context of, you know, you being a beloved engineering and product
leader at Stripe coming from the engineering side and looking for, like, the most interesting
problems to work on in AI.
Why did you decide to work on this versus some of the other things we
were talking about, like codegen and such?
Yes, as you know, Sarah, I spent quite some time thinking about my next steps and what I wanted
to do with my life after the period I was at Stripe.
And I give a lot of credit to Josh, actually, for this. You know, we were good friends
going back even to college.
We were pset buddies at Harvard, in many of the same classes together.
I was maxing out the CS curriculum.
Josh was also doing that somehow for the chemistry and physics and all the other scientific
curricula as well.
But we had landed in a lot of the same classes.
And as we went to our separate ways after college,
we really just made a point of keeping in touch every, you know, three, six months.
And Josh would always talk to me about his research.
Once it became clear that the research that Josh and others were doing in this space
was really no longer just a toy but was really going to impact and change the entire industry,
that idea became infectious, right?
It sort of becomes impossible to unsee the future once you have that glimpse. And although
we didn't know until very recently that any of this was going to work, and of course
there's still a lot left to prove, once you start to grasp the implications of the fact that
over the next few years we are going to have the ability as a human race to engineer molecules
with atomic precision, it's almost hard to work on anything else with your life. The
impacts for society broadly, and human health, but not just health. There are a ton of
other areas which this will touch, which we can get into. But that is just a platform shift in an
entire industry. And so put that together with the fact that the kind of the belief or conviction
that you might just be able to get it working. And I think it was impossible to say no to working
on this in many ways. So this is a breakthrough result in Chai-2. Can you give us a sort of layperson's
explanation of what the result was and the model itself and what you think is the most valuable
part? Sure. Chai-2 is our latest series of models, which are state-of-the-art across a number of
different tasks, but specifically the one we're most excited about is design. And what we've shown
is that we can design a class of molecules known as antibodies, which are some of the most
therapeutically interesting molecules as well. These account for close to 50% of all recent drug
approvals. And seven of the top 10 best-selling drugs out there are actually antibodies. And so
what we've shown with Chai 2 is really the ability to design antibodies against targets that one
wants to go after, in just a small, what we call a 24-well plate, in just 20 attempts. What this means
is that we take a target, run our models, ask the models to design an antibody, we then
ship that antibody to the lab, we have about a two-week validation cycle in the lab, and two weeks
later, we see that roughly close to 20% of these antibodies actually bind their targets
in the intended way.
So, Chai-2 is a major breakthrough for the field.
When we set out on this project, we were actually only targeting a success rate of 1%.
That was the company-wide goal for the entire year.
And the reason we set that goal of 1% is that previous attempts at this problem were maybe successful around 0.1% of the time, or even lower.
And those are the computational techniques.
If you look at the traditional lab-based high-throughput screening techniques, people are really screening between millions or billions of compounds just to find one molecule that sticks.
There's a reason we call it drug discovery.
It's a discovery problem.
It's a search problem.
and so people are really just sort of panning for gold in these massive yeast or phage libraries
or alternatively you might inject a mouse or a llama, you might wait a couple of weeks for them
to get really sick, you might then bleed them, take their plasma, take the antibodies out
and isolate them. And this is actually what we did for COVID. We took some
humans who had already gotten COVID, took their antibodies out of them, and tried to find one which
then neutralized the virus. So you can imagine it's not an ideal,
or the most efficient, or the most principled process.
And so what we've shown with Chai-2 is that we've been able to increase these success rates
in discovering antibodies computationally by multiple orders of magnitude compared to the prior
state of the art computationally, and by many, many, many orders of magnitude compared to the
traditional lab-based alternatives.
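To put those orders of magnitude in perspective, here is a back-of-the-envelope calculation: with a per-design hit rate p, the number of independent designs you would need to test for a 95% chance of at least one binder follows from the geometric distribution. This is an illustrative sketch only, with round numbers taken from the conversation and an independence assumption that real designs do not strictly satisfy.

```python
import math

def designs_for_hit(p: float, confidence: float = 0.95) -> int:
    """Number of independent designs needed for at least a `confidence`
    probability of finding one binder, given per-design hit rate p."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Illustrative per-design hit rates based on the numbers discussed above:
for label, p in [("prior computational (~0.1%)", 0.001),
                 ("Chai-2 (~20%)", 0.20)]:
    print(f"{label}: ~{designs_for_hit(p)} designs for a 95% chance of a hit")
```

At a 0.1% hit rate you would need roughly three thousand designs for a 95% chance of one binder; at 20%, about fourteen, which is why a 24-well plate becomes enough.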
And what this means is pretty profound for the industry, in
our view. You know, there are two ways to look at this. There's, of course, the faster,
better, cheaper, you know, this is going to allow us to make drugs against targets and get
them turned around faster. But I think the thing that we're really excited about and what's,
I think, more important is the entire class of targets that this will unlock in the future,
which have just been inaccessible to previous methods. And I think in general, in the biotech industry,
everybody's a little glum right now.
XBI hasn't done that well over the last five years.
I think we're in one of the worst markets in biotech over the last few decades.
But I think with Chai-2, we're starting to see, I think, those first early signs of a real
platform shift in biotech, the sort that comes around only so often.
You know, we had one in the '70s, with all sorts of new techniques then.
But the idea that in the next five, ten years, there are going to be entire new
classes of molecules that we're going to be able to discover,
and entire new targets that we're going to unlock, and entire markets that we can open up, and therapeutics that we can get to patients to really cure diseases that have had no cure before.
That's just an incredibly exciting prospect for us.
I want to come back to impact because I think the ramifications here are really huge.
But if we just go and think first about problem design: you, I think, looked at 52 problems.
Why that many?
And how do you specify a target?
I'm picturing something like "bind to epitope X," but I'm sure there are
other requirements you'd want to have as drug designers.
It's a great question, Sarah.
So in the Chai-2 paper, we look at over 50 targets.
Most of the existing papers in this area of doing AI for drug discovery are usually looking
at like one, two or three targets.
But again, it was important for us if we were seeing this as an engineering problem to make
sure that this is going to be generalizable.
It's like, imagine you had a new LLM paper and you said, oh, I solved like one problem
in the USAMO contest. Like, really, really cool.
It's like, no, you need a real benchmark, and you need to actually have that benchmark at scale.
You need to have enough problems to convince yourself that the system is working.
So that's why whenever we do these experiments, you know, sometimes we'll try one or two
to try just to make sure there's not like a huge bug and, you know, make sure not everything fails.
But, you know, even if everything fails in one or two, you know, if the hit rate's 50%, you could have just gotten unlucky.
So that's one of the reasons why we decided to do a big benchmark here, really convince ourselves things are working.
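The statistical argument for a large benchmark over a one-or-two-target pilot can be sketched with a quick binomial calculation (illustrative numbers only, assuming each target is an independent trial):

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def p_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# If a model truly works on 50% of targets, a 2-target pilot still
# fails on BOTH targets a quarter of the time:
print(p_at_most(0, 2, 0.5))  # 0.25

# Across 50 targets, seeing 10 or fewer successes would be vanishingly
# unlikely, so a large benchmark cleanly separates luck from signal:
print(p_at_most(10, 50, 0.5))
```

This is why a two-target result, pass or fail, tells you very little, while 50 targets give a tight estimate of the true hit rate.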
The way we selected the 50 problems, the biology people would laugh at this, and engineering people
would love it. We actually just went to the vendor catalogs to see what was in stock,
because we wanted to turn around this experiment quickly. We ordered all of these designs at the
same time. So we actually wrote a scraper that would go and see what was in stock. We would go and
pick out the protein. We would go look at what that protein sequence was. Now we need to make sure
this is held out from training as well, right? So we would take that protein sequence. We would go
compare it to a database called SAbDab. It's a set of antibody structures
from the Protein Data Bank. And we'd make sure that none
of these sequences were in there and that none of these sequences were actually even close to
anything in there. We removed things that had more than 70% sequence identity. So really things
that are like a bit different than what we could have trained on, then selected those, made our
designs, and then we shipped everything off to the lab. So we actually think it's possible that the 50%
is actually a lower bound, because we might have just messed things up because of how we set up
this experiment. We did not think about the biology. These are not necessarily things that are even
that useful for therapeutics. Some of these already even have drug programs against them. We were just
doing this really from a model assessment perspective. Let's understand how well the model is
working. Let's convince ourselves. Let's convince the community that Chai-2 is working. And then in
terms of applying this to problems, I think, you know, now we've got like hundreds of people that
want to go and like try the model tomorrow and apply it to the various drug programs that they're
working on. So that was really how we came up with those 50 tasks. Let's benchmark this and treat it
as an engineering problem. We have a broad audience for No Priors that ranges from
business people to engineers, machine learning researchers, and some scientists
in other fields.
Like, what intuition can you give listeners for how the model works under the hood?
Like, especially for anybody who might start with some familiarity with, like,
structure prediction models.
Yeah, well, structure prediction is really a key part in making these models work.
And it's actually the first thing we did when we started the company is we sprinted to
build a state-of-the-art structure prediction engine.
We actually open-sourced the first version of that.
It's called Chai-1.
And, again, like scientists around the world are using that now.
But structure prediction basically gives you an atomic-level microscope.
and it allows you to see where atoms are placed in 3D space.
So once you can do that and you have this microscope,
then the next question is, well, can we start moving those atoms around, right?
We can now start to make changes in a sequence,
and then we can see the ramifications of those changes in 3D space.
So the actual design model, you can think of it as you prompt it with some information,
like here's a target that we want to go and design an antibody against.
And then the model will try to place these atoms in 3D
space in order to satisfy that constraint.
We tell the model, here's the target, and I want you to make a molecule that, you know,
binds to that location, and then the model will go and generate both a sequence and a structure
that kind of fits into that.
So that's like the high-level intuition for this.
Yeah, one piece of intuition around that is that you can almost think about structure prediction
as the ImageNet moment for the field, where with structure prediction, we are asking a model
to go from sequence to a predicted structure, and it's sort of like a classification task.
And then design, where you're trying to design binders, that is much more like a generative
task. That's sort of like Midjourney for molecules, whereas with structure prediction, you are
looking to predict the placement of atoms in 3D space. With design, you're taking an existing
placement of atoms and you're trying to craft a new set of atoms that is complementary to
that original set. So one analogy that people like to use is that of a lock and a key. And that
when designing a protein or a drug, you have some target, which is your lock, and you're trying
to design a key using a generative model that fits that lock. And the way that the models
work is actually pretty interesting. They reason quite literally by
placing individual atoms in 3D space. And often they're getting the resolution of these
structures, the error down to less than the width of one atom when we look at the error across
the entire structure. So when we talk about atomic level microscope, you can see now why that
might be important for design, because how can you hope to be able to design the key if you
can't see the lock? Yeah, that's completely wild from a precision of prediction perspective.
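The sub-atomic error Jack describes is typically quantified as RMSD, root-mean-square deviation over matched atom positions. A minimal sketch of the standard metric, with toy coordinates; this is an illustration, not Chai's actual evaluation code, and production evaluation also involves optimally superimposing the two structures first:

```python
import math

Coord = tuple[float, float, float]

def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """Root-mean-square deviation between two matched sets of 3D atom
    coordinates, assuming the structures are already superimposed."""
    assert len(coords_a) == len(coords_b) and coords_a
    total = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
                for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Toy example: every atom off by 0.5 Angstroms in x, well under typical
# atomic dimensions (a carbon-carbon bond is about 1.5 Angstroms long).
predicted = [(0.5, 0.0, 0.0), (1.5, 1.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(rmsd(predicted, reference))  # 0.5
```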
You know, if we analogize to LLMs, you know, you have learned grammar, syntax, semantics,
capabilities that emerge in the model that you can measure.
Is there anything that would be analogous in terms of emergent vocabulary or concepts that
you think Chai-2 has?
Yeah, I think this whole point about the atomic level microscope is actually that point, right?
There is something really, I don't know, I think deep.
We still don't fully understand it about like why these models work.
Again, we didn't even know this was possible.
Obviously, we tried it so we thought that there was a chance.
And I think it just tells you something about, you know, maybe the signature of how proteins interact with one another is really embedded in the data, right?
And we're generalizing to a new setting.
So it's not like the model has seen, you know, specific binders against the target.
And then we're just trying to do some in-domain generalization and walk through that space.
That's actually quite an impactful application as well.
And that's already being done throughout the biotech industry.
Our team published work on that years ago already.
But I think this really new frontier about generalizing to new space,
It tells us that, again, like the model is learning something really fundamental about
how the molecules interact with one another.
Again, it's able to generalize to problems that look very different in terms of how we
would actually organize them in biology.
Think about the whole rules about, you know, what we consider, like, a protein family
being different.
These targets that we tested on are, again, to a biologist, very, quote, unquote,
dissimilar from what we saw during training.
But it doesn't seem like the model thinks that way.
We actually even have a slide in our paper, in the supplement, where we actually look
at an even harder subset. So not looking at things that are, you know, up to 70% sequence similarity
with the model, but actually pushing all the way down to 25%. So really looking at tasks that are
very different from what we saw in training, and the success rate was basically the same. Like the model didn't care.
And again, I think that indicates something very profound about what the models are learning here.
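The held-out filtering described in this section (dropping any benchmark target above a sequence-identity cutoff, 70% in the main set and 25% in the harder subset) can be sketched as follows. This is a stand-in for illustration: `naive_identity` uses a crude string-matching ratio, whereas real pipelines compute percent identity from a proper sequence alignment with tools like MMseqs2 or BLAST.

```python
from difflib import SequenceMatcher

def naive_identity(seq_a: str, seq_b: str) -> float:
    """Crude similarity score in percent. A stand-in only: real pipelines
    compute percent identity from a proper sequence alignment."""
    return SequenceMatcher(None, seq_a, seq_b).ratio() * 100

def held_out(candidate: str, training_seqs: list[str], cutoff: float = 70.0) -> bool:
    """Keep a benchmark target only if it is below `cutoff`% identity
    to everything the model could have trained on."""
    return all(naive_identity(candidate, t) < cutoff for t in training_seqs)

# Toy sequences, not real proteins:
training = ["MKTAYIAKQRQISFVKSHFSRQ", "MKLVNNALLFAA"]
print(held_out("GGGGGGGGGGGG", training))            # unrelated: kept
print(held_out("MKTAYIAKQRQISFVKSHFSRQ", training))  # in training: dropped
```

Lowering `cutoff` from 70 to 25 gives the stricter subset the speakers describe.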
I mean, my assumption is the same here, where obviously the fastest path to immediate impact
is going to be, you know, antibodies in the clinic, or whatever other therapeutics Chai
and its partners work on. But it does raise a question: if the model has learned something
that fundamentally, like, the biology research community doesn't yet know from a principles perspective,
then we will also learn those rules from these models, or whatever the principles are of
structure and interaction. So I think that's super exciting. Yeah, totally agree.
How would you characterize the overall hoped-for impact of Chai-2 in terms of bringing
it to industry or your own programs? It's a great question, Sarah. So there's maybe two main areas
that we can break it down into.
The first one is, again, that we've turned it into an engineering problem.
Instead of spending months or sometimes even years trying to discover some molecule,
you know, now we can actually do it way faster, because the screening, if you will,
is happening on the computer instead of in the lab.
But the second area that we're actually even more excited about is how do we actually
solve problems, which just weren't even reachable with traditional methods?
The model is not perfect.
You know, it worked on 50% of the targets that we tried.
Maybe it would have been more, right, per the caveats
that we talked about before.
But, you know, we're at 50% of cases.
The failure mode of the models is going to be different than the failure mode in the lab today.
And I think that's really going to be the sweet spot to focus in on.
What are the areas that were not possible, you know, a few months ago, where now we'll be
able to actually generate potential molecules really quickly against them.
So those are the two areas.
You know, things that you can do today, let's do them a lot faster, and a lot cheaper.
But I think really the breakthrough opportunities are things that just weren't possible before.
Yeah, one other thing that we've announced is that we will be
opening up access to both academic groups and industry partners. I think when you think about
how this space is just going to evolve in the next few years and the amount of opportunity
that's out there, given this platform shift, there is way too much opportunity for any one
company to capture alone. And drug discovery itself is just an incredibly resource-intensive
process. And I think it would probably be a conceit to assume that we could go after and
pursue every target and run every program ourselves, even if we wanted to. And so when we
think about impact and think about what is going to move the needle for the company, of course,
but also for the world, we think that the way to do that is to go out and bring this to life
with a really exciting set of partners. And so we've opened up access. There's an access page on
our website, which people can go to and fill out. We're currently working through them; we've been
inundated with requests. But my hope is that we can really enable quite a few use cases with this
and do that quite quickly. What has the reception been like so far? What is the biggest objection?
Because this is a significant challenge to the ideas of high throughput screening or even
like the workflow that even innovative pharma and biotechs have today. Yeah, it's a great question.
Usually when these kinds of papers come out, again, people have tried to do this many times. The
critique is often, you know, does this really work? You know, you show this on maybe COVID,
for example. Is this going to work for a case where we have less training data? Are the molecules
going to be high quality? Do we really, you know, kind of believe the data? So I think the approach
we took, benchmarking this at scale, has really helped a lot with that reception. Like,
I think people really appreciated that approach, which has been great. Some of the questions people
have is, okay, like, I can already discover drugs. So, you know, so now I have AI that can do it a lot
faster. But does that actually change the kinds of molecules I can work on? And it goes back to what
we just discussed before. I think there are other folks that are responding to that saying,
no, the transformation is here: how about those projects that didn't work for you,
or where you're really struggling today? Now you've got another tool in the toolkit. And you kind of
have to use this tool now, or you might be left behind. So I think that it's been really
interesting to see the community kind of digesting this. Of course, a lot of the AI folks are
really excited, right? Like we're getting artificial antibodies before we're getting, you know,
maybe other breakthroughs we would have expected earlier. But it's overall been really exciting to see
that reception. I mean, our inboxes are just blowing up. Hundreds of
people have reached out to us for early access within hours of launching. We just announced. So I think we're
still kind of digesting all that. We're a small team. So we're prioritizing early access to
the right people. But we're really excited to kind of get the models out there and for them to
start solving some really hard problems in the drug discovery space. Is there an important future
for, like, large-scale wet lab screening? Does it just become a data
collection exercise to fill out the distribution for Chai models? Are there areas where you
will, you think, need that in 10 years, 20? Yeah, I think if you just take the models and then
you sample more, you probably will get a better result. So we tested only 20 molecules per
target in the paper, up to 20 molecules. You know, if you were to do 10 times that, a hundred
times that, orders of magnitude more, you'd probably just get into spaces with better and better molecules.
So, you know, the machine learning model is probabilistic. It's like using ChatGPT: if you're trying
to solve a math problem and then you look at the top one response or if you look at the top
10,000 responses, you're going to get a better result if you look at the top 10,000. You can't
really do that with a product experience on ChatGPT. I'm not going to look through 10,000 math
responses. I won't even know which one is correct. The cool thing with the lab actually is we actually
could just test all 10,000 of those in the lab. So I don't know if you have to, but that's definitely
something that is, I think, going to be tested out with these models. And I think the future of
high-throughput screening and how it kind of interacts with the models, I think that
question is still open, but I expect that, you know, people will be creative and will find ways
to actually take the best of AI and marry that with the best of biology to kind of push the
bounds forward. And just to add one thing to that, there is a whole host of really amazing
CROs and other players with incredible expertise running those traditional methods. And to Josh's
point, we have many, many companies asking us, can you run this not just 20 times, but can you run
this 100,000 times, even if it's going to work in 20, because I just might find something
better, right? And that something better can result in a better drug. That could be the difference
between getting a patient an antibody which requires an injection versus something which allows
subcutaneous dosing, for example. And so I think with these tools, you can sample the search space
sort of ad infinitum, and that marrying of traditional techniques and models will actually hopefully
get us into areas of this space, where we can just find better products for patients.
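The best-of-N argument in this exchange can be made concrete: if each design binds independently with probability p, the chance that at least one of n designs works is 1 - (1 - p)^n. A quick sketch with the hit rate quoted above; the independence assumption is an idealization, and as the speakers note, the point of sampling 100,000 designs is less about finding any binder than about finding a better one:

```python
def p_at_least_one(p_single: float, n: int) -> float:
    """Chance that at least one of n independent samples succeeds."""
    return 1 - (1 - p_single) ** n

# With a ~20% per-design hit rate, 20 designs already almost guarantee
# some binder; sampling 10x or 100x more is about finding BETTER ones.
for n in (20, 200, 2000):
    print(n, round(p_at_least_one(0.20, n), 6))
```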
I want to ask one more question generally about like predictions for biotech, and then I want
to talk about the future of chai as well. What do you think biotech looks like 25 years from now?
I realize that's a ludicrous question to anybody working in AI where you're like, hey, I didn't
know this is going to work at all last year. As I mentioned before, there is a lot of doom and gloom
in the biotech industry right now due to macro factors with rates where they are and the long-time
investment cycles that are required to make biotech viable, there is just a real pessimism in
the industry right now. It's sort of the worst market in a couple of decades. And I think that
it's moments like this, breakthroughs like this, which give us these flashes of light and these
reasons for just immense optimism about the future of this industry, not just in terms of
improving timelines and reducing costs, but also in terms of fundamentally enabling those new
products. And so if we think ahead over the next 25 years, you know, we've gone from a less than 0.1%
success rate to a close to 20% success rate in a year. Well, who's to say that in another year
that can't be a 50-plus or even a close-to-100% success rate? If you see our miniprotein
results: we are, I think, close to 70% on those, with picomolar affinities, like really, really
tight binders, for every single target that we tested. So all five targets that we tested worked,
and 70% of the designs that we ordered worked. I think there's no reason that for other
classes of molecules, those success rates can't be that high as well. And I think once you have
that, you really enter this era where you sort of have a computer-aided design
suite for molecules, in the way that, you know, we have maybe SolidWorks for mechanical engineering
or Photoshop for creatives. And that entire software suite will exist for biology.
I think the implications of that, the ability to design, program, understand the interactions
between atoms and molecules at the most fundamental level are pretty vast and should just give us a lot
of hope and excitement about what's about to happen. We were just talking last night
about maybe we should be getting baseball caps saying "bullish on biotech" on
them. Because I think this is one of those special moments. We've heard
from many others writing into the company that this has really shifted their opinion.
If you think about going from antibodies to, you know, obviously better success rates, and then
also other therapeutics, is there a difficulty hierarchy we should have in our minds, or is it
just, like, unexplored space in terms of enzymes and peptides, small molecules,
other domains?
Yeah, it's actually a lot more than just success rates.
There's lots of properties that need to be optimized for a molecule.
You know, finding a drug is like looking for a needle in a haystack.
And I think we've really passed through massive swaths of that sequence space with Chai-2,
right, by really focusing in on the things that bind.
That's why, like, a lot of the search space has to be searched in the lab today.
And we're going deeper into other properties as well.
Let's make sure that these antibodies can be manufactured well.
Let's make sure that they can be really stable.
So there's lots of other properties that we're excited about.
So stay tuned for that.
And then the other thing is actually there are next generation antibody formats even.
So what we predict will happen is people probably won't be as interested in the clinic for things like monoclonal antibodies.
These are antibodies that are hitting, for example, like a specific epitope on a protein.
But now if we can make antibodies much faster and more easily, you can imagine a future where if I want to hit a target, let me choose two different parts of that target, make two different proteins that are hitting them.
Like, basically, two different primitive antibodies.
And let me bring them together.
This is called a bi-paratopic, two paratopes,
so basically two different antibody interactions.
And that kind of stuff is going to become a lot easier to do today.
I think these days there's a lot of trade-offs that get made in biotech
about, like, you know, risk on your target, risk on your discovery process,
how hard is it going to be to make a molecule?
And I think AI is going to raise the bar across the board.
I think about the "bullish on biotech," you know, movement that
Jack is announcing here as well.
If we think about what that could even represent, there's right now a lot of risk in biotech.
There's a lot of crowding on the same kinds of targets.
The risk actually starts to go down in terms of discovering some of the stuff.
Maybe there's still clinical risk if you try something that's like totally new that people
haven't done before.
But we've just like opened up, I think, the aperture of opportunities that that can be pursued
here.
And that's something that I think is really exciting.
So there's still a lot more work to do for us to validate that like all this is going to be
possible. But I think just the pace at which the field is moving, just gives us a lot of
optimism for what's going to be possible next. And maybe I can just share one anecdote about
why we are so optimistic. We had a partner come to us as we were in the process of building these
models. We didn't even really know. We hadn't had back our first few batches of data so we didn't
know if it was going to really work yet. But this partner had been working on this problem for
a few years. They had a team of, I think, five to ten people working on it. They estimated that
fully loaded. All of those, those people might have set the company back with the experiments
that they had done as well, maybe $5, $10 million. And it was a problem where they wanted to
build a molecule that cross-reacts against two different species. So both a human form and a
cyno or a monkey form of this protein, such that when they put this molecule into animal testing,
if, you know, they didn't want it to fail because the monkey has a slightly different version of
that protein than the human.
does. So they were really struggling to get this to work for whatever reason. And we put this into the
model and just prompted the model to design for these two targets at the same time, not just one
target. So you can imagine that this is a slightly more sophisticated challenge than just designing
against one. We actually ordered only 14 sequences to the lab. And I think four of those
were hits to the human. One of those was a hit to the cyno. One of them actually overlapped
and hit both. That one now allows us to move forward with that program and gives us a whole
host of diversity around that molecule that one can explore as well. First of all,
that's very cool. And second, I think it's interesting that a lot of industry observers would say,
like, the bottleneck in pharma and the expense in pharma is clinical, not discovery. And like,
I think you're pointing to the fact that, well, we can design for the clinic, right? And actually,
it's intuitive, but the argument from people bearish on biotech, or
concerned about the ability to make progress in programs and reduce cost for
any given successful drug, is: well, you know, if discovery had less risk, as Josh was pointing
out, which is like a huge claim, then the entire industry is more efficient, right, and more
effective. That's the hope. Yeah. And I think we've got a lot of reason to be
optimistic. I also don't want to oversimplify things. You know, there's lots of other things that go
into making a drug. There's capital markets that go into this. You know, there's tons of clinical
risk. This is really just the tip of the iceberg. But we're really excited about the progress that
this could represent. I want to ask strategically, like where Chai invests from here. So you talked
about other attributes that you want to be able to design in Chai models. But if we just look
at this generically as like an AI model company, where do you think the defensibility is?
There are two key areas of investment for the company.
I think, firstly, what comes out of these models, these just aren't drugs yet.
They're hits, they're antibody hits, but there's a lot more work to be done to actually turn
these into viable molecules that we can put into humans.
We have early data, which we put in our preprint, to suggest that a lot of the properties
that one might want from a drug, these molecules actually have.
But we need to do a lot more characterization and assays to convince ourselves
that we can do that.
And then I think there's also the next stage beyond that, which is actually designing entire
drug candidates zero-shot, right out of the
models.
And I think a few months ago, we might have said this was a pretty futuristic idea.
And nobody in the company was really talking much about this.
But I think once you see these results and grapple with the implications, the fact that we can
get antibody hits in just 20 attempts, there's no reason that we can't generate
entire drug candidates in that same number of attempts.
So I think there's going to be some key investments there. And really,
the model right now is a model;
it's not really a product.
Well, it is a product, and it's certainly useful today.
But the product can get a lot better with more investment into just making
sure we can optimize all the therapeutic properties that people care about.
And then, of course, there's the entire interface and software layer around that
to make this really easy to use.
and the real platform that goes around supporting that.
So, you know, how do you, if you want to hit two targets, design a molecule that hits both?
How do you specify that in the software?
This is going to be a sufficiently advanced piece of software.
It's going to become as advanced as a Photoshop over time.
And as we build that out, I think we're going to need to make some really core investments into just the engineering and the product to ensure
that we are building software that we ourselves and others will really love to use.
Yeah, one thing to add on to that as well: we released Chai-1 open source.
We thought of it as a model, and I think Chai-2 is a lot more than a model, right?
It's become a product.
It's actually a bigger pipeline that comes together to even make this happen.
And it also becomes trickier to use these models.
Protein folding, you put in your sequences, you get out a structure.
Design is a different story, right?
Actually specifying the prompt on its own is a challenge.
We did that programmatically in the paper to go and assess this thing at scale,
but a scientist who wants to use this to initiate a drug discovery program
is probably not using a script to come up with that prompt;
they're probably going to be really thoughtful about it.
And I think that's why investing in the product layer here is really important.
And not to mention, it's only going to get more complicated from here, right,
as we start to support more advanced drug modalities,
as various properties come online.
We actually show some early evidence of this in the white paper.
For example, you know, you might want to actually optimize for multiple proteins at the same time.
Sometimes, you know, actually it's a good time to be a sick mouse.
In order to have a human drug, it usually needs to work in animals as well.
And sometimes drug programs actually get stuck there.
It's like, okay, guys, like we either have a mouse drug or a human drug.
It's really hard to get both.
And there are actually some cases where people have to discover two different drugs.
They have what they call a surrogate antibody.
I'm going to, like, make the mouse version.
I'm going to study that, convince the FDA that, like, this mechanism works.
But you're even taking risk there.
You're like, maybe this molecule works, like, slightly differently.
We literally show that example in the paper of optimizing, though we don't do mouse.
We actually do monkey.
So, like, monkey and human together.
But you can throw other species into that as well.
Sometimes we've got the opposite problem.
I want to hit this protein.
I don't want to hit this other protein.
We've got some early evidence as of late that that's possible as well.
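Chai's actual prompt format isn't described in this conversation, so purely as a hypothetical illustration, a multi-objective design spec of the kind Josh is describing (bind these targets, avoid that one) might look something like this; every field name here is invented:

```python
from dataclasses import dataclass, field

@dataclass
class DesignSpec:
    """Hypothetical multi-objective prompt for an antibody design model.

    Field names are illustrative only; they do not reflect any real
    Chai interface.
    """
    epitope: str                                 # region of the target to bind
    bind: list = field(default_factory=list)     # must bind all of these
    avoid: list = field(default_factory=list)    # must NOT bind any of these

    def validate(self):
        # A target can't be both required and forbidden.
        overlap = set(self.bind) & set(self.avoid)
        if overlap:
            raise ValueError(f"targets in both lists: {overlap}")
        return self

# Cross-reactive design: hit the human and cyno forms, spare a close paralog.
spec = DesignSpec(
    epitope="membrane-proximal loop",
    bind=["TARGET_HUMAN", "TARGET_CYNO"],
    avoid=["PARALOG_HUMAN"],
).validate()
print(spec.bind)
```

The point of the sketch is only that the prompt becomes a structured, multi-constraint object rather than a single sequence, which is exactly why the product layer matters.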
And these sorts of things, you know, the prompts are just a lot more complicated.
And it means that you need to have, like, the right product.
And what happens when you start doing those experiments in the lab?
We want the models to learn from that and then help us really be like a co-pilot
in driving, like, the next stage of those designs as well.
You know, all of this is, again, it's more than just the models.
It's really thinking about those workflows as well.
And it's even about just getting that word out to people and having them think about
this as a new tool in their stack.
What happens if you're an antibody engineer and you've been doing things in a certain way
for the past 30 years?
And now there's a new paradigm in discovering drugs.
Like, that itself is actually a problem
that a company needs to solve.
So these are all different areas that we're investing in right now.
That actually raises a question I was going to ask you: if you are an antibody
engineer or a biologist today, what advice would you give, you know, let's say they believe you
about how much is going to change and that this, like, CAD for biology, a software suite,
is coming into existence?
Like, what should they learn, be good at, like, go study?
Well, number one, get access to Chai-2.
Number two, you know, figure out how to get your prompts right and actually take full advantage
of it. And then I think number three, you know, start dreaming about the new possibilities.
You know, it's interesting. We've talked to a lot of antibody engineers since starting the
company. And we've been alluding sometimes to, you know, what we're doing here, you know,
sometimes you do the market research question. You ask, you know, suppose you had, like, a 1%
success rate for antibodies, what would you use that for? The conversations are changing now
that, first of all, it's not 1%, it's 10%, and, like, people see that it's working.
I think that creativity is really being unlocked, even ourselves, right?
I think when people are thinking about the answer to that question, there's always some
big doubt in your mind. It's like, ah, it's a hypothetical question, you know, your neurons
are not activating in the same way as actually doing something with it. It was the same thing with
LLMs. It's like, imagine asking someone five, 10 years ago, oh, you know, if we could predict
the next word in a sentence perfectly, like, what would you do with that? It's actually
very hard to imagine until you start playing with the models, even our team internally. You
know, now, even without sending it to the lab, you know, we can, again, choose some targets,
choose some prompt, generate stuff against it. You start to look at the generations that are coming
out of the model and you're like, oh, wait, I can actually solve this problem by like choosing
the right epitope on a target, choosing the right part of the target. Like, these two targets are
different. Like, sure, we have an engine where the model can optimize for one or optimize
for both, you know, or selectivity-optimize for one and not the other. But you can actually get
a lot of that by choosing your prompt in a smart way. So let me hit a part of that protein that is
quite different between the two things or quite similar between the two things.
These are the sorts of realizations that in retrospect are quite obvious, but they don't really
hit you until you actually start to like use a product like this yourself.
So I think people are just, once they get their hands on this, I think they will, they'll start
to dream of the new possibilities.
I think it just really raises the bar.
You know, the people who are most excited about that are often these antibody engineers
and these biologists.
A lot of the work that they're doing today is painstaking,
and they're not the biggest fans of these slow feedback loops and these intractable problems,
because many of them that we speak to are just really motivated to solve a particular task.
And so, you give them... I'm an engineer; you give me a tool which means I have to write less code?
I love that.
I can now think more about system design and architecture and more complex products and all these
other things, but it's really going to raise the bar for a lot of these people.
And I think people are only really now starting, as Josh said, to think through all the possibilities.
I was on a call a few weeks ago where people were saying, when do you think this is going to happen?
They said, oh, not for three to five years, this is a really futuristic idea.
And then a couple weeks later, you show them what we have and they sort of fall off their chair.
And so there's going to be a sort of joint effort with us alongside these real domain experts to actually figure out these key application areas.
because biology is so vast and so complicated that actually there is so much knowledge that
so many of the practitioners, the specialists have, that no one company will just ever possess,
which is why we're so excited to go out and be partnering with people to really bring this to life.
I want to ask a couple questions more, just specifically about company building before we run out of time.
And maybe, Jack, I will start with you, our amazing engineer.
And then you guys also have, like, a very software-oriented team working on biology
problems. Some of those people come from, you know, long-term research in that space in particular.
But for yourself, Jack, like, as you said, you're a software person. How do you get up to speed on the
bio area to go do leading work? Well, I think it's two things. First of all, ramping up on any new
field is always just a total fight. You have to get to the frontier, to have read the right
papers, and to be knowledgeable about the areas that you need to learn. You just have to sort of put
your shoulder down and push through. And there are waves of excitement and misery in that
experience, but you can get there fast if you really set your mind to it. And I'd say the
second part is that surrounding yourself with just the most incredible team is the best thing
that you can do, far beyond anything that you can learn by yourself. And we have certainly
the most special group of people that I've ever worked with within the company, our co-founders
Matt McPartlon and Jacques Boitreaud, who are just rare talents.
And then, you know, the entire team beyond that, some of the former heads of AI at other drug discovery companies, some of the top open source contributors. The team is so multi-talented. It's small, around a dozen people, but mighty. And I think, as we've seen in other areas of AI, small but mighty teams can go a really, really long way these days. And so, you know, I think there are actually surprisingly few people on our team even with a computer science degree. Josh himself
got a chemistry degree, Alex got his PhD in physics, a whole host of others. But
this work is so interdisciplinary that it really requires that breadth of knowledge across
biology, chemistry, physics, artificial intelligence, computer science, engineering.
It really takes a village, and everybody is learning from each other every day because
of just how vast that subject matter is that one has to have a command of.
I think we've also benefited from such immense focus as well.
Everyone has been so passionate about trying to solve this problem.
And I think I really credit that as a huge reason why we were able to achieve it.
And we've also got a team that because of that focus is very engineering-centric as well.
So if you look at the whole team, you know, we have a very research-oriented team right now.
But everyone is a stellar engineer as well and takes that very seriously.
So it's not everyone solving, you know, their favorite pet problem.
We are all going after the same problem and solving that together.
And even just 10 people solving a problem together,
there's a lot of code being written every day.
You have to be very thoughtful about how that all comes together and interacts.
And I think especially in our next phase of growth for the company
as we start to invest more and more in product and the velocity around that
and getting this into folks' hands,
that's just going to become even more important.
How do we make sure that the latest research breakthroughs that we're shipping internally
are actually making their way into partners'
hands? That's something that, again, we are very thoughtful about at Chai and take very seriously.
Yeah, I also remember Jack in our office at Conviction, like, debating the merits of dev containers
with some of your scientist teammates at the very beginning of the company. And both of you from the
beginning, you know, talked a lot about platform investment. And so I actually think that's like a little
bit sort of unconventional in terms of such a research-oriented team to say, like, we need to make
this platform investment. Can you talk a little bit about that?
Yeah. So I've gone through the experience of going from zero to 100 on large engineering products before. I worked on Stripe Link, which was a multi-year project, and again on Stripe Capital, with engineering teams scaling from zero to 25, 50 people by the time we were done there. Same for Link, maybe more. And I think you just learn that unless somebody is really taking
care to keep the entire system in their head and is an effective technical steward of the
architecture, things just evolve, and the sort of entropy of the software takes over and slows
down your rate of progress to zero because nobody can get work done anymore. And so somebody
needs to keep the entire system in their head, and the interaction between all those components,
and make sure that people who are working on individual subcomponents of your codebase
can minimize the amount of context that they need to load into their heads to understand
how to accomplish their task. So these are just the principles of really, you know, it's pretty
basic. It's just simplicity and modularity, but making sure that's a practice, a kind of
cultural practice, that everybody's on the same page about investing in that,
and that, you know, people aren't cutting corners. They see it as their responsibility to lay the
groundwork for the next person. And this is doubly hard to do in deep learning codebases because often
if you introduce a bug or write a regression, you won't know for weeks that that has shown up.
It's sort of terrifying because, you know, you could spend a million dollars on a training run
with a bug that crept in four weeks ago. We've literally had to do this in Chai's history:
we've had to go and bisect Git history, launching training runs in a sort of binary search
to identify a small enough range of pull requests, then go to that pull
request and identify the bug.
And I think it's those sorts of experiences, and the cost of finding
that bug (I'm not sure if it was millions of dollars, but it was certainly tens of thousands
of dollars of compute time) to go back and find that thing.
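The bisection Jack describes is, at its core, binary search over an ordered commit history. Here is a minimal sketch; the `is_bad` callback is a stand-in for the expensive step of checking out a commit, launching a short training run, and inspecting its metrics:

```python
def bisect_commits(commits, is_bad):
    """Binary-search an ordered commit list for the first bad commit.

    `commits` is ordered oldest -> newest, and the history is assumed
    to flip exactly once from good to bad. `is_bad` stands in for an
    expensive check, e.g. a short training run evaluated on its loss.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression came after mid
    return commits[hi]

# Toy usage: commit "f" introduced the regression.
history = list("abcdefgh")
print(bisect_commits(history, lambda c: c >= "f"))  # -> f
```

In practice `git bisect run` automates exactly this loop; the costly part is that each `is_bad` evaluation here is hours of GPU time rather than a quick test suite.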
It's experiences like that which I think make rigor such an important practice in the company,
engineering rigor. And so, being rigorous about things... I think, you know, some people are
surprised to learn that even though we do deep learning, we are pretty rigorous about writing
unit tests for everything. But I think these basic software engineering practices are actually
sort of lacking from most research code bases. And so bringing in some of those basic principles
has allowed us to move very fast and not just fast in the short term, but should give us a mechanism
to compound on that investment over time.
And it's overall very aligned with your mission of turning biology from science to engineering, right?
It makes sense that it would go through the core of the company's practice, too.
I have two more questions before we run out of time here.
The first is, you know, you talked about the expense of, like, training experiments.
Like, what's your decision framework for, like, how quickly to scale compute or, you know,
parallelize experiments here?
Yeah, we've tried to set up the company in a pretty scrappy way.
Actually, when we were getting started, we should have talked about this earlier as well.
You know, it wasn't even clear the company would be based in San Francisco when we started.
And, you know, back then, we hadn't really raised capital for the company
yet.
We were kind of, like, using free compute credits from the cloud providers.
I think for us, it's just about being, again, laser focused on solving the problem
and just, like, really making the case, like, why are we doing something?
And I think that, you know, if that's reasonable, we'll go and invest in it.
Again, on an engineering problem, if it's, you know, kind of clear you're seeing signs
of life, you're seeing some scaling law, whatever it is, like, let's go as fast as possible to
make that work. But let's also not get distracted, like, scaling something out if we are not
convinced that it's going to work. So I think that, you know, kind of scrappy culture
on the, you know, where-are-we-spending side. It kind of goes hand in hand with making really
fast progress because it means we have a high bar for like where we're spending our time.
Everyone on the team works extremely hard. You know, there's people in the office,
like, you know, at all times of day, all times of night. And it's pretty beautiful
to see that. So we work hard, but I think we also work really smart. And I think you have to do that
to make progress with how fast the field is moving right now. You now see signs of life. You're
very bullish on biotech. That also means, like, given that you are going to try to scale
to support, you know, demand from the industry and your own efforts, who are you looking to hire now?
We're really hiring across all functions right now. So we've made some really big breakthroughs
here on the AI research side.
And as we take that to the next level and try to get Chai-2 in front of the right
partners, we're hiring for product engineering, for antibody engineering, for business
development, account executives.
Like there's a, there's a whole host of roles that are open on our site right now.
And again, this work is extremely interdisciplinary.
And we really want to build this in a thoughtful way so that we can make, you know,
Chai-2 as useful as possible for the industry.
Well, thanks for doing this, guys, and congratulations on, you know, progressing the frontier of AI discovery.
Thanks so much for having us on, Sarah. It's been really fun.
Thank you, Sarah.
Find us on Twitter at NoPriorsPod.
Subscribe to our YouTube channel. If you want to see our faces, follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.