Science Friday - How AlphaFold Has Changed Biology Research, 5 Years On

Starting point is 00:00:02 This is Science Friday. I'm Ira Flatow. You're all familiar with proteins, right? They're made of amino acids. They do many important jobs in the body, and they can take millions of different shapes. And depending on their structure, they do radically different things in our cells. And for a long time, predicting those shapes for research

Starting point is 00:00:25 was considered a grand biological challenge. But in 2020, Google's AI lab, DeepMind, released an early version of AlphaFold. That's a tool that was able to accurately predict many of these structures necessary for understanding biology, predicting in a matter of minutes. In 2024, the AlphaFold team was awarded a Nobel Prize in Chemistry for the advance. That's how important they deemed it. Five years after its initial release, we're checking in on the state of that tech and how it's being used in health research with one of the lead scientists responsible for developing AlphaFold: John Jumper, a scientist at Google DeepMind and co-recipient

Starting point is 00:01:09 of the 2024 Nobel Prize in Chemistry. John, welcome to Science Friday. Oh, it's great to be here. It's nice to have you. All right, let's begin at the beginning. Tell us what exactly protein folding is and why it's important to understand how it works. Well, so one of the really important things to say is, you know, the cell has a lot to do. It's, um, in some sense, a factory or a machine, it has many, many parts. And those parts are all encoded in the DNA, right? We talk about DNA as the instruction manual for the cell. And one of the big things that it does is that it gives instructions on how to make different proteins. These are couple-thousand-atom machines, really, really tiny nanomachines that do jobs like pump things in and out of the cell, right? So when your nerves fire, for example,

Starting point is 00:02:02 they let ions in and out. They copy DNA, they repair DNA. So they do kind of all the functions of the cell. And there's a machine made largely of proteins that converts our DNA into RNA. And then there's the ribosome, which converts our RNA into proteins. And so it makes this long chain of amino acids, and then just due to the laws of physics, this chain folds up into a really intricate functional shape. The analogy I kind of like for it is it's almost like if you had your, say, IKEA bookshelf, and as soon as you open the box, the bookshelf just built itself. So encoded in the sequence of the protein, the amino acids in order, is effectively the shape. But what we didn't know how to do

Starting point is 00:02:54 was get that shape, or at least not very well. In fact, quite a few Nobel Prizes have been awarded for determining the shape, or normally we call it the structure, of a single protein. But reading the structure requires extraordinarily difficult experiments, still something like a year or more of a PhD student's time, and maybe $100,000 in expense to get to what that structure is, and that structure is the thing that is actually functional. And there's quite a bit that you understand from the structure of a protein.

Starting point is 00:03:24 Right. And so how does AlphaFold come into this? So AlphaFold is an AI system, a deep learning system. Not exactly like a chatbot, specially trained just for this task. First, I should say that scientists have studied the structure of proteins, even though I just said it costs something like $100,000. People have actually done it a lot because it's really important for biological research. So there's about 200,000 known protein structures that have been collected by scientists over more than 50 years and deposited in what's called the Protein Data Bank, this openly available resource of all these protein structures. So you can think of it as an enormous amount of societal investment in these. And what we did was train a machine learning system using kind of the inputs that would come from DNA.

Starting point is 00:04:17 Here is the protein sequence, and predicting the 3D structures that scientists have measured, and in fact predicting both the structure and how confident we are in that structure. And we weren't the first to try and do this or to think maybe a computer program could be useful here, but what we really developed was massively more accurate algorithms from the same data that gave a very, very precise prediction of this structure to something comparable in many cases to the accuracy of the experiments themselves, because, of course, no experiment is exactly perfect.

Starting point is 00:04:48 And so what can you do with this? What have scientists used it for so far? So there's a couple of really big things that scientists can do with it. And I think it's worth saying that one thing scientists can do is just use it to understand biology. You know, here's this protein that I want to understand that's maybe associated with a disease. And I want to change that protein, or I want to understand why this mutation in this patient might cause someone to be sick. And you might say, well, what is the structural context of that mutation? Is that maybe where this protein sticks to another? So a lot of times what scientists will do is they will use AlphaFold to predict a structure. And then they'll look at it and they'll say, ah, these data that I had before that didn't make sense,

Starting point is 00:05:37 now they make sense. Or they'll use it when they're designing kind of new proteins. For example, at Oxford, they're working on developing a vaccine, I believe, for malaria. And they used AlphaFold to understand the structure of the protein, and they said, oh, well, this part of the protein from malaria is probably going to be really good for a vaccine. So they design their vaccine based on the parts of the sequence that AlphaFold says produce a meaningful bit of structure. Scientists are also using it as part of drug discovery, both AlphaFold 2, which predicted the structure of proteins, and then AlphaFold 3, which predicts a wider range of things, for example, how a drug-like molecule, what position it might stick on a protein. And so they'll use it as part of understanding and basically always, in a sense,

Starting point is 00:06:28 using it to find the hypothesis for what's the next experiment to do. And they will use it also to understand things like evolutionary history. Basically, what you should kind of almost think of is this structure is a map that helps you make better hypotheses about proteins. So everyone that works with proteins gains something from understanding the structure that helps them design better experiments. Of course, they still test it in the lab when they're done. Right. So how good is this at actually predicting? So I think there's a couple ways to think about it. One is kind of a numerical number. We're

Starting point is 00:07:01 about 90% correct according to a certain scale called GDT. But I don't really think that's the right way to think about it. What I would more say is that it tends to generate very, very reliable hypotheses, and it says when it's not sure. So where does AlphaFold need more work? Are there proteins that it doesn't predict well? That's an interesting thing. So some proteins don't have a structure at all. So I told you the proteins fold up into a structured thing, and that's true for a majority

Starting point is 00:07:35 of proteins, but actually there are lots of regions that are intrinsically floppy, often because of their function. So in that sense, there's no answer for AlphaFold to give. The other thing that is a really strong determinant of AlphaFold accuracy is that it uses information not just from that particular protein, but from the evolutionary history of that protein. And that evolutionary history is really saying, oh, well, here's this protein in human, but there's a very similar protein in mice. There may be even a very similar protein in yeast, many, many different evolutionarily related species that have similar proteins. And all of those are used jointly to make the prediction. And so proteins that are evolving extraordinarily quickly, like some viral proteins, or proteins from very obscure organisms where we don't have many other similar organisms, tend to be much harder to predict. Let's talk a bit about AI drug discovery. DeepMind has its own spin-off company called Isomorphic Labs that uses the AlphaFold tech and AI to discover new kinds of drugs. The space has been active for over a decade, but a lot of companies have folded, no pun intended. And AI hasn't actually produced a really marketable drug, has it?

Starting point is 00:08:53 Why is this so hard? You know, my expertise is not AI drug discovery. I know some. I obviously talk with the Isomorphic people. But the first thing I'll say is that you're asking me at the five-year anniversary, why don't we have a drug yet? And even that is really for AlphaFold 2, which is a protein technology, and then AlphaFold 3, which is protein-drug interactions, came a couple years later.

Starting point is 00:09:17 So I think, first of all, we're still very early in this story. I do think there is a second part, though, that making a drug, you have to optimize many factors, right? So first, you have to understand the biology really well. There's lots of diseases for which we simply don't understand the biology. What we do expect is that really atomic things, like how do drugs bind, how do we make them bind better. Those are the kinds of problems that you would expect would get a lot better with AlphaFold-derived technologies, right? Because that is a very good understanding of how things come together, how they bind. And I think we are seeing relatively rapid advances, not yet drugs. The drug timeline is typically seven plus years. But we are seeing relatively rapid

Starting point is 00:10:00 advances in how you use AI to do it. And of course, it's not just AlphaFold. You have to build many more technologies. And then you have to solve a lot more problems. You have to understand things like how soluble is my drug going to be. Will it penetrate the cell membrane? Will it get metabolized by various enzymes? Will it be toxic in various ways? There are all these problems that you have to tackle in order to make drugs faster and faster. And I think people are seeing progress against these problems, but they require more work. When it comes to AlphaFold for structure prediction, it's pretty clearly both a breakthrough in the science, right? We now understand how to use AI to do this, and it's a kind of black box piece of software, technological artifact that predicts protein structure really well.

Starting point is 00:10:49 When you think about extending this to drug development, it's still useful in terms of structure prediction, but I said a structure is about $100,000. A drug costs about a billion, so you can't have one structure prediction be the gap to making a drug, right? The numbers just don't even work out. But what you see is this kind of breakthrough, showing that you can use AI to solve these problems that we didn't use to be able to do, that we didn't use to have any computational ability to do by any method. We had experimental ability, right? This should accelerate, but you have to kind of build this into more and more technology. So I think we're seeing a technological build-out, taking inspiration from this and others, and how do we build new technologies that will help us do different phases. Of course, we still have the problem that in no sense is biology solved,

Starting point is 00:11:37 or in no sense should we expect, you know, drug design to be easy just because we can do structure prediction. There's so much more to it. I like to think in terms of AlphaFold itself, you know, maybe we made the field of structural biology five or 10 percent faster as a whole, which is really extraordinary when you think about how many scientists work in it, how much work is done, but there's still a lot left to be done. There's so much more. I think AlphaFold especially is an important technology, but it's also

Starting point is 00:12:08 kind of a directional indicator that we should expect more powerful technologies to continue coming within this field. But that still takes time. It takes time to work out. It takes time to do trials. But everything I hear from people very directly in the AI drug discovery industry is that they're still very, very excited, that it still looks like there's progress being made and improvements being made, but we have a lot of problems left and we still have giant biology challenges. We have to take a quick break, but don't go away. More on this when we come back. How do we put those together? How do we get, say, a language model that can talk about protein structure, that can reason over it and have these linguistic capabilities?

Starting point is 00:12:51 Well, let's talk about what the future looks like. The new version of AlphaFold, AlphaFold 3, is being used right now. What's the big advancement with this model? So we started off by talking about proteins. I told you they're really important in the cell, but of course proteins are not the only things in the cell, right? Our DNA is in the cell. In fact, proteins stick to DNA. That's how they read them. There's RNA in the cell. RNA is structured, proteins stick to it. There's small molecules, either drugs or natural molecules. For example, adrenaline. It's a natural molecule, et cetera. So the Protein Data Bank is named the Protein Data Bank, but it's really the structural biology data bank. It

Starting point is 00:13:42 contains a lot of data about the other atomic components of the cell. How do they associate? And so we extended kind of the ideas of AlphaFold and said, let's not just make it about proteins. Let's do the protein cinematic universe. Let's do DNA, RNA, small molecules, ions, all of this. And somewhat surprisingly, it worked quite well at a lot of these. There's a lot less data on everything other than proteins. But we are able to develop, I think, reasonably accurate predictors of different types of especially binding and interaction. And this tells the wider story of how all of these pieces fit together in order to make larger cell machines. And that's been kind of the focus of the AlphaFold 3 technical advance. And it's become also, it's a really,

Starting point is 00:14:26 really important technology for getting to things like drug discovery. Because previously, you could only talk about how proteins stick together in AlphaFold 2. But now we can talk about things like how does a protein stick to a drug? Let's move on to a related area that I think is maybe concerning to a lot of people. And I'm talking about tech companies developing AI, Google included. They're building out a ton of data centers for their AI efforts. And they require a huge amount of electrical power and water to run. I mean, are we going to be, are people going to be competing with AI for their

Starting point is 00:15:04 electricity? I mean, on the large language model, it's not really my area of expertise. I can tell you about AlphaFold. AlphaFold 2 was on 128 GPUs. AlphaFold 3 was trained on about 256. Actually, AlphaFold 1 was TPUs. So quite a bit smaller than what I would say is large language models. The other thing that I'll say that's been kind of interesting in the AlphaFold context that we've looked at is the energy of other things, like experimental structure determination, which takes place at synchrotrons, which have enormous energy consumption. So actually, I can't speak to the large language model case. And of course, that's very complicated.

Starting point is 00:15:43 And economic analysis is very complicated. But in the kind of structural biology and AlphaFold case, AlphaFold is a lot, lot less electricity than the equivalent ways to get protein structure. So I think you do at least have to talk about substitution, economic effect, and others. But you'll have to talk to economists for the details of other cases. Okay, so what is the next AlphaFold, the next killer app for AI in science? I think I'm excited about a couple of things. I mean, first, I'm very excited about the maturing of these technologies for drug design,

Starting point is 00:16:20 for protein design. But I think the other part about science is it will get into large language models. It will get into this question of, okay, we can learn from these big, well-curated data sets like the Protein Data Bank, but how are we going to learn from the scientific literature itself? How are we going to learn to reason better and better? And I think we're already seeing some, you know, large language models are shockingly effective at scientific discourse. They're starting to get scientific reasoning. People are looking at them for scientific workflow. So I think that's going to be a really big deal. And then the question I think that will become a key one, or at least one I'm interested in in the long term, is how are we going to fuse these two technologies?

Starting point is 00:17:00 We have this AI applied to kind of narrow problems like protein structure prediction. And we have these generalist language models that are not as good as a specialist human in their field, but are still a really exciting and powerful development. How do we put those together? How do we get, say, a language model that can talk about protein sequence, protein structure, that can reason over it and have these linguistic capabilities, but also have this extraordinary performance that comes from learning from scientific data. So I think that in a few years will be, if we make that technology work, will be really, really transformative.

Starting point is 00:17:37 Well, thank you very much for taking time to be with us today. And good luck. Thank you. John Jumper is a scientist at Google DeepMind, and we'll be checking back as things develop. This episode was produced by Dee Peterschmidt. I'm Ira Flatow. Thanks for listening.
