Short Wave - Can AI Crack The Biology Code?

Starting point is 00:00:00 You're listening to Short Wave from NPR. Hey, hey, Short Wavers. Emily Kwong here with producer Berly McCoy. What's up, Berly? Hey, Emily. Hello, what do you have for us today? Okay, so Emily, today I want to dig into how AI has shaken up the field of protein science, as in the fundamental building blocks of life: proteins.

Starting point is 00:00:23 I've heard of them. Yeah. I mean, this is like what you studied back in your scientist days. Yes, yes. I love proteins. Oh. We love that you love them. How has AI moved the needle in this field, though? Well, scientists have used it to dig into a problem that protein scientists have struggled with for more than 60 years. And that is, what do these building blocks, of which there are millions, look like?

Starting point is 00:00:46 Like their shape? Like their shape, yeah, exactly. And why is that so important? Well, the ability of a protein to do its specific job, so like carry oxygen through your body or turn light into sugar, that relies wholly on its unique, complex, complicated shape. So to understand how it works, you need to know its shape. But why can't scientists just run an experiment to determine the shape? They can for some proteins, but those experiments can take years and years. And Emily, that's because a scientist essentially needs to take the equivalent of a molecular photo of the protein to map its complicated shape. But getting the protein to cooperate to get that photo, so like to hold still, for example, without falling apart, that can be super

Starting point is 00:01:28 tricky, and it could take a grad student's entire PhD program to figure out a single protein. And other proteins were just abandoned because they would never cooperate. Proteins sound difficult, honestly. So the challenge is how do you figure out a protein's shape without running these super tedious experiments? Is this where AI comes in? Yeah, and to give you a sense of kind of how AI has changed the protein game, there's this protein competition that scientists run every other year. Get out. A protein competition? Okay. Yeah, and they've run it for the past 30 years, where groups will basically compete on who can accurately guess the most protein shapes.

Starting point is 00:02:12 It's like nerd central for sure. We love. And for most of that 30-year history, participants have really only made incremental progress. But in 2020, Google DeepMind used AlphaFold 2, that's its AI protein prediction model, and Emily, AlphaFold 2 blew the competition out of the water completely. Wow. Okay. Game changer. And now the Google DeepMind team has taken this AI tool to the next level by expanding it beyond proteins.

Starting point is 00:02:44 So today on the show, how scientists have taken a huge step to understanding the building blocks of life using AI. Plus, how other researchers are using the tech to design brand new proteins, ones never before seen in nature. And how AI could help us solve the biggest problems we face today, from disease to climate. You are listening to Short Wave, the science podcast from NPR. Okay, Berly, so scientists, it seems, have been trying to figure out the complicated shapes of proteins for decades to better understand how they work. Why has this been such a complicated thing to figure out? Well, the short answer, Emily, is that there are so many theoretical ways a single protein could fold that it's a big problem to solve. So if you unfolded a protein, it would kind of look like a bunch of beads

Starting point is 00:03:54 on a long string. Those beads are little molecules called amino acids. Oh, I remember this from biology. There are like 20 types of amino acids. Yep. Each one is a little different. Right. So each one has a slightly different shape. And that kind of dictates how that part of the string can be folded up. Because proteins often have a hundred or more amino acids, you can see how imagining all the ways it could fold would get complicated. Yeah, it just sounds like thousands of different shapes or what, hundreds of thousands of different shapes. Okay, try billions of trillions, Emily.

Starting point is 00:04:29 Like, there are theoretically more ways for one single protein to fold than there are stars in our night sky. This sounds like a glorious nightmare. Right? I'm so curious. Okay. So you said that AI has helped us make some leaps and bounds towards a solution. How does this technology work?

Starting point is 00:04:47 So this AlphaFold model is a type of AI called a deep learning program, which is this huge network of data processing points called nodes. And the purpose of this network is to learn and then make predictions based on what it's learned. In AlphaFold's case and other models like it, it learns about proteins from a huge collection of protein structures that scientists have been building on for decades from their experimental data. Okay. So the idea is that after these models use all of that carefully gathered experimental data to learn, they can then predict the shapes of proteins they do not know yet. Exactly. Okay, and going back to the protein competition in 2020, how did AlphaFold blow away the competition? So they essentially changed the whole architecture

Starting point is 00:05:35 of their model. They had been using AI before, but remember the beads on a string analogy? If amino acids are the beads, even if one bead is far from another on the string, when it all folds up, they could be right next to each other. So with AlphaFold 2, the model looked at distances between all the different amino acids and previous knowledge from solved protein structures. Awesome. And the accuracy and speed of the predictions went way up. Okay.
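
To make the beads-on-a-string point concrete, here is a minimal sketch in Python. It is not AlphaFold's architecture, which the episode does not describe in detail. It only shows, for a handful of made-up 3D coordinates, how a table of distances between every pair of beads reveals pairs that sit far apart along the string but close together once folded. The coordinates, the function names, and the two cutoffs (max_distance, min_separation) are all assumptions chosen for illustration.

```python
import math

# Hypothetical 3D coordinates (arbitrary units) for six "beads" on a string
# that has been folded into a U shape. Made-up numbers, not a real protein.
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

def distance_matrix(points):
    """Return a square table of straight-line distances between every pair of points."""
    return [[math.dist(p, q) for q in points] for p in points]

def long_range_contacts(points, max_distance=1.5, min_separation=3):
    """List index pairs that are at least min_separation apart along the string
    but no more than max_distance apart in space."""
    d = distance_matrix(points)
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if d[i][j] <= max_distance
    ]

contacts = long_range_contacts(coords)
print(contacts)
```

In this toy fold, bead 0 and bead 5 are at opposite ends of the string yet end up one unit apart, which is the kind of relationship a pairwise distance table makes visible.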

Starting point is 00:06:03 And I'm assuming that made a huge difference for scientists everywhere studying proteins. Totally. Julien Bergeron, a structural biologist at King's College London, is one of them. He studies the tail-like appendage that propels bacteria. So it's called a flagellum, and it's pretty complicated. It's this huge assembly. So it's longer than the bacterial cell itself. It consists of 20 to 25 different proteins, but many of them have hundreds of thousands of copies of that protein. And these huge propeller machines are what give some bacteria the ability to make you sick

Starting point is 00:06:40 or build plaque on your teeth. So Julien's lab is trying to figure out how these giant machines work, what their pieces look like and how it all fits together. And so when the AlphaFold 2 model came out, he just had to try it. And I input a sequence, and then a few hours later, I had the model, and I was like, oh my God, this just did it. And we'd been struggling with that problem for, you know, months, if not years. And all of a sudden, I messaged my lab, and I said, we model everything. And we've had dozens of projects that immediately progressed thanks to this.

Starting point is 00:07:20 Okay, so it sounds like overnight AlphaFold changed the trajectory of his lab. Yeah. But how did you know that using AlphaFold 2 would actually work? Yeah, so the accuracy is super important, right? Especially when you're basing all of your other experiments on the results. And it's important to note that like other AI, AlphaFold 2 isn't right 100% of the time, so you can't just take the results at face value. But unlike some other AI, included in the results is a score,

Starting point is 00:07:49 basically telling you how accurate each part of the structure is. Okay. And are others in the field using AlphaFold too? Yeah. So this is something that actually sets AlphaFold apart from other protein prediction AI models. It's extremely user-friendly. So essentially, anyone who works on a protein or even just has a sequence of a protein can plug it in and get results. I talked to Pushmeet Kohli, Vice President of Research at Google DeepMind, and he told me why it was important for them to make this tool open access.

Starting point is 00:08:20 The mission statement that we have for the science program at Google DeepMind is to leverage AI to accelerate and advance science. Okay, so I'm scrolling through the AlphaFold website, and I'm seeing scientists using this model for all kinds of things. They're working on malaria and cancer research, drug discovery, plastic-eating enzymes. And last week, DeepMind released a new version, AlphaFold 3, which can predict the 3D structure of proteins and other kinds of biomolecules that they attach to. Why are those other biomolecules important? Yes. So I know we talked about how much proteins are super important. I love them.

Starting point is 00:08:58 But I have to admit they rarely work alone. And if we actually want to know how biology works as a whole, we need to understand how proteins work with their partner molecules. So it really gives you a more detailed and more accurate picture of what is happening inside the body where proteins are not just sort of existing in isolation. They are interacting in a very rich biological space or soup of RNA and DNA and small molecules and it really sheds light into those rich interactions. Now, previous versions of this protein prediction software would model where each amino acid was located, but in this new version, AlphaFold 3, it maps things on an even smaller level.

Starting point is 00:09:47 So it models where individual atoms are. Wow. So they can predict the structure of multi-protein complexes like the bacterial flagellum or something like proteins in the blood, which attach to iron atoms. That is powerful. Okay. What are the limits to AlphaFold predictions? Yeah, there are definitely limitations. Pushmeet says that the model works best when a protein has a single defined structure.

Starting point is 00:10:12 But some proteins have more than one shape or they have sections that are kind of flimsy, think cooked versus uncooked spaghetti. Okay. So it sounds like the model has some trouble with prediction in some cases, and the results show that. Yeah. So the idea is that these results would say, hey, I'm not so confident in this area of the protein. Just so like users know. Oh.
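
As a rough sketch of how a user might act on that kind of per-position score, the Python below scans a list of confidence values and reports the stretches that fall below a cutoff. AlphaFold's published per-residue score (pLDDT) runs from 0 to 100, but everything else here is an assumption for illustration: the scores are made up, the function name is hypothetical, and the cutoff of 70 is just one reasonable choice rather than a rule from the episode.

```python
def low_confidence_segments(scores, threshold=70.0):
    """Return (start, end) index ranges, inclusive, where every score is below threshold."""
    segments = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence stretch begins here
        elif start is not None:
            segments.append((start, i - 1))  # the stretch ended at the previous position
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # stretch runs to the end of the chain
    return segments

# Made-up per-position confidence scores on a 0-100 scale, one per amino acid.
scores = [92.1, 88.4, 45.0, 38.2, 51.7, 90.3, 95.0, 60.2]
segments = low_confidence_segments(scores)
print(segments)
```

Stretches flagged this way would be the "cooked spaghetti" regions a researcher treats with caution before building further experiments on the predicted shape.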

Starting point is 00:10:34 And another limitation is that the prediction ability depends on the amount of what's called training data available. Uh-huh. So I mentioned that there's a lot of training data for proteins, but... Some categories have much less training data available. For example, there's much less structural data available for RNAs. Okay, so the prediction is only as good as the data. Exactly, exactly.

Starting point is 00:11:02 But Emily. But Berly. There's another way scientists can use AI in the protein world. Okay, what's that? To generate brand new proteins. Ones, like, not found in nature anywhere. Humans face new problems today, and, you know, we live longer.

Starting point is 00:11:20 We're polluting and heating up the planet. And it's reasonable to think that with millions more years of evolution, some of these problems would be solved, but we don't want to wait that long. So the idea is that we can now create completely new proteins that solve these problems that weren't really relevant during evolution, to make the world a better place. So this is David Baker.

Starting point is 00:11:44 He's a biochemist and the director of the Institute for Protein Design at the University of Washington. And he's been working on proteins for years. He actually developed one of the earlier protein prediction models. His lab has a similar AI program to AlphaFold 3. It's called RoseTTAFold All-Atom. But his big focus is designing these brand new proteins. This sounds so futuristic. Right?

Starting point is 00:12:07 Like what kind of new proteins? So far, they've done things like design new protein antibodies, which are important for fighting infections, in this case to fight influenza. They've made something called a switch protein that could be used as an environmental sensor. And they've also made proteins that could help store carbon, which is a huge hurdle for fighting climate change. I think really across medicine, sustainability, technology, I think there's huge opportunities to transform the current ways we do things with protein design. So these predictive and generative AI models have fundamentally changed the protein science landscape.

Starting point is 00:12:49 And again, there's definitely room for improving the prediction power. But with what the field has shifted to, like in terms of prediction accuracy and design potential, I mean, it's really gotten this retired protein fanatic, like missing my science days. Berly, thank you so much for bringing us this big, big story about the little things in life. Thanks, Emily. This episode was produced by Rachel Carlson. It was edited by our showrunner, Rebecca Ramirez.

Starting point is 00:13:28 Berly checked the facts. Ko Takasugi-Czernowin was the audio engineer. Special thanks to Geoff Brumfiel. Beth Donovan is our senior director, and Collin Campbell is our senior vice president of podcasting strategy. I'm Emily Kwong. Thank you for listening to Short Wave from NPR.
