a16z Podcast - DNA's Potential to Store the World's Data

Starting point is 00:00:00 Moore's Law is not a law of nature, a law of physics. It's a law of human determination. They just will it into existence. The race is to be able to build the Moore's Law for DNA. That feels like one of the last big unlocks in synthetic biology. Actually, that's where you really quickly get humbled by how well nature has done it itself. A thousand years from now, humans will know how to read DNA. Biology has had a huge storage problem on its own.

Starting point is 00:00:28 It has dealt with it within the biological context. You look out in the world, with trees and people and birds and so on, and there are tons and tons of bits and bytes all over the place that have been there for millions of years. Life itself is literally digital. Hello, a16z Podcast listeners. Welcome to 2024. At the speed that many fields have been moving recently, whether it be AI or robotics or biotech, almost nothing feels impossible. And that's why we're kicking off this year with a topic that does sound outlandish, but might actually be within our field of view.

Starting point is 00:01:04 That topic: DNA as data storage. Scientists have estimated that the average human body has trillions of cells, with billions replaced daily. Now, that's an incredibly efficient machine, driven by the genetic code in every single one of us. And of course, that's DNA. This blueprint has also evolved and become optimized for space and longevity. You cannot see it with the naked eye, yet it can last for hundreds, maybe even millions, of years. And get this: the storage capacity of a single gram

Starting point is 00:01:34 of DNA is over 200 million gigabytes. The amount of DNA in your body? 150 billion terabytes, capable of storing every single movie released in the 21st century billions of times over, or the equivalent of thousands of data centers, except requiring way less energy and lasting much longer. So it should be no surprise that humans want to leverage this, quote, natural intelligence built into all of us. And the ever-increasing demand for data only pushes this trend further, with some researchers estimating we may even run out of storage this decade. And I have a funny feeling that software is not done eating the world. Today, you'll learn from a16z general partner Vijay Pande about the fascinating world of DNA as a storage mechanism, and how advances in DNA

Starting point is 00:02:25 sequencing, aka reading, and synthesis, aka writing, have led us here. As Vijay says, this next wave of biological engineering will be the mother of many new exponentials to come. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

Starting point is 00:03:09 All right, first I want to kick off this episode with a quick story that I originally stumbled upon via an article called "Save As DNA." We'll, of course, drop that link in the show notes. It's written by Toby, a writer for Exo, and he shares Nobel Prize-winning physicist Richard Feynman's 1959 lecture "There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics." In it, Feynman calculated that all the volumes of the encyclopedia could be written on, quote, a cube of material one two-hundredth of an inch wide, to which he added, "So there's plenty of room at the bottom."

Starting point is 00:03:46 Remember, this is in 1959. He even postulated the idea of swallowing a surgeon in the form of a swallowable robot. Well ahead of his time, he observed: "This fact, that enormous amounts of information can be carried in an exceedingly small space, is, of course, well known to the biologists, and resolves the mystery which existed before we understood all this clearly, of how it could be that, in the tiniest cell, all of the information for the organization of a complex creature such as ourselves can be stored. All this information, whether we have brown eyes, or whether we think at all, or that in the embryo the jawbone should first develop with a little hole in the

Starting point is 00:04:25 side so that later a nerve can grow through it, all this information is contained in a very tiny fraction of the cell in the form of long-chain DNA molecules, in which approximately 50 atoms are used for one bit of information about the cell." Now, Feynman ended his lecture with a $1,000 prize for the first person to take the information on the page of a book and shrink it to 1/25,000 of its size, that is, 25,000 times smaller, so that it could be read with an electron microscope. It ended up taking 26 years, until 1985, for Tom Newman to claim this prize. So you can get a sense of just how long it took for people to widely understand the intelligence embedded at the atomic scale. All right, now let's bring in Vijay to get us up to date on where we really

Starting point is 00:05:12 are in the trajectory today. I'd love to start out by getting your take on a few trends that are emerging, and you could even say colliding. The first one is around data storage. The world is undeniably becoming more digital; as Marc likes to say, software continues to eat the world. Do we realistically have the data storage we need for this increasingly digital world, or what are you seeing there?

Starting point is 00:05:39 Well, yeah, so from a compute point of view, when we talk about Moore's Law, we often talk about it just from a calculation point of view: can we compute more and more? But a key part of AI and compute today is data. And I think what a lot of people forget is that storage has actually been increasing exponentially over time. If you think about it, your laptop now might have a terabyte; it wasn't that long ago that you were happy to have 100 gigabytes, and before that 10 gigabytes, and so on. That exponential increase in storage, driven originally by disk technologies and now by SSD technologies, is in turn a key part of AI.

Starting point is 00:06:15 AI is this confluence of the exponential increase in compute meeting the exponential increase in data. Exactly. And I think to your point, many people don't realize how many zeros follow the number of bytes that humans as a species are now producing. One report predicted that by 2025, humanity is set to unleash 175 zettabytes. That's 175 followed by 21 zeros. Another exponential, or at least fast-advancing, trend is around genomics and DNA sequencing and synthesis, sequencing being reading and synthesis being writing DNA. One interesting thing I'd love to get your take on: people may be familiar with the sequencing cost graph, which has declined exponentially, similar to Moore's Law. But synthesis hasn't quite followed as much of an exponential trend, despite us working on both for at least a few decades. Yeah, in the genomics

Starting point is 00:07:14 field, reading actually is, in the end, a lot easier than writing, in part due to a variety of very clever technologies on the reading side. The writing side is much more complicated for somewhat technical reasons. The reason reading is somewhat simplified is that, in most reading, you take a long piece of DNA, chop it up into little bits, read those little bits, and then put them back together like a massive jigsaw puzzle. That's the so-called shotgun sequencing that took off in the 90s, and it really was a huge advance in reading. Today's sequencing is the successor to those types of technologies. For writing, there's no equivalent trick yet. And you can imagine maybe you write little bits and then

Starting point is 00:07:53 you put them together. But putting that together in real life is a lot harder than putting the pieces together in software, and that's maybe the fundamental asymmetry. Is that changing? Are there new innovations that make you confident that maybe we'll see an inflection there? Yeah, there are many companies in the space and many researchers pushing it for a variety of applications. Obviously, in biology, having DNA is the starting place for any sort of synthetic biology operation. Synthetic biology is this whole space where we can actually engineer biology, and it starts with writing DNA. And so most synthetic biology companies are really bottlenecked by the speed and cost of writing DNA. And then when you think about RNA vaccines and so on, like the COVID

Starting point is 00:08:37 vaccines we've all taken, that's writing RNA, a sibling of DNA. So there's been a huge effort on doing that. The needs are great, and it bottlenecks so many key things that I think that's really encouraged a lot of people to move into the space. One application that I just thought was so fascinating was storage. So coming back to that first trend, it was, at least to me, not intuitive to say, let's use this building block that's in all of our bodies for data storage. How did we get there? And is this really a potential solution? Yeah, it's fun for a variety of reasons. One reason is that it's super dense. A disk or any other technology is not going to be nearly as dense as DNA. Each bit in DNA is just not that many atoms when it comes down to it.
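For a sense of scale, two figures quoted in this episode can be combined in a quick back-of-the-envelope sketch: at the quoted density of over 200 million gigabytes per gram, how much DNA would hold the projected 175 zettabytes? This assumes decimal (SI) units and treats the quoted density as fully achievable in practice, which is an idealization.

```python
# Back-of-the-envelope estimate built only from figures quoted in this episode.
# Assumes decimal (SI) units and that the quoted density is fully achievable.

GIGABYTE = 10**9    # bytes
ZETTABYTE = 10**21  # bytes

density_bytes_per_gram = 200_000_000 * GIGABYTE  # "over 200 million gigabytes" per gram
projected_data_bytes = 175 * ZETTABYTE           # projected 2025 total, "175 zettabytes"

# Mass of DNA needed to hold the projected total at that density.
grams_needed = projected_data_bytes / density_bytes_per_gram
print(f"{grams_needed:,.0f} g of DNA, about {grams_needed / 1000:,.0f} kg")
```

Under those assumptions the answer is about 875 kilograms, under a tonne. Because the quoted density is a lower bound ("over 200 million"), this is an upper bound on the mass; the order of magnitude is the point, not the exact number.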

Starting point is 00:09:19 That density will be really hard to beat with other technologies, at least for a while. Secondly, we have technologies to read it super fast. And finally, I don't know if you have any old storage media like Zip disks or other old discs. Can you read them? I mean, even with USB, some people don't have USB-A ports to read things anymore. The nice thing about DNA is that a thousand years from now, humans will know how to read DNA. There's no doubt about that. The ubiquity and significance of DNA from a biological point of view mean that we'll always know how to read it. And that's truly interesting. Oh, and I forgot the other kicker: it can last for a thousand

Starting point is 00:09:55 years. So if you want to have something in a vault, like a copy of a great movie such as The Godfather, for safekeeping, you could have that in DNA. Yeah, I think those are great points. And just to add a number to this, the storage capacity of a single gram of DNA is around 250 million gigabytes, which is just wild. To your point about efficiency, that is just crazy when you compare it to some of the man-made alternatives. Maybe we could compare and contrast the DNA-as-data-storage concept with what we actually use today. Are there other things you'd call out in terms of whether this is truly viable, or any other aspect of whether the capabilities are really there yet?

Starting point is 00:10:37 Yeah, so I think there may be other intermediates. Archival storage is probably going to be the first application. But what's also interesting is that people are coming up with more and more compute elements that you can encode in biology. Biology can do little circuitry and so on. And the nice thing about DNA as a storage medium is that it's very compatible with our bodies, so you can imagine new types of therapeutics that actually use some degree of DNA as storage, such that these elements are doing some very simple version of compute. That's very

Starting point is 00:11:06 early, but I've been playing around with computers since the 70s, and those things were pretty early then, too. And look, here we are. I think things start simple. But the part that gets us excited about something like biological computing is how compatible it would be with us and how far it could grow from here. Yeah. And if we compare it to software and the idea of saving something on our computer, a lot of people are familiar with zeros and ones being encoded into bits. Maybe you could break down what the equivalent is when we're talking about DNA. How can we actually get biological computing with this structure? Yeah, so DNA is a long molecule, like a single rope, and it's composed of DNA bases.

Starting point is 00:11:48 Each base can be one of four possibilities. While a bit has two possibilities, a base has four, so you can encode two bits with each base. The other key thing about it, and this was the huge revelation that Watson and Crick published, is that DNA has a structure where one base pairs with a complementary one, and that forms a double helix. This double helix is very stable; that's the thing that can last for a very long time. But it's also error-correcting, because you have a redundant copy, essentially a complementary copy

Starting point is 00:12:21 in there. And so that also is very appealing. In the end, biology has had a huge storage problem on its own. You look out in the world, with trees and people and birds and so on, and there are tons and tons of bits and bytes all over the place that have been there for millions of years. So it has all the same compute problems: how to store, how to error-correct, how to read and write quickly. And it has dealt with them within the biological context. That's partially what also makes DNA interesting. Yeah.
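The two-bits-per-base idea and the complementary strand can be sketched in a few lines of Python. This is a minimal illustration, not a real DNA-storage codec: the mapping 00→A, 01→C, 10→G, 11→T is an arbitrary choice made here for the example, and practical encoders typically add error-correcting codes on top and avoid troublesome patterns such as long runs of a single base. The complement function shows the redundancy Vijay describes: the partner strand is fully determined by the original.

```python
# Minimal sketch: pack bytes into DNA bases at 2 bits per base, and build the
# complementary strand. The bit-to-base mapping is an arbitrary, illustrative
# choice (hypothetical), not a standard scheme.

BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}  # Watson-Crick pairing


def encode(data: bytes) -> str:
    """Turn each byte into 4 bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(strand: str) -> bytes:
    """Invert encode(): every 4 bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)


def reverse_complement(strand: str) -> str:
    """The partner strand in the double helix, read in its own direction
    (the two strands run antiparallel, hence the reversal)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)                      # CAGACGGC
    print(reverse_complement(strand))  # GCCGTCTG
    print(decode(strand))              # b'Hi'
```

Two bytes become eight bases, matching the "two bits per base" arithmetic: one byte is eight bits, so four bases.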

Starting point is 00:12:48 Maybe something else that comes to mind is that storage is not just about writing; it's about the retrieval side as well, at least if we're thinking about how I write something up in Google Docs, save it, and want to be able to bring it back and share it. Can we do that with DNA? We're talking about encoding all this data into DNA. Do we really have the retrieval mechanisms to do that effectively, or are we talking about a different kind of storage? In principle, you could do that. What you would do is have a drive equivalent where you have file names, which are little bits of DNA at the end. Then if you want to retrieve that file, you'd get the complementary part to that.

Starting point is 00:13:27 And so you pull out that strand and then you just read that strand. That's just a simple example; people have come up with very clever approaches. The one thing about this is that in the computer storage hierarchy, which goes from RAM to SSDs to tape drives, this is much more on the tape drive side of things. So you could get a lot of bandwidth, but probably not very good latency. It would take some time to read all that stuff.
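The file-name retrieval scheme Vijay describes can be mimicked in software. In this hypothetical sketch, each stored strand is a payload followed by a short address tag at its end, and a "probe" is the reverse complement of that tag; pulling a file out of the pool is modeled as keeping only the strands whose end the probe pairs with perfectly. The tag sequences and function names are made up for illustration. In the lab this selection is done chemically (for example with PCR primers or hybridization probes), not by string matching.

```python
# Hypothetical sketch of address-based retrieval from a pool of DNA strands.
# Each stored strand is payload + address tag; the tag plays the role of a
# "file name" at the end of the strand. A probe is the reverse complement of a
# tag, and it "binds" any strand whose end it pairs with perfectly.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def store(pool: list[str], payload: str, tag: str) -> None:
    """Add one strand, labelled with an address tag, to the pool."""
    pool.append(payload + tag)


def binds(probe: str, strand: str) -> bool:
    """True if the probe is perfectly complementary to the end of the strand."""
    if not probe or len(probe) > len(strand):
        return False
    return reverse_complement(strand[-len(probe):]) == probe


def retrieve(pool: list[str], tag: str) -> list[str]:
    """Pull out strands that a probe for `tag` binds, then strip off the tag."""
    probe = reverse_complement(tag)  # the sequence you would synthesize
    return [strand[:-len(tag)] for strand in pool if binds(probe, strand)]


if __name__ == "__main__":
    pool: list[str] = []
    store(pool, "CAGACGGC", "ACCTGA")  # file 1, part 1
    store(pool, "TTGACCAT", "GGTCAA")  # file 2
    store(pool, "GCATGCAT", "ACCTGA")  # file 1, part 2
    print(retrieve(pool, "ACCTGA"))    # ['CAGACGGC', 'GCATGCAT']
```

Note that retrieval scans the whole pool for one address, which loosely mirrors the tape-like access pattern mentioned above: plenty of bulk capacity, but each lookup is a batch operation rather than a quick random read.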

Starting point is 00:13:50 But also, and this starts to get into James Bond-like territory, people are realizing it's a way you can move a lot of data very discreetly. You could imagine injecting somebody with something, and they walk across the border and there's nothing to scan for. And that could easily be terabytes of data. I hadn't considered that, but one aspect of storage is security. If this were a new storage mechanism, are there other things top of mind for you there? Security is such an important aspect when we think of software.

Starting point is 00:14:21 Now that we're thinking of digital biology, are there any other implications? I think it's all the same things we deal with in any sort of cybersecurity. And I think people keeping things private is generally not a bad thing. So I would actually flip it and say that it's an interesting scheme for privacy.

Starting point is 00:14:40 But we're really dipping into sci-fi here. That's not something people are doing now, but it's something that in principle could be very plausible. Yeah. Well, coming back from sci-fi and grounding ourselves: when we're talking about this one potential application of DNA synthesis, is this something that's already in motion or even commercially viable?

Starting point is 00:14:59 Are companies on the ground actually producing this technology? And also, do they have customers yet? Yeah. There are numerous companies at various stages. Some have been around for many years, and some are startups that are producing DNA. And like any commodity, you can actually go to the web, upload a sequence, and get your DNA. So that's there. I think the race is to be able to build the Moore's

Starting point is 00:15:21 Law for DNA, to build that exponential decrease in cost. And that, as you pointed out at the beginning, actually hasn't been there quite yet. There's been a decrease, but maybe not a true exponential decrease. If we can do that, that feels like one of the last big unlocks in synthetic biology. We've got reading, and we've actually even got editing with CRISPR, so people are editing all the time. Ironically, we just don't have the writing part. I think if we can enable that, so much in synthetic biology is ready to go. Quick interjection, just in case you're looking for a real-world example of how all this can be applied. Spider silk has long been known for its strength.

Starting point is 00:15:58 In fact, it's five times as strong as the same weight of steel, but it turns out that you can't just grow spider silk. If you try to farm spiders, unlike silkworms, they will all just kill each other. But a paper published in September showed how scientists using CRISPR were able to genetically engineer silkworms with spider genes that not only didn't kill one another but were able to produce fibers six times as tough as Kevlar. And of course, we're just getting started here. Now, back to Vijay, to shed light on what's between us and these potential applications across health, food, materials, and more. What do you think is the biggest bottleneck, if you could point to anything?

Starting point is 00:16:37 For now, this has largely been a chemical problem, and so people have been handling it with various chemistry methods. There are different ways people synthesize proteins, which are analogous long-chain molecules, and people have tried to extend that to DNA. Those things work well, but you have to make this thing perfect, and as it gets longer, it gets exponentially harder to have high fidelity. That's why people typically have been selling really short ones. Then maybe you can try to combine the short ones, but that's a different type of exponential problem: it's hard to do it really well without errors. And actually, that's where you really quickly get

Starting point is 00:17:12 humbled by how well nature has done it itself; it's solved this problem in its own ways. So doing it at scale with very low error is still, I think, the holy grail. Yeah, I mean, the more I researched this topic, the more I appreciated, oh my gosh, this is so efficient. Our body's storage mechanism is just incredible, or nature's, really, for that matter. Trying to stay away from sci-fi again, I'd love to get your take on where we go from here. I know it's impossible to make a true prediction, but given all the things you've seen, what kind of timeline do you think is hopeful? It's impossible to really know, but there are now many companies doing exciting things, and there's a great need for it. So the combination of the market maturing in synthetic biology and companies pursuing lots of different approaches has the right elements of what we want to see in terms of a new type of tech company.
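Vijay's point that errors compound as a synthesized strand gets longer can be made concrete with a simple model: if each base is added correctly with probability p, independently of the others, the chance that a whole n-base strand comes out perfect is p raised to the power n. Both the independence assumption and the 99% per-base success rate used below are illustrative assumptions, not figures from the episode.

```python
# Simple independence model of chemical DNA synthesis: each base is added
# correctly with probability p, so an n-base strand is error-free with
# probability p ** n. The value p = 0.99 below is an assumed, illustrative rate.

def full_length_yield(p: float, n: int) -> float:
    """Expected fraction of n-base strands that come out with no errors."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return p ** n


if __name__ == "__main__":
    for n in (20, 100, 200, 1000):
        print(f"{n:>5} bases: {full_length_yield(0.99, n):.1%} error-free")
```

Under this model, roughly 82% of 20-base strands come out perfect, but only about 13% of 200-base strands, which is consistent with the episode's point that vendors have mostly sold short strands and left stitching them together as a separate problem.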

Starting point is 00:18:04 But this is the type of thing where that advance has to really be there. What's been really unique about sequencing is that there's been advance after advance, not unlike computer chips. What makes Moore's Law work is that it's not a law of nature, a law of physics. It's a law of human determination, where people just keep trying: they'll do this lithography and that lithography

Starting point is 00:18:25 and this new type of transistor, and they just will it into existence. We've seen the analogous willing into existence in sequencing, on top of a platform that was compatible with that. For synthesis, we've yet to really get started. And I think if we can build a platform that enables that human willing-into-existence phenomenon, that's really

Starting point is 00:18:45 what's been missing. Right. And maybe to get listeners excited about that potential unlock, you touched on some of these earlier, but if we are able to achieve low-cost, at-scale synthesis of DNA, what does that unlock? Whether it's materials or food or health, what are the applications that excite you most? I think the most foundational statement is that it really unlocks

Starting point is 00:19:13 large-scale engineering of biology. It's that shift from biology as "let's tweak, experiment, and just discover" to "let's build things." And the DNA part is really central to that building, because once we get the DNA, we can actually CRISPR-edit it into any type of system. And then the key part is not just building it, but building it quickly so we can have fast iterations. What's really great about programming is that you can compile and run your code in minutes or seconds and get those fast iterations.

Starting point is 00:19:43 Once DNA synthesis can get to that point, we'll see fast iterations in synthetic biology. We'll see that engineering cycle kick in, and it will be the mother of many new exponentials to come. We really need that platform, the same way we saw with software. Yeah, I think the last thing I'll add is that you started by mentioning how a lot of our life is becoming digital. The irony is that in many ways it always has been: life itself is literally digital, and we may be coming full circle, adopting these technologies for new advances in engineering and biology.

Starting point is 00:20:13 But the super fun thing is that what we're talking about is where CS and bio converge. Yeah, I think that's a wonderful place to end. Thanks, Vijay. Fantastic. All right, there you have it: DNA as data storage. Yet another example of science fiction

Starting point is 00:20:28 that may actually be science reality. Now, clearly there are still hurdles along the way, but hopefully this episode got you amped about the possibilities to come, and also gave you an appreciation for just how efficient our own bodies are. And by the way, if you want to hear more from Vijay, you can listen to him host our sister podcast, Raising Health. Raising Health was previously called Bio Eats World, and it just relaunched. So again, if you want to hear more from Vijay and the wonderful guests on Raising Health, make sure to go check out that feed. All right, we'll see you next time.
