The a16z Show - DNA's Potential to Store the World's Data

Episode Date: January 5, 2024

Nature’s blueprint – DNA – is an incredibly efficient machine. You cannot see it with the naked eye, yet it can last for hundreds, maybe even millions of years. Plus, the storage capacity of a single gram of DNA is over 200 million gigabytes! As the cost of DNA sequencing (reading) and synthesis (writing) comes down, scientists are looking to our very own biology for applications reaching as far as data storage. Learn more about this fascinating world with a16z General Partner Vijay Pande. As he says, this next wave of biological computing will “be the mother of many new exponentials to come.”

Resources:
Save As: DNA Part 1: https://exo.substack.com/p/saving-our-story-in-dna-part-1
Save As: DNA Part 2: https://exo.substack.com/p/save-as-dna-part-2

Stay Updated:
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript
Starting point is 00:00:00 Moore's Law is not like a law of nature, a law of physics. It's a law of human determination. They would just will it into existence. The race is to be able to build the Moore's Law for DNA. That feels like one of the last big unlocks in synthetic biology. Actually, that's where you really quickly get humbled by how well nature has done it itself. A thousand years from now, humans will know how to read DNA. Biology has had a huge storage problem on its own.
Starting point is 00:00:28 It's dealt with it within the biological context. The fact that you look out in the world with trees and people and birds and so on, there's tons and tons of bits there and bytes all over the place that's been there for millions of years. Life itself is literally digital. Hello, a16z podcast listeners. Welcome to 2024. At the speed that many fields have been moving recently, whether it be AI or robotics or biotech, almost nothing feels impossible. And that's why we're kicking off this year with a topic that does sound outlandish, but might actually be within our field of view. That topic, DNA as data storage. Scientists have estimated that the average human body has trillions of cells, with billions replaced daily. Now, that's an incredibly efficient
Starting point is 00:01:15 machine driven by the genetic code in every single one of us. And of course, that's DNA. This blueprint has also evolved and become optimized for space and longevity. You cannot see it with the naked eye, yet it can last for hundreds, maybe even millions of years. And get this. The storage capacity of a single gram of DNA is over 200 million gigabytes. The amount of DNA in your body? 150 billion terabytes. Capable of storing every single movie released in the 21st century, billions of times over.
Starting point is 00:01:49 Or equivalent to thousands of data centers, except requiring way less energy and lasting much longer. So it should be no surprise that humans wanted to leverage this, quote, natural intelligence built in all of us. And the ever-increasing demand for data only pushes this trend further, with some researchers estimating we may even run out of storage this decade. And I have a funny feeling that software is not done eating the world. Today, you'll learn from a16z General Partner Vijay Pande about the fascinating world of DNA as a storage mechanism, and how advances in DNA sequencing, aka reading,
Starting point is 00:02:26 and synthesis, aka writing, have led us here. As Vijay says, this next wave of biological computing... ...will be the mother of many new exponentials to come. As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates
Starting point is 00:02:55 may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com slash disclosures. All right, first, I want to kick off this episode with a quick story that I originally stumbled upon via an article called Save As DNA. We'll, of course, drop that link in the show notes, and it's written by Toby, a writer for Exo, and he shares the Nobel Prize-winning physicist Richard Feynman's 1959 lecture called There's Plenty of Room at the Bottom, an invitation to enter a new field of physics. In it, Feynman calculated that all the volumes of the encyclopedia could be written
Starting point is 00:03:38 on a, quote, cube of material one two hundredth of an inch wide, to which he followed, so there's plenty of room at the bottom. Remember, this is in 1959. He even postulated the idea of swallowing a surgeon in the form of a swallowable robot. Knowing he was well before his time, he quipped, this fact that enormous amounts of information can be carried in an exceedingly small space is, of course, well known to the biologists, and resolves the mystery which existed before we understood all this clearly, of how it could be that in the tiniest cell, all of the information for the organization of a complex creature, such as ourselves, can be stored. All this information, whether we have brown eyes or whether we think at all, or that in the embryo the jawbone should first develop with a little hole in the side so that later a nerve can grow through it, all of this information is contained in a very tiny fraction of the cell, in the form of long-chain DNA molecules, in which approximately 50 atoms are used for one bit of information about the cell.
Starting point is 00:04:40 Now, Feynman ended his lecture with a $1,000 prize to the first person to take the information on the page of a book and miniaturize it to 1/25,000th of its size. So that's 25,000 times smaller, to be read on an electron microscope. Now, it ended up taking 30 years for Tom Newman to claim this prize. So you can get a sense of just how long it took for more people to widely understand the intelligence embedded at the atomic scale. All right, now let's bring in Vijay to get us up to date on where we really are in the trajectory today. I'd love to just start out by getting your take on a few trends that are emerging and you could even say colliding. So the first one is around data storage.
Starting point is 00:05:25 The world is undeniably becoming more digital. As Marc likes to say, software continues to eat the world. Do we realistically have the data storage we need for this increasingly digital world? Or what are you seeing there? Well, yeah. So from a compute point of view, when we talk about Moore's Law, we often talk about just from a calculation point of view. Can we compute more and more? But a key part of AI and compute today is data.
Starting point is 00:05:51 And I think what a lot of people forget about is actually storage has been exponentially increasing over time. And if you think about it, your laptop now might have a terabyte; it wasn't that long ago you were happy to have 100 gigabytes and then 10 gigabytes and so on. So that exponential increase in storage, just due to technologies like originally disk technologies, now SSD technologies, has enabled this exponential increase in storage, which in turn actually is a key part of AI. AI is this confluence of exponential increase in compute meeting the exponential increase in data. Exactly. And I think to your point, many people don't realize how many zeros follow the number of bytes that humans as a species are now producing.
Starting point is 00:06:30 One report predicted that by 2025, humanity is set to unleash 175 zettabytes. So that's 175 followed by 21 zeros. Another exponential trend, or at least fast-advancing trend, is around genomics and DNA sequencing and synthesis, so sequencing being reading, synthesis being writing DNA. Maybe one interesting thing I'd love to get your take on is people maybe are familiar with the sequencing graph, where that also has exponentially declined, similar to Moore's Law. But synthesis hasn't quite followed as much of an exponential trend, despite us working on both for at least a few decades. Yeah, in the genomics field, reading actually is, in the end, a lot easier than writing, in part due to a variety of very clever technologies on the reading side. The writing side is actually much
Starting point is 00:07:23 more complicated for somewhat technical reasons. The reason why reading actually is somewhat simplified is that the way most of the reading is done is you have a long piece of DNA and it's chopped up into little bits and then it's read as little bits and then put back together like this massive jigsaw puzzle. That's the so-called shotgun sequencing that was invented in the 90s. And that really was a huge advance in the reading, and sequencing now is the successor to those types of technologies. Writing, there's no equivalent trick yet. And you can imagine maybe you write little bits and then you put them together, but putting that together in real life is a lot harder than to put the pieces together in software. And that's maybe the fundamental asymmetry.
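To make the jigsaw idea concrete, here is a minimal sketch of putting overlapping fragments back together in software. It is a toy greedy merge, not how any real assembler works: the function names, the `min_overlap` parameter, and the example fragments are all assumptions made for illustration, and it ignores sequencing errors and repeated regions.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap (toy example)."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing overlaps any more; return the remaining pieces
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical fragments, each sharing three bases with its neighbour.
pieces = ["GTTAGC", "ATGCGTA", "GTACGTT"]
assembled = greedy_assemble(pieces)
```

The asymmetry described above shows up here too: joining strings is a few lines of code, while the equivalent step for writing DNA has to happen physically.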
Starting point is 00:08:01 And is that changing? Are there new innovations that make you confident that maybe we'll see an inflection there? Yeah, there are many companies in the space, many researchers pushing it for a variety of applications. Obviously, in biology, having DNA is the starting place for any sort of synthetic biology operation. Synthetic biology is this whole space where we can actually engineer biology, and it starts with the writing. And so most synthetic biology companies are really bottlenecked by the speed and cost of writing DNA. And then when you think about like RNA vaccines or so on, like we've all taken COVID vaccines, that's writing a sibling of DNA, RNA. And so there's been a huge effort
Starting point is 00:08:42 for doing that. The needs are great, and it bottlenecks so many key things that I think that's really encouraged a lot of people to move into the space. One application that I just thought was so fascinating was storage. So coming back to that first trend, at least to me, was not intuitive to say, let's use this building block that's in all of our bodies for data storage. How did we get there? And is this really a potential solution? Yeah, it's fun for a variety of reasons. So one reason is that it's super dense, a disk or any other technology is not going to be nearly as dense as DNA. Each bit in DNA is just
Starting point is 00:09:17 not that many atoms when it comes down to it. That density actually will be really hard to beat that with other technologies at least for a while. And secondly, we have technologies to read it super fast. And finally, I don't know if you have any old storage technologies
Starting point is 00:09:30 like zip disks or old SATA disks. Can you read that? I mean, even like USB, sometimes people only have USB-A to read now. The nice thing about DNA is like a thousand years from now, humans will know how to read DNA. That, I have no doubt. So that ubiquity and significance of DNA from a biological point of view, it will mean that we actually will always know how to read it.
Starting point is 00:09:50 And that's actually truly interesting. Oh, and I forgot the other kicker is that it can last for a thousand years. So if you want to have something in a vault like a copy of a great movie, like The Godfather, to be there for safekeeping, you could have that in DNA. Yeah, I think those are great points. And just to add a number to this, the storage capacity of a single gram of DNA is around 250 million gigabytes, which is just wild. I mean, to your point about efficiency, that is just crazy when you compare it to some of the man-made alternatives. Maybe we could compare and contrast the DNA as data storage concept with what we actually use today. Are there other things you'd call out there in terms of whether
Starting point is 00:10:32 this is truly viable or any other aspect of whether the capabilities are really there yet? Yeah, so I think there may be other intermediates. So archival storage is probably going to be the first application. But then also what's interesting is that people are coming up with more and more compute elements that you can encode in biology. So biology can do a little circuitry and so on. And so the nice thing about DNA as a storage medium is that it's very compatible with our bodies. And so you can imagine new types of therapeutics that actually use some degree of DNA
Starting point is 00:11:01 as storage such that these elements are doing some very simple version of compute. That's very early, but I've been around playing with computers since the 70s, and those things were pretty early then, too. And look, here we are. I think things start simple. But the part that gets us excited about something like biological computing is how compatible it would be with us and how far it could grow from here. Yeah. And if we compare it to software and this idea of when we save something on our computer, a lot of people are familiar with zeros and ones and that being encoded into bits, maybe you could break down what the equivalent is when we're talking about DNA. Like, how can we actually get biological computing with this structure? Yeah. So DNA is like a long molecule, like a single rope, and it's comprised of DNA bases. And each base could be one of four possibilities. And while bits are two possibilities, bases get four possibilities. So you can code two bits with each base that way.
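The two-bits-per-base idea can be sketched in a few lines. The specific assignment of bit pairs to A, C, G and T below is an arbitrary assumption made for illustration (the conversation does not specify one), and real encoding schemes typically add constraints and redundancy that this sketch leaves out.

```python
# Hypothetical mapping: any one-to-one assignment of bit pairs to bases would do.
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode_bytes_to_bases(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base, high bits first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)


def decode_bases_to_bytes(strand: str) -> bytes:
    """Inverse of encode_bytes_to_bases: read four bases back into one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for base in strand[start:start + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)


strand = encode_bytes_to_bases(b"DNA")
restored = decode_bases_to_bytes(strand)
```

With this scheme every byte becomes exactly four bases, so a three-byte message is a twelve-base strand.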
Starting point is 00:11:58 And then the other key thing about it, and this was the huge revelation that Watson and Crick published, is that DNA has a structure where one base will connect in with a complementary one, and that forms a double helix. And this double helix is very stable. That's the thing that can last for a very long time. But also, it's error-correcting because you have a redundant copy, essentially a complementary copy in there. And so that also is very appealing. In the end, biology has had a huge storage problem on its own.
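As a rough illustration of why a complementary copy helps, the sketch below uses the standard pairing rules (A with T, C with G) to build the partner strand and to flag positions where two strands no longer pair. The function names are assumptions for this example, strand direction is ignored for simplicity, and a mismatch on its own only shows that something changed, not which strand holds the original value.

```python
# Standard Watson-Crick pairing: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the partner strand, base by base (direction ignored)."""
    return "".join(PAIR[base] for base in strand)


def mismatched_positions(strand: str, partner: str) -> list[int]:
    """Indices where `partner` is not the complement of `strand`."""
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length")
    return [
        index
        for index, (base, other) in enumerate(zip(strand, partner))
        if PAIR[base] != other
    ]


original = "ACGT"
partner = complement(original)
damaged_partner = "TGAA"  # hypothetical damage at index 2
problems = mismatched_positions(original, damaged_partner)
```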
Starting point is 00:12:27 The fact that you look out in the world with trees and people and birds and so on, there's tons and tons of bits there and bytes all over the place that's been there for millions of years. And so it has all the same compute problems of how to store, how to error correct, how to read and write quickly. And so it's dealt with it within the biological context. And that's partially what also makes DNA interesting. Yeah. Maybe something else that comes to mind is that storage is not just about writing, but it's about that retrieval side as well, at least if we're thinking about I write something up in Google Docs and I want to save it and I want to be able to bring it back and share it. Can we do that with DNA? We're talking about encoding all this data into
Starting point is 00:13:06 DNA. Do we really have the retrieval mechanisms to do that effectively? Or are we talking about a different kind of storage? In principle, you could do that. And what you would do is you could have a drive equivalent where you have file names, which are little bits of DNA at the end. And then if you want to retrieve that file, you'd get the complementary part to that. And so you pull out that strand, and then you just read that strand. And so that's just a simple example; people have come up with very clever approaches. The one thing about this is that this is much more in the hierarchy of computers where you have RAM to SSDs to tape drives. This is much more on the tape drive side of things. So you could get a lot of bandwidth, but
Starting point is 00:13:47 probably not very good latency. It would take some time to read all that stuff. But also, and this starts to get into James Bond-like territory, but people are also realizing it's a way you can move a lot of data very discreetly. And so you can imagine like injecting somebody with something and they walk across the border and there's nothing to scan for. And that could easily be like terabytes of data. I hadn't considered that, but one aspect of storage is security. Are there any other things that are top of mind there for you in terms of if this were a new storage mechanism, security is such an important aspect of that when we think of software.
Starting point is 00:14:22 We're thinking of now digital biology. Are there any other implications of that? I think it's all the same things as we deal with with any sort of cybersecurity. And so I think people keeping things private is generally not a bad thing. And so I think actually I would flip it and say that it actually is an interesting scheme for privacy. But we're really dipping into sci-fi here. That's not something people are doing now. But that's something in principle that could be very plausible.
Starting point is 00:14:48 Yeah. Well, coming back from sci-fi, grounding ourselves, when we're talking about this one application of potential DNA synthesis, is this something that's already in motion or even commercially viable? Are companies on the ground actually producing this technology? And also, do they have customers yet? Yeah. So there's numerous companies of various stages. Some have been around for many years, some that are startups that are producing DNA. And like any commodity, you can actually go to the web, upload a sequence, and get your DNA. So that's there. I think what the race is to do is to be able to build the Moore's Law for DNA, to build that exponential decrease in cost. And that, as you pointed out at the beginning,
Starting point is 00:15:27 actually hasn't been there quite yet. There's been a decrease, but maybe not a true exponential decrease. And so if we can do that, that feels like one of the last big unlocks in synthetic biology, that we've got the read and actually even got the edit with CRISPR. And so people are editing all the time. Ironically, we just don't have that write part. I think if we can enable that, so much in synthetic biology is ready to go. Quick interjection, just in case you're looking for a real-world example of how all this can be applied. So spider silk has long been known for its strength. In fact, it's five times as strong as the same weight of steel, but it turns out that you can't just grow spider silk.
Starting point is 00:16:05 If you try to farm spiders, unlike silkworms, they will actually all just kill each other. But a paper published in September showed how scientists using CRISPR were able to genetically engineer silkworms with spider genes that not only didn't kill one another, but were able to produce fibers six times as tough as Kevlar. And of course, we're just getting started here. Now back to Vijay to shed light on what's between us and these potential applications across health, food, materials, and more. What do you think is the biggest bottleneck, if you could point to anything?
Starting point is 00:16:37 For now, this has largely been a sort of a chemical problem, and so people have been handling with various chemistry methods. There's different types of ways people synthesize proteins, which are analogous, long-chain molecules, and people have tried to extend that to DNA. And those things work well, but you can imagine you have to make this thing perfect, and so as it gets longer, it gets exponentially harder to have high fidelity. And so that's why people typically have been selling really short ones. And then maybe you can try to combine the short ones, but it is a sort of a different type of exponential problem that it's hard to do it really well without errors.
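One way to see why longer strands are so much harder to make perfectly: if each added base has some independent chance of being correct, the chance that the whole strand is error-free shrinks geometrically with length. The sketch below assumes independent per-base errors and uses a made-up accuracy figure purely for illustration; neither the model nor the number comes from the conversation.

```python
def fraction_error_free(length: int, per_base_accuracy: float) -> float:
    """Probability that every one of `length` bases is correct,
    assuming each base is an independent trial (a simplifying assumption)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if not 0.0 <= per_base_accuracy <= 1.0:
        raise ValueError("per_base_accuracy must be between 0 and 1")
    return per_base_accuracy ** length


# Hypothetical accuracy of 99% per base, compared across strand lengths.
ACCURACY = 0.99
short_strand = fraction_error_free(20, ACCURACY)
long_strand = fraction_error_free(200, ACCURACY)

for length in (20, 200, 2000):
    print(length, fraction_error_free(length, ACCURACY))
```

Under these assumptions, making the strand ten times longer raises the success probability to the tenth power, which is why stitching short pieces together, or lowering the per-base error rate, matters so much.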
Starting point is 00:17:10 And actually, that's where you really quickly get humbled by how well nature has done it itself, that it solves this problem in its own ways. And so that's still something that I think to do it at scale with very low error is the holy grail. Yeah, I mean, the more I research this topic, the more I just appreciate it. Oh my gosh, this is so efficient. Our body's storage mechanism is just incredible, or nature, really, for that matter. Trying to stay away from sci-fi again. But I'd love to get your take on just where we go from here. I know it's impossible to make a true prediction.
Starting point is 00:17:42 But just given all the things you've seen, what kind of timeline do you think is hopeful? It's impossible to really know. But there are actually now many companies around the hoop that are doing exciting things. And there's a great need for it. So the combination of the market maturing in synthetic biology and companies pursuing lots different approaches, it has the right elements of what we want to see in terms of a new type of tech company. But, you know, this is the type of thing where that advance has to really be there.
Starting point is 00:18:07 But what's been really unique about sequencing is that there's been advance after advance. Not unlike what makes computer chips work is Moore's Law is not like a law of nature, a law of physics. It's a law of human determination where people have been just trying and they'll do this lithography and that lithography and this new type of tech in a new transistor and they would just will it into existence. We've seen the analogous willing into existence in the sequencing part on top of the platform that was compatible for that, we've yet to sort of get started. And I think if we can build a platform,
Starting point is 00:18:41 such that we can enable that human willing into existence sort of phenomenon, that's really what's been missing. Right. And maybe to get listeners excited about that potential unlock, you touched on some of these earlier, but if we are able to generate low-cost at-scale synthesis of DNA, what does that unlock, whether it's materials or food or health, what are the applications that maybe excite you most? I think the most foundational statement is that it really unlocks large-scale engineering of biology. And so that shift from biology as let's tweak and experiments and just discover to let's build things, and it's that building part that really the DNA part is central for. Because once we get the DNA, we can actually now CRISPR edit it into any type of system.
Starting point is 00:19:29 And then the key part is actually not just building it, but then building it quickly so we can have fast iterations. I think what's really great about programming is that you can compile and run your code like in minutes or seconds and get those fast iterations. Once DNA synthesis can get to that point, then we'll see fast iterations in synthetic biology. We will see that engineering cycle kick in, and it will be the mother of many new exponentials to come. We really need that platform, the same way we saw with software. Yeah, I think the last thing I'll add is that you started by mentioning how a lot of our life is becoming digital. Irony is that in many ways it always has been, that life itself is literally digital, that we may be coming full circle to adopting these technologies for new advances in engineering biology.
Starting point is 00:20:13 But the super fun thing is what we're talking about is when CS and bio converge. Yeah, that's a wonderful place to end off. Thanks, Vijay. Fantastic. All right, there you have it. DNA as data storage. Yet another example of science fiction actually maybe just being science reality. Now, clearly, there are still hurdles along the way, but hopefully this episode got you amped about the possibilities to come, and also maybe an appreciation for just how efficient
Starting point is 00:20:40 our own bodies are. And by the way, if people want to hear more from Vijay, you can listen as he hosts our sister podcast, Raising Health. Now, Raising Health was previously called Bio Eats World, and they actually just relaunched. So again, if you want to hear more from Vijay and the wonderful guests on Raising Health, make sure to go check out that feed. All right, we'll see you next time.
