a16z Podcast - DNA's Potential to Store the World's Data
Episode Date: January 5, 2024
Nature’s blueprint – DNA – is an incredibly efficient machine. You cannot see it with the naked eye, yet it can last for hundreds, maybe even millions of years. Plus, the storage capacity of a single gram of DNA is over 200 million gigabytes! As the cost of DNA sequencing (reading) and synthesis (writing) comes down, scientists are looking to our very own biology for applications reaching as far as data storage. Learn more about this fascinating world with a16z General Partner Vijay Pande. As he says, this next wave of biological computing will “be the mother of many new exponentials to come.”
Resources:
Save As: DNA Part 1: https://exo.substack.com/p/saving-our-story-in-dna-part-1
Save As: DNA Part 2: https://exo.substack.com/p/save-as-dna-part-2
Stay Updated:
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Transcript
Moore's Law is not like a law of nature, a law of physics.
It's a law of human determination.
They would just will it into existence.
The race is to be able to build the Moore's Law for DNA.
That feels like one of the last big unlocks in synthetic biology.
Actually, that's where you really quickly get humbled by how well nature has done it itself.
A thousand years from now, humans will know how to read DNA.
Biology has had a huge storage problem on its own.
It's dealt with it within the biological context.
The fact that you look out in the world with trees and people and birds and so on,
there's tons and tons of bits there and bytes all over the place that's been there for
millions of years. Life itself is literally digital.
Hello, A16Z podcast listeners. Welcome to 2024.
At the speed that many fields have been moving recently, whether it be AI or robotics or
biotech, almost nothing feels impossible. And that's why we're kicking off this year
with a topic that does sound outlandish, but might actually be within our field of view.
That topic, DNA as data storage.
Scientists have estimated that the average human body has trillions of cells, with billions
replaced daily.
Now, that's an incredibly efficient machine, driven by the genetic code in every single one of us.
And of course, that's DNA.
This blueprint has also evolved and become optimized for space and longevity.
You cannot see it with the naked eye, yet it can last
for hundreds, maybe even millions of years. And get this, the storage capacity of a single gram
of DNA is over 200 million gigabytes. The amount of DNA in your body? 150 billion terabytes,
capable of storing every single movie released in the 21st century, billions of times over.
Or equivalent to thousands of data centers, except requiring way less energy and lasting much
longer. So it should be no surprise that humans wanted to leverage this, quote, natural intelligence
built into all of us. And the ever-increasing demand for data only pushes this trend further,
with some researchers estimating we may even run out of storage this decade. And I have a funny feeling
that software is not done eating the world. Today, you'll learn from A16Z General Partner
Vijay Pande about the fascinating world of DNA as a storage mechanism, and how advances in DNA
sequencing, aka reading, and synthesis, aka writing, have led us here. As Vijay says, this next wave
of biological computing will be the mother of many new exponentials to come. As a reminder, the content
here is for informational purposes only, should not be taken as legal, business, tax, or investment
advice, or be used to evaluate any investment or security, and is not directed at any
investors or potential investors in any A16Z fund. Please note that A16Z and its affiliates
may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments,
please see a16z.com/disclosures.
All right, first I want to kick off this episode with a quick story
that I originally stumbled upon via an article called Save As DNA.
We'll, of course, drop that link in the show notes,
and it's written by Toby, a writer for Exo,
and he shares the Nobel Prize-winning physicist Richard Feynman's 1959 lecture called
There's Plenty of Room at the Bottom, an invitation to enter a new field of physics.
In it, Feynman calculated that all the volumes of the encyclopedia could be written on a, quote,
cube of material one two-hundredth of an inch wide, to which he followed: so there's plenty of room at the bottom.
Remember, this is in 1959.
He even postulated the idea of swallowing a surgeon in the form of a
swallowable robot. Knowing he was well before his time, he quipped: This fact that enormous amounts
of information can be carried in an exceedingly small space is, of course, well known to the biologists,
and resolves the mystery which existed before we understood all this clearly, of how it could be
that in the tiniest cell, all of the information for the organization of a complex creature
such as ourselves can be stored. All this information, whether we have brown eyes or whether we
think at all, or that in the embryo the jawbone should first develop with a little hole in the
side so that later a nerve can grow through it. All of this information is contained in a very
tiny fraction of the cell, in the form of long-chain DNA molecules, in which approximately
50 atoms are used for one bit of information about the cell. Now, Feynman ended his lecture with a
$1,000 prize to the first person to take the information on the page of a book and miniaturize it to
one 25,000th of its size. So that's 25,000 times smaller to be read on an electron microscope.
Now, it ended up taking 30 years for Tom Newman to claim this prize. So you can get a sense
of just how long it took for more people to widely understand the intelligence embedded
at the atomic scale. All right, now let's bring in Vijay to get us up to date on where we really
are in the trajectory today. I'd love to just start out by
getting your take on a few trends that are emerging,
and you could even say colliding.
So the first one is around data storage.
The world is undeniably becoming more digital,
as Mark likes to say, software continues to eat the world.
Do we realistically have the data storage we need
for this increasingly digital world, or what are you seeing there?
Well, yeah, so from a compute point of view,
when we talk about Moore's Law,
we often talk about just from a calculation point of view.
Can we compute more and more?
But a key part of AI and compute today is data.
And I think what a lot of people forget about is that storage has actually been exponentially increasing over time.
If you think about it, your laptop now might have a terabyte. It wasn't that long ago you were happy to have 100 gigabytes, and before that 10 gigabytes, and so on.
So that exponential improvement, due to technologies like originally disk technologies, now SSD technologies, has enabled this exponential increase in storage, which in turn is actually a key part of AI.
AI is this confluence of exponential increase in compute meeting the exponential increase in data.
Exactly. And I think to your point, many people don't realize how many zeros follow the number of bytes that humans as a species are now producing.
One report predicted that by 2025, humanity is set to unleash 175 zettabytes. So that's 175 followed by 21 zeros.
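To put that figure next to the DNA density quoted earlier in the episode, here is a rough back-of-the-envelope sketch. It uses only the two numbers mentioned in the show (175 zettabytes, and "over 200 million gigabytes" per gram of DNA) and assumes decimal SI units; the variable names are illustrative, and the result is an order-of-magnitude estimate rather than an engineering claim.

```python
# Back-of-the-envelope arithmetic using only figures quoted in this episode.
ZETTABYTE = 10**21  # bytes, decimal SI prefix (assumption)
GIGABYTE = 10**9    # bytes, decimal SI prefix (assumption)

# "175 zettabytes" projected for 2025
projected_data_bytes = 175 * ZETTABYTE

# "over 200 million gigabytes" per gram of DNA, taken as a lower bound
dna_bytes_per_gram = 200_000_000 * GIGABYTE

# Grams of DNA needed to hold the projected data at that density
grams_needed = projected_data_bytes // dna_bytes_per_gram

print(f"{projected_data_bytes:,} bytes")
print(f"{grams_needed:,} g of DNA (about {grams_needed // 1000:,} kg)")
```

At the quoted density, the whole projected total works out to under a metric ton of DNA, which is the intuition behind the density argument made later in the conversation.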
Another exponential trend, or at least a fast-advancing trend, is around genomics and DNA sequencing
and synthesis. So sequencing being reading, synthesis being writing DNA. Maybe one interesting thing
I'd love to get your take on is people maybe are familiar with the sequencing graph where that
also has exponentially declined similar to Moore's law. But synthesis hasn't quite followed as much of an
exponential trend, despite us working on both for at least a few decades. Yeah, in the genomics
field, reading actually is, in the end, a lot easier than writing, in part due to a variety of
very clever technologies on the reading side. The writing side is actually much more complicated
for somewhat technical reasons. The reason why reading actually is somewhat simplified is that
the way most of the reading is done is you have a long chain of DNA and it's chopped up into little
bits. And then it's read as little bits and then put back together like this massive jigsaw
puzzle. That's the so-called shotgun sequencing that was invented in the 90s. And that really
was a huge advance in the reading. And sequencing now is the successor to those types of technologies.
With writing, there's no equivalent trick yet. And you can imagine maybe you write little bits and then
you put them together. But putting that together in real life is a lot harder than putting the
pieces together in software. And that's maybe the fundamental asymmetry. And is that changing?
Are there new innovations that make you confident that maybe we'll see an inflection there?
Yeah, there are many companies in the space, many researchers pushing it for a variety of applications.
Obviously, in biology, having DNA is the starting place for any sort of synthetic biology operation.
That synthetic biology is this whole space where we can actually engineer biology and that it starts with the writing into it.
And so most synthetic biology companies are really bottlenecked by the speed and cost of writing
DNA. And then when you think about RNA vaccines and so on, like we've all taken COVID
vaccines, that's writing a sibling of DNA, RNA. And so there's been a huge effort for doing
that. The needs are great. And it bottlenecks so many key things that I think that's really
encouraged a lot of people to move into the space. One application that I just thought was so
fascinating was storage. So coming back to that first trend, it at least to me was not intuitive
to say, let's use this building block that's in all of our bodies for data storage. Like, how
did we get there? And is this really a potential solution? Yeah, it's fun for a variety of reasons.
So one reason is that it's super dense. A disk or any other technology is not going to be nearly as
dense as DNA. Each bit in DNA is just not that many atoms when it comes down to it.
The density will actually be really hard to beat with other technologies, at least for a while.
And secondly, we have technologies to read it super fast. And finally, I don't know if you have any old
storage technologies like zip disks or old [unclear] discs.
Can you read that? I mean, even like USB, sometimes people don't even have USB-A
to read now. The nice thing about DNA is, like, a thousand years from now, humans will know
how to read DNA. That, I have no doubt. So that ubiquity and significance of DNA from
a biological point of view will mean that we'll actually always know how to read it. And that's
actually truly interesting. Oh, and I forgot the other kicker is that it can last for a thousand
years. So if you want to have something in a vault, like a copy of a great movie, like the
Godfather to be there for safekeeping. You could have that in DNA.
Yeah, I think those are great points. And just to add a number to this, the storage capacity
of a single gram of DNA is around 250 million gigabytes, which is just wild. I mean,
to your point about efficiency, that is just crazy when you compare it to some of the man-made
alternatives. Maybe we could compare and contrast the DNA as data storage concept with what we
actually use today. Are there other things you'd call out there in terms
of whether this is truly viable or any other aspect of whether the capabilities are really there yet?
Yeah, so I think there may be other intermediates.
So archival storage is probably going to be the first application.
But then also what's interesting is that people are coming up with more and more compute elements
that you can encode in biology.
So biology can do little circuitry and so on.
And so the nice thing about DNA as a storage medium is that it's very compatible with our bodies.
And so you can imagine new types of therapeutics that actually use some degree of DNA
as storage such that these elements are doing some very simple version of compute. That's very
early, but I've been around playing with computers since the 70s, and those things were
pretty early then, too. And look, here we are. I think things start simple. But the part that
gets us excited about something like biological computing is how compatible it would be with us
and how far it could grow from here. Yeah. And if we compare it to software and this idea of
when we save something on our computer, a lot of people are familiar with zeros and ones and that being encoded into bits.
Maybe you could break down what the equivalent is when we're talking about DNA.
Like, how can we actually get biological computing with this structure?
Yeah, so DNA is like a long molecule, like a single rope, and it's comprised of DNA bases.
And each base could be one of four possibilities.
And while bits are two possibilities, bases get four possibilities, so you can code two bits with each base that way.
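A minimal sketch of the "two bits per base" idea, assuming an arbitrary mapping of A=00, C=01, G=10, T=11. The mapping and the function names `encode` and `decode` are illustrative choices, not a scheme described in the episode, and a practical encoding would add constraints and redundancy that this leaves out.

```python
# Assumed, arbitrary mapping: index in this string is the 2-bit value.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Inverse of encode: every four bases become one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)  # ValueError on a non-ACGT symbol
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Two bytes become eight bases here, which is the four-bases-per-byte ratio implied by two bits per base.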
And then the other key thing about it, and this was the huge revelation that Watson
and Crick published, is that DNA has a structure where one base will connect in with a complementary
one, and that forms a double helix.
And this double helix is very stable.
That's the thing that can last for a very long time.
But also, it's error correcting because you have a redundant copy, essentially a complementary copy
in there.
And so that also is very appealing.
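The redundancy described here can be sketched in a few lines, under heavy simplification: A pairs with T and C pairs with G, so the second strand is fully determined by the first. The helper names below are hypothetical, the two strands are written position by position (ignoring the fact that real strands run in opposite directions), and real cellular repair is far more involved than this comparison.

```python
# Standard base pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complementary strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def mismatches(strand: str, partner: str) -> list:
    """Positions where the partner strand is not the complement of strand."""
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length")
    return [
        i for i, (a, b) in enumerate(zip(strand, partner))
        if COMPLEMENT[a] != b
    ]


strand = "CAGACGGC"
partner = complement(strand)    # GTCTGCCG
damaged = "GTCAGCCG"            # position 3 changed from T to A

print(mismatches(strand, partner))  # []
print(mismatches(strand, damaged))  # [3]
```

Note that a mismatch only shows that the two copies disagree at a position; deciding which strand holds the correct base takes extra information, which this sketch does not model.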
In the end, biology has had a huge storage problem on its own.
The fact that you look out in the world with trees and people and birds and so on, there's tons and tons of bits there and bytes all over the place that's been there for millions of years.
And so it has all the same compute problems of how to store, how to error correct, how to read and write quickly.
And so it's dealt with it within the biological context.
And that's partially what also makes DNA interesting.
Yeah.
Maybe something else that comes to mind is that storage is not just about writing, but it's about that retrieval side as well,
at least if we're thinking about I write something up in Google Docs and I want to save it
and I want to be able to bring it back and share it. Can we do that with DNA? We're talking about
encoding all this data into DNA. Do we really have the retrieval mechanisms to do that effectively
or are we talking about a different kind of storage? In principle, you could do that.
And what you would do is you could have a drive equivalent where you have file names, which are
little bits of DNA at the end. And then if you want to retrieve that file, you'd get
the complementary part to that.
And so you pull out that strand and then you just read that strand.
And so that's just even a simple example.
People have come up with very clever approaches.
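The simple "file name at the end" example can be mimicked in software. This is a toy model with assumed details: each strand in a pool is a payload followed by a short address sequence, and a probe "pulls out" every strand whose address is the complement of the probe. The four-base addresses, the pool contents, and the `retrieve` function are all hypothetical.

```python
# Standard base pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Return the base-by-base complementary sequence."""
    return "".join(COMPLEMENT[base] for base in seq)


def retrieve(pool: list, probe: str) -> list:
    """Return payloads of strands whose trailing address pairs with the probe."""
    if not probe:
        raise ValueError("probe must not be empty")
    address = complement(probe)
    return [s[:-len(address)] for s in pool if s.endswith(address)]


# Hypothetical pool: payload + 4-base address ("file name") at the end.
pool = [
    "CAGACGGC" + "AACC",  # file A, fragment 1
    "TTTTGGGG" + "GGTT",  # file B
    "ACGTACGT" + "AACC",  # file A, fragment 2
]

print(retrieve(pool, complement("AACC")))  # ['CAGACGGC', 'ACGTACGT']
print(retrieve(pool, complement("GGTT")))  # ['TTTTGGGG']
```

In this model every lookup scans the whole pool, which loosely mirrors the point made next in the conversation: plenty of capacity, but not low-latency access.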
The one thing about this is that, in the hierarchy of computers
where you have RAM to SSDs to tape drives,
this is much more on the tape drive side of things.
So you could get a lot of bandwidth, but probably not very good latency.
It would take some time to read all that stuff.
But also, and this starts to get into James Bond-like territory,
but people are also realizing it's a way you can move a lot of data very discreetly.
And so you could imagine like injecting somebody with something and they walk across the border
and there's nothing to scan for.
And that could easily be like terabytes of data.
I hadn't considered that, but one aspect of storage is security.
Are there any other things that are top of mind there for you in terms of if this were
a new storage mechanism, security is such an important aspect of that when we think of software,
and here we're thinking of digital biology.
Are there any other implications of that?
I think it's all the same things
as we deal with with any sort of cybersecurity.
And so I think people keeping things private
is generally not a bad thing.
And so I think actually I would flip it
and say that it actually is an interesting scheme for privacy.
But we're really dipping into sci-fi here.
That's not something people are doing now,
but that's something that in principle
could be very plausible.
Yeah.
Well, coming back from sci-fi, grounding ourselves,
When we're talking about this one application of potential DNA synthesis,
is this something that's already in motion or even commercially viable?
Are companies on the ground actually producing this technology?
And also, do they have customers yet?
Yeah.
So there's numerous companies of various stages.
Some have been around for many years, some that are startups that are producing DNA.
And like any commodity, you can actually go to the web, upload a sequence, and get your DNA.
So that's there.
I think what the race is to do is to be able to build the Moore's
Law for DNA, to build that exponential decrease in cost. And that, as you pointed out at the
beginning, actually hasn't been there quite yet. There's been a decrease, but maybe not a true
exponential decrease. And so if we can do that, that feels like one of the last big unlocks
in synthetic biology. We've got the read, and we've actually even got the edit with CRISPR. And so people
are editing all the time. Ironically, we just don't have that write part. I think if we can
enable that, so much in synthetic biology is ready to go.
Quick interjection, just in case you're looking for a real-world example of how all this can be applied.
So spider silk has long been known for its strength.
In fact, it's five times as strong as the same weight of steel, but it turns out that you can't just grow spider silk.
If you try to farm spiders, unlike silkworms, they will actually all just kill each other.
But a paper published in September showed how scientists using CRISPR were able to genetically engineer silkworms with spider genes that not only didn't kill
one another, but were able to produce fibers six times as tough as Kevlar.
And of course, we're just getting started here.
Now, back to Vijay, to shed light on what's between us and these potential applications
across health, food, materials, and more.
What do you think is the biggest bottleneck, if you could point to anything?
For now, this has largely been a sort of a chemical problem, and so people have been
handling it with various chemistry methods.
There's different types of ways people synthesize proteins, which are analogous, long-chain
molecules and people have tried to extend that to DNA. And those things work well, but you can imagine
you have to make this thing perfect. And so as it gets longer, it gets exponentially harder to have high
fidelity. And so that's why people typically have been selling really short ones. And then maybe
you can try to combine the short ones, but it is a sort of a different type of exponential problem that
it's hard to do it really well without errors. And actually, that's where you really quickly get
humbled by how well nature has done it itself, that it's solved this problem in its own ways.
And so that's still something that I think to do it at scale with very low error is the holy grail.
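The "exponentially harder" point can be made concrete with a simple model, assuming each base is added correctly with the same independent probability. Under that assumption the chance that a strand of a given length is perfect is the per-base accuracy raised to the power of the length. The 99% accuracy used below is a made-up illustrative number, not a figure from the episode or from any specific synthesis method.

```python
def perfect_fraction(per_base_accuracy: float, length: int) -> float:
    """Probability that all `length` bases are correct, assuming independent errors."""
    return per_base_accuracy ** length


# 0.99 is an assumed, illustrative per-base accuracy.
for length in (20, 200, 2000):
    print(f"{length:>5} bases: {perfect_fraction(0.99, length):.6%} error-free")
```

With that assumed accuracy, roughly 82% of 20-base strands come out perfect, about 13% of 200-base strands do, and essentially none at 2,000 bases, which is consistent with the remark that sellers have typically stuck to really short strands.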
Yeah, I mean, the more I research this topic, the more I just appreciated, oh, my gosh, this is so efficient.
Our body's storage mechanism is just incredible, or nature, really, for that matter.
Trying to stay away from sci-fi again, but I'd love to get your take on just where we go from here.
I know it's impossible to make a true prediction, but just given all the things you've seen, what kind of timeline do you think is hopeful?
It's impossible to really know, but there are actually now many companies around the hoop that are doing exciting things, and there's a great need for it.
So the combination of the market maturing in synthetic biology and companies pursuing lots of different approaches, it has the right elements of what we want to see in terms of a new type of tech company.
But this is the type of thing where that advance has to really be there.
But what's been really unique about sequencing is that there's been advance after advance,
not unlike what makes computer chips work.
Moore's Law is not like a law of nature,
a law of physics.
It's a law of human determination
where people have been just trying,
and they'll do this lithography and that lithography
and this new type of transistor,
and they just will it into existence.
We've seen the analogous willing into existence
in the sequencing part.
On top of a platform that was compatible for that,
we've yet to sort of get started.
And I think if we can build a platform
such that we can enable that human willing-into-existence sort of phenomenon, that's really
what's been missing.
Right.
And maybe to get listeners excited about that potential unlock, you touched on some of these
earlier, but if we are able to generate low-cost at-scale synthesis of DNA, what does that unlock,
whether it's materials or food or health, what are the applications that maybe excite you
most?
I think the most foundational statement is that it really unlocks
large-scale engineering of biology.
And so that's the shift from biology as let's tweak and experiment and just discover to let's build
things.
And it's that building part that really the DNA part is central for.
Because once we get the DNA, we can actually now CRISPR edit it into any type of system.
And then the key part is actually not just building it, but then building it quickly so we
can have fast iterations.
I think what's really great about programming is that you can compile and run your code
like in minutes or seconds and get those fast iterations.
Once DNA synthesis can get to that point, then we'll see fast iterations in synthetic biology.
We'll see that engineering cycle kick in, and it will be the mother of many new exponentials to come.
We really need that platform, the same way we saw with software.
Yeah, I think the last thing I'll add is that you started by mentioning how a lot of our life
is becoming digital.
The irony is that in many ways it always has been, that life itself is literally digital,
and that we may be coming full circle to adopting these technologies
for new advances in engineering and biology.
But the super fun thing is what we're talking about
is when CS and bio converge.
Yeah, I think that's a wonderful place to end off.
Thanks, Vijay.
Fantastic.
All right, there you have it.
DNA as data storage.
Yet another example of science fiction,
actually maybe just being science reality.
Now, clearly there are still hurdles along the way,
but hopefully this episode got you amped about the possibilities to come,
and also maybe gave you an appreciation for just how efficient our own bodies are.
And by the way, if you want to hear more from Vijay, you can listen as he hosts our sister podcast, Raising Health.
Now, Raising Health was previously called Bio Eats World, and they actually just relaunched.
So again, if you want to hear more from Vijay and the wonderful guests on Raising Health, make sure to go check out that feed.
All right, we'll see you next time.