The a16z Show - DNA's Potential to Store the World's Data
Episode Date: January 5, 2024
Nature's blueprint, DNA, is an incredibly efficient machine. You cannot see it with the naked eye, yet it can last for hundreds, maybe even millions of years. Plus, the storage capacity of a single gram of DNA is over 200 million gigabytes! As the cost of DNA sequencing (reading) and synthesis (writing) comes down, scientists are looking to our very own biology for applications reaching as far as data storage. Learn more about this fascinating world with a16z General Partner Vijay Pande. As he says, this next wave of biological computing will “be the mother of many new exponentials to come.”
Resources:
Save As: DNA Part 1: https://exo.substack.com/p/saving-our-story-in-dna-part-1
Save As: DNA Part 2: https://exo.substack.com/p/save-as-dna-part-2
Stay Updated:
Find a16z on Twitter: https://twitter.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Subscribe on your favorite podcast app: https://a16z.simplecast.com/
Follow our host: https://twitter.com/stephsmithio
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Transcript
Moore's Law is not like a law of nature, a law of physics.
It's a law of human determination.
They would just will it into existence.
The race is to be able to build the Moore's Law for DNA.
That feels like one of the last big unlocks in the synthetic biology.
Actually, that's where you really quickly get humbled by how well nature has done itself.
A thousand years from now, humans will know how to read DNA.
Biology has had a huge storage problem on its own.
It's dealt with it within the biological context. The fact that you look out in the world with trees
and people and birds and so on, there's tons and tons of bits there and bytes all over the
place that's been there for millions of years. Life itself is literally digital.
Hello, A16Z podcast listeners. Welcome to 2024. At the speed that many fields have been moving
recently, whether it be AI or robotics or biotech, almost nothing feels impossible. And that's why
we're kicking off this year with a topic that does sound outlandish, but might actually be within
our field of view. That topic, DNA as data storage. Scientists have estimated that the average human
body has trillions of cells, with billions replaced daily. Now, that's an incredibly efficient
machine driven by the genetic code in every single one of us. And of course, that's DNA. This blueprint
has also evolved and become optimized for space and longevity. You cannot see it with the naked eye,
yet it can last for hundreds, maybe even millions of years.
And get this.
The storage capacity of a single gram of DNA is over 200 million gigabytes.
The amount of DNA in your body?
150 billion terabytes.
Capable of storing every single movie released in the 21st century, billions of times over.
Or equivalent to thousands of data centers, except requiring way less energy and lasting much longer.
So it should be no surprise
that humans wanted to leverage this, quote, natural intelligence built into all of us.
And the ever-increasing demand for data only pushes this trend further,
with some researchers estimating we may even run out of storage this decade.
And I have a funny feeling that software is not done eating the world.
Today, you'll learn from a16z General Partner Vijay Pande about the fascinating world of DNA
as a storage mechanism, and how advances in DNA sequencing, aka reading,
and synthesis, aka writing, have led us here.
As Vijay says, this next wave of biological computing
will be the mother of many new exponentials to come.
As a reminder, the content here is for informational purposes only,
should not be taken as legal, business, tax, or investment advice,
or be used to evaluate any investment or security
and is not directed at any investors or potential investors in any A16Z fund.
Please note that A16Z and its affiliates
may also maintain investments in the companies discussed in this podcast.
For more details, including a link to our investments, please see a16z.com/disclosures.
All right, first, I want to kick off this episode with a quick story that I originally stumbled upon
via an article called Save As DNA.
We'll, of course, drop that link in the show notes. It's written by Toby, a writer for Exo,
and he shares the Nobel Prize-winning physicist Richard Feynman's 1959 lecture called
There's Plenty of Room at the Bottom: An Invitation to Enter a New Field of Physics.
In it, Feynman calculated that all the volumes of the encyclopedia could be written
on a, quote, cube of material one two hundredth of an inch wide, to which he followed, so there's
plenty of room at the bottom. Remember, this is in 1959. He even postulated the idea of swallowing
a surgeon in the form of a swallowable robot. Knowing he was well before his time, he quipped,
This fact, that enormous amounts of information can be carried in an exceedingly small space, is, of course, well known to the biologists, and resolves the mystery which existed before we understood all this clearly, of how it could be that in the tiniest cell,
all of the information for the organization of a complex creature, such as ourselves, can be stored.
All this information, whether we have brown eyes or whether we think at all, or that in the embryo the jawbone should first develop with a little hole in the side so that later a nerve can grow through it,
all of this information is contained in a very tiny fraction of the cell, in the form of long-chain DNA molecules,
in which approximately 50 atoms are used for one bit of information about the cell.
Now, Feynman ended his lecture with a $1,000 prize to the first person to take the information on the page of a book
and miniaturize it to 1/25,000th of its size.
So that's 25,000 times smaller, to be read with an electron microscope.
Now, it ended up taking 30 years for Tom Newman to claim this prize.
So you can get a sense of just how long it took for more people to widely understand the intelligence embedded at the atomic scale.
All right, now let's bring in Vijay to get us up to date on where we really are in the trajectory today.
I'd love to just start out by getting your take on a few trends that are emerging and you could even say colliding.
So the first one is around data storage.
The world is undeniably becoming more digital.
As Marc likes to say, software continues to eat the world.
Do we realistically have the data storage we need for this increasingly digital world?
Or what are you seeing there?
Well, yeah.
So from a compute point of view, when we talk about Moore's Law, we often talk about just from a calculation point of view.
Can we compute more and more?
But a key part of AI and compute today is data.
And I think what a lot of people forget about is actually storage has been exponentially
increasing over time. And if you think about it, your laptop now might have a terabyte;
it wasn't that long ago you were happy to have 100 gigabytes, and before that 10 gigabytes, and so on.
So those technologies, originally disk technologies and
now SSD technologies, have enabled this exponential increase in storage,
which in turn actually is a key part of AI. AI is this confluence of exponential increase in
compute meeting the exponential increase in data. Exactly. And I think to your point,
many people don't realize how many zeros follow the number of bytes that humans as a species are now producing.
One report predicted that by 2025, humanity is set to unleash 175 zettabytes.
So that's 175 followed by 21 zeros.
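To put that number next to the density figure quoted in the intro, here is a rough back-of-the-envelope sketch in Python. It is not from the episode. It assumes decimal units and treats the intro's "over 200 million gigabytes per gram" as an exact density, which ignores encoding overhead, redundancy, and everything else a real storage system would need.

```python
# Back-of-the-envelope arithmetic using figures quoted in the episode.
# Assumptions (not from the episode): decimal units, and the intro's
# "200 million gigabytes per gram" treated as an exact storage density.

BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21

projected_bytes = 175 * BYTES_PER_ZB             # "175 followed by 21 zeros"
dna_bytes_per_gram = 200_000_000 * BYTES_PER_GB  # 2 x 10**17 bytes per gram

# Integer division is exact here because the numbers divide evenly.
grams_needed = projected_bytes // dna_bytes_per_gram

print(f"{projected_bytes:.3e} bytes would need about {grams_needed:,} g of DNA")
```

Under those assumptions the answer comes out to 875,000 grams, or roughly 875 kilograms of DNA for the whole projected figure.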
Another exponential trend, or at least a fast-advancing trend, is around genomics and DNA sequencing and synthesis,
sequencing being reading, synthesis being writing DNA.
Maybe one interesting thing I'd love to get your take on is people maybe are familiar with the sequencing graph, where that also has exponentially declined, similar to Moore's Law.
But synthesis hasn't quite followed as much of an exponential trend, despite us working on both for at least a few decades.
Yeah, in the genomics field, reading actually is, in the end, a lot easier than writing, in part
due to a variety of very clever technologies on the reading side. The writing side is actually much
more complicated for somewhat technical reasons. The reason why reading actually is somewhat simplified
is that the way most of the reading is done is you have a long chain of DNA and it's chopped
up into little bits and then it's read as little bits and then put back together like this
massive jigsaw puzzle. That's the so-called shotgun sequencing that was invented in the 90s. And that
really was a huge advance in the reading, and sequencing now is the successor to those types of
technologies. Writing, there's no equivalent trick yet. And you can imagine maybe you write little
bits and then you put them together, but putting that together in real life is a lot harder
than putting the pieces together in software. And that's maybe the fundamental asymmetry.
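The jigsaw-puzzle idea described above can be sketched in a few lines of Python. This is a toy greedy assembler under strong assumptions (error-free reads, no repeats, a single strand), not how production sequencing pipelines work. The function names and the example fragments are made up for illustration.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str]) -> str:
    """Repeatedly merge the two fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Glue the pair together, keeping the shared region only once.
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


# Three overlapping "reads" of a made-up sequence, given out of order.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

Doing this in software is cheap, which is the asymmetry being described: the physical equivalent for writing would mean chemically joining the right pieces in the right order.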
And is that changing? Are there new innovations that make you confident that maybe we'll see an
inflection there? Yeah, there are many companies in the space, many researchers pushing it for
a variety of applications. Obviously, in biology, having DNA is the starting place for any sort of
synthetic biology operation. Synthetic biology is this whole space where we can actually
engineer biology, and it starts with the writing. And so most synthetic biology
companies are really bottlenecked by the speed and cost of writing DNA. And then when you think
about RNA vaccines and so on, like the COVID vaccines we've all taken, that's writing a sibling
of DNA, RNA. And so there's been a huge effort
for doing that. The needs are great, and it bottlenecks so many key things that I think that's really
encouraged a lot of people to move into the space. One application that I just thought was so fascinating
was storage. So coming back to that first trend, it, at least to me, was not intuitive to say, let's use
this building block that's in all of our bodies for data storage. How did we get there? And is this
really a potential solution? Yeah, it's fun for a variety of reasons. So one reason is that it's super dense.
A disk or any other technology is not going to be nearly as dense as DNA. Each bit in DNA is just
not that many atoms when it comes down to it. That density will be really hard to beat with other
technologies, at least for a while. And secondly, we have technologies to read it super fast.
And finally, I don't know if you have any old storage technologies like Zip disks or old SATA disks.
Can you read that? I mean, even like USB, sometimes people only have USB-A to read now.
The nice thing about DNA is that a thousand years from now, humans will know how to read DNA. That I have
no doubt. So that ubiquity and significance of DNA from a biological point of view will mean that
we'll always know how to read it. And that's actually truly interesting. Oh, and I forgot the other
kicker is that it can last for a thousand years. So if you want to have something in a vault, like a
copy of a great movie, like The Godfather, to be there for safekeeping, you could have that in DNA.
Yeah, I think those are great points. And just to add a number to this,
the storage capacity of a single gram of DNA is around 250 million gigabytes, which is just wild.
I mean, to your point about efficiency, that is just crazy when you compare it to some of the
man-made alternatives. Maybe we could compare and contrast the DNA as data storage concept with
what we actually use today. Are there other things you'd call out there in terms of whether
this is truly viable or any other aspect of whether the capabilities are really there yet?
Yeah, so I think there may be other intermediates.
So archival storage is probably going to be the first application.
But then also what's interesting is that people are coming up with more and more compute elements
that you can encode in biology.
So biology can do a little circuitry and so on.
And so the nice thing about DNA as a storage medium is that it's very compatible with our bodies.
And so you can imagine new types of therapeutics that actually use some degree of DNA
as storage such that these elements are doing some very simple version of compute.
That's very early, but I've been around playing with computers since the 70s, and those things were pretty early then, too.
And look, here we are. I think things start simple. But the part that gets us excited about something like biological computing is how compatible it would be with us and how far it could grow from here.
Yeah. And if we compare it to software and this idea of when we save something on our computer, a lot of people are familiar with zeros and ones and that being encoded into bits. Maybe you could break
down what the equivalent is when we're talking about DNA. Like, how can we actually get biological
computing with this structure? Yeah. So DNA is like a long molecule, like a single rope, and it's
comprised of DNA bases. And each base could be one of four possibilities. And while bits are
two possibilities, bases get four possibilities. So you can code two bits with each base that way.
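As a minimal sketch of that two-bits-per-base idea, the Python below packs bytes into a string of A, C, G, and T and unpacks them again. The specific mapping (00 to A, 01 to C, 10 to G, 11 to T) is an arbitrary assumption chosen for illustration; the episode does not specify one, and practical schemes add constraints and redundancy that are left out here.

```python
# Hypothetical mapping, chosen only for illustration:
# 00 -> A, 01 -> C, 10 -> G, 11 -> T
BASES = "ACGT"


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases)


def dna_to_bytes(strand: str) -> bytes:
    """Decode a strand produced by bytes_to_dna back into bytes."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4 bases")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # str.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


encoded = bytes_to_dna(b"Hi")
print(encoded)                # CAGACGGC
print(dna_to_bytes(encoded))  # b'Hi'
```

Each byte becomes four bases, so a strand is a quarter as long in bases as the data is in bits divided by two, which is where the density argument starts.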
And then the other key thing about it, and this was the huge revelation that Watson and Crick published,
is that DNA has a structure where one base will connect in with a complementary one,
and that forms a double helix.
And this double helix is very stable.
That's the thing that can last for a very long time.
But also, it's error-correcting because you have a redundant copy, essentially a complementary copy in there.
And so that also is very appealing.
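The redundancy described above can be illustrated with a short Python sketch. It assumes the standard base pairing (A with T, C with G) and, for simplicity, compares the two strands position by position, ignoring the fact that real strands run in opposite directions. It only detects where the two copies disagree; deciding which copy is right is a separate problem that this sketch does not attempt.

```python
# Standard Watson-Crick pairing: A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def partner_strand(strand: str) -> str:
    """Return the complementary strand, aligned position by position."""
    return strand.translate(PAIRING)


def mismatched_positions(strand: str, partner: str) -> list[int]:
    """List positions where the partner is not the expected complement."""
    if len(strand) != len(partner):
        raise ValueError("strands must be the same length")
    expected = partner_strand(strand)
    return [i for i, (e, p) in enumerate(zip(expected, partner)) if e != p]


print(partner_strand("ATGC"))                # TACG
print(mismatched_positions("ATGC", "TACG"))  # []
print(mismatched_positions("ATGC", "TACC"))  # [3]
```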
In the end, biology has had a huge storage problem on its own.
The fact that you look out in the world with trees and people and birds and so on,
there's tons and tons of bits there and bytes all over the place that's been there for millions of
years. And so it has all the same compute problems of how to store, how to error correct, how to
read and write quickly. And so it's dealt with it within the biological context. And that's partially
what also makes DNA interesting. Yeah. Maybe something else that comes to mind is that storage is not
just about writing, but it's about that retrieval side as well, at least if we're thinking about
I write something up in Google Docs and I want to save it and I want to be able to bring
it back and share it. Can we do that with DNA? We're talking about encoding all this data into
DNA. Do we really have the retrieval mechanisms to do that effectively? Or are we talking about a different
kind of storage? In principle, you could do that. And what you would do is you could have a drive
equivalent where you have file names, which are little bits of DNA at the end. And then if you want
to retrieve that file, you'd get the complementary part to that. And so you pull out that strand,
and then you just read that strand. And that's just
a simple example; people have come up with very clever approaches. The one thing about this
is that it sits in the hierarchy of computer storage, where you go from RAM to SSDs to tape drives.
This is much more on the tape drive side of things. So you could get a lot of bandwidth, but
probably not very good latency. It would take some time to read all that stuff. But also,
and this starts to get into James Bond-like territory, but people are also realizing it's a way
you can move a lot of data very discreetly. And so you can imagine like injecting
somebody with something and they walk across the border and there's nothing to scan for.
And that could easily be like terabytes of data.
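The file-name retrieval idea from earlier in this answer can be mimicked in software. The Python sketch below is a loose analogy, not a model of the lab chemistry: each strand in a pool is a payload followed by a short address at the end, and a probe that is the reverse complement of an address pulls out the matching strands. The address sequences, payloads, and function names are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The sequence that would pair with seq on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical "file name" tags placed at the end of each strand.
ADDRESS_A = "ACCGTTAG"
ADDRESS_B = "TTGACCGA"

# A pool mixes strands from different files; payloads are made up.
pool = [
    "GATTACAGATTACA" + ADDRESS_A,
    "CCCGGGAAATTT" + ADDRESS_B,
    "TTTTAAAACCCC" + ADDRESS_A,
]


def retrieve(strands: list[str], probe: str) -> list[str]:
    """Return payloads of strands whose end address pairs with the probe."""
    target = reverse_complement(probe)
    return [s[:-len(target)] for s in strands if s.endswith(target)]


probe_for_a = reverse_complement(ADDRESS_A)
print(retrieve(pool, probe_for_a))
```

The scan touches every strand in the pool, which loosely mirrors the tape-drive comparison: fine for pulling out a lot of data at once, slow if you want one small piece quickly.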
I hadn't considered that, but one aspect of storage is security.
Are there any other things that are top of mind there for you in terms of if this were
a new storage mechanism, security is such an important aspect of that when we think of software.
We're thinking of now digital biology.
Are there any other implications of that?
I think it's all the same things as we deal with in any sort of cybersecurity.
And so I think people keeping things private is generally not a bad thing.
And so I think actually I would flip it and say that it actually is an interesting scheme for privacy.
But we're really dipping into sci-fi here.
That's not something people are doing now.
But that's something in principle that could be very plausible.
Yeah.
Well, coming back from sci-fi, grounding ourselves, when we're talking about this one application of potential DNA synthesis,
is this something that's already in motion or even commercially viable?
Are companies on the ground actually producing this technology? And also, do they have customers yet?
Yeah. So there's numerous companies of various stages. Some have been around for many years,
some that are startups that are producing DNA. And like any commodity, you can actually go to the web,
upload a sequence, and get your DNA. So that's there. I think what the race is to do is to be able
to build the Moore's Law for DNA, to build that exponential decrease in cost. And that, as you pointed out at the beginning,
actually hasn't been there quite yet. There's been a
decrease, but maybe not a true exponential decrease. And so if we can do that, that feels like one of
the last big unlocks in synthetic biology, that we've got the read and actually even got the
edit with CRISPR. And so people are editing all the time. Ironically, we just don't have that write
part. I think if we can enable that, so much in synthetic biology is ready to go.
Quick interjection, just in case you're looking for a real-world example of how all this can be applied.
So spider silk has long been known for its strength. In fact, it's five times
as strong as the same weight of steel, but it turns out that you can't just grow spider silk.
If you try to farm spiders, unlike silkworms, they will actually all just kill each other.
But a paper published in September showed how scientists using CRISPR were able to genetically
engineer silkworms with spider genes that not only didn't kill one another, but were able to produce
fibers six times as tough as Kevlar.
And of course, we're just getting started here.
Now back to Vijay to shed light on what's between us and these potential
applications across health, food, materials, and more.
What do you think is the biggest bottleneck, if you could point to anything?
For now, this has largely been a sort of a chemical problem, and so people have been handling
it with various chemistry methods.
There's different types of ways people synthesize proteins, which are analogous, long-chain
molecules, and people have tried to extend that to DNA.
And those things work well, but you can imagine you have to make this thing perfect, and so
as it gets longer, it gets exponentially harder to have high fidelity.
And so that's why people typically have been selling really short ones.
And then maybe you can try to combine the short ones, but it is a sort of a different type of exponential problem that it's hard to do it really well without errors.
And actually, that's where you really quickly get humbled by how well nature has done it itself, that it solves this problem in its own ways.
And so that's still something that I think to do it at scale with very low error is the holy grail.
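A small worked example shows why length hurts so much. The sketch below assumes, purely for illustration, that each base is added correctly with a fixed and independent probability; the 99% figure is a placeholder, not a number from the episode or from any particular synthesis method.

```python
def full_length_fraction(per_base_fidelity: float, length: int) -> float:
    """Fraction of strands expected to be error-free at a given length,
    assuming each base is added correctly with independent probability."""
    return per_base_fidelity ** length


# Illustrative assumption only: 99% of single-base additions succeed.
PER_BASE_FIDELITY = 0.99

for length in (20, 200, 2000):
    fraction = full_length_fraction(PER_BASE_FIDELITY, length)
    print(f"{length:>5} bases: {fraction:.2e} of strands are perfect")
```

Under that assumption, most 20-base strands come out perfect, only a small minority of 200-base strands do, and essentially none of the 2,000-base strands do, which matches the point about why short pieces are what typically gets sold.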
Yeah, I mean, the more I research this topic, the more I just appreciate it.
Oh my gosh, this is so efficient.
Our body's storage mechanism is just incredible, or nature, really, for that matter.
Trying to stay away from sci-fi again.
But I'd love to get your take on just where we go from here.
I know it's impossible to make a true prediction.
But just given all the things you've seen, what kind of timeline do you think is hopeful?
It's impossible to really know.
But there are actually now many companies around the hoop that are doing exciting things.
And there's a great need for it.
So with the combination of the market maturing in synthetic biology and companies pursuing lots
of different approaches, it has the right elements of what we want to see in terms of a new type
of tech company.
But, you know, this is the type of thing where that advance has to really be there.
But what's been really unique about sequencing is that there's been advance after advance.
Not unlike what makes computer chips work: Moore's Law is not like a law of nature,
a law of physics.
It's a law of human determination, where people have been just
trying, and they'll do this lithography and that lithography and this new type of
tech and a new transistor, and they would just will it into existence. We've seen the analogous
willing into existence in the sequencing part, on top of a platform that was compatible with that.
On the writing side, we've yet to sort of get started. And I think if we can build a platform
such that we can enable that human willing-into-existence sort of phenomenon, that's really
what's been missing. Right. And maybe to get listeners excited about that potential unlock,
You touched on some of these earlier, but if we are able to generate low-cost at-scale synthesis of DNA,
what does that unlock, whether it's materials or food or health, what are the applications that maybe excite you most?
I think the most foundational statement is that it really unlocks large-scale engineering of biology.
And so that's a shift from biology as let's tweak and experiment and just discover, to let's build things,
and it's that building part that really the DNA part is central for.
Because once we get the DNA, we can actually now CRISPR edit it into any type of system.
And then the key part is actually not just building it, but then building it quickly so we can have fast iterations.
I think what's really great about programming is that you can compile and run your code like in minutes or seconds and get those fast iterations.
Once DNA synthesis can get to that point, then we'll see fast iteration in synthetic biology,
we will see that engineering cycle kick in, and it will be the mother of many new exponentials to come.
We really need that platform, the same way we saw with software.
Yeah, I think the last thing I'll add is that you started by mentioning how a lot of our life is becoming digital.
The irony is that in many ways it always has been, that life itself is literally digital,
that we may be coming full circle to adopting these technologies for new advances in engineering biology.
But the super fun thing is what we're talking about is when CS and bio converge.
Yeah,
that's a wonderful place to end off. Thanks, Vijay.
Fantastic.
All right, there you have it. DNA as data storage.
Yet another example of science fiction actually maybe just being science reality.
Now, clearly, there are still hurdles along the way, but hopefully this episode got
you amped about the possibilities to come, and also maybe an appreciation for just how efficient
our own bodies are.
And by the way, if people want to hear more from Vijay, you can listen as he hosts our sister
podcast, Raising Health. Now, Raising Health was previously called Bio Eats World, and they actually
just relaunched. So again, if you want to hear more from Vijay and the wonderful guests on Raising Health,
make sure to go check out that feed. All right, we'll see you next time.
