Storage Developer Conference - #186: The Looming need for Molecular Storage
Episode Date: April 4, 2023...
    
                                         Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
                                         
                                         SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
                                         
                                         developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual
                                         
                                         Storage Developer Conference.
                                         
                                         The link to the slides is available in the show notes
                                         
                                         at snia.org/podcasts.
                                         
                                         You are listening to SDC Podcast
                                         
                                         episode number 186.
                                         
    
                                         Okay, so Murphy's Law is in full effect today.
                                         
                                         I got locked out of my laptop just before the presentation,
                                         
                                         but thank you for providing me with another.
                                         
                                         So kind of, as was said,
                                         
                                         my job is roughly making sure we have the right hardware and software,

                                         all at scale, in Azure Storage.
                                         
                                         And also, people have a lot of difficulty
                                         
                                         thinking about exponential growth. People aren't
                                         
    
                                         evolved for it. And, you know, we tend to look two, three years down the horizon and figure out
                                         
                                         what we need to do. But there's some things where we need to look very far ahead to try and
                                         
                                         understand the actions we have to take now to get ready for it.
                                         
                                         So just as a quick survey here, in terms of generations, how many boomers do we have?
                                         
                                         All right.
                                         
                                         How many Gen Xers do we have?
                                         
                                         Gen Y?
                                         
                                         Okay, that's not good.
                                         
    
                                         Gen Z? Okay, so we need a little more youth in the storage area in order to kind of get ready for this future that's coming.
                                         
                                         So in Azure Storage, I started in 2008.
                                         
                                         We had a fairly modest footprint,
                                         
                                         and we inherited a lot of our technology from our search team.
                                         
                                         And we've grown quite a bit,
                                         
                                         thousands of times since the inception. And even at the beginning, it was very difficult
                                         
                                         to get the team ready for the future. Again, people don't tend to think exponentially.
                                         
                                         So we had like a handful of clusters. We made a big purchase of 12 clusters. And I start telling
                                         
    
                                         people, well, we need to get ready for a thousand.
                                         
                                         And people are like, what?
                                         
                                         I'm like, yeah, that's only three, four years away.
                                         
                                         And now this is what we have.
                                         
                                         We started in six data centers.
                                         
                                         We're now in almost 100 regions,
                                         
                                         over 140 data centers.
                                         
                                         And we're building data centers at a rate you wouldn't believe.
                                         
    
                                         It's more than one a month.
                                         
                                         I can't tell you exactly.
                                         
                                         And the question is, well, what are we doing to store data?
                                         
                                         How are we deploying it?
                                         
                                         And there's kind of a picture of one of our
                                         
                                         deployments, about 20 racks. And, you know, we have, you know, tens to hundreds of exabytes of
                                         
                                         data. We're approaching zettabytes in the next few years, as the industry is. And what this
                                         
                                         translates into physically is tens of kilometers of those racks. You could run a marathon beside
                                         
    
                                         Azure Storage. It's big. It's hard to comprehend, even as you write down the numbers. We have
                                         
                                         thousands of deployments. We do deployments every day. And in terms of the scale and power, we're into the hundreds of megawatts,
                                         
                                         you know, so think small cities.
                                         
                                         And there's a problem coming.
                                         
                                         You know, there's these problems we deal with day to day
                                         
                                         with this growth,
                                         
                                         but there's a much bigger problem coming.
                                         
                                         So this data growth curve is kind of the industry HDD, but this maps to all types of data storage equally.
                                         
    
                                         And the growth curve is about 40% year over year.
                                         
                                         And this is a very strong signal.
                                         
                                         Predicting the future is notoriously difficult.
                                         
                                         Predicting exponential curves is even harder.
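[Editor's note] To make the 40% figure concrete, here is a minimal sketch of what that rate compounds to. The 40% rate is from the talk; the function names and the 1000x target are illustrative assumptions, not something the speaker presented.

```python
import math

def doubling_time_years(annual_growth):
    """Years for a quantity to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)

def years_to_multiply(factor, annual_growth):
    """Years for a quantity to grow by `factor` at a fixed annual growth rate."""
    return math.log(factor) / math.log(1 + annual_growth)

rate = 0.40  # 40% year over year, the rate quoted in the talk

# Doubling time at 40% per year (a little over two years).
print(round(doubling_time_years(rate), 2))

# Time to grow 1000x at the same rate (roughly two decades).
print(round(years_to_multiply(1000, rate), 1))
```

The point is the one the speaker makes: a curve that doubles about every two years looks modest over a two-to-three-year planning horizon and enormous over twenty.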
                                         
                                         I gave a presentation years ago, like 2011, talking about storing zettabytes.
                                         
                                         And last year, the hard drive industry shipped a zettabyte of capacity for the first time.
                                         
                                         And when you were talking about shipping zettabytes, when the industry was shipping,
                                         
                                         you know, hundreds of petabytes, you sound like you're a tinfoil hat guy.
                                         
    
                                         
                                         What are you talking about?
                                         
                                         People think linearly.
                                         
                                         So much so that I even went to give a presentation to Seagate because sometime around 2014,
                                         
                                         they were seeing a decline in their head counts.
                                         
                                         And they were like, oh my gosh,
                                         
                                         our hard drive's going out of business.
                                         
                                         Should we be lowering our investment
                                         
    
                                         and shutting this thing down?
                                         
                                         I mean, it wasn't quite that extreme,
                                         
                                         but I was looking at these curves in the cloud,
                                         
                                         and I say, no, your business is just shifting.
                                         
                                         You were selling all these customers
                                         
                                         these multi-hundred gigabyte, terabyte drives.
                                         
                                         They were putting like 10, 20% on them.
                                         
                                         And then they weren't using
                                         
    
                                         the capacity. When we went to the cloud, we started buying your very high capacity drives,
                                         
                                         the biggest ones we can. We run them at, you know, a lot of people didn't believe it, over 90%
                                         
                                         capacity effectively. And we compress everything and we erasure code everything and we store it
                                         
                                         much more efficiently.
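[Editor's note] The arithmetic behind the falling head counts can be sketched as follows. The 10 to 20% and 90% fill levels come from the talk; the 900 TB data set, the 1 TB and 10 TB drive sizes, and the function name are hypothetical values chosen only for illustration. Compression and erasure coding are ignored here, and both would change the totals further.

```python
def drives_needed(data_tb, drive_tb, utilization_pct):
    """Smallest whole number of drives that holds data_tb at the given fill level.

    Integer arithmetic is used so the ceiling is exact.
    """
    usable_x100 = drive_tb * utilization_pct   # usable TB per drive, times 100
    return -(-data_tb * 100 // usable_x100)    # ceiling division on integers

DATA_TB = 900  # hypothetical amount of user data

# Hypothetical consumer case: small drives, about 15% full
# (the midpoint of the "10, 20%" quoted in the talk).
consumer = drives_needed(DATA_TB, drive_tb=1, utilization_pct=15)

# Hypothetical cloud case: large drives, run at about 90% full.
cloud = drives_needed(DATA_TB, drive_tb=10, utilization_pct=90)

print(consumer, cloud)
```

Under these assumed numbers the same data needs far fewer drives in the cloud, which is consistent with a vendor shipping fewer heads even while total stored capacity keeps growing.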
                                         
                                         So they were seeing a decline in heads,
                                         
                                         and they're like, oh, my God, the world's coming to an end.
                                         
                                         I'm like, no, no, no, just wait a few years and follow this curve. And I was lucky.
                                         
                                         The growth was pretty consistent,
                                         
    
                                         and it actually tracked that curve almost perfectly.
                                         
                                         And now the hard drive industry recognizes
                                         
                                         that their entire business is selling to cloud service providers.
                                         
                                         There's no more consumer hard drive revenue of any significance, like if you just fast forward a few more years.
                                         
                                         And still, this is going to be a complete boom for them because they have the most efficient dollar per gigabyte answer today.
                                         
                                         And there's nothing that I'm aware of that's coming
                                         
                                         that will beat them for online hot access
                                         
                                         in the dollar per gigabyte range.
                                         
    
                                         And then the question is when you're looking at, you know,
                                         
                                         an exponential curve, you say, well, that can't go on forever.
                                         
                                         Absolutely, it can't go on forever. But the

                                         question is, are we in the middle, or are we near the end? So two things can happen, right? Things can keep

                                         doing what they're doing, or things can change. And you've got to ask the question: well,

                                         what are the things that are coming into the cloud, and are things likely to change? And

                                         how big is this data, really?
                                         
                                         And when we think about a zettabyte,
                                         
    
                                         well, that sounds like a lot of data.
                                         
                                         But it depends on how you look at it
                                         
                                         and what type of analogies you use for scale.
                                         
                                         So my current favorite one is
                                         
                                         if we took all the storage that mankind has ever produced,
                                         
                                         we can't describe the state of one mole of gas.
                                         
                                         So from that perspective, it looks pretty small.
                                         
                                         And then you've got to think about
                                         
    
                                         what are the applications that are coming
                                         
                                         and what type of storage are they going to need?
                                         
                                         We'll get more into that later.
                                         
                                         So this is kind of taking you down the journey
                                         
                                         of how we've worked to improve the efficiency
                                         
                                         of storage.
                                         
                                         And I strongly believe that as we reduce the cost of online storage, we are enabling more
                                         
                                         and more applications.
                                         
    
                                         If it costs too much to retain the data, the data isn't retained.
                                         
                                         But I think we also are creating a virtuous cycle in that the number of applications that
                                         
                                         can come grows faster than the efficiency
                                         
                                         improvements. So we create a bigger and bigger business. If you look at the projections for the
                                         
                                         cloud, you know, IDC or whoever, you know, they predict that the revenue for CSPs by the end of
                                         
                                         the decade will be a trillion dollars a year and continue to grow. And a lot of that is going to
                                         
                                         be data storage. And my favorite line is, there's a reason they call it a data center and not a
                                         
                                         compute center, because the data is the important part. And here at SDC, you guys should know that
                                         
    
                                         you're making the most important changes to the future by enabling new storage technologies and making the cloud more effective and more efficient.
                                         
                                         So let's go through the efficiency journey.
                                         
                                         When we started in Azure, like I said,
                                         
                                         we inherited technology that came from Search,
                                         
                                         and Search had this very strong meme around
                                         
                                         you want all your hardware to be fungible or reusable
                                         
                                         because you don't know
                                         
                                         where the applications are going to be. And they were riding the 40% improvement from Moore's Law,
                                         
    
                                         and they're like, hey, every year we're doing great. We're getting better and better and better.
                                         
                                         But they were buying hardware and selling ads. So the coupling between what they're buying
                                         
                                         and what they're selling wasn't very tight.
                                         
                                         Their margins on ads are huge.
                                         
                                         So they're not looking very hard at their hardware.
                                         
                                         When we came in with Azure, we looked at their system,
                                         
                                         and the finance people looked at their system,
                                         
                                         and we had a benchmark.
                                         
    
                                         We had AWS.
                                         
                                         And it turned out that to store data in that system
                                         
                                         as a service cost over five times what AWS
                                         
                                         was charging. So I kind of started on this journey to work to improve the efficiency of data storage,
                                         
                                         along with, of course, a huge team of people at Microsoft. And a lot of the lifting has been done,
                                         
                                         of course, by the hard drive industry. When we started, we had 500 gigabyte drives. We're now
                                         
                                         like, that's a little
                                         
                                         out of date now. I think we're getting 22 terabyte
                                         
    
                                         drives.
                                         
                                         But, you know, if you follow the drive
                                         
                                         capacity curve,
                                         
                                         they went 1 terabyte, 2 terabyte,
                                         
                                         3 terabyte, 4, 6,
                                         
                                         8, 10, 12.
                                         
                                         Well, it's not exponential.
                                         
                                         It means that in order to handle the curve of growth,
                                         
    
                                         we need to deploy more and more hardware.
                                         
                                         As I've said earlier, we have kilometers and kilometers of it now.
                                         
                                         And to continue to lower the friction for more applications,
                                         
                                         we have to continue to push down that cost.
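[Editor's note] A rough sketch of why a sub-exponential capacity curve forces more hardware. The capacity steps are the ones listed in the talk. The talk does not say when each step shipped, so treating each step as one year of 40% data growth is an assumption made purely for illustration, as are all names below.

```python
# Drive capacity steps named in the talk, in terabytes.
capacities_tb = [1, 2, 3, 4, 6, 8, 10, 12]

# ASSUMPTION: data grows 40% between consecutive capacity steps.
DATA_GROWTH_PER_STEP = 1.40

def capacity_ratios(caps):
    """Step-over-step capacity improvement (2.0 means capacity doubled)."""
    return [b / a for a, b in zip(caps, caps[1:])]

def relative_drive_counts(caps, growth):
    """Drives needed at each step, relative to the first step."""
    counts = []
    data = 1.0
    for cap in caps:
        counts.append(data * caps[0] / cap)
        data *= growth
    return counts

ratios = capacity_ratios(capacities_tb)
counts = relative_drive_counts(capacities_tb, DATA_GROWTH_PER_STEP)

for cap, count in zip(capacities_tb, counts):
    print(f"{cap:>2} TB drive: relative drive count {count:.2f}")
```

The step-over-step capacity gain starts at 2x and shrinks to 1.2x. Once it drops below the data growth factor, the number of drives needed rises at every step, which is the effect the speaker describes.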
                                         
                                         So besides the hardware improvement in hard drives,
                                         
                                         we've done a lot of work in how we store the data.
                                         
                                         We've added compression systems,
                                         
                                         very sophisticated erasure coding systems.
                                         
    
                                         We offer different classes.
                                         
                                         We've deployed archival storage.
                                         
                                         I gave a talk there for Peter Faulhaber

                                         at Fujifilm's conference.
                                         
                                         At the time, that was the cheapest way.
                                         
                                         I think it is still, for idle bytes,
                                         
                                         the cheapest way to store.
                                         
                                         I even went to a data at scale conference
                                         
    
                                         where Facebook was pushing optical storage,
                                         
                                         which was derived from DVDs,
                                         
                                         and they were saying,
                                         
                                         this is the future of cold storage,
                                         
                                         and anybody with a little background in physics
                                         
                                         can look at the wavelength of light,
                                         
                                         the surface area of the disk, and say,
                                         
                                         well, that's probably not the future.
                                         
    
                                         It's a lot more surface area on a tape.
                                         
                                         So I actually presented after their presentation on optical
                                         
                                         to tell them actually tape was just fine
                                         
                                         and was going to be huge.
                                         
                                         Facebook, from what I hear,
                                         
                                         is now the biggest consumer of tape on the planet
                                         
                                         and they back everything up.
                                         
                                         But these are just some of the opportunities.
                                         
    
                                         But in my day-to-day,
                                         
                                         I spend a lot of time worrying about the future,
                                         
                                         how we make it cheaper, how we
                                         
                                         enable more storage applications.
                                         
                                         So
                                         
                                         the HDD story has a little,
                                         
                                         not a problem for the HDD manufacturers,
                                         
                                         but a problem for us, which is
                                         
    
                                         that they're hitting the top of an S-curve,
                                         
                                         kind of the capacities I described a minute ago, and they need to make a technology shift. So some of them are shifting to
                                         
                                         MAMR, and some are shifting to HAMR, and the reason for this shift is that the bits on the
                                         
                                         disk are so small now that using the regular media types, which are stable at room temperature,
                                         
                                         the bits won't stay where they are.
                                         
                                         They flip.
                                         
                                         Some call it coercivity of the bits.
                                         
                                         But basically, at room temperature on the media, they aren't stable.
                                         
    
                                         So they need to use a media that is more stable and needs to be excited with energy before it can be programmed.
                                         
                                         So that's what MAMR and HAMR are about.
                                         
                                         And from talking to them,
                                         
                                         MAMR, they think, might get into the high tens, mid tens of terabytes per drive.

                                         HAMR has a roadmap maybe to 100 terabytes,
                                         
                                         but they always surprise us.
                                         
                                         So you might assume,

                                         okay, maybe they'll get twice as good as

                                         what they claim they're going to get.
                                         
                                         In that world, let's say they get to 230 terabytes.
                                         
                                         If we look at the amount of power that we consume just to spin the drives,
                                         
                                         like forget the data centers, fans, servers, and we follow the current curve, even if we had this 230 terabyte mythical drive,
                                         
                                         by 2030, or sorry, 2042,
                                         
                                         we would use 5% of the current US generating capacity
                                         
                                         just to spin the drives.
                                         
                                         And of course, the curve doesn't end there.
                                         
    
                                         By 2050, if we've tried to follow this curve
                                         
                                         and provide this amount of storage,
                                         
                                         we would be using 60% of the current US generating capacity.
                                         
                                         And then if those drives are not 230,
                                         
                                         but really 100, we'd be using more power
                                         
                                         than we currently generate in the US.
                                         
                                         Which, when you talk about exponential curves,
                                         
                                         are things gonna stay the same? I can very strongly say things cannot stay the same.
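[Editor's note] The power figures quoted above can be cross-checked with a short calculation. The 5% (2042) and 60% (2050) shares and the 230 TB and 100 TB drive sizes are from the talk. The assumption that power per drive is the same regardless of capacity, so that total power scales inversely with drive capacity, is mine, as are the names used.

```python
def implied_annual_growth(share_start, share_end, years):
    """Constant annual growth rate that takes share_start to share_end."""
    return (share_end / share_start) ** (1 / years) - 1

# Shares of current US generating capacity quoted in the talk,
# for the hypothetical 230 TB drive.
SHARE_2042 = 0.05
SHARE_2050 = 0.60

growth = implied_annual_growth(SHARE_2042, SHARE_2050, 2050 - 2042)

# ASSUMPTION: same watts per drive at any capacity, so a 100 TB drive
# needs 230/100 times as many drives, and as much power, for the same data.
share_2050_with_100tb = SHARE_2050 * 230 / 100

print(f"implied growth: {growth:.1%}")
print(f"2050 share with 100 TB drives: {share_2050_with_100tb:.0%}")
```

Under these assumptions the two quoted data points imply growth in the mid-30s percent per year, in the same range as the 40% curve, and the 100 TB case lands above 100% of current capacity, matching the speaker's statement.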
                                         
    
                                         So the question is, well, what has to change?
                                         
                                         Well, one of three things has to change.
                                         
                                         We have to slow data growth.
                                         
                                         Tell people, yeah, this is how much data is going to be.
                                         
                                         This is what it costs.
                                         
                                         We don't have any more improvement.
                                         
                                         Work with it.
                                         
                                         I don't think that's a future anybody wants
                                         
    
                                         because it means all those applications
                                         
                                         will not be enabled.
                                         
                                         We can generate a lot more capacity and power.
                                         
                                         That is going to be extremely controversial
                                         
                                         given all our efforts to conserve energy,
                                         
                                         use renewables.
                                         
                                         So kind of bad timing for, say,
                                         
                                         let's stoke more furnaces to generate the power.
                                         
    
                                         There's another thing we can do.
                                         
                                         We can change data storage technology.
                                         
                                         And then the question becomes, well, how?
                                         
                                         I mean, we don't have the playbook.
                                         
                                         We don't have the tech.
                                         
                                         We haven't heard about anything that can do this.
                                         
                                         I'll tell you just historically,
                                         
                                         since I started talking about tape and other media types, I've been getting a lot of emails from everybody with a, I won't say crazy
                                         
    
                                         idea, but an innovative idea on how to store data. And some of them are, well, if you just cool down
                                         
                                         the data center to like two Kelvin and you put this device in there, then I can store all this data. I'm like,
                                         
                                         yeah, that's great. Except for the two Kelvin part. Um, there's people who've come up and said,
                                         
                                         well, I can print on paper in multiple colors and you can use the colors, you know, bit depth and all that. And I'm like, well, how are you going to get the consistent color? And, you know, show me
                                         
                                         a prototype. Then we can talk. Um, but it's a constant stream. So lots of people recognize this is a big business.
                                         
                                         There's lots of justification to invest and innovate.
                                         
                                         But we need a platform that's going to work.
                                         
                                         So when Azure Storage started, we had a dozen clusters.
                                         
    
                                         We made our first big purchase.
                                         
                                         It was like $80 million for 12 clusters.
                                         
                                         Small business. We're a pretty big business now, right? We measure, you know, revenue in the
                                         
                                         billions. And we can afford to try and help answer this question on our own. So we've had research
                                         
                                         projects in MSR to do DNA storage. Molecular storage is kind of the panacea, right? I mean,
                                         
                                         yeah, maybe you can store things in electron spin or something, but DNA has the highest density of
                                         
                                         anything that we've actually seen be used to store data. To put it in perspective,
                                         
                                         for raw data, not error-corrected, you can get about an exabyte in a cubic centimeter.
                                         
    
                                         That's pretty dense.
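To make that density concrete, here is a minimal sketch that converts the quoted raw figure (about one exabyte per cubic centimeter, before error correction) into the volume needed for larger capacities. Decimal SI units are assumed, and the constant and function names are illustrative.

```python
EXABYTE = 10**18                 # bytes, decimal SI
DNA_BYTES_PER_CM3 = EXABYTE      # raw density quoted in the talk, not error-corrected


def dna_volume_cm3(num_bytes):
    """Volume of DNA needed to hold num_bytes at the quoted raw density."""
    return num_bytes / DNA_BYTES_PER_CM3


zettabyte_cm3 = dna_volume_cm3(10**21)
yottabyte_cm3 = dna_volume_cm3(10**24)

print(f"1 ZB needs about {zettabyte_cm3:,.0f} cm^3 ({zettabyte_cm3 / 1000:.0f} liter)")
print(f"1 YB needs about {yottabyte_cm3:,.0f} cm^3 ({yottabyte_cm3 / 1e6:.0f} cubic meter)")
```

Any real system would need more volume than this once error correction, addressing, and packaging overheads are included.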
                                         
                                         We've done more research into storing data in glass,
                                         
                                         which has some very nice characteristics.
                                         
                                         And this is another, this is actually a case of a crazy email gone good. I got an email forwarded to me about a research project at the University of
                                         
                                         Southampton storing data
                                         
                                         in glass.
                                         
                                         My friend in MSR,
                                         
                                         Ant Rowstron,
                                         
    
                                         read the same article, and I'd
                                         
                                         already scheduled a flight to go meet the
                                         
                                         guy at the University of Southampton.
                                         
                                         Ant went and met him later.
                                         
                                         And we worked with them
                                         
                                         to figure out how to develop
                                         
                                         a commercial system based on this.
                                         
                                         The very interesting characteristic about this
                                         
    
                                         is as we generate more and more data
                                         
                                         our media types today actually are not
                                         
                                         very durable. If I store data on a hard drive
                                         
                                         and I put it on a shelf,
                                         
                                         you can come back in five or six years,
                                         
                                         but if you come back in 10 years, your data is probably not going to be readable.
                                         
                                         So in a sense, as technology has advanced, we've kind of gone backwards, right? We have stone
                                         
                                         tablets that are thousands of years old. We have no media type that we use today in a data center
                                         
    
                                         that can retain data for thousands of years. This can retain data for we don't know
                                         
                                         how long, as long as we've ever been able to test it for. You can boil it, you can run it at high
                                         
                                         temperature, you can hit it with an EMP, and your data is still good. So this is kind of very exciting
                                         
                                         work, but it's more archival, not really hot data. We've been doing research into holographic storage.
                                         
                                         This is an area where IBM made a significant investment
                                         
                                         two decades ago, and people haven't revisited it.
                                         
                                         There's a bunch of technology involved in how you create the image,
                                         
                                         how you project it onto the crystal, and how you retrieve it.
                                         
    
                                         And we have a lot better technology today for doing that,
                                         
                                         so we're exploring this.
                                         
                                         The very nice thing about holographic storage
                                         
                                         is it's extremely fast.
                                         
                                         The images are very large,
                                         
                                         so you can retrieve a large image in a millisecond.
                                         
                                         That image can have gigabytes of data in it,
                                         
                                         so you can figure out that data rate.
                                         
    
                                         It's pretty darn good.
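The data rate the speaker invites us to work out can be sketched as follows. The talk only says "gigabytes" per image, so the 1 GB image size used here is an assumed lower bound, and the result is a peak per-image read rate that ignores any time spent positioning or switching between images.

```python
def data_rate_bytes_per_s(image_bytes, read_time_s):
    """Peak read rate for one holographic image."""
    return image_bytes / read_time_s


image_bytes = 1 * 10**9   # assumption: 1 GB per image (the talk says "gigabytes")
read_time_s = 1e-3        # one millisecond per image, as quoted

rate = data_rate_bytes_per_s(image_bytes, read_time_s)
print(f"Peak read rate: {rate / 1e12:.1f} TB/s")
```

Larger images scale the result linearly, so a multi-gigabyte image read in the same millisecond would give several terabytes per second.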
                                         
                                         But there are a lot of challenges with holographic storage.
                                         
                                         That's why we're doing research.
                                         
                                         And we're willing to try lots of different things.
                                         
                                         When you're making billions, you can invest millions in the research.
                                         
                                         But we're not seeing anything yet that we have a lot of confidence
                                         
                                         will displace the primary store.
                                         
                                         So the question is, well, what can we do?
                                         
    
                                         Well, obviously, when we look at DNA, we love the density.
                                         
                                         It is molecular.
                                         
                                         It is small.
                                         
                                         It is very high capacity.
                                         
                                         We could definitely store the state of several moles of gas with it.
                                         
                                         But the problem today is that it's leveraging technology
                                         
                                         for medical. Medical's incentives around performance
                                         
                                         are not aligned with what we need in the data center.
                                         
    
                                         And we need something that can be faster.
                                         
                                         So,
                                         
                                         when I was thinking about this problem, and many people are thinking about this problem, I asked this question, which is, where is most of humanity's data stored?
                                         
                                         Anyone want to guess? Anyone want to put out?
                                         
                                         There you go.
                                         
                                         Okay, so the HD, I don't know if this is a little too small,
                                         
                                         HD industry likes to ship a zettabyte.
                                         
                                         To power a zettabyte, you need about 50 million hard drives,
                                         
    
                                         about 500 megawatts, so 500 megawatts per zettabyte.
                                         
                                         You know, this is state of the art, maybe.
                                         
                                         Well, human brains, by estimates, you can kind of look it up.
                                         
                                         Many people have tried to estimate the capacity of a human brain.
                                         
                                         And, you know, this is kind of the raw bits, not the full capacity.
                                         
                                         I assume there's some deduplication, some very novel representations.
                                         
                                         But basically, the human brain embarrasses us.
                                         
                                         Eight megawatts per zettabyte.
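The comparison can be reproduced with a few lines of arithmetic. This sketch uses the 50 million drives per zettabyte quoted above and the 10-watt drive the speaker mentions in the Q&A; the 8 MW per zettabyte brain figure is the speaker's estimate and is taken as given. Variable names are illustrative.

```python
ZETTABYTE = 10**21            # bytes, decimal SI

drives_per_zb = 50_000_000    # quoted in the talk
watts_per_drive = 10          # the 10 W drive mentioned in the Q&A
brain_mw_per_zb = 8           # speaker's estimate, taken as given

hdd_mw_per_zb = drives_per_zb * watts_per_drive / 1e6
tb_per_drive = ZETTABYTE / drives_per_zb / 1e12
ratio = hdd_mw_per_zb / brain_mw_per_zb

print(f"Implied drive size: {tb_per_drive:.0f} TB")
print(f"Hard drives: {hdd_mw_per_zb:.0f} MW per ZB")
print(f"Hard drives use about {ratio:.1f}x the power of the brain estimate")
```

The figures are self-consistent: 50 million drives per zettabyte implies 20 TB per drive, and at 10 W each they draw 500 MW.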
                                         
    
                                         And, you know, this is kind of a proof point
                                         
                                         that within the universe and the world of physics,
                                         
                                         there exist solutions for data storage and access
                                         
                                         that are far superior to the systems that we have today.
                                         
                                         But the question is, what investments do we need to make
                                         
                                         in order to access them?
                                         
                                         So, you know, this is just one example.
                                         
                                         I could write a similar roadmap, you know, for any type of storage.
                                         
    
                                         But, you know, this is talking about specifically the integrated circuit.
                                         
                                         So my entire life,
                                         
                                         1971, I was born in 67,
                                         
                                         we've been on the integrated circuit.
                                         
                                         And if somebody asked me what happened before that,
                                         
                                         I'm like, I don't know.
                                         
                                         So I went and looked a little.
                                         
                                         There were designs for computational devices even in the 1800s.
                                         
    
                                         They didn't actually get built or work.
                                         
                                         We've had mechanical systems.
                                         
                                         We've had systems based on thermionic valves, relays, vacuum tubes, and then big systems built
                                         
                                         on individual transistors. And probably everybody in this, well, there's some older people here,
                                         
                                         the boomers, they're probably aware of those older systems. But, you know, we never, you know, I've never even considered until we went through this exercise that we're just sitting on one platform.
                                         
                                         And that platform has done so well that we haven't thought about other platforms for computing and storage.
                                         
                                         You know, we've done pretty well with the integrated circuit.
                                         
                                         You know, we're sitting at 100 million times improvement.
                                         
    
                                         Do your own calculation. It's something between this and a billion times. We probably have 100x
                                         
                                         to go. People are talking about 1.6 nanometer or even maybe half nanometer. Still a lot of research.
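Taking up the invitation to do the calculation, here is one rough way to read those improvement factors as a number of doublings. The 1971 start year is the one the speaker gives; the talk year of 2022 is an assumption, and the function name is illustrative.

```python
import math


def doublings(improvement_factor):
    """Number of 2x steps needed to reach the given improvement factor."""
    return math.log2(improvement_factor)


start_year = 1971   # year the speaker gives for the integrated circuit era
talk_year = 2022    # assumption: year the talk was given

for factor in (1e8, 1e9):
    n = doublings(factor)
    years_per_doubling = (talk_year - start_year) / n
    print(f"{factor:.0e}x: {n:.1f} doublings, one every {years_per_doubling:.1f} years")

remaining = doublings(100)
print(f"100x still to go: about {remaining:.1f} more doublings")
```

Either estimate works out to a doubling roughly every two years, and the remaining 100x is a handful of further doublings.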
                                         
                                         But it's pretty clear to me, my opinion, that we've kind of pushed this platform pretty far.
                                         
                                         And we're now into the world of optimizing for the application.
                                         
                                         We're seeing many more application-specific designs. Using the von Neumann machine as a general-purpose design
                                         
                                         isn't working.
                                         
                                         We've gone to GPGPUs.
                                         
                                         We have other classes of accelerators like DPUs. So we're
                                         
    
                                         specializing. We're not in the general purpose processor anymore. And sometime in the next few
                                         
                                         years, decade, two decades, if we want to keep riding this exponential improvement, we're going
                                         
                                         to have to look deeper into the world of physics and different principles in order to continue going.
                                         
                                         So if we follow the curve and we want to get to Yotta scale,
                                         
                                         we're at Zetta scale today,
                                         
                                         the curve intercepts Yotta scale in 2042.
                                         
                                         And we discussed how much power that would take,
                                         
                                         which is unacceptable.
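The growth rate implied by going from zetta scale today to yotta scale in 2042 can be estimated with a small sketch. The value used for "today" (2022) is an assumption, the curve is treated as a pure exponential, and the function names are illustrative.

```python
import math


def annual_growth(start, end, years):
    """Compound annual growth rate that takes start to end in the given years."""
    return (end / start) ** (1 / years) - 1


def doubling_time_years(start, end, years):
    """Years per doubling on a pure exponential from start to end."""
    return years / math.log2(end / start)


today = 2022                      # assumption: year the talk was given
zetta, yotta = 1e21, 1e24         # bytes, decimal SI

rate = annual_growth(zetta, yotta, 2042 - today)
doubling = doubling_time_years(zetta, yotta, 2042 - today)

print(f"Implied growth: {rate:.0%} per year")
print(f"Capacity doubles about every {doubling:.1f} years")
```

A thousandfold increase in twenty years means stored capacity doubling about every two years.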
                                         
    
                                         And we look at the existing roadmaps that we have.
                                         
                                         And for capacity, only DNA on here
                                         
                                         can hit these capacity things at a reasonable power.
                                         
                                         DNA is, I'll say more generally,
                                         
                                         is a molecular class of storage.
                                         
                                         There are other ways we can do storage based on molecules.
                                         
                                         And I don't know if DNA is the right molecule,
                                         
                                         but it's the one we have the tools for right now.
                                         
    
                                         And we're going to start investing there.
                                         
                                         We are investing.
                                         
                                         And we're going to try and build useful applications.
                                         
                                         And we'll hopefully learn a lot of things
                                         
                                         about manipulating molecules.
                                         
                                         But there are other people looking at manipulating molecules.
                                         
                                         And what's pretty clear, you know, I mentioned you can't simulate a mole of gas.
                                         
                                         You can't even represent it in storage.
                                         
    
                                         So people are trying to use AI in order to not simulate things,
                                         
                                         but infer through AI how things will behave.
                                         
                                         So Google has produced AlphaFold,
                                         
                                         which is an AI that figures out how proteins are going to fold.
                                         
                                         It is extremely good at predicting it,
                                         
                                         and it is not sitting there simulating the atoms and molecules
                                         
                                         inside a protein to figure out how it's going to fold.
                                         
                                         It's learning in a neural net based on other rules and experiences.
                                         
    
                                         And from what I've read,
                                         
                                         they've successfully predicted the folding
                                         
                                         of something like 200 million known proteins.
                                         
                                         Pretty good result.
                                         
                                         Microsoft has a research project called AI for Science
                                         
                                         where we're doing molecular simulation.
                                         
                                         There's a lot of medical research,
                                         
                                         and a little controversial,
                                         
    
                                         but there's a lot of billionaires walking around,
                                         
                                         and they have a limited time span.
                                         
                                         And this type of technology might give them a longer time span.
                                         
                                         And they're directing a lot of their financial net worth
                                         
                                         towards figuring out how they could stick around a little longer.
                                         
                                         And fortunately for us, that investment overlaps with storage needs and maybe computing needs.
                                         
                                         In terms of this area, besides figuring out how to manipulate molecules, how to fold proteins,
                                         
                                         how to build molecular machines that we can possibly build systems out of.
                                         
    
                                         It's pretty clear that our new tech is going to have to bridge to the old tech,
                                         
                                         which are our integrated circuits.
                                         
                                         So we're going to have to figure out how to interface with it.
                                         
                                         And the prediction for me is if we can get this type of system working,
                                         
                                         we can enable Yotta scale and Zano
                                         
                                         scale without melting the planet. Any questions? Seriously? Okay.
                                         
                                         Several slides back, you had a projection of energy consumption.
                                         
                                         Was that just Microsoft's or?
                                         
    
                                         No, that's all hard drive capacity.
                                         
                                         And that's global? Was that U.S.?
                                         
                                         That's a global measurement using U.S. power generation.
                                         
                                         So the United States generates 700, sorry, has capacity for 700 gigawatts, but it isn't generating that all the time.
                                         
                                         So like an average for the AES-580 worldwide, the generation is 7.1 terawatts.
                                         
                                         In terms of the hard drive or whatever drive you're using, what's the basis for that?
                                         
                                         A 10-watt hard drive. Okay, 10-watt hard drive.
                                         
                                         Okay.
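With those inputs, the projection can be roughly reproduced. This is a minimal sketch assuming one yottabyte of hard drive capacity in 2042, 10 W per drive, and 700 GW of US generating capacity; the speaker's exact inputs are not given, so the result should only be expected to land in the same range as the quoted 5%. Names are illustrative.

```python
YOTTABYTE = 10**24          # bytes, decimal SI
US_CAPACITY_W = 700e9       # 700 GW, as stated in the Q&A
WATTS_PER_DRIVE = 10        # 10 W hard drive, as stated in the Q&A


def drive_power_share(total_bytes, drive_tb,
                      watts_per_drive=WATTS_PER_DRIVE,
                      capacity_w=US_CAPACITY_W):
    """Fraction of generating capacity needed just to power the drives."""
    drives = total_bytes / (drive_tb * 1e12)
    return drives * watts_per_drive / capacity_w


share_230 = drive_power_share(YOTTABYTE, 230)
share_100 = drive_power_share(YOTTABYTE, 100)

print(f"1 YB on 230 TB drives: {share_230:.1%} of US capacity")
print(f"1 YB on 100 TB drives: {share_100:.1%} of US capacity")
```

With 230 TB drives this comes out at a few percent of US capacity, the same order as the figure quoted in the talk, and it covers the drives alone, not servers, fans, or cooling.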
                                         
    
                                         Yeah, so the question is, you know, we have an estimate on, like, how much storage we need.
                                         
                                         Do we have an estimate on what it would cost to develop DNA storage?
                                         
                                         And the answer is, no, I don't.
                                         
                                         But, you know, I think that we're talking about materials that are, like, generally available.
                                         
                                         They're all in you. Presumably, you know, there's a path to synthesis
                                         
                                         that is pretty cheap and inexpensive. So one kind of placeholder I use in my head is, like, you
                                         
                                         know, every one of you has a bunch of ribosomes in you. That's a pretty good engine for translating
                                         
                                         DNA into proteins. Maybe we can leverage something like that. Yeah, the question is,
                                         
    
                                         we have an estimate on power,
                                         
                                         but do we have an estimate on the depth
                                         
                                         that we would have to cover the planet in HDDs?
                                         
                                         Yeah, the planet is really, really big,
                                         
                                         and HDDs actually aren't that bad.
                                         
                                         Definitely at Yotta scale,
                                         
                                         you wouldn't even cover the Earth with HDDs.
                                         
                                         Definitely
                                         
    
                                         at Zano scale,
                                         
                                         you'd probably be a few feet deep.
                                         
                                         Anyway,
                                         
                                         the other thing that I want to point out
                                         
                                         for everyone here, especially the younger
                                         
                                         ones, is that these are projections based on exponentials.
                                         
                                         The future isn't kind of written, right?
                                         
                                         But the pressure from the industry and mankind to store more data is pretty obvious when
                                         
    
                                         we look at these curves.
                                         
                                         This is a future that we would have to enable by making the right investments.
                                         
                                         It's a future that we have to create if we want all the technology and capabilities that this type of storage will enable.
                                         
                                         You could argue that you're sort of blindly accepting the need for data growth.
                                         
                                         Those same kind of projections are used for demand for water, for example.
                                         
                                         If we can start putting real prices on what water costs to consumers,
                                         
                                         it can knock the demand down
                                         
                                         pretty significantly. Is there a way
                                         
    
                                         to do something like that for data growth?
                                         
                                         I'm sure there is.
                                         
                                         I think we're about to do it
                                         
                                         and see what happens if we don't get
                                         
                                         some new technology.
                                         
                                         When I look
                                         
                                         at the data sources,
                                         
                                         there are more than enough data sources
                                         
    
                                         to continue driving this.
                                         
                                         There's sensors, there's cameras.
                                         
                                         An interesting statistic is that there will be more security cameras than people
                                         
                                         by the middle of the decade.
                                         
                                         And the question is,
                                         
                                         what is the value of the data
                                         
                                         versus the cost of storing it?
                                         
    
                                         And as long as we continue to shift the equation to the cost of storing it being cheaper,
                                         
                                         then the data will get stored, as long as it has the value.
                                         
                                         I'm sure there are lots of video cameras that, post-9/11,
                                         
                                         the U.S. government would have wished had longer retention,
                                         
                                         so they could go and trace back things that happened.
                                         
                                         So I really kind of believe that the data growth and the different sources,
                                         
                                         when we started, there was a clear one.
                                         
                                         It's like all the cell phones are going to get backed up.
                                         
    
                                         Now it's clear these AI data sets are massive.
                                         
                                         We have weather data, weather sensors.
                                         
                                         We have space telescopes capturing incredible resolution images. And the counter to that is there's a whole bunch of those things pointing back at the Earth.
                                         
                                         And they might be even better tech.
                                         
                                         And what are they recording and what type of retention?
                                         
                                         I don't see any shortage of data sources if we can make the data inexpensive enough.
                                         
                                         And I think that the applications that get enabled
                                         
                                         are helpful for mankind.
                                         
    
                                         I would like a personal digital assistant
                                         
                                         that is as smart as I am to help me out day to day.
                                         
                                         I think everybody would.
                                         
                                         We know that it's going to take two petabytes per person,
                                         
                                         a few yottabytes of storage.
                                         
                                         I think it's a useful application.
                                         
                                         If we make it cheap enough, it's going to happen.
                                         
                                         Let's make it cheap enough.
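The per-person figure above scales to a global total with simple arithmetic. The sketch below assumes a world population of 8 billion, which the speaker does not state, and uses the two-petabytes-per-person figure from the talk. The scale factors reflect the speaker's later remark that the estimate could be a tenth or ten times that.

```python
BYTES_PER_PERSON = 2e15   # two petabytes per person, as stated in the talk
WORLD_POPULATION = 8e9    # assumption: roughly 8 billion people
YOTTABYTE = 1e24          # bytes in one yottabyte


def total_yottabytes(population=WORLD_POPULATION,
                     bytes_per_person=BYTES_PER_PERSON):
    """Total storage, in yottabytes, to hold one assistant per person."""
    return population * bytes_per_person / YOTTABYTE


if __name__ == "__main__":
    # Show how the total moves if the per-person estimate is off by 10x.
    for factor in (0.1, 1, 10):
        total = total_yottabytes(bytes_per_person=BYTES_PER_PERSON * factor)
        print(f"{factor:>4}x estimate -> {total:g} YB")
```

With these inputs the total lands in the yottabyte range across the whole tenfold band, which is the order of magnitude the speaker is pointing at.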
                                         
    
                                         Have you done any calculations with that?
                                         
                                         I haven't, and the reason is, you know,
                                         
                                         we obviously are very focused on all classes of storage.
                                         
                                         We currently have about, I don't know,
                                         
                                         between 5% and 10% of our storage on Flash.
                                         
                                         Flash continues to be significantly more expensive
                                         
                                         than hard drive, and with the hard drive roadmap,
                                         
                                         we aren't getting any signal yet
                                         
    
                                         that the Flash industry has a path
                                         
                                         to be more cost-effective per byte than the hard drives are today.
                                         
                                         Now, certainly, there's always innovation, right?
                                         
                                         Lots of smart people are working on this problem, so that could change.
                                         
                                         So, yeah, maybe there's some solid-state solution that will help solve this. Maybe there's something with, what is it called,
                                         
                                         carbon nanotubes or sheets of carbon storing data in there.
                                         
                                         I mean, there's lots of research going on.
                                         
                                         We might be able to achieve much higher densities
                                         
    
                                         with things that are much less radical.
                                         
                                         And I think that relative to the opportunity,
                                         
                                         the cost of doing all this research is kind of peanuts.
                                         
                                         You know, you tell me, when I talk about a trillion dollar
                                         
                                         a year industry, you're spending a few billion dollars
                                         
                                         a year on research.
                                         
                                         I think that's probably a good budget.
                                         
                                         And I think the world's doing it.
                                         
    
                                         I mean, I know definitely in the direction of molecules,
                                         
                                         that kind of money is being spent.
                                         
                                         This is the flip side of this, an economic argument.
                                         
                                         In other words, if we don't find a new technology medium in the desert,
                                         
                                         you know, me storing 150 pictures of my cat,
                                         
                                         the incremental cost of that is going to go up exponentially because
                                         
                                         I'm going to start eating into the 500 megawatts per zettabyte and all that, right?
                                         
                                         Isn't that kind of a flip side of this?
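The questioner's power figure can be turned into a rough energy cost per stored byte. This is only a sketch: it takes the 500 megawatts per zettabyte quoted in the question at face value, uses decimal units, and treats the electricity price as a hypothetical parameter supplied by the caller. It covers energy only, not hardware, facilities, or operations.

```python
POWER_PER_ZETTABYTE_W = 500e6    # 500 MW per zettabyte, as quoted in the question
TERABYTES_PER_ZETTABYTE = 1e9    # decimal units: 1 ZB = 1e9 TB
BYTES_PER_TERABYTE = 1e12
HOURS_PER_YEAR = 24 * 365


def watts_per_terabyte():
    """Continuous power draw attributed to one stored terabyte."""
    return POWER_PER_ZETTABYTE_W / TERABYTES_PER_ZETTABYTE


def kwh_per_terabyte_year():
    """Energy used to keep one terabyte online for a year."""
    return watts_per_terabyte() * HOURS_PER_YEAR / 1000


def annual_energy_cost(size_bytes, price_per_kwh):
    """Yearly energy cost of storing size_bytes.

    price_per_kwh is a hypothetical input; no price is given in the talk.
    """
    terabytes = size_bytes / BYTES_PER_TERABYTE
    return terabytes * kwh_per_terabyte_year() * price_per_kwh


if __name__ == "__main__":
    print(f"{watts_per_terabyte()} W per TB")
    print(f"{kwh_per_terabyte_year()} kWh per TB-year")
```

The point of the sketch is that the energy share scales linearly with the bytes stored, so the per-picture cost depends on how large the picture is and on what the power per zettabyte becomes over time.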
                                         
    
                                         The question, I'm not quite sure what the question is.
                                         
                                         Can you repeat it again?
                                         
                                         If we don't do this, if we don't find another medium, then if we continue down this HDD path,
                                         
                                         in 2042, it's going to cost me, you know, 20 bucks a year to store a picture of my cat.
                                         
                                         Well, if the resolution of your cat picture continues to accelerate, yes.
                                         
                                         But, you know, we keep getting better, right?
                                         
                                         So, you know, we have a clear roadmap to the middle of the decade,
                                         
                                         maybe through the end of the decade,
                                         
    
                                         to continue to reap improvements from storage density.
                                         
                                         That's not where the problem is.
                                         
                                         And as we go, your cat pictures are going to get cheaper and cheaper.
                                         
                                         I hope to make your cat picture one-tenth the cost to store.
                                         
                                         The question is beyond that, right?
                                         
                                         As the growth continues,
                                         
                                         how do we make it so that we can
                                         
                                         support that growth
                                         
    
                                         economically?
                                         
                                         I think you need to add
                                         
                                         time to access
                                         
                                         into your thought.
                                         
                                         Absolutely.
                                         
                                         If you look at the current interface, it's certainly the only way to get serious
                                         
                                         Right, but we have a few proof points that there exist systems that are much denser
                                         
                                         that are efficient.
                                         
    
                                         So like you can sort through all the data
                                         
                                         of all the people you've ever seen
                                         
                                         in about 20 milliseconds to do recognition.
                                         
                                         Now that's not strictly DNA storage,
                                         
                                         but I'd say it's based on molecular
                                         
                                         or electro-molecular machinery.
                                         
                                         So there's a huge space of design to explore here.
                                         
                                         Molecular is just kind of our first
                                         
    
                                         toe in the water to start understanding this space.
                                         
                                         All right.
                                         
                                         Is that two petabytes per brain more DNA storage?
                                         
                                         No, no, no.
                                         
                                         That's neuron storage.
                                         
                                         That's neural interconnect.
                                         
                                         There's a bunch of different analyses of it.
                                         
                                         They all come to the same number.
                                         
    
                                         I don't know if they cheated off each other,
                                         
                                         but they did it through different sets of calculations.
                                         
                                         And this tends to be the range.
                                         
                                         I mean, maybe it's a tenth that.
                                         
                                         Maybe it's ten times that.
                                         
                                         The point is, you know, still the same, right?
                                         
                                         That's neurons, not DNA.
                                         
                                         Correct. That's interconnected neurons, not DNA.
                                         
    
                                         Thank you.
                                         
                                         Thank you.
                                         
                                         Thanks for listening.
                                         
                                         If you have questions about the material presented in this podcast,
                                         
                                         be sure and join our developers mailing list
                                         
                                         by sending an email to developers-subscribe at snia.org.
                                         
                                         Here you can ask questions and discuss this topic further with your peers
                                         
                                         in the storage developer
                                         
    
                                         community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
                                         
