Grey Beards on Systems - 164: GreyBeards talk FMS24 Wrap-up with Jim Handy, General Dir., Objective Analysis
Episode Date: August 16, 2024
Jim Handy, General Director, Objective Analysis, is our long-time go-to guy on SSD and memory technologies, and we were both at the FMS (Future of Memory and Storage – new name/broader focus) 2024 conference last week in Santa Clara, CA. Lots of new SSD technology both on and off the show floor as well as …
Transcript
Hey everybody, Ray Lucchesi here.
Jason Collier here.
Welcome to another episode of the Greybeards on Storage podcast,
a show where we get Greybeards bloggers together with storage and system vendors
to discuss upcoming products, technologies, and trends affecting the data center today.
We have with us here today Jim Handy, analyst with Objective Analysis,
focused on SSD and memory technologies.
Jim's been on our show multiple times before
and has been our go-to guy on SSD and memory technology just about forever.
We were both at the Future of Memory and Storage conference last week in Santa Clara.
So, Jim, I heard lots about NAND SSD tech and some on CXL and memory, all associated with AI, of course.
But what did you hear that was of significant interest to you last week?
Well, CXL and AI were certainly a big part of it.
And, you know, the new name of the conference was certainly a stumbling block.
I gave the inventors of 3D NAND the Lifetime Achievement Award and accidentally called the show Flash Memory Summit when I was up there on the podium.
But, yeah, anyway, it seemed like everybody was wanting to talk about those two specific topics, AI and CXL, and it was just all over the place.
Yeah, there's still a lot of NAND stuff going on.
The whole Kioxia session was NAND, NAND, NAND, and what we're going to do next, and more and more capacity. And I never heard the term CapEx efficiency before, but then I'm not a business manager,
I guess, in this technology environment.
Yeah, that actually is something that both Kioxia and their partner, Western Digital,
have been talking about for about, oh, I don't know, the past few months.
It's a relatively recent thing where they say, okay, from generation to generation of NAND flash, back before we had 3D, when it was all planar, it cost us, you know, yay many dollars to be able to move from one migration to the next.
And then when we went to 3D, then all of a sudden it started costing more to go from one generation to
the next. And I don't really know what they hope to accomplish by, you know, making this statement
about it, but, you know, it was kind of sidling up to the idea that there were going to be changes
in the way that NAND Flash was made. What it seems like, you know, if you just look at it from a detached perspective, it
seems like what it should mean is that the prices for NAND flash shouldn't go down as
rapidly as they have been in the past.
Somebody mentioned at the show, and I'm not sure which keynote it was, that the race for
layers was over.
Yeah.
I know Kioxia mentioned 218 layers in the latest round of technology.
Yeah, that was Western Digital who really made that point,
that there shouldn't be any more layers.
But everybody's still talking about them,
and they're even talking about new ways of making the wafers.
Western Digital and Kioxia, you know, because of the fact that they're in this really tight partnership, they both released the same technology at the same time.
And they, a year ago, announced that they were going to be going from something where they made the chips using a single wafer to one where they make the chips using two wafers that are bonded to each other.
Chiplets.
No, this is wafer bonding.
Not chiplets. Okay.
Yeah. Chiplets, you put a bunch of small chips onto some kind of a substrate.
And eventually they're hoping to sneak up to the idea of putting small chips on top of a
bigger chip. So you might put small HBM chips on top of a GPU. But with wafer bonding, what you do is you make an entire wafer
have one thing on it, and another entire wafer have something else on it. And then you put these
two wafers together. And I don't know how they get everything to line up, but they bond them
together. And then they cut them apart like they're a single wafer,
you know, and so you end up with a die that actually is made up from two different dice that
are, you know, stuck together, that have been stuck together at wafer level.
I think Kioxia had a picture of their latest technology, which showed, like, almost CMOS logic and then NAND technology, one on top of the other, stacked on top of one another, right?
Yeah. And this is something that has been used for years for CMOS image sensors,
that apparently the things that are light sensitive are made using a different semiconductor process
than the logic that actually allows that to talk to the rest of the camera. And so the CMOS image sensor people
have been making logic on one wafer
and imaging stuff on the other wafer
and then bonding them together and doing the same thing.
So, and as a matter of fact,
there's a company YMTC over in China
that first jumped into the NAND flash market that way.
I used to look at it as being a quick way to get into the NAND flash market, which suited their needs, because, since they're government financed, they don't need to worry about profitability the way that everybody else does. And, you know, so I thought, okay, it's an expensive way to make chips. But, you know, it did get them to the market quickly. And, you know, I just thought they'd be alone in doing that. And then I was a little bit surprised when WD and Kioxia said that they were going to be doing it that way too.
That seemed like it provided more, I should say, less overhead for the logic and more space for bits, I guess,
kind of the way I read it. Well, the die size isn't much different. So that's what puzzles me
is, you know, if, you know, company A over here uses one wafer to make a part and company B uses
two wafers to make a part, and they're both the same size part using the same kind of a process,
then is either one going to be cheaper than the other?
I would bet on the one wafer part being cheaper.
But there are things I don't know.
So, you know, clearly there are.
Right, right, right.
Well, the CXL side of things is pretty interesting too as well in the show.
A lot more, I'll call it, expanded memory discussions in the show and some of the applications that are going on. But there was even more. I mean, there was a company out there, I can't think of the name, might have been Kove, that was talking about memory virtualization across servers and stuff like that. Do you remember anything like that? Yeah, yeah.
I actually was at another conference and Kove was on my panel,
so I had to come up to speed on what they do.
And they do something impressive.
They don't say how they do it, and they're a little bit cocksure about it.
So, you know, it's kind of weird to listen to it. But, you know, if you get away from that and you just look at it, what they say is
that you can have a network of computers and all of the memory that is on all of the computers
can be assigned to any computer on the network. And this network could go coast to coast. It could
be just huge, but you still have access to the memory
at memory-like speeds, which I think means that they're using some kind of sophisticated caching
that, you know, hides any huge latencies that they have between different things.
But, you know, the president, I can't think of his name right now, he was saying that you could
even take the HBM on a GPU and reassign that to some CPU somewhere else so that it would be that CPU's RAM.
I heard somebody calling it the VMware of memory.
But I think, A, it only applies to InfiniBand-connected servers.
B, it only works at the moment on Red Hat OpenShift. So there are some limitations to what they do. Nobody talked to me about coast-to-coast network sharing of memory.
Yeah, coast-to-coast networked memory, I'm just like, yeah, no matter how fast, how hard you push, you can't increase the speed of light.
Right. No, no. And, you know, the whole point of memory is that it's supposed to be fast.
So, right, right, it's kind of hard to understand. But, you know, maybe it's one of those things where they say, yeah, conceptually we can do it, but nobody ever would.
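[Editor's note: a rough back-of-the-envelope on that speed-of-light point, assuming an illustrative ~4,500 km coast-to-coast path, light in fiber at roughly 200,000 km/s, and a ~100 ns local DRAM access; none of these figures come from the episode.]

```python
# Rough latency comparison: coast-to-coast fiber vs. local DRAM.
# Assumptions (illustrative): ~4,500 km path, light in fiber ~200,000 km/s,
# local DRAM access ~100 ns. Real WAN paths add switching and queuing on top.

distance_km = 4_500                  # assumed US coast-to-coast fiber path
light_in_fiber_km_per_s = 200_000    # roughly 2/3 of c in glass

one_way_s = distance_km / light_in_fiber_km_per_s
round_trip_ms = 2 * one_way_s * 1_000

dram_access_ns = 100                 # order-of-magnitude local DRAM access
ratio = (round_trip_ms * 1e6) / dram_access_ns

print(f"one-way propagation: {one_way_s * 1e3:.1f} ms")     # ~22.5 ms
print(f"round trip:          {round_trip_ms:.1f} ms")        # ~45 ms
print(f"~{ratio:,.0f}x slower than a local DRAM access")     # ~450,000x
```

Even with perfect fiber and no switching, the physics alone puts coast-to-coast access hundreds of thousands of times slower than local DRAM, which is why aggressive caching would have to be doing the heavy lifting.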
Yeah, yeah. And Micron was there. I didn't hear much. Their stuff is obviously bigger, faster, better, cheaper kind of thing, but...
Yeah, that's pretty much all of the message that I heard from them this year, is that, you know, it was going down the same paths as everything else.
You know, they threw a curveball back in December by revealing that they had a 3D DRAM that
they had done significant development work on.
But it was generally believed at the show, everybody was saying it to each other, that the only things Micron ever presents at that show, which is called the International Electron Devices Meeting, IEDM, are the things that they decide not to turn into products.
That's interesting.
You would think 2D to 3D DRAM would be a natural progression.
It's weird because DRAM already kind of is 3D, but nobody ever called it that until recently.
Something that happened with DRAM back in the 80s was that they needed to make these capacitors.
You know, the DRAM, the bit is stored on a capacitor,
and the capacitor has to be a certain size.
And when they started trying to shrink these capacitors,
they said, well, what do we do?
And first they started making them V-shaped
so that they'd end up, you know, lining a notch in the wafer.
And that allowed them to make them smaller on the surface,
but they just kind of went down into the wafer.
And then they got kind of more deeply into that
and came out with something called the trench cell,
which goes way deep.
And it actually is a precursor for 3D NAND
and how they made 3D NAND.
If they hadn't learned how to make the trench cell,
3D NANDs never would have happened.
Oh God, interesting.
So they made the capacitors go way deep into the DRAM.
Now they go like, I don't know,
I can't tell you how deep they go in,
but they go deep enough in that if you were to turn that sideways, like the way that they turn 3D NAND sideways, then it would be way wide.
And you'd have to really work hard to make that better.
Interesting.
You know, I did see something that showed almost NAND chiplets bonded together into
a separate package. Yeah, I can't remember seeing anything like that. NAND, for a long time,
people have been stacking up to 16 NAND chips on top of each other inside of, for example, micro SD cards. And so you'll see these eight terabyte micro SD cards
that are, what's this?
I'm sorry, not eight terabyte, eight gigabyte.
Anyway, they're relatively large SD cards.
I'm sitting here not remembering
what the capacities are that I've seen on those.
I think they're pretty close.
Yeah, I think that they do have terabyte ones.
Terabytes, yeah, but certainly hundreds of gigabytes.
Yeah. And those are made out of eight stacked one-terabit chips,
you know, and so,
so they do a lot of fancy packaging with NAND flash for those already.
I'm not sure which of the vendors talked about it, but on their roadmap, maybe it was a 240 terabyte QLC drive that, when you packaged it in a rack, was effectively 300-plus petabytes in a rack.
In a rack?
Honest to God, one third of an exabyte in a rack.
I mean, can you believe this?
This world has gone insane.
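[Editor's note: a quick sanity check on that rack math; the per-rack-unit drive count and usable rack units below are hypothetical assumptions for illustration, not figures from the episode.]

```python
# Sanity check: how many 240 TB drives does it take to reach ~300 PB in a rack?
# Assumptions (hypothetical): ~32 EDSFF-style drives per 1U chassis, ~40 usable U per rack.

drive_tb = 240
target_pb = 300

drives_needed = target_pb * 1_000 / drive_tb            # decimal units: 1 PB = 1,000 TB
print(f"drives needed for {target_pb} PB: {drives_needed:.0f}")        # ~1,250 drives

drives_per_u = 32          # assumed dense all-flash chassis
usable_u = 40              # assumed usable rack units
rack_drives = drives_per_u * usable_u
rack_pb = rack_drives * drive_tb / 1_000
print(f"{rack_drives} drives x {drive_tb} TB = {rack_pb:.0f} PB per rack")   # ~307 PB
```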
This is something that I don't really understand, but it's being driven by the hyperscale computing companies. They have pretty much asked all of the major NAND flash manufacturers to develop 100 terabyte or larger SSDs for them. And they'll put all of this behind, granted, a high-speed NVMe interface, but still you've got an awful lot of pent-up bandwidth from all of the NAND flash chips that are in there that you can't use because the interface is too slow.
Oh, you mean the drive itself? So let's say I have 100 terabytes. The number of chips required says my bandwidth should be pretty high, but then when you put it behind one drive, which only has a single PCIe Gen 5 NVMe interface, it can't keep up.
Yeah, yeah. And, you know, it's just that you can read faster than you can push data through the interface.
Yeah. What's the PCIe Gen 5 speed these days?
You're asking me questions that I am not properly equipped to answer in a great way.
It's okay.
It's okay.
I'm the software guy, man.
I developed the Kubernetes stacks on top of it.
Right, right, right.
So I think it's like 50.
It's a lot.
It's 50 gigabytes a second, something like that?
Yeah, it's up there.
Yeah.
Yeah.
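[Editor's note: for reference, the standard PCIe Gen 5 numbers work out roughly as below; the lane counts are typical SSD and add-in-card configurations, not specific drives from the episode.]

```python
# PCIe Gen 5 raw throughput: 32 GT/s per lane with 128b/130b encoding,
# so usable bandwidth is about 32 * (128/130) / 8 ~= 3.94 GB/s per lane, per direction.

gen5_gt_per_s = 32
encoding = 128 / 130
gb_per_lane = gen5_gt_per_s * encoding / 8

for lanes in (4, 8, 16):       # x4 is typical for an NVMe SSD, x16 for an add-in card
    print(f"x{lanes}: ~{gb_per_lane * lanes:.1f} GB/s per direction")
# x4: ~15.8 GB/s, x8: ~31.5 GB/s, x16: ~63.0 GB/s
```

So a single x4 NVMe drive tops out around 16 GB/s each way, which is where the soda-straw concern for 100-terabyte-class drives comes from.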
I remember years ago, IBM made an array of Fusion-io drives that would give them, oh, I'm trying to remember what it was now, but it was something that all of the NVMe drives can now very handily do, that kind of bandwidth. And it's just the way computers go: something that was a great challenge years ago is now something that everybody has.
Yeah, yeah, yeah.
I noticed that Solidigm wasn't a big sponsor on the show, but they were there and they did something off-site as well.
Did you attend that?
Yeah, I didn't go to the off-site thing.
What's funny about Solidigm is that they're owned by SK Hynix. And SK Hynix is working very diligently to bring Solidigm into the fold. And so, you know, part of that is that they're getting Solidigm's SSD controller technology and pairing it up with Hynix's flash chips, because the flash chips that Solidigm was using were things that were only made by Intel.
And, you know, then SK Hynix took over that business.
And, you know, they also took over the manufacture of this Intel proprietary 3D NAND flash.
The idea is that eventually that's going to go away.
And, you know, probably with it, Solidigm is not going to be a separate entity from SK Hynix.
They're all going to be one big company.
So it is a little bit surprising to see that they're still doing trade show stuff, you know, as an independent firm.
Yeah, yeah.
So I think they announced new Gen 5 TLC SSDs at that event, one with more endurance, one with less, I guess.
Yeah, yeah. TLC is an interesting thing because it is coming into the enterprise, and, you know, there is talk of QLC in the enterprise. I think there are a few drives that do that. But, you know, it's like years ago, I remember when people said that there'd be no MLC in the enterprise because it wasn't reliable or fast enough.
Controllers keep getting better.
Yeah, they really do. I'd like to say that Moore's law affects controllers too, that not only do the
memory chips get denser by Moore's law, but the controllers get more sophisticated. And so because of that, you can continue to use, you know, worsening media.
Right. You can get increased reliability out of bad media.
Exactly. So, Jason, my joke that I tell about that is that the controllers are getting so good
that eventually you're going to be able to have a controller
talk to a completely dead NAND flash chip
and still pull out good data.
Yeah.
It'll reconstruct it via AI.
Probably.
Oh, I wouldn't doubt it.
I wouldn't doubt it.
It may not be really correct,
but hey, it's going to look right.
So today, Jim, the world of enterprise SSDs is MLC.
And I didn't see anything about MLC on the show whatsoever.
MLC is boring.
It works. It's boring.
Yeah. Well, boring is good when you're a data scientist.
When you're an enterprise architect or organization CIO, you want boring.
Yeah.
Yeah, I guess.
I guess.
I want to go back to these TLC drives.
They're like, you know, one to 10 terabyte kind of numbers.
And it's like the endurance is one drive write per day.
Who writes 10 terabytes a day?
Yeah, I know. I mean, it's bizarre.
I mean, these things, you know, it used to be, the endurance, okay, on a 250 gig drive, okay, that's a problem, because we could do that quite often. But we're talking terabytes now.
Yeah.
Well, I tell you what, those AI training models.
Yeah.
Yeah.
Yeah.
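[Editor's note: a quick look at what one drive write per day actually means in sustained throughput, using the 10 TB figure from the discussion; this is simple arithmetic, not a vendor spec.]

```python
# What does 1 DWPD (drive write per day) mean for a 10 TB drive?
# Writing the full capacity every day, spread evenly over 24 hours:

capacity_tb = 10
seconds_per_day = 24 * 3600

sustained_mb_per_s = capacity_tb * 1_000_000 / seconds_per_day   # decimal TB -> MB
print(f"~{sustained_mb_per_s:.0f} MB/s of writes, 24x7, to hit 1 DWPD")   # ~116 MB/s

# Over a typical 5-year warranty period, that is:
total_pb_written = capacity_tb * 365 * 5 / 1_000
print(f"~{total_pb_written:.1f} PB written over 5 years")                 # ~18 PB
```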
You know, though, it is kind of funny to think about, you know, I would very easily picture that the capacity of one of these large drives is, you know, equivalent to those trucks that Google used to drive up to companies to suck all their data out. So then Google can host it.
Yep.
What were those called snowmobiles or something like that?
Oh, that was AWS. No, AWS was the snowmobile.
Oh, I'm getting confused about that.
AWS did have a Snowball.
They have a Snowball device. They got a Snowball, they got a Snowcone. And then they got the Snowmobile, which is a truck.
Okay.
Something like that.
Yeah.
Yeah, these guys, you know, we're talking terabytes of data on disk drives, petabytes per tray.
It's boggling the imagination here.
So I think there is a 128-terabyte drive out there now, right, Jim?
Yeah, yeah.
And there was one years ago by a little company called Nimbus Data.
I don't know if you remember that.
Yeah, yeah.
They said that they had a drive that couldn't be worn out because of the fact that the interface was too slow to be able to access the NAND flash enough to wear it out. Yeah. I did notice Nimbus was at the show.
Yeah. Yeah. I saw a picture where they had their stuff in the back of a Cybertruck.
Interesting. Yeah. I didn't stop by to talk to them, but they seem like they're fairly busy. Did the show floor seem bigger to you this year, Jim?
I think it seemed about the same. You know, it was certainly packed. I don't think that they had any booths that were unsold. But, you know, they certainly had a lot of people there. It was shoulder to shoulder when I was in the exhibit hall. I heard that they had
roughly about 3,000 attendees and about 75
exhibitors. Okay. I didn't hear the exhibitor count, but the
3,000 attendees is consistent with what I've heard too.
Even during the show, while there were the sessions and all that stuff, the keynotes were fairly well populated and stuff like that.
So it seemed like it was going well.
The only thing I was really upset about, and I'm going to talk to the management here, Jim,
if you have any insight to this, let me know.
Last year, they had pizza and beer, sort of, you know, a meet-the-experts thing.
And it was like a 20-table event where, you know, there was an expert assigned per table, and you could go to any table you wanted and have all the pizza and beer you could eat.
This year it was on Thursday night. I leave like Thursday at three.
Did you go to that? I thought that was a great,
I thought that was the best thing they had done, quite frankly.
I've been an expert at those tables. Excuse me.
Yeah, I know. It's not great for some of the people, but it gives an opportunity for anybody to talk to anybody.
That's true. It makes it really democratic.
Exactly. Exactly.
Yeah.
So why did they move it, Jim?
I believe that it was because the sponsors wanted to throw parties on the days of the beer and pizza thing.
Throw parties? So I need to stay there another day for the party and the beer and pizza, is what you're telling me?
Yeah. Tell you what, why don't you just make a week of it, you know, have a weekend seeing San Francisco and other weekends?
Yeah, yeah. If I could take the family and everything, that would be one thing.
Yeah, yeah. Tell the family to stay in San Jose. They'll thank you infinitely for that.
Yes, they will.
You think so? I don't know, I don't know. That's interesting. So what else is going on, you know, with the future of memory and storage? I thought there was going to be a tape session or two. I only saw tape mentioned twice.
Yeah, nobody talks about tape. They just use it.
Yeah, yeah, I know, I know. Yeah, you know, I don't know about that. There was some DNA stuff there.
Oh yeah, there was DNA data. Yeah, which,
you know, it still surprises me. I guess that I'm way out of date with my knowledge of DNA,
but the last that I had heard was it took something like 15 minutes to access anything.
Yeah. I think they've, you know, by parallelizing, they've sped up the write times and stuff like that.
I haven't seen the random access times, but my assumption was it was always going to be sequential anyways.
Yeah, it makes sense. So it now becomes time to first DNA stripe or something.
I don't know.
It's going to out-tape tape.
It's going to be even cheaper and even slower.
Yeah, I think that's the intent. Outpace hard drives as well, right?
And I think, you know, I was just kind of doing a quick Google here, but it looks like 16 terabytes is the max you can get on a tape these days. I don't know, LTO-5, 6, 7, it's been a while since I even looked at this stuff.
Oh yeah, yeah. So, I mean, it does seem like there's not enough R&D input into tape to, you know, validate the technology, to push it forward.
Right.
Yeah. My understanding of tape was, you know, the technology that it was using is like GMR heads. And so the disk world has moved long beyond that stuff. So I think there's enough runway in disk technology
if they could apply it to tape for another 25 years
of significant density increase.
Well, I totally agree because also the nice thing about tape
is like when you're not accessing it, it's off.
It consumes no power.
And, you know, in the world of where, you know,
power is a huge thing.
That's a big deal.
There is hard drive based cold storage, though, where they turn off the hard drive.
Yes.
Yeah.
MAID.
It died.
It died a long time ago.
I can't even think of the last company that did it.
And they actually were a client.
It was a client of mine.
Oh, God.
Yeah.
But, yeah, it's one of those, yeah, it's definitely a, it's an interesting conundrum, right?
And especially when, I mean, you know, how are we going to make all this room for the power, you know, if we're consuming it for online storage when we need to run all this AI stuff?
Well, there was actually a lot about power efficiency and SSDs and, you know, better thermals, better power consumption, et cetera, et cetera, as being a useful tool for AI.
AI is still pushing a lot of technology here in this business.
Yeah, it really is. It's pushing the density of SSDs. And, you know, one of those things that is a little bit different between SSDs and hard drives is that with hard drives, if you want to increase the capacity, you increase the number of platters, but they only go up to a certain number of platters. I don't know what
that is, but, you know, there's some kind of limit. Probably 10 these days, or maybe less.
Yeah. But with an SSD, you just keep throwing chips at it and making the
box bigger. So, you know, and then, you know, it comes back to the thing I've been questioning a
couple of times in this call is, you know, at what point do you say, okay, well, this, this interface
is just the tiniest little soda straw for, you know, this elephant. Exactly.
Yeah.
Yeah.
Yeah.
Well, luckily PCIe Gen 6 is on the horizon and Gen 7 is not far away. So I imagine all that stuff will slowly but surely resolve itself.
Yeah.
And it's interesting, when you think about it, are there ways in the future in which you can parallelize interfaces like that too?
Yeah.
There's something else people are doing that I should give a nod to, and that is computational storage.
And that's the idea that you don't have to force anything through the interface except for the final result.
Right.
Yeah.
Go ahead.
I think there's a lot of research and development being put into that.
And I think there's actually like a good number of products that are starting to pop out on that.
And then you've got different levels of computational storage as well,
like all the way from being able to, you know, handle encryption at the drive level, to basically using it as an offload engine,
but then you can also use it as a throughput engine.
And there's all kinds of ways in which you can do it.
And I think it's one of the things where Solidigm was doing a really good job
of, you know, kind of throwing some of those proof points out there.
And I think there's a few others.
Oh yeah, ScaleFlux.
A couple other guys that are out there doing this stuff.
Big time.
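[Editor's note: a toy illustration of the computational storage argument, comparing shipping raw data over the host interface with returning only a filtered result; the dataset size, link speed, and result size are assumptions for the example, not from the episode.]

```python
# Why push compute to the drive: time to move data over the interface dominates.
# Assumptions (illustrative): a 100 TB dataset on one SSD, a PCIe Gen 5 x4 link
# sustaining ~14 GB/s of useful throughput, and a query whose answer is 1 GB.

dataset_tb = 100
link_gb_per_s = 14
result_gb = 1

host_scan_s = dataset_tb * 1_000 / link_gb_per_s      # ship everything to the host
offload_s = result_gb / link_gb_per_s                 # drive filters locally, returns result

print(f"host-side scan:  ~{host_scan_s / 3600:.1f} hours just moving data")  # ~2 hours
print(f"on-drive filter: ~{offload_s:.2f} s to return the result")
# The on-drive scan still takes time, but it can run against the aggregate
# bandwidth of all the NAND dies instead of the single host link.
```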
IBM actually is doing a lot.
I heard computational memory mentioned.
I'm not sure where I heard that.
I don't think it was this year at this conference, but computational memory exists. And the people who like to put computation in memory, they look at the memory and they say, okay, there's this very, very wide data path, tens of thousands of bits wide, that gets squeezed down to 16 bits or whatever to get out of the chip. And so if you could do tens of thousands of bits of math, you could do it with slow logic, because memory is not very good at logic. So, you know, that's okay. But then, you know, you could do some pretty good processing. And, you know, the people argue for that. And
some small research firms have put together some chips that do that. But then you turn around and you look at a typical, you know, AMD Intel processor chip, and half of the chip is memory.
And the other half is logic.
And you could say, okay, well, that's a computational memory chip.
They just call it a processor instead.
Right, right.
Yeah.
So, you know, we're kind of already there with that stuff, you know, mixing a processor with memory. And, you know, chiplets, which was another subject that was brought up there, are just a more economical way to put more memory and better logic together, you know, in the same package, even if you don't do it with the same chip.
And I will politely say I have no comment.
Our resident AMD expert here.
But software, of course, not hardware.
I'll have to mention that.
Yeah.
Yeah, yeah, yeah, yeah.
So what do you think about this AI consumption of storage?
Does it seem like it's going up?
Oh, yeah.
Yeah.
I think Jason can talk to that more than I could, but I know that it seems almost like people are beating their chest about how big their data sets are.
Yeah.
Yes, they are. And the amount of data is basically the size of the training sets plus the size of the sets that are being generated on basically the back end of that. And then the fact that you're generating thousands, if not millions, of those.
Yeah. It increases consumption of data.
Yep. Yeah. It seems like I'm hearing this company say, I do a trillion data points.
And the next one says, well, I do a bazillion data points.
And then the next one says, well, I do a kazillion data points.
Well, there was an article by Epoch, I think, or something like that. They're an AI consultancy. And they said that the size of the data that's being used to train these models is going up by almost 3x a year.
Yeah.
And it was like 10 to the 13th words, whatever the heck that means. It's probably 10 to the 15th bytes, something like that.
We're talking.
And that's because every year we keep generating more data,
but that doesn't always mean that it's valuable data.
It so much reminds me of the old computer adage we always used to use: garbage in, garbage out.
Yeah, right.
And this is, I mean, I think one of the things that really needs to happen in AI is looking at and really filtering what is being used as a training subset.
Because if you just take everything on the internet, let me tell you, there's a lot of crap on the internet.
Yeah, mostly spam.
Yeah, there's a lot of useless data on the internet.
If you train knowledge systems on that,
then it's not going to be a healthy environment.
That's true.
There's a lot of cleaning that has to go into this sort of stuff.
And they are hitting the data wall sooner or later, where they're going to run out of real human data. And the only thing they're going to have left is AI-generated data.
Yeah. And that could be a challenge.
Oh, yeah. And then the AI is going to generate even more data. And then you're going to train the AI on AI-generated data.
Right, right. I've actually seen slides where people show how data is growing,
and they show the human generated data versus the machine generated data. And what they're saying
is machine generated data is basically the output of every security camera on the face of the planet.
And exactly that kind of stuff. Jeez, you know, you could never run out of that.
Right, right. Well, at least that's real data.
It's at least capturing images of people doing stuff and things of that nature versus AI-generated images and videos and text and stuff like that.
Yeah, but how about three hours of video of an empty park because the park's closed?
I mean, so you process that.
And give me one minute while I step away. I'm going to go grab a notebook because I have
actually done the research on this and it's actually some interesting data about this.
Right. Okay. So we'll talk about this parking lot. What about the ant activity and insect
activity that's going on? Isn't this of importance to somebody?
They'd probably learn a lot of stuff if they looked at them.
You know, something that I think, you know, I like to look back on things just over the course of my own life and say, you know, how is it that we've discovered this new thing here, that new thing there?
And a lot of it comes down to the fact that nobody was really looking hard
at a certain thing at the time.
And now they're looking hard at it.
So if you talk about ants,
I bet that if you devoted three years of your life
to studying an anthill,
that you'd probably come away with some big revelations
that might not make any difference to anybody.
But...
Or it could make a difference to swarm robots and stuff.
Yeah, yeah, yeah. You know, and, you know, it could be that people say, hey, wow, ants communicate by their knees, right?
So this is funny, because I said I actually had a notebook, and it's like stuff that I have written down on paper.
Oh, that kind of notebook.
So, yeah. So this is, this is good stuff.
Excuse me while I get my cuneiform.
Yeah.
Hey, let's not go there. That's right, I'm taking notes in real time on paper.
Basically, so it's interesting. So data growth, the data generated annually, has grown significantly year to year since 2010. In 13 years, the data has increased by 60x. There were two zettabytes of data in the world in 2010. There were 120 zettabytes of data by 2023. It is predicted there will be 180 zettabytes in 2025. Every day, 328 million terabytes of data are generated.
Okay. 328 million terabytes.
Terabytes.
Does it say how much of that is human generated and how much is machine generated or anything like that?
It does not.
But effectively, that's 328 exabytes of data that are being generated every day.
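[Editor's note: a quick arithmetic cross-check of the figures as quoted in the episode, nothing more.]

```python
# Cross-checking the quoted figures:
#   2 ZB (2010) -> 120 ZB (2023), and 328 million TB generated per day.

zb_2010, zb_2023, years = 2, 120, 13

growth = zb_2023 / zb_2010
cagr = growth ** (1 / years) - 1
print(f"{growth:.0f}x over {years} years ~= {cagr * 100:.0f}% per year")   # 60x, ~37%/yr

per_day_tb = 328e6                        # 328 million TB/day = 328 EB/day
per_year_zb = per_day_tb * 365 / 1e9      # TB -> ZB
print(f"328 EB/day ~= {per_year_zb:.0f} ZB/year")   # ~120 ZB/yr, consistent with the 2023 figure
```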
Do you know what share of that is cat videos?
54%. Yes, okay.
So 54% is categorically video.
13% is social, 10% is gaming, and 5% is web.
Wow, I'm surprised gaming is, you know, I didn't realize it generated that much data.
250 million emails are sent every minute. I get half of those, I'm going to...
Most of them are spam.
Yeah, I was going to say,
oh, they're all spam.
And there are 333 billion emails sent per day.
Wow.
And then a data center breakdown.
The United States has
5,388 data centers.
Germany is number two
at 522 data centers.
Wow, that's a big difference.
There are 517 data centers in the UK,
and China has a reported 449 data centers.
I can't believe that.
I don't either.
Or they're really big data centers.
Exactly.
So anyway, those are some of the data points that I'd collected recently.
Yeah, I mean, 220 exabytes of data being generated a day, even if half of that is video, seems insane.
It's insane.
Yeah.
Yeah. Yeah. You know, though, have you ever heard there's some statistic about how much data is generated by the monitoring hardware on a 747 for every flight?
And it's something like five terabytes or something.
Yeah.
I mean, they're monitoring every one of the jet engines, probably every one of the fuel injectors.
It's quite intense.
And you think that's bad, the 787, which is like all digital.
Yeah, exactly.
And they probably got cameras everywhere.
And guess what?
It's probably like 54% video.
Probably, probably, probably, probably.
So the data stuff is growing out of sight, out of mind, with no end in sight.
One question, you think how much of that data is being stored?
I mean, obviously the emails and all this stuff is being stored.
I always thought the zettabyte numbers were what was transferring across the network.
But the way you classify it, it's almost like the data is being stored.
If 54% of that data is video, I guarantee it's being stored someplace.
Well, do you use Google for your email?
Yeah, among other things.
Then every email that you've ever sent is stored.
Yeah.
No, I understand.
Outlook is not that far off that mark.
No, they're not. There's no cloud provider that's not basically storing all data infinitely.
I remember back in the old days when I was running Novell networks and we had very specific quotas that we would put on there.
Delete it. Delete it. Remove it, get it off, get it off the system.
Right. And that's, I can't remember, Lotus cc:Mail, every time you sent an email, it said, should I keep a copy? Now you've got a Sent Items folder that's got all this stuff.
And it's something that I've said a long time ago when people started telling me about data
deduplication. I said, I bet that if you had a really sophisticated system, you could do a good
job of it because you need to, you know, somebody sends me a, you know, scan of a cartoon, you know,
one, one frame of somebody doing something silly with the caption underneath it, you know, and,
and I look at that and I say, ha ha, that's funny. I store a copy of the cartoon on my C drive on my PC, and then I send it to 10 of my friends.
And what have I done?
You're the problem.
Yeah. I've got one copy in my inbox that I probably won't delete, a copy in my Sent Items folder, a copy on my C drive, and my 10 friends each have a copy in their inbox, until they forward it.
Yeah, yeah, yeah, yeah.
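[Editor's note: a minimal sketch of the content-hash idea behind the deduplication that comes up here: identical attachments hash to the same value, so only one copy needs to be stored; the owner names and blob are made up for the example.]

```python
# Minimal content-addressed dedup sketch: store each unique blob once,
# keyed by its SHA-256 digest, and keep per-user references to it.
import hashlib

store: dict[str, bytes] = {}        # digest -> the single stored copy
refs: dict[str, list[str]] = {}     # digest -> everyone holding a reference

def save(owner: str, blob: bytes) -> str:
    digest = hashlib.sha256(blob).hexdigest()
    if digest not in store:         # first copy: actually store the bytes
        store[digest] = blob
    refs.setdefault(digest, []).append(owner)
    return digest

cartoon = b"...one scanned cartoon, same bytes every time..."
save("my-inbox", cartoon)
save("my-sent-items", cartoon)
save("my-c-drive", cartoon)
for friend in range(10):
    save(f"friend-{friend}-inbox", cartoon)

print(f"logical copies: {sum(len(v) for v in refs.values())}")   # 13
print(f"blobs actually stored: {len(store)}")                    # 1
```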
No, it's certainly a perennial problem.
It hasn't gone away.
I actually started deleting some photos here today.
Oh, you might need those.
I take photos of slides and stuff like that at presentations.
So what I ended up doing is I moved them from the photo library to a specific library associated
with a conference, and I deleted them out of the photo
library. So actually no net change in capacity,
but at least they didn't double it.
I'll tell you a very, very interesting observation about that whole data growth piece too:
when you look at when the data growth really started,
when it started to hit basically that hockey stick that we talk about,
it was 2010. There were two zettabytes in the world in 2010. So Facebook was founded in 2004 and Twitter was founded in 2006. Social media started to be a thing.
And then everybody with a cell phone started to be able to get access to it. And then when you think about that, in that 13 year period between 2010 and 2023,
it went from two zettabytes to 120. And between now, basically, so well, 2023, so last year,
and next year, it's going from 120 to 180. Yeah.
I mean, social media is the data growth catalyst, right?
Right.
Right.
Well, that and the video, actually.
Yeah. It's the video that people share on social media.
Yeah.
Right?
That too.
That too.
The damn cartoons, Jim.
Yeah.
That's the universal media.
Okay.
All right, guys, this has been great.
Jason, any last questions for Jim before I let him go?
Uh, I don't think that I have any. Jim, it was great, as always, talking with you.
Oh, thank you.
Jim, is there anything else you'd like to say to our listening audience before we close?
Nah, you know, I think that what will be a lot of fun is if people come back to this podcast in 10 years and listen to it again, and they'll say, oh boy, were those guys ever wrong?
We're always wrong. Wouldn't be good if we were right all the time.
Can you believe those guys were talking about 180 zettabytes? It's nothing. I think I have that on a USB stick in the back.
I was thinking the code in my AI hearing aid
is more than that.
Probably.
Yeah, probably.
Well, this has been great, Jim.
Thanks again for being on our show today.
Oh, thanks for inviting me, Ray.
All right.
And that's it for now.
Bye, Jim.
Bye, Jason.
We'll see you, Ray.
Yep. Have a good one.
Until next time.
Next time, we will talk to another system storage technology person.
Any questions you want us to ask, please let us know.
And if you enjoy our podcast, tell your friends about it.
Please review us on Apple Podcasts, Google Play, and Spotify, as this will help get the word out.