Podcast #112: Why HDDs Aren't Going Away
Episode Date: October 5, 2022. Brian catches up with Broadcom's Rick Kutcipal for this session. Rick is a Product…
Transcript
Hey everyone, Brian Beeler. Welcome to the podcast. Thanks for joining me. We've got a conversation today with a key component supplier in the storage business, in addition to other things that I'm probably forgetting. You know them from the Broadcom RAID cards and all sorts of tiny componentry that's on a ton of drives and other systems out there, an important part of the overall ecosystem. Broadcom has some interesting thoughts on hard drives and the future of storage, and I wanted to bring Rick in today to learn a little bit more about that. Rick, thanks for coming on with us today. Yeah, thanks for having me.
So I characterized you as a RAID card company and a supplier of little bits for hard drives.
I know you do a little more than that, but tell us from your perspective, what part of the world do you interact with?
Yeah, so our division, we call it DCSG, the Data Center Solutions Group. It's an amalgamation of all these different pieces, right? It's the LSI RAID piece, where we have our RAID cards and our HBAs and SAS expanders. Then we took on the PLX acquisition, so we have PCIe switches. And recently we took on Ethernet NICs. So we have quite a broad portfolio, mainly focused on the data center itself.
So what, just for Broadcom's vision then, how do you communicate that?
What's the higher level messaging?
What's your view on the world?
What do you guys want to be or want to be perceived as, I suppose?
That's a good one.
In my opinion, it would be servicing the connectivity in the data center. And "data center" is used fairly loosely here. It would include, in this case, the storage OEMs and the server OEMs, along with the true data centers.
Okay, so let's start working through that a little bit, because you guys were talking a couple weeks back about some of the future-looking trends you see in the data center, and one of them is around hard drive storage. I want to get into that specifically because it's something we get hit on every time we post a video. We just did a whole series on the WD Gold 22TB hard drives.
Pretty cool. They're fast.
We're testing 8 of the SATA drives and having a pretty good go of it.
But every time we do that, we get a bunch of people in the comments saying,
Oh, you know, hard drives are old.
Old technology, dying technology.
Hard drives suck. Flash is king. Whatever.
And, you know, I understand a lot of those are client end users, consumers that don't really understand storage at scale, which is fine.
But you guys have good visibility into the hard drive space,
and you've got some thoughts about longevity there and what that looks like for the capacity tier. Not quite archive, well, maybe tape is a play there too, but the active archive type tier. So what's the position there, and how are you guys thinking about hard drives?
To be clear, we see flash as a very important part of the storage ecosystem, growing rapidly, very important, so not to discount that at all. But at the same time, the relevance of hard disk drives, especially in modern hyperscale architectures, is huge and it's growing. A data point I like to call out, this came from Trendfocus data last year, 2021: about 90% of all the storage shipped into the hyperscalers was rotational. So there's a huge need for rotational resources, all driven by the dollar per gigabyte, dollar per terabyte, pick your metric. There's a huge increase in capacity demand in the hyperscalers, and we feel it's important that the broader storage ecosystem is still investing in that. There's plenty to be done to address the needs of the data center.
Well, yeah, I mean, the hyperscalers, for sure, there just is no substitute for the responsiveness
equation and the capacity per dollar. I mean, there's just no other answer there. And tape is
still a play, of course, in some of the colder data tiers. So every time I hear people complain about hard drives, I say, well, let me blow your mind.
They're still selling a ton of tape into the hyperscalers, too.
And you like that super cheap deep storage?
What do you think that runs on?
It's not SSDs, I assure you.
Right.
And what's interesting is it's that tier above the cold, or archival, tier that is growing so rapidly within the hyperscalers, right? Some people call it a warm tier, some people call it a transactional tier. With the advent of massive amounts of video, whether it's from social media, surveillance, online shopping, it's very, very important. And then you combine that with analytics, and the data's got to be warm. It's got to be there, ready to be worked with. And that's what's driving the need for HDDs in the hyperscale.
So let's go one step further then.
If we know anything about hyperscalers, OCP is coming up in whatever, three weeks as we
sit here and record this on the 23rd, I think.
But we've got OCP coming up.
And I love the OCP show and haven't been in a couple of years, but I love it because for
my money, it's the best visualization of what's going to happen in the enterprise in four or five years.
And we've seen that come true with things like M.2 SSDs when Facebook was showing those off six or seven years ago on these crazy sleds that they were doing.
That became part of the way systems were booted and used in the enterprise.
All sorts of innovations: every server now has the OCP card slot, and that NIC slot has become somewhat ubiquitous, and so much more, liquid cooling and many other things.
I love that show because it starts to show you a lot of what's coming and the hyperscalers
are driving that intentionally or otherwise
because everyone else in the supply chain that's manufacturing for them wants to have
some processes in place and not make 87 varieties of every server or card they produce.
So that said, hard drives are the number one seller into the hyperscalers, but let's get specific on interface. What does that mean in terms of what you're seeing there?
Is it SATA?
Is it SAS?
Can we get off of one of those two?
And after you answer that, talk to me about NVMe because we've seen some science projects
at OCP around NVMe hard drives.
So cover off on interface.
What does it mean?
Yeah, so we'll start with SAS and SATA. Right now the cost per capacity provided by SATA nearline drives is very difficult to compete with. There are some distinct advantages to going with SAS, dual porting, deeper queueing, a number of different things. But at the end of the day, SATA nearline drives are going to be the platform of choice.
And what about NVMe? Because I know that's crazy in hard drives, but there's an argument to be made that it simplifies architecture design. What do you think?
So it does, and there's a big effort going on within OCP right now; there's a subgroup that is focused on NVMe HDDs. On the surface, I agree that the value proposition does seem compelling from a software perspective. The problem comes in the scalability, right? When you're trying to compare it to similar architectures, whether it's for 90-bay JBODs, how do you construct that type of topology with NVMe HDDs? It is possible, but from a cost perspective, it's not competitive with a true SAS topology with SATA drives.
And since you're in the market but you're not a drive manufacturer, you've got some interesting opinions and visibility into the space, I think. What do you think then about multi-actuator, and does that do anything from the interface perspective that changes the game at all?
Yeah, so multi-actuator is a hot topic right now. Dual actuator drives are out there being tested and in limited production. And it does address a very specific need that the hyperscalers are dealing with right now, and that's the IOPS per terabyte metric. As these drives get so big, that pipe is limited by the one actuator. You have islands of storage, and some of it can become stranded behind these small pipes. Adding another actuator effectively nearly doubles the bandwidth of that drive, and that is compelling. And SAS handles it very well, right? With multiple LUNs, you map an actuator to a LUN and it's just done. It gets a little more complicated with SATA. SATA doesn't have the notion of multiple LUNs, so T10 and T13 have done some work to account for that, and SATA dual actuator drives are actually being tested right now as well. Two things are being watched very carefully, though. One is the power of the drives: the second actuator adds power. The hyperscalers will have a metric of power per slot or power per terabyte, it varies, but there's going to be some sort of metric like that, and they will suffer with the extra power associated with that second actuator. The other is that on early dual actuator drives, you have to remove a platter to accommodate the extra mechanicals. So you're going down in capacity and you're going up in power. Now, that's still to be seen.
The drive vendors are working very hard on that.
And I think it can be overcome.
But it is something that needs to be watched very carefully.
Right.
So for people that don't follow the dual actuator space, you're essentially getting two hard drives squished into one in terms of the bandwidth you can get out, because you've got two actuators that can each do whatever it is, 265 megabytes per second or some number, right? In aggregate, then, you're looking at, I don't know, 500, 550, whatever that is, right? So the potential is really neat there.
The capacity is a challenge, but the hyperscalers don't have the same metrics of capacity and performance that an enterprise
does. So it'll be a bit curious to see if dual actuator can move past the hyperscaler
or if there's a need for that. What do you see for dual actuator in the enterprises? Do you see interest in that?
Interest? Yes. A lot of the server and storage OEMs are looking at them very carefully.
I think it's still too early to tell how they're going to be deployed in a true enterprise
environment. What about form factor?
Because the hard drive, aside from the two and a half inch hard drives
pretty much going away at this point,
has been that same three and a half inch form factor for a very long time.
Is there energy within OCP or other places to look at some other shape?
I mean, there's no reason why they need to maintain that size.
If the hyperscalers combine enough volume, they can change the game, right?
Yeah.
This is a hard one because of the mechanicals, right? SSDs have the advantage where it's all solid state, so you can lay out the board differently and do all sorts of different geometries, and you're seeing that with all the EDSFF stuff, the long rulers, E1, E3, all the different versions. Rotational drives, right now you have round platters, you have actuators, so you have some fundamental constraints there. The one thing that is being investigated right now is what I call thicker drives.
Some people call them, you know, one inch drives.
And what that means is it's still a three and a half inch form factor.
Don't get confused on that.
But it's just thicker.
It would be more platters.
So a higher Z height then, basically.
Yeah, correct.
That's why I call them thicker instead of, you know, giving it a one inch number because that just gets confusing.
So, yeah, they'll have upwards of 20 platters.
Now, there are a couple problems with that. One is a standard rack won't handle it, so you're talking about new enclosures from the ground up. The second one is that problem of IOPS per terabyte, those capacity problems, which are growing anyway with areal density improvements. But this takes that problem and now puts twice as much capacity in a three and a half inch form factor. How do you deal with that? Is the pipe wide enough? And what happens when a platter fails? Things like that.
Yeah, well, you do get to some resiliency questions.
And how much of that drive can you start to fail before you fail the entire drive, right?
Right. Then you've got rehydrating that drive and filling it back up.
There are a lot of challenges there.
It's interesting, though, to see how the hyperscalers handle this; hardware management is their special sauce. So not much of that is transparent to the rest of the world. But some of the hardware decisions are, and from those you can try to infer some things.
But what else in hard drive technology? I mean, we've talked interface and size, so how about SMR? What's going on with shingles? Obviously, hyperscalers can adopt those more easily than the enterprise because they can modify their software; that's one of the biggest tricks with SMR. What are you seeing there? WD has shown off 26 terabyte SMR drives.
Yeah, so SMR, shingled magnetic recording, is a very interesting technology, and its main purpose is to improve areal density, right? Instead of leaving small gaps between the read and write tracks, you start overlapping them. So it sounds great, you're improving the areal density and that's all good, but it comes at a cost, and it's on the writes. The writes all have to be sequential, the tracks are laid out in zones, and the zones all have to be managed. So to your point, it's an excellent technology for someone who owns their own OS, their own applications, owns their own data center, and that's kind of the signature of a modern hyperscaler, in contrast to a server OEM. A server OEM has to be able to support numerous OSes and who knows how many applications, and so it's not sustainable there. But in that sense, you can get areal density gains, so for the same drive you can actually get more capacity. How much more is debatable; it depends on a number of different variables and is often vendor specific.
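The sequential-write and zone constraints Rick describes correspond to the host-managed zone model standardized for SAS and SATA (ZBC/ZAC). A minimal, deliberately simplified sketch of why the host has to manage SMR zones:

```python
# Simplified host-managed SMR model (an illustration, not the full ZBC/ZAC
# interface). Writes must land at each zone's write pointer; overwriting
# shingled tracks requires resetting the whole zone first.

class Zone:
    def __init__(self, size_blocks: int):
        self.size = size_blocks
        self.write_pointer = 0          # next writable block in this zone

    def write(self, offset: int, nblocks: int) -> None:
        if offset != self.write_pointer:
            raise ValueError("SMR zones accept sequential writes only")
        if self.write_pointer + nblocks > self.size:
            raise ValueError("write would overflow the zone")
        self.write_pointer += nblocks

    def reset(self) -> None:
        """Invalidate the zone so it can be rewritten from the start."""
        self.write_pointer = 0

zone = Zone(size_blocks=65536)
zone.write(0, 128)      # fine: starts at the write pointer
zone.write(128, 128)    # fine: stays sequential
zone.reset()            # the only way to rewrite previously written blocks
```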
Yeah, I remember, I can't recall if it was WD or Seagate, one of them put out a branded product for client systems and slipped in an SMR drive. People who bought those not knowing about SMR found out real quick that throwing one in a Windows system and using it as your local share can be suboptimal, depending on how you're interacting with the drive. But the hyperscalers don't suffer from that; they can tune their software, which makes all the difference.
The other technologies, HAMR and MAMR and all these other things, have been extraordinarily slow to come to market. Do you see those happening? I mean, I know you guys are very bullish on hard drives for the hyperscalers, but if we can't increase capacity at some speed, does that matter that much?
Yes. The storage ecosystem has to keep increasing the capacities of these drives. If that innovation stops or slows down dramatically, well, the SSD vendors are making big investments. You're seeing QLC architectures coming on, and there will be something that gets their dollars per gigabyte in line with HDDs. So the thing that protects HDDs is that continual evolution of capacities. I see technologies like SMR as baby steps; it's enough to keep up, but just barely, small incremental steps. It will be recording technologies like HAMR that are going to take it to the next level.
You mentioned HAMR, and some of these advanced recording technologies have been in the works for a while. In theory, they're starting to come to fruition. Some of the drive vendors are being very aggressive on marketing and roadmaps, etc. So we're expecting to see HAMR play a role in HDD capacities; drives are in the low 20 terabyte range now, and it's going to bump that up into the 30s.
Well, that right there is the game changer, right? Because you talked about QLC flash drives, and we're doing a ton of work with the Solidigm P5316s, the former Intel part. We've got a server upstairs with 24 30.72 terabyte drives, so we're looking at roughly three quarters of a petabyte raw in 2U, which is just crazy; saying something like that three or four years ago, you would sound like a crazy person. And on the cost of that part, just retail, one drive for this conversation, say it's $4,500 for that flash drive, and a 22 terabyte WD Gold is around $600 or $550, whatever. So you're still paying an eight and a half, nine and a half X premium for the flash just on a capacity basis. But with software, if we can dedupe, if we can compress, if we can do other smart data things, that math changes pretty quick depending on what's being stored. You talked about media files, but the hyperscalers are really good at this, at being efficient with data placement. If you can get 3, 4, 5x compression and dedupe on these drives, it's not parity, but it's getting close, especially when you start to think about the operational benefits of power and cooling and all those other things.
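A back-of-envelope version of that math, using the rough street prices mentioned above (illustrative 2022 figures; the exact premium depends on the prices you plug in):

```python
# Back-of-envelope $/TB comparison with the rough prices from the discussion.

flash_price, flash_tb = 4500.0, 30.72   # QLC NVMe SSD
hdd_price, hdd_tb = 600.0, 22.0         # nearline SATA HDD

def cost_per_tb(price: float, raw_tb: float, reduction: float = 1.0) -> float:
    """Effective $/TB after data reduction (reduction = compression/dedupe ratio)."""
    return price / (raw_tb * reduction)

hdd = cost_per_tb(hdd_price, hdd_tb)    # media files reduce poorly; assume 1x
for r in (1.0, 3.0, 5.0):
    ssd = cost_per_tb(flash_price, flash_tb, r)
    print(f"{r:.0f}x reduction: SSD ${ssd:6.2f}/TB vs HDD ${hdd:5.2f}/TB "
          f"({ssd / hdd:.1f}x premium)")
```

With these inputs, 3x reduction brings the flash premium under 2x and 5x gets close to parity, which lines up with the "within 3x" threshold Rick describes next.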
Right. The consensus is around 3x. So once that dollar per gigabyte metric gets within 3x between SSDs and HDDs, that's when the hyperscalers are going to start seriously considering flash and SSDs for this warmer tier.
Well, yeah. I mean, Solidigm has talked about PLC NAND now too, penta-level cell, and who knows what that will look like. So far it's been very experimental; they've shown a couple of demos, and there's definitely potential there. And they're really the only ones talking about it; surely Samsung is working on it as well. But QLC has been tricky.
I think before we walk all the way past it, I mean, that is not easy. And Solidigm and Intel
before are the only ones that have really shipped a lot of that and done well there. I mean, there's
all sorts of challenges around these flash technologies as you try to jam more, you know, electrons into these things, right? It's not trivial, is it?
It's not, right? I always find myself rereading something when somebody just says, oh yeah, QLC, and they don't talk about the complications associated with it. Same on the HDD side: they talk about SMR, but they don't talk about the complications associated with it. These technologies are very difficult, and it doesn't matter, HDD or SSD; we're at the point where we're solving really, really hard problems right now. So when you find a technology, pick it, QLC, SMR, it's not going to be for free. It's not like you just thought up something new and it's all going to be free, right? There are always other pieces to it, other complications, whether it's manufacturing, whether it's power. It's a very complex industry.
Yeah, absolutely. So let me ask you about that; it'll be an interesting pivot to the adapter side of the business. I mean, everyone knows the LSI RAID cards you've talked about; they were ubiquitous as the primary way to aggregate storage for decades, really.
The math has changed a little bit, some of it because of things like these QLC drives, and you talked about the challenges. Really what you're referring to is how they want to be written to, right? QLC is fantastic for reads, similar to SMR hard drives, but it needs to be written to in an organized way, in a block size the drive wants to receive, to be efficient and manage write amplification. Those behaviors, in addition to some of the other NVMe speed challenges, have led to this whole new series of accelerators. Pliops is one, GRAID is another. There's a bunch of them, FPGAs or ASICs, even DPUs now too, that want to sit on the PCIe bus and do that drive management, thereby effectively becoming the RAID card.
With the background that you guys have in storage, I'm curious what your view is on that part of the drive aggregation and drive management space, and what that looks like to you guys.
We love to see innovation in the space; it keeps everybody thinking about the end game and the next step. So the technologies we're seeing coming out, we love to see it. Now, again, any time you solve one of these problems, there are other associated issues with it, and that's what we're seeing a lot with these technologies: they're in their infancy. Some pieces of the technology will make it to the mainstream; others will be a learning experience.
Is that what you call 50 billion into a startup that goes under? A learning experience?
You've got to be careful. But it's good to see thought leadership in these areas, right? And different ways to do it. Can I say one's going to win over the other? No. I just like to see the competition; I like to see people thinking outside the box, if you will.
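For readers who want the write-shaping idea Brian described made concrete, here is a minimal sketch of staging small random writes into large sequential chunks sized to a drive-friendly unit. This is the general approach behind such accelerators, not any specific vendor's implementation, and the 64 MiB unit is an assumption for illustration:

```python
# Sketch: buffer small writes and flush them as one large, aligned write
# that matches the drive's preferred write unit, keeping write
# amplification down on QLC (or SMR) media.

PREFERRED_WRITE_BYTES = 64 * 1024 * 1024   # assumed drive-friendly unit

class WriteShaper:
    def __init__(self, flush):
        self.buffer = bytearray()
        self.flush = flush                  # callback issuing the big write

    def write(self, data: bytes) -> None:
        self.buffer.extend(data)
        while len(self.buffer) >= PREFERRED_WRITE_BYTES:
            chunk = bytes(self.buffer[:PREFERRED_WRITE_BYTES])
            del self.buffer[:PREFERRED_WRITE_BYTES]
            self.flush(chunk)               # one aligned, sequential write

shaper = WriteShaper(flush=lambda c: print(f"flushing {len(c)} bytes"))
for _ in range(1200):
    shaper.write(b"x" * 65536)              # many small 64 KiB writes
```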
Well, what do you guys have to do then to remain, this is going to sound harsh, but to remain relevant? Because it's rare to see someone just throwing eight drives on a RAID card and being done with it. I mean, it's still happening, and it's definitely happening in the SMB quite a bit as a file server or whatever. But now, to get performance out of these drives, we're seeing so many people with Linux systems, or sort of kludged-together software RAID, or other alternate ways of going about drive aggregation. From the RAID card perspective, with all this innovation and change, especially with NVMe storage, how does the RAID card have to evolve to stay relevant with these more modern architectures?
Yeah, that's a good one. Remember that traditional RAID architectures assume HDDs, right? We're leveraging seek times and other general attributes of a drive to do different functions. And that's all changing. With SSDs, with NVMe SSDs, it's very, very different, and the RAID architecture has to fundamentally change. That's what's happening; that's what we're seeing within our own organization and its investments in that RAID architecture, to address things like RAID 5 performance, rebuild performance, things like that. And it is doable, especially with modern technology: instead of using traditional memory architectures, maybe different memory architectures to solve different bottlenecks. If you just rely on your old-fashioned RAID architecture, that's not going to get you much farther.
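The rebuild work Rick mentions rests on simple parity math: in RAID 5, any lost strip is the XOR of the surviving strips in its stripe. A minimal sketch of that core, not Broadcom's implementation:

```python
# Minimal RAID 5 parity math: the XOR core of what a rebuild has to do.

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# A 3+1 stripe: three data strips and one parity strip.
data = [bytes([v] * 8) for v in (1, 2, 3)]
parity = xor_blocks(data)

# Lose data[1]; rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt strip matches:", rebuilt == data[1])
```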
So it sounds like, obviously you guys are always working on things, but you're actively looking to innovate in this space. And I know you're not announcing anything today, but stuff is coming, I take it.
Absolutely, we still think data protection
is a very, very important piece
of the overall storage ecosystem,
and we're investing in that.
What does Broadcom want to do in that space? Really maintain the hardware engines that you have in place, or, and I know you can only talk about so much of this stuff that's not public, do you explore alternative methodologies, erasure coding, or other software engines, or other things that might be more appropriate for certain drives and capacity points? I'm just curious. You were talking a little bit about RAID, and that's definitely the standard, RAID 1 or 6 or 10 or whatever. But is there more past RAID in your vision?
Yeah, so I was very careful when I said data protection. We think data protection is very important, and RAID is a type of data protection. To just put on the blinders and say that's the way it's going to be forever, you have to be careful with that mentality. So we're looking at data protection holistically. Yes, there's a big focus on RAID, but we have to be looking at other alternatives as well.
So we sort of glanced past it, but DPUs are becoming more and more popular. At this past VMware Explore, VMware really opened up vSphere and vSAN to leverage DPUs from Pensando and, I almost said Mellanox, NVIDIA BlueField.
So there's some interesting innovations there, some opportunity in terms of storage management.
What do you guys think about the DPU space?
Is that something that's interesting to you?
It's new, and we're watching it carefully.
Okay, I'll leave it there. So I'll put that one in your basket of startups to monitor.
Well, to be fair though, Broadcom's a big company and if you're going to go to market with something,
you need to feel reasonably sure that there's a certain amount of margin and volume distribution to make that thing work, I guess is the economic reality of it, right?
Right. Yeah, it comes down to a business model, and everything will be vetted based on that.
Another new technology, CXL, is about to, in the next few weeks hopefully, maybe, be generally available as server vendors and chip vendors make announcements. How does that impact your business, or how do you see CXL impacting storage?
Yes, CXL is a hot topic, right? If you were at Flash Memory Summit, you couldn't go very far without being in a conversation about CXL. It's a very emerging topic. From a storage perspective, it's not clear what role CXL will ultimately play in the pure storage ecosystem. It's going to be very important in memory and in CPU and GPU connectivity. But from a pure storage perspective, it's still to be seen what role it will play. Again, it's another technology we're watching very carefully. We are intercepting it with our PCIe switches, but for our traditional storage adapters, we're still watching it carefully.
The other thing you talked about too was a little bit of networking and fabric.
What do you guys see going on over there that's interesting? Ethernet is a very, very innovative sector right now.
We play with Ethernet NICs,
and there's a lot of innovation happening there,
especially within the hyperscalers, right?
The hyperscalers are using Ethernet very prolifically.
Prolifically, yeah. A lot of it.
Prolifically within their modern racks, right? And so Ethernet's playing a really important role, not only in the traditional network, but also with the hyperscalers.
And we're coming up on 100GbE being kind of standard-ish for higher-end stuff, while 25 and 40 are still kind of at the more mainstream or SMB range. But do you see the need to start looking at 200, 400, faster interfaces? What do you guys see there in the business?
Our architects are looking at advancing beyond 100.
We do believe that there will be needs for that in the future.
Okay.
Well, look, I mean, this is a great conversation.
The fact that you guys touch all these different markets, I think, puts you in an interesting position to talk about, especially on the storage side, what's going on.
Because, you know, when I do this with the vendors, you know, they've got a very specific message that they want to get out, depending on who it is.
You know, Seagate's heavy on actuators.
WD's heavy on capacity and SMR gains and whatever. But the perspective is
great. Really appreciate you being here. If people want to learn more about Broadcom's
involvement in this stuff, send them to the website. Where should they go?
www.broadcom.com is a great resource or contact your local sales representative.
All right. Thanks for doing this. Appreciate you coming on.
Yeah. Thank you.