Podcast Archive - StorageReview.com - Podcast #138: Solidigm Liquid-Cooled SSDs
Episode Date: June 6, 2025
Providing a look at the latest in liquid-cooled SSDs from Solidigm. Brian invited Cody…
Transcript
Hey everyone, welcome to the podcast. I've got with me Cody from Solidigm, who actually
is part of the team that did something pretty crazy at GTC. They decided that cold plates
and SSDs need to go together and so they ran a demo showing what can be done if we liquid cool SSDs with these high end,
presumably AI or GPU servers?
Cody, what is going on over there at Solidigm
and what makes you guys think that liquid cooling SSDs
is an important idea?
Hi, Brian.
I wanted to first thank you for inviting me here
on the podcast and excited to be able to talk
a little bit about our, you know,
innovative solutions that we've come up with over here at Solidigm. And yeah, so I think
that as we've seen the data center market evolve from, you know, strictly air-cooled to a hybrid
cooled solution, and now looking at, you know, the market trending toward these fully liquid-cooled
GPU servers or high-performance AI servers.
I think that's where we saw the opportunity to introduce an innovative product that allows us to
push the market forward, if you will, in terms of, hey, how do we liquid-cool other devices
besides these high-powered CPUs or GPUs at the platform level?
Yeah, I mean, obviously the liquid movement, whether it's direct-to-chip or immersion, which we can talk about a little bit
too, because we did a couple weeks ago, take a bunch of your
E1S drives and dunk them in Castrol data center fluid as
part of a video we did with Doug, I don't know if you've got a chance to catch that one yet,
but the vision of hot swapping an SSD
and having to dump the oil out is pretty humorous.
But yeah, all these liquid cooling things are happening
and most of us think about it at the chip level,
either CPU or GPU.
But I think the trend that you're talking about,
and surely was a hot topic at GTC,
is can we liquid cool the entire system?
You know, Lenovo's done that for a couple years
with their Neptune systems,
where they put copper on everything, the DRAM.
They've even done it with hard drives or SSDs
and these little blade servers.
What you guys were showing was a larger water block and SSDs.
And getting after your piece of that puzzle,
how do we get smarter with cooling storage, right?
Right, right.
And so with our solution, we wanted to maintain
the key functionalities of the SSD.
You talk about things like hot-swappability or serviceability.
I think that's a key aspect.
From what I've seen, there is oftentimes a more cumbersome solution, or something that
requires special tooling to replace the cold plate, or requires some downtime at the server.
And so what we wanted to demonstrate is that we're able to
take our normal SSD, we're able to route coolant
through a carefully designed cold plate mechanism,
and we're able to extract the heat from just one side
of the SSD, even at these higher gen five speeds.
So this is interesting, and I probably should have started
this with, you're a technical guy,
you're not a product marketing guy,
or maybe you're a hybrid.
What is your function anyway over there?
Yeah, so I'm a thermal mechanical design engineer.
Perfect.
This is one of the rare cases where a company's brave enough
to let one of the nerds out of the lab to talk to a media outlet. So this is perfect though,
because we couldn't really get into it if it was just regular old run-of-the-
mill marketing folks; they sometimes don't have the technical chops.
So you'll know everything about this, which is fantastic. So when we
think about cold plates,
traditionally they sit on the chip with a conductor in between,
and as the chip puts out its heat, a liquid loop runs through there
and then replaces that with cold water, or, you know, water
mixed in with some chemicals, to move the heat off somewhere else, cool it back
down, and bring it back through. And those designs have been largely driven by
vendors like Cool IT that are responding to
the needs from Nvidia, AMD, Intel, et cetera, Chill9.
I mean, there's a number of vendors out there
with really innovative cold plates.
But I haven't seen that same cold plate vendor crew
go after storage, and I guess it's because
they haven't felt like they really needed to
or weren't asked to, which is kind of where you stepped in,
and with you specifically understanding the thermals
of especially Gen 5, and it's not just Gen 5,
this problem gets worse as we go to Gen 6,
Gen 7 and beyond, right?
Right.
You guys stepped in and really understood the heat profile of an SSD, and that's where
we ended up with this demo.
Exactly, and I think that we've had some conversations with some of our customers asking about, we
see liquid cooling happening at the data center.
What are your thoughts on SSDs?
Do you think this is going to come forward to the drive bay?
In some of those conversations,
we were like, we're not quite sure.
But I think that what our team saw is that with
these new high-performance data center servers that
are coming online, we were just thinking it's only going to be a matter of time before they
decide, hey, we don't have the space for these fans.
We have to re-architect the system in order to enable 100% liquid cooling across the board.
And so what my team and I did is we came up with a sort
of cold plate reference design that says, okay, as we're looking forward, we see these systems
potentially going to 100% liquid cooled, so how can we help the market move things along?
And so what we did is we came up with a, you know, a unique mechanism that helps to facilitate heat transfer between
the SSD and the fluid, and we're able to maintain these key features that we've seen in the
data center up to this point.
So I think it was just us having this, hey, what's coming next mindset,
sort of an innovative mindset along those lines, and trying to figure out
how do we help enable this even though we haven't quite seen it
yet in the market. But we want to make sure that, you
know, these sorts of things are unblocked, so that way we don't
slow down the innovation happening.
Well, I mean, I think it's a great technology halo for
you guys. I mean, obviously we've talked about Solidigm a ton
over the years, from the capacity leadership in QLC,
which remains with the 122 terabyte drives,
but also gen five, right?
You have very fast drives for the latest enterprise
workloads and while we all get wrapped up in AI
for good reasons, there's still, I don't know, like these database things
that still happen in the enterprise
that organizations want to go more quickly.
And, you know, there's other things that we sometimes
don't talk about as much anymore
in the AI washing of all IT marketing.
But yeah, the SSD itself in your demo,
let's tear into that a little bit.
You're showing E1S drives, which are kind of the standard.
If I'm looking at like the NVL72 racks from Nvidia,
they went to E1S, and those are very popular
with the hyperscalers,
a little less on mainstream enterprise,
but also liquid cooling's a couple steps behind
in that group anyway.
So E1S makes sense just from an adoption standpoint,
but if we think about what you've got there
is a pretty slim SSD if you take the heat sink off,
it's what, a little under six mil, is that right?
Just the bare drive?
It's the nine and a half millimeter form factor. Okay. And then we've
got the heat sink wrapped around it, but you've got NAND on both sides, you've got DRAM and
the controller on one side. And I guess those two components get the warmest. So as you
think about where you need to apply cooling, is it to both sides? So I need to sandwich that
drive in between two cold plates?
So what makes the solution we came up with unique is we're only contacting the SSD on a
single side of the drive, and that enables us to pack more storage into that same footprint.
So if there was a dual-sided cold plate solution,
you're talking about having a four or five millimeter
cold plate on either side of the SSD,
and then that is just taking up space within the server.
So the way that we've designed it,
we need a single-side contact,
but we've optimized the thermal topology of the product to make sure that we're taking
full advantage of the cold plate contact. And we're sort of taking advantage of
that space that would normally be occupied by the 15 millimeter version
of the E1S, that fin structure; we're taking that space that was generally
occupied by that fin structure,
and we're replacing it with a cold plate.
So at the platform level,
if I think about an E1S 15 millimeter SSD,
we're just taking that same drive bay,
we're converting it from a 15 millimeter SSD air cooled
to a nine and a half millimeter SSD liquid cooled.
And so we're talking about not losing
that density, that dense number of drives, in a platform
designed for a 15 millimeter SSD.
Yeah, that's one of the fun things with the E1S spec
is that there's like six different Z-heights
for these things, ranging from no heat sink at all
to, I think, they go up to 25,
right? I think that was a popular one at Meta for some reason. But really what we're talking about
is not changing the PCB; you're really changing the way the heat gets transferred
from your heat sink to the cold plate. Is there any change at all to the PCB
or is this still just a garden variety PS1010 Gen5 drive?
So we did do some internal optimization of our drive
to take full advantage of that cold plate connection.
And since we're limited on the heat transfer
we're getting on the side opposite the cold plate contact.
We did have to do some unique things within the SSD to make sure that we're thermally
balanced in our design.
So what does that do then for the drive in terms of performance?
Because we know that if the drive gets too hot it'll throttle to
go into self-preservation mode. We've done that sometimes by accident, sometimes
on purpose in our labs. I know you guys, I've been in your labs at Rancho Cordova.
I've seen the thermal room there where you guys will often overheat things
for funsies to test those outer edges.
But in a design like this,
is your goal to maintain parity with air cooling
and the full 15 mil heat sink,
or is this better than air cooling with that big sink
when you look at what you can do with liquid directed chip?
So it's better in the sense that what we've done is
we have a solution here where, as we scale the power,
we're able to maintain the no throttle condition
on the drive.
And so thinking forward, like I've been talking about,
as we think forward to maybe these higher power
Gen 6 products, or, you know, ramping up the power on a Gen 5 product, liquid cooling sort of
unlocks that next level where you'd otherwise be limited by, you know, the thermal performance
of, say, a 15 millimeter SSD. Well, this allows us to go beyond that general
power limit for 15 millimeter, and it sort of unlocks
that next level of performance.
Like, hey, if we wanted to drive the performance
of our SSD up further, you know,
that gives us an opportunity in this same form factor
to explore opportunities where we could, you know,
increase that power performance to the drive
and not necessarily break things thermally.
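The no-throttle behavior Cody describes can be sketched in code. This is a rough, illustrative model of SSD thermal throttling in general; the temperature thresholds and step fractions below are invented for the example, not Solidigm specifications.

```python
# Rough sketch of SSD thermal throttling: above a composite-temperature
# threshold, the controller steps performance down to shed heat.
# All thresholds and step fractions here are illustrative assumptions.

def throttled_performance(temp_c: float) -> float:
    """Return the assumed fraction of full performance at a given temperature."""
    if temp_c < 70:
        return 1.0   # below first threshold: full Gen5 speed
    elif temp_c < 78:
        return 0.7   # first throttle step
    elif temp_c < 84:
        return 0.4   # heavy throttle
    else:
        return 0.1   # near-shutdown self-preservation

# The cold plate's job, in these terms, is to hold temp_c below the
# first threshold even as drive power scales up.
print(throttled_performance(65.0))  # full performance when kept cool
```

In this framing, "maintaining the no-throttle condition" means the cooling solution keeps the drive in the top branch across the whole power range.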
Yeah, it's interesting because we did some work
around power states at the end of last year for OCP.
We were using your drives, actually the QLC drives
and looking at what happens to performance
if I'm really concerned about 20 or 30 or 40 drives
in a single box.
And instead of running at full power, which is, what, 25 watts?
Yeah. Yeah. Okay. And if I trim that back and I want to save
10 watts a drive, 8, 12, 15 a drive, because that adds up.
It sounds incremental, but times 30 or 24,
that's a real number in terms of what you can do from a power
envelope standpoint. But if I take it down, my reads remain pretty good, but my writes
will eventually suffer as I trim the power back to that drive. But
talking about Gen 6, you go from 25 watts to, what's the target there, 35 or something, or is it higher?
You know, I don't know exactly what the target power is off the top of my head, but I do
know it's higher than Gen 5 though. Yeah, and I do know that, you know, as an industry,
like I think that there are some opportunities,
where we talk about direct attached storage,
how do we drive up that power envelope
for specific applications.
And so I think that talking about higher power,
maybe the 30, 35, 40 watt range,
I think that there are some limitations
with that E1S connector specifically.
And so I think that that obviously would come
into the equation as you think about
these higher powered solutions.
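Brian's back-of-the-envelope math from a moment ago can be made concrete. The 25-watt and 10-watt figures come from the conversation; the chassis sizes are the hypothetical 24- and 30-drive boxes he mentions.

```python
# Sketch of the per-drive power-trim math discussed above: saving ~10 W
# per drive sounds incremental, but multiplied across a dense chassis
# it becomes a meaningful slice of the power envelope.

FULL_POWER_W = 25.0         # full-power Gen5 draw cited in the discussion
SAVINGS_PER_DRIVE_W = 10.0  # example per-drive trim via power states

def chassis_savings_w(drive_count: int,
                      per_drive_w: float = SAVINGS_PER_DRIVE_W) -> float:
    """Total watts saved across all drives in one chassis."""
    return drive_count * per_drive_w

for drives in (24, 30, 40):
    saved = chassis_savings_w(drives)
    frac = saved / (drives * FULL_POWER_W)
    print(f"{drives} drives: {saved:.0f} W saved ({frac:.0%} of drive power)")
```

The trade-off Brian notes still applies: trimming power this way tends to preserve reads while write performance eventually suffers.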
Yeah, I mean, the storage world's so wild right now.
We went from this U.2 form factor
that feels like it's been around forever.
It's not that old when
you compare it to SATA and SAS, of course, but my goodness, now with all of
the form factors in the EDSFF world, it's wild, from the early Intel
ruler days to the short, the long, the E3, the double-thickness E3. I mean,
it's pretty crazy.
And I know you guys can build anything,
but the trouble with all these form factors is,
I suspect, is that your customers are all asking
for different things, which could be challenging.
But as you hone in on this liquid cooling thing,
I mean, it's really around these AI systems
that are taking the lead there.
So you talked a little bit about what you had to do with the drive.
What's the IP specifically?
Or did you guys put a patent or something around this?
Are you open sourcing this technology?
What's the Solidigm goal with this demo? Um, so the Solidigm goal,
I believe at a high level, and, you know,
being the technical guy that I am,
I don't get too much in the weeds as far as
the IP and such is concerned,
but our goal is that
we would help to enable, you know,
several OEMs and ODMs with this technology to help sort of
with their liquid cooling adoption.
And so what we see is having a reference design
and these companies come to us,
we have our liquid cooling reference design,
we have some IP around that,
and then we're partnering together with them
to enable things at their platform
level.
So, you know, I see it more as a partnership here.
We have this technology.
We want to partner with you and, you know, sell you our drive to see what
we can do as far as enabling these liquid cooling solutions, you know, in their various
platforms.
Well, the march towards liquid cooling is demanding that all components be aware of this.
We saw it, I was at Data Center World in DC recently, and there were more CDU vendors
there than I knew existed.
So we did a podcast with Cool IT when we were there, and it did not dawn on me at the time
that I should have probed into the cold plates
and storage there, but they can make whatever
the server guys want.
All it is is two more runs off a manifold
that's already in the system to run it to a cold plate.
And then if you guys at Solidigm are making SSDs
that are tuned for cold plates,
then it's a nice little marriage to put those together.
I suspect that the rest of your competitors will have to do something there too as they
think about what are the needs for a liquid-cooled system where I'm only going to get the cooling
on one side.
So, like what you were talking about, the side that's only in contact with the case
has to be a little more efficient
so the heat can wrap around
and then eventually get to the cold plate.
But as we saw too with the immersion,
I know immersion's not as popular in the United States
as it is elsewhere,
but we've seen some massive deployments.
Doug, for instance, they've got 440 plus immersion tanks
in their Houston data center, which is totally wild.
And as we talk to these guys, it started out with
just taking a server off the shelf, dunking it in.
What do you learn?
You learn the cables
get brittle and crack. You learn all the labels fall off and sink to the bottom and start to
corrupt your fluid, which is interesting. And you learn a bunch of other things about the
thermodynamics of how that oil moves through the system, what you need to remove, what you need to
enhance to make that happen.
And I think when we were there with Alan, he was looking at it from an engineer's perspective, thinking, My gosh, okay, so
if we're going to dump these things in oil, we could theoretically rewrite the rules on how we treat flash as part of that.
I mean, it doesn't have to be delivered the same way as we deliver it today.
I mean, there's just so much going on there as a thermal guy.
It must be exciting on your side of the house. It sounds more exciting than signal integrity
to me, but I don't know, what does it look like in your labs?
Yeah, so I mean, I think it's really, it's an exciting time to
be a thermal engineer for sure. I think that, you know, for the
longest time, you kind of saw the market, you know, air-cooled,
how do we optimize air cooling, and then I just feel like there's
been this jump. In, I think, about the last three
years here, we've just seen this huge acceleration in terms of
direct liquid cooling
and immersion. And it's been fascinating to explore these different details and different
trade-offs of different types of cooling technology. But at Solidigm, we want to make sure that
our products can fit into all these different spaces. We're optimizing for air cooling,
for immersion cooling, for direct liquid cooling.
We wanna partner with customers,
figure out their specific needs and what we can engineer
or how we can engineer our solutions
to ensure that our drives are performing
in these very strenuous environments.
I saw that deployment that Doug had there,
I think it was like in a parking lot, right?
Like a container in a parking lot.
It wasn't just like next to a Popeyes,
it was like next to a data center.
It made it seem like it just showed up out of nowhere.
Yeah, so I mean, I remember Alan mentioning,
yeah, Alan mentioning something like that.
Yeah, I'm just like going to a parking lot
in the middle of some residential area, and I was like, all right. So, you know, we think about these different
applications, and us at Solidigm want to make sure that, you know, we've got our hands
in these different markets to see, you know, how do we help enable our customers. And, you know,
talking about our drives specifically, we've definitely looked into these different cooling
technologies, you know, from immersion to direct liquid cooling to air cooling.
And so we're definitely exploring, you know, what opportunities are there and
how do we tune our drives to make sure that we're meeting our customers' needs.
Yeah.
I mean, yes.
And it makes a lot of sense.
One of the things that we've seen and heard anecdotally from organizations that have adopted liquid
at a high level, either direct-to-chip or immersion,
specifically with immersion,
I've got a little more data on it,
but the failure rates seem to go down tremendously.
And I know Flash already has a pretty low AFR, you know, across the
industry, not just you, but if we look at all NAND, pretty low, especially compared
to hard drives, which, you know, were higher, more moving pieces, obviously, and
complexity from an engineering standpoint, those designs. But what do you
see, or what do you expect to see?
Because it may be a little too early for your lab.
Do you think that as you go to more liquid cooling
of any variety with flash,
do you think there'll be a positive side effect
on SSD AFRs going down because of the better cooling?
I'm not really sure, so I can't really comment on that.
I, yeah, that's not really something that I've, that I personally have looked into much.
So yeah, I'm not certain.
Well, I just think, if we make the logical connection
that better cooling keeps drives cooler,
that's one less thing, because heat is an enemy of an SSD. So certainly performance should
benefit. I think the AFR should benefit. I guess we'll see as more of these get out there.
Yeah, and I think one thing that's powerful about liquid cooling is you're able to be a little bit more
selective in what you do as a thermal engineer
at the platform level.
So holding drives at more of a steady temperature,
understanding flow and pressure drop at a system level
to where you can peel off a certain liter per minute to keep your drive
at a certain temperature range.
So I think that some of those variables
that existed with air cooling don't exist as much.
It means that there's still plenty of variables
at a system level you need to consider.
But I think that there is an opportunity there
to sort of tune the platform in a way to keep your SSDs
at a performance and a temperature range
that would allow maybe for some better life
like you're suggesting there.
But yeah, I could definitely see that
you have a little bit more knobs to turn
where you're not necessarily hurting a downstream CPU or GPU
because you're increasing the fin density
or something on your SSD.
Yeah, for sure.
And of course, when we take the fans out of these systems,
tremendous power savings there, upwards of 25, 30%,
based on the numbers we're seeing,
and they get a lot quieter too.
So these AI data centers, fully liquid cooled,
it's very bizarre when you go in and hear the whooshing
and whirring more than the screeching of the fans.
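The 25-30% figure Brian cites can be sketched with a minimal calculation. The node and fan wattages below are invented for illustration; only the resulting percentage range comes from the conversation.

```python
# Rough sketch of the fan-removal savings mentioned above (~25-30%).
# The node and fan wattages are illustrative assumptions, not measurements.

def fan_fraction(node_power_w: float, fan_power_w: float) -> float:
    """Fraction of total node power consumed by fans."""
    return fan_power_w / node_power_w

# Assume a 10 kW air-cooled node whose fans draw 2.5-3 kW at high load;
# going fully liquid-cooled recovers roughly that fraction.
for fans_w in (2500.0, 3000.0):
    print(f"fans at {fans_w:.0f} W -> "
          f"{fan_fraction(10_000.0, fans_w):.0%} of node power")
```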
So go back to GTC.
You guys were showing off this demo.
You had four or five drives set up in a little E1S backplane
with a liquid loop going.
This is obviously not production.
This is proof of concept.
As you work through these logistics,
what sort of feedback were you getting at the event
or afterwards?
Because I imagine most of the people coming by were like,
what is this?
This seems bizarre.
Other than disbelief, were there other takeaways
that you either expected or didn't?
Yeah, I thought it was funny.
A lot of people would stop by and were like,
what is going on?
And so a lot of it was just kind of explaining
what was happening in general.
And for our demo specifically, like you said,
we had E1S, we were demonstrating the hot-swap ability.
We were demonstrating our drives.
They were running Gen 5 speeds.
We were able to hot swap, still maintain,
you know, that key feature there.
And then we were able to hold the SSDs
at a consistent temperature below throttle.
So we were kind of focused on that part of it.
And so sort of walking people through that,
you know, kind of showing off our technology
to them was fun, something I hadn't experienced before.
And so that was neat.
I'd say that overall, the reaction afterwards,
some of the things that I didn't expect,
there were quite a few, you know,
articles and things from people popping up around this.
I didn't really expect that as much,
but it was neat to kind of see that. I think we've gotten the industry thinking a little bit,
you know? And we've definitely, I think, people are starting to scratch their heads a little bit,
like maybe there is something here, that we can drive towards 100% liquid cooling, and maybe there are some
benefits at a data center level that, you know, we haven't quite considered to this point
because we haven't seen this sort of technology.
And so some of those reactions, I guess, I don't want to ramble, but...
No, I think it's fun, and you've encapsulated it.
The "what the heck are you crazy guys doing?"
I think would be the number one, and then, as you said, as it soaks in, there's the,
okay, that makes sense. And I think I
don't want to put you in a position to speculate, so I shall. On the GB300, as we look at these platforms,
everyone we've seen is all liquid-cooled,
and so most of them are missing the drive bank,
if you look closely, but I think it's a foregone conclusion
that liquid cooling will be in the next generation
of high-density from Nvidia, at least.
And so they will drive this. The industry will have to respond, you and all of your competitors.
And the fact that you're there and showing it now, I think is fun, and maybe a little out of
character for Solidigm in terms of waiting till something's shipping
or right around the corner to start to show things.
I think it's fun to show the new technology.
So on that front, we wrote about it, as you know,
and put up a lot of photos,
and we'll link to that in the description of this podcast.
So if you guys are curious and wanna learn more about it,
you can't buy it yet, but you can read more about it,
see these drives, and see the demo that Cody and
his team put together. But yeah, Cody, I think it's pretty cool. I'm glad they let
you out of the thermal lab to chat with us for a little bit, and your perspective
is unique, and you're well equipped to talk about this stuff. So I
appreciate that. Thank you.
Yeah, and I really appreciate the opportunity, thank you.
Yeah.