Storage Developer Conference - #201: Towards large-scale deployments with Zoned Namespace SSDs
Episode Date: February 13, 2024...
Transcript
Hello, this is Bill Martin, SNIA Technical Council Co-Chair.
Welcome to the SDC Podcast.
Every week, the SDC Podcast presents important technical topics to the storage developer community.
Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference.
The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast Episode 201.
All right, so I'm going to talk about large-scale deployments with Zoned Namespace SSDs and where we are today. Before getting into all the ecosystem work that we've been doing, I want to set the scene for why we've been spending so much time building ecosystems around these ZNS SSDs.
One of the challenges that hyperscalers and cloud service providers have is that they are constantly dealing with large volumes of data, and customer demand is for storage that is cost-efficient, as cheap as possible, while still being high performance. So the metrics they usually look at are IOPS, terabytes, throughput, latency, quality of service, and so on, and the TCO impact of all of that once it's deployed in the data center. One of the interesting parts is lifetime and drive writes per day. Depending on where you are, roughly one drive write per day is what some of them are looking for.
And that's looking at a five-year horizon. Some of these hyperscalers are moving to extend the lifetime of the hardware in their fleet from five years to seven years or more if they can. So as we move to QLC and PLC NAND, which have lower drive writes per day, it is really important to have technologies that can extend that, so the drives can stay in the fleet longer and still be useful. They obviously have a lot of good reasons for doing this, like carbon neutrality and emissions. And what they're seeing so far is that if they can get away with it, they will actually go to 10 years and beyond. That's really interesting, and it fits into the climate discussion we're having these days.
Conventional SSDs are really not able to achieve this from their point of view. The typical lifetime is three to five years; they want seven years or more. There are different ways to get there. Either you use TLC memory and pay more for it, or you go with QLC, but that has low drive writes per day. So they need a solution that eliminates the write amplification issue SSDs have, and they use that both to increase the drive writes per day and to improve performance overall.
ZNS, zoned namespaces, is one way to solve these challenges. Down here at the bottom I've shown conventional SSDs: TLC has a relatively high cost, that's just what it is, and then you have QLC, which is getting better and has a better cost structure but a typical drive-writes-per-day figure. Some vendors do slightly better, but I haven't seen a QLC-based conventional drive with more than one drive write per day. When you add zoned namespace support to an SSD, TLC becomes very efficient: you can do more than two and a half drive writes per day with zero OP. You can get to those drive writes per day with a conventional SSD if you run something like 28% OP, but then you pay roughly 30% more for your media. And then there's QLC with ZNS, which is the balanced approach, where we see roughly one drive write per day, somewhere in that ballpark. So that's one way to look at it; there are different ways to look at the media.
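Just to make the arithmetic behind that slide explicit, here is the back-of-envelope relation I'm implicitly using (illustrative, rounded numbers, not any vendor's datasheet):

\[
\mathrm{DWPD}_{\text{host}} \;\approx\; \frac{\text{NAND endurance budget per day}}{\mathrm{WAF}},
\qquad
\text{media cost} \;\propto\; 1 + \mathrm{OP}.
\]

Cutting the write amplification factor from roughly 2.5 down to about 1 is what turns the same TLC media into a two-and-a-half-drive-writes-per-day design at zero OP, while buying the same endurance back with 28% over-provisioning means paying for about 1.28 times, call it 30% more, media.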
Another way to look at it is that performance is expensive in general. We have one use case that Facebook has described, the CacheLib caching engine. In the paper they wrote a couple of years ago on CacheLib, one of the things they did was say, hey, we want as low a write amplification as possible on the drive. And to achieve that, something like 1.1 to 1.4 for the workloads they had, they simply over-provisioned 100%, so you only use half the capacity of the SSD. If we extrapolate that, to get that performance you need roughly twice the capacity on a conventional SSD versus half of that on an SSD with ZNS. And what is the cost there? On SSDs the cost is primarily the NAND media. There are other costs, controller, DRAM, and so on, but as you attach more and more media to one controller, the dominating factor becomes the media itself. Which means that in Facebook's case, they're overpaying for storage by 2x for this particular workload when they want that low write amplification.
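Plugging the CacheLib numbers into the same back-of-envelope view: with NAND dominating the bill of materials,

\[
\frac{\text{cost}}{\text{usable capacity}} \;\approx\; c_{\text{NAND}} \times \frac{\text{raw capacity}}{\text{usable capacity}},
\]

so at 100% over-provisioning that ratio is 2 and you pay roughly twice per usable terabyte, versus close to 1 when a ZNS drive reaches the same write-amplification target with its full capacity exposed.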
All right, so a little bit about what SSDs with zoned namespace support have and what they can do. Essentially, they eliminate the SSD's write amplification, specifically the garbage-collection-based write amplification. There are still media reliability activities going on within the SSD, but it eliminates the primary source of write amplification inside the SSD.
And that happens by fixing the mismatch between the storage interface presented to the host, which is a random-write interface, and what actually happens inside the SSD, so that host and drive can collaborate on data placement. One part of this is throughput: you hammer the drive with random writes, and as soon as garbage collection kicks in on a conventional SSD with 7% OP, throughput drops something like 3x. With 28% OP it's around 2x, so there are established ways to trade capacity for that. But if you have a ZNS drive, it's just stable; there's not much more to it.
Another way to look at it is from the latency point of view. On the x-axis you're adding writes, 200 megabytes per second, 400 megabytes per second, and so on, and the 4K read latency increases as well, but it stays very stable and linear. Whereas if you look at conventional SSDs, because GC is getting activated, you can see the read latency bump up, and you hit the maximum, for this particular 7% OP drive, at around 360 to 400 megabytes per second of writes. The media configuration in that drive can actually do about 1.1 gigabytes per second of writes, and internally the SSD is doing that, but the host only gets to write 360 of them; the rest goes to GC. That's all the activity going on inside the drive, and that's what we want to eliminate.
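The rough write-amplification arithmetic implied by those numbers:

\[
\mathrm{WAF} \;\approx\; \frac{\text{NAND writes}}{\text{host writes}} \;\approx\; \frac{1.1\ \text{GB/s}}{0.36\ \text{GB/s}} \;\approx\; 3,
\]

which lines up with the roughly 3x steady-state throughput drop on the 7% OP conventional drive, and collapses back toward 1 once the host is placing data into zones itself.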
All right, so those are the benefits of ZNS. You can look at it from throughput, from latency, or from drive writes per day and lifetime; you can cut it whichever way you want and go from there.
All right.
So how did we get here? Many of you in this room have been part of building this ecosystem. A lot of you were involved in the ZNS standardization, which we started together back in 2018 and then worked on for a year and a half or so to get to a complete specification. That was a lot of work with a lot of you involved, which was very exciting. It was also one of the cases where we added a whole new specification document to NVMe, so there was a big amount of work to do there. We finished that up in June 2020.
So that was the specification; we had something to work from. Now we needed software support for it, because while the numbers look great for ZNS, you do need software support for it to be used. That was the other body of work, and a lot of support was added to the software ecosystem around the same time the spec was released. This was the general support in the Linux kernel, so in the Linux ecosystem specifically, the kernel enabling it natively, which landed right after, and then we improved on it. SPDK support was added a little later, so if you're building something with SPDK, an all-flash array or something like that, you can use SPDK and have library functions to work with ZNS drives. Also, right after the spec was released, vendors started announcing their offerings of SSDs with ZNS support. All of that has been evolving since 2020, and we're now getting to the second generation of ZNS drives in the market, which is really exciting. So one of the things we have been involved in is the applications and so on, enabling that ecosystem and showing, hey, here are all the ways to use it, how do I deploy it, and so on.
All right, just a quick primer, since I've said ZNS and zoned namespaces many times now: what is it actually? Very briefly, it's an NVMe namespace that has this abstraction of zones. The logical blocks of the namespace are boxed into fixed-size zones, those zones are communicated to the host, and the host uses them to do data placement; it decides which data it places into which zone.
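As a concrete illustration of what the host actually sees, here is a minimal sketch that lists a few zones and their write pointers through the Linux zoned block device interface (linux/blkzoned.h). The device name is just an example, and error handling is trimmed; this is a sketch of the interface, not production code.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

#define NR_ZONES 8  /* report just the first few zones */

int main(int argc, char **argv)
{
    /* Hypothetical device name for a ZNS namespace exposed by the kernel. */
    const char *dev = argc > 1 ? argv[1] : "/dev/nvme0n2";
    int fd = open(dev, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    size_t sz = sizeof(struct blk_zone_report) + NR_ZONES * sizeof(struct blk_zone);
    struct blk_zone_report *rep = calloc(1, sz);
    if (!rep) return 1;
    rep->sector = 0;            /* start reporting from the first zone */
    rep->nr_zones = NR_ZONES;

    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }

    for (unsigned int i = 0; i < rep->nr_zones; i++) {
        const struct blk_zone *z = &rep->zones[i];
        /* start, len and wp are in 512-byte sectors; the write pointer is
         * where the next sequential write into this zone must land. */
        printf("zone %u: start=%llu len=%llu wp=%llu cond=%u\n", i,
               (unsigned long long)z->start, (unsigned long long)z->len,
               (unsigned long long)z->wp, z->cond);
    }

    free(rep);
    close(fd);
    return 0;
}
```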
There's one key point here: it's an NVMe namespace. So in one SSD you can have a zoned namespace with the zone abstraction, and the same SSD can also expose a conventional namespace, so it behaves like a normal SSD. The benchmarks we're doing showcase that: we have a conventional drive and a drive with zoned namespace support, and it's the same drive; in one case we're using a conventional namespace and in the other a zoned namespace. That lets us do a really good comparison of the actual benefits, because it's an apples-to-apples comparison, which, academically, I'm very excited we could do, because it takes away all the noise that usually impacts these results. Here it's the same hardware, same everything, and the data path within the firmware is very similar. That was something that was really cool.
Another interesting part about this interface is that it mimics the work done for the ZAC and ZBC models for host-managed SMR drives. Damien Le Moal's team in particular has, over many years, built up a robust ecosystem for that. So we didn't start from scratch with ZNS, because we aligned to that model. There were years of previous work enabling all of this; we started from something that was already working and deployed in the field by end users, and we were adding to it and filling in the blanks.
So here's an overview of the ecosystem. The development has been going on since 2016, and even before that. 2016 is when generic zone support went into the Linux kernel; before that it was pass-through commands and so on. That's where things became really stable, and it's been there since 2017. So this is roughly a 10-year effort we're getting to now, which is usually the time it takes to build something in storage these days. There is support across all the major distributions, Red Hat, CentOS, Debian, Ubuntu, you name it; all of them have support for zoned storage. So if you have a zoned storage drive that supports the specific device model, and I'll get back to that, you plug it in and it just works, and there's tooling and everything.
Another part that's coming through is local file system support. Initially it was just F2FS, which targeted mobile systems like phones and tablets. And then recently Damien Le Moal's team added support to Btrfs, which means there's now an enterprise file system that works with both ZNS drives and host-managed SMR drives. I'll get back to why that's really, really exciting, because it eliminates a lot of the challenges that have been around ZNS: well, how do we use it? If you have a local file system that's generally available, you just put it in, and on top of it it's general purpose, a normal file API. You can do random writes; you can do what you want.
Then there are storage systems, like the Ceph distributed storage system. And when we move into the cloud, there's OpenEBS Mayastor, and there's SPDK's Cloud Storage Acceleration Layer, a host-side FTL, which is good for all-flash arrays. There's a bunch of libraries and tools: libzbd, libnvme, SPDK, fio, QEMU, blkzone, blktests, and many, many others. So there's general support across the board. And our focus, with the teams we're working with, is end-to-end application enhancements: cloud orchestration, databases, and caching.
One of the cool things here is that this is used in production at cloud service providers today, at scale, especially driven by the SMR ecosystem. It's already in daily production use across millions of drives. That's something I'm very proud of, where we've gotten to after these 10 years. A lot of people have been involved, and it's been a huge effort over the last decade, on the order of 20 people working full-time on it over that period. It's really amazing.
So I want to talk about how customers, end users, deploy zoned storage. We see two major ways to do it when you deploy SSDs: one is through a storage array, the other is through local storage. With storage arrays, there are DIY solutions, where storage array vendors have their own stacks, and there's the option of using these off-the-shelf ZNS drives and so on. In that model you run an FTL within the box, and you expose conventional storage at the other end. The other way is local storage, where any application can use the local file system support I talked about, so the application doesn't even need to be aware of it. And then there's the option where we've gone all in: okay, let's build it to be as fast as we can make it.
So for storage arrays, this is the all-flash array. You commonly expose it through NVMe over Fabrics, NFS, Samba, and so on. The storage box runs software that terminates the zoned interface, does the translation, and exposes conventional storage to the end users. It's a high-performance storage system that you use for AI, ML, streaming, databases, and so on. And one of the goals when going towards this kind of solution, the dream, is to replace some HDD workloads with QLC SSDs. With the drive writes per day guaranteed to be more than one, you're actually getting closer to being able to do that. One case here is Alibaba. They had a project together with Intel and Solidigm where they replaced hard drives with QLC SSDs in their third-generation big data platform. Compared to the older generation, where they didn't have this technology and didn't use QLC, they were roughly twice as fast and had more density. There's a lot more to it, but that's the gist of it. So that's one way we're seeing people deploy this kind of storage.
Another option, instead of building your own proprietary solution: Solidigm, together with Alibaba, has been working on CSAL, the Cloud Storage Acceleration Layer inside SPDK, which is this translation layer. It has a write-shaping tier and it scales well with QLC and so on. And one of the things Solidigm is doing together with their partners is a reference platform, a reference implementation, so you can use it out of the box, which is really cool. Solidigm has released this image, and you just deploy it and you have your solution. It's very easy to use.
All right, then there's the local storage approach: a file system with zoned storage support. This is for users who don't need the full-speed benefits of zoned storage; maybe they have applications that don't need it or just haven't been optimized yet. And typically, when you use this storage, you use it through a file system anyway; you're going to put something on the drive, so you might as well put on a file system that has zoned storage support. Two of those are F2FS and Btrfs, which have now been enabled. Btrfs specifically has had it since 5.12; initially SMR hard drives were stable, and in the latest release of the kernel we are now stable on ZNS as well. Damien's team has been doing that work, which is really cool.
One of the learnings from that integration, when we looked at it: we compared a conventional drive with a zoned drive, just using the file system with zoned support, and did the benchmarking. You weren't slower; you were five to ten percent better when you used a drive with ZNS support. And the interface up to the applications is just files, the POSIX API. You don't know that zoned storage is underneath, you don't care anymore, but because there's a ZNS drive underneath, you get that 5 to 10% extra throughput. Which is okay.
But where it really shines is on the latencies. When you get to the four nines and so on, you really see the conventional drive fall off a cliff, whereas it's much more stable with the ZNS drive. Additionally, you also get the 7% or 28% OP back. With ZNS drives you can utilize the whole drive; there's no slowdown, it just works, so you also have the extra capacity. Another thing is that it works natively with hint-based placement approaches like streams. Any improvements that are done there just work natively with ZNS as well.
And there are, again, no software requirements; you don't have to do anything. These days you plug it in, format it with mkfs, point your applications at it, and that's it. You don't care anymore that it's zoned storage underneath. It just works.
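For the curious, the "it just works" part is easy to see from userspace: the kernel advertises each block device's zone model in sysfs, and tooling such as mkfs can key off the same information. A minimal sketch (the device name is just an example):

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* Hypothetical device name; pass your own, e.g. nvme0n2 or sda. */
    const char *dev = argc > 1 ? argv[1] : "nvme0n2";
    char path[256], model[32] = "unknown";

    /* The "zoned" queue attribute reports "none", "host-aware"
     * or "host-managed" for every block device. */
    snprintf(path, sizeof(path), "/sys/block/%s/queue/zoned", dev);
    FILE *f = fopen(path, "r");
    if (!f || !fgets(model, sizeof(model), f)) {
        perror("read zoned attribute");
        return 1;
    }
    model[strcspn(model, "\n")] = '\0';
    printf("%s: zone model = %s\n", dev, model);
    fclose(f);
    return 0;
}
```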
All right. But sometimes you just want to go all in, and that's where we have the end-to-end application integration. That's for IO-intensive applications, where you're banging on the drive all day at high write throughput, and for these large-scale storage systems where you have exabytes and it really matters that you're using your storage efficiently.
For that we are working on different applications. We've done many, and the community has done many, but the main ones are MySQL, ByteDance's TerarkDB, RocksDB with native upstream support, Longhorn over in Kubernetes land, Ceph, and OpenEBS, which is also in Kubernetes land. And then there are all the distributions, of course. So we have all these integrations, and I just want to go through the ones we have here in bold and talk a little bit about them: MySQL, CacheLib, Ceph, and the cloud integrations.
All right, so Percona MySQL. This is a collaboration between WD and Percona, where we worked with them on taking the MyRocks storage backend that Percona MySQL can use. Hans, who is here in the room, developed the ZenFS storage backend, which plugs into RocksDB, is upstream today, and allows RocksDB to run natively on top of drives with zoned storage. And today the work is such that you take the public container image from Percona for MySQL, just like you normally would when you say, hey, I need a MySQL container. That image has support for zoned storage: if it sees a zoned drive, it just deploys on it and works with it. Here we get around 80% higher throughput just from using it, and lower tail latency and so on. So it's the same story: you get really high insert performance, because now the writes are really efficient, and you also get a bit of improvement on the read side as well.
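The reason an LSM-tree backend like RocksDB plus ZenFS lines up so well with zones is the zone lifecycle itself: files are written sequentially at the write pointer and space is reclaimed by resetting whole zones, so there is nothing left for the device to garbage-collect. Here is a minimal sketch of that lifecycle using the Linux ioctls; this is illustrative only, not ZenFS's actual code, with a hypothetical device name and 4 KiB logical blocks assumed.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/blkzoned.h>

int main(void)
{
    const char *dev = "/dev/nvme0n2";        /* hypothetical ZNS namespace */
    int fd = open(dev, O_RDWR | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    /* Report the first zone to find its start and current write pointer. */
    size_t sz = sizeof(struct blk_zone_report) + sizeof(struct blk_zone);
    struct blk_zone_report *rep = calloc(1, sz);
    if (!rep) return 1;
    rep->sector = 0;
    rep->nr_zones = 1;
    if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("BLKREPORTZONE"); return 1; }
    struct blk_zone z = rep->zones[0];

    /* Append one 4 KiB block exactly at the write pointer (sectors are 512 B). */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096)) return 1;
    memset(buf, 0xab, 4096);
    if (pwrite(fd, buf, 4096, (off_t)z.wp * 512) != 4096) { perror("pwrite"); return 1; }

    /* "Delete" the data by resetting the whole zone: the write pointer goes
     * back to the zone start and all its blocks are reclaimed in one go. */
    struct blk_zone_range range = { .sector = z.start, .nr_sectors = z.len };
    if (ioctl(fd, BLKRESETZONE, &range) < 0) { perror("BLKRESETZONE"); return 1; }

    free(buf);
    free(rep);
    close(fd);
    return 0;
}
```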
One of the cool parts here is very subtle. Normally when you use MySQL, you use a storage backend called InnoDB. And MyRocks, which is optimized for space amplification, is normally slower on the read path. But when you couple it with ZNS and these optimizations, you get into the realm where the InnoDB and MyRocks implementations have roughly the same performance. So while typically you'd say, for read-heavy database workloads I'll use InnoDB, and for write-efficient, write-heavy workloads I'll use MyRocks, with this drive integration we can actually get to a point where you could just go ahead with MyRocks. That's really exciting. Obviously MyRocks doesn't have the same functionality as InnoDB, so it's never black and white, but it's one of the interesting things we saw.
Then we've been looking at CacheLib. There's a lot more work behind this and I'm only showing a little bit; there's a paper under review, so I'm just going to take a very small part of the work that's gone into it. CacheLib is a general-purpose open-source caching engine from Facebook. They use it for many of the different storage services they have within Facebook, and the cool part is it can use both DRAM and flash. One of the benchmarks we've done is the KV cache workload with the five-day trace from Meta that they provide. And here the key part is that we're running the drive at full capacity, which means we're also using the OP, the 7% or 28% that would normally be reserved; in this particular case it's 7%, so 107%, we're using the whole drive. If you do this on a normal drive, it performs terribly; that's where Facebook says, hey, let's only use half. But we're just using it all in this particular case.
And then the key is, of course, a WAF of one, as usual. I don't want to get too much into it, but there are some tricks being done where you do more reads but get very efficient write performance out of it. So what we see is about 3x write throughput on this particular benchmark, and about 2x on reads. On the latency side we see something like a 2x to 10x improvement, depending on how many nines you look at. It's a major difference, and it shows: hey, you can go ahead and do this. That's really exciting. And it's just plug and play: you plug it in and it works; there's not much to it. This work is not upstream yet; it's to show what's possible, because the paper is not out yet. We're still waiting.
Then another one is Ceph Crimson. If you're building HPC, or really massive storage systems with hard drives, for example, you often turn to Ceph. I looked it up: their telemetry shows something like 1.1 exabytes deployed, and that's only the clusters that opt in to telemetry. So it's quite a widely used storage system these days. One of the things being worked on in Crimson, the next generation of Ceph, is native support for zoned storage. It doesn't matter whether it's host-managed SMR drives or ZNS drives; it just works. And when you use Ceph, it exposes conventional storage, so you don't know there's zoned storage underneath, you don't care, it just works. Here you still run conventional workloads on top, but you get 30% extra throughput. All is good. And all of this is natively supported, so you don't need to care about it.
Then the last one is the cloud integration. Many times when you do deployments, application developers use these cloud orchestration platforms. Initially it was all stateless containers; now we're getting into making them stateful. For that, you want to expose the storage into these containers so they can use it, and here we've integrated with Longhorn, OpenEBS, and CSAL. What we've shown here on the right is where the volume is backed by a block device, so there's the Longhorn work and the SPDK CSAL work. There's also Mayastor, where we actually expose the zoned drive all the way up, for those cases where we want to go fully end-to-end fast, and then we run a workload. This is all within a container where everything is wired up, and the application that runs in the container doesn't have to care that it's zoned storage; it doesn't matter, it just works, and all of that is taken care of somewhere else. Hans and Dennis, in their talk later today, will go into the details of all of this, so I'm not going to talk too much about it.
All right, so let's look at the ecosystem where it is today compared to 2020. Support for the Zoned Namespace command set has been announced or added to products across a broad set of vendors. I think there's solid support in the Linux software ecosystem, helped a lot by the existing foundation for host-managed SMR HDDs. And the recent improvements to the local file systems and to relational and key-value database systems are extremely exciting, because those are some of the really IO-heavy workloads. We also have all these vendors building SSDs, as well as validation tooling and so on; they've been active too.
One of the key things we learned after the release was that it wasn't as easy as it should have been. Vendors implemented different versions of ZNS, which meant drives didn't always work with what was supported in the Linux kernel, or customers were getting different drives that didn't quite work the same way, so they didn't know what they were getting. So one of the cool parts of the work done within SNIA has been to standardize common device models, such that we can all rally behind storage devices with the same properties, and when we build a software stack, it works together. The second part is making it easy to use, so that you don't need to care about the zoned storage; you just use it and put your applications on top. That's where the local file systems and these cloud solutions that now support it natively really help.
All right, so on the SNIA side, there's the zoned storage technical work group, which has been going on for quite a while, and one of the things we got to there is defining common requirements for a zoned storage device. It's really interesting, because once we can say "these are the properties," and I'll get into a bit of what we define, you get multi-sourcing: end users can source from multiple vendors, say "this is the model I want," and get devices in the same realm. It also gives you common software requirements, so you have something general to build for. Specifically, some of the things I'm really proud of in what's standardized: there are the high-performance and high-capacity models, and the use cases are described. That's great, and it gave a lot of clarity.
But what I'm really excited about is the common requirement that defines who manages reliability: is it the host, or is it the drive? When I was part of defining the Zoned Namespace command set specification, I thought wear-leveling, the reliability, was always on the drive. I didn't think anyone would put it in the host; it didn't cross my mind back then. But it turned out later that some would really like the host to take care of that, and that changes everything. It makes the software much more difficult; it's a different software stack. So the model says: it's on the drive, the drive manages it. Data doesn't go away; if the host does nothing, the data stays there.
There's also something about the zone: when you write into it, the capacity is static and fixed. That's also important. And there's something called zone active excursions, which is very technical: when you program the media, sometimes you get a program failure, you have to back out, and you can't write to that flash block anymore. It's important that when that happens the SSD just says, okay, I'll fix that; as the host you don't need to care about it, you can just keep writing. So that's been defined. And then there are things around end of life, read-only mode, and so on. This is really exciting work that was released back in July, so it's very recent.
Another one we see is zoned storage for embedded devices. I personally haven't been much involved in it, but Google has been pushing zoned storage support into mobile. This is within JEDEC, driven by Google and some of their partners; as you can see, SK hynix is there, and they've been working closely together. The use case they have in mind is the Android hardware ecosystem: tablets, Google Pixel phones, and so on. And there's a roadmap for getting into that; Google discussed the roadmap back at FMS. The specification was completed back in July.
So this is planned to go into next-generation mobile platforms. One of the cool parts is that while with ZNS we had to kickstart everything, get the file system support in place and so on, for zoned UFS the main Android file system is F2FS, which was initially developed by Samsung and where Google is the main developer today. That work started long ago, and the zoned support in it is very stable today; it's been tested over a long time. So when the UFS vendors deliver these drives, the OEM can pretty much just take the drive, put F2FS on it, put it together, and ship it. Where we had a long lead time for ZNS to get into production, and ByteDance has talked about when they deployed it in production, I believe the lead time for zoned UFS will be much shorter, because the ecosystem is already there and working. So that's very exciting and something I very much look forward to.
All right, we're getting to the last slide. So, zoned namespaces enable hyperscalers and CSPs to meet this increasing customer demand. It helps them go from five years to seven years and even beyond that. And I feel the software ecosystem is very mature at this point. For storage arrays you have turnkey solutions; I'm thinking of CSAL here with ZNS and QLC. There's robust file system support, both on the client side, embedded for tablets and phones, and for enterprise workloads with Btrfs, which is also stable, so you don't have to worry about there being zoned storage underneath. And then there are the end-to-end integrations where you just want to go full speed ahead and get everything you can: databases that have been accelerated, but also distributed file systems and so on, and Kubernetes and OpenStack and all that. And we have all of that working today.
And I want to close with the other talks here at SDC. There was one talk on Monday by Swapna about cloud workloads and their implications for storage media, where they discuss the requirements they see now and in the future for their storage systems, both for SSDs and for hard drives. Later today there's a talk on bridging the gap between host-managed SMR hard drives and software-defined storage, which Piotr from Light Storage is going to give; that's going to be exciting. And at the end of the day, Hans and Dennis are going to have fun talking about zones and the art of log-structured storage. All right, thank you.
Thanks for listening.
For additional information on the material presented in this podcast,
be sure to check out our educational library at snia.org/library. To learn more
about the Storage Developer Conference, visit storagedeveloper.org.