Storage Developer Conference - #114: NVM Express Specifications: Mastering Today’s Architecture and Preparing for Tomorrow’s
Episode Date: November 22, 2019...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, episode 114.
All right. Hi, how you doing, everybody? My name is Jay Metz. I am a research engineer for Cisco. I am also on the board of directors for NVM Express.
And I am working with you today to talk about what's happening in the actual specifications moving from where we currently are into what's going on in the future. And I said to myself, you know what? I have access to people
who actually know what they're talking about. So I said to myself, self, because that's what I do,
I say self. And why don't I just ask Nick, who has been instrumental in doing a lot of the
documentation for the changes between 1.3 and 1.4. If you've gone on the website and seen the
changes, the list of changes for 1.4, that's the man who wrote it. So we're going straight to the source. And then I said,
well, what about the consequences of these? What impact is this going to have on
testing? Well, why don't we just ask David to show up too? Because if anybody knows about testing and
what's going on in testing, it's going to be him. So I said, hey, self, you just dropped your level of work down to
about two-thirds. Yay! So if you don't mind going to the next one. So what this is about is really
the culmination of some things that have been happening both in the marketplace, in the industry,
in development, and as Nick and David will tell you, also
in the ways that people are actually starting to implement the protocol outside of what
the expectations really were.
And that's fair, because like any new thing, people are going to start using it in ways
that you don't necessarily anticipate when you start making the developments in the first
place.
But as a result, there are some ways of doing things
that wind up getting confusing. People in the end consumer and enterprise
groups start to wonder, well, what does it mean if I have this number attached to a specification?
What do I develop to? If this is optional and that's mandatory, what happens? And what
happens to the changes?
So we started taking a hard look at what it is that we are doing as we develop this organically
fluid specification, and we started to realize that we probably need to codify it a little bit better.
So, was it about a year ago that we started talking about refactoring? About that, roughly?
Yeah, somewhere between nine months and a year ago,
we started talking about, well, what we really need to do
is we need to take all of these different aspects of the specification,
and we need to repackage it in a way that makes it a lot easier to find stuff.
And you can be honest.
How many people have had a hard time finding what you need easily?
Okay, again, we're here in the audience participation portion of the
day. I'm not asking you to cheer, just give me a sign of life. Okay, so what we're going to talk
about today is we're going to talk about the reasons why we do the refactoring, what we're
expecting people to get out of the refactoring, and how we get to and through the refactoring
moving forward.
But as I said earlier, one of the most important parts about the whole thing is how do you
make sure you're doing what you're supposed to be doing.
And as a result, when we started talking to David early on in this process, he said, I'm
glad you asked because there's a lot of stuff going on that is really going to wind up confusing
people if we're not clear.
So this effort is to try to make things clear for you and for
the viewers at home who are listening to this on the website.
So once we get through this, we're going to be
able to hopefully have not just a clear idea of
how to use the specifications more efficiently, but also what
to expect in the future, and
how NVM Express
is going to help you get there.
All right?
If you don't mind.
So, quick level setting as to what we're looking to do here.
What we're going to do is we're going to be kind of brutally honest in a lot of ways.
We're going to tell you where the warts are, how to put the cream on them and make them
go away.
We're going to help you understand what we think you should probably be doing,
so we're going to suggest courses of action,
but we're not going to be prescriptive.
We're not going to tell you what to do
because there is a lot of optional stuff
in the features by design.
So there's this tightrope walk that we're trying to take
that will help you understand how you can take
what you're looking to develop in
the right direction without deviating too far from where the specification
itself is going.
We're also not going to be exhaustive.
As we'll see later on in the presentation, there are many, many, many changes, many mandatory
changes, going from 1.3 to 1.4, for instance.
There will be other additional changes going from 1.4 to 2.0.
There's no way we can talk about it in 45 minutes and cover everything.
So we're going to give you some samples and some examples
of what kind of changes to expect,
what the consequences of those examples are going to be,
and then we can kind of extrapolate on to additional changes
that we can't go over in fine detail.
Fair?
Okay, for the record, for those at home, we had head nods. Okay. Next. All right. So, starting with getting from there to here.
We're going to start at the top of the funnel. We're going to work our way down pretty quickly.
Now, in theory, the process of getting from here to there from 1.2 to 1.4 should be relatively straightforward.
You pick up 1.4 and you start writing to the spec, right?
Well, not really,
because if you instead go from 1.2 to 1.3,
you have to deal with 12 ECNs,
14 technical proposals,
just to get to 1.3.
Now, why would you want to do that
if you can just pick up 1.3?
Well, because the problem is that a lot of the stuff that goes on in the technical proposals
from 1.3 to 1.4 could also be applied to 1.2.
So you can go back, read the technical proposals from 1.3 to 1.4, and apply them back to 1.2.
So you're doing a lot of hopscotching.
And then what makes things even worse is that sometimes you have multiple ECNs that don't necessarily accumulate the previous ECNs.
So there's text in some ECNs that are not in other ECNs, and you wind up having to do a bunch of jumbling.
Yay!
Fun.
All right.
Now, if we go from 1.3 or 1.4 to 2.0, things get a little bit messier.
And we're like, you know what?
Let's just take a step back and take a breather.
The reality of the situation is that what we're doing for NVMe 2.0,
which we're going to get to in a little bit,
is going to be able to clarify a lot of the hopscotching.
We're looking specifically to be able to concatenate a number of these things
so that you can have pretty much a single source of truth. Moving forward, there will be some additional technical proposals
that are going into 2.0 as well. But the key thing to keep in mind here, and if you walk away
with one thing today, the most important thing to walk away from is that this 1.3 to 1.4 to 2.0
is an easier path. And the reason why it's an easier path has to do with the way that the
hopscotching works. So NVM Express is going to be helping everybody get from 1.4 to 2.0 because
we'll be using 1.4 language to get to 2.0. Which means that if you try to skip 1.4 and go straight
from 1.3 to 2.0, you will find yourself having to do a lot more hopscotching than necessary.
So a kind of measured, pedestrian way of approaching this will wind up being
very, very useful, particularly when we start to get into some of the mandatory changes
that are going on in 1.4.
Skipping over those is going to cause a little bit of hurt.
And as a result, your problems will also become pretty unique.
So the problems that you have as a developer
going from 1.3 straight to 2.0
will be left up to you, for the most part,
to solve because of all the different options.
So what we're going to try and convince you of today,
and like I said, this is a suggestion, not a prescription,
is go from 1.3 to 1.4, then to 2.0, because that process, that putting
on your socks before your shoes, will actually wind up helping you in the long run.
So at that, just a summary.
You've already seen Nick's great changes made in 1.4.
We're doing the same thing for NVMe over Fabrics 1.1.
We will be doing one for management as well.
You'll notice that we have those for 1.4.
We don't necessarily have them for 1.3.
We will likely also have that for 2.0 as well, I would assume.
We're going to give more help going from that direction. So start off going from 1.2 or 1.3 to 1.4 before moving on to 2.0.
All right.
One of the things to just kind of reiterate that Jay's alluding to here,
when we say that going from 1.3 to 2.0 will be more difficult,
it's because we're refactoring the specification, right?
And we'll go into more detail about that in a little bit. But one of the
things there is that there will be sections of the specification that move around.
There will be some significant changes, not in terms of
technical content, but in terms of how the spec is put together. And a lot of
the technical proposals, the current technical proposals
are written against that 1.4 specification.
When we release 2.0, that's going to be a challenge to be able to put together. We just
want to make sure that's clear.
This is all you.
To start the presentation, we're going to kind of go back a little bit,
and we're going to talk about what does it take to get ready for 1.4.
You know, we've kind of come in and we've said, hey, it's important to kind of take these steps there.
That first step is actually getting to 1.4,
and so we want to talk about the types of changes that have gone into this 1.4 spec.
We've got changes that are basically new features
and feature enhancements; these things are similar.
But then the important thing to reiterate here
and just be able to make sure you guys take away
is that there's a number of required changes
that are incompatible with previous versions
of the specification.
So as you go and get compliance checked through UNH,
we need to make sure that all of those things are actually captured well and that you're aware of how to find that stuff.
Because when you look at a couple hundred page specification,
it can be challenging to figure out what do I need to do, right?
And so that's one of the things that we want to talk about today.
Where do I start?
As you can see from the cameras here taking pictures, this is a slide to keep track of.
And you'll be able to get this on the website afterwards.
But the idea is this is an important slide.
As we went about making the 1.4 specification,
and as Jay alluded to with regard to Fabrics 1.1 as well as the MI 1.1,
we have added this new kind of change list to the website.
And so specifically out on the NVM Express website,
right in the section where we've got all of the specifications,
there is a listing of all the changes that have gone into the 1.4 spec.
And on top of that, it calls out not just what the change was, but the specific
sections in the specification where those changes came in, as well as which TPs
and/or ECNs that change is related to. You can really get the detail about where you need
to go look to be able to find out about a particular change. And that change list that
is on the website is exhaustive. Unlike our presentation here, it contains all of the
changes that are required to get from 1.3 to 1.4. And so this kind of detail is part
of the reason why we really encourage folks to move to that 1.4 release of the
spec, and that readiness, before they move on to 2.0,
because in this way it's very prescriptive.
We aren't going to update some of those changes
and where they're at into the 2.0 spec.
You'd have to kind of search around for that stuff.
So that's one of the things we're trying to encourage here.
So here, these are just
groups, classifications of updates that were made for required changes.
So, this is kind of going through that list of things that if you're going to be 1.4 spec compliant, you need to make updates in these various areas.
I'm not going to go through this whole list, but the idea here is to really talk to you about the fact that there are a number of changes and that they're important changes.
A lot of these have to do with things like properly handling error conditions or clarifying
different kinds of NSID values, namespace identification values.
Making sure that things are not left kind of implied in the specification. We've really gone through
and made sure that things were explicit so that we don't have cases where vendor A implements
it one way and vendor B implements it another way. We're putting compliance in place for
these things to make sure that people are able to, as they consume SSDs or arrays or what have you, they get a consistent implementation across vendors.
One of the things that we're going to do here today
is dive in just a little bit to give some examples
of what those changes are
and how we've explained them on the website.
This first one is about controller memory buffer.
Controller memory buffer support has been there previously.
It was introduced into the 1.3 specification.
But what we did here is we kind of hardened the implementation.
We found some error conditions where if a drive supported CMB but the OS didn't support CMB,
there were some scenarios where you could get some bugs and some issues, and we wanted to make sure that we closed some of
those gaps, especially as some of them might be security-related. And so what we went and
did is we added basically a support bit and an enable bit. And the idea here is to make
sure that, you know, as a drive implements the controller memory buffer,
that the OS has to be aware to be able to turn on that functionality.
And it's not sitting on in the background
just kind of as a default without the OS realizing it.
So this is a kind of important thing.
In addition to that, as part of the changes
that we went and did, we kind of removed
some of the restrictions for the controller memory buffer.
Previously, you had to have all your data and your submission
queue entries all in that same buffer, but now we've removed that restriction.
You can have some in host memory and some in the controller memory buffer, kind of either
direction and that's okay.
In addition to fixing some of the gaps, we also allowed for more flexibility in the controller
memory buffer implementation. I talked to some of the whys already, but
the key piece of why we did this is to make sure that the OS is aware of the fact that
the drive is supporting this
and that some bad actor can't do something in the background without the OS being aware
and without the drive knowing that it's not the OS doing it.
Those are some bad scenarios.
So we wanted to make sure and do that.
And the impact of inaction in this space is that you continue to leave a drive that supports CMB
potentially vulnerable to some of these types of issues.
And so we want to make sure that those things get closed down, that we make folks aware of that fact.
And so that's why we called out this change and that we made the associated changes to the specification.
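To make that opt-in model concrete, here is a minimal host-side sketch in C. It assumes the register layout described in the 1.4 base spec (a CAP bit advertising CMB support, plus the new CMBMSC register with its enable bits); the MMIO accessors are hypothetical driver helpers, so treat this as an illustration of the handshake, not a drop-in implementation.

```c
#include <stdint.h>
#include <stdbool.h>

#define NVME_REG_CAP     0x00                 /* Controller Capabilities        */
#define NVME_REG_CMBMSC  0x50                 /* CMB Memory Space Control (1.4) */

#define NVME_CAP_CMBS(cap)  (((cap) >> 57) & 0x1)  /* CMB Supported bit         */
#define NVME_CMBMSC_CRE     (1ULL << 0)       /* Capability Registers Enabled   */
#define NVME_CMBMSC_CMSE    (1ULL << 1)       /* Controller Memory Space Enable */

/* Hypothetical MMIO accessors supplied by the driver. */
extern uint64_t read64(void *bar, uint32_t off);
extern void write64(void *bar, uint32_t off, uint64_t val);

bool cmb_enable(void *bar0, uint64_t cba)
{
    uint64_t cap = read64(bar0, NVME_REG_CAP);

    if (!NVME_CAP_CMBS(cap))
        return false;   /* no CMB on this controller; stay in host memory */

    /* 1.4 behavior: the buffer stays off until the host opts in, so an
     * unaware OS never has the CMB silently enabled underneath it.
     * cba carries the base-address field bits for CMBMSC. */
    write64(bar0, NVME_REG_CMBMSC, NVME_CMBMSC_CRE | NVME_CMBMSC_CMSE | cba);
    return true;
}
```

The detail that matters here is simply that both bits exist, and that the host, not the drive, is the one that flips them.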
One of the things you'll note down on the lower right is kind of an example of what we did inside that change list on the website.
We've got the NVMe revision 1.4, section 3.1, 4.7, 4.8, 7.3,
and blah, blah, blah, blah, blah, right?
But the idea is it's very explicit as to which section the changes are in
so that when you go and you try and read through a large document,
you're able to find where that stuff is at, and it's not hidden.
The other thing, especially when it comes to some of the required changes, like something like this,
we actually call out the explicit sentence where the change happened,
so that not only do you know the section, but you also know this is the line that you really have to key in on,
and this is what was important about the change.
Some of these have half a dozen of those sentences in there,
and it's really important that you read through each one of those things
because it will help you make sure that your implementation is solid.
Yes, so on this one. Jay gave the last one.
He's much better at these deliveries than I am, but, you know,
it's all good. Here, what we've got is, you'll notice one of the changes is with namespace IDs.
We have this kind of general usage for FFFFFFFF, eight Fs here. And what it generally means is that we're broadcasting that action
to all of the namespaces inside of the subsystem.
Okay?
But the thing here is that we didn't explicitly define it for every single command.
Right?
And so there's a lot of kind of vendor-specific implementations that are out there,
and the vast majority of them are exactly
what we intended. And then there's some that aren't. And some of them, even when we went through
as the experts and we're saying, oh, yeah, it needs to be this way, it needs to be that way,
and we're like, oh, we're at loggerheads with regard to what's the right thing that's intended
for the various situations. And so what we did inside of 1.4 is we went and we made very explicit
exactly what the different all-Fs NSID definitions were for each of the commands. This is true
for, I guess, I/O commands, set and get features, admin commands, as well as reservations. You
know, all of these things, it's either, you know, the all-Fs NSID is supported or it's not supported. When it is supported, it means this.
You know, these are the error conditions to send back when it's not the case.
And honestly, the majority were actually defined before.
But what we did is we went through and we found some gaps.
And so what this did is, you know, this change really was about making sure that we had filled those gaps
and that it was explicit what to do.
Again, it's that cross-vendor support.
As you're a host or as you're an application software, you're able to know exactly how the device is going to respond ahead of time.
This is a really key thing.
One of the things that we want to make sure is clear is what's the impact of not doing this?
My drive's fine.
Nobody complains about this stuff. I never hear
anything from my customers.
Well, you know, this last example we have, what happens
when a delete command is sent with an NSID of all Fs? Do you
delete all your namespaces? Do you not delete all your
namespaces? I mean, what do you do? I mean, that seems
pretty obvious, right? It's a broadcast. You should delete everything. I see a cringe in the front row, second row. No one's in
the front row. So in the second row, there's a cringe. So we'll get onto this a little bit more
later. But that's an example of the types of things that were unclear before that we've now clarified.
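As a controller-side illustration of that clarification, here's a hedged sketch in C. The policy table is hypothetical and deliberately tiny; the real 1.4 rules are enumerated command by command in the spec, and some (like Flush) are conditional on what the controller reports. The shape of the check is the point: either the broadcast NSID is supported and means something specific, or a defined error goes back.

```c
#include <stdint.h>

#define NSID_BROADCAST  0xFFFFFFFFu

/* Generic command status values from the base spec. */
#define SC_SUCCESS            0x00
#define SC_INVALID_NAMESPACE  0x0B   /* Invalid Namespace or Format */

enum bcast_policy { BCAST_ALLOWED, BCAST_REJECT };

/* Illustrative per-opcode policy, not the spec's actual table. */
static enum bcast_policy policy_for(uint8_t opcode)
{
    switch (opcode) {
    case 0x00:                  /* Flush (NVM command set): broadcast may be */
        return BCAST_ALLOWED;   /* allowed, subject to what a real controller
                                 * reports via Identify                      */
    default:
        return BCAST_REJECT;    /* otherwise: a defined, explicit error      */
    }
}

uint8_t check_broadcast_nsid(uint8_t opcode, uint32_t nsid)
{
    if (nsid != NSID_BROADCAST)
        return SC_SUCCESS;      /* ordinary NSID validation happens elsewhere */

    return policy_for(opcode) == BCAST_ALLOWED ? SC_SUCCESS
                                               : SC_INVALID_NAMESPACE;
}
```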
Moving forward. So we couldn't go and not talk about any of the
new features. So one of the things that we wanted to talk about was the persistent event log. Again,
there's a number of new features that were added to the specification, but here we wanted to give,
this is kind of a two-sided thing. This provides all kinds of ability to be able to capture logs and persist
them across power cycles. That's important for a number of reasons, the biggest of which
generally is kind of debug and to figure out what happened and to be able to keep logs
of what's going on. But the idea here was that we create a consistent way to do that
again across vendors and across OSs. So this allows for basically all your SSD manufacturers
to be able to generate things that are custom for their drive,
but inside of this number of different types of events.
So firmware commits, which are obviously going to be implemented
in a vendor-specific way.
Thermal excursions, the hows and whys of that are, again, going to be specific to a particular vendor.
There's a lot of different things here where there's vendor specifics to the implementations here,
but how you get that data and the types of data, those types of things can be consistent. And so we wanted to provide a mechanism, a framework that could be used
both by the device vendor as well as by the OS side or the host side. So that's what we
wanted to do. And that one's building for some reason.
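Before the handoff, here's a rough sketch of what consuming that framework looks like from the host side: pulling the persistent event log with an ordinary Get Log Page (opcode 02h, log ID 0Dh in 1.4). The admin-command helper is a hypothetical stand-in, and the action encodings should be verified against the spec; the point is that retrieval is the same standard log-page plumbing regardless of vendor.

```c
#include <stdint.h>

#define OPC_GET_LOG_PAGE   0x02
#define LID_PERSISTENT_EVT 0x0D   /* Persistent Event Log (new in 1.4) */

/* Log Specific Field actions, as described for this log page. */
enum pel_action {
    PEL_READ         = 0x0,   /* read log data                 */
    PEL_EST_AND_READ = 0x1,   /* establish context, then read  */
    PEL_RELEASE      = 0x2,   /* release the reporting context */
};

/* Hypothetical helper: submits an admin command and waits for completion. */
extern int nvme_admin_cmd(uint8_t opc, uint32_t nsid, void *buf, uint32_t len,
                          uint32_t cdw10, uint32_t cdw11);

int read_persistent_event_log(void *buf, uint32_t len, enum pel_action act)
{
    uint32_t numd  = len / 4 - 1;                 /* dwords, zero-based */
    uint32_t cdw10 = LID_PERSISTENT_EVT
                   | ((uint32_t)act << 8)         /* LSP field          */
                   | ((numd & 0xFFFF) << 16);     /* NUMDL              */
    uint32_t cdw11 = numd >> 16;                  /* NUMDU              */

    return nvme_admin_cmd(OPC_GET_LOG_PAGE, 0, buf, len, cdw10, cdw11);
}
```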
So now we will go back to Jay, and we'll get a little bit more about kind of where we're going forward from here with
refactoring.
That's right.
It's exciting.
Oh, yeah.
Okay, so let's air a little bit of dirty laundry here.
One of the things that we've noticed as we've been evolving,
the intentions have always been pure.
There's no question about that.
The idea of NVM Express from the very beginning was that
it was supposed to be a very simple approach
to handling block storage for non, non-volatile memory.
As a result, the organization wanted to keep things
relatively simple. They wanted to keep a very small
mandatory set of commands, and then you just add
in a bunch of optional features.
Great in theory.
In practice, what's happened is that
with all the different optional features
and the different dependencies that's happened,
you've wound up with some inconsistencies
in how the actual specification has been written.
The other thing that's wound up happening as a result
is that because of the fact that NVMe
was originally designed for PCIe, there has been a trend in conventional wisdom, for lack of a better word,
where people have associated NVMe with PCIe. In fact, have you all heard of the website Quora,
where people can ask questions and get answers from so-called experts? Well, if you look at the questions on NVMe on Quora,
a lot of those happen to wind up being an equation of three very different elements.
NVMe equals PCIe equals M.2.
I can only do so much.
All right.
But the idea here is that NVMe and PCIe being equivalent is a very real
problem, even in the engineering space where people should actually know better. So I work for
a company that's not well known for storage, and so sometimes we have to have a little
bit of a come-to-Jay moment and identify what's actually what.
Now as a result, we're realizing that, generally speaking,
the way that we're treating NVMe right now
is that it is using PCIe as its own transport capability,
not that they're one and the same.
Now, if you look at some of the older specifications,
you start to realize that the two get used as synonyms a lot.
So, okay, our bad. We're trying to fix that.
The other problem is that
when we started adding in fabrics,
when we started writing in NVMe over fabrics,
the way that NVMe over fabrics
is written is slightly
different than the way that NVMe is written.
So the structure is a little bit
different, and some of the
language conventions
are a little bit different. All that stuff
has made for an interesting interpretation exercise on the part of the reader.
So we're going to try and fix that, if you don't mind.
So what we need to do is figure out, we're re-examining, let me put it that way,
we're re-examining what we have, to understand what the core table stakes are.
What are the things that are absolutely, positively part of the spec? What is the true element,
the pure, distilled NVMe, right? Now I think it's alcohol time or something. I don't know.
We're looking for the distillery of NVMe, which basically means, effectively, what are the parts
that are going to be the core? I mean, we've got our queue pairs, right?
We have our IO queue pairs.
We have our admin queue pairs.
Those are part and parcel of the core of what NVMe is.
Those won't change.
But what about other things?
What are the optional things?
What are the things that people will want to do for specialized use cases?
So we need to separate out namespaces, which are a core aspect of what NVMe is,
from the type of namespace
that we're going to use. Those are going to have to be separated out intellectually and
logically.
So when we start to look at this, we also have to look at longevity. A longitudinal
study of what is going to be implemented versus what is not. What is a temporary optional
feature versus a long-term optional feature? So we've really rethought the way that we're going
to be repackaging the specification to that end. And so as a result, we're going to be
changing the actual writing of the specification to mirror this longitudinal approach to handling
the specification. And we think that over time, that will wind up being a better way of developing to the
specification consistently. Now, the tightrope, as I mentioned earlier, that we have to walk
is how do we do this without being prescriptive? That's the big question. So, if you don't mind.
With the way that we've originally done things, it helps to kind of see where things go in a nice
big picture, and Nick is the genius behind this graphic.
So for the sake of cadence.
So we started off with
the NVMe base spec, the original base spec,
and then we added NVMe over Fabrics.
And we also added NVMe over RDMA and eventually TCP.
Fiber channel was its own thing.
The T11 group was doing its
own. So you could sort of have an imaginary dotted bubble to the side because that was a separate
entity unto itself. And then we started adding in additional major categories from the NVMe
specification. But it also had, a lot of these had very, very real implications for NVMe over
fabrics as well.
And one of the questions that has come up often is, well, does this apply to NVMe only, or does this apply to NVMe over fabrics?
A perfect example for this is asymmetric namespace access.
Is that a fabrics thing, or is it a PCIe thing? Is it an NVMe thing? Where does it actually fit? So as we start to add bubbles to all these different things and these bubbles start to overlap, the Venn diagrams get a little confusing.
So what we've done is we've decided to change this around and have a core set of
specification that includes both the NVMe spec and the fabric spec while maintaining
what constitutes a transport versus what constitutes
features. All right, so let's break this down a little bit more and see what it looks like.
Now, if you look at the way that we currently have this, we have the NVMe base spec
in blue in the middle. We have the teal. It's teal, right? Anybody know the colors better than
me? I feel like I should be in kindergarten again. Okay, this is red, this is blue, this is chartreuse. Anyway, NVMe over Fabrics has its own discovery
service, the NVMe over Fabrics command set, various data structures, and a lot of these
things kind of cross-pollinate. A lot of these things overlap. So what we're looking to do is
saying, okay, well, which one of these actually is the same, and which one of these is not? So we've gone through and said, look, okay, we need
to have a discovery service for both. We need to have a queuing model, the logs, and the
status codes for both. We need to have admin commands for both, but we don't necessarily
need to have the NVM command set for both. That's only going to be on the base side.
And so on and so forth. So if we take these things in pink
and we start to put those things together,
then we really have a core set of NVMe.
So it really kind of looks like this,
where we have the pink-based specification
with all those different things
that were all common to both sets,
but then we have the transport mapping separate.
We have the individual transport
mappings, and we can include other transport mappings if they should eventually come to
pass, and likewise the different types of feature sets that are not necessarily mandatory. Key Value
is a great new namespace type, but it's not mandatory. Zoned Namespaces is not mandatory, but it is a
great new feature.
But all of these things are basically,
if you will forgive the expression,
plugged into the core
elements. So the way
that the management, I'm sorry, the way
that the specification is
handled is that we have management
on one side which kind of covers all of this,
but at the same
time you can individually identify what is necessary and what's not, and easily find where
the dependencies fall. I'm going to go back real quick here, going backwards.
This is very easy to understand if your perspective happens to be understanding
how NVMe over Fabrics works. It's not very easy, however,
or at least it's not as easy as this,
to understand where your dependencies fall.
If the dependencies for this TCP transport mapping
don't obviously map in over there, whereas over here,
this is a much easier way of understanding
where your dependencies fall.
Therefore, it's a lot easier to find out
where to find the information in the specification. And, you know, the stuff
that we've done with the change logs that Nick was talking about earlier is great for
itemizing everything out once you've understood this. Right? So does this make sense? Any
questions so far about what this is? Now, I also want to point out really quickly
that there was a lot of thought that went into these kinds of philosophical
questions about how NVMe should be presented to developers. So the idea here is to make it easier
for developers to do the job. We are eager to get feedback as to how things
are or are not working.
And actually to Jay's point there, we're still going through this process
with some of the details. So there's work ongoing
today with regard to what some of the specifics
of these look like. So feedback is definitely welcome.
And now we're going to get to the real meat of the stuff where the real smart guy comes up.
And just before Dave gets going, I really want to underscore this.
I said this earlier before, but when we started talking to Dave, he's like, oh, there's so much stuff that people need to know.
I'm like, okay, like what?
And then he just started listing things off. And
there aren't just more tests. There are
different types of tests. Some tests are going away.
Some tests have to be repackaged. Some tests have to be redone.
So I want to bring this back to those bubbles at the beginning about avoiding the bears.
You want to understand very quickly why going from 1.3 to 1.4 is easy.
This is why.
Once you get to the compliance part of this,
it will make a lot more sense as to why going from 1.4 to 2.0 is going to be a lot easier.
Okay.
Thank you, sir.
All right.
Thank you, Jay.
So one of the things we want to point out with regard to compliance is that our bigger goal with compliance is to protect interoperability.
Our goal, even with the refactoring, is to protect interoperability.
The steps from the 1.3 spec to the 1.4 spec are there to ensure that products remain interoperable.
And so we're putting in a lot of work into our test documentation, a lot of work into our test tools
to enable people to protect the interoperability of their products.
And so one of the things that we've seen people doing
is in the steps from 1.2 to 1.3,
even now in the steps from 1.3 to 1.4,
they're using those test docs, they're using those test tools,
and running them against their products in development
to ensure that they stay compliant
and they stay interoperable, even running them weekly and nightly.
So there's resources that have been provided for that.
Now, as we've gone through creating these compliance tests over the last several years,
one of the things we've been paying a lot of attention to is whether a product is
advertising support for the 1.2 spec or the 1.3 spec or now the 1.4 spec, a host is going to treat
that differently. A host is going to look at that version field and then behave accordingly and
expect certain types of features, expect certain types of behaviors based on that. We've put that
into the compliance test specifications as well.
We've put that into the test tools as well. So our tests are going to behave differently depending on what version of the specification
that product is advertising support for.
So that's one of the things that we're doing in order to preserve compliance
and to preserve interoperability.
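As a sketch of what that looks like in a test tool, here's how one might key off the controller's Version register (offset 08h: major version in bits 31:16, minor in bits 15:08) to pick the expected behaviors; the accessor is again a hypothetical placeholder.

```c
#include <stdint.h>
#include <stdio.h>

#define NVME_REG_VS 0x08   /* Version register */

extern uint32_t read32(void *bar, uint32_t off);  /* hypothetical MMIO helper */

void select_expectations(void *bar0)
{
    uint32_t vs  = read32(bar0, NVME_REG_VS);
    uint16_t mjr = vs >> 16;          /* major version */
    uint8_t  mnr = (vs >> 8) & 0xFF;  /* minor version */

    if (mjr == 1 && mnr >= 4) {
        /* 1.4+: require the mandatory 1.4 behaviors, e.g. rejecting a
         * broadcast NSID on a namespace-specific Get Features. */
        printf("testing against 1.4 expectations\n");
    } else {
        /* 1.2/1.3: older, looser behaviors remain acceptable. */
        printf("testing against pre-1.4 expectations\n");
    }
}
```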
Now one thing we want to point out, one of our key tenets, one of our guiding principles
when it comes to the refactoring and compliance is that refactoring in and of itself isn't
going to magically create a bunch more tests.
What it means is that just as there are going to be portions of the base specification that
move, and things that get added into the base specification,
things are going to move in the test specifications as well.
But the refactoring in and of itself isn't going to add a bunch more tests.
But there is going to be some complexity in determining where the tests that apply to your product now reside and where they're documented.
And I've got a couple slides to make that a little bit clearer.
So this is the situation with specifications and test documentation today. There's three big specs that
have come from NVMe, the base specification, the management specification, and the fabric
specification. And we have a test document for each of those. So depending on which spec affects
your product, you're going to look in a different test document to determine what you need to do for compliance.
This is the situation today.
When we go into refactoring, that's going to change.
The fabric specification goes away.
Parts of that go into the base specification.
We end up with transport specifications for the different transports.
There's going to be independent compliance documents for that.
Even command sets, which we just alluded to,
there's going to be separate compliance documents for that.
So, again, the number of places that need to be checked,
the number of documents you need to look at for a particular product,
that's expanded, but the number of tests hasn't expanded.
And I'm going to show that on the next couple slides.
So it's not the refactoring that's going to cause more tests.
It's new TPs, which are new features.
It's new ECNs, clarifications in the specification.
Those kinds of things will create more tests,
but the refactoring in and of itself will not.
So to illustrate that,
I'm going to do another today and tomorrow comparison.
So today, if we start on the top row in this diagram,
if you have a basic run-of-the-mill NVMe SSD using PCIe as a transport,
the test specification you're going to want to look at is the NVMe base spec conformance document.
In that document, there's about 270 tests.
Now, the ones that apply to your product, it might be a little less than that depending on feature support,
but defined in that document, about 270 tests.
If we go to the middle row here, you've got an NVMe SSD that's implementing the management interface.
So naturally, there's more tests, about 323 tests,
because now there's two test specifications you want to look at.
If the product is something like an all-flash array, there's some things in the base specification,
in fact, quite a few things in the NVMe base specification that you need to pay attention to.
There's also things in the fabric specification that you need to pay attention to.
All told, that's about 217 tests.
Again, feature support, whether you support certain optional features or not, is going to change that total test number that applies to a specific product, but this is what's defined today.
Now, again, looking at tomorrow, you can see there's more test specifications that you need
to look to, but the tests have simply found new homes because the number of tests has not increased
just because we've done the refactoring. So again, if you have an NVMe SSD using PCIe as a transport, it doesn't have a management interface,
that number of tests is still about 270. Now I say about because there could be some TPs that
get applied, there could be some ECNs that get applied, that may add a few tests, but again,
the refactoring in and of itself isn't adding a number of tests.
So now we want to look into a couple examples, a couple of things that were changed from 1.3 to 1.4
that are very important for us to pay attention to, that if we don't get right, are going to
cause compliance problems, are going to cause interoperability problems. And some of this is
a little bit of some of the dirty laundry that we referred to earlier,
but we're going to dig a little deeper and see what can happen here.
So one of the things that got cleared up in the 1.4 spec was the proper use of that NSID of all Fs.
And as was alluded to, there were some cases where the use of that NSID was well-defined,
and then there were some cases where it was maybe optional.
And then actually there were implementations that kind of took how it was defined for one command
and then used that in another command, and basically were doing some undefined behavior.
And so a lot of effort went in on the 1.4 spec to clear that up.
The example we're looking at here is a get feature command
for a namespace specific feature.
If that get feature command gets sent with an NSID of all Fs,
what exactly is gonna happen?
Now under 1.3, a controller might accept that.
So it could get that get feature command and say,
yeah, I know what I'm going to do with that.
Here's the information about my namespaces and how they are persistent across a power loss.
And so the controller can send that off to the host.
And the host gets that information, but the problem here is that the host and the controller,
they weren't talking the same language about that.
So if there's a power loss, the host gets upset with what happened.
Not because of the power loss. We built the protocol to be resilient to those kinds of things, to
be able to deal with those kinds of things. The problem is that the controller and the
host had different expectations about how those namespaces, or reservations
on those namespaces, would be persistent across power losses. So that was a little bit of a gap in NVMe 1.3. So in 1.4,
we cleaned it up. If that get feature command is sent with an NSID of all Fs for a 1.4 compliant
controller, now the controller is required to write back and say, that's an invalid namespace ID
and report an error. So the host gets that error. And then what can it do? How can it accommodate
for that behavior? Well, it can start to query each of those namespaces individually.
So it can send that get feature command again for namespace 1
and get actually the information back for that particular namespace.
It can send it again for namespace 2
and get the information back for that particular namespace.
So now the persistence may actually be the same between these two products.
The power loss might still happen, but now the host and the controller have the exact same expectation about how those things will behave.
That's one little adjustment that was made with regard to the all Fs NSID.
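Here's a minimal host-side sketch of that fallback, with a hypothetical get_features() helper standing in for real command submission: try the broadcast once, and if a 1.4-compliant controller rejects it with Invalid Namespace or Format, walk the active namespaces one at a time.

```c
#include <stdint.h>

#define NSID_BROADCAST        0xFFFFFFFFu
#define SC_INVALID_NAMESPACE  0x0B    /* Invalid Namespace or Format */

/* Hypothetical helper: returns the NVMe status code and writes the
 * feature value through *result. */
extern int get_features(uint32_t nsid, uint8_t fid, uint32_t *result);

int query_feature_per_ns(uint8_t fid, const uint32_t *ns_list, int ns_count,
                         uint32_t *results)
{
    uint32_t val;
    int sc = get_features(NSID_BROADCAST, fid, &val);

    if (sc != SC_INVALID_NAMESPACE) {
        /* A pre-1.4 controller accepted the broadcast: one answer for
         * all namespaces, for whatever that answer is worth. */
        for (int i = 0; i < ns_count; i++)
            results[i] = val;
        return sc;
    }

    /* 1.4 behavior: broadcast rejected, so ask each namespace and get
     * answers whose meaning host and controller agree on. */
    for (int i = 0; i < ns_count; i++) {
        sc = get_features(ns_list[i], fid, &results[i]);
        if (sc != 0)
            return sc;
    }
    return 0;
}
```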
One more, and this is the one that Nick alluded to earlier, is how that namespace management command with the delete action will behave with that all Fs NSID.
So under 1.3, if that namespace management command came in for all Fs,
the controller might delete or it might not.
And so that namespace management command comes in,
and maybe the controller says, I don't need to do anything with this. It's an optional thing, whether I'm going to delete this or not.
Then the host goes to double-check that that delete action occurs with the identify command.
The controller sends back the namespace list.
All the namespaces are still there. That's not what the host was expecting.
That can cause a problem.
In 1.4, that activity or that behavior was cleaned up.
Now that all Fs namespace ID
is treated as applying to all namespaces.
So the host sends that namespace management
with delete and NSID of all Fs
and the controller can write back and say,
yep, I'm going to delete those.
It actually deletes it.
Then the host can do the double check
to make sure that those namespaces were deleted
with the identify command,
and the controller writes back
with exactly what the host was expecting.
So here we see that what was expected to be deleted
actually got deleted.
The host and the controller have the same understanding.
They're on the same page
about how that namespace management command
actually is implemented.
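And here is a sketch of that delete-then-verify flow, reusing the same hypothetical admin-command helper as before; the opcodes and field values shown follow our reading of the spec, so double-check them before relying on this.

```c
#include <stdint.h>
#include <string.h>

#define NSID_BROADCAST      0xFFFFFFFFu
#define OPC_NS_MGMT         0x0D   /* Namespace Management (admin)   */
#define OPC_IDENTIFY        0x06
#define NS_MGMT_SEL_DELETE  0x1    /* CDW10 SEL field: delete action */
#define CNS_ACTIVE_NS_LIST  0x02   /* Identify: active namespace IDs */

extern int nvme_admin_cmd(uint8_t opc, uint32_t nsid, void *buf, uint32_t len,
                          uint32_t cdw10, uint32_t cdw11);

int delete_all_namespaces(void)
{
    uint32_t ns_list[1024];   /* 4 KiB Identify payload */
    int sc;

    /* 1.4: the broadcast NSID on a delete is defined to mean all namespaces. */
    sc = nvme_admin_cmd(OPC_NS_MGMT, NSID_BROADCAST, NULL, 0,
                        NS_MGMT_SEL_DELETE, 0);
    if (sc != 0)
        return sc;

    /* Double-check with Identify: ask for the active namespace ID list. */
    memset(ns_list, 0, sizeof(ns_list));
    sc = nvme_admin_cmd(OPC_IDENTIFY, 0, ns_list, sizeof(ns_list),
                        CNS_ACTIVE_NS_LIST, 0);
    if (sc != 0)
        return sc;

    /* On a 1.4-compliant controller the list comes back empty. */
    return ns_list[0] == 0 ? 0 : -1;
}
```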
And so these are things that we're going to be checking with compliance with regard to 1.4.
So a quick summary about compliance.
Kind of like we said earlier, if there's one thing that you're going to walk away from this presentation with,
it's going to be that moving to 2.0 is going to be much easier from 1.4 than from 1.3.
And so we're putting in a lot of effort right now, both in the test specification and the test tools,
to ensure that the community can get 1.4 compliance right.
So with that, we'll hand it back to Jay to bring us home.
Bring it on home.
Okay.
So we try to time this in such a way that we would be able to have some questions at the end,
and I think we did an okay job.
But, again, I just want to reinforce a couple of the different things that we're trying to get across here.
Once you go through this particular type of a presentation, you start to get a little antsy
because it seems like, well, this is kind of a duh moment.
It all makes sense now.
But one of the things that we're trying
to combat is some of the confusion that's been going
on, both in terms of
the end user space as well as the developer
space, about how certain things are supposed to behave,
both at the really low level and
also at the high definitional level.
So to that end,
it's best not to wait until the refactoring comes out,
which will happen sometime next year.
The best thing to do right now is to understand,
especially the stuff that's going on in 1.4.
If you haven't taken a look at the list of changes that's going on in 1.4,
you really should do that ASAP because it's extensive.
And a lot of the mandatory changes for expected behavior
are going to be particularly salient,
especially since we can leapfrog and hopscotch back and forth
between different specifications for optional features.
If you're thinking about taking TPs and putting them into a 1.2.1
specification, you're going to get some unexpected results.
So start to think about this transition now
and start to plan a migration strategy,
for lack of a better word.
It's not really migration,
but I hope you understand what I'm talking about.
But a plan of action as you move forward
to go from 1.4 to 2.0.
One of the other things that we're also trying to key in on
is to get really good feedback
for the kinds of needs that you have
for what's confusing
in your own implementations as well.
We want to be sure
that we're actually addressing real-world problems,
not just the theoretical ones that happen
and that we get on a Thursday morning call.
So to that end,
the idea here overall
is that we really want to try to help developers be able
to be consistent. Nobody likes developing something that
A isn't used or B is used incorrectly or
C doesn't work. So we're trying to make that
as easy as possible, given how difficult life
is already.
Now, before we get too far into it,
any questions that we can answer?
Any specifics about some of the changes
for 1.4, for example?
Yes, sir?
You noted that the T11 committee
handles Fibre Channel.
Correct.
What's the relationship between NVM Express
and T11?
How do you guys get along?
And how do you actually make
that big pink horizontal bar, where you're making things common, go in there? Excellent question.
So I'm going to repeat the question for both the recording as well as
the people in the back.
The question really revolved around the relationship between NVM Express
and the T11 group, which manages the Fibre Channel standards.
And in particular, the question was, what's the relationship between NVM Express and T11?
And the second question is, does the T11 group use the new refactored approach to doing NVMe?
Did I get that right?
Okay.
Is that a five-minute warning?
Okay.
There was a glare.
I couldn't see.
Excellent question.
So I'm also on the board of directors for Fibre Channel.
And it turns out that there are several members of T11 in the technical working group for NVM Express.
There's a lot of cross-pollination on that end.
The relationship between T11 and NVM Express,
there's a memorandum of understanding between the two groups,
and they work hand-in-hand.
What goes on inside of T11 is tightly coupled with what goes on in NVM Express.
Now, to your other question about whether Fibre Channel uses the core elements of it,
the way that NVM Express is layered is that you have the NVM Express layer on top,
then you have an NVMe over Fabrics binding,
and then you have the Fibre Channel or other transport underneath.
The key thing is that the bindings between the Fibre Channel and the NVMe layer,
that bindings layer, those are worked on jointly.
And the Fibre Channel specification will match what's going on inside of the bindings,
but the binding is handled by NVMe.
Now the good news is that the same people are on both committees.
So it makes it a lot easier.
And on top of that, the refactoring effort won't change that binding.
That's right.
It's just a matter of how the specification is called out,
and making it clear inside the specification
how that binding exists.
That's correct.
So from a practical working relationship,
that part won't actually change.
I would think that on the Fibre Channel side,
you'd have less skew than with, say, the IP side. Would have less...
Skew.
Skew.
The question is that Fibre Channel would have less skew.
It has less skew because it's a much more mature protocol for storage.
Whereas RoCE is really kind of transferring things over from HPC into a storage world.
So we've got 25 years of Fibre Channel with a very well-understood
and well-defined relationship between hosts, targets, and switches.
Whereas it's not quite so well defined in the Ethernet arena.
Now, InfiniBand also has a well-defined area, too, but for whatever reason,
RoCE and TCP and Fibre Channel
happen to be the ones
that we're getting the most questions about.
Any other questions before we have to close up shop?
Yes, sir?
Can I just make one suggestion?
I don't know if the web page is changing.
Ooh, we have suggestions.
All right.
It's very useful, but one thing I could suggest is that maybe it could be a downloadable file.
Yes.
We'll do that.
Yeah.
No problem.
Did you just volunteer for work?
Oh, no.
No, I volunteered Liz for work.
Oh, okay.
Poor Liz.
That's our admin, by the way.
Poor Liz.
It's a good suggestion.
We'll do that.
It might take a little while, but we'll do that.
Yeah, as the number of items gets bigger,
then having an easier way to track them is what we're looking to do.
You have to remember we're also working with a bunch of people
who are really effectively volunteering to do this.
Now,
if you want to pay me a lot of money to do it,
that's a different question.
Alright, last question. Anyone?
Bueller? Bueller?
Alright. Thank you very much,
gentlemen and ladies. I appreciate it.
Thanks for listening.
If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further with your peers in the Storage Developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.