Storage Developer Conference - #182: Computational Storage: How Do NVMe CS and SNIA CS Work Together?
Episode Date: February 28, 2023
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, episode number 182.
So good afternoon. Welcome to the fourth presentation on computational storage from a standards point of view.
I am Bill Martin.
I am SNIA Technical Council co-chair and also the editor of the SNIA computational storage documents,
both the architecture document and the API document.
And I am also on NVMe. I am on the NVMe board of directors and also the computational programs
co-chair along with Kim Malone and Stephen Bates. So what I'd like to talk about now, having
had the opportunity for you to hear what's going on in
SNIA and what's going on in NVMe, is how the two of these work together.
So I will start with an overview of the SNIA computational storage model,
which, if you've been here earlier, you may have heard. You may find it repetitive, but I want to make certain that those who haven't been here
get to actually hear and see how these interact.
Then I'll do an overview of the NVMe computational programs model,
and then I'll go into an NVMe and SNIA mapping,
and then a summary of how they work together or where there are differences.
So starting out from a SNIA computational storage architecture point of view,
if you were here for the Scott and Jason show, you would have seen that there are three different architectural models. There is the
computational storage processor that does not have storage actually in it, but is associated
with storage. There's computational storage drive, which is your standard SSD with computation added,
and there is a computational storage array that is a combination of multiple storage devices
that may or may not include computation.
So for this presentation, however,
I'm just going to focus on the center picture there,
which is the computational storage drive.
So from the SNIA architectural point of view, and I guess let me start off a little bit
above this: what is SNIA doing? What's NVMe doing? So SNIA is defining an architecture
for computational storage. It is not a protocol per se. It is an architecture. It helps to define the components
and what the components are and terminology that can be used so that as we talk about computational
storage within the industry, we can have kind of a baseline of how we talk about it. NVMe,
on the other hand, is defining a protocol layer that allows you to actually
implement the computational storage. Also in SNIA, we're defining the API. The API is
a programming interface that Oscar just presented on that allows you to have an abstraction to program your application to utilize computational storage
and then has a library underneath it that maps to a given protocol.
The API is protocol agnostic, meaning that that API can actually work with any upper layer protocol. That includes NVMe, which is the
first one to be developing that protocol layer, but also CXL. As you've heard at other presentations
this week, CXL is a prime candidate for doing computation. That API can be used whether you are doing it to storage or to memory.
So the API is agnostic to the underlying protocol.
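To make that layering concrete, here is a minimal sketch in C of what an application written against a protocol-agnostic interface could look like. The type and function names are invented for illustration and are not taken from the SNIA API document; the stubs stand in for whichever protocol-specific library is linked underneath.

```c
/*
 * A minimal sketch, NOT the actual SNIA API: the names below are
 * invented to illustrate how an application stays protocol-agnostic.
 * The stubs stand in for a protocol-specific library (NVMe, CXL,
 * even SCSI) linked underneath the same interface.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { uint8_t *mem; size_t len; } cs_afdm; /* allocated FDM */

/* --- stubs standing in for the protocol-specific library --- */
static cs_afdm *cs_alloc_afdm(size_t n)
{
    cs_afdm *a = malloc(sizeof *a); /* real library: allocate FDM on the device */
    if (a) { a->mem = calloc(1, n); a->len = n; }
    return a;
}

static void cs_copy_storage_to_afdm(cs_afdm *dst, uint64_t lba)
{
    (void)lba;                      /* real library: read device storage into AFDM */
    memset(dst->mem, 7, dst->len);
}

static void cs_exec_function(const char *csf, cs_afdm *buf)
{
    (void)csf;                      /* real library: execute an activated CSF */
    for (size_t i = 0; i < buf->len; i++)
        buf->mem[i] ^= 0xFF;
}
/* ------------------------------------------------------------ */

int main(void)
{
    cs_afdm *buf = cs_alloc_afdm(4096);
    if (!buf || !buf->mem) return 1;

    /* Stage data, run a function on it in place, inspect the result.
     * Nothing here names NVMe, CXL, or any other transport. */
    cs_copy_storage_to_afdm(buf, 0);
    cs_exec_function("filter", buf);
    printf("first output byte: %d\n", buf->mem[0]);

    free(buf->mem);
    free(buf);
    return 0;
}
```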
But let's go over the architectural elements
of a computational storage drive from a SNIA architecture point of view.
So within this, you have the computational storage engine.
Let me see if I can get my pointer to go over to that screen.
So you have the computational storage engine.
That's the actual hardware that does the compute.
Or it could be an abstraction of that hardware.
So in other words, if you have a CPU, that engine could actually be multiple virtual images of that physical engine.
Within your computational storage engine, you have a computational storage engine environment.
This is an environment that could be a Linux operating system. It could be your bitstream
environment for your FPGA. It is the environment that the computational storage function
resides under and operates within.
You then have your resource repository.
Your resource repository is a place where you have either downloaded or fixed
or preloaded computational storage engine environments
or computational storage functions.
Data may be transferred between device storage and your AFDM,
which is part of your function data memory.
Your function data memory is something that can be allocated for given computation.
But the computational storage functions, in order to be operating,
are over here as part of the computational storage engine.
They have to be activated there.
They can, within the SNIA architecture,
operate on data that is in allocated function
data memory. They can put their results in that allocated function data memory. From a model point
of view, however, these computational storage functions could operate on data that is in device
storage. They could put their results into device storage.
The model does not differentiate between those.
And as we come to the comparison, we'll talk about some of the differences between the model in SNIA
and NVMe's first implementation, which only operates on function data memory.
NVMe computational storage architectural components.
If you were here for the second presentation of this afternoon, you got a fairly detailed description of this.
But for those who weren't, and for those who may listen to this later, I want to go through this again in some detail.
So we have within NVMe, we have what are called compute namespaces.
A compute namespace is a namespace that is associated with a particular
command set. In this case, the command set that it is associated with is the computational
program's command set. And within NVMe, each namespace is associated with some command set.
So the compute namespaces are associated with the computational programs command set.
Within the compute namespace, you have compute engines,
and you also have compute programs.
And the compute engines and compute programs are associated with one and only one
namespace. You can, again, have a compute engine that happens to be a CPU. It is virtualized,
and through that virtualization may be associated with different namespaces. But each virtual
instance of it is associated with one and only one compute namespace.
Okay, within the NVMe computational storage architecture,
the programs operate on data in your subsystem local memory.
They do not, for the first revision of this,
have the ability to operate on data that is in an NVM namespace.
So the programs operate on the data that's in this memory.
This is allocated into memory range sets.
From the previous presentation,
these memory range sets are made up of ranges of memory that are in individual subsystem local memory namespaces.
And a memory range set may be made up of data,
of memory ranges from multiple namespaces.
Okay?
The memory in the memory range set is used for both the program input and the program output.
The memory range set can, if need be,
or if defined within the program,
be used for the scratchpad memory that is necessary for the program,
or the program may have its own scratchpad memory
that is not visible in the memory range set or anywhere else.
The NVM namespaces, this is where you have persistent storage of data.
There are currently three different types of NVM namespaces defined.
There is NVM, which is the traditional flash memory type namespace. It's a logical block address namespace.
You have ZNS, which is the zone namespace, which builds on top of the NVM namespace,
and it has zones within that block storage, and you have KV or key value,
which is a totally different format of persistent storage of data. All three of these are defined
already in NVMe, and all three of them could be used for moving data from that namespace
into the memory range set or out of the memory range set.
So data is transferred between NVM namespaces and subsystem local memory
and then used by the computational programs.
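As a rough illustration of those concepts, the sketch below models a memory range set in C. The struct layouts are assumptions made for illustration; they mirror the ideas just described, not the actual TP4091 data structures or encodings.

```c
/*
 * Illustrative only: these structs mirror the concepts described in
 * the talk (memory ranges drawn from SLM namespaces, grouped into a
 * range set), not the actual TP4091 data structures or encodings.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint32_t slm_nsid;   /* which subsystem local memory namespace */
    uint64_t offset;     /* byte offset within that SLM namespace */
    uint64_t length;     /* bytes in this range */
} mem_range;

typedef struct {
    mem_range ranges[8]; /* a set may draw ranges from multiple SLM namespaces */
    unsigned  count;
} mem_range_set;

int main(void)
{
    /* One range set built from ranges in two different SLM namespaces. */
    mem_range_set set = {
        .ranges = {
            { .slm_nsid = 3, .offset = 0,    .length = 64 * 1024 },
            { .slm_nsid = 5, .offset = 4096, .length = 16 * 1024 },
        },
        .count = 2,
    };

    /* Conceptual host flow, each step an NVMe command in the real
     * protocol: copy data from an NVM namespace into SLM, execute a
     * program against the range set, then read the results back out. */
    for (unsigned i = 0; i < set.count; i++)
        printf("range %u: SLM nsid %u, %llu bytes\n",
               i, (unsigned)set.ranges[i].slm_nsid,
               (unsigned long long)set.ranges[i].length);
    return 0;
}
```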
So the correlation of terms.
So first I'll start by going through this just in a verbal description of them and then show graphically how they correlate to each other.
So the SNIA term is a computational storage engine. Within NVMe, that is a compute
engine, and that is actually associated with a unique compute namespace. In SNIA, we have the
computational storage engine environment. That computational storage engine environment in NVMe is a virtual entity at this point.
There is a possibility that going forward we define environments within NVMe, but in TP4091
we are not defining those. Just to give kind of a little bit of background, TP4091 within NVMe,
we have tried to narrow down our scope to something that we can
actually get out the door to allow individuals to start playing with, to start building things out
of, and to start providing products that have the same commands within them so that different vendors that have defined their upper layer
applications can all use these products interchangeably in some fashion. So it's by
no means finalized as was stated earlier. It is still under development and things may change as things go along, so don't take what I say today to be
valid tomorrow. Within SNIA, we have the resource repository. The resource repository
in the SNIA architecture has downloaded CSFs and CSEEs.
It also has preloaded CSFs and CSEEs (oops, this on the slide should say CSEE).
So those are both available in the resource repository.
It may have only downloaded.
It may have only preloaded.
It may have a combination.
Okay. Within NVMe, we have programs. The programs are loaded at an index, or associated
with an index, and you have both downloaded programs and device-defined programs,
with an index, and the programs you have both downloaded programs and device-defined programs,
which is the terminology that NVMe has used for the preloaded programs. Yes?
Very quick question. On your NVMe architectural overview, it looked like you had a fixed number
of programs numbered from zero to three. Is that accurate, or is that just an artifact of the slide?
So the question was, on the NVMe slide, it showed a fixed number of three programs, or four programs, numbered from zero to three.
Is that an actual fixed value, or is that just an effect of trying to show it on the slide?
That is just an effect of trying to show it on the slide.
I don't remember, Kim, how many bits are there for the index? Is it eight bits?
So there's at least eight bits for the program indices. It may be more bits than that.
So both the SNIA terminology and the NVMe terminology deal with activating programs or functions.
And the activation has to do with the fact that you may have a number of functions or programs available,
but in order to actually utilize them, you need to activate them, and a given implementation may not be
capable of activating as many as reside on that device.
In SNIA, we have function data memory.
The term for that in NVMe is subsystem local memory.
Part of the differences there in terms of naming is the fact that in NVMe,
as Kim talked about earlier today,
as we started defining this as something that was used for computation,
there were others within NVMe who said,
hold on, we may want to use this for something else.
So we took a step back and actually defined it in a way that can be used for
other things within NVMe. Within SNIA, we have allocated function data memory. So this is
function data memory that is allocated for a specific instance of a computational storage function to utilize. Within NVMe we have memory range sets
which are specific ranges of memory within the subsystem local memory that are allocated for
a particular program, or maybe multiple programs, to utilize in an area where they're not touching memory that they're
not supposed to be accessing. In SNIA, we call it device storage. In NVMe, we call it NVM namespaces.
So looking at this graphically, putting the two slides side by side,
first off, you have the computational storage engine.
Well, that's equivalent to the compute namespace.
One-to-one correlation, really, between those.
The computational storage functions,
these are the programs that are in the compute namespaces in NVMe,
and they can be downloaded or they can be device-defined.
The ones that are shown on the left here
within the computational storage engine
and computational storage engine environment,
there's an implication that those are actually the preloaded ones.
The ones in the resource repository, those could be preloaded
or those could be downloaded.
Then you have function data memory.
As I said, that is equivalent to your subsystem local memory.
And within that, you have your allocated function data memory,
and that is equivalent to the memory range set.
And then finally, you have the device storage in SNIA that is equivalent to the NVM namespaces.
So getting into what are the differences between SNIA and NVMe, I think that's probably the more
interesting thing to look at. So, differences between SNIA and NVMe. SNIA defines the computational storage engine environment.
So that's a Linux kernel, that's Docker, packet processing, whatever the environment is that
your programs are going to run under. In NVMe, the CSE is a logical entity with no specific definitions.
One of the things that NVMe has done is they're allowing for a definition of program types,
and program types will be able to run on different types of computational storage engines. So they've kind of rolled the engine and the environment together
to indicate the equivalent of a computational storage engine environment.
So also I pointed out a little bit earlier that a CSF can directly access AFDM
or storage in the architecture model.
So that is not precluded by the architectural model.
For the first pass at TP4091,
programs are only allowed to access memory range sets.
That's the only way they access memory.
They don't access memory namespaces even.
They have to access a memory range set.
This is partly because there's a lot of security issues
if you allow access to something beyond a fixed set of memory range sets
that a program can access.
So trying to avoid some of those security holes,
we have started out with only allowing access to memory range sets.
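One way to picture why that restriction helps: the device can validate every address a program generates against the memory range set the program was given. Below is a toy sketch of that containment check; it illustrates the rationale only and is not taken from TP4091.

```c
/*
 * Toy sketch of the containment idea: every program access is checked
 * against the memory range set the program was given, so a buggy or
 * hostile program cannot reach memory it was not granted.  This
 * illustrates the rationale only; it is not text from TP4091.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t start, length; } range;

static bool access_allowed(const range *set, unsigned count,
                           uint64_t addr, uint64_t len)
{
    for (unsigned i = 0; i < count; i++)
        if (addr >= set[i].start &&
            addr + len <= set[i].start + set[i].length)
            return true;  /* wholly inside one permitted range */
    return false;         /* everything else is rejected */
}

int main(void)
{
    range set[] = { { 0x1000, 0x1000 }, { 0x8000, 0x2000 } };
    assert( access_allowed(set, 2, 0x1800, 0x100)); /* inside range 0 */
    assert(!access_allowed(set, 2, 0x3000, 0x100)); /* outside both */
    return 0;
}
```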
There has been discussion about the ability to have programs directly access storage,
and that is something that we may very well do in the future.
The SNIA model supports an indirect model.
So if you were here for the discussion that Scott and Jason had at the beginning at 1 o'clock,
they talked about two different models, a direct model and an indirect model. So the indirect model basically says that you can have specific storage locations that are associated with a particular computational storage function.
So if you read that location or you write that location, that function is automatically performed
based on the fact that you're doing that read or that write of that location.
How the indirect is actually specified is an abstraction and is not specifically defined
in the architecture model, just that that is allowed.
On the other hand, NVMe only has a specific execute command.
That is equivalent to the direct access model in SNIA,
where basically you've got your data where you want it, and you tell it: go execute this computational storage function, in SNIA terms, or this program, in NVMe terms.
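The difference between the two models can be sketched as host-side pseudocode in C. Every call name here is a hypothetical placeholder, not a real SNIA or NVMe command; the point is only where the explicit execute step sits in each model.

```c
/*
 * Hypothetical host-side sketch contrasting the two SNIA models.
 * None of these calls are real SNIA or NVMe commands; TP4091
 * standardizes only the direct form (an explicit execute command).
 */
#include <stdio.h>
#include <stdint.h>

static void stage_data(uint64_t lba)
{ printf("stage LBA %llu into AFDM\n", (unsigned long long)lba); }

static void execute_csf(const char *csf)
{ printf("execute CSF '%s'\n", csf); }

static void bind_csf(const char *csf, uint64_t lba)
{ printf("bind '%s' to LBA %llu\n", csf, (unsigned long long)lba); }

static void read_block(uint64_t lba)
{ printf("read LBA %llu (bound CSF runs implicitly)\n", (unsigned long long)lba); }

int main(void)
{
    /* Direct model: put the data where you want it, then explicitly
     * say "go execute this function/program now". */
    stage_data(100);
    execute_csf("decompress");

    /* Indirect model: a function is associated with a storage location
     * in advance; an ordinary read or write of it triggers the function. */
    bind_csf("decompress", 100);
    read_block(100);
    return 0;
}
```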
So with that, I do want to invite you to join SNIA and NVMe in the standardization effort, because we could use your help in all of these areas.
To summarize,
SNIA, as I said at the beginning, provides a general architectural
model for computational storage. It is not specific to a given protocol. It is for any
protocol that chooses to build computational storage. And that can even extend to computational memory.
So you will hear more and more, I think, as the time goes by,
more discussion about the fact that computational storage
probably really is computational memory as well,
memory being a media for storage.
NVMe is a specific I-O command set for computational programs,
and it is, as the name implies, specific to NVMe.
SNIA provides flexibility for a variety of protocols, while NVMe is specific for the NVMe protocol.
The API document in SNIA specifies an application programming interface that is a set of interfaces that are abstracted from the protocol layer.
So if you are allocating memory in that API,
that, depending on the library that you have installed below it,
could operate over CXL.
It could operate over NVMe.
It could even operate over SCSI.
It could operate over any lower layer
protocol. So it's an abstraction to get you away from the protocol layer so you don't
have to think about what are the protocol commands that I need to do in order to utilize
computational storage, but an abstraction so that my application can use it no matter what I put
below it. For NVMe, we will be developing a library that maps the API calls to the NVMe-specific
protocol commands. And there are some prototypes of that that have already been developed. As we move forward, we will expect to see more standardized APIs.
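The shape of that library layer might look something like the sketch below. The command layout and opcode are invented placeholders, since TP4091's real encodings were still in development at the time of this talk.

```c
/*
 * Sketch of the library layer's job: one protocol-agnostic entry point,
 * translated into a protocol-specific command.  The command layout and
 * opcode below are invented placeholders, not real NVMe encodings.
 */
#include <stdint.h>
#include <stdio.h>

enum { PLACEHOLDER_EXEC_PROGRAM_OPC = 0x00 }; /* not a real opcode */

typedef struct {            /* stand-in for an NVMe submission entry */
    uint8_t  opcode;
    uint32_t nsid;          /* compute namespace */
    uint32_t program_index; /* which activated program to run */
    uint32_t range_set_id;  /* memory range set holding input/output */
} fake_nvme_cmd;

static int submit_to_driver(const fake_nvme_cmd *c)
{
    /* Real library: hand off to SPDK, the Linux NVMe driver, etc. */
    printf("NVMe: opc=0x%02x nsid=%u program=%u range_set=%u\n",
           (unsigned)c->opcode, (unsigned)c->nsid,
           (unsigned)c->program_index, (unsigned)c->range_set_id);
    return 0;
}

/* The generic call an application sees... */
int cs_exec(uint32_t compute_nsid, uint32_t program, uint32_t range_set)
{
    /* ...and the NVMe-specific translation hidden inside the library. */
    fake_nvme_cmd c = { PLACEHOLDER_EXEC_PROGRAM_OPC,
                        compute_nsid, program, range_set };
    return submit_to_driver(&c);
}

int main(void) { return cs_exec(1, 0, 2); }
```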
Something that, again, was mentioned in the NVMe presentation
is the fact that we are looking at what we need to do
in the software community to enable the NVMe operation.
With that, I'd like to open it up to questions.
I tried to go through this a little bit quickly
because you've had access to this material previously,
for the SNIA side and the NVMe side.
I'm not trying to be terribly redundant with that information.
Yes?
Could you talk a little bit about the relationship between the two groups?
Because there's a degree of parallelism running, and a decision that results from a discussion in one of the groups
needs to be reflected in the other as well, so they stay aligned?
So the question was, could I talk a little bit about the relationship between the NVMe group and the SNIA group,
and the fact that as there are changes in one of those, they need to be reflected in the other
so that the two of them can remain somewhat in lockstep.
There is a formal liaison agreement between SNIA and NVMe to share documents.
The mechanisms for sharing it are a little bit different going each direction,
but that liaison agreement is in place.
In addition to that, you have a large crossover of membership between the two.
As you saw from my brief bio at the beginning, I am editor in SNIA.
I'm co-chair in NVMe.
I have a very strong desire to see the two of them mesh together.
Kim Malone, who's co-chair in NVMe, is also very active on
the SNIA side. She's great about reviewing every document I put out and giving me
lots of comments on how it's broken. Appreciate that, Kim. And so we've got a large crossover
as well as having a formal liaison agreement.
Yes?
So, will there be a shim library or something that allows you to address NVMe devices using the SNIA API?
How will that work? How do those two come into existence or coexist?
Certainly, you don't write the same code.
So the goal of this is you write your application to talk to the API.
The API is basically a set of program calls. Then there is, yes, a library that goes and takes those calls
and translates them to the NVMe commands, as was shown in the previous presentation on the API.
That library is most likely in user space, although it could migrate to kernel space.
And then at the bottom end of that, it talks to your NVMe driver.
And one of the questions that was raised in the API discussion was, well, can it interface to
SPDK? Yes, you could use SPDK as your driver-level interface. You could use the Linux NVMe driver. You could use a Windows NVMe driver,
whatever driver. So you've got multiple layers in this stack. You've got your application. It has
an interface, which is a set of calls. You then have your library. It has an interface, which is
your protocol interface to your driver. You then have your driver.
It generates the commands that actually go directly to your NVMe device.
So then one follow-up question.
A person trying to use NVMe computational storage,
if they want to use the SNIA API, that's a parallel side-by-side thing.
Despite the fact that there's a good correlation between the two models, there's no commonality
in implementation.
So the question was whether, for a developer that's developing an application, the SNIA API and the NVMe computational storage
are parallel efforts. And my answer to that is they're not really parallel efforts. So
as a vendor develops a particular NVMe device,
they would be writing a library that would take advantage of the SNIA API.
So you write your application at the API level.
You allow the vendor who's creating the library,
or hopefully in the future a common library that applies to
all vendors' NVMe devices, to put that library in there, and it does the conversion for you, and
you don't even have to think about what the NVMe commands are.
Then you'd see eventually an NVMe-specific API?
Okay, yeah.
So yeah, I think the difficulty there is that there is not a specific NVMe API.
Hi.
I guess I said this at FMS, but this was to me like a framework for a data flow multiprocessor.
Have you had people talking about building some long big thing they want to do
and stage it or decompose it into a number of stages
and the data flows through?
Or are you seeing very limited use, where I'm going to invoke only this one function in the process
and then it's done?
No, I personally see this as very much being applicable to a data flow where what you have done is you have perhaps defined
multiple memory range sets. The memory range sets may overlap in part. So memory range set one has
this part and that part in it. Memory range set two has this overlapping part and some other part, and so on down the chain.
You invoke program one against memory range set one.
The output of that becomes the input to program two,
which you invoke against memory range set two,
and so on down however far you need to go in that sort of a data flow architecture.
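Here is a toy sketch of that staging idea, with the overlap arranged so that program one's output region is exactly program two's input region. All names and layouts are hypothetical.

```c
/*
 * Sketch of the data-flow idea just described: range set 1 and range
 * set 2 share a region, so program 1's output becomes program 2's
 * input.  All names and layouts are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

static uint8_t slm[3 * 4096];   /* pretend subsystem local memory */

typedef struct { size_t start, len; } region;
typedef struct { region in, out; } range_set;

/* Toy "program": transform the input region into the output region. */
static void run_program(const char *name, range_set rs)
{
    for (size_t i = 0; i < rs.out.len && i < rs.in.len; i++)
        slm[rs.out.start + i] = (uint8_t)(slm[rs.in.start + i] + 1);
    printf("%s: in@%zu, out@%zu\n", name, rs.in.start, rs.out.start);
}

int main(void)
{
    /* Set 1's output region is exactly set 2's input region. */
    range_set set1 = { { 0,    4096 }, { 4096, 4096 } };
    range_set set2 = { { 4096, 4096 }, { 8192, 4096 } };

    run_program("program1", set1);  /* stage 1 */
    run_program("program2", set2);  /* stage 2 consumes stage 1 output */
    return 0;
}
```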
Is there a mechanism so that when this node finishes
filling in its part of the range set, it can signal the next node that the data is ready,
and that one can begin to operate on it?
Okay, so the question is, is there a process to allow
notifying the host, in essence,
that one part of the memory range set is filled,
allowing the next program to operate on it.
Is that what the...
Well, it's got...
So you're saying it has to go back to the host
to see this one finished,
and then it can start the next one.
I was thinking, you know,
direct from here to here,
a notification to go.
Okay, so the question is, is there no requirement for that host round trip?
Right, so is there the thought or consideration of a peer-to-peer
of one computational program signaling another computational program
that it is ready to operate?
I know that within the API,
we have specified a batch mode
that allows that type of batching,
but that's actually up at the host level.
I think that that is something
that eventually could indeed,
you know, at the host API level,
that can be done within the library.
If NVMe, however, starts to define that batch process
where you can actually pass that entire batch down to NVMe,
then that piece of the library becomes moot
and you simply have passed the batch all the way to NVMe
and you tell it to execute this batch process.
Other questions?
Speaking of alignment between SNIA and NVMe,
has anyone asked why NVMe changed the name to computational programs?
It seems a little redundant.
It seems a little redundant.
A lot of history there.
I think the biggest thing was that some of the people that were doing the NVMe work, which actually got started before the model was finished in SNIA,
found that "function" in their mind meant something different.
So they did not want to call it a function. And SNIA, on the other hand, felt that "program"
had some implications and that "function" was a better term for it. And the two could not come to agreement on that,
and we agreed to disagree and allow for a mapping.
The functionality between the two is the same.
Other questions?
Okay. Thank you.
Hope you have enjoyed the day's worth of discussion,
or the afternoon's worth of discussion, on the standardization of computational storage.
Thanks for listening.
If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community. Thank you.