Storage Developer Conference - #182: Computational Storage: How Do NVMe CS and SNIA CS Work Together?

Episode Date: February 28, 2023

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, episode number 182. So good afternoon. Welcome to the fourth presentation on computational storage from a standards point of view. I am Bill Martin. I am SNIA Technical Council co-chair and also the editor of the SNIA computational storage documents,
Starting point is 00:01:02 both the architecture document and the API document. And I am also on NVMe. I am on the NVMe board of directors and also the computational programs co-chair along with Kim Malone and Stephen Bates. So what I'd like to talk about now, having had the opportunity for you to hear what's going on in SNIA and what's going on in NVMe, is how the two of these work together. So I will start with an overview of the SNIA computational storage model. If you've been here earlier, you may have heard this and may find it repetitive, but I want to make certain that those who haven't been here get to actually hear and see how these interact.
Starting point is 00:01:52 Then I'll do an overview of the NVMe computational programs model, and then I'll go into an NVMe and SNEA mapping, and then a summary of how they work together or where there are differences. So starting out from a SNEA computational storage architecture point of view, if you were here for the Scott and Jason show, you would have seen that there are three different architectural models. There is the computational storage processor that does not have storage actually in it, but is associated with storage. There's computational storage drive, which is your standard SSD with computation added, and there is a computational storage array that is a combination of multiple storage devices
Starting point is 00:02:48 that may or may not include computation. So for this presentation, however, I'm just going to focus on the center picture there, which is the computational storage drive. So from the SNEA architectural point of view, and I guess let me start off a little bit above this, and that is what is SNEA doing? What's NVMe doing? So SNEA is defining an architecture for computational storage. It is not a protocol per se. It is an architecture. It helps to define the components and what the components are and terminology that can be used so that as we talk about computational
Starting point is 00:03:34 storage within the industry, we can have kind of a baseline of how we talk about it. NVMe, on the other hand, is defining a protocol layer that allows you to actually implement the computational storage. Also in SNEA, we're defining the API. The API is a programming interface that Oscar just presented on that allows you to have an abstraction to program your application to utilize computational storage and then has a library underneath it that maps to a given protocol. The API is protocol agnostic, meaning that that API can actually work with any upper layer protocol, which includes NBME, which is the first one to be developing that protocol layer, but CXL, as you've heard at other presentations this week, CXL is a prime candidate for doing computation. That API can be used whether you are doing it to storage or to memory.
Starting point is 00:04:50 So the API is agnostic to the underlying protocol. But to go over what are the architectural elements for computational storage drive from a SNEA architecture point of view. So within this, you have the computational storage engine. Let me see if I can get my pointer to go over to that screen. So you have the computational storage engine. That's the actual hardware that does the compute. That could be an abstraction of that hardware.
Starting point is 00:05:27 So in other words, if you have a CPU, that engine could actually be multiple virtual images of that physical engine. Within your computational storage environment or computational storage engine, you have a computational storage environment. This is an environment that could be a Linux operating system. It could be your bit code environment for your FPGA. But it is what does the computational storage function that resides under that environment, what does it operate on? You then have your resource repository. Your resource repository is a place where you have either downloaded or fixed or preloaded computational storage engine environments
Starting point is 00:06:25 or computational storage functions. Data may be transferred between device storage and your AFDM, which is part of your function data memory. Your function data memory is something that can be allocated for given computation. But computational storage functions, which in order to be operating, are over here as part of the computational storage engine. They have to be activated here. They can, within the SNEA architecture,
Starting point is 00:07:04 operate on data that is in allocated function data memory. They can put their results in that allocated function data memory. From a model point of view, however, these computational storage functions could operate on data that is in device storage. They could put their results into device storage. The model does not differentiate between those. And as we come to the comparison, we'll talk about some of the differences between the model in SNEA and the first function data memory. NVMe computational storage architectural components.
Starting point is 00:08:00 If you were here for the second presentation of this afternoon, you got a fairly detailed description of this. But for those who weren't, for those who may listen to this later, I want to go through this again in somewhat of a detail. So we have within NVMe, we have what are called compute namespaces. A compute namespace is a namespace that is associated with a particular command set. In this case, the command set that it is associated with is the computational program's command set. And within NVMe, each namespace is associated with some command set. So the compute namespaces are associated with the computational programs command set. Within the compute namespace, you have compute engines,
Starting point is 00:08:57 and you also have compute programs. And the compute engines and compute programs are associated with one and only one namespace. You can, again, have a compute engine that happens to be a CPU. It is virtualized, and through that virtualization may be associated with different namespaces. But each virtual instance of it is associated with one and only one compute namespace. Okay, within the NVMe computational storage architecture, the programs operate on data in your subsystem local memory. They do not, for the first revision of this,
Starting point is 00:09:51 have the ability to operate on data that is in an NVM namespace. So the programs operate on the data that's in this memory. This is allocated into memory range sets. From the previous presentation, these memory range sets are made up of ranges of memory that are in individual subsystem local memory namespaces. And a memory range set may be made up of data, of memory ranges from multiple namespaces. Okay?
Starting point is 00:10:28 The programs, the memory in the memory range set is used for both the program input and the program output. The memory range set can, if need be, or if defined within the program, be used for the scratchpad memory that is necessary for the program, or the program may have its own scratchpad memory that is not visible in the memory range set or anywhere else. The NVM namespaces, this is where you have persistent storage of data. There are currently three different types of NVM namespaces defined.
Starting point is 00:11:11 There is NVM, which is the traditional flash memory type namespace. It's a logical block address namespace. You have ZNS, which is the zone namespace, which builds on top of the NVM namespace, and it has zones within that block storage, and you have KV or key value, which is a totally different format of persistent storage of data. All three of these are defined already in NVMe, and all three of them could be used for moving data from that namespace into the memory range set or out of the memory range set. So data is transferred between NVM namespaces and subsystem local memory and then used by the computational programs.
Starting point is 00:12:10 So the correlation of terms. So first I'll start by going through this just in a verbal description of them and then show graphically how they correlate to each other. So the SNEA term is a computational storage engine. Within NVMe, that is a compute engine and that is actually associated with a unique compute namespace. In SNEA, we have the computational storage engine environment. That computational storage engine environment in NVMe is a virtual entity at this point. There is a possibility that going forward we define environments within NVMe, but in TP4091 we are not defining those. Just to give kind of a little bit of background, TP4091 within NVMe, we have tried to narrow down our scope to something that we can
Starting point is 00:13:07 actually get out the door to allow individuals to start playing with, to start building things out of, and to start providing products that have the same commands within them so that different vendors that have defined their upper layer applications can all use these products interchangeably in some fashion. So it's by no means finalized as was stated earlier. It is still under development and things may change as things go along, so don't take what I say today to be valid tomorrow. Within SNEA, we have the resource repository. The resource repository in the SNEA architecture has downloaded CSFs and CSEEs. It also has preloaded CSFs and CS, oops, this should be a CSEE. So those are both available in the resource repository.
Starting point is 00:14:18 It may have only downloaded. It may have only preloaded. It may have a combination. Okay. only downloaded, it may have only preloaded, it may have a combination. Within NVMe, we have programs, and this is the programs are loaded at an index or associated with an index, and the programs you have both downloaded programs and device-defined programs, which is the terminology that NVMe has used for the preloaded programs. Yes? Very quick question. On your NVMe architectural overview, it looked like you had a fixed number of programs numbered from zero to three. Is that accurate, or is that just a part of that?
Starting point is 00:15:00 So the question was, on the NVMe slide, it showed a fixed number of three programs, or four programs, numbered from zero to three. Is that an actual fixed value, or is that just an effect of trying to show it on the slide? That is just an effect of trying to show it on the slide. I don't remember, Kim, how many bits are there for the index? Is it eight bits? So there's at least eight bits for the program indices. It may be more bits than that. So both the SNEA terminology and the NVMe terminology deal with activating programs or functions. And the activation has to do with the fact that you may have a number of functions or programs available, but in order to actually utilize them, you need to activate them, and a given implementation may not be
Starting point is 00:16:05 capable of activating as many as reside on that device. In SNEA, we have function data memory. The term for that in NVMe is subsystem local memory. Part of the differences there in terms of naming is the fact that in NVMe, as Kim talked about earlier today, as we started defining this as something that was used for computation, there were others within NVMe who said, hold on, we may want to use this for something else.
Starting point is 00:16:42 So we took a step back and actually defined it in a way that can be used for other things within NVMe. Within SNEA, we have allocated function data memory. So this is function data memory that is allocated for a specific instance of a computational storage function to utilize. Within NVMe we have memory range sets which are specific ranges of memory within the subsystem local memory that are allocated for a particular program or maybe multiple programs to utilize in an area where they're not touching memory that they're not supposed to be accessing. In SNEA, we call it device storage. In NVMe, we call it NVM namespaces. So looking at this graphically, putting the two slides side by side, first off, you have the computational storage engine.
Starting point is 00:17:45 Well, that's equivalent to the compute namespace. One-to-one correlation, really, between those. The computational storage functions, these are the programs that are in the compute namespaces in NVMe, and they can be downloaded or they can be device-defined. The ones that are shown on the left here within the computational storage engine and computational storage engine environment,
Starting point is 00:18:23 there's an implication that those are actually the preloaded ones. The ones in the resource repository, those could be preloaded or those could be downloaded. Then you have function data memory. As I said, that is equivalent to your subsystem local memory. And within that, you have your allocated function memory, and that is equivalent to the memory range set. And then finally, you have the device storage in SNEA that is equivalent to the NVM namespaces.
Starting point is 00:19:10 So getting into what are the differences between SNEA and NVMe, I think that's probably the more interesting things to look at. So differences between SNEA and NVMe. SNEA defines the computational storage engine environment. So that's Linux kernel, that's a Docker packet processing, whatever the environment is that your programs are going to run under. NVMe, the CSE is a logical entity with no specific definitions. One of the things that NVMe has done is they are defining, they're allowing for a definition of program types, and program types will be able to run on different types of computational storage engines. So they've kind of rolled the engine and the environment together to indicate the equivalent of a computational storage engine environment.
Starting point is 00:20:15 So also I pointed out a little bit earlier that a CSF can directly access AFDM or storage in the architecture model. So that is not precluded by the architectural model. For the first pass at TP4091, programs are only allowed to access memory range sets. That's the only way they access memory. They don't access memory namespaces even. They have to access a memory range set.
Starting point is 00:20:49 This is partly because there's a lot of security issues if you allow access to something beyond a fixed set of memory range sets that a program can access. So trying to avoid some of those security holes, we have started out with only allowing access to memory range sets. There has been discussion about the ability to have programs directly access storage, and that is something that we may very well do in the future. The SNEA model, it supports an indirect model.
Starting point is 00:21:37 So if you were here for the discussion that Scott and Jason had at the beginning at 1 o'clock, they talked about two different models, a direct model and an indirect model. So the indirect model basically says that you can have specific storage locations that are associated with a particular computational storage function. So if you read that location or you write that location, that function is automatically performed based on the fact that you're doing that location or you write that location, that function is automatically performed based on the fact that you're doing that read or that write of that location. How the indirect is actually specified is an abstraction and is not specifically defined in the architecture model, just that that is allowed. On the other hand, NVMe only has a specific execute command.
Starting point is 00:22:36 That is equivalent to the direct access model in SNEA, where basically you've got your data where you want it, and you tell it, go execute this computational storage function in SNEA or this program in NVMe. So with that, I do want you to join the SNEA and NVMe in the standardization effort because
Starting point is 00:22:58 we could use your help in all of these areas. To summarize, SNEA, what I said at the beginning, the SNEA is a general architectural model for computational storage. It is not specific to a given protocol. It is for any protocol that chooses to build computational storage. And that can even extend to computational memory. So you will hear more and more, I think, as the time goes by, more discussion about the fact that computational storage
Starting point is 00:23:38 probably really is computational memory as well, memory being a media for storage. NVMe is a specific I-O command set for computational programs, and it is, as the name implies, specific to NVMe. SNIA provides flexibility for a variety of protocols, while NVMe is specific for the NVMe protocol. The API document in SNEA specifies an application programming interface that is a set of interfaces that are abstracted from the protocol layer. So if you are allocating memory in that API, that, depending on the library that you have installed below it,
Starting point is 00:24:37 could operate over CXL. It could operate over NVMe. It could even operate over SCSI. It could operate over any lower layer protocol. So it's an abstraction to get you away from the protocol layer so you don't have to think about what are the protocol commands that I need to do in order to utilize computational storage, but an abstraction so that my application can use it no matter what I put below it. For NVMe, we will be developing a library that maps the API calls to the NVMe-specific
Starting point is 00:25:16 protocol commands. And that is something that there are some prototypes of that that have been developed. As we move forward, we will expect to see more standardized APIs. Something that, again, was mentioned in the NVMe presentation is the fact that we are looking at what we need to do in the software community to enable the NVMe operation. With that, I'd like to open it up to questions. I tried to go through this a little bit quickly because you've had the access to this material previously for the SNEA side, the NVMe side.
Starting point is 00:26:02 I'm not trying to be terribly redundant with that information. Yes? Could you talk a little bit about the relationship between the two groups? Because there's a degree of parallelism running, and there's a discussion that results in some decision in one of the groups being reflected in the other aspect as well, so they're was, could I talk a little bit about the relationship between the NVME group and the SNEA group and the fact that as there's changes in one of those that they need to be reflected in the other so that the two of them can remain somewhat in lockstep. There is a formal liaison agreement between SNEA and NVME to share documents.
Starting point is 00:26:55 The mechanisms for sharing it are a little bit different going each direction, but that liaison agreement is in place. In addition to that, you have a large crossover of membership between the two. As you saw from my brief bio at the beginning, I am editor in SNEA. I'm co-chair in NVME. I have a very strong desire to see the two of them mesh together. Kim Malone, who's co-chair in NVME, is also very active in the SNEA side. She's very great at reviewing every time I put out a document and giving me
Starting point is 00:27:33 lots of comments on how it's broken. Appreciate that, Kim. And so we've got a large crossover as well as having a formal liaison agreement. Yes? So, will there be a Shume library or something that allows you to address NVMe devices using the SNIA API? How will that work? How do those two come into existence or coexist? Certainly, you don't write the same code. So the goal of this is you write your application to talk to the API. The API is basically a set of here are the program calls. Then there is, yes, a library that goes and takes those calls
Starting point is 00:28:28 and translates them to the NVMe commands, as was shown in the previous presentation on the API. That library is most likely in user space, although it could migrate to kernel space. And then at the bottom end of that, it talks to your NVMe driver. And one of the questions that was raised in the API discussion was, well, can it interface to SPDK? Yes, you could use SPDK as your driver-layable interface. You could use the Linux NVMe driver. You could use a Windows NVMe driver, whatever driver. So you've got multiple layers in this stack. You've got your application. It has an interface, which is a set of calls. You then have your library. It has an interface, which is your protocol interface to your driver. You then have your driver.
Starting point is 00:29:25 It generates the code that actually talks directly to your NVMe device. So then one follow-up question. A person trying to use NVMe computational storage, if they want to use the SNEAPI, that's a parallel side-by-side thing. Despite the fact that there's a good correlation between the two models, there's no commonality in implementation. So the question was, so a developer that's developing an application, the SNEA API and the NVMe computational storage are parallel efforts. And my answer to that is they're not really parallel efforts. So
Starting point is 00:30:18 the developers, so as a vendor develops a particular NVMe device, they would be writing a library that would take advantage of the SNEA API. So you write your application at the API level. You allow the vendor who's creating the library, or hopefully in the future a common library that applies to all vendors NVMe devices, you put that library in there and it does the conversion for you and you don't even have to think about what the NVMe commands are. Then you see eventually the NVMe specific API.
Starting point is 00:31:23 Okay, yeah. So yeah, that I think is the difficulty was there is not a specific NVMe API. Hi. I guess I said this at FMS, but this was to me like a framework for a data flow multiprocessor. Have you had people talking about building some long big thing they want to do and stage it or decompose it into a number of stages and the data flows through? Or are you seeing very limited, I'm going to invoke only this one function in the process
Starting point is 00:31:55 and get it and then it's done? No, I personally see this as very much being applicable to a data flow where what you have done is you have perhaps defined multiple memory range sets. The memory range sets may overlap in part. So memory range set one has this part and that part in it. Memory range set two has this overlapping part and some other part, and so on down the chain. You invoke program one against memory range set one. Program two now operates on the output of that, becomes the input to program two, which then creates the output to program two, and on down however far you need to go in that sort of a data flow architecture.
Starting point is 00:32:46 Is there a mechanism when this node finishes filling in this part of the range set, it can signal the next port that it's ready to put, and it can begin to fill in the data out? Okay, so the question is, is there a process to allow notifying the host, in essence, that one part of the memory range set is filled, allowing the next program to operate on it. Is that what the...
Starting point is 00:33:13 Well, it's got... So you're saying it has to go back to the host to see this one finished, and then it can start the next one. I was thinking, you know, direct from here to here, a notification to go. Okay, so the question is, is there... No requirement for that.
Starting point is 00:33:28 Right, so is there the thought or consideration of a peer-to-peer of one computational program signaling another computational program that it is ready to operate? I know that within the API, we have specified a batch mode that allows that type of batching, but that's actually up at the host level. I think that that is something
Starting point is 00:33:56 that eventually could indeed, you know, at the host API level, that can be done within the library. If NVMe, however, starts to define that batch process where you can actually pass that entire batch down to NVMe, then that piece of the library becomes moot and you simply have passed the batch all the way to NVMe and you tell it to execute this batch process.
Starting point is 00:34:31 Other questions? Speaking of alignment between SNEA and NVMe, has anyone asked why NVMe changed the name to competition programs? It seems a little redundant. A lot of history there. I think the biggest thing was that some of the people that were doing the NVMe work, which actually got started before the model was finished in SNEA, found that function in their mind meant something different. So they did not want to call it a function. And SNEA, on the other hand, felt the program
Starting point is 00:35:16 had some implications and felt that function was a better term for it. And the two could not come to agreement on that, and we agreed to disagree and allow for a mapping. The functionality between the two is the same. Other questions? Okay, Thank you. Hope you have enjoyed the day worth of discussion about, or the afternoon worth of discussion on the standardization of computational storage. Thanks for listening.
Starting point is 00:36:01 If you have questions about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at sneha.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. Thank you.
