Storage Developer Conference - #171: Computational Storage Moving Forward with an Architecture and API

Episode Date: July 11, 2022

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast, episode number 171. Hello, welcome to Storage Developer Conference 2021. I am Bill Martin from Samsung. I am co-chair of the SNIA Technical Council,
Starting point is 00:00:51 and I am the editor of the two standards being developed by the SNIA Computational Storage Technical Work Group. Today, I would like to share with you some information called Moving Forward with an Architecture and API for Computational Storage. This presentation will focus predominantly on the architecture, but we'll talk a little bit about how that architecture interrelates with the API. So as we go through today, what I'd like to do is start with an overview of the SNIA computational storage standards. Once you've got an idea of what the standards are that we're developing, we're going to talk about the CS device architecture, dig into that architecture, and talk in depth about it. We'll then talk about computational storage discovery, then computational storage configuration, and end with three different examples of computational storage execution.
Starting point is 00:02:02 So in terms of the standards, the computational storage technical work group in SNIA is developing two standards. The first is the SNIA Computational Storage Architecture and Programming Model. We're currently on version 0.8, revision 0 of that. The second is the Computational Storage API, version 0.5, revision 1. Both of these documents are available to the public at snia.org slash public review. So these are out for public review, and we're interested in your input on them. Other presentations on SNIA computational storage at SDC: the first is a presentation by the co-chairs of the Computational Storage Technical Work Group, Scott Shadley and Jason Molgaard. They're presenting a presentation called Computational
Starting point is 00:02:52 Storage Update from the Working Group. If you're unfamiliar with the work that is being done by the working group, I would actually recommend that you pause this presentation and jump over to take a look at their presentation first, to familiarize yourself with what the working group is doing, some of the terminology that's being used, and where the computational storage technical work group is going. Then come back and join this presentation to get an in-depth dive into the architecture model. After watching this presentation, I'd recommend that you go look at Oscar Pinto's presentation on computational storage APIs. That will give you an in-depth look at what we're doing
Starting point is 00:03:40 to provide APIs in order to implement the programming model. So in the programming model, there are three computational storage models. The first of these is the simplest. It is simply a computational storage processor. You will notice that this has device memory, but there's actually no storage on the computational storage processor. The processor has to be associated with storage that is on whatever bus the computational storage processor is connected to. The second of these models is very similar to that.
Starting point is 00:04:22 However, this model is called a computational storage drive. It adds device storage and a storage controller. The third of these models is the most complex. It's a computational storage array. In addition to having device storage, it also has an array controller that provides transparent storage access and proxied storage access. It is made up of everything that is in the computational storage processor plus two or more storage devices or computational storage drives. Now, in combination, these three different device types are called computational storage devices, represented by CSX. Throughout this presentation, I'm going to focus on the computational storage drive, as it has all of the components necessary to do a deep dive without the complexity of the array, but also without the complexity of having to talk about storage that is not actually on the device, as is the case for a computational storage processor.
Starting point is 00:05:34 So I will throw around the terms CSD, CSX, and some other terms that I will define for you as we get into the presentation.
So where I'd like to start is by talking about the interrelationship of the elements of the architecture. Within the architecture, the first thing that we'll talk about is the fact that we have defined computational storage resources. That is this teal box here, and it includes all of the components necessary to actually have computational storage. The rest of this slide and the next slide will dive into each of the components that are part of the computational storage resources. So first, within this box, we have the computational storage engine.
Starting point is 00:06:31 This is the basic component of computational storage. It is, for example, a CPU, or an FPGA, or possibly some other form of computational storage engine. It is the hardware that actually performs the computation. Within that, however, we have a computational storage engine environment. And while this is shown within the engine, it may actually make use of the engine without being inside of it. An example of this would be a Linux operating environment. That Linux operating environment makes use of a CPU that would be your computational storage engine. So a computational storage engine environment must be activated on a computational storage engine.
Starting point is 00:07:34 And in order to actually perform any computation whatsoever, one or more computational storage engine environments have to be activated on the computational storage engine. You'll notice from this diagram that there may be multiple computational storage engines. Within each computational storage engine, you may have one or more computational storage engine environments, but you still don't have enough to actually perform computation. The last piece that you need to actually perform the computation is a computational storage function. This is the actual item that defines what you're going to do in terms of compute. This could be something like compression, encryption, or transcoding; any of those would be computational storage functions. So the computational storage function is activated within a particular computational storage engine environment. And again, you have to have at least one computational storage function activated if you're actually going to perform any computation.
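Pulling that together: a CSX exposes one or more CSEs, each CSE may have one or more activated CSEEs, and a CSF must be activated within a CSEE before any computation can run. A minimal containment sketch, again with hypothetical names:

```c
/* Hypothetical containment sketch: CSE -> activated CSEEs -> activated
 * CSFs. Nothing can execute until at least one CSF is activated in a
 * CSEE that is itself activated on a CSE. */

#include <stddef.h>

struct cs_function {            /* CSF: e.g. compression, encryption, transcoding */
    const char *name;
};

struct cs_engine_environment {  /* CSEE: e.g. a Linux operating environment */
    const char *name;
    struct cs_function *activated_csfs;
    size_t csf_count;
};

struct cs_engine {              /* CSE: e.g. a CPU or an FPGA */
    const char *name;
    struct cs_engine_environment *activated_csees;
    size_t csee_count;
};
```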
Starting point is 00:08:48 So now we've talked about the right-hand side of the computational storage resources, and I'd like to move over to the left-hand side. On the left-hand side, up here at the top, we have the resource repository. The resource repository may be stored either in device memory or in device storage. So while it's showing up here as part of the resources, its actual location is somewhere in some form of storage on the device, whether that's device memory or device storage. What does it contain? It contains computational storage engine environments that may be activated; these may be activated on a particular computational storage engine. It also has computational storage functions that may be activated on a CSEE within the CSE.
Starting point is 00:09:57 Secondly, over here, we have function data memory. This is the area that the computational storage engine actually uses to perform computation. This may be used to store the source data for the computation, scratchpad data for the computation, or the results of the computation, or all three. Within the function data memory, there's a block here that's called allocated function data memory, and for a particular execution of computational storage, you would be operating within a particular allocated function data memory. Our next couple of slides will dive a little bit more in depth into the computational storage engine environment. So the computational storage engine environment may be hard-coded in a CSE. And what I mean by that is when you get a computational storage device from a manufacturer, they may have a computational storage engine, and within that computational storage engine,
Starting point is 00:11:07 they have a computational storage engine environment already within it. It may also be in the resource repository. If it is in the resource repository and you activate it in your computational storage engine, a copy of it still remains in the resource repository, but now you have done everything necessary to actually perform or utilize the computational storage engine environment. The computational storage engine environment may actually be virtual in that there isn't a specific environment, but as an element, it still needs to be there logically in order to put the pieces together. The case where this may be virtual is when you have an FPGA. You actually don't have different environments that exist for that FPGA. You have a single environment, and it's always there.
Starting point is 00:12:10 The next thing we have is the computational storage function. The computational storage function, just like the computational storage engine environment, may be hard-coded in the CSEE. It may also be a separate entity within the resource repository. Now, when I talk about it being hard-coded in the CSEE, you could get a device in which the computational storage engine, the computational storage engine environment, and the CSF are all present when you get the computational storage device directly out of the box from the manufacturer. The other way that it may be hard-coded in a CSEE is the computational storage engine environment within the resource repository may actually have one or more computational storage functions built into it. For example, if your computational storage engine is a CPU and your CSEE is a Linux operating environment, it may actually be a Linux
Starting point is 00:13:30 operating environment that already includes computational storage functions like encryption, transcoding, and so on. So that is one way the CSF may be provided. The CSF may be in a CSEE here in the resource repository, or it may be in a CSEE that was shipped with the product in the computational storage engine. So now that we understand what the elements are, what's the process for discovering those elements? The first thing that you have to do is discover the computational storage devices that are available in the system. Now, discovery of devices within a system is outside the scope of the architecture model and is actually within the scope of the protocol layer that you're using.
Starting point is 00:14:34 For example, within the NVMe protocol layer, there are mechanisms to discover what devices are available to the host on the NVMe bus. Once you've discovered the CSXs, however, that discovery then drops into what is defined within the architectural model. So when you have discovered the CSXs, there are a number of things that you will want to discover about each CSX. First off, for each CSX, you need to discover what CSEs, computational storage engines, are available within that CSX. Within the CSE, whether that is when you first start up or after things have been configured, you want to discover what CSEEs are currently activated on that CSE. And when you've discovered activated CSEEs, for each of those, you want to discover which activated CSFs are available on that CSEE. Now, in parallel with that, or in series with that, for each CSX, you want to discover the resource repository.
Starting point is 00:16:07 And within the resource repository, you want to discover the CSEEs that are in the resource repository, and part of that discovery is to discover information about those CSEEs. That information may include the fact that a CSEE actually contains an embedded or hard-coded CSF. Now, the last thing that you need to discover is the function data memory. And you need to discover its characteristics in order to be able to allocate function data memory for a particular computation.
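As a sketch, that discovery walk might look like the following nested enumeration. Every function name here is a hypothetical placeholder, not the actual SNIA API; see Oscar Pinto's talk for the real interfaces.

```c
#include <stddef.h>

typedef void *cs_handle;  /* opaque handle, hypothetical */

/* Placeholder prototypes standing in for protocol-level (e.g. NVMe)
 * enumeration and the discovery APIs; not the actual SNIA API. */
size_t    cs_count_csx(void);
cs_handle cs_open_csx(size_t i);
size_t    cs_count_cse(cs_handle csx);
cs_handle cs_open_cse(cs_handle csx, size_t i);
size_t    cs_count_activated_csee(cs_handle cse);
cs_handle cs_open_csee(cs_handle cse, size_t i);
size_t    cs_count_activated_csf(cs_handle csee);
void      cs_describe_csf(cs_handle csee, size_t i);
void      cs_describe_repository(cs_handle csx);
void      cs_describe_fdm(cs_handle csx);

void discover_all(void)
{
    size_t ncsx = cs_count_csx();  /* device discovery itself is protocol-specific */
    for (size_t x = 0; x < ncsx; x++) {
        cs_handle csx = cs_open_csx(x);

        /* 1. Which CSEs does this CSX contain? */
        for (size_t e = 0; e < cs_count_cse(csx); e++) {
            cs_handle cse = cs_open_cse(csx, e);

            /* 2. Which CSEEs are currently activated on this CSE? */
            for (size_t v = 0; v < cs_count_activated_csee(cse); v++) {
                cs_handle csee = cs_open_csee(cse, v);

                /* 3. Which activated CSFs are available on this CSEE? */
                for (size_t f = 0; f < cs_count_activated_csf(csee); f++)
                    cs_describe_csf(csee, f);
            }
        }

        /* 4. The resource repository: CSEEs (possibly with embedded
         *    CSFs) and CSFs that may be activated later. */
        cs_describe_repository(csx);

        /* 5. Function data memory and its characteristics, needed
         *    later to allocate AFDM for a computation. */
        cs_describe_fdm(csx);
    }
}
```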
Starting point is 00:17:04 Now, after you've discovered all of this and you're running, and you now want to run a particular CSF, how do you actually configure your device to run that particular computational storage function? Well, you go through a similar process. You go into a CSF discovery and configuration process. Now, the first question that you have to ask is: is the CSF already activated on a CSEE? If yes, you take the path to the right, but let's say it isn't already activated on a CSEE. In that case, the next question is: do I have a CSEE activated on a CSE that is actually capable of running the particular CSF that I want to run? If yes, again, I'd take the path to the right, but let's continue going down and say, no, I don't have such a CSEE activated on my CSE. Then my next question is: is the CSEE in the resource repository? If yes, I come on down, but let's go over here to the left. If it's not already in the resource repository, then I'm going to download that CSEE.
Starting point is 00:18:50 Once the CSEE is in the resource repository, the next step is to activate the CSEE on a particular computational storage engine, and then I need to ask: is my CSF in the resource repository? If not, I need to download the CSF. Once the CSF is in the resource repository, I activate it on the CSEE. Now that my CSF is activated in a CSEE, I ask the question: is there configuration required? If there is, then I need to configure the CSF; once that is done, the CSF is ready for use. At this point in time, I can now execute a program on my computational storage device using this particular function that has been downloaded.
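That decision tree flattens naturally into straight-line code. Here is one way the flow might look, again using hypothetical placeholder functions rather than the actual SNIA API:

```c
/* Hypothetical sketch of the CSF configuration flow described above.
 * Returns a ready-to-use CSF handle, or NULL-like failure from the
 * placeholders below. Names are illustrative only. */

#include <stdbool.h>

typedef void *cs_handle;

/* Placeholder prototypes, not the actual SNIA API. */
cs_handle find_activated_csf(cs_handle csx, const char *csf_name);
cs_handle find_capable_activated_csee(cs_handle csx, const char *csf_name);
bool      repository_has_csee(cs_handle csx, const char *csee_name);
bool      repository_has_csf(cs_handle csx, const char *csf_name);
void      download_to_repository(cs_handle csx, const char *name);
cs_handle activate_csee_on_cse(cs_handle csx, const char *csee_name);
cs_handle activate_csf_on_csee(cs_handle csee, const char *csf_name);
bool      csf_needs_config(cs_handle csf);
void      configure_csf(cs_handle csf);

cs_handle prepare_csf(cs_handle csx, const char *csf_name, const char *csee_name)
{
    /* Is the CSF already activated on some CSEE? */
    cs_handle csf = find_activated_csf(csx, csf_name);

    if (!csf) {
        /* Is a CSEE capable of running this CSF already activated on a CSE? */
        cs_handle csee = find_capable_activated_csee(csx, csf_name);

        if (!csee) {
            /* If the CSEE is not in the resource repository, download it,
             * then activate it on a CSE. */
            if (!repository_has_csee(csx, csee_name))
                download_to_repository(csx, csee_name);
            csee = activate_csee_on_cse(csx, csee_name);
        }

        /* If the CSF is not in the resource repository, download it,
         * then activate it on the CSEE. */
        if (!repository_has_csf(csx, csf_name))
            download_to_repository(csx, csf_name);
        csf = activate_csf_on_csee(csee, csf_name);
    }

    /* Configure if required; the CSF is then ready for use. */
    if (csf_needs_config(csf))
        configure_csf(csf);
    return csf;
}
```

Each early exit corresponds to a "yes" branch in the flowchart; the worst case walks the whole left-hand path of two downloads and two activations.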
So the first example that I want to go through is called performing computation directly. Now, the assumptions here are, first, that the data on which the computation is to be performed is placed into the function data memory, in particular into a piece of allocated function data memory, prior to the request to the CSE. That download is done through some process that is not shown in this figure.
Starting point is 00:20:33 It may be a command through the storage controller that tells the storage controller to load information into the AFDM. It may be a command that tells the storage controller to take data from host memory and put it into AFDM. So there's a variety of ways, but this assumes that the data is already there. Secondly, result data, if any, is returned to the host through some process that is also not shown in this figure. So what happens in this figure? The host, through the computational storage driver, sends a command to the computational storage controller to invoke a particular computational storage function. Within that request to invoke the function is information about where in the allocated function data memory the data being operated on resides. At that point in time, the computational storage engine, using the computational storage engine environment and the specified computational storage function, will utilize the data that is in the allocated function data memory.
Starting point is 00:22:09 It will perform a computation on that data and place the results, if any, back into the allocated function data memory for that specific invocation of the computational storage function. Once that's done, the computational storage engine will return a response to the host. This response is not the data going back, but just a response indicating completion of the requested computational storage function.
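From the host's perspective, that direct flow might be sketched as follows. The staging and retrieval steps are outside the figure, so they appear only as placeholder calls, and all names are illustrative rather than the actual API:

```c
/* Hypothetical sketch of "performing computation directly": the data is
 * already staged in AFDM, the host invokes the CSF, and the results land
 * back in the same AFDM. All names are illustrative. */

#include <stddef.h>

typedef void *cs_handle;

cs_handle cs_alloc_afdm(cs_handle csx, size_t bytes);                  /* placeholder */
void stage_data_into_afdm(cs_handle afdm, const void *src, size_t n);  /* outside the figure */
int  cs_invoke_csf(cs_handle csf, cs_handle afdm);                     /* request + completion */
void retrieve_results_from_afdm(cs_handle afdm, void *dst, size_t n);  /* outside the figure */

int run_direct(cs_handle csx, cs_handle csf,
               const void *input, size_t in_len,
               void *output, size_t out_len)
{
    /* Allocate function data memory for this specific invocation. */
    cs_handle afdm = cs_alloc_afdm(csx, in_len > out_len ? in_len : out_len);

    /* Assumption 1: source data is placed in AFDM before the request. */
    stage_data_into_afdm(afdm, input, in_len);

    /* The host asks for the CSF to be invoked on the data in that AFDM.
     * The response signals completion only; it does not carry data. */
    int status = cs_invoke_csf(csf, afdm);

    /* Assumption 2: result data returns by a separate mechanism. */
    if (status == 0)
        retrieve_results_from_afdm(afdm, output, out_len);
    return status;
}
```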
So the next one is performing computation directly on data that is in device storage. Now, the assumption is that this example is for a computation on data that is already in the device storage.
Starting point is 00:22:57 So in this particular scenario, the host again sends a request down to the computational storage engine to invoke a computational storage function. The command is going to have to specify the device storage location of the data. So in this particular case, the computational storage engine moves data from the device storage into the allocated function data memory that has been allocated for this particular instance, performs the requested computation on the data that is in the function data memory, and places the results back into the allocated function data memory. It then returns a response to the host indicating that the computation is complete. At that point in time, the host still has to move the results from the function data memory, either back up into host memory if required, or back into device storage. Now, this example could be more complex, in which case the command would not only specify the location of the data to perform the computation on, but also the location where the data is to be put after the computation is done, moving the data either from the allocated function data memory back to the host or back into device storage, depending on what it was that you were doing. So an example of what you might do with this: if you had a license plate scan algorithm that was implemented as a particular computational storage function, you could request to pull a chunk of data from device storage that had a variety of images, each image containing license plates, and move all of those images into allocated function data memory, and perform computation on all of those images.
Starting point is 00:25:28 And when you find the image that matches the license plate that you're looking for, you take the additional metadata associated with that image, put it in function data memory, and then actually move that metadata back up to the host to indicate the record that has the particular license plate that you are scanning for. So that's an example of how this entire process could be used.
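A sketch of this second flow, using the license plate scan as the workload; the invocation now names a device storage location, and only the small result moves back to the host. Names remain hypothetical:

```c
/* Hypothetical sketch of computation on data already in device storage:
 * the request names the storage location, the device moves it into AFDM
 * and computes, and the host then disposes of the results. */

#include <stddef.h>
#include <stdint.h>

typedef void *cs_handle;

cs_handle cs_alloc_afdm(cs_handle csx, size_t bytes);                   /* placeholder */
int  cs_invoke_csf_on_storage(cs_handle csf, uint64_t lba, size_t len,
                              cs_handle afdm);                          /* placeholder */
void retrieve_results_from_afdm(cs_handle afdm, void *dst, size_t n);   /* placeholder */

/* e.g. scan a chunk of stored images for a matching license plate and
 * pull back only the matching record's metadata. */
int scan_stored_images(cs_handle csx, cs_handle match_csf,
                       uint64_t first_lba, size_t chunk_len,
                       void *metadata_out, size_t metadata_len)
{
    cs_handle afdm = cs_alloc_afdm(csx, chunk_len);

    /* The device moves the chunk from device storage into AFDM and runs
     * the CSF there; the response indicates completion, not the data. */
    int status = cs_invoke_csf_on_storage(match_csf, first_lba, chunk_len, afdm);

    /* The host moves only the small match metadata back, not the raw images. */
    if (status == 0)
        retrieve_results_from_afdm(afdm, metadata_out, metadata_len);
    return status;
}
```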
Now, the third example that I want to talk about is a little bit different. This is called indirect computation. This example is for a device-to-host operation.
Starting point is 00:26:37 In this particular example, the host sends a storage request to a storage controller where that storage request is associated with a target CSF, and the storage controller determines what CSF is associated with the storage request. So it's up to the storage controller to determine that. Now, how is that done? Well, there are a couple of ways that it could be done, probably a number, actually, but I'll give you two examples. The first example is that any write data command is associated with a particular CSF. For example, it could be encryption: on write, you would want to do encryption, and on read, you would want to do decryption. It could also be that the association is based on the addressing. So if your device storage happens to be block addressable device storage, then a specific group of LBAs, passed down in the operation requested to the storage controller, would tell the storage controller that these LBAs get a particular computational storage function applied to them.
Starting point is 00:27:48 So what happens here? The host sends the storage request to the storage controller. The storage controller moves data from device storage into the allocated function data memory. From there, the storage controller next sends an instruction to the CSE to perform the indicated computation on the data that is in the allocated function data memory. The CSE then performs that operation on the data and places the results back into the allocated function data memory. And finally, the storage controller returns the computation results, if any, from the function data memory back to the host.
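Since the host-visible part of indirect computation is just an ordinary storage request, the interesting logic lives in the controller. Here is a sketch of the LBA-range association scheme just described, with hypothetical controller-side helpers:

```c
/* Hypothetical controller-side sketch of indirect computation: an
 * ordinary read request whose LBA range is associated with a CSF gets
 * that CSF applied transparently. Names are illustrative only. */

#include <stddef.h>
#include <stdint.h>

typedef void *cs_handle;

struct csf_binding {            /* LBA range -> CSF association entry */
    uint64_t first_lba, last_lba;
    cs_handle csf;              /* e.g. a decryption CSF for reads */
};

cs_handle lookup_csf(const struct csf_binding *tbl, size_t n, uint64_t lba)
{
    for (size_t i = 0; i < n; i++)
        if (lba >= tbl[i].first_lba && lba <= tbl[i].last_lba)
            return tbl[i].csf;
    return NULL;                /* no CSF bound: plain storage request */
}

/* Placeholders for the controller's internal steps. */
cs_handle controller_alloc_afdm(size_t bytes);
void controller_storage_to_afdm(uint64_t lba, size_t len, cs_handle afdm);
void controller_instruct_cse(cs_handle csf, cs_handle afdm);
void controller_return_to_host(cs_handle afdm, size_t len);

void handle_read(const struct csf_binding *tbl, size_t n, uint64_t lba, size_t len)
{
    cs_handle csf = lookup_csf(tbl, n, lba);    /* controller, not host, decides */
    cs_handle afdm = controller_alloc_afdm(len);
    controller_storage_to_afdm(lba, len, afdm); /* device storage -> AFDM */
    if (csf)
        controller_instruct_cse(csf, afdm);     /* CSE computes in place */
    controller_return_to_host(afdm, len);       /* results back to host */
}
```

The key point is that the host issues a normal read; whether a CSF ran is invisible to it except in the returned data.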
Starting point is 00:28:59 So this has given you an overview of the architecture, how you discover elements within the architecture, how you actually configure a computational storage function to be usable, and three examples of how you would utilize computational storage. So how do we use all of this? Well, there is API support being defined within the technical work group. We have APIs defined for discovery, APIs defined for configuration, APIs defined for memory allocation, and APIs defined for execution.
Starting point is 00:30:08 There are also some APIs being developed for more complex algorithms and more complex operations of computational storage. However, for complete details, as there's not enough time within this presentation to go into the depth of the computational storage APIs that I'm sure you're interested in, I recommend that you go look at the presentation by Oscar Pinto that is called Computational Storage APIs. He will dig in depth into the computational storage APIs that are currently being defined and a lot of the work that is currently going on within the technical work group. The interesting piece here is that the architecture document and the API document are being developed together, to make certain that the two of them stay in sync.
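Tying those four API categories together, a host application's lifecycle might read roughly like this. These names are illustrative only; the real functions and signatures are in the API document and Oscar Pinto's presentation:

```c
/* Hypothetical end-to-end flow through the four API categories:
 * discovery, configuration, memory allocation, and execution.
 * Illustrative names only; see the SNIA CS API draft for the real API. */

typedef void *cs_handle;

cs_handle cs_discover_csx(const char *path);                  /* discovery */
cs_handle cs_configure_csf(cs_handle csx, const char *name);  /* configuration */
cs_handle cs_alloc_afdm(cs_handle csx, unsigned long bytes);  /* memory allocation */
int       cs_execute(cs_handle csf, cs_handle afdm);          /* execution */

int compress_on_device(const char *dev, unsigned long len)
{
    cs_handle csx  = cs_discover_csx(dev);
    cs_handle csf  = cs_configure_csf(csx, "compress");  /* hypothetical CSF name */
    cs_handle afdm = cs_alloc_afdm(csx, len);
    /* staging data into the AFDM is omitted, as in the earlier examples */
    return cs_execute(csf, afdm);
}
```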
Starting point is 00:31:15 Other computational storage presentations that you may be interested in: the first of those that I'd really point you to is a presentation by Kim Malone and Stephen Bates, who are going to be talking about the work going on in NVMe defining NVMe computational storage. It will cover the technical proposal that's being developed in NVMe and how that layers on top of, or utilizes, the architecture that's being defined in SNIA. The two of them are being done by a lot of the same people. However, they're defining different layers. The architecture is a high-level view of computational storage, while the work going on in NVMe is a very specific implementation across the NVMe protocol, and it's the first of the protocols for which standardized use of computational storage is actually being defined. The other thing is today, Wednesday, September 29th, if you're watching this on the day that it's released as part of SDC, there is a computational storage birds of a feather going on from 4 to 5 p.m.
Starting point is 00:32:37 It's hosted by the co-chairs of the Computational Storage Technical Work Group, and it will give you an opportunity to hear some brief overviews about computational storage, but it will also give you the opportunity to ask questions and talk with the experts who are developing all of this. There are numerous other computational storage talks within the computational storage track, so I would encourage you to look for all of those. So thank you for watching this presentation. I hope that you've gotten something useful out of it. I hope you're enjoying the presentations that are available through Storage Developer Conference 2021. Please take a moment to rate this session. Your feedback is important to us, both to the speakers, myself and others, as well as to SNIA and the group that has developed the Storage Developer Conference. So thank you. Hope you've enjoyed it.
Starting point is 00:33:52 Thanks for listening. If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
