Storage Developer Conference - #120: What Happens when Compute Meets Storage?

Episode Date: March 2, 2020

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcasts. You are listening to SDC Podcast Episode 120.
Starting point is 00:00:40 Welcome to the final session for Wednesday. This is the opportunity for us to introduce the concept, for those that aren't as familiar with it as some of us within SNIA, of the theory of computational storage. So my name is Scott Shadley. I'm one of the co-chairs of the working group on computational storage within SNIA. And this is?
Starting point is 00:01:05 Nick Adams, also one of the co-chairs for the computational storage twig. So we're going to take some time and kind of the goal of this particular session is to introduce what we've been doing from a SNEA perspective, how we got here. There's a birds of a feather session later tonight after dinner across the hall where we can create a more interactive session because this is going to be more of a presentation style with some questions. And then throughout the course of the day tomorrow, there's an, as you've probably already seen on on the schedule a track full of various different member companies and others presenting solutions in this space so we thought this was a good way from a organizational point of view give you the chance to get a peek into what SNEA is doing with computational storage so when you see tomorrow's presentations they kind of make sense so thank you sir so what we'll do today is we're going to talk about where this came from how we got this particular working group formed, get into some of the aspects of what we've already been working on as the working group.
Starting point is 00:01:50 It's fairly new as far as an overall organizational effort for what we've been up to. And then we'll talk into some more architectural detailed and discussion points and kind of highlight a couple of member company solutions for this as well at a very high level just to give you some feel for the types of things that we're going to accomplish with this work. So the concepts for computational storage have been around for some time. There's lots of different articles,
Starting point is 00:02:14 lots of different terms, things like near-data processing. But one of the biggest things that's forced, or is like a forcing function, if you will, for computational storage is the fact that our data sets are getting so large that there's many people in the industry that use the term data gravity.
Starting point is 00:02:30 The fact that we generate a lot, we store a lot, but we really don't care about 90% of what we store. And being able to do something more efficient with getting out what we really care about from those devices is kind of the genesis of what we're working towards with some aspects of computational storage. So it's been around for some time. We have our traditional, sorry, good timing. We have our traditional storage architecture where everything comes in, goes through the CPU,
Starting point is 00:02:55 you've got storage and you've got memory and we've had these three building blocks. And we've done a lot of different things with these building blocks. We've hyper-converged them, we've disaggregated them, we've composed them. But the idea of actually marrying two of them in a much closer way started with things like in-memory processing and things like persistent memory, those types of architectures. Now we're taking it to kind of another level where we're putting real compute next to an SSD product. And this isn't the first time this has been done. Some examples go all the way back to 2012, 2013, where companies tried to do aspects of this.
Starting point is 00:03:29 And they were a little early. They were doing them in a different fashion. And now it's time, the market's kind of matured to a point where it makes sense to do it. And we got together as a group of people and said, this isn't going to work if we all go off on 12 different tangents and try to make things work. So if you go to the next one.
Starting point is 00:03:44 Wrong way. Oh, wrong go to the next one. Wrong way. Oh, wrong way. There we go. There are, as you can see, things like active disks, near data, in storage, in situ, near storage. And we have concepts of accelerators. And people have asked a lot of questions just this week as they've walked around the room. Are you accelerating things? Are you pushing things down into the SSD?
Starting point is 00:04:02 What are you doing with this? So even though we came up with a term that everybody agreed with, which was computational storage, it's now trying to educate people. So one of the core focuses is taking this and saying, we've got all these techniques, all these technologies, they're all very real, and you're going to have products from multiple companies doing multiple things. But at the end of the day, when you plug a device in, you as consumers and developers have to understand what that product can do, and it needs to present itself to you in a common fashion. And that's kind of the genesis of what we wanted to start with
Starting point is 00:04:31 from a perspective of computational storage. So people have asked, where does it fit into the ecosystem of the pyramid? So I've taken the liberties as a co-chair to redraw this pyramid without permission. So you can see how we put NVDIMS next to the DRAM footprint within the architectures. That's where the persistent memory, things like that exist. Our computational storage sits alongside the existing storage architectures,
Starting point is 00:04:55 and we'll show you some examples of an architectural model that we're working on, as Nick will go through that later. But you can see the idea is we're not trying to usurp, we're not trying to replace existing technologies, we're trying to augment. And that's kind of the key thing is people are sometimes look at these new innovative things like, oh, I have to go do something completely different. And our goal is to not make you do something completely different, but to understand how you can do things more efficiently or in a slightly different way that can make your total ecosystem more effective. And so that's what as part of what we're doing
Starting point is 00:05:24 with this different thing is talking about it's a different facet of advancing it. It's not replacing in memory. It's going to work within memory. It's going to work with standard storage. It's going to work in ecosystems where you have heavy analytics. You're going to work in systems
Starting point is 00:05:36 where you have heavy real-time writes. It's just a matter of how the solutions are developed and then how you can implement them in a common fashion. So the progression of the twigs. So now what? So if you go ahead to the next one, this is an example of the participating companies that have already signed up within SNEA to support this. Now, just to give some frame of reference of where we were and making sure, yeah, okay. We were in this exact room one year ago today for our first birds of a feather session where we didn't even
Starting point is 00:06:05 have a working group. We were just a bunch of people saying, we think this is a cool idea, which fell on the heels of the initial meeting from Flash Memory Summit last August, August of 2018. So it's been less than a year that we've put this working group together. We've started developing the techniques around it. And you can see we've got quite a few people that have jumped on board to help us out. Now, one of the key things here is we will note that there is a lot of vendors. There's a lot of NAND vendors, SSD vendors, system vendors. We've got a few code level guys in the way of VMware, Oracle, and others. We're definitely interested in making sure that we get the inputs from the developers that are going to use this technology. Because we,
Starting point is 00:06:42 as we all know, I can come up with a really cool way to do things and try to convince you to use it it's a lot easier for me to work with you to solve that problem together so we've got a lot of work that's been started we meet weekly now with this particular group we've got contributions from all parts of the ecosystem we're continuing to try to expand that so that we can ensure that we get this product technology actually driven where it needs to end up. And so from that point of view, it's a matter of focus. And so from a focus perspective, we sat down and spent about six months working on the definitions for computational storage, computational storage products or devices, and the services
Starting point is 00:07:20 or computational storage services that they can provide. And we'll go through that as part of this presentation, because that's really key to understand why and how this fits into the ecosystem that we're working with. And so once we get those definitions done, then we have to start thinking, okay, what can we do today? What can we work on? What should we work on? And what are we not going to touch? Because there are certainly things that we can do as an ecosystem that we could continue to delve into and dig into the weeds and never come out with a spec or a direction because we're spending so much time trying to solve every problem so we've picked some problems we've said some things we're not going to look at and at the end of the day we're going to end up working with
Starting point is 00:07:56 other industry standard bodies groups whether it be nvme pcie sig you name we've even had the term scuzzy thrown out there all that kind of stuff t10 all those have come up as partnership efforts to drive this technology forward because we're sitting today kind of one level above that we're at a at a higher level from that perspective and you'll see from some of the architectural drawings we're not trying to specifically iterate or say you have to build it this way you have the constructs to build it this way and that's kind of what this particular focus and effort is around. And it's trying to make sure that when we go to do these types of things, we don't end up doing the Wild West.
Starting point is 00:08:31 So if you think back just a few years ago, you had PCIe drives come to market. Everyone had a custom driver. So if I implemented one version of it, I had a hard time implementing the next version because I had to keep loading drivers, playing with them, doing all that kind of stuff. We're starting to see that happen in this particular ecosystem.
Starting point is 00:08:46 We decided to jump ahead of that and say, no, we're not going to do that. We know that there's different needs, different products, different solutions. Let's find a common way to talk to those products in this particular working group. Is that where I hand it off to you? So I'm going to mute myself and let him take over. Somehow the clip didn't want to come on.
Starting point is 00:09:19 It's working? There we go. So we will get going. So what I'm going to talk to you guys about today a little bit is just kind of like what we're doing within the technical working group, what some of the architectural constructs look like, what some of those definitions are that we have developed, and kind of where we're going to talk about was kind of the charter of the TWIG. You know, first we're going to prioritize industry-level requirements. So this is getting together a bunch of different industry participants, all those folks from the ecosystem that you saw on the last slide, and understanding what are the requirements that we have for computational storage. The second one, we want to develop standards, interfaces, and protocols.
Starting point is 00:10:13 Here, as Scott alluded to, we're really looking to make sure that people are speaking the same language, that the different entities that we're talking about are able to be well-defined, and how those entities interact with each other, those are key things. Inside of Cineo, we're not getting down to the level of protocols. We're not getting down to APIs and specifics at this point in time. We're really trying to make sure that we have a good way of understanding how these different pieces are talking to each other and the expectations of both the host,
Starting point is 00:10:47 the device, some of the other pieces that we'll talk to in a minute. Aligning the industry. Again, as Scott alluded to, right now there's starting to be different solutions that are out in the industry, things that are custom, and we're trying to take as early as possible, kind of look at that and make sure that we're pulling those things together. There's a lot of consistency in the actions that are being done by these computational storage devices, and we want to make sure that we're actually designing interfaces around that so that we don't end up with a fractured industry over the course of time facilitating and driving software development so another piece that's really critical for the
Starting point is 00:11:31 twig is not just that we're making devices but that we're able to actually build out an ecosystem where application and like storage applications are able to actually make use of the infrastructure that we've done. They don't have to go out and create one thing for this driver and another API for that driver. Because, again, this infrastructure is at a higher level than, say, NVMe or, say, some kind of memory interface. There could be multiple different kind of back-end solutions and we want to make sure that the interfaces are able to be consistent at that higher level. And then the last one here that's a focus for the TWIG is education. That's part of what we're doing today, right? We're talking about what are we trying to accomplish as a group?
Starting point is 00:12:22 What are we trying, how are we communicating about these things? We want people to be able to speak the same language across devices and OSs and applications. We want to make sure that people from all different areas are able to speak the same way. So at this point, we've actually had multiple face-to-face sessions around this. We get together on a quarterly basis uh for senea basically we've had multiple face-to-faces uh on a quarterly basis we get together we work through different areas of the specification through these definitions and we've got a couple of areas of focus um how we manage the devices is the first place that we've... Ah, now we've got
Starting point is 00:13:07 multiples. So we've got focus in the management area. We have focus in the area of security. And then also on how these devices operate. So we want to be able to make sure that in each of the various areas of how you use the device, that we've got details enough to understand what those entities are responsible for, but not to the level that defines how it would work against a particular protocol. So let's kind of move forward from there. So what is computational storage? You know, we keep talking about making sure that folks are able to speak the same language. It's an architecture that provides computational storage services that's coupled to storage offloading host processing and or reducing data movement. So now it's like, okay, well, what does computational storage services mean? And what does it mean to,
Starting point is 00:14:01 you know, kind of create an architecture to do that? We'll get into the definition of computational storage services, but you could think of these things like encryption or dedupe or other kind of storage-centric usages. But then also you have things that maybe require something that requires like a structured data, like video compression or something like transcoding or database acceleration. These are also computational services. In a little bit further, you might think of things that are even a little bit different than that,
Starting point is 00:14:36 like loading an FPGA bitstream or being able to load a container onto a computational storage device. So there's multiple different types of things that could be considered computational storage services, and we're looking at defining those things, kind of bucketizing them in a way that separates how those things would be used. The two kind of foundational constructs that we're talking about, I just talked to the computational storage services. And then we also have computational storage devices. And we'll talk to, in the next slide, a little bit of detail about what those look like. So here, what we have, we don't have any vendor names on this, but these are devices that are in market today. Okay, this is how things
Starting point is 00:15:21 are kind of put together. When we look over here, we've got this thing sticking out. We've got basically an FPGA that is attached to a bridge with multiple SSD devices behind it. This is all integrated into a single device. We have basically another one where we have an FPGA that's directly connected to an SSD. This is something else where we've got an FPGA in RAM, but it actually doesn't have a storage device integrated into it. In this case, you have something where PCIe is used to put the computational storage device in parallel with the SSD. And then over here, what we have is instead of having an FPGA that's doing acceleration, we have a CPU. And so it's very similar, but the capabilities of a CPU for acceleration are very different than the capabilities for FPGA. And so when you've got
Starting point is 00:16:22 all this variety of types of devices that are available in the ecosystem, how do you make it so that these protocols up here, or these different interfaces, are able to be consistent? This is the challenge that the computational storage twig is really looking to solve. So let's kind of move forward here. So with this, we're going to talk to each of these different kind of computational storage devices. Let's move forward to the next one. What we bring up here is a computational storage drive. Okay? So a computational storage drive is defined in the computational storage twig.
Starting point is 00:17:01 It's really a device where we have both a computational storage processor, some kind of FPGA or CPU or what have you that's going to do the acceleration, integrated together with the storage itself, with the flash. And this is all one device, and you're speaking out a management and an IO interface. So let's move to the next, and we'll go forward one more time. Now we're going to talk about a computational storage processor. So another device that we've kind of defined here is just that computational storage processor. Now think of this device, it's still sitting in some kind of SSD form factor. It's sitting on the data bus, so PCIe in the case here most of the time, but it could be across the fabric. It's sitting on the data bus, so PCIe in the case here most of the time,
Starting point is 00:17:47 but it could be across a fabric. It's sitting on the data bus, but it does not have the actual flash integrated into it. It doesn't have the media integrated into it. It's just a processor with some memory. We'll move on to the next device. The next thing that we defined is actually a computational storage array. Now the difference with the computational storage array is that it has some kind of processing power, so computational storage processors, along with, you know, storage devices. This could be a typical
Starting point is 00:18:17 SSD or computational storage drives. So that integrated, you know, kind integrated FPGA and flash or CPU and flash device integrated together. Think of this, it's like a JBOF kind of thing, right? Except that the additional piece here is that there is some array control software, okay? So again, what this talks out the front end is that same kind of interface that will be used for computational storage drives or computational storage processors. or an array that has the processors and storage together in individual kind of connections, or that those are CSDs along with that kind of control software, we're all talking the same interface out the front end up to whatever that host is. So we can move forward. So going back to the concept of computational storage services,
Starting point is 00:19:22 what we're looking at here is two kind of main themes. One is a fixed computational storage service. So these would be things like I was talking about previously. So things like compression, encryption, erasure coding, that's one example. These are fixed activities that that computational storage service would provide, that computational storage device would provide.
Starting point is 00:19:45 Excuse me. Other examples of these, I pointed out before, again, video compression or database acceleration. But again, it's very specific functions. It's not like I can send generic code to run on the device. What I'm doing is I'm calling some kind of function that the device has the capability of. It's advertised that, and I'm using it. And I guess to kind of add to that a little bit just for some clarification,
Starting point is 00:20:11 these features are configurable within the service? Absolutely. But the device is delivered to support that service. That's really what we mean by definition. For example, in compression, there's many, many different kinds of compression algorithms. You may be able to configure different bit levels or different algorithms on a particular device, but that device is always going to be doing compression, for example. With programmable storage services, this is the other type.
Starting point is 00:20:42 Maybe there are different usage models that we'd be looking for. Things like loading an FPGA bit stream or potentially loading a container. Those types of things where you're actually pushing very different kinds of content down to the computational storage drive. One of the things that those programmable computational storage services may do
Starting point is 00:21:04 is you push down a particular FPGA bitstream, and then it starts looking like a fixed computational storage service because that bitstream that you loaded allowed you to be able to do something like compression. However, it could also be a different type of usage where you're pushing down a container and you're doing a very specific workload and maybe we don't provide a generic way to use that workload but what we have provided is a generic way of getting down that
Starting point is 00:21:31 service, that container down to the actual drive so that you could do something custom from there or vendor specific from there. Anything that you wanted to add on that? Yeah, no, that's kind of just to be that's how we're trying to help the industry understand you, no, that's kind of just to be, that's how we're trying to help the industry understand you can get something that does a feature
Starting point is 00:21:49 or you get something that can do more than one thing at a time based on that device. We're not specifying uniquely, but they are also nestable. And if you look at the block diagrams, if you look at the slides after the fact, you'll see that when we put it, if we go back here, we have where you can have an opportunity for the computational storage services
Starting point is 00:22:06 to nest within each other. So we are trying to think about the various different pathways that these particular products can be done, but we also want to make sure it's not too complex for someone to be able to say, I wanted to do this. Great. We can deliver that product. Here's how that works.
Starting point is 00:22:21 Yeah. I think one of the things that we're trying to do is not make the interface so complicated that it won't be used, that we're providing some very well-defined services as well as some generic services that are then able to either be well-defined or vendor-specific from there. So we can move forward now. So within the scope, what are the things that we're looking at? The first piece, when we talk about kind of management,
Starting point is 00:22:50 we're really talking in terms from the computational storage twig, discovery. How does a host know that I'm there, that I am a computational storage service? How does it know what capabilities I have and how I'm able to actually work, what I'm able to do? So we're working on kind of defining mechanisms, or not mechanisms so much, but defining what those things that the host needs to be aware of so that we can publish that. Configuration. So this goes back to Scott's point earlier, where we want to allow for a device that's providing that kind of fixed computational or any kind of computational storage service to be configured. That could be
Starting point is 00:23:31 different sizes, different algorithms, different configurations. Resource allocation is another piece. And then also monitoring. We need mechanisms to be able to know what's going on, how it's being used, being able to do debug, being able to do telemetry, those sorts of things. These are other areas that we're going to be making sure get defined in a consistent basis so that when things go out and are deployed, regardless of the transport they're on, that we have all of these key boxes checked, right? These are your key pieces. And if we move forward to the next one, security. So another thing that's important, when you think about this, we're actually looking at moving processing of data off of a host processor and onto another processor. That implies that there's a different computational engine that's
Starting point is 00:24:23 actually doing something with, you know with some kind of host data. To do that, we need to be able to make sure that we're authenticating the device that's down there, that it's coming from the manufacturer that we think it's coming from. We're looking at mechanisms for authorization just to make sure that we're saying this entity is actually allowed to do that work or not.
Starting point is 00:24:46 Encryption, make sure that we're able to encrypt the data that's down there. And then also auditing. The whole idea of being able to track what's being done, we need to be able to get that information back out to the host. And then operation. So this is basically how do I use this? What is the actual interface? For example, do I have to do anything special in my interaction with the device to get it to do compression? Do I have to send something on every single IO transaction to get it to do
Starting point is 00:25:20 compression? Or do I put it in a mode that just says compress? The same thing with regard to any kind of, you know, like for example with a video compression, what do I have to do from a host side? What kind of commands do I need to send to be able to do that compression on a particular file type? How do I even get the knowledge of a file or an object or something down to a computational storage device to actually do that? Because right now, that context isn't available inside of the drive.
Starting point is 00:25:55 So we need to be able to figure out how do you use these things and which entities, whether it be the host, the computational storage processor, the computational storage drive, which pieces actually have to have knowledge of what information to make these flows work. So when we talk about discovery, we're going to kind of walk through a couple of different things here. So let's say that the host is actually trying to understand what's going on inside of this computational storage processor. As you see right now, the first thing that's going to happen,
Starting point is 00:26:30 let's see what we've got here, is that we're going to be taking the host and it's going to be querying, you know, I have some kind of device down there, and it's going to be basically asking the computational storage device to send back to it what kind you know what what do you have what are your attributes your properties that type of thing that information will then go back up to the host so that it knows hey i've got these computational storage services available i've got ids for my my computational storage device i have I've got IDs for my computational storage device. I have different IDs for my computational storage service.
Starting point is 00:27:11 I have a vendor, the different types, state and kind of reservation information. Basically, do I have access to use this particular device as the host? What is my configuration? Am I kind of being actively set up to do some sort of service? What capabilities do I have then get sent back up to the host, which then is able to know the capabilities of the device and be able to initialize those things? Then we'll move to the next one, configuration. So configuration
Starting point is 00:27:45 is really about kind of initialization. These are the things that I have to do to actually set up the computational storage device to be able to do some sort of work, right? I have to be able to initialize the drive into the mode that's going to actually do that computational storage service. So the host agent sends the CSS configuration request down to each one of the different computational storage devices. You know, as we talked about earlier, with a programmable computational storage service, maybe what this action will do is it will actually turn that computational storage service into a fixed computational storage service, so from a programmable one into a fixed one.
Starting point is 00:28:28 But for a fixed one, what this will do is it'll actually make available that service, like compression, like encryption, those types of things. So we ready the use of the host agent for, excuse me, ready the use of the computational storage service for the host. And so, again, this is basically like, what do I need to do to configure the actual service
Starting point is 00:28:55 so it's in a state where I can use it? Okay, if we move down to the next one. So now, when we put a computational storage service into use, we have kind of two modes that we've talked about. One of these modes is the direct operation. And what the direct model does is it actually makes it so that any kind of computational storage transaction has to be directed by the host. So if I want to compress this section of a drive, I have to send some sort of command that says compress. If I want to be able to do some kind of like transcoding of a video file,
Starting point is 00:29:33 I have to send that command that says, hey, go change the format from this to that. So in this case, what happens is you're sending that command directly from the host down to the computational storage device. It is working to actually do that transaction, and then it returns the value to the host. So that's kind of the direct model. If we move to the next one, the transparent model, what this is is effectively any any IO usage. So I'm doing
Starting point is 00:30:09 writes, I'm doing reads. It's been configured in a model where I'm doing that compression in the background without having to do any specific command to do that computational work. So during the configuration phase, I have set up my drive so that I'm going to do, let's say, compression and then encryption. When I do any I.O. transaction, it's going to know that I'm needing to do compression and encryption. It's going to do that on a per-command basis. So there's nothing that has to be done
Starting point is 00:30:46 specifically or explicitly to be able to get that transaction to go through. Okay? Other than the standard IO reads and writes and that kind of thing, specifically on the writes. Okay. One other thing that,
Starting point is 00:31:00 beyond just the device aspects, that's really important for the computational storage TWG is what we're doing also at the upper layers. So beyond just defining kind of the interface that goes from the CSDs and CSPs, CSAs, up to, you know, like a kernel mode driver or up to something like SPDK,
Starting point is 00:31:21 this interface we kind of talked quite a bit about in the previous few slides. However, we also are really interested in kind of how do applications actually make use of the functionality that is going to be coming from the computational storage work, right? If we define something that's only at that kind of kernel level or a user mode driver level, then applications will have a hard time using those things directly. And so another area that we're looking at is how do we develop some sort of kind of library within Linux
Starting point is 00:31:58 to be able to make use of this functionality in a very common way so that applications can write something against this library. And I apologize, this is NVMe right here. This should be more generic than that. This is kind of like the initial example that we've been doing. The intent would be, regardless of what that back-end protocol is, that we're able to provide a library that will be consistent for the types of transactions that we would want to do in a computational storage service. So again, we're focusing on how do we develop a library at this kind of upper layer to be able to provide consistency in how applications would make use of computational storage services. And with that, I think I'm handing this back to Scott. Tag team, got to love it.
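As a rough sketch of the layering idea just described, the following shows how a common library could hide the back-end protocol from applications. Every class and method name here is hypothetical; SNIA had not published such an API at the time of this talk, so this is only an illustration of the concept, not the actual interface.

```python
# Hypothetical sketch of a protocol-agnostic computational storage library.
# Applications call one common interface; protocol bindings (NVMe or
# otherwise) plug in underneath. None of these names come from a spec.
from abc import ABC, abstractmethod

class CSTransport(ABC):
    """A back-end protocol binding; an NVMe driver would implement this."""
    @abstractmethod
    def submit(self, service: str, payload: bytes) -> bytes: ...

class MockNvmeTransport(CSTransport):
    """Stand-in for an NVMe binding; records how the request was carried."""
    def submit(self, service, payload):
        return f"nvme/{service}/{len(payload)}B".encode()

class ComputationalStorageLib:
    """The common library layer applications would program against."""
    def __init__(self, transport: CSTransport):
        self._transport = transport

    def run_service(self, service, payload):
        # The application never sees which protocol carries the request.
        return self._transport.submit(service, payload)

lib = ComputationalStorageLib(MockNvmeTransport())
print(lib.run_service("compress", b"abc"))  # → b'nvme/compress/3B'
```

The point of the abstraction is that swapping `MockNvmeTransport` for a different binding would leave application code unchanged, which is the consistency goal described above.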
Starting point is 00:32:50 You good? I think so. Okay. Maybe. There we go. So what we did is we asked several of the TWG companies to prepare as generic a slide as possible about some example workloads. And we put a few of them in this particular deck to talk to it. I'm not the expert on these particular workloads. So I just wanted to highlight that we just wanted to be
Starting point is 00:33:09 able to showcase some examples. So if we go to the next slide, this is an example of a CSP that is being used for RAID or even compression offloads, several of the topics we've talked about before. And our member company is present and will surely be able to answer questions if you have specific ones on it. But this is an example where they're doing peer-to-peer processing, and this particular one is an FPGA-based solution.
Starting point is 00:33:33 So it allows them an opportunity to do some offload in the way that they're sitting alongside traditional storage devices, using a processor on the storage protocol bus to provide these certain features, utilizing several of the techniques, in this case an NVMe protocol. The next one is one of several different examples of Hadoop implementations that customers and/or suppliers are putting out there.
Starting point is 00:33:57 This is the use of a programmable computational storage service to push Hadoop workloads down to the device that's based on a CPU with a containerized application. And you can see that basically the concept here is we have these primary pieces of Hadoop and a portion of that Hadoop workload is being migrated into the devices and configured and run in parallel within those devices to offer some sort of acceleration or prefetching of data, presorting of data. And then the last one is an example where we start talking about
Starting point is 00:34:31 getting things over network and putting some form of AI into a platform that's attached to either an array or individual devices where you're being able to do AI inferencing at the device level. And in this case, they may or may not be attached by a network versus being attached directly by a CPU or a data bus within the device or even potentially sitting off on the side in the way of a
Starting point is 00:34:53 processor within the array. So there's various different things that a lot of the companies have thought through. Several of the products have been delivered to market. This one is basically saying that we have this whole bunch of work our CPUs need to do, and the premise of all of these is that the CPU is getting overwhelmed with the amount of work it has to do,
Starting point is 00:35:11 and the gravity of moving data from these devices to the host is creating the need for this, so we're pushing aspects of that compute back out into these devices. So that's a quick down-and-dirty run-through. I'm sure you've been inundated with enough slides. We tried to keep the slide count down, but it's really fun when we can still breeze through things. It is a new market.
Starting point is 00:35:30 It is a real market. The companies that are producing products today have customers that have deployed these products. It is certainly a growing market and an emerging market, so we're very excited about the opportunities that it puts in front of us. But it also creates those unique challenges because now that they're already in the market,
Starting point is 00:35:46 how do we continue to grow that ecosystem with some potential change required for these solutions as well? And so we continue to want to figure out how to make that the most uniform capability as possible. And so that's why we started working on what we did with the definition so people can understand what we mean by computational storage because there's intelligent storage, there's smart storage. There's all kinds of words thrown out there that are combined. Some of them are rack-level solutions, some of them are array-level solutions,
Starting point is 00:36:14 and then we now throw in a device-level solution. So hopefully this is giving you a little bit of introduction to what we've been talking about from our perspective. And then we want to keep working with the different industry associations, different customers, different developers, different companies, to understand really what we can do with this moving forward. So the call to action, if you feel like you would like to participate,
Starting point is 00:36:34 we would welcome you to join our efforts. There's ways to join the TWG so that you can participate and give some value add. We've gotten lots of very unique viewpoints that we've worked through in the process with the members that we have today as it is. And we want to make sure that what we are doing is of significant value to the market,
Starting point is 00:36:54 not just a bunch of vendors putting together ideas on what's next. This particular session was meant to be an information session. Like I said, we do have, after dinner, a birds of a feather across the hall in Stevens Creek where we hope to have this be a little more interactive. I'll be honest, I'm in charge of that session, and I've got four slides, and two of them came from this deck. So I'd appreciate the attendance to help make that very valuable.
Starting point is 00:37:15 The goal is a lot of people are not currently in the TWG. A lot of people have questions about what we mean by computational storage. And all of the vendor companies that are here will be there to answer questions as kind of a round robin, whether we're sitting in the room or standing behind the podium. It's not meant to be a slide presentation. It is really meant to be interactive, an hour of conversation after dinner.
Starting point is 00:37:35 And then, of course, tomorrow, there's many companies that are presenting throughout the course of the day from 8 to 5 on various different ways this particular technology can be used. Again, thank you for your time and joining us. If you do have immediate questions, I think we have a couple minutes at least
Starting point is 00:37:51 to answer some right now. Back corner over there. First question. Using some of your examples, I have 30 seconds of uncompressed video. I want to send it to a device, tell it to transcode to three different formats,
Starting point is 00:38:09 save it, tell me how big they are, and let me use them later. Is that in the scope of what you think is done? So the question, so just for the purposes of also the video recording, the question was if I have 30 seconds of uncompressed data, and I want to manage it in some way,
Starting point is 00:38:28 uncompressed video, and I want to manage it into a compressed form, is that something that's... A couple of different bit frames. So, long story short, the concepts of what you're talking about is exactly what we're trying to define. That specific example
Starting point is 00:38:43 could be classified as a fixed computational storage service on a device that provides the capability to do that transcoding. And there's actually a company that's very interested in specifically the type of transaction that you're talking about, that's engaged in the TWG and wants to be able to develop a solution to do that. Well, that type of, I can see why people would want to do that, but it also to me says the devices have a lot of the features of not a block device. It has to have named storage; it's not, you know, it's not blocks anymore.
Starting point is 00:39:23 Right, so there's a lot of opportunity. The API talking to the service looked more like an RPC call and less like a block IO call. And from that perspective, I think it comes down to what that particular solution that the vendor's providing offers in the way of, does it have to be block?
Starting point is 00:39:43 Is it meant to be file? Can it be KV? Can it be? Those are aspects of what the vendor solutions are doing, and they'll identify to you to let you know how that's being managed for that particular solution. So your interface back allows for KV, allows for the input in scope to support files, objects, LBA,
Starting point is 00:40:01 memory, thought mode. That's all within the scope. Now, what gets done for the first revision of this may or may not be able to do everything that you've asked, but definitely being able to have different back ends, whether that be block or KV or what have you, is within the scope of the TWG. You were going to say something, Bill?
Starting point is 00:40:22 The other thing to that question is that there are two modes that we are defining. One of them is a transparent mode where you talk to the device based on its particular storage protocol. And the other mode is a mode where you are actually talking to the service. And that talking to the service could be what you're talking about in terms of, quote, RPCs or something of that nature. So as mentioned as we walked through it, the direct versus the transparent interaction with the device can help justify or articulate how you would use that particular solution for that need.
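As a rough sketch of the two modes described above, the following contrasts a host-directed ("direct") invocation with a transparently configured pipeline. The device, its commands, and the compression service are all mocked up for illustration; nothing here comes from the actual SNIA specification.

```python
# Mock illustration of the two usage models discussed: "direct" (the host
# issues an explicit command per computation) vs. "transparent" (services
# are configured once, then run implicitly on ordinary reads and writes).
# All names are illustrative, not from any published spec.
import zlib

class MockCSD:
    def __init__(self):
        self.blocks = {}
        self.pipeline = []  # services configured for transparent operation

    # --- direct model: explicit per-command invocation ---
    def invoke_css(self, service, data):
        if service == "compress":
            return zlib.compress(data)
        if service == "decompress":
            return zlib.decompress(data)
        raise ValueError(f"unknown service: {service}")

    # --- transparent model: configure once, then do plain I/O ---
    def configure(self, *services):
        self.pipeline = list(services)

    def write(self, lba, data):
        for svc in self.pipeline:  # runs without an explicit compute command
            data = self.invoke_css(svc, data)
        self.blocks[lba] = data

    def read(self, lba):
        data = self.blocks[lba]
        for svc in reversed(self.pipeline):  # undo the pipeline on reads
            data = self.invoke_css("decompress" if svc == "compress" else svc, data)
        return data

csd = MockCSD()
payload = b"log entry " * 100

# Direct: the host explicitly says "compress this".
compressed = csd.invoke_css("compress", payload)

# Transparent: compression was configured up front; write() just writes.
csd.configure("compress")
csd.write(0, payload)
print(len(payload) > len(compressed), csd.read(0) == payload)  # → True True
```

The design difference is where the intent lives: in the direct model it travels with every command, while in the transparent model it lives in the configuration, as described in the talk.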
Starting point is 00:41:02 I was just seeing a problem where, really, use cases do not have a fixed number of bytes coming in and a fixed number of bytes out. Oh, absolutely. Yeah, yeah, yeah, yeah. The form is much more complex. There is, and there are definitely lots of different aspects of how these products interact with the ecosystems that are in place today around how you've written the data to those particular devices.
Starting point is 00:41:24 Because if you're, for example, doing some massive sharding of data, how can we, the computational storage, take care of that? So there are limitations. Do you expect these protocols to be used not with two little disk drive looking things, but with computational traditional servers
Starting point is 00:41:38 acting like two little disk drive looking things? There's opportunity for that. Let's put it that way. Not at this point in time. We're trying to keep the scope, as Nick has mentioned, very limited for the initial release, but the feedback like you're providing is definitely something valuable to us to help add into what is needed as part of the next steps, or what we need to look at before we can release the initial version. If a spec with a transport layer that's fast and tight and low latency, I'll use it with decade-six servers. Right.
Starting point is 00:42:09 Because I have reasons where I need that one processing power as the computation. Yeah, and how much computation is needed at that kind of CSA or CSD or CSP should vary. We expect that to vary. Whether it's really small, low-power devices that all fit into an SSD device or device form factor, or if it's an array, JBOF, something bigger, the intent is to design that in a way that would scale. The intent is such that you could scale it all the way to an Ethernet-connected server that you can then invoke
Starting point is 00:42:49 highly computationally intensive processing on the data stored locally to the server. The specific thing we're restricting it to is it's not general purpose. It's not RPC and distributed computing framework. It's about invoking computational storage services
Starting point is 00:43:05 that act on stored data. That's kind of narrowing the scope. Right. Yeah. If you provide for containers to get this thing out there, don't you use a brand that you've watched? Certain implementations could provide that, yeah, from that perspective. But there are
Starting point is 00:43:21 opportunities to overpower, to Nick's point, what's being provided. I think that's like the ninth time in four days I've heard Bitcoin. Yes, sir. Another question. Sure. First one is
Starting point is 00:43:40 security. Part of the concern, we're looking at that. And also, do you think the standard needs to provide that? That's one. The second question is, is there any limit, for the first use cases, on the thermal or power envelope of the device? Yeah, I can talk to this.
Starting point is 00:44:09 So let's answer the second question first, and then we'll go back to the first question. So the question was related to security and authentication, which there was some aspect of that that we talked to already, and then the second was related to the power envelope that's impacted by this. So I think the power and thermals is really kind of dependent on the form factor that the device goes into. And so how do we limit that?
Starting point is 00:44:34 It's kind of out of scope, and it's more of a form factor thing. However, we will have mechanisms, or we will have to deal with mechanisms about what are my thermal limitations and how do I communicate those things up to the management system, right? And, you know, do I have separate power modes and that kind of thing? Those types of things will have to be built in. However, the actual overall limitations will be more of a function of form factor. Is that fair? Yeah, so, again, the representations of the products that we've shown
Starting point is 00:45:07 don't encompass that specifically. So if it's an M.2, there's known limitations. If it's a U.2, there's known limitations. Whoever's providing that solution will have to stay within those limitations or provide you an opportunity to understand how to expose or go beyond those limitations. So that's not something at this point
Starting point is 00:45:24 that we're putting effort on because, again, that delivery engine is a hardware problem where we're looking more at the engagement of the host at a software perspective. But there clearly will have to be reporting mechanisms and that type of functionality. For example, if we leverage a protocol like NVMe, we would make use of the infrastructure that's already there to be able to do that. If we go and develop something that's separate from that, we would end up having to actually provide that functionality inside of that kind of protocol. Does that make sense?
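As an illustration of reusing existing reporting infrastructure rather than inventing new mechanisms, the sketch below mocks up a device health log of the kind NVMe already provides (composite temperature in its SMART/health information log) and a host-side policy that backs off compute offload when the device runs hot. The field name and the 70 °C threshold are invented for the example; only the general idea of consulting an existing health log reflects the discussion.

```python
# Hypothetical sketch: a host consults an existing SMART/health-style log
# (as NVMe already exposes) before dispatching computational work, instead
# of a new thermal-reporting mechanism being defined from scratch.
class MockDevice:
    def __init__(self, temperature_c):
        self.temperature_c = temperature_c

    def get_health_log(self):
        # Stands in for reading a health information log page.
        return {"composite_temperature_c": self.temperature_c}

def should_offload(device, limit_c=70):
    """Host-side policy: only push compute down if the device is cool enough."""
    return device.get_health_log()["composite_temperature_c"] < limit_c

print(should_offload(MockDevice(45)))  # → True
print(should_offload(MockDevice(82)))  # → False
```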
Starting point is 00:45:59 Just a second. Is it on the same topic or a different? An extension. Okay, sure. Okay, sure. Okay, back to the first question. With regard to the security piece, can you clarify just a little bit more about what you mean?
Starting point is 00:46:12 When you're talking about security, are you talking about authentication of the device that's going to be doing the actual computation, or are you talking about, like... I mean, in my mind, I'd imagine, like, say, on S3, so that's kind of... Mm-hmm. If there's a token that won't be able to access a buffer, then there's another, I think, token that you're able to do that
Starting point is 00:46:37 and it has to be a little more of a split. Yep. So I was just asking if that, the candidate part, is pointing to this. We're definitely considering what security pieces we have to have in place to be able to make this a usable function, usable protocol. I don't think that we're at the level specifically of dealing with a specific problem like S3 Select and kind of the mechanisms there. However, we will have to deal with that for it to be viable.
Starting point is 00:47:08 There's another question in the front. The third bullet of monitoring that we were just talking about, it seems like it might be a little early on because those four different block diagrams are so very high level. But the more fixed that schema or format you get, the more it implies fixed syntax that leads to better semantics and understanding of what's going on.
Starting point is 00:47:36 So how long do you think it'll be before the vendors have kind of settled on different architectures and get a more fixed reporting mechanism? So I think that takes us to what the goals are for our specification releases over the course of the next period of time. We're looking at having a first revision of the specification in the first half of next year. And that, again, is at the architectural level.
Starting point is 00:48:08 So it's kind of at the level that we've discussed today, not necessarily specific protocols. We're working with third-party organizations to figure out that API level interaction. And those dates, we have targets, but because we're working with those other groups and we haven't formalized those relationships yet we need to just be a little patient
Starting point is 00:48:29 on when exactly that will happen. It's on the target list of things to accomplish. The delivery of that particular example with the monitoring aspect of it is certainly well somewhere within the timeline. Not necessarily at the beginning of the timeline. To give a concrete example,
Starting point is 00:48:45 computational storage services invoked over NVMe, instead of our technical working group going in and trying to define NVMe-specific things, we liaise with the NVMe technical special interest group, et cetera, to determine how that fits into NVMe management, name spaces, et cetera. So the collaboration there will, of course, have its own timelines. Okay.
Starting point is 00:49:07 Are you looking at the interface for configuration, call, whatever, for a very big data model, a list of parameters, et cetera? Or are you looking at something more extensible, like the node data, as the framework to build your parameters inside of? Do you want me to take that one? Go for it. So for those that aren't aware, David is one of the technical working group members.
Starting point is 00:49:33 That's a great question. So it's a very early stage. We have a few proposals tossed out, but definitely we're looking for something that is self-describing and sensible in that self-description, so that when a host discovers a computational storage service, it can discover
Starting point is 00:49:51 how it can be configured, what it's capable of doing, what those knobs can be adjusted to, and with the configuration itself being able to express, okay, I want for example this compression algorithm tuned to this parameter all the way on up to some of the more elaborate functions
Starting point is 00:50:11 that various vendors have put forward. Currently, we're looking at CBOR, which is a binary variant of JSON, but once again, that's still very early stage. I think... You are looking for something that has the nested extensibility of JSON, but perhaps without the integer converts.
Starting point is 00:50:32 Correct. Okay. All right. So with that, this session is officially done. So thank you very much for your time. Thanks for listening. If you have questions about the material presented in this podcast,
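To make the self-describing configuration idea from this exchange concrete, here is a sketch of a capability descriptor and host-side validation. The talk names CBOR (a binary variant of JSON) as a candidate encoding; plain JSON is used below for readability, and every field name is invented for illustration rather than taken from any published spec.

```python
# Sketch of a self-describing service descriptor: the host discovers what a
# service can do and which knobs it exposes, then validates a requested
# configuration against that. JSON stands in for the CBOR encoding under
# discussion.
import json

descriptor = json.loads("""
{
  "service": "compression",
  "type": "fixed",
  "parameters": {
    "algorithm": {"allowed": ["deflate", "lz4"]},
    "level":     {"min": 1, "max": 9}
  }
}
""")

def validate_config(desc, config):
    """Check a host-supplied configuration against the advertised knobs."""
    for key, value in config.items():
        spec = desc["parameters"].get(key)
        if spec is None:
            return False                      # unknown knob
        if "allowed" in spec and value not in spec["allowed"]:
            return False                      # enumerated value not offered
        if "min" in spec and not spec["min"] <= value <= spec["max"]:
            return False                      # numeric value out of range
    return True

print(validate_config(descriptor, {"algorithm": "deflate", "level": 6}))  # → True
print(validate_config(descriptor, {"algorithm": "zstd"}))                 # → False
```

Because the descriptor carries its own constraints, a host can configure a service it has never seen before, which is the "self-describing and sensible in that self-description" property discussed above.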
Starting point is 00:50:46 be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
