Storage Developer Conference - #181: The latest efforts in the SNIA Computational Storage Technical Work Group
Episode Date: February 17, 2023...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org
slash podcasts. You are listening to SDC Podcast, episode number 181.
My name is Scott Shadley. I'm a board of directors member at SNIA.
I've been on the board just over a couple of years now.
And then I co-chair the computational storage technical working group with my counterpart here.
And in my day job, when I get around to it, I'm a strategic planner at Solidigm.
Hello, everyone.
I'm Jason Molgaard, and I'm obviously the co-chair along with Scott for computational storage.
I also have been on the SNIA Technical Council for almost a year now, and that's been an exciting and different role for sure. And my day job is working with storage architecture
strategy at AMD. So very happy to be with you here today. All right. So what we're going to talk
about today is to set up the track. We've got a whole bunch of sessions left between today and tomorrow on
computational storage. We're very excited about all our co-presenters and the members of the
TWG. We're going to tell you what the TWG's been up to and where we've gotten to, what we're doing,
what the architecture looks like, and then we're going to use the rest of the sessions
to talk about use cases, implementations, and other aspects of the work that we're doing.
So for today, we're just going to do a quick overview of what's going on with the Computational
Storage Technical Working Group.
As far as the working group itself, we're at 52 member companies.
You can see the logo chart there.
I have fun making NASCAR slides.
I am a former engineer turned marketing, so I got to have fun with slides.
We have over 265 individuals that are
tracking what's going on within the work group. We average between 30 to 40 people weekly on our
calls that we have discussing what we want to do with computational storage as a market. One of the
key things here is it's not just all of us vendors. There are some customers, there are some enablers,
there are partners that are involved in that logo chart. So it is something where we're getting a
very large ecosystem of people put together
because we know that it can't just be a hardware solution.
It has to be a software piece.
It has to fit into where you're going to plug in these devices.
Within SNIA, we have a lot of cross-collaboration work that the technical work group has been doing.
We have a special interest group, which is our marketing arm.
So the technical work groups within SNIA do the work.
The special interest groups or the initiatives do the fun talking about the work. And so we split
time with that. We also have a partnership that we're working with the SDXI group. So if you saw
the presentation yesterday on SDXI or presentations on SDXI, we're starting to look at how we can
talk about the cross-collaboration of that architecture with what we're doing as we're
moving compute all over in the ecosystem. And we're kind of finally getting away from our friend,
Mr. Von Neumann, and moving into things like Amdahl's law about compute. We want to make
sure that we're collaborating within the infrastructure. Another big one, as I just
mentioned, since we're moving compute around, compute does have its challenges when it comes
to security and the breaches, cyber threats,
all that kind of good stuff. So we have a partnership with our Security TWG partners
as well. We talked about that last night in the BOF, and the leader of that group was very
kind to let us pick on him in that presentation as well. And there are quite a few security
related presentations here as well. So be sure to keep an eye out for that. And then the option of this
XPU, so you can call it a DPU, an IPU, our friends at Gartner call it a FAC, a function accelerator
card. Those products work cohesively in the ecosystem with computational storage solutions,
and so we're continuing that cross-collaboration as well. As far as what we're doing outside of
SNIA, we've built an architectural model,
and we'll get into what that looks like. One of the key aspects of that is we don't have a
transport or command set specifically assigned to that architecture, so we're working with NVMe.
You'll hear right after this presentation, Kim from Intel is one of the co-chairs of that
work effort within NVMe to actually define the command set that utilizes the model. So that cross-collaboration is going on as well. And we've been participating in
pushing things into OCP storage, as well as the Soda Foundation and other industry compatriots
within the architectures today. As far as looking more at the marketing side of things, so since we
do participate in both of those, computational storage has been around for quite some time. The original papers were written in
2011-12. We're literally four years and a month, or four years and three weeks from when we actually
started this effort within SNIA. And so we're very excited about what's been going on. You can see
the hype cycle on the far right of your screen, I've highlighted in green, the computational storage place.
So the hype cycle is used by a lot of our end customers to determine what technologies they want to investigate and look at.
And it's really nice when you start to see things progress past the peak, as they call it, the peak of inflated expectations.
This is 2021.
The 22 version of this just came out.
I didn't get it in the deck.
Computational storage has hopped the line into sitting at the top of the peak. Just to call out some other things in the marketplace,
directly across from it, coming off of the peak is where NVMe-oF is sitting. So for example,
you know where that technology aligns with the type of work that we're doing. And as I mentioned,
the XPU slash FAC cards, from a Gartner point of view, are lagging significantly, if you will,
from a perspective of where they fit into the space. So we are on track where we think we would like to
be. Of course, we'd like to be deploying products in the market much sooner, but part of getting
there is what we're going to talk about, which is our architectural model. A couple of examples on
the left of articles that have been written just this year in 2022 related to computational storage
and how it fits into the ecosystem.
So Gartner also does a strategic roadmap. Yes, I get paid by Gartner every time I mention Gartner,
so don't mind me mentioning Gartner. But they have a strategic roadmap that talks about
computational storage, composable architectures, DPUs, how they all fit into what you as an end
consumer should be looking at for your architectural needs. And they call out computational storage now as one of those architectures.
And then you heard about the accelerated box of Flash
from Mr. Grider earlier today from LANL.
That was one of the products that he talked to
of the ecosystem of solutions that they're working on.
So we have a lot of stuff moving forward
in the market around this
as we're still just getting to the point now
where we have released the 1.0
version of the computational storage architectural and programming model. It says version 0.9 on the
screen, but 1.0 went live a couple weeks ago on snia.org, so feel free to check that out if you
haven't already. And now that we've got an architecture in place, you'll get a little bit
of an introduction to that from Jason here in just a second. We've also been working on an API platform. Our friend Oscar Pinto will be presenting on that
later today as well. We'll give you a quick intro to it as part of this session. And one of the
biggest things that we took our time with, if you will, from the previous version is we wanted to
make sure that the considerations for security were included in the document. One thing that
we're not going to do is define new ways to think of security.
There are plenty of other three-letter, four-letter, and five-letter acronyms for security.
We just want to say pay attention to these considerations for your secure needs within a computational storage product.
We do have some limitations associated with that, just to be able to get us to a point where we can at least put this out as a live document.
So as we get to 1.2, or whatever versions come next,
part of the plan is to expand the security considerations within that document.
So that's kind of a quick overview of what's going on in the space.
What I was going to do at this point is now hand it over to Jason, my counterpart,
the engineer in the group, so that we can actually hear some really cool geeky stuff,
because I've just been too much marketing.
All yours.
All right, thanks, Scott.
So now that we have that 1.0 spec out and available for download off of the SNIA.org website,
I thought we could kind of delve into a little bit of detail about what's actually in that spec.
Certainly you're welcome to go download a copy, and I encourage you to do so.
But we can kind of talk through some of the key points, not necessarily the most important,
but certainly some key points.
So we've defined three different architectures in the architecture and programming model.
Over on the left, we've got the computational storage processor.
In the middle is the computational storage drive.
And on the right of your screen is the computational storage array.
So kind of getting in a little bit more detail on each of these, the computational
storage processor is a computational storage device that doesn't have any storage built in.
It has no storage of its own, but it is connected up on the storage fabric and so can interface
with storage and perform operations on storage. You know, Scott mentioned the XPU efforts that are going on within SNIA.
One way you could think about an XPU is in the flavor of a computational storage processor
where, you know, a DPU doesn't have any built-in storage, but it certainly interfaces to it.
So that is a potential use case for a computational storage processor,
and not the only one by any means.
In the middle, the computational storage drive,
this is what everybody kind of thinks of.
This is the poster child for computational storage, which is great.
I'm happy to have that be the poster child.
This is your traditional SSD that you're all familiar with that has
additional compute capabilities built into it and can perform computations on the drive itself.
So that's kind of what everybody thinks of when they're thinking computational storage,
and it's not incorrect, but it's certainly not the only option. And then over on the right,
this is a computational
storage array. So it is an array that you would think of, a storage array that has computational
capabilities or computational storage resources. But in addition to that, you could have the array
be populated with either traditional SSDs or computational storage drives.
So if you can imagine, now you have multiple levels of offload
where you've got compute in the array controller for performing operations on the data,
and you have compute in each of the individual drives that the array controller itself manages.
And you could potentially offload work to the array controller and or into the drive itself
and potentially even move data workloads back and forth between them.
That's maybe a little bit further down the road in terms of deployment,
but the architecture definition certainly is present in the architecture and programming model from SNIA.
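To keep the three variants straight, here is a minimal C sketch of that taxonomy. The struct names and fields are illustrative assumptions only; they restate the definitions above and are not types from the SNIA specification.

```c
/* Illustrative only: the three CSx device types from the model as hypothetical
 * C structs. None of these names come from the SNIA specification. */

struct cs_resources;                  /* the shared "computational storage resources" block      */

struct cs_processor {                 /* CSP: compute services, no persistent storage of its own */
    struct cs_resources *csr;
};

struct cs_drive {                     /* CSD: a storage device (e.g. an SSD) plus compute        */
    struct cs_resources *csr;
    void *persistent_storage;
};

struct cs_array {                     /* CSA: compute in the array controller; members may be    */
    struct cs_resources *csr;         /* plain SSDs or CSDs, enabling multiple levels of offload */
    struct cs_drive **members;
    int num_members;
};
```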
One thing that's true of all these pictures you'll probably notice is they all contain a teal-colored block
labeled computational storage resources.
And that was deliberate that we kind of replicated that same block across these different architectures.
So let's kind of dive into that in a little bit more detail.
This happens to be the computational storage drive, because that's our poster child. But we've grayed out all the traditional storage device blocks,
and just to focus on the computational storage resources themselves.
So let's kind of go through all the blocks contained within there, starting with the computational
storage resources. These are the resources that are available and necessary to store and execute a computational storage function.
So, okay, great.
So what's a computational storage function?
So a computational storage function, this is the function or the workload that you want to execute on the drive itself or on the device.
And when I say drive, I also mean that it works on a processor or an array.
But so anyway, the computational storage function is whatever you want to do,
whether that's compression or encryption or filtering, searching, you name it.
That's our operation that we want to perform on the drive.
And that has to execute on a computational storage engine, or CSE. So a computational storage engine, this is a resource that can be programmed to provide one or more operations.
So you could think of a computational storage engine as a general-purpose CPU.
That would be one example of a CSE.
Another example of a CSE is an FPGA.
So something that you can program to perform an operation.
Now you need a computational storage engine environment, or CSEE.
This is an operating environment for that
computational storage engine. So if you're thinking general purpose CPU as your computational storage
engine, the computational storage engine environment could be the operating system to boot that CPU
and get to the point where you could actually execute a program. Or if you're thinking FPGA,
then the computational storage engine environment would be the bit file that you would download to the FPGA for it to take on some functionality.
Otherwise, an empty FPGA obviously is capable but doesn't do anything without that bit file.
So that's the computational storage engine environment. So when you're executing a function on the drive, you need to have some
temporary memory for scratch space or storing the data that has come in from the persistent media
or maybe storing results. So we need function data memory, FDM. That's what we've designed
into the architecture. And that's the memory that can be used for those computational storage functions
to store those temporary variables or store the data
and generally do whatever manipulations need to be done.
And your architecture may have a large block of FDM,
but you don't want to allocate all of it to any specific function.
You want to be able to have multiple functions, let's say, that you can execute on your drive. So you have to be able to
partition out that memory and make it available for that, for a specific function. And so when
you allocate it to that function, now you've got allocated function data memory, AFDM.
The only other block that I haven't touched on is the resource repository.
And this is the resources that are available but haven't been activated in a computational storage device.
So maybe you have multiple functions or maybe you have multiple computational storage engine environments, but you can't run all of them at the same time.
Maybe, you know, the hardware that you've designed has a limitation.
You can only have X number of functions activated at that moment or X number of environments.
No problem.
The others are there.
They're available.
Someone could come along and potentially deactivate one and activate a different one and make it available for use.
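As a cheat sheet for all of these acronyms, here is a minimal C sketch of the resource vocabulary just described. Every type here is a hypothetical mnemonic for the talk's definitions (CSE, CSEE, CSF, FDM, AFDM, resource repository), not a structure defined by the specification.

```c
/* Illustrative only: a mnemonic sketch of the CSR vocabulary, not spec-defined types. */
#include <stddef.h>

typedef struct cs_function cs_function_t;  /* CSF: the workload, e.g. compression, encryption,
                                              filtering, searching                              */

typedef struct {                           /* CSEE: the operating environment a CSE needs,
                                              e.g. an OS image for a CPU or a bit file for
                                              an FPGA                                           */
    const void *image;
} cs_engine_env_t;

typedef struct {                           /* CSE: the programmable resource that runs CSFs,
                                              e.g. a general-purpose CPU or an FPGA             */
    cs_engine_env_t *active_env;
} cs_engine_t;

typedef struct {                           /* AFDM: the slice of FDM allocated to one CSF       */
    void  *base;
    size_t length;
} cs_afdm_t;

typedef struct {                           /* the whole CSR block                               */
    cs_engine_t    *engines;               /* one or more CSEs                                  */
    size_t          num_engines;
    void           *fdm;                   /* FDM: shared scratch memory for executing functions */
    size_t          fdm_bytes;
    cs_function_t **repository;            /* resource repository: CSFs/CSEEs that are present  */
    size_t          repo_count;            /* but not currently activated                       */
} cs_resources_t;
```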
All right, so kind of with that background on the architecture of what we've envisioned for computational storage, we now move into the usage a little bit.
So the first model that we've described in the architecture and programming model is the direct usage model. And in this case, it's very host-driven. And when you hear the next presentation
from Kim Malone, you'll say, oh, what's being done in NVMe aligns very closely with what you see here in
terms of the direct usage. So in the direct usage, the host kind of has to orchestrate a lot of
the operations, which is just fine. In this example, the host sends a command to invoke the CSF.
The computational storage engine performs that requested operation on data that's already been placed into AFDM by some other mechanism, some other commands put it in there.
And then the computational storage engine returns any response back to the host.
So the host is very much involved in that, in sequencing those events.
And that's great.
That's a very valid way to
implement computational storage. But obviously not the only way. And therefore, we have,
and as you might have guessed, if there's a direct, there has to be an indirect.
And sure enough, that was a good guess. There is. And so with the indirect, this is more of
an autonomous view of how computational storage could operate.
Certainly not the only way, but I think another valid way from an indirect or autonomous perspective.
So in this example, we're going to associate a read operation with a specific function that's stored on your computational storage drive.
And essentially what we want to be able to do is when the host makes read requests,
we want the drive to recognize, hey, this read to this range is one
we want to perform operations on.
We want to do some kind of compute on the data that's coming in
and not just transparently send the data back to the host. And so that's exactly what we're going to do. So we've, you
know, first configured the drive to behave as described. And then the host begins issuing
read commands. So this host sends a read command. The storage controller that's built into the drive intercepts that read command,
and it starts to initiate all the actions for performing the compute operations.
So it first moves data from the storage into AFDM and puts it in the AFDM so that it can be
operated upon. And then it instructs the CSE, hey, go execute that CSF that we've associated with this read data.
The CSF is then executed by the CSE on the data that's already in AFDM, the results are stored back in AFDM, the storage controller is notified, and the storage controller will then return any results from AFDM back to the host, if there are any.
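For readers following along, here is a rough device-side sketch of that intercepted-read sequence. All of the function and structure names are hypothetical and exist only to restate the steps above; real behavior is vendor firmware plus whatever command set (NVMe, for example) ends up standardizing the configuration.

```c
/* Hypothetical controller-side pseudocode for the indirect usage model described above.
 * Every identifier is illustrative; nothing here comes from the specification or from
 * any real firmware. */
#include <stdint.h>
#include <stddef.h>

int handle_read(struct csd *dev, uint64_t lba, size_t len, void *host_buf)
{
    /* Was a CSF associated with this LBA range when the host configured the drive? */
    struct csf *f = lookup_associated_csf(dev, lba, len);
    if (!f)
        return plain_read(dev, lba, len, host_buf);   /* no association: behave like a normal SSD    */

    struct afdm *a = afdm_alloc(dev, len);            /* carve out AFDM from function data memory    */
    media_read(dev, lba, len, a->base);               /* storage -> AFDM                             */
    cse_execute(dev->cse, f, a);                      /* run the CSF on the CSE; results stay in AFDM */
    return copy_results_to_host(dev, a, host_buf);    /* return results to the host, if any          */
}
```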
Why does it have to be a read?
It doesn't.
It's just an example.
It can be anything.
I thought it was some architectural requirement.
Not at all.
It doesn't make any sense.
Correct.
It does not have to be a read.
It can be a write.
It can be whatever you want.
This is just one example of a read where we're taking data from storage
and manipulating it and then sending results.
Do the results have to be sent back to the host,
or can there be additional functions applied to those results
to save on data movement?
So, no, the results do not have to be returned to the host.
Certainly, you could, I think that there's several of us that have a vision of chaining commands,
which I think is kind of where you're going here, where you first want to do a decryption operation.
That's step one.
And then we want to, while that data that's now decrypted is stored in AFDM,
we want to come in and we want to do some different computation on it
and then maybe a third computation.
And we do a sequence of events or a chain of events.
And at the very end, maybe only then do we either return success or the result itself,
whatever the case may be.
Is that what you're...
Yeah, yeah.
Okay.
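As an aside for readers, the chaining idea described above could look roughly like the following sketch, where successive functions operate on the same AFDM so intermediate data never leaves the device. The helper names are hypothetical.

```c
/* Illustrative chaining sketch: successive CSFs operate on the same AFDM, so the
 * intermediate data never crosses back to the host. All names are hypothetical. */
struct afdm *a = afdm_alloc(dev, len);
storage_to_afdm(dev, lba, len, a);      /* stage the encrypted data in AFDM                    */
run_csf(dev, csf_decrypt, a);           /* step 1: decrypt in place                            */
run_csf(dev, csf_filter, a);            /* step 2: filter or search the plaintext              */
run_csf(dev, csf_compress, a);          /* step 3: compress whatever survived                  */
copy_afdm_to_host(dev, a, host_buf);    /* only the final result, or just a status, goes back  */
```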
Actually, I was wondering about the array.
If you have limited space, you have different functions,
different CSFs that migrate directly between CSDs and...
So the architecture doesn't preclude any of that.
That's all very possible.
I think that's a great vision of exactly where we want computational storage to go,
where you can migrate workloads from executing on the drive to executing at the array level.
And maybe in some cases, part of the workload actually has to be done by the
host itself, or something like that, or some higher entity. And that's all perfectly valid.
And the architecture does not prevent you from doing any of that. Some software development may
need to be done in order to reach that point where you can accomplish that level of automation and
migration. But, you know, we'll leave that to the software engineers.
Yeah, another way to think about it is you can see on the graphic we've done a stacked image
to showcase that you can have more than one CSE, more than one CSF.
And the reason we created an FDM is that there's a dedicated amount of space
because nobody's going to have infinite RAM.
So we don't want to run into,
oh, I can't execute something because I don't have enough space.
Therefore, here's the block of RAM,
here's the allocated for that function,
then we have another block for another function,
those types of opportunities.
And in the array type architecture,
not only is it more CSDs,
but long-term you could have standard drives
with one or two CSDs
that can actually do peer-to-peer type work,
things like that.
Those are the next steps of what we're looking at in this architecture.
Yeah.
So from a model perspective, that block of FDM is within the device memory.
We're just highlighting it as part of this computational storage resource block.
The FDM does, quote, exist within the device memory space on an SSD,
whether it's a dedicated DRAM device or a shared portion of the total DRAM
inside of a computational storage drive in this case.
It's just for block purposes.
This isn't meant to be 100% literal
that we have something specifically carved out separate.
Just to illustrate the...
Yeah.
Precisely.
That's exactly right, yep.
So yeah, it could be physically one RAM
or it could be multiple RAMs,
however you want to implement your design.
All right, so transitioning to security.
So Scott had mentioned security, and we spent a significant amount of time on security,
quite a number of months, actually.
And it's a very important topic.
I think we can all agree, you know, it's a hot-button topic for sure, ensuring the security of your data. And especially with computational storage,
we're opening up new attack surfaces because now we've put some kind of compute capability in our drive. We're saying we can go run functions or programs down in the drive.
Some basic tenets of security in storage have maybe been invalidated. If you joined our BOF last night, our security expert who was present commented that previously there was an assumption made that the drive can never attack the host.
And now that we have compute capability in the drive, that may no longer be
a valid statement. The drive could be used maliciously to attack the host, and we don't
want to allow that. So we spent quite a bit of time. And as Scott also mentioned, we don't want
to invent new security technologies or new security practices. There are well-understood,
well-utilized security methodologies that are already out there for storage-related applications, and we certainly wanted to leverage those as much as possible.
So when you look at this list of the types of security that we added into the spec, you're probably saying to yourself, well, wait a minute, a lot of this stuff was already done, right?
You already had privileges of accesses.
You already had sanitization and data at rest encryption.
We're already familiar with a lot of that on existing storage devices,
on existing SSDs and HDDs.
And you're absolutely right.
And we didn't go in and provide any new insight or guidance on that.
What we did do, though, is look at these same areas and say,
well, what needs to change now that we've added computation into the drive? And what needs to be
considered by a developer who's implementing a computational storage drive? So, for example,
with sanitization, certainly that takes on one meaning in terms of eliminating all the user data, but what about the programs that are potentially now stored on the drive?
Should those also be removed?
And potentially the answer is yes, because you don't want to reveal what it was you were searching for in the data that was contained on that drive.
And that's different than an existing traditional SSD and needs some further consideration on how you're going to implement that.
You know, likewise, as far as key management is concerned, now maybe we want to be able to have the drive do the decryption directly and then perform operations on it.
And so now the drive has to have all of the keys and be able to do the decryption, whereas
before maybe the host managed the key and just, you know, let the drive, informed the
drive that it was unlocked and it could go off and do something now.
So a little bit different shift in paradigm there in terms of how the drive operates given
the fact that it has this additional compute capability in there.
So lots of sections added on this.
Nothing that you probably haven't heard of in terms of security technology,
but definitely a different application of those technologies for computational storage.
Yes?
And then, of course, if you're going to do compression,
you have to do the compression before you do the encryption.
If it's compressible after it's encrypted,
your encryption isn't any good, because encrypted data should just look random.
That's right.
So that would have to be in there as well.
Correct.
You can't do that after you do the encryption on the host.
Exactly.
Yep. Exactly, yep.
All right, so let me switch gears a little bit and talk about the API briefly.
So coming up in two hours, I believe,
Oscar is going to give a very detailed presentation about the API and what we've been developing, just to kind of pique your interest.
And for his presentation, I'll kind of give a little bit of an introduction.
And then hopefully if you're able to make it to his session, he'll get into a lot of detail.
So while we've got the 1.0 architecture document that is available today,
and development can commence from an architecture perspective,
we're still working on the API.
We're going to need a little bit more time on that.
We're at a 0.8 revision currently.
You can download a draft.
It will give you a good flavor for what we're working on and help with any of your thought processes around computational storage.
The API itself, the goal here is to have a unified software interface
because maybe there's different hardware architectures
or hardware implementations for the computational storage,
but if everybody's using a common API,
then you're going to be able to have that multi-vendor support that the industry expects,
right? That's just a prerequisite. And without that, we're never going to get adoption. And
that API is going to provide that commonality that we all need for those drives.
So long list of functions in that API document.
Didn't want to bore you with all that.
Instead, I'll just kind of walk through briefly an example of the APIs and the function calls that are being defined or have been defined in that API.
And like I said, Oscar will get into a lot more detail later.
So, this is kind of a very host-driven example.
So, the direct example from an API perspective.
And in this particular example, we start on the left.
I know the diagram is just a little bit small, but we're basically making a loop around from left to right where the host is at the top.
And on the left, the first thing we do is we call csAllocMem to reserve some FDM.
So we're allocating, we're setting aside AFDM for a particular function to execute.
And so csAllocMem, that's one of the functions.
And, of course, it's got a number of parameters to specify where and the size and all that for allocating that FDM.
Then once we have allocated some FDM, the host sends a command or makes a function call, csQueueStorageRequest.
So it's going to send a command to say, please perform a storage operation.
We want to move data from the storage into that AFDM that we just allocated.
So then the controller goes off and performs that operation, moves that data into the AFDM.
And then now that the data is there, we're ready to perform a computation on that data.
This little example here kind of walks through if we're doing a decryption operation,
but again, it could be anything, but just as an example. So in this case, the next step is what would be called csQueueComputeRequest.
So we're going to queue up a request to perform a computation, to make a call and have a CSF execute on one of the CSEs.
So the CSF executes on that CSE.
It completes.
Results get returned back to AFDM.
If there are any, maybe, you know, if there aren't, it doesn't have to.
But the operation finishes.
And then lastly, the host would make a function call to csQueueCopyMemRequest to copy the data out of AFDM over to the host.
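For readers, here is a compact host-side sketch of that four-step sequence. The call names follow what was said on stage (csAllocMem, csQueueStorageRequest, csQueueComputeRequest, csQueueCopyMemRequest), but the parameter lists are simplified assumptions; the 0.8 draft API on snia.org has the actual prototypes.

```c
/* Host-side sketch of the direct usage flow above. Call names follow the talk, but
 * the parameters are simplified placeholders, not the draft API's real prototypes. */
void *afdm = NULL;

csAllocMem(dev, DATA_LEN, &afdm);                         /* 1: reserve AFDM on the device      */
csQueueStorageRequest(dev, CS_STORAGE_READ,               /* 2: move data from media into AFDM  */
                      lba, DATA_LEN, afdm);
csQueueComputeRequest(dev, decrypt_csf, afdm, DATA_LEN);  /* 3: execute the CSF (e.g. decrypt)  */
csQueueCopyMemRequest(dev, afdm, host_buf, DATA_LEN);     /* 4: copy results from AFDM to host  */
```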
Yes? So this is going to be binary?
In other words, if I write this,
somehow I create all these steps,
I make the API calls,
they'll be the same call.
I can take particular binary.
This generates messages to the computational storage device, right?
That's what the API calls.
You call this function, and it sends something to the device,
and then there's some result possibly.
And those will be the same API calls regardless of the device as long as it conforms to the standard.
The object could be the same.
It could generate the same sequence of bytes or something.
I'm not sure what the compatibility is.
All right, so great question.
So let me try to describe a little bit.
So essentially there's still a storage fabric layer that's between the API and the actual computational storage device.
And the next section will be about what's going on in NVMe and the commands or command sets that are being defined in NVMe. Certainly, that's not the only fabric that could implement computational storage,
but it is the only one that I know of at the moment that is actively developing commands
for computational storage. So if you make a call to, you want to allocate some FDM, that will get translated into an NVMe command to set aside memory, to reserve memory.
And what exactly that NVMe command looks like is still under development.
But that's what would happen then, and that NVMe command would then execute and perform that operation.
It would be the same for everybody who's conforming to the standard,
so no one has to rewrite the program and say, oh, if it's this kind of drive, you have to do this?
That's correct. That's correct.
So no matter whose drive you have, the API will work. And, in fact, the vision for the API is if you've got a function that could be implemented in software
instead of with a dedicated hardware engine, the API will handle that.
So if you've got a compression algorithm that could execute on a CPU
versus a hardware engine that's actually built into your computational storage drive,
it will pick the correct choice here.
Is it going to use the engine or is it going to execute it by emulating it in software?
Okay, that's pretty difficult to envision.
I assume you guys know what you're talking about.
Believe me, I don't understand.
Okay.
You'll hear more from Oscar.
Oscar will get into a lot more detail later on.
I'm not a software guy, so that's why we're having him come up in a bit.
That's right.
All right.
So that's kind of the extent of...
We have a finger in the back.
Okay.
It's not the wrong finger.
All right.
Yes.
Yes.
I guess, in that way, you know, is there any requirement that that memory has to be within,
let's say in this case, the NVMe device?
Is that also where the program is being allocated?
The program that is being allocated, right? For example, in the first step? So that would not be host memory.
That would be on the device.
That would be device memory where that memory would be allocated. And there's a lot of work in NVMe that I believe that Kim will get into
and describe the work that's happening there in support of that.
But in terms of the way that you're defining it,
are you stipulating that that has to be in?
No.
Something like a host buffer memory or something,
or CMB interaction, that type of thing,
or even an SDXI to another device, something along those lines,
could be a future implementation.
Could it be anywhere as long as the device is able to?
Yeah, as long as there's a connectivity of some sort between those.
There's opportunity for that.
It's not defined that way today, but that is what could be an extension of this
and why we're talking with SDXI and things like that as well.
All right, well, I'm going to turn it over to Scott,
and he can kind of walk us through what's coming next.
What's next?
So moving beyond just the base architecture,
we've got the 1.0 done.
We're happy about that, but we know we can't just stop here,
shut everything down, and walk away.
It's got to keep progressing and becoming more and more valuable.
So one of
the big things is continue the effort around security right now. If you read the document,
it's a single host type environment. We know that a single host attached to all these devices is not
going to be the traditional implementation model, but we had to start somewhere. So we're going to
look into multi-tenant, multi-host, all that other virtualization stuff. And anytime we mention any
of that, Eric kind of falls over in his Zoom calls.
And he's the lead of the Security TWG.
We have a bunch of appendix illustrative examples.
We started these all the way back about four years ago
as far as ways you could deploy this.
They were presented to the TWG
on behalf of different vendors within the organization
as far as where they were seeing possible ways
to deploy this as an end
use case. When we were talking last night in the BOF, we had this concept of the killer app.
So we tried to give some examples of what could be an application. Some of them are slightly stale.
Some of them could use some updates. There's been some new opportunities that we're going to hear
about throughout the course of the rest of the use cases today that could be modified into a
statement of work for purposes of updating that and giving
broader adoption capability for that. As we've talked about a couple times, the SDXI collaboration,
what does that turn into? Does that become a device-to-device? Does it become a device-to-host?
How do we manage the memory footprint that we're getting into? So we also have to consider
not only things like what's within SNIA, SDXI, the security. We also have to
start thinking outside that box. We've got NVMe. We've got CXL coming. We've heard a lot about
that. The keynote was presented about why CXL should be cared about in storage, for example.
And then a lot of the conversations as myself going out talking to people about computational
storage is, well, what about the DPU or what about the XPU or the GPU? If you've got that,
you don't need this. And I kind of talked about that last night in the BOF as well.
There's a need to think of these as cooperative products in the ecosystem. We are not replacing anything with a computational storage drive. Yes, I'm replacing an SSD, but if done right,
the SSD looks like an SSD, and I happen to have a compute engine next to it.
If I'm storing data on that SSD and I get to 8, 16, 32, whatever density drive I get,
or I get into something that's a QLC, PLC, whatever we're going to think of next,
you need something to help that out for data that's on the device. That's where computational storage plays.
If I'm doing stuff coming straight into the front end of it, that's where a GPU, a DPU, or an XPU will help manage that data.
It's going to preprocess all that.
If the preprocessing is collaborative with the computational storage, then you'll be able to do work at the device level.
Because we talked about erasure coding and encryption and compression, all that kind of stuff.
So we're working on the angles of what does that look like next, looking outside our little box into the ecosystem that we're being plugged into.
And that's kind of one of the next steps that we're talking about.
So that's when I mentioned the coordination of compute.
There's actually a DPU BOF this evening,
if you're interested in talking a little bit more about what's going on there.
And as far as what you can do if you're not already,
these are the two working groups that are actively pursuing this.
So as we've already
mentioned, we're driving the TWG within
SNIA. There's a great opportunity for you if
you're not already participating to come join us.
We love adding more
logos to the logo chart. That's one of my favorite
things to update. And then, of course,
NVMe is doing some great work on the
computational storage task group, which you're going to hear
about in just a few minutes, kind of
thing, in the next session. We encourage you to participate, take a look.
The specs are available to download on SNIA.org.
If there's any other questions, we'll be able to take the questions. Otherwise, we'll
call it a presentation. Yes?
In the intro slide, you mentioned
the computational engine, maybe a general-purpose
CPU or an FPGA or some sort of compute element. Is there some standardization or abstraction
or something that means that if I was to write a computational storage function, I could load that
onto vendor A's device or vendor B's device without having to know exactly that particular device?
Or do I have to target it to different sizes and resources,
and have it build my function,
the function for that specific engine,
you know, for every product?
Right.
So, and I think, feel free to add on,
but the idea of this is we were building this so that you could get to that environment where you're able to create that situation where we're using the common themes and threads.
So we're not going to have a bunch of vendors creating exactly the same device.
We already know that today based on what's in the market.
But, yes, getting to a point where maybe that's one of the next phases of this work group is to help find a way to facilitate that work is certainly an aspect of where we'd like to go with it.
I agree.
Yeah.
And I think that, I mean, a general purpose CPU maybe is a little bit easier to get there.
The FPGA, I think, is going to be not necessarily the wrong answer by any means, but you have
a very valid point.
Your bit file is compiled for a specific FPGA, which means it's kind of targeted at a specific
design.
And, you know, maybe that's a collaboration with the vendors who are
using FPGAs to
target their exact products
at that point.
One in the back
and then we'll get to you. Go ahead.
Are there some CSFs that you have already included in the standard?
Like, you know, if you have an example of encryption,
are there some common ones that you guys are considering
maybe making standard, or is that even something you're considering?
So we have some example CSFs in the architecture and programming model,
but they're just that.
They're examples.
So currently we have not standardized on, you know,
this is how you should implement an encryption algorithm or a compression algorithm.
We could do that.
I think that that could be difficult to get everyone to agree on exactly what that encryption algorithm should look like
or compression algorithm should look like.
And I think that's where different products can differentiate anyway
because maybe you come up with a really snazzy encryption algorithm that's super fast
and that makes your computational storage drive very appealing over mine.
And so I think that's where the API comes in is, okay, the customers who deploy it can interface with it,
but your algorithm maybe is better than mine,
and the market's going to decide that they want to buy yours instead of mine, for example.
If there becomes a strong need, we're very open to entertaining that, of course.
There's no opposition whatsoever.
So this gentleman here first real quick.
In terms of the security aspects, are you trying to align with any other efforts, like the concept of a trusted execution environment, which other groups are working through?
Is there overlap?
Yeah, so part of our primary,
so the question around security,
we're heavily engaged with our SNIA Security Technical Work Group,
who the leaders of that team are participating
in all of those different organizations
to make sure that we're assessing those types of things.
Like we showed on the slide with like roots of trust and sanitizing,
that's all based on the feedback we got from that TWG.
And in the case of Eric being the great chair that he is,
he's on IEEE and TCG and everything else.
And so we are trying to make sure that we're not breaking anything
or having to recreate something for sure.
And like we mentioned, we're not trying to invent new security.
We want to leverage what is out there.
And so, you know, trusted execution environment,
we want to put the framework in
so that it works with computational storage.
But it is up to the developer to actually go off and do that.
You had another question?
A comment, just to sort of flip that model
of how it's actually used.
Are you thinking that a device would ship with CSFs, as opposed to being something where someone would buy a bare device
and then write their own CSFs and put them into it?
Is this something that you see being supplied with the device?
So to answer that question, yes.
The vendors have the opportunity to send out something that is, quote, transparent.
The CSF is already there.
Examples of that would be compression.
The other flip side of that is if you are going somewhere down the path of an FPGA or general purpose
where you would allow the customer to implement whatever they want.
So a good example is encryption.
Yes, encryption is standard in all devices for self-encryption drives,
but there's plenty of custom encryption algorithms that could be created as a CSF in the device
as an extra layer of encryption or something to that effect,
or an AIML engine or something else that you want to do that does the filtering or whatever you want.
So the challenge is we can create the framework,
but if we try to dictate the functions to the extent of, you know,
do I do LZ4, GZIP, or Zstandard or whatever the heck the compression algorithm is, it becomes a little more challenging.
So we're trying to give it as much flexibility with as much organization as we possibly can.
So you're imagining sort of a market where some functions might shift.
Yeah.
And some may need to use it.
Exactly. And there's even the opportunity where you can engage a customer as a vendor,
give them the complete open kimono to however they want to do it.
They figure out what they want, and then you would lock a configuration for them,
and then it becomes an indirect or a transparent type of thing.
So those types of capabilities exist in this framework.
And I was just going to add that there are vendors out today with product that do both ways,
where there's built-in CSFs and then otherwise it's wide open.
Yes, sir.
Question. This is all about a single device, right? I think you can see that maybe even in the previous framework, too.
What about when it spans multiple devices?
There are many of these, you know,
spread across multiple devices, like the array
that's been defined in the architecture, right?
Can you say something more about that?
You can talk to us.
So to your point, we have defined this capability
of having multiple devices in an array,
in a JBOF, over a fabric, whatever the case may be.
Have we spent significant cycles
on how exactly that looks like as an implementation?
Within the TWG, we haven't yet,
because we haven't had a vendor
that's brought in that example, if you will.
We know that that need exists.
That's why we've created the base architecture for it.
But exactly what that implementation looks like,
we haven't had someone within the TWG
specifically present us an opportunity there.
It's certainly something that we want to add to
as part of the conversation as we go through
iterations of this architecture,
to better help define what that would look like.
So there's nothing in place today, but we've created at least the base for it to work off of.
We're trying to keep the fabric, if you will, as agnostic as humanly possible
to allow that flexibility to exist.
This isn't a specification.
This is an architectural model, and that's kind of the delineation of what we're doing.
So SDXI wrote a specification on a protocol to do memory exchange.
NVMe is a protocol and has a specification.
Those can be applied to our model
to create an architecture.
So would the CSA architecture be a version of the CSP?
It could be one version of that, yes. But that's not the only way that it could work, right? The luxury of this is I could have a CSP, some CSDs,
and even some hard drives or some standard SSDs
all in that array architecture,
and how they're operated on is by way of whoever builds that array
wants to create the flexibility for their user.
So we're not precluding that from being a way for it to work.
Back on the logo slide, I appreciate the logos on the NASCAR slide.
Are hyperscalers already talking about their participation in this?
I'm just curious if some of the big four or five have been heard from.
Some of them are in SNIA and have been participating in some of the work on the NVMe side.
Some of them are participating in the definition of the specs.
We are getting some visibility from the cloud service providers from a perspective of their inputs to these organizations.
That's part of one of the reasons why, since a few of them have decided to play in OCP space,
we're playing nice with OCP as well and things like that.
So, yeah.
You're telling us that our time is up.
So we're going to go ahead and cut off the presentation for now,
but we'll stick around for some questions while we wait for the next session to start.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask
questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.