Storage Developer Conference - #108: SPDK NVMe: An In-depth Look at its Architecture and Design

Episode Date: September 17, 2019

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, Episode 108. My name is Jim Harris. I'm a principal software engineer at Intel in the data center group. I'm also a core maintainer for the Storage Performance Development Kit. So I know most of you already know what SPDK is, but for those that don't, it's an open source project.
Starting point is 00:00:57 Intel is still driving a lot of the contributions, but there's an ever-increasing set of other companies that are developing patches and getting those upstreamed as well. That's all BSD licensed. There's the short little URL. It's basically a set of user space drivers and libraries for storage, and not just storage, but also storage networking and storage virtualization. But for this talk, I'm going to specifically focus on the NVMe drivers down here. So I know that the title says SPDK NVMe. I'm actually not going to be talking about the target today. Ben Walker, one of the other core maintainers on SPDK, has a talk on Thursday where he's going to go into that in more detail. And then there's another talk tomorrow by Paul Luce that's going to talk a little bit more about this block device abstraction layer.
Starting point is 00:02:01 And I think Fiona here is going to be presenting, too, on some of the improvements we've made as far as doing encryption offload within SPDK. Okay. So just, you know, doing a, I guess, a little bit of an overview, SPDK and the kernel. So I'm not going to have any performance charts in this presentation. I think it's, you know, pretty well established that with SPDK and these types of user space frameworks, you can get better performance and efficiency, you know. But there's some caveats. It's not a general purpose solution. So there's some use cases that SPDK handles very, very well, and that's where people are picking up and adopting it. And then there's other use cases where it doesn't handle it at all,
Starting point is 00:02:37 or if it does, it doesn't handle it very well. But this different design and implementation really drove a lot of the SPDK design, and that's what I'm going to focus on today is just kind of going through some of the APIs, how initializing, bringing up, submitting I.O. with the SPDK driver may look different than a more traditional, like, kernel mode driver, and then, you know, some of the thoughts
Starting point is 00:03:01 that went behind some of those decisions, even in some cases where we've, like, made drastic changes in the API since the project first started a few years ago. So starting out with the, with the paradigm. So we know an SPDK application, it's, it's, in this case, we're gonna have the SSD. It's gonna be dedicated to a single process, right? This is not doing stuff in user space.
Starting point is 00:03:28 It doesn't lend itself to being able to share an SSD or at least one PCI function across multiple processes. So as we see SRIOV become more prevalent, this can be a way to share it between multiple processes, but that's not super prevalent today. Typically these types of applications are gonna be running in pulled modes. You're gonna have a relatively small known number of threads. So you're typically not going to be creating and deleting threads very often, if at all, like while the application is running. And so there's some, some simplifications
Starting point is 00:04:04 that we get there. And then the other thing is that we have pre-allocated pinned huge page memory. So again, for these types of paradigms, they're typically dedicated storage targets, storage appliances, storage applications. And so we don't have, the resources on that system are mostly dedicated to whatever that SPDK application is. So this allows SPDK to be able to void some of
Starting point is 00:04:26 the overheads of pinning memory in the IOPath. Okay, so starting out, we have in SPDK this concept of an environment API. And so what this is, you know, designed around is how do we enumerate the PCIe devices? So in SPDK, before you start the application, the expectation is that some admin, some user has already taken the NVMe devices, they've detached them from the kernel mode driver, and they've attached them to one of the, you know, like the Linux user space frameworks, UIO or VFIO. And then it also is responsible for doing the allocating and pinning the memory.
Starting point is 00:05:07 So this is stuff we don't do inside the NVMe driver. We rely on this environment API to do that. And so the default implementation of that is based on DPDK. So if you go out to GitHub, you pull down the source code, you're also going to end up pulling down a submodule that has DPDK, and that's the default implementation. But what we found was that there are cases where users have their own implementation.
Starting point is 00:05:33 So, for example, somebody may be using NVMe. They already have a framework that's similar to DPDK, and they don't want to use DPDK. They want to use what they already have. And so we've provided this API to be able to implement different environments. We don't have any different environments checked in the upstream, but that API is there and available for, for people who need to do that. This also ends up working really, really well for writing unit tests, because then all the unit tests for the NVMe driver don't rely on DPDK. We can, like, just provide an alternate implementation here that just has really simple interfaces for testing some of the corner cases of the NVMe driver.
Starting point is 00:06:12 So next, transport abstraction. So this is fairly similar. I mean, the Linux kernel driver has this as well. It's, you know, basically you've got the NVMe driver. There's core functionality in there, but then you've got different transport implementations for PCIe, and then also for RDMA. And so that API gets implemented, there's a whole bunch of different, I'm not, don't have the whole list here, but there's, you know, a number of NVMe concepts that are implemented differently based on the transport. So we have this same pluggable transport idea like the Linux kernel does. I think there's going to be,
Starting point is 00:06:51 Sagi's giving a talk later on how the NVMe TCP spec is progressing. And so there is work in progress on that with SPDK, possibly in the future in FC transport. For those that were in the FC talk earlier, there has been some work already done on the target side. On the initiator side, I'm not aware of anybody working on this, at least not through upstream. Okay, so next. Okay, so now we assume we've started our process. The user has some NVMe SSDs.
Starting point is 00:07:25 They've assigned them to either the UIO or the VFIO driver. We've initialized the application, meaning we have our huge page pinned memory, and now you want to actually attach to a controller. And so this is the simplest interface we have here, SPDK NVMe Connect. There's a couple of parameters you pass to it. So first is this transport ID.
Starting point is 00:07:53 And so this is used to identify both local and remote controllers. There's fields in here for the transport type. There's like a transport address, which is like a pci bus device function or it's an ip address there's other parameters there to be able to uniquely identify the controller you're trying to attach to and then there's um controller options so you know this is one of the things where spdk diverges a little bit from the kernel. Since in this case, SBDK is going to be used for fairly dedicated appliances, in some cases very unique implementations. There's a number of things that you can set up here. So you can specify how many queues you want,
Starting point is 00:08:38 you know, controller memory buffer. I mean, these are fairly common, but you can also specify the arbitration mechanism. So if you have SSDs that support weighted round robin, this is how you can specify that. Keepalive, timeout, host ID, et cetera. So you can specify those options differently for every controller. So this is for attaching to one controller. Then we have SPDK NVMe probe. So this is sort of like connect, but this can be used for multiple controllers at once. So again, you've got this transport ID, but the big difference here is that here you have the option to specify either all PCIe. So let's say you have a target system that has an NVMe switching it, so it's got 10, 12, 24 SSDs, you can say, I want to basically probe and attach to all of them.
Starting point is 00:09:31 Or for NVMe over fabrics, you can specify a discovery service. And so then that gives you the ability to try to attach to any of the controllers that are specified by that discovery service. Let me just skip back here. So here you can see here, one of the key parts is, and you're going to see this a lot in this presentation, is since it's all pulled mode, we've got a lot of these callback, callback function, callback argument type parameters.
Starting point is 00:10:00 So the probe function is basically deciding whether or not you want to attach. So there could be cases where maybe you have two, let's say you had two SBDK processes and you wanted one to control half of the SSDs and another to control the other half. So this is fairly theoretical. But this gives you the ability to decide for each of the devices it's found whether you want that application to attach to it or not. And then the same thing for the discovery controller. So that discovery service may report a whole bunch of different controllers, and here you can specify which ones you actually want to attach
Starting point is 00:10:34 to. So the probe function, basically you look at the arguments that get passed in, you return true, true if you want to attach, false if you don't. So on the SPDK NVMe connect call, we talked about how you can pass the controller options in. The other thing that probe allows you to do is when the driver gives you your callback, it also passes you a controller option structure,
Starting point is 00:10:59 and so then you can actually specify different options per controller. Okay, so now you've decided the driver is first going to do all your probe callbacks, and then after all the probe callbacks are done, then it's going to actually try to attach. So your attach function doesn't get called until the controller is ready for use. So once you get this attached, it means that now the controller's ready. You can start doing things like allocating I-O queue pairs, submitting commands, et cetera. Here you'll get your negotiated controller options. So for example, maybe you specified up front, I want you to
Starting point is 00:11:30 allocate 64 IOQs for this SSD, but the SSD only has 32. Well, this is where you're going to get that information back. One downside right now is that these calls are actually synchronous, which actually works fine until you start talking about hot plug, right? So when you bring the system up and you've got 24 SSDs, it's okay if you take some time to go do all this. But what you don't want is you really don't want these to be synchronous when you insert a new device, right?
Starting point is 00:11:57 Because now there's some non-trivial amount of time that you need to bring the SSD up. You have to go, you know, enable, wait for the ready bits, send a bunch of admin commands. Currently, that's all done synchronously, but we have patches in progress, so there's going to be asynchronous versions of these in the next release. One key point here, though, is that the state machine that's inside of SPDK, it will attach multiple controllers in parallel. So early on when we started this, it was not. So I remember some of the Intel SSDs,
Starting point is 00:12:30 there was a non-trivial amount of time that it took to wait for the ready bit to get enabled after reset for one SSD. And so if you have 24 SSDs, you were sitting there waiting for a minute for all those to get ready. And so now this is all done in parallel, so we'll do the attach and run through the state machine in parallel, waiting for them, to get ready. And so now this is all done in parallel, so we'll do the attach and run through the state machine in parallel, waiting for them all to get done. So you're effectively only, you're going to take, you know, whatever the biggest amount of time for
Starting point is 00:12:54 one SSD to initialize, that's what you're going to incur overall. But I think it's going to be really nice in the next release when we have the asynchronous version, because then even for the hot insert case, you can be probing and initializing SSDs and still doing IOs on the same thread. Okay, queue pair creation. So this is another place where SPDK can differ quite a bit from the kernel. So I mean, the kernel, when it comes up, it's going to, you know, look at all the queue pairs, it's going to basically try to assign one queue pair per core as best as it can. With SPDK, though, it's done very, very differently. The queues are not pre-allocated.
Starting point is 00:13:34 So when you say, I want to allocate an IO queue pair, we're actually issuing the admin commands to create the completion queue and create the submission queue in line. And the reason we do that is because we want to give the option to provide different, we want to do different options per IO queue pair. So, for example, if you're doing weighted round robin, you need to create different queues with different priority levels. And that's how we do it here. You can also do things like specify the size of the IO queue.
Starting point is 00:14:00 So, what's the actual IO queue that we're basically registering with the controller. And then also the number of, we call our internal IO requests data structures. And I'm going to talk about that here in a little bit. So this API is also synchronous. You know, again, this is another one that we're working to make asynchronous. I think, you know, again, when you hot insert a device, having the attach be asynchronous is great, but you're going to want to turn right around and start allocating IOQs on it as well, and so you're going to want that to be asynchronous. Okay, command submission.
Starting point is 00:14:32 So I know this is a lot of text. Can people in the back actually read this? Or is it too small? I made it as big as I could, but all APIs related to command submissions in SPDK are asynchronous. So you're going to see these callback function and callback argument. So it means when you say do this read, when it returns, it will have submitted the I.O. to the device. But the completions are always going to come later. So I'll talk a little bit about how some of the polling stuff works.
Starting point is 00:15:02 But you're only going to get those completions once you decide you want to actually poll that queue pair for completions. All the payload buffers are virtual addresses. Early on, maybe three, four years ago, I mean, maybe even earlier than that, we had talked about maybe we should take all of the virtual to I-O virtual address translation outside of SPDK, push it up into an upper layer. than that. You know, we had talked about, maybe we should take all of the virtual to I-O virtual address translation outside of SPDK, push it up into an upper layer. And I'm thankful we didn't, because
Starting point is 00:15:30 then NVMe over Fabrics came along and physical addresses wouldn't work anymore. So you do pass in the virtual addresses to the NVMe API. We're gonna talk in a little bit about how, you know, how that translation actually happens. An important point here is that these are not always necessarily one call to this means one I.O. submitted to the underlying controller. So there are cases where we will split I.O.s in SPDK. I'm going to talk, be
Starting point is 00:16:00 talking about that as well. So this is really important because there are cases where we have people say, you know, I said I wanted to do 100, I specified 128 queue size, and I tried to do, like, you know, one meg IOs, and the device didn't support that, and so they, you know, ran out of request objects. It's just something to kind of keep in mind as you're deciding some of those parameters when you're creating queues. Okay, so after you've submitted a bunch of IOs, the application is responsible for pulling for completion. So the driver is completely passive in this regard.
Starting point is 00:16:38 It's basically waiting for you to say, hey, I want you to go check to see if there's completions for this queue pair. You know, the other implicit thing here is that the application is responsible for the thread safety. SPDK doesn't take any locks, meaning that the assumption is that the application is submitting and completing IOs
Starting point is 00:16:54 on the same thread. Now, you can certainly use one queue pair from multiple threads. You're just responsible for managing the synchronization for that. And so we did that specifically. We didn't want to force locks on applications that didn't need it. So then here, basically, those callback functions that you passed in when you submitted the I.O., when you call this function the queue pair process completions, you're going to start getting
Starting point is 00:17:18 those callbacks invoked for each I.O. that had been completed. And then you can also use this max completions argument to kind of limit that number. So for example, let's say you have a, for quality of service purposes, you've submitted a bunch of I.O., but you don't want to take the hit, you don't know how many have actually been completed, so you may want to say, you know, go process some completions, but don't process more than two or four or eight. And so this gives you the ability to do that. Or you can just specify zero,
Starting point is 00:17:46 and it'll process whatever's available. Okay, so struct NVMe request. This is not part of the driver API, but it's a pretty critical data structure within the driver itself. We wanted to avoid a couple of things. First was any expensive, when I say expensive, like we're talking about a driver that's doing, you know, multiple millions of IOPS per core. And so we wanted to cut out as much overhead as we could there. So we didn't want to do any type of real memory allocation. So these objects are all allocated with the queue pair
Starting point is 00:18:30 when the queue pair first gets created and we just pull them off a list and put them back on the list. And since only one thread's accessing it, that's super, super efficient. And then we also wanted to avoid having cases where you had maybe multiple threads all sharing one common pool, because then you'd run into weird cases where, you know, you run out,
Starting point is 00:18:50 and maybe there's a bunch that are available, but other queues are using them, and we just decided, you know what, it's actually going to be a lot. We felt it was going to be cleaner to just allocate them all up front for the queue pair, kind of put a little bit of the impetus on the application to decide what those best numbers are. We went back and forth on this quite a bit. I could definitely see cases both ways, but this is the method that we went with. These are all transport agnostic,
Starting point is 00:19:20 meaning these are used up at the common NVMe layer. And then they can be queued. So just talking a little bit about the PCIe transport. So these requests, these are purely software constructs. So there's no PRP. PRP, that's a PCIe thing. That's not an NVMe over Fabrics thing. So inside of the transport, we may pre-allocate PRP lists
Starting point is 00:19:44 that are also allocated from pinned memory. And there may be cases where you don't have, you have more requests than you have those mapped objects, and so sometimes those can be queued, and they'll be queued down in the transport layer. I.O. splitting. So this is one that we actually spent quite a bit of time discussing early on,
Starting point is 00:20:07 is those cases where the IO has to be split before it's sent to the controller. So there's a few different times where this happens. One is max data transfer size. Another is namespace optimal IO boundary. And so working at Intel, I'm very familiar with this one. A lot of the Intel NAND SSDs have a 128K optimal I-O boundary, meaning that if you have an I-O that spans that boundary, there's going to be a performance penalty. And so you're better off splitting that up in the host rather than just sending it down to the drive.
Starting point is 00:20:49 And so, you know, we had, there's a couple options we could have taken here. We could have said, you know, let's just report all of these options up to the application and make the application do it. You know, and just have SPDK just be kind of the bare minimal. Or we could actually have SPDK split these IOs internally. We went with option number two for a few different reasons. We knew that every single application that was out there was going to end up doing this code, you know, doing all this IO splitting. We thought about putting it into, in SPDK we have this block device abstraction layer, and we thought, well, we could put it in there. But then we know that there are people who want to use the SPDK, we have this block device abstraction layer, and we thought, well, we could put it in there.
Starting point is 00:21:26 But then we know that there are people who want to use the SPDK NVMe driver in isolation. They don't want to pick up that code. And so we decided to go ahead and put it in the driver itself. You know, a lot of cases, ignoring the I-O boundary, functionally it's okay, and it'll work fine.
Starting point is 00:21:42 But then you start collecting performance, and you see performance things you weren't expecting. And so that was another thing we thought, you know, it's really going it'll work fine, but then you start collecting performance and you see performance things you weren't expecting. And so that was another thing we thought, you know, it's really going to probably be better for us to just do this inside of the driver itself and not push that onus onto application developers. So then the question comes, like, people say, well, you know, I don't want you to do the splitting. Like, I don't want any of that code. It just adds extra overhead. Just give me the raw stuff and don't even do the checks. But our performance measurements have showed it really has minimal with any impact.
Starting point is 00:22:13 And so that was just another reason why we just felt it was better to just go ahead and put it in the driver and not require applications to worry about it. And so then here, since we're doing the splitting, obviously those NVMe request structures we talked about, they support these parent-child type relationships. So you can have one I.O. It can get split into a whole bunch of child I.O.s. We obviously don't complete the parent until all those children are complete.
Starting point is 00:22:36 And of course they can complete out of order. Okay, vectored I.O. So see, now we have even more arguments, making it even harder to read from the back of the room. So we took a pretty different approach here to doing vectored I.O. So I mean, there's a couple different ways you could do it. One is you could basically say that the user has to pass
Starting point is 00:23:00 in an array of struct I.O. Vex. But as we kind of went into this in more detail we this driver could be used in a whole bunch of different environments people could be using this with maybe a dpdk front end where the data is not coming in in io vex it's coming in in rte m buffs or people might be using some other data structure to represent those vectors and so we wanted to avoid a couple things. One was making the application then have to allocate a bunch of these struct io vec arrays. You know, if we're basically forcing them by the API, it means that now all of a sudden they may be getting the data in one format, and we're telling
Starting point is 00:23:36 them, well, no, you've got to go allocate some memory, translate it, and then send it down to us. And then this also avoids, because inside of the driver, then you would end up basically just translating that struct io vec into some NVMe concept anyways. So we actually provide these sge, sgl functions. And it says sge, sgl. I mean, these work for both PRP and scatter gather. But basically when the, when the driver's getting ready to actually fill out the payload, you know, whether it's PRP or SGLs for infamy over fabrics,
Starting point is 00:24:11 it'll call back up to the submitter to get each scatter gather element. And so we felt like this would simplify the driver getting used in a bunch of different implementations. Okay, so virtual address to I-O virtual address, or I guess physical address translation. So in SPDK, we actually maintain a user space page table. So we said up front, we've pre-allocated all this pinned memory, so we register it. We actually keep a user space page table to do these translations. With SPDK today, we only work with huge pages, so at least 2 meg in size. We could support 4K. In fact, Stephen Bates and I have talked about
Starting point is 00:24:56 this because in a lot of cases, controller memory buffers may not be 2 meg. And so there's cases where we may want to make this a little bit smaller. So that's not there today. But then basically we've got this two-level page table. And then the, as the driver is getting ready to actually program, like, let's say the PRP list for the PCIe driver, it can start calling back into that environment layer to get those, to do those translations. And then we use a very similar scheme for RDMA, except in this case,
Starting point is 00:25:27 you're not translating it to an IOVA, you're translating the virtual address to a memory region. So that same memory registration lookup code is used in both transports. Device hot plug. This isn't really important, is it? So for device hot plug, and so here I'm talking more about hot removal.
Starting point is 00:25:55 So for hot insert, it's very similar to when the application first starts, right? SPDK isn't going to automatically go and attach to any SSD. You're sort of responsible for making sure the application knows which ones to, which ones it's going to attach to. SPDK isn't going to take an any SSD automatically and attach it to one driver or another. That's all kind of handled outside the application. But here we're specifically talking about device removal. So we set up a net link socket to basically look for events when a device has been removed. But of course there's a huge race condition here, right, because there's a non-trivial amount of time
Starting point is 00:26:30 between when we actually get this U event, I'm sorry, for when the device is actually removed until the U event happens. And so if you're not careful, you're starting trying to submit I.O. and you're running in user space now, and so you end up getting SigBus errors. So what we actually do is we use a SigBus handler to handle this race condition. So we know there's some amount of time between when the device gets removed, when we get the U event. When we get the U event, then we can sort of safely quiesce everything. But until that happens, we need to guard against threads that are trying to still submit I-O to that controller.
Starting point is 00:27:10 So we actually take, the SigBus handler itself doesn't really help us unless we know which controller actually is, has been pulled, right? I mean, this, we could have 24 SSDs in the system. We have to somehow know, well, which controller actually caused the problem. So we actually, we've got a thread local variable. Before we do any device access, we write the NVMe controller pointer to that thread local variable. And so then we know when that SIGBUS handler executes, we know which controller actually failed. And then we remap some memory into that region. And so then at least we know until we get that U event that we can write
Starting point is 00:27:47 to something that's not gonna cause a SIGBUS. It's obviously not really doing anything. But at least we can continue, wait for the U event to occur, and then we can safely quiesce and, and detach the controller and software. Timeouts. So timeouts are completely optional in SPDK. They're set at a controller level, so there's no, we don't have any ability today to do this out on per IOQ. It could be done. There just hasn't been a big pressing use case for it yet. But of course, we don't have any interrupts, so how do you actually know when something timed out?
Starting point is 00:28:28 Well, we actually, we check it when you do the completion polling. So for any I-O queue pair, when you go and you say poll for completions, it's also going to look at the requests that are still outstanding. And then it's going to invoke this callback function. So the SPDK driver itself does nothing. All it's going to do is just going function. So the SPDK driver itself does nothing.
Starting point is 00:28:46 All it's going to do is just going to notify you, hey, this thing timed out. You said 30 seconds, and 30 seconds has expired, and then you can decide what you want to do next. If you want to do an abort, do a reset, I don't know, print something to the screen, it's up to the application. But we kind of felt we wanted to do this
Starting point is 00:29:00 rather than trying to build a lot of functionality into the driver itself to automatically do in a board or automatically do a reset. And then as a request, since we only, you know, specify, we don't specify per request timeouts. It's a, it's a fixed time, you know, per IO submitted on that queue. We just link them all in submission order, so it's really cheap to check. We just check the next one in the list, and if it's, if it's expired, then we'll notify and we'll check the next one. Once we find one that hasn't expired,
Starting point is 00:29:28 well, we don't have to check anymore. So it's actually really efficient to do this. Okay, benchmarking. We've got two options with SPDK. So we have a plug-in for FIO, so it integrates with the SPDK NVMe driver specifically. And then we also have another tool that we call NVMe Perf. What we found was that, I mean, FIO is fabulous.
Starting point is 00:29:55 It does everything. It's super efficient. But we actually found that at some of the IOPS rates we were doing, just the overhead in FIO, once you hit one to two million IOPS, that was about the limit that we could get. And so we wanted to develop something that had a lot lower overhead, but of course it has a lot fewer features as well. So this isn't necessarily like one's better than the other.
Starting point is 00:30:18 It just kind of depends on what you're doing. For a lot of test cases, I mean, FIO is great because you've already got FIO scripts for everything else. You just use the same FIO scripts with SPDK. But for cases where you want to, you know, maybe you're doing some, you know, device evaluation, and you want to get something a little bit even lower level, NVMe Perf can work there as well. So this is another, I don't know, what I think is a very cool feature of the NVMe Perf tool is these very granular, I mean, I say sub-microsecond. You can see here, I mean, it's on the order of, you know, 10 to 30 nanosecond buckets.
Starting point is 00:30:55 So there's an option you can do to NVMe Perf, and it'll literally, it's all done with timestamp counters. All the ranges are calculated, you know, offline. They're not calculated in line. And this is really awesome. When we were testing something like the Optane SSDs, you know, you don't want to see necessarily just per microsecond. You want to actually see at a much lower granularity than that. So this has been a pretty popular feature of that NVMe perf tool. Metadata support. So for those that aren't familiar with NVMe, there's two different ways that you can do metadata.
Starting point is 00:31:34 So you can actually have the metadata in line with the rest of your data. With SPDK, that's just done with the standard I.O. function. So now instead of sending 512 bytes per block, you're sending 520 or 528. But we also have support for separate metadata buffers as well. So we have a whole bunch of different variants in the API for doing it with metadata.
Starting point is 00:31:58 And so then in this case, you're basically passing a separate metadata buffer for all the blocks covered by that. Now, we don't support the metadata being in separate scattergates, like a scatterguide that was for the metadata. That has to be all virtually contiguous. And then for doing things like data protection, if you remember all 85 parameters that you passed to some of those I.O. submission calls,
Starting point is 00:32:23 one of them was actually IO flags. And so this is where you can specify, you know, PR Act and some of those other flags that you need to be able to do the data protection. So raw and pass-through commands. So very similar to what we see for the other ones. I mean, here you're basically responsible for building the command yourself, at least all the fields that the application can fill out. SPDK will then fill in things like the PRP list.
Starting point is 00:32:52 So you can use this for vendor-specific commands. There's a similar API for doing admin commands as well. So, yeah, summary. There's a lot of things we can do different in SPDK. I mean, there's definitely some, to be quite honest, there's some shortcuts, right? I mean, this is for some use cases where SPDK can do some things that you can't really do with a kernel mode driver that has to handle every use case under the sun. So, you know, depending on the application, SPDK can be a great, uh, can be a great option. Um, this has really led to a lot of, yeah,
Starting point is 00:33:30 different ways of doing things. Um, you know, different than, uh, when we first, uh, I actually first ported this driver from the FreeBSD kernel mode driver probably five or six years ago. Um, and there's, I mean, we've, you know, basically changed this about everything because there was a lot of things in that driver
Starting point is 00:33:46 that were very specific to that kernel mode implementation that we really wanted to make different for SPDK. And then if you have more questions, there's a ton of more information on spdk.io, additional documentation, mailing list, IRC channels, et cetera. So if you've got more questions and don't get a chance to talk to me
Starting point is 00:34:05 this week, we'd love to see you join the project and we can help answer questions there. So that's the end of my presentation. Appreciate it. Thanks for listening. If you have questions about the material presented in this podcast,
Starting point is 00:34:21 be sure and join our developers mailing list by sending an email to developers-subscribe at sneha.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
