Storage Developer Conference - #60: SNIA NVM Programming Model V 1.2 and Beyond
Episode Date: January 9, 2018...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair.
Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community.
Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast Episode 60. I'm Tom Talpey. I'm a participant in the SNIA NVM Programming TWG, which I will tell you quite a bit about and what we've been up to and where we are headed.
This is a SNIA presentation. I'm here representing the SNIA NVM Programming TWG.
I have a vested interest in this. I've been participating in it for a few years, so I have my own opinions. We can talk about those if you like, but I will attempt to be the
representative of the TWG here and give you the broadest view of what
we're up to and what we're all about. We are scheduled on the calendar for a
40-minute talk here, and I've just discovered that most of the talks in this same slot are 50 minutes,
so I'm assuming that's a typo.
My talk is planned for about 30 minutes with Q&A,
but I can keep going a little bit,
and we can talk about whatever,
however the spirit moves us.
So roughly, I'm going to organize
the discussion into three areas.
A quick persistent memory overview.
This may be review for some folks, but in a way, it sort of sets the stage for the things that the TWG is doing.
Persistent memory has been around for a while, and it's rapidly approaching reality.
In fact, that's where I want to go in the third bit,
you know, where we are today.
After that overview,
I'll talk a bit about the SNIA NVM programming model
and what we're all about in the TWG.
And finally, wrap it up with just a quick view
of today's NVDIMM technologies
and OS support, major OS support for these technologies.
So my hope is to show what is persistent memory,
how you use persistent memory,
and how you use persistent memory
in operating systems today, increasingly tomorrow.
So first section, oh, and feel free to interrupt.
We can make this interactive if you like.
So here are some basic set-the-stage sort of pictures of memory, persistent memory.
In today's systems, memory and storage are beginning to converge.
The volatile and non-volatile technologies, which is, if you will, memory on one side and storage on the other,
are converging together.
They're becoming one and the same.
Memory has non-volatile properties, right?
And the really, really important thing about it
is that there are many such technologies.
We have taken this idea of non-volatile memory
into sort of a stage-setting paradigm,
but there are many different implementations
of non-volatile memory or persistent memory.
And these implementations are changing over time.
So on the left of this picture,
there's a rough graphic
that says today.
And I don't know if this thing works.
No, it's not important.
Roughly, today, in traditional systems, you'd see DRAM as
being the volatile, memory-like device, and a disk or a solid-state device
being the non-volatile, storage-oriented device.
And these are two very different things,
and they don't tend to overlap very much, right?
DRAM is purely DRAM.
It's power-dependent,
and it goes away when the system fails,
whereas disks and SSDs retain that data, right?
Over time, that's beginning to change. That new middle tier
in the pyramid, persistent memory, is beginning to appear. And there are a number of those things.
And you can see that over time, that persistent memory becomes more and more predominant in
today's platforms. This is certainly true in server platforms, where disks and SSDs become,
if you will, the capacity solution, whereas
persistent memory becomes the latency, the low latency solution. But they both have non-volatile
properties. Okay, and some examples down in the lower right of new and emerging memory technologies,
right? We have resistive RAM, 3D XPoint, we have MRAM, PCM, phase change memory, low latency NAND,
many other types. Some of them are proprietary or on package. Some of them are more open and
plug into existing designs. There we are. The persistent memory vision is that persistent
memory brings storage to memory slots.
And there are a number of really interesting things about that transition that we've been exploring in the TWG.
And I'll just start walking through them as we go here.
But the idea is that it's fast like memory.
It's also byte addressable.
It has a different API, if you will, from traditional
storage. It's fast like memory, but it's persistent like storage. All right? So it brings new
properties and requires new thinking on the part of application designers. It's good for
system acceleration. It's good for real-time data capture, for analysis, intelligent response.
There are many, many applications
that can and are making use of persistent memory.
We in SNIA, in the TWG,
define persistent memory
in a couple of fairly careful ways.
As an emerging technology,
there's been a lot of dialogue in
the industry about it. And this dialogue has leapt off in interesting directions. And so
there are some terminology and some aspects of these technologies that are common but are
sometimes a little bit confusing to the new reader. We view roughly two types of non-volatile memory.
The term non-volatile memory,
if you see it down at the bottom,
is the more generic term, right?
But PM, persistent memory, is the term we're adopting
for the memory-like presentation
in the second bullet.
So there's disk-like non-volatile memory
that is persistent memory.
You can think of it like a persistent RAM disk.
It appears as disk drives to applications
in the traditional storage API,
and it's accessed as an array of blocks.
Whereas the same underlying technology
can be accessed in a very different way,
in a memory-like way.
And here we call it PM.
It appears as memory to applications.
The applications store data directly into it.
They don't issue IO requests, they literally store.
They run a CPU instruction to store into it.
There's no IO operation required.
So you can see that it's a very different device in its interface,
even though it's maybe the same device at the bottom.
And so there's been some confusion
around the term NVM.
You may have noticed that the TWG
is called the NVM Programming TWG, right?
NVM. Well, that's a little weird.
Here we are talking about PM.
Since the TWG was founded,
which was actually about five or six years ago,
NVM is beginning to be used more
as a term for block-oriented access, right?
It may have the same underlying technology,
NAND or 3D XPoint or battery-backed DRAM or
whatever, but the NVM style, if you will, is the block-oriented style, whereas PM is the
byte-oriented style: direct access to an NVDIMM sitting in a memory slot and accessed as memory,
not just like memory, as memory. So anyway, I think you'll see increasingly these
terms used a little more selectively. NVM for block, PM for byte. So a couple of characteristics
of PM, it's byte addressable from the programmer's point of view. The API is a load store API. You
literally load and store. You can read from it by
loading and write to it by storing, dereferencing a pointer, basically. It has memory-like performance,
right? It has the latency of memory, hundreds of nanoseconds, right? Shocking latency for a storage
device. Two orders of magnitude, maybe three orders of magnitude better than devices you might have used just last year or still use today. Huge, huge paradigm shift. Supports DMA, okay, including
RDMA, remote DMA. That's where my interests lie, remote access to this stuff. But, you know,
it can be accessed by devices just like memory. It supports DMA as well as CPU access.
It's not prone to unexpected latencies associated with things like IOQing,
demand paging, or page caching.
It's just memory.
It has a very predictable, very low jitter delay.
A couple hundred nanoseconds every time.
Every time.
That's the way it behaves.
So you can think of PM kind of like you might have thought about power protected RAM, right?
It's there all the time.
You know, the system crashes, comes back up, same stuff is still there, right?
It's just RAM that's persistent.
So defining the terms, we're going to focus on PM. That's where the SNIA NVM Programming TWG is focused.
All right, enough of that.
The next section will be about the NVM programming model.
And this is the core activity of the TWG.
It's about writing applications for persistent memory. Now we define
applications pretty broadly. You could be a storage driver and you'd be an application to the
persistent memory. You could also be an actual traditional application. You could be a database.
You could be whatever, accessing storage directly via this new paradigm.
So the SNIA NVM programming model addresses this new landscape.
It was originally developed to address
the ongoing proliferation of new
non-volatile memory functionality
and new persistent memory technologies.
The key word here is new, right? These technologies are new and the platform support for them is new.
This stuff is really, really emerging rapidly recently.
The latest generations of CPUs, Intel, AMD, et cetera,
have explicit instructions and explicit support for persistent memory in their hardware
and in their software, in their instruction sets.
So, you know, how do we address it?
How do we use it was the question that was posed to SNIA.
The NVM programming model, which is the output of the NVM Programming TWG, is necessary
to enable an industry-wide community
of persistent memory producers and consumers.
SNIA works both on the supply side and on the demand side,
if you will.
We not only steer applications to use this technology,
we use the ensuing requirements and dialogue
to drive the technology itself.
It's been really a very symbiotic relationship between the two.
There are a number of significant architecture changes.
We're only beginning the road of the architecture changes here.
The instructions, the motherboards, the standards for identifying and classifying
these devices
have only just begun.
And so SNIA really kicked into action a little while back to try to steer that discussion,
to try to keep it together, to keep it productive, not just have manufacturers go one way and
applications go another way, to really keep it all in one dialogue or one area, one identifiable place to come to have the discussion.
That's what's key.
And so it's generated a few specifications.
I'll talk about them a little later.
Defining recommended behavior.
It's not a requirement.
It's not an absolute spec.
But it's recommended behavior between user space
and operating system components,
all supporting persistent memory.
This is how we recommend you do it.
Your peers are doing it this way, you may want to think about it.
This may be a good idea, you may want to think about it, whatever.
So there are several operational modes, and each mode is described in terms of a use case, actions, and attributes, all right?
And I'll show you, there are four modes.
But those modes are very distinct.
You will choose these access techniques
to a persistent memory device in your application
and your operating system for very specific reasons.
And the TWG has been spending its time
really drilling down on those reasons,
the implications of this access, the error cases.
Those are really, by far and away,
the most interesting implications of these things.
And the benefits and the changes
that may have to happen to your application.
There's a lot of legacy access,
and there's a lot of non-legacy access.
And each one has advantages and disadvantages.
And so what the TWG has produced, basically we're on our third iteration right now, is
to rally the industry around a view of persistent memory by publishing these
technical positions, a view of persistent memory that's application-centric, vendor-neutral,
and that's a very key
aspect of SNIA activity.
Things that are achievable today. These are not
blue-sky dreaming things.
These are things that are available today
and this access method can be
achieved today. I'll show you exactly
what those are. And it's beyond
just storage, right? It's SNIA,
the Storage Networking Industry Association.
But these accesses go far beyond storage.
They're applications, memory technologies,
networking technologies, processor technologies.
All these things are in scope for the SNIA activity.
So the programming model TWG,
the technical working group,
has a mission statement.
Roughly speaking, that is to accelerate the availability of software
that enables persistent memory hardware.
So it's a software-focused activity, right?
We're not specifying hardware.
We're looking at industry hardware.
We may have opinions and ideas to talk about,
but we're focused on accelerating the availability
of software primarily.
The hardware includes, on the one hand, the sort of solid-state device.
This may be a sort of a new PCI Express NVMe device.
It may be some other type of device.
It may be a traditional serial device
such as SATA or SAS,
as well as, of course, persistent memory, right?
Plugged into the memory bus.
Instead of on the IO bus, PCI or SAS or SATA,
it's plugged into a DIMM slot.
So both those things are well in scope,
and the software spans both applications
and operating systems.
We try to go all the way up the stack.
How does the operating system use this device?
How does the application use this device?
And the network as well, by the way.
And the mission of the TWG is to create the NVM programming model.
That's the spec that we showed on the previous slide,
and there's a few different things.
It describes application-visible behaviors. We try to take the application view. The application might
be the operating system or might be up in user space, doesn't matter. We allow the APIs to align
with operating systems. We don't do crazy stuff. We try to do things that are implementable today.
But while we're doing it, we expose opportunities in networks and processors.
I'm proud to say that one of the main things that we've
been exploring in the TWG is this remote flush operation
that RDMA adapters may use to push data to durability.
This is really, really important to have this
discussion in a sort of a requirements scenario, right?
Instead of just walking into the technical group that might be specifying the protocol and saying,
here's what it's going to be.
We thought it through from the top down.
What is the interface?
How can an application use this?
How would we use this in a remoted environment and things like that?
And so we expose these opportunities.
We say, this is how we think it should work, and this is why,
and this is the consensus that we've drawn.
And that's a very powerful message to take into another standards group.
It's really, really important.
So the SNIA NVM programming model exposes new block and file features to applications.
These features are probably familiar,
but nonetheless there are new implications on some of them.
They're not things that applications are used to
sort of querying from their device.
For instance, the atomicity capability.
A block device is typically block atomic.
When you write the block, if the block succeeds,
you know you have a good block.
If the block fails, you know you either have a bad block
or some sort of error has been signaled on the entire block.
That's not true with persistent memory.
If you have a failure, you may have damaged things
far outside the scope of what you were writing.
At the very simplest level, you can say
that the processor is writing a cache line,
not a single byte, even though you thought
you were dereferencing a char pointer.
You dirty the cache line, and if it failed,
probably the whole cache line went bad.
How do you know that?
How does the application know that?
So what's the granularity, and what atomic guarantees
can you get from this hardware?
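To make that granularity point concrete, here is a minimal sketch in C for x86, assuming a CPU and compiler that support the CLFLUSHOPT instruction (build with something like -mclflushopt) and a typical 64-byte cache line; in real code you would query the line size rather than hard-code it.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64   /* typical on x86; an assumption, not a guarantee */

    /* Store a single byte into mapped persistent memory, then flush it toward
     * the persistence domain. The CPU dirties and flushes the whole cache
     * line, not just the one byte, so a failure at the wrong moment can
     * damage neighboring data in that line as well. */
    static void store_one_byte(char *pm_base, size_t offset, char value)
    {
        pm_base[offset] = value;    /* the "char pointer" store from the talk */

        uintptr_t line = (uintptr_t)(pm_base + offset) & ~(uintptr_t)(CACHE_LINE - 1);
        _mm_clflushopt((void *)line);   /* flush the containing cache line */
        _mm_sfence();                   /* order the flush before later stores */
    }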
Thin provisioning, right?
Traditional storage can be thinly provisioned,
and so managing thin provisioning
and managing the blocks on these devices and things like that
are certainly something which is new
and has a new behavior on these byte-addressable devices.
Memory-mapped files.
One of the very first ideas was,
well it's just memory,
and we know how to memory map a file, right?
And so why don't we just use traditional
memory mapping APIs in operating systems
as our access method?
And it works, works great actually.
All the operating systems support it.
But it's by no means the only way or the best way to do it.
And there are some gaps between the traditional
memory map APIs and the requirements of durability,
for instance.
What does sync mean?
What does msync mean?
And so a new set of APIs has grown up around the concept
of memory mapped access that enable the full scope,
if you will, of persistent memory behaviors.
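As a rough sketch of the traditional memory-mapping path those newer APIs grew out of, here is a minimal POSIX example in C, using plain mmap and msync rather than the PM-optimized interfaces; the file path is made up, and on Linux it would sit on a DAX-mounted file system to get true direct access.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;

        /* The path is an assumption -- any file on a PM-backed (DAX) file system. */
        int fd = open("/mnt/pmem/example.dat", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0) { perror("open/ftruncate"); return 1; }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello, persistent memory");   /* plain CPU stores, no write() call */

        /* The traditional durability request is msync(). The gap the talk points
         * at is that a PM-aware application often wants something lighter than
         * this trip into the kernel -- hence the newer flush actions. */
        if (msync(p, len, MS_SYNC) != 0) { perror("msync"); return 1; }

        munmap(p, len);
        close(fd);
        return 0;
    }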
And there are open source implementations available
of the SNIA NVM programming model,
and I'll mention those a little later.
It's important to note that it's not an API.
It's only a programming model.
It's an abstract API.
We are not sitting here trying to specify a single API
that'll be used across all operating systems.
There's no such thing.
There never will be.
So it's defined in terms of attributes, actions,
and especially use cases.
Implementations map those actions and attributes
to concrete APIs. There are some open source ones, and there are some, I won't say proprietary, but adaptations of existing programming methodologies in Windows, Linux, etc. So the programming model V1.2, and actually the earlier version supported something very similar,
supports roughly four modes of access.
And they're outlined in blue here.
NVM dot something.
On the left, you'll see nvm.file and nvm.block.
Those are disk drive and disk-like access methods
that are quite traditional.
File APIs, basically.
You open a file, you read and write byte offsets in the file.
You open a device, you read and write LBA sectors
from the device, right?
And this is just an obvious API, right? This is the sort of thing that
all applications will be fully prepared to do, right? It's traditional I.O., but it's
supported on persistent memory devices. So they can transparently adopt these new devices
by following the guidelines in these programming interfaces. On the right is the more emerging technology version of these things.
They are quite different, but you can see that they are nvm.pm.file and nvm.pm.volume.
These are, if you will, the file byte-oriented and volume block-oriented accesses to persistent memory
devices which are non-legacy, mapped file type, memory-like access.
All right?
And so these four modes are the focus of the TWG's interface work. Block and file modes use an I/O model, right,
in the TWG vision.
It's an I/O model.
Data is basically read or written,
but the data resides in RAM buffers in the end.
You don't directly access the RAM buffer.
You perform a read or a write to the
buffer by name. It's in either a block or a file, sort of a namespace. Software controls how to
wait. It can be a context switch or a poll, a traditional I/O model. That wait may not be very
long because it's a very low latency device underneath it, but there's an I/O model. Usually there are filter drivers or context switches
or handoffs or all kinds of software interposes
on that read or write.
And how that top level requester waits for the result
is the business of that requester.
But typically we will see this as a context switch
or sort of a poll for completion
while these other layers process the data.
And the status of the result
is explicitly checked by software.
Generally speaking, read puts data in a buffer
and returns a status saying good, bad, or otherwise, right?
And it's a traditional I/O model,
but it's once again supported
by persistent memory underlying.
Volume and PM modes, right, these are the two on the left side of this,
no, sorry, the right side of this diagram.
Volume and PM modes enable load-store access.
There's no I/O operation at all.
There's literally an instruction to load, to dereference a pointer and pull data from it, or to dereference a pointer and push data to it. They are directly mapped. The
data is loaded into or stored from processor registers. Load register 5 from star P. The
processor stalls the software waiting for data during the instruction, right? You don't think of a load operation
as an asynchronous operation the way you might think of a read.
It blocks, it stops the processing of that thread
while that load instruction executes
and then it runs the next instruction, right?
So there's no context switch, there's no poll,
there's no nothing.
At the end of the thing, there is data in that register.
Right, period.
But there's no status.
I didn't say read some handle, some buffer.
I said put the result right here, and I got a result.
So there's no actual status return.
It doesn't say success or failure.
It generates an exception.
You get a non-maskable interrupt
or some sort of software exception that says,
oops, the data that landed in that register is no good, and here's why.
And it's often a machine check, parity error or read failure
or some sort of fatal problem on the bus.
And so it's a very, very different programming model.
It's not unfamiliar to users, but it's not the sort of programming model
you were expecting necessarily as a storage programmer.
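A hedged illustration in C of what "no status return" means on Linux: a failed load from mapped persistent memory typically arrives as a SIGBUS (or a machine check surfaced as one), not as an error code, so a PM-aware consumer has to plan for it. The recovery policy here is deliberately trivial.

    #include <setjmp.h>
    #include <signal.h>

    static sigjmp_buf recovery_point;

    static void on_sigbus(int sig)
    {
        (void)sig;
        /* The load never "returned an error" -- the hardware raised an
         * exception and the OS delivered it to us as a signal. */
        siglongjmp(recovery_point, 1);
    }

    /* Read one byte from mapped PM, reporting failure instead of crashing. */
    int safe_load(const char *pm_addr, char *out)
    {
        struct sigaction sa;
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGBUS, &sa, NULL);

        if (sigsetjmp(recovery_point, 1) != 0)
            return -1;        /* the load faulted; the data at pm_addr is suspect */

        *out = *pm_addr;      /* an ordinary dereference -- no status, no errno */
        return 0;
    }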
So a little more information on the block mode, file mode, and persistent memory modes.
So nvm.block mode is targeted at file systems and block-aware applications.
It supports atomic writes.
When you write a sector, it behaves atomically.
You won't write part of that sector.
You'll either succeed or fail for the entire sector.
It has length and alignment granularities.
The size of the sector and the alignment of the sector are pretty well constrained.
It supports thin provisioning.
It's a traditional block API.
You can trim the sector.
You can update the sector.
You can do whatever you want.
The nvm.file mode is targeted at file-based applications.
As I say, these things are traditional APIs.
There are some atomic write features dependent on the type
of hardware you have, and you can discover it and use it as a consumer of this interface.
And the granularities as well, similar to the granularities you see above with block mode,
but there are additional solutions available in block mode that are not available in file mode.
Something called a BTT, for instance.
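For contrast with the mapped modes, here is a minimal C sketch of the traditional I/O style that the block and file modes preserve. The /dev/pmem0 path is the usual Linux name for a pmem block device, and the 4096-byte sector size is an assumption; the programming model says to discover the atomicity and alignment granularities rather than hard-code them.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        enum { SECTOR = 4096 };              /* assumed; query the device's granularity in real code */
        char buf[SECTOR];

        int fd = open("/dev/pmem0", O_RDWR); /* typical Linux pmem block device; an assumption */
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 0, sizeof buf);
        strcpy(buf, "a whole sector: it either lands or fails as a unit");

        /* The defining property of the block and file modes: explicit I/O
         * requests with explicit status checks, even though PM sits underneath. */
        if (pwrite(fd, buf, SECTOR, 0) != SECTOR) { perror("pwrite"); return 1; }
        if (pread(fd, buf, SECTOR, 0) != SECTOR)  { perror("pread");  return 1; }

        close(fd);
        return 0;
    }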
Anyway, you can roughly see a diagram here that shows what they flow through.
The green boxes are basically the things that the SNIA programming model creates or defines.
The blue boxes are things that exist in today's systems.
And the purple box at the bottom of the stack
is the new hardware that showed up overnight.
Persistent memory modes, the PM modes,
there's two of them, like I mentioned before.
There's nvm.pm.volume mode,
which is the software abstraction for the actual hardware.
The idea is I'm going to open the DIMM or some slice of the DIMM.
We'll call that a PM volume.
It might be a contiguous chunk.
It might be some scatter-gathered piece, subset of the chunk.
It doesn't matter.
We've got a hunk of this hardware available to us.
There will be an addressing range.
It does have thin provisioning management. These blocks are sort of abstract and can be added and removed from the map, for instance, all these things.
But the PM volume is intended to be a device abstraction.
The PM file mode is an application where you view a PM
as containing file data, right?
There's some sort of dictionary that says,
I'm going to open this name, whatever it is, and it's going to look something up and find out where it's laid out in that device, and it's going to give me basically a base and
bounds access to this random array of blocks on the device or bytes on the device.
PM file requires an explicit flush because, well, actually both of them do, but the PM
file especially, because in the byte mode, when you are dirtying some of the bytes in
the file, you have to make them durable.
You have to push them to the disk.
In fact, you have to tell the file system that you've pushed them to the disk as well,
metadata updates and things of that nature.
So you can see two data paths to the device in this model,
where the PM-aware apps will flow through
what's on this chart, the pink boxes,
the PM.file and PM.volume modes
that flow into PM-aware and PM-capable drivers, right?
Explicit OS PM-aware subcomponents, not necessarily legacy subcomponents.
And on the right, that dotted line is the direct data path by which those loads and stores progress.
So if you will, there are sort of two modes of accessing the device.
But the idea is that the data flows on the right and the consistency, durability, and
metadata flow on the left. I didn't get the very end of that.
The pink boxes are the...
Oh, SNIA is creating the interface, if you will,
the top edge of these two pink boxes.
We're defining the interface by which PM-aware apps can access PM devices through these abstract interfaces.
And so we define the behavior of the interface, and we actually dig down on some of the details of how to implement.
The PM device is part of the platform, and it's emerging.
More and more of them are appearing on platforms.
But SNIA really doesn't play down here except to say we understand these things very well,
and we are defining how to access them in an efficient way.
This is the bottom box that was in the holder.
This is the PM device, the hardware, the DIM that's plugged into the slot, for instance.
Yeah?
PM file, what does it do directly to the device and when is it in trouble?
It depends on the methods that are being called in the interface.
It's directly, this is actually implicitly part of the PM file API, the loads and stores.
It's just that you don't really define a method for a machine instruction. It's just an instruction.
Whereas here, it's an actual method with processing and interface requirements. So for instance,
the flush goes through this path, right?
Whereas the loads and stores go through the dotted line path.
Well that would be from the PM aware file system
and it would make requests of the volume to do block management or the flushing
or things of that nature.
I mean, in an operating system, this is usually
what's called a DAX file system.
In both Windows and Linux, there's a file system,
there's a variant of existing file systems called DAX,
direct access, and it's PM aware, and it uses lower level drivers for the PM device to control the PM device.
But they map the PM device all the way up to the top layer.
This will be on another slide with sort of that spin of the view.
So uses for PM volume, kernel modules, PM-aware file systems, storage stack components, usually kernel stuff.
We don't see a lot of applications opening PM volumes directly.
It's possible, certainly possible. But generally
speaking, we think of PM volume as sort of the control path for kernel components like file
systems. PM file is definitely for applications, however. PM file, I mean, you could use it in the
kernel, but it's designed for applications, persistent data sets, directly addressable, no DRAM footprint.
The sort of thing where the data set literally lives on a PM device.
It's also often envisioned for things like persistent caches, where you'll place copies of data named as a file in PM and you'll manage it.
That's a really typical sort of application for PM.file.
And this thing which is loosely defined
as reconnectable blobs of persistence, right?
A blob is a binary large object.
It's basically like an object oriented storage subsystem.
And the idea is it's just arbitrary binary data
stored in some definable single entity.
Naming and permissions are provided by the nvm.pm.file mode,
and the contents of the data is managed by that file system.
Yeah, question back.
I'm sorry, I'm having a lot of trouble hearing. If you use this persistent memory device
just as DRAM,
and you don't use the full persistence,
what do the provisions of the programming model provide?
Yeah, the question,
if one were to use the persistent memory DIM
as ordinary DRAM,
what would the contribution of those upper layers be?
Almost nothing is the answer.
The thing is that these DIMMs are generally not useful as raw DRAM.
Their latencies are not as low, particularly for write, as ordinary DRAM.
And so a lot of, if you were to store program code
in them, not data, but code in them,
you would have very interesting unpredictable effects
on traditional applications.
And they're expensive.
Yeah, and they're expensive,
and they can be hard to deploy
because they only sit in certain slots of platforms.
It's not impossible.
I'm sorry.
Yes.
Well certainly, like for instance,
if you wanted to select between types of devices,
if you said, you know, I just want ordinary DRAM,
that would be an operating system function pretty much supported by NVMPM volume.
You'd still have to manage the device?
No, you definitely do not have to manage the device
from the application.
That's not the vision.
The device management would be done
from the operating system.
Yeah.
And another question.
So the PM.file and the PM.volume models, are they backward compatible with the other file and block modes?
No.
Well, they're backward compatible to certain flavors of APIs. Like, PM.file is backward compatible to certain types of memory-mapped file APIs. But they're not backward compatible to the I/O style.
You can't do a read on a pm.file.
There's no read method for pm.file.
You can map it and then you can load and store it,
but you can't issue a read on it.
They may, however, terminate in the same device. They both have a PM device down
at the bottom of these things. They don't have to. There's actually a couple of special
cases of NVDIMM that I'll mention at the end that you couldn't put PM file on top of certain
types of NVDIMM, but you can put block on top of other types of NVDIMM. So there's a
little bit of subtlety.
You have to kind of think all the way down the stack
before you can answer the question,
are they compatible with the same device?
Generally speaking, though, no.
The nvm.file and nvm.pm.file are radically different APIs.
One is a read-write I/O model,
and one is a load-store mapped model.
Is there a translation between the two? Yeah, I suppose you could create one. It would not be too hard to put a read-write
library on top of a mapped file. It might be hard to do the opposite. I don't think
it would be very efficient. I think it's kind of weird to take a new groundbreaking API
and take it backwards in time. I think you'd probably do the other way around. But it's
possible. Sure. I don't
know of anybody thinking about that.
Yeah. On a block device underneath it. Yeah. There are some open source.
I'll mention one of them in a minute.
Fair enough.
Persistent memory modes.
This slide might actually help a couple of the questions.
It's the last slide in this section.
The squiggly lines. We refer to these things called the squiggly lines. And the squiggly
lines are the interface that is defined by the SNIA TWG. What lives below the squiggly lines, we
sort of have advice and recommendations about what lives down there. We don't specify a PM-aware
file system. But we do say this is what we think a PM-aware file system should support. And this in particular is the squiggly line, the
interface to that file system that we recommend. And so you can see that they
sit in various different places in the stack, both of them in kernel,
although they're serving user applications
or kernel modules, depending on which one they are.
The pm.volume mode, I believe I mentioned before,
is primarily for in-kernel components.
It's not impossible to come right down to it from user space,
but we envision that rarely will PM volume be exposed to an
application.
Always PM file will be exposed to an application.
And you can see that the driver can actually control
multiple devices.
These devices could be all the same type of device, or they
could be very different types of device.
I listed a few of the technologies available, but
lots of things can hide down
here. But the idea is that in every case, they're directly mappable as some sort of
byte addressable device at their fundamental level.
Let's see if there's anything else that's interesting that I should say here. Most of
this is a duplicate, but I can't do an NVM TWG presentation
without putting this slide up.
I have to say the word squiggly red line.
Okay?
So I've accomplished my goal.
Yeah?
Yeah.
So theoretically,
hypothetically,
if I would want to make an OS
that would boot from an NVDIMM, which mode? Almost definitely pm.block,
because that's how today's OSes boot.
On the other hand,
one could consider a boot process
that simply said, there it is in memory, I'm gonna initialize the BSS
and I'm gonna jump to start.
That would be the choice of the process
and whether it was a legacy application,
which was used to block mode access,
or if it had the capability of directly mapping
and accessing programmatically loads and stores,
I think that would be a question for the application developer
and the architecture of that application.
You could go either way, very much.
Over time, we'd like to see them move to a mapped paradigm,
the pm.file paradigm, for sure.
I have 10 minutes left, so I wanna finish
a couple of things about the TWG
and wrap up a little bit about DIMMs.
So just a little bit of the work that the TWG has done.
The TWG published the initial programming model V1
quite some time ago, right?
Almost four years ago.
The TWG's been around for a little over five years, I think, and significantly updated
that model with a whole lot of industry movement, and there was a big body of work that we could
begin to share in. In March 2015, there was a major update in V1.1. Some companion documents
that came out after the programming model are the remote access for high availability,
which basically talked about how to access PM over a network,
and PM atomics and transactions, in which we began to layer higher order transactions semantics on top of persistent memory.
Both of these things were really, really interesting.
The remote discussion brought in a whole lot of really nitty-gritty details
about the way RDMA and networks work.
And the PM Atomics was tremendously interesting from a platform perspective.
It is not entirely obvious what the best way to implement transactions
in persistent memory are
because of the error cases.
And this white paper really only begins
to scratch the surface.
There's more work to be done.
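To give a flavor of what layering transactions on persistent memory looks like in practice, here is a short sketch using libpmemobj from the open-source pmem.io libraries, which is one implementation in this space rather than the TWG's specification; the pool path and layout name are made up.

    #include <libpmemobj.h>
    #include <stdio.h>
    #include <string.h>

    struct my_root {
        char msg[64];
    };

    int main(void)
    {
        /* Create a pool of persistent objects; path and layout name are assumptions. */
        PMEMobjpool *pop = pmemobj_create("/mnt/pmem/txpool", "example",
                                          PMEMOBJ_MIN_POOL, 0644);
        if (pop == NULL) { perror("pmemobj_create"); return 1; }

        PMEMoid root = pmemobj_root(pop, sizeof(struct my_root));
        struct my_root *rp = pmemobj_direct(root);

        /* Either every store inside the transaction becomes durable, or, after
         * a crash or an abort, none of them appear to have happened. The hard
         * part the white paper digs into is exactly the error cases. */
        TX_BEGIN(pop) {
            pmemobj_tx_add_range(root, 0, sizeof(struct my_root)); /* snapshot for undo */
            strcpy(rp->msg, "updated atomically with respect to failure");
        } TX_END

        pmemobj_close(pop);
        return 0;
    }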
And finally, just a couple months ago,
we published the NVM programming model V1.2,
and the very next slide has a URL for it.
It was published in June 2017 by SNIA.
You know, the TWG produced it, but SNIA publishes it. It's a major new update. It looks really
similar, but the content is quite a bit more sophisticated, shall I say. We learned a lot. And the major new installment is about error
handling. The most interesting thing to me about persistent memory is that it works exactly
the way you expect it to do when it works. It's when it doesn't work that it'll surprise
you and shock you. And how to recover from such a condition is really, really, really,
really interesting.
It's not necessarily hard.
The operating system can do a lot of it.
But in the TWG, we attempted to think this through, right,
and take it to ground and then work back up.
And what can we say in the document
that would help both platform and application developers
get it right?
Second, we created a new thing.
Optimized flush is the magic flush operation.
The thing that takes data that you've written and makes it durable. And it's called optimized
flush because it can be run in user space and so it's very efficient. Well, the problem
is that you can't always do optimized flush. There may be other layers that are involved
and the optimized flush may not be the only or best way to do it on certain
platforms. And so we exposed this as an attribute that said, optimized flush won't work here. You
have to do a plain old flush. It's not necessarily less efficient. It's just that you need it for the
fidelity of the semantics. And finally, there's this thing called deep flush, which is even more than,
it's like the opposite of optimized flush. It's like, do absolutely everything you possibly can
to make this durable. Go all the way to the hardware and make it durable. Deep flush has a
strong latency cost. We don't recommend it, actually. We recommend it only at very critical
checkpoint-like intervals and things like that. But it does offer a greater persistence reliability.
It's especially useful for things like something called Intel ADR,
asynchronous DRAM refresh,
which is a sort of an external power supply into the platform
that keeps DRAM alive on power failure.
It causes various flushes to occur.
ADR can silently fail if that power supply is unavailable.
And then you come back up and you literally have no idea
that an error occurred because the world ended halfway
through the failure.
Deep flush pushes that data all the way through
and guarantees that it's intact.
So that's just one example.
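As one concrete mapping of those flush actions (an illustration, not the specification itself), the libpmem library from pmem.io exposes a user-space flush, an msync-based fallback, and a deep-flush variant; the file path here is an assumption.

    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map a file on a PM-backed (DAX) file system; libpmem reports whether
         * the mapping really is persistent memory or just a page-cached file. */
        char *p = pmem_map_file("/mnt/pmem/log.dat", 4096, PMEM_FILE_CREATE,
                                0644, &mapped_len, &is_pmem);
        if (p == NULL) { perror("pmem_map_file"); return 1; }

        strcpy(p, "make me durable");

        if (is_pmem)
            pmem_persist(p, mapped_len);  /* "optimized flush": user-space cache flushes, no kernel call */
        else
            pmem_msync(p, mapped_len);    /* plain flush: fall back through msync and the kernel */

        /* "Deep flush": push past ADR-style buffering at a real latency cost.
         * Reserve it for checkpoint-like moments, not for every store. */
        pmem_deep_persist(p, mapped_len);

        pmem_unmap(p, mapped_len);
        return 0;
    }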
And there's ongoing work in the SNIA TWG.
We're still doing this.
We meet roughly every week, two weeks,
certainly once a month.
The NVM programming model specification continues to mature.
We continue to elaborate on it.
NVM interfaces between OS components,
a little more magnification on those PM.volume things,
things of that nature.
Application interfaces to NVM-related OS,
hypervisor, and hardware components.
It's a pretty rich environment, and it's changing,
so the TWG continues to keep up with it.
We hope to update the remote access for high availability
with models and requirements for communication.
Asynchronous flush, which is a really interesting thing
that allows us to sort of request a flush to start
but not wait for it to finish.
So you can do overlapped flushing.
You can overlap flushing with application processing.
You can also request networks to do flushes.
You don't really want to wait for the network
to flush a gigabyte, for instance.
But you do want to be sure it happens,
so you can say, go start it,
and I'll come back and check later. That's a really interesting API,
and it has even more interesting implications
on the error case, like really tricky ones.
So that's sort of a CS202 kind of an API, I think,
in the end, but we're trying to make it as simple as possible.
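The asynchronous flush action itself is still being defined, but the overlap it buys can be sketched today with an ordinary worker thread, using libpmem for the flush; this is an illustration of the idea, not the TWG's eventual interface.

    #include <libpmem.h>
    #include <pthread.h>
    #include <stddef.h>

    struct flush_req { void *addr; size_t len; };

    static void *flush_worker(void *arg)
    {
        struct flush_req *r = arg;
        pmem_persist(r->addr, r->len);   /* the potentially long flush runs here */
        return NULL;
    }

    /* Start flushing 'len' bytes at 'addr', keep computing, and only treat the
     * data as durable after joining the worker ("come back and check later"). */
    void overlapped_flush_example(void *addr, size_t len)
    {
        pthread_t t;
        struct flush_req req = { addr, len };

        pthread_create(&t, NULL, flush_worker, &req);

        /* ... application work that does not yet depend on durability ... */

        pthread_join(t, NULL);           /* completion: durability is only known here */
    }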
And finally, PM security for multi-tenancy,
in which we describe models for persistent memory security
when there are multiple tenants, sort of in a cloud environment.
And Mark Carlson's gonna be giving a talk on that,
I believe, tomorrow afternoon, there he is.
And so we will continue to dive into additional TWG work,
as a whole presentation,
in that security presentation tomorrow.
And so the summary of that part, and I probably won't have time to finish the slides, the
programming model is aligning the industry.
That's our goal.
Common technologies, not forcing specific APIs.
One example is pmem.io.
If you go to that site, you'll find an open source persistent memory library that very,
very closely matches the SNIA model. What are we doing with it? PM models expose it. New PM
models build on existing ones. So on top of that raw pmem.io library, there's additional work. There's
a PMFS, an explicit persistent memory file system for Linux, as well as the DAX support for traditional things.
Emerging technologies will drive increasing work in this area.
So the TWG is by no means done.
I see quite a bit more work to come down the road.
And because I'm like one minute from done, I'm just going to say a couple things.
There are a couple different types of NVDIMM. If you're interested, you should try to discern the differences between them.
They're not all DIMMs. Some of them are storage devices that happen to be plugged into DIMM slots.
And there are new types of DIMMs that are appearing. This is a very rapidly changing landscape.
There are a lot of components involved in DIMMs,
including the BIOS, the bus, the operating system,
you name it.
There are some applications that can use NVDIMM-N,
which is the memory style of the DIMM today.
And think of them like databases, storage,
virtualization, HPC, right?
All these guys can and do use them today.
SAP, Microsoft SQL Server,
all these databases use NVDIMMs today
when they're available.
And finally, I just want to mention
that major operating systems have supported persistent memory
for quite some time.
The Linux kernel's currently working on 4.13.
I think they're assembling 4.14.
PM support has been in since 4.4.
Okay, so this is actually rather mature.
And there are a few components.
DAX, the file system support; BTT, which is a block translation
table that allows for atomicity guarantees;
persistent memory itself; and a RAM disk emulation, that thing called
BLK, the block driver, which emulates RAM disks on, well, it is a RAM disk on, persistent memory.
Windows also has that support.
Since last fall, Windows Server 2016 and Windows 10 Anniversary Update both have extraordinarily
similar support.
Both these operating systems have been discussed
at recent Persistent Memory Summits and SNIA events
and lots of industry events.
So on today's operating systems,
on today's advanced server platforms,
you can light up a PM application.
So it's there today, the programming model is there today. The open source implementations
of the programming model are there today. Have fun. Thanks for listening. If you have questions
about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.