Storage Developer Conference - #60: SNIA NVM Programming Model V 1.2 and Beyond
Episode Date: January 9, 2018...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair.
Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community.
Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast Episode 60. I'm Tom Talpey. I'm a participant in the SNIA NVM Programming TWG, which I will tell you quite a bit about and what we've been up to and where we are headed.
This is a SNIA presentation. I'm here representing the SNIA NVM Programming TWG.
I have a vested interest in this. I've been participating in it for a few years, so I have my own opinions. We can talk about those if you like, but I will attempt to be the
representative of the TWG here and give you the broadest view of what
we're up to and what we're all about. We are scheduled on the calendar for a
40-minute talk here, and I've just discovered that most of the talks in this same slot are 50 minutes,
so I'm assuming that's a typo.
My talk is planned for about 30 minutes with Q&A,
but I can keep going a little bit,
and we can talk about whatever,
however the spirit moves us.
So roughly, I'm going to organize
the discussion into three areas.
A quick persistent memory overview.
This may be review for some folks, but in a way, it sort of sets the stage for the things that the TWG is doing.
Persistent memory has been around for a while, and it's rapidly approaching reality.
In fact, that's where I want to go in the third bit,
you know, where we are today.
After that overview,
I'll talk a bit about the SNIA NVM programming model
and what we're all about in the TWG.
And finally, wrap it up with just a quick view
of today's NVDIMM technologies
and OS support, major OS support for these technologies.
So my hope is to show what is persistent memory,
how you use persistent memory,
and how you use persistent memory
in operating systems today, increasingly tomorrow.
So first section, oh, and feel free to interrupt.
We can make this interactive if you like.
So here are some basic set-the-stage sort of pictures of memory, persistent memory.
In today's systems, memory and storage are beginning to converge.
The volatile and non-volatile technologies, which is, if you will, memory on one side and storage on the other,
are converging together.
They're becoming one and the same.
Memory has non-volatile properties, right?
And the really, really important thing about it
is that there are many such technologies.
We have taken this idea of non-volatile memory
into sort of a stage-setting paradigm,
but there are many different implementations
of non-volatile memory or persistent memory.
And these implementations are changing over time.
So on the left of this picture,
there's a rough graphic
that says today.
And I don't know if this thing works.
No, it's not important.
Roughly, today, in traditional systems, you'd see DRAM as
being the volatile, memory-like device, and a disk or a solid-state device
being the non-volatile, storage-oriented device.
And these are two very different things,
and they don't tend to overlap very much, right?
DRAM is purely DRAM.
It's power-dependent,
and it goes away when the system fails,
whereas disks and SSDs retain that data, right?
Over time, that's beginning to change. That new middle tier
in the pyramid, persistent memory, is beginning to appear. And there are a number of those things.
And you can see that over time, that persistent memory becomes more and more predominant in
today's platforms. This is certainly true in server platforms, where disks and SSDs become,
if you will, the capacity solution, whereas
persistent memory becomes the latency, the low latency solution. But they both have non-volatile
properties. Okay, and some examples down in the lower right of new and emerging memory technologies,
right? We have resistive RAM, 3D XPoint, we have MRAM, PCM, phase change memory, low latency NAND,
many other types. Some of them are proprietary or on package. Some of them are more open and
plug into existing designs. There we are. The persistent memory vision is that persistent
memory brings storage to memory slots.
And there are a number of really interesting things about that transition that we've been exploring in the TWG.
And I'll just start walking through them as we go here.
But the idea is that it's fast like memory.
It's also byte addressable.
It has a different API, if you will, from traditional
storage. It's fast like memory, but it's persistent like storage. All right? So it brings new
properties and requires new thinking on the part of application designers. It's good for
system acceleration. It's good for real-time data capture, for analysis, intelligent response.
There are many, many applications
that can and are making use of persistent memory.
We in SNIA, in the TWG,
define persistent memory
in a couple of fairly careful ways.
As an emerging technology,
there's been a lot of dialogue in
the industry about it. And this dialogue has leapt off in interesting directions. And so
there are some terminology and some aspects of these technologies that are common but are
sometimes a little bit confusing to the new reader. We view roughly two types of non-volatile memory.
The term non-volatile memory,
if you see it down at the bottom,
is the more generic term, right?
But PM, persistent memory, is the term we're adopting
for the memory-like presentation
in the second bullet.
So there's disk-like non-volatile memory
that is persistent memory.
You can think of it like a persistent RAM disk.
It appears as disk drives to applications
in the traditional storage API,
and it's accessed as an array of blocks.
Whereas the same underlying technology
can be accessed in a very different way,
in a memory-like way.
And here we call it PM.
It appears as memory to applications.
The applications store data directly into it.
They don't issue IO requests, they literally store.
They run a CPU instruction to store into it.
There's no IO operation required.
So you can see that it's a very different device in its interface,
even though it's maybe the same device at the bottom.
And so there's been some confusion
around the term NVM.
You may have noticed that the TWG
is called the NVM Programming TWG, right?
NVM. Well, that's a little weird.
Here we are talking about PM.
Since the TWG was founded,
which was actually about five or six years ago,
NVM is beginning to be used more
as a term for block-oriented access, right?
It may have the same underlying technology,
NAND or 3D XPoint or battery-backed DRAM or
whatever, but the NVM style, if you will, is the block-oriented style, whereas PM is the
byte-oriented style: direct access to an NVDIMM sitting in a memory slot and accessed as memory,
not just like memory, as memory. So anyway, I think you'll see increasingly these
terms used a little more selectively. NVM for block, PM for byte. So a couple of characteristics
of PM, it's byte addressable from the programmer's point of view. The API is a load store API. You
literally load and store. You can read from it by
loading and write to it by storing, dereferencing a pointer, basically. It has memory-like performance,
right? It has the latency of memory, hundreds of nanoseconds, right? Shocking latency for a storage
device. Two orders of magnitude, maybe three orders of magnitude better than devices you might have used just last year or still use today. Huge, huge paradigm shift. Supports DMA, okay, including
RDMA, remote DMA. That's where my interests lie, remote access to this stuff. But, you know,
it can be accessed by devices just like memory. It supports DMA as well as CPU access.
It's not prone to unexpected latencies associated with things like IOQing,
demand paging, or page caching.
It's just memory.
It has a very predictable, very low jitter delay.
A couple hundred nanoseconds every time.
Every time.
That's the way it behaves.
So you can think of PM kind of like you might have thought about power protected RAM, right?
It's there all the time.
You know, the system crashes, comes back up, same stuff is still there, right?
It's just RAM that's persistent.
So defining the terms, we're going to focus on PM. That's where the SNIA NVM Programming TWG is focused.
All right, enough of that.
The next section will be about the NVM programming model.
And this is the core activity of the TWG.
It's about writing applications for persistent memory. Now we define
applications pretty broadly. You could be a storage driver and you'd be an application to the
persistent memory. You could also be an actual traditional application. You could be a database.
You could be whatever, accessing storage directly via this new paradigm.
So the SNIA NVM programming model addresses this new landscape.
It was originally developed to address
the ongoing proliferation of new
non-volatile memory functionality
and new persistent memory technologies.
The key word here is new, right? These technologies are new and the platform support for them is new.
This stuff is really, really emerging rapidly recently.
The latest generations of CPUs, Intel, AMD, et cetera,
have explicit instructions and explicit support for persistent memory in their hardware
and in their software, in their instruction sets.
So, you know, how do we address it?
How do we use it was the question that was posed to SNIA.
The NVM programming model, which is the output of the NVM Programming TWG, is necessary
to enable an industry-wide community
of persistent memory producers and consumers.
SNIA works both on the supply side and on the demand side,
if you will.
We not only steer applications to use this technology,
we use the ensuing requirements and dialogue
to drive the technology itself.
It's been really a very symbiotic relationship between the two.
There are a number of significant architecture changes.
We're only beginning the road of the architecture changes here.
The instructions, the motherboards, the standards for identifying and classifying
these devices
have only just begun.
And so SNIA really kicked into action a little while back to try to steer that discussion,
to try to keep it together, to keep it productive, not just have manufacturers go one way and
applications go another way, to really keep it all in one dialogue or one area, one identifiable place to come to have the discussion.
That's what's key.
And so it's generated a few specifications.
I'll talk about them a little later.
Defining recommended behavior.
It's not a requirement.
It's not an absolute spec.
But it's recommended behavior between user space
and operating system components,
all supporting persistent memory.
This is how we recommend you do it.
Your peers are doing it this way, you may want to think about it.
This may be a good idea, you may want to think about it, whatever.
So there are several operational modes, and each mode is described in terms of a use case, actions, and attributes, all right?
And I'll show you, there are four modes.
But those modes are very distinct.
You will choose these access techniques
to a persistent memory device in your application
and your operating system for very specific reasons.
And the TWG has been spending its time
really drilling down on those reasons,
the implications of this access, the error cases.
Those are really, by far and away,
the most interesting implications of these things.
And the benefits and the changes
that may have to happen to your application.
There's a lot of legacy access,
and there's a lot of non-legacy access.
And each one has advantages and disadvantages.
And so what the TWG has produced, basically we're on our third iteration right now, is
to rally the industry around a view of persistent memory by publishing these
technical positions, a view of persistent memory that's application-centric, vendor-neutral,
and that's a very key
aspect of SNIA activity.
Things that are achievable today. These are not
blue-sky dreaming things.
These are things that are available today
and this access method can be
achieved today. I'll show you exactly
what those are. And it's beyond
just storage, right? It's SNIA,
the Storage Networking Industry Association.
But these accesses go far beyond storage.
They're applications, memory technologies,
networking technologies, processor technologies.
All these things are in scope for the SNIA activity.
So the programming model TWG,
the technical working group,
has a mission statement.
Roughly speaking, that is to accelerate the availability of software
that enables persistent memory hardware.
So it's a software-focused activity, right?
We're not specifying hardware.
We're looking at industry hardware.
We may have opinions and ideas to talk about,
but we're focused on accelerating the availability
of software primarily.
The hardware includes, on the one hand, the sort of solid-state device.
This may be a sort of a new PCI Express NVMe device.
It may be some other type of device.
It may be a traditional serial device
such as SATA or SAS,
as well as, of course, persistent memory, right?
Plugged into the memory bus.
Instead of on the IO bus, PCI or SAS or SATA,
it's plugged into a DIMM slot.
So both those things are well in scope,
and the software spans both applications
and operating systems.
We try to go all the way up the stack.
How does the operating system use this device?
How does the application use this device?
And the network as well, by the way.
And the mission of the TWG is to create the NVM programming model.
That's the spec that we showed on the previous slide,
and there's a few different things.
It describes application-visible behaviors. We try to take the application view. The application might
be the operating system or might be up in user space, doesn't matter. We allow the APIs to align
with operating systems. We don't do crazy stuff. We try to do things that are implementable today.
But while we're doing it, we expose opportunities in networks and processors.
I'm proud to say that one of the main things that we've
been exploring in the TWG is this remote flush operation
that RDMA adapters may use to push data to durability.
This is really, really important to have this
discussion in a sort of a requirements scenario, right?
Instead of just walking into the technical group that might be specifying the protocol and saying,
here's what it's going to be.
We thought it through from the top down.
What is the interface?
How can an application use this?
How would we use this in a remoted environment and things like that?
And so we expose these opportunities.
We say, this is how we think it should work, and this is why,
and this is the consensus that we've drawn.
And that's a very powerful message to take into another standards group.
It's really, really important.
So the SNIA NVM programming model exposes new block and file features to applications.
These features are probably familiar,
but nonetheless there are new implications on some of them.
They're not things that applications are used to
sort of querying from their device.
For instance, the atomicity capability.
A block device is typically block atomic.
When you write the block, if the block succeeds,
you know you have a good block.
If the block fails, you know you either have a bad block
or some sort of error has been signaled on the entire block.
That's not true with persistent memory.
If you have a failure, you may have damaged things
far outside the scope of what you were writing.
At the very simplest level, you can say
that the processor is writing a cache line,
not a single byte, even though you thought
you were dereferencing a char pointer.
You dirty the cache line, and if it failed,
probably the whole cache line went bad.
How do you know that?
How does the application know that?
So what's the granularity, and what atomic guarantees
can you get from this hardware?
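To make that granularity point concrete, here is a minimal sketch in C for x86, assuming a CPU and compiler that support the CLFLUSHOPT instruction (build with something like -mclflushopt) and a typical 64-byte cache line; in real code you would query the line size rather than hard-code it.

    #include <immintrin.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CACHE_LINE 64   /* typical on x86; an assumption, not a guarantee */

    /* Store a single byte into mapped persistent memory, then flush it toward
     * the persistence domain. The CPU dirties and flushes the whole cache
     * line, not just the one byte, so a failure at the wrong moment can
     * damage neighboring data in that line as well. */
    static void store_one_byte(char *pm_base, size_t offset, char value)
    {
        pm_base[offset] = value;    /* the "char pointer" store from the talk */

        uintptr_t line = (uintptr_t)(pm_base + offset) & ~(uintptr_t)(CACHE_LINE - 1);
        _mm_clflushopt((void *)line);   /* flush the containing cache line */
        _mm_sfence();                   /* order the flush before later stores */
    }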
Thin provisioning, right?
Traditional storage can be thinly provisioned,
and so managing thin provisioning
and managing the blocks on these devices and things like that
are certainly something which is new
and has a new behavior on these byte-addressable devices.
Memory-mapped files.
One of the very first ideas was,
well it's just memory,
and we know how to memory map a file, right?
And so why don't we just use traditional
memory mapping APIs in operating systems
as our access method?
And it works, works great actually.
All the operating systems support it.
But it's by no means the only way or the best way to do it.
And there are some gaps between the traditional
memory map APIs and the requirements of durability,
for instance.
What does sync mean?
What does msync mean?
And so a new set of APIs has grown up around the concept
of memory mapped access that enable the full scope,
if you will, of persistent memory behaviors.
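As a rough sketch of the traditional memory-mapping path those newer APIs grew out of, here is a minimal POSIX example in C, using plain mmap and msync rather than the PM-optimized interfaces; the file path is made up, and on Linux it would sit on a DAX-mounted file system to get true direct access.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 4096;

        /* The path is an assumption -- any file on a PM-backed (DAX) file system. */
        int fd = open("/mnt/pmem/example.dat", O_CREAT | O_RDWR, 0644);
        if (fd < 0 || ftruncate(fd, (off_t)len) != 0) { perror("open/ftruncate"); return 1; }

        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello, persistent memory");   /* plain CPU stores, no write() call */

        /* The traditional durability request is msync(). The gap the talk points
         * at is that a PM-aware application often wants something lighter than
         * this trip into the kernel -- hence the newer flush actions. */
        if (msync(p, len, MS_SYNC) != 0) { perror("msync"); return 1; }

        munmap(p, len);
        close(fd);
        return 0;
    }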
And there are open source implementations available
of the SNIA NVM programming model,
and I'll mention those a little later.
It's important to note that it's not an API.
It's only a programming model.
It's an abstract API.
We are not sitting here trying to specify a single API
that'll be used across all operating systems.
There's no such thing.
There never will be.
So it's defined in terms of attributes, actions,
and especially use cases.
Implementations map those actions and attributes
to concrete APIs. There are some open source ones, and there are some, I won't say proprietary, but adaptations of existing programming methodologies in Windows, Linux, etc. So the programming model V1.2, and actually the earlier version supported something very similar,
supports roughly four modes of access.
And they're outlined in blue here.
NVM dot something.
On the left, you'll see nvm.file and nvm.block.
Those are disk drive and disk-like access methods
that are quite traditional.
File APIs, basically.
You open a file, you read and write byte offsets in the file.
You open a device, you read and write LBA sectors
from the device, right?
And this is just an obvious API, right? This is the sort of thing that
all applications will be fully prepared to do, right? It's traditional I.O., but it's
supported on persistent memory devices. So they can transparently adopt these new devices
by following the guidelines in these programming interfaces. On the right is the more emerging technology version of these things.
They are quite different, but you can see that they are nvm.pm.file and nvm.pm.volume.
These are, if you will, the file byte-oriented and volume block-oriented accesses to persistent memory
devices which are non-legacy, mapped file type, memory-like access.
All right?
And so these four modes are the focus of the TWG's interface work. Block and file modes use an I/O model, right,
in the TWG vision.
It's an I/O model.
Data is basically read or written,
but the data resides in RAM buffers in the end.
You don't directly access the RAM buffer.
You perform a read or a write to the
buffer by name. It's in either a block or a file, sort of a namespace. Software controls how to
wait. It can be a context switch or a poll, a traditional I/O model. That wait may not be very
long because it's a very low latency device underneath it, but there's an I/O model. Usually there are filter drivers or context switches
or handoffs or all kinds of software interposes
on that read or write.
And how that top level requester waits for the result
is the business of that requester.
But typically we will see this as a context switch
or sort of a poll for completion
while these other layers process the data.
And the status of the result
is explicitly checked by software.
Generally speaking, read puts data in a buffer
and returns a status saying good, bad, or otherwise, right?
And it's a traditional I/O model,
but it's once again supported
by persistent memory underlying.
Volume and PM modes, right, these are the two on the left side of this,
no, sorry, the right side of this diagram.
Volume and PM modes enable load-store access.
There's no I/O operation at all.
There's literally an instruction to load, to dereference a pointer and pull data from it, or to dereference a pointer and push data to it. They are directly mapped. The
data is loaded into or stored from processor registers. Load register 5 from star P. The
processor stalls the software waiting for data during the instruction, right? You don't think of a load operation
as an asynchronous operation the way you might think of a read.
It blocks, it stops the processing of that thread
while that load instruction executes
and then it runs the next instruction, right?
So there's no context switch, there's no poll,
there's no nothing.
At the end of the thing, there is data in that register.
Right, period.
But there's no status.
I didn't say read some handle, some buffer.
I said put the result right here, and I got a result.
So there's no actual status return.
It doesn't say success or failure.
It generates an exception.
You get a non-maskable interrupt
or some sort of software exception that says,
oops, the data that landed in that register is no good, and here's why.
And it's often a machine check, parity error or read failure
or some sort of fatal problem on the bus.
And so it's a very, very different programming model.
It's not unfamiliar to users, but it's not the sort of programming model
you were expecting necessarily as a storage programmer.
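A hedged illustration in C of what "no status return" means on Linux: a failed load from mapped persistent memory typically arrives as a SIGBUS (or a machine check surfaced as one), not as an error code, so a PM-aware consumer has to plan for it. The recovery policy here is deliberately trivial.

    #include <setjmp.h>
    #include <signal.h>

    static sigjmp_buf recovery_point;

    static void on_sigbus(int sig)
    {
        (void)sig;
        /* The load never "returned an error" -- the hardware raised an
         * exception and the OS delivered it to us as a signal. */
        siglongjmp(recovery_point, 1);
    }

    /* Read one byte from mapped PM, reporting failure instead of crashing. */
    int safe_load(const char *pm_addr, char *out)
    {
        struct sigaction sa;
        sa.sa_handler = on_sigbus;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGBUS, &sa, NULL);

        if (sigsetjmp(recovery_point, 1) != 0)
            return -1;        /* the load faulted; the data at pm_addr is suspect */

        *out = *pm_addr;      /* an ordinary dereference -- no status, no errno */
        return 0;
    }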
So a little more information on the block mode, file mode, and persistent memory modes.
So nvm.block mode is targeted at file systems and block-aware applications.
It supports atomic writes.
When you write a sector, it behaves atomically.
You won't write part of that sector.
You'll either succeed or fail for the entire sector.
It has length and alignment granularities.
The size of the sector and the alignment of the sector are pretty well constrained.
It supports thin provisioning.
It's a traditional block API.
You can trim the sector.
You can update the sector.
You can do whatever you want.
The nvm.file mode is targeted at file-based applications.
As I say, these things are traditional APIs.
There are some atomic write features dependent on the type
of hardware you have, and you can discover it and use it as a consumer of this interface.
And the granularities as well, similar to the granularities you see above with block mode,
but there are additional solutions available in block mode that are not available in file mode.
Something called a BTT, for instance.
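For contrast with the mapped modes, here is a minimal C sketch of the traditional I/O style that the block and file modes preserve. The /dev/pmem0 path is the usual Linux name for a pmem block device, and the 4096-byte sector size is an assumption; the programming model says to discover the atomicity and alignment granularities rather than hard-code them.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        enum { SECTOR = 4096 };              /* assumed; query the device's granularity in real code */
        char buf[SECTOR];

        int fd = open("/dev/pmem0", O_RDWR); /* typical Linux pmem block device; an assumption */
        if (fd < 0) { perror("open"); return 1; }

        memset(buf, 0, sizeof buf);
        strcpy(buf, "a whole sector: it either lands or fails as a unit");

        /* The defining property of the block and file modes: explicit I/O
         * requests with explicit status checks, even though PM sits underneath. */
        if (pwrite(fd, buf, SECTOR, 0) != SECTOR) { perror("pwrite"); return 1; }
        if (pread(fd, buf, SECTOR, 0) != SECTOR)  { perror("pread");  return 1; }

        close(fd);
        return 0;
    }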
Anyway, you can roughly see a diagram here that shows what they flow through.
The green boxes are basically the things that the SNIA programming model creates or defines.
The blue boxes are things that exist in today's systems.
And the purple box at the bottom of the stack
is the new hardware that showed up overnight.
Persistent memory modes, the PM modes,
there's two of them, like I mentioned before.
There's nvm.pm.volume mode,
which is the software abstraction for the actual hardware.
The idea is I'm going to open the DIMM or some slice of the DIMM.
We'll call that a PM volume.
It might be a contiguous chunk.
It might be some scatter-gathered piece, subset of the chunk.
It doesn't matter.
We've got a hunk of this hardware available to us.
There will be an addressing range.
It does have thin provisioning management. These blocks are sort of abstract and can be added and removed from the map, for instance, all these things.
But the PM volume is intended to be a device abstraction.
The PM file mode is an application where you view a PM
as containing file data, right?
There's some sort of dictionary that says,
I'm going to open this name, whatever it is, and it's going to look something up and find out where it's laid out in that device, and it's going to give me basically a base and
bounds access to this random array of blocks on the device or bytes on the device.
PM file requires an explicit flush because, well, actually both of them do, but the PM
file especially, because in the byte mode, when you are dirtying some of the bytes in
the file, you have to make them durable.
You have to push them to the disk.
In fact, you have to tell the file system that you've pushed them to the disk as well,
metadata updates and things of that nature.
So you can see two data paths to the device in this model,
where the PM-aware apps will flow through
what's on this chart, the pink boxes,
the PM.file and PM.volume modes
that flow into PM-aware and PM-capable drivers, right?
Explicit OS PM-aware subcomponents, not necessarily legacy subcomponents.
And on the right, that dotted line is the direct data path by which those loads and stores progress.
So if you will, there are sort of two modes of accessing the device.
But the idea is that the data flows on the right and the consistency, durability, and
metadata flow on the left. I didn't get the very end of that.
The pink boxes are the...
Oh, SNIA is creating the interface, if you will,
the top edge of these two pink boxes.
We're defining the interface by which PM-aware apps can access PM devices through these abstract interfaces.
And so we define the behavior of the interface, and we actually dig down on some of the details of how to implement.
The PM device is part of the platform, and it's emerging.
More and more of them are appearing on platforms.
But SNIA really doesn't play down here except to say we understand these things very well,
and we are defining how to access them in an efficient way.
This is the bottom box that was in the holder.
This is the PM device, the hardware, the DIM that's plugged into the slot, for instance.
Yeah?
PM file, what does it do directly to the device and when is it in trouble?
It depends on the methods that are being called in the interface.
It's directly, this is actually implicitly part of the PM file API, the loads and stores.
It's just that you don't really define a method for a machine instruction. It's just an instruction.
Whereas here, it's an actual method with processing and interface requirements. So for instance,
the flush goes through this path, right?
Whereas the loads and stores go through the dotted line path.
Well that would be from the PM aware file system
and it would make requests of the volume to do block management or the flushing
or things of that nature.
I mean, in an operating system, this is usually
what's called a DAX file system.
In both Windows and Linux, there's a file system,
there's a variant of existing file systems called DAX,
direct access, and it's PM aware, and it uses lower level drivers for the PM device to control the PM device.
But they map the PM device all the way up to the top layer.
This will be on another slide with sort of that spin of the view.
So uses for PM volume, kernel modules, PM-aware file systems, storage stack components, usually kernel stuff.
We don't see a lot of applications opening PM volumes directly.
It's possible, certainly possible. But generally
speaking, we think of PM volume as sort of the control path for kernel components like file
systems. PM file is definitely for applications, however. PM file, I mean, you could use it in the
kernel, but it's designed for applications, persistent data sets, directly addressable, no DRAM footprint.
The sort of thing where the data set literally lives on a PM device.
It's also often envisioned for things like persistent caches, where you'll place copies of data named as a file in PM and you'll manage it.
That's a really typical sort of application for PM.file.
And this thing which is loosely defined
as reconnectable blobs of persistence, right?
A blob is a binary large object.
It's basically like an object oriented storage subsystem.
And the idea is it's just arbitrary binary data
stored in some definable single entity.
Naming and permissions are provided by the nvm.pm.file mode,
and the contents of the data is managed by that file system.
Yeah, question back.
I'm sorry, I'm having a lot of trouble hearing. If you use this persistent memory device
just as DRAM,
and you don't use the full persistence,
what do the provisions of the programming model provide?
Yeah, the question,
if one were to use the persistent memory DIM
as ordinary DRAM,
what would the contribution of those upper layers be?
Almost nothing is the answer.
The thing is that these DIMMs are generally not useful as raw DRAM.
Their latencies are not as low, particularly for write, as ordinary DRAM.
And so a lot of, if you were to store program code
in them, not data, but code in them,
you would have very interesting unpredictable effects
on traditional applications.
And they're expensive.
Yeah, and they're expensive,
and they can be hard to deploy
because they only sit in certain slots of platforms.
It's not impossible.
I'm sorry.
Yes.
Well certainly, like for instance,
if you wanted to select between types of devices,
if you said, you know, I just want ordinary DRAM,
that would be an operating system function pretty much supported by NVMPM volume.
You'd still have to manage the device?
No, you definitely do not have to manage the device
from the application.
That's not the vision.
The device management would be done
from the operating system.
Yeah.
And another question.
So the PM.file and the PM.volume models, are they backward compatible with the other file and block modes?
No.
Well, they're backward compatible to certain flavors of APIs. Like, PM.file is backward compatible to certain types of memory-mapped file APIs. But they're not backward compatible to the I/O style.
You can't do a read on a pm.file.
There's no read method for pm.file.
You can map it and then you can load and store it,
but you can't issue a read on it.
They may, however, terminate in the same device. They both have a PM device down
at the bottom of these things. They don't have to. There's actually a couple of special
cases of NVDIMM that I'll mention at the end that you couldn't put PM file on top of certain
types of NVDIMM, but you can put block on top of other types of NVDIMM. So there's a
little bit of subtlety.
You have to kind of think all the way down the stack
before you can answer the question,
are they compatible with the same device?
Generally speaking, though, no.
The nvm.file and nvm.pm.file are radically different APIs.
One is a read-write I/O model,
and one is a load-store mapped model.
Is there a translation between the two? Yeah, I suppose you could create one. It would not be too hard to put a read-write
library on top of a mapped file. It might be hard to do the opposite. I don't think
it would be very efficient. I think it's kind of weird to take a new groundbreaking API
and take it backwards in time. I think you'd probably do the other way around. But it's
possible. Sure. I don't
know of anybody thinking about that.
Yeah. On a block device underneath it. Yeah. There are some open source.
I'll mention one of them in a minute.
Fair enough.
Persistent memory modes.
This slide might actually help a couple of the questions.
It's the last slide in this section.
The squiggly lines. We refer to these things called the squiggly lines. And the squiggly
lines are the interface that is defined by the SNIA TWG. What lives below the squiggly lines, we
sort of have advice and recommendations about what lives down there. We don't specify a PM-aware
file system. But we do say this is what we think a PM-aware file system should support. And this in particular is the squiggly line, the
interface to that file system that we recommend. And so you can see that they
sit in various different places in the stack, both of them in kernel,
although they're serving user applications
or kernel modules, depending on which one they are.
The pm.volume mode, I believe I mentioned before,
is primarily for in-kernel components.
It's not impossible to come right down to it from user space,
but we envision that rarely will PM volume be exposed to an
application.
Always PM file will be exposed to an application.
And you can see that the driver can actually control
multiple devices.
These devices could be all the same type of device, or they
could be very different types of device.
I listed a few of the technologies available, but
lots of things can hide down
here. But the idea is that in every case, they're directly mappable as some sort of
byte addressable device at their fundamental level.
Let's see if there's anything else that's interesting that I should say here. Most of
this is a duplicate, but I can't do an NVM TWG presentation
without putting this slide up.
I have to say the word squiggly red line.
Okay?
So I've accomplished my goal.
Yeah?
Yeah.
So theoretically,
hypothetically,
if I would want to make an OS
that would boot from an NVDIMM, which mode? Almost definitely pm.block,
because that's how today's OSes boot.
On the other hand,
one could consider a boot process
that simply said, there it is in memory, I'm gonna initialize the BSS
and I'm gonna jump to start.
That would be the choice of the process
and whether it was a legacy application,
which was used to block mode access,
or if it had the capability of directly mapping
and accessing programmatically loads and stores,
I think that would be a question for the application developer
and the architecture of that application.
You could go either way, very much.
Over time, we'd like to see them move to a mapped paradigm,
the pm.file paradigm, for sure.
I have 10 minutes left, so I wanna finish
a couple of things about the TWG
and wrap up a little bit about DIMMs.
So just a little bit of the work that the TWG has done.
The TWG published the initial programming model V1
quite some time ago, right?
Almost four years ago.
The TWG's been around for a little over five years, I think, and significantly updated
that model with a whole lot of industry movement, and there was a big body of work that we could
begin to share in. In March 2015, there was a major update in V1.1. Some companion documents
that came out after the programming model are the remote access for high availability,
which basically talked about how to access PM over a network,
and PM atomics and transactions, in which we began to layer higher order transactions semantics on top of persistent memory.
Both of these things were really, really interesting.
The remote discussion brought in a whole lot of really nitty-gritty details
about the way RDMA and networks work.
And the PM Atomics was tremendously interesting from a platform perspective.
It is not entirely obvious what the best way to implement transactions
in persistent memory are
because of the error cases.
And this white paper really only begins
to scratch the surface.
There's more work to be done.
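To give a flavor of what layering transactions on persistent memory looks like in practice, here is a short sketch using libpmemobj from the open-source pmem.io libraries, which is one implementation in this space rather than the TWG's specification; the pool path and layout name are made up.

    #include <libpmemobj.h>
    #include <stdio.h>
    #include <string.h>

    struct my_root {
        char msg[64];
    };

    int main(void)
    {
        /* Create a pool of persistent objects; path and layout name are assumptions. */
        PMEMobjpool *pop = pmemobj_create("/mnt/pmem/txpool", "example",
                                          PMEMOBJ_MIN_POOL, 0644);
        if (pop == NULL) { perror("pmemobj_create"); return 1; }

        PMEMoid root = pmemobj_root(pop, sizeof(struct my_root));
        struct my_root *rp = pmemobj_direct(root);

        /* Either every store inside the transaction becomes durable, or, after
         * a crash or an abort, none of them appear to have happened. The hard
         * part the white paper digs into is exactly the error cases. */
        TX_BEGIN(pop) {
            pmemobj_tx_add_range(root, 0, sizeof(struct my_root)); /* snapshot for undo */
            strcpy(rp->msg, "updated atomically with respect to failure");
        } TX_END

        pmemobj_close(pop);
        return 0;
    }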
And finally, just a couple months ago,
we published the NVM programming model V1.2,
and the very next slide has a URL for it.
It was published in June 2017 by SNIA.
You know, the TWG produced it, but SNIA publishes it. It's a major new update. It looks really
similar, but the content is quite a bit more sophisticated, shall I say. We learned a lot. And the major new installment is about error
handling. The most interesting thing to me about persistent memory is that it works exactly
the way you expect it to do when it works. It's when it doesn't work that it'll surprise
you and shock you. And how to recover from such a condition is really, really, really,
really interesting.
It's not necessarily hard.
The operating system can do a lot of it.
But in the TWG, we attempted to think this through, right,
and take it to ground and then work back up.
And what can we say in the document
that would help both platform and application developers
get it right?
Second, we created a new thing.
Optimized flush is the magic flush operation.
The thing that takes data that you've written and makes it durable. And it's called optimized
flush because it can be run in user space and so it's very efficient. Well, the problem
is that you can't always do optimized flush. There may be other layers that are involved
and the optimized flush may not be the only or best way to do it on certain
platforms. And so we exposed this as an attribute that said, optimized flush won't work here. You
have to do a plain old flush. It's not necessarily less efficient. It's just that you need it for the
fidelity of the semantics. And finally, there's this thing called deep flush, which is even more than,
it's like the opposite of optimized flush. It's like, do absolutely everything you possibly can
to make this durable. Go all the way to the hardware and make it durable. Deep flush has a
strong latency cost. We don't recommend it, actually. We recommend it only at very critical
checkpoint-like intervals and things like that. But it does offer a greater persistence reliability.
It's especially useful for things like something called Intel ADR,
asynchronous DRAM refresh,
which is a sort of an external power supply into the platform
that keeps DRAM alive on power failure.
It causes various flushes to occur.
ADR can silently fail if that power supply is unavailable.
And then you come back up and you literally have no idea
that an error occurred because the world ended halfway
through the failure.
Deep flush pushes that data all the way through
and guarantees that it's intact.
So that's just one example.
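As one concrete mapping of those flush actions (an illustration, not the specification itself), the libpmem library from pmem.io exposes a user-space flush, an msync-based fallback, and a deep-flush variant; the file path here is an assumption.

    #include <libpmem.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        size_t mapped_len;
        int is_pmem;

        /* Map a file on a PM-backed (DAX) file system; libpmem reports whether
         * the mapping really is persistent memory or just a page-cached file. */
        char *p = pmem_map_file("/mnt/pmem/log.dat", 4096, PMEM_FILE_CREATE,
                                0644, &mapped_len, &is_pmem);
        if (p == NULL) { perror("pmem_map_file"); return 1; }

        strcpy(p, "make me durable");

        if (is_pmem)
            pmem_persist(p, mapped_len);  /* "optimized flush": user-space cache flushes, no kernel call */
        else
            pmem_msync(p, mapped_len);    /* plain flush: fall back through msync and the kernel */

        /* "Deep flush": push past ADR-style buffering at a real latency cost.
         * Reserve it for checkpoint-like moments, not for every store. */
        pmem_deep_persist(p, mapped_len);

        pmem_unmap(p, mapped_len);
        return 0;
    }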
And there's ongoing work in the SNIA TWG.
We're still doing this.
We meet roughly every week, two weeks,
certainly once a month.
The NVM programming model specification continues to mature.
We continue to elaborate on it.
NVM interfaces between OS components,
a little more magnification on those PM.volume things,
things of that nature.
Application interfaces to NVM-related OS,
hypervisor, and hardware components.
It's a pretty rich environment, and it's changing,
so the TWG continues to keep up with it.
We hope to update the remote access for high availability
with models and requirements for communication.
Asynchronous flush, which is a really interesting thing
that allows us to sort of request a flush to start
but not wait for it to finish.
So you can do overlapped flushing.
You can overlap flushing with application processing.
You can also request networks to do flushes.
You don't really want to wait for the network
to flush a gigabyte, for instance.
But you do want to be sure it happens,
so you can say, go start it,
and I'll come back and check later. That's a really interesting API,
and it has even more interesting implications
on the error case, like really tricky ones.
So that's sort of a CS202 kind of an API, I think,
in the end, but we're trying to make it as simple as possible.
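The asynchronous flush action itself is still being defined, but the overlap it buys can be sketched today with an ordinary worker thread, using libpmem for the flush; this is an illustration of the idea, not the TWG's eventual interface.

    #include <libpmem.h>
    #include <pthread.h>
    #include <stddef.h>

    struct flush_req { void *addr; size_t len; };

    static void *flush_worker(void *arg)
    {
        struct flush_req *r = arg;
        pmem_persist(r->addr, r->len);   /* the potentially long flush runs here */
        return NULL;
    }

    /* Start flushing 'len' bytes at 'addr', keep computing, and only treat the
     * data as durable after joining the worker ("come back and check later"). */
    void overlapped_flush_example(void *addr, size_t len)
    {
        pthread_t t;
        struct flush_req req = { addr, len };

        pthread_create(&t, NULL, flush_worker, &req);

        /* ... application work that does not yet depend on durability ... */

        pthread_join(t, NULL);           /* completion: durability is only known here */
    }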
And finally, PM security for multi-tenancy,
in which we describe models for persistent memory security
when there are multiple tenants, sort of in a cloud environment.
And Mark Carlson's gonna be giving a talk on that,
I believe, tomorrow afternoon, there he is.
And so we will continue to dive into additional TWG work,
as a whole presentation,
in that security presentation tomorrow.
And so the summary of that part, and I probably won't have time to finish the slides, the
programming model is aligning the industry.
That's our goal.
Common technologies, not forcing specific APIs.
One example is pmem.io.
If you go to that site, you'll find an open source persistent memory library that very,
very closely matches the SNIA model. What are we doing with it? PM models expose it. New PM
models build on existing ones. So on top of that raw pmem.io library, there's additional work. There's
a PMFS, an explicit persistent memory file system for Linux, as well as the DAX support for traditional things.
Emerging technologies will drive increasing work in this area.
So the TWG is by no means done.
I see quite a bit more work to come down the road.
And because I'm like one minute from done, I'm just going to say a couple things.
There are a couple different types of NVDIMM. If you're interested, you should try to discern the differences between them.
They're not all DIMMs. Some of them are storage devices that happen to be plugged into DIMM slots.
And there are new types of DIMMs that are appearing. This is a very rapidly changing landscape.
There are a lot of components involved in DIMMs,
including the BIOS, the bus, the operating system,
you name it.
There are some applications that can use NVDIMM-N,
which is the memory style of the DIMM today.
And think of them like databases, storage,
virtualization, HPC, right?
All these guys can and do use them today.
SAP, Microsoft SQL Server,
all these databases use NVDIMMs today
when they're available.
And finally, I just want to mention
that major operating systems have supported persistent memory
for quite some time.
The Linux kernel's currently working on 4.13.
I think they're assembling 4.14.
PM support has been in since 4.4.
Okay, so this is actually rather mature.
And there are a few components.
DAX, the file system support; BTT, which is a block translation
table that allows for atomicity guarantees;
persistent memory itself; and a RAM disk emulation, that thing called
BLK, the block driver, which emulates RAM disks on, well, it is a RAM disk on, persistent memory.
Windows also has that support.
Since last fall, Windows Server 2016 and Windows 10 Anniversary Update both have extraordinarily
similar support.
Both these operating systems have been discussed
at recent Persistent Memory Summits and SNIA events
and lots of industry events.
So on today's operating systems,
on today's advanced server platforms,
you can light up a PM application.
So it's there today, the programming model is there today. The open source implementations
of the programming model are there today. Have fun. Thanks for listening. If you have questions
about the material presented in this podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.