Storage Developer Conference - #25: The SNIA NVM Programming Model
Episode Date: November 3, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 25.
Today we hear from Doug Voigt, Distinguished Technologist with Hewlett Packard Enterprise,
as he presents the SNIA NVM programming model from the 2016 Storage Developer Conference.
So today I'm going to describe the SNIA NVM programming model.
I'm going to cover what it is, why we've done this work, and a number of the implications.
This is primarily an overview, maybe a little bit of a tutorial on the programming model.
It's got implications in terms of how you access data.
That's the map and sync concept.
Questions about how to manipulate and manage pointers,
about atomic operations of a slightly different sort than we're mostly accustomed to,
and a little bit about exception handling.
So some of the basic interesting aspects
that the programming model surfaces.
And then a little bit about ongoing work
in the area of persistent memory data structures.
This is the area that I believe Andy Rudoff's talk earlier today focused on. And also in the area of high availability.
So I'm going to start with the why. I think a number of people are aware of this. As we
have moved through flash technology, and now we're starting to see technologies beyond
flash, latencies for persistent memory technologies, or any kind of non-volatile memory technologies, are coming down.
This slide is meant to represent that.
These purple bars are showing sort of abstract ranges of latency for different technologies, such as hard disks, SSDs, perhaps with SATA, and
then NVMe reducing the minimum latency somewhat more, and then persistent memory, which is
a bit lower still.
Now, I've got this range here that's between 200 nanoseconds and 2 microseconds.
That's kind of an interesting range
that causes some disruption in the way data is accessed.
And the reason is because if you are accessing data
and you know that it's going to take something
like a couple microseconds or more,
then you are probably going to context switch.
You're probably going to start the data access
and then let your process block, go away for a while, and then be woken by an interrupt. You come back and your data access is complete.
On the other hand, if you are under, let's say, 200 nanoseconds, and these are approximate
numbers that are pretty architecture specific, but sort of rules of thumb, then you're
doing a memory access. You're in kind of the non-uniform memory access region,
where it's okay to allow a processor instruction to access the memory instead
of waiting for what would have been previously an I/O, basically.
We'll talk some more about the specific implications of that,
but that means that there's this kind of interesting middle ground
between a couple hundred nanoseconds and a couple microseconds,
where the technology is not quite as fast as memory,
so you might not want to stall the processor's data access pipeline during the access.
But it's also faster than what you'd probably wait for with a context switch.
And in this range, you've got some choices.
You might, for example, poll for completion, because by the time you got around to completing
a context switch, the data would already be there.
So the point is that as you descend through this range,
which is what current technologies are now actually evidencing or doing,
you have some choices to make about how you access your data and whether you view it as storage or as memory.
So this was the significant motivation for the NVM programming model, because if you're
going to access data a little differently, if you're going to access persistent memory
using processor instructions, that's a significant difference to an application, and it will take some period of time for applications
to adopt that style of accessing data stored in persistent memory.
So we wanted to start paving the way for that, to develop an ecosystem that emerges over
time, has a lot of components and a lot of software that can tap that ecosystem
so as to drive application acceptance, basically,
of this type of new programming model.
So to talk more about persistent memory specifically,
there are a couple of types of NVDIMMs.
Really, this slide should be labeled NVDIMMs.
And I believe there are other presentations and there have been in other conferences about
NVDIMMs specifically.
But some NVDIMMs (although they all go in memory slots, they are DIMMs) are actually kind of disk-like, even though they're attached to the memory bus.
You know, for example, you might use the memory interface to send a command to flash on the DIMM
and then wait for that command to complete, but still using a memory channel to access it.
So those are the sort of disk-like NVDIMMs,
and they're really block storage from the point of view
of an application. On the other hand, there are other memory DIMMs that are NVDIMMs that
are memory-like. Those do literally appear as memory to applications, and those are the
ones where data is stored directly in byte-addressable memory, and you don't really have any I/O, and you
don't necessarily have a traditional DMA as a kind of bulk data mover involved.
So we actually started work on the programming model back in 2012.
We're currently on revision 1.1 since mid last year and we're now pushing to complete
the content for version 1.2. I'll talk a little bit later about some of the things that we're
expecting to see in there. The programming model includes both block storage and persistent
memory. There are some features that have been developed in
the industry and have been standardized in other standards groups, in some cases, that
are not yet exposed to applications in a uniform way.
So the programming model spends some time elaborating on those features and how they
can be presented to applications. And those specifically include atomicity, block storage atomicity,
what kind of capability and granularity do you have with particular block storage?
Because an application may care about that.
And certain aspects of thin provisioning management.
So what's happening here is that we're embellishing the existing application view without really
perturbing it very much.
On the other hand, when you're using persistent memory, we recommend that you use memory-mapped
files, which is not brand new.
The technique has been around for a while, but when you use it to access persistent memory,
it works particularly well. So it's
an existing abstraction and we feel that it can act as a bridge for applications as these
technologies emerge to allow applications to use them sooner than they otherwise might
have. There are already some open source implementations available of this type of memory-mapped file
using persistent memory.
And this is a programming model as opposed to an API.
The reason we did that, the reason we call it that is because we need consistent enough
behavior across software and hardware, and some of that software is operating system
software, to create an ecosystem that people can depend on
in terms of the behavior of the ecosystem.
But we don't expect all implementations
to have the exact same API.
We want OSs, for example, to be able to present their APIs
in ways that are natural for those OSs.
So that's why we focus on our programming model.
It describes behavior in terms of attributes and actions,
and then we illustrate them as use cases.
And then what happens is to deploy the programming model,
an API will take specific calls, for example,
and say this is our implementation
of the NVM programming model's action or attribute
that corresponds to that API call, right?
So they kind of map, you know,
people can map their APIs then to the programming model.
Let's see.
I think I've really kind of covered this pretty well probably for this audience.
You know, I've found, though, that sometimes when I talk about this difference between
the traditional block and file access methods using I/O, compared with what the
programming model calls the volume and persistent memory modes that use processor
instructions such as load and store, I need to spend some time making sure people
understand what the difference really is. Because when you're doing storage access today
using I/O, you're usually reading or writing data using RAM buffers. You're copying data from a buffer to a storage device or vice versa.
And your software controls how it's going to wait.
It may context switch or poll, as we were talking about earlier,
but the software can make that decision on its own.
And then when the command is done, the status is explicitly checked.
It's returned and checked by the
software.
You get a response code, an error code, that sort of thing.
On the other hand, when you're using load or store instructions, and these are proxies
for any instruction that accesses memory, a processor instruction, the data is generally
being loaded or stored into and out of processor registers as opposed
to RAM.
RAM is on one side the target of the load or store, and on the other side is usually some kind of register inside the processor.
And the processor makes the software wait for the data during the instruction.
Once the instruction starts, the application doesn't have any choice, right?
The processor is executing the instruction.
So it doesn't decide whether to wait or how to wait.
It's stuck at that point.
That thread is stalled until the instruction completes.
So that's a significant difference.
And the other significant difference is that there's no status returned when you access persistent memory.
If there's an error, you'll get an exception. So you can't just check the status. In normal
circumstances, you do the memory access, it finishes and nothing happens, right? Nothing
abnormal happens. You just keep going. You don't really check anything. If something goes wrong with the persistent memory
access, then you get an exception, which certainly there are plenty of things that generate exceptions,
but that is a change from an application point of view because now it means it went to store
something in persistent memory and suddenly it got an exception and it now has to decide
where it was that the exception occurred and how to recover from it.
So that's why exception handling, error handling, is significantly different when you're using persistent memory in the programming model.
So drilling in now on details of the programming model when used for persistent memory.
This is a slightly modified, you know, typical IO stack with an application and a file system
and a driver.
There are two layers that are specified by the programming model in this picture.
They're represented by these red squiggly lines, which has kind of become a trademark
for our illustrations from the programming model.
The lower one we call the volume, and really its purpose is to describe where the ranges
of persistent memory are.
That's its primary function.
It does assist in that way in discovery of the memory
and some information about atomicity of the ranges of memory.
And that can be used from inside the kernel
by any sort of persistent-memory-aware kernel module.
So it can call the PM-capable driver and say,
where are the ranges of persistent memory,
and perhaps what are they called?
Are they structured?
What volumes are they?
Where's this particular volume of persistent memory?
And it gets a group of memory ranges back
and it can then access those memory ranges
knowing that they are persistent memory.
One of the consumers of that, then,
would be a persistent memory-aware file system
where it presents typical file system functionality
with the map and sync commands, which already exist, implemented.
And when you do the map, it's actually called mmap in Linux or MapViewOfFile in Windows,
what happens is the application can now see the range of persistent memory in its address space,
its normal variable address space. That's represented by this line here,
so that once it's memory mapped, the application can do load and store instructions
and, through the processor's MMU, get directly at the persistent memory without going into
the kernel.
Now, with map and sync as they exist currently, the sync can go into the kernel, but we have
an alternative that we call optimized flush, where the intent is to potentially avoid
going into the kernel.
So that's the basic kind of layout of the persistent memory
part of the NVM programming model.
So I've already been talking quite a bit about map and sync. So this is already, I think, a fairly familiar subject.
The map does this association of memory addresses with a file that has previously been opened.
So that now it's in the map.
The caller may request specific addresses,
and that gets into the question of how to deal with pointers,
which is kind of an interesting question with a couple of
options that are not standardized.
It's not like everybody's chosen one option or the other.
The sync command then makes sure when you're using persistent memory
that the CPU's cache, which is generally considered to be volatile, is flushed for the indicated
range. Not just the cache, but anything that may be storing data in a volatile way in the
pipeline to persistent memory. So the intent of the sync is to assure that the data has made it to persistence.
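To make that concrete, here is a minimal sketch of the map-and-sync flow on a POSIX system, assuming a persistent-memory-aware file system mounted at the hypothetical path /mnt/pmem; the file name and the 4 KiB size are illustrative only.

```c
/* Minimal sketch: open a file on a PM-aware file system, map it,
 * store through the mapping, then sync the range to persistence. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/pmem/example", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

    /* Map: the file now appears in the application's address space. */
    char *pmem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (pmem == MAP_FAILED) { perror("mmap"); return 1; }

    /* Ordinary store instructions now reach the persistent memory
     * through the MMU, without entering the kernel. */
    strcpy(pmem, "hello, persistent memory");

    /* Sync: ensure the indicated range is flushed out of the volatile
     * CPU caches and buffers to the persistence domain. */
    if (msync(pmem, 4096, MS_SYNC) != 0) { perror("msync"); return 1; }

    munmap(pmem, 4096);
    close(fd);
    return 0;
}
```

Note that the store itself never enters the kernel; only the msync does, which is exactly the cost the optimized flush discussed below tries to avoid.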
Now, there are several ways to do that. One of them is to literally flush the caches and any
buffers in between in the pipeline through the CPU to the memory. Another way is to have sufficient power when the line voltage is lost,
essentially after an external power failure,
to ensure that any data that was in the pipeline
is flushed to persistent memory before the power goes completely away
within the processor subsystem itself.
So in some cases, it may be that you can take advantage of that sort of flush-on-fail capability
and put less emphasis on the sync.
In other cases, maybe your system doesn't have that capability,
and there are several other possible reasons why you might still want to do a sync.
So there are a couple of options there.
And that's one of the new things that we're working into the programming model:
this difference between a flush-on-demand and a flush-on-fail.
We may not use those particular terms,
but we're starting to work towards incorporating that concept
into the programming model in version 1.2.
Then we, as I mentioned, added the optimized flush.
It has two differences.
The idea is that the sync as originally specified in POSIX,
for example, has just one address range that you're
supposed to ensure is persistent.
The optimized flush has multiple ranges, and it's intended to be able
to execute from user space. The optimized flush and verify was conceived to be similar
to a kind of disk write and verify where you go to extra effort to ensure that the data
has made it to the persistent memory and that any errors that may have occurred in that process are fully reported or create exceptions before the sync or optimized flush
is complete.
So that's to give you some additional assurance that the data has in fact made it all the
way to persistent memory without any errors. The interesting thing about the sync is that it does not
guarantee complete order of operations that occurred prior
to the sync.
It's got a very specific limited guarantee.
If you can picture what's happening, the sync's job is to
make sure that the CPU's cache is flushed to persistent
memory.
The CPU is managing
its own cache all along and that cache may have filled before you even attempted a sync.
So the CPU may have flushed some of your data out earlier. It may have been even before
you even started the sync. So the only guarantee is that when you sync or when you use an optimized flush, the data
in the indicated range or ranges has been flushed to persistent memory before the sync
completes.
But you don't know in what order it was flushed to persistent memory.
And that's kind of an interesting dichotomy, because it means that you have to be very
careful about what you assume the sync gives you.
It only gives you precisely this guarantee.
But on the other hand, that does give you the flexibility to manage a cache to improve the performance of your read and write pipeline
when there are no syncs happening. So that's an important performance acceleration
that you want to selectively force to some degree
when you need to get specific data
all the way to persistent memory.
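As one concrete point of comparison, PMDK's libpmem provides a user-space flush roughly along the lines of the optimized flush described above. This is a sketch only, assuming libpmem is installed and the file lives on persistent memory exposed through a DAX-capable file system at the hypothetical path /mnt/pmem:

```c
/* Sketch of a user-space "optimized flush" using PMDK's libpmem
 * (assumes libpmem is installed; link with -lpmem). */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Map (and create if needed) a 4 KiB file on persistent memory. */
    char *addr = pmem_map_file("/mnt/pmem/example", 4096,
                               PMEM_FILE_CREATE, 0644,
                               &mapped_len, &is_pmem);
    if (addr == NULL) { perror("pmem_map_file"); return 1; }

    strcpy(addr, "flushed from user space");

    if (is_pmem)
        /* True persistent memory: flush the CPU caches from user
         * space, with no kernel call. */
        pmem_persist(addr, strlen(addr) + 1);
    else
        /* Fallback for non-PM mappings: a conventional msync. */
        pmem_msync(addr, strlen(addr) + 1);

    pmem_unmap(addr, mapped_len);
    return 0;
}
```

pmem_persist carries the same limited guarantee just discussed: the named range is persistent when the call returns, but nothing is implied about the order in which earlier stores reached the media.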
So one interesting thing is now,
okay, I've got data structures.
They're in persistent memory.
I have allocated them.
Maybe I have allocated a heap of space for variables in persistent memory.
And I can do a type of memory allocation, a type of malloc, a pmalloc, a persistent malloc, to get a range of persistent memory where I can
now store variables and data structures using normal assignment, right, in regular programs.
Now, if I do that, I can build data structures right in persistent memory.
But suppose, then, that I want to put in one data structure a pointer to another persistent
memory data structure.
How will that pointer work? Now, everything that you've memory mapped has a virtual address,
and normally the application uses the virtual address as a pointer to whatever data it's
trying to access. That's great, but it assumes that if you have, for example,
opened and memory mapped a file, and it's got some pointers to some data that's
either in that file or maybe even some other file,
and then you close the file and later open it again to use it some more,
that you'll get the same virtual addresses for the file that you got the last time.
And there are some systems, and some intentional design choices in some cases, that
either don't guarantee that you'll be able to get the same
virtual address, or that for security purposes ensure that
you'll get a different virtual address.
So, you know, that creates this interesting dichotomy to say,
okay, now if I've got pointers from one structure to another,
is it just a pointer?
Will I get that virtual address back?
Or does it have to be something relocatable?
Perhaps something like an offset from the start of a file
is what your pointer really is.
And that means that the pointer implicitly includes
some sort of reference to the file namespace, perhaps.
So now your compiler, when it's determining how do I dereference a pointer in one data structure,
it may have to say, okay, where's the current base address for the file that that pointer's in,
and what's the offset into that, rather than assuming that you always have the same virtual address.
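One common way to express that is a relocatable, offset-based pointer. The sketch below is illustrative only; the type and helper names are hypothetical, not from any particular library, and a real implementation also has to deal with pointers that cross files:

```c
/* Illustrative sketch of relocatable pointers: instead of storing a
 * virtual address, store an offset from the start of the mapped file
 * and rebase it against whatever address the mapping got this time. */
#include <stdint.h>

typedef struct {
    uint64_t offset;           /* byte offset within the mapped file */
} pm_relptr;

/* Convert a live virtual address into an offset for storing in PM. */
static inline pm_relptr pm_relptr_from(void *base, void *addr)
{
    pm_relptr p = { (uint64_t)((char *)addr - (char *)base) };
    return p;
}

/* Dereference: add the offset to the current base of the mapping. */
static inline void *pm_relptr_to(void *base, pm_relptr p)
{
    return (char *)base + p.offset;
}
```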
So the issue of pointers is interesting.
I think Andy spoke quite a bit about the NVM library, and I've got a reference to it later,
but I don't go deep into it because Andy did.
And these are some of the types of things that can be addressed inside that sort of library, to hide some of these issues from the application, you know,
so that it's less disruptive for the application to use persistent memory.
This is one example of that.
In fact, actually all three of my little dilemmas about persistent memory are things
that you can encapsulate inside a library like the NVM library and make it easier for applications as a result.
The next one is about atomicity, specifically failure atomicity. Now, you know, those of
us who have dealt with caches, non-volatile caches in disk arrays, are fairly familiar with reasoning
about failure atomicity.
But from the point of view of a processor system,
it's not really comprehended in exactly that way.
When we talk about atomicity in a processor,
it's normally trying to guarantee inter-process consistency,
such as what happens with symmetric multi-processing: if you write a piece of data in one process, when will
any other processes see that data, and how do those processes see
it in the correct order?
So that's the concept of atomicity
that's already understood well by processors,
but processors only provide limited atomicity
with respect to failure.
The proposition on failure is,
if I'm merrily writing along,
writing areas of persistent memory,
and suddenly my processor loses power,
my whole system loses power,
when the power comes back, what state will that persistent memory be in?
And will I see that some of my writes have completely completed
and some of them have not completed at all,
but none of them have partially completed, right?
It's another kind of atomicity, but it happens at failure, right?
At a power failure or perhaps even a hardware
failure.
All right.
So we've had to do a fair amount of work surrounding this concept of atomicity, to add it to the
inter-process concept of atomicity that was already well established. And it turns out that there are some failure atomicity properties
offered by some processor architectures,
and normally they apply to aligned fundamental data types.
Think of it as like an integer or a pointer,
so that when the processor is writing a cache line containing a pointer,
it actually guarantees, even across a power loss,
that the pointer will either be completely written or not written,
but never partially written.
So that's an example of failure atomicity
that you actually can get from some processors today,
but it's a little bit architecture specific
and not very well called out.
So I think I've covered, yeah, okay.
So what happens then is if you want to create atomicity
of larger data structures,
then what you end up doing is trying to leverage
the atomicity of a pointer or an integer
into a larger atomic action.
And that's easy to think of if what you've got is a pointer.
You can construct a new piece of data in free space, for example,
for perhaps a linked list or a tree or something like that,
that's not yet referenced by the rest of the data structure,
and then atomically update a pointer that starts referencing that piece of data that you've just constructed,
and as a result that data is atomically incorporated into the rest of the structure.
So that's just an example of how you can take a pointer or integer atomicity and convert
it into larger, arbitrary constructs of atomicity.
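A sketch of that pattern is shown below. The names are hypothetical, and the flush callback stands in for whichever sync or optimized flush mechanism is available; the only step that relies on the processor's failure atomicity is the aligned pointer store.

```c
/* Illustrative atomic-publish pattern: build a node in space that is
 * not yet reachable, persist it, then atomically swing one pointer. */
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

typedef void (*flush_fn)(const void *addr, size_t len);

void publish(struct node *_Atomic *list_head, struct node *new_node,
             flush_fn flush)
{
    /* 1. new_node's fields are assumed already filled in; make the
     *    node itself persistent before anything can point to it. */
    flush(new_node, sizeof(*new_node));

    /* 2. Atomically update the head pointer. An aligned pointer-sized
     *    store is the failure-atomic primitive discussed above. */
    atomic_store(list_head, new_node);

    /* 3. Persist the cache line holding the pointer; only now is the
     *    new node durably part of the structure. */
    flush((const void *)list_head, sizeof(*list_head));
}
```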
If you can't get failure atomicity from your processor or from your system, then
generally the fallback has been to have an additional checksum.
So what you do then is every time you write something, you write a record, let's say, and the reason I say that is because this is very
common in databases, you compute a checksum on the record. And if the checksum after a
power loss does not match, doesn't compute, then you know that that data was affected
during the power loss. It was partially written. It was torn during the power loss.
And now you have to have some way of recovering from that.
So this leads you essentially into the database school of transactions
as to how you recover when things like that happen.
So if you don't get atomicity from your processor architecture,
then the fallback is you have to add it yourself with some kind of additional checksum.
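A minimal sketch of that fallback is below; the record layout and the checksum function are illustrative (a real database would typically use a proper CRC), but the shape of the check is the same.

```c
/* Illustrative torn-write detection: every record carries a checksum
 * written with it; after a power loss, a mismatch means the record
 * was only partially written and must be recovered. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct record {
    uint64_t payload[7];
    uint64_t checksum;         /* covers payload only */
};

static uint64_t simple_sum(const uint64_t *data, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* On write: fill the payload, then append the checksum. */
void record_write(struct record *dst, const uint64_t payload[7])
{
    memcpy(dst->payload, payload, sizeof(dst->payload));
    dst->checksum = simple_sum(dst->payload, 7);
    /* (then flush the whole record to persistence, as above) */
}

/* On recovery: a mismatch indicates a torn write. */
bool record_is_intact(const struct record *r)
{
    return r->checksum == simple_sum(r->payload, 7);
}
```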
The other area I wanted to elaborate a little on is error handling.
This picture does get a little busy, but I think we
should start here with this little sequence of lines, right?
Traditionally, you would just get an exception from memory, and it would go back into the
interrupt handling system of the processor and be delivered to some particular process.
The problem that we have with error handling now is that there may be some recovery
that can occur inside the file system, in addition to the potential need for recovery inside the
application. So what we need is a registration process in which multiple parties, such as a
file system and an application, can register to receive the exception.
All right, and then when the machine exception occurs, at step two here,
those registrations are looked at,
and the exception is delivered to whichever party is registered for it.
So that, for example, if you get a memory exception and you're working with a PM
aware file system that has some means of recovering from that
type of exception on its own, you give it a chance to recover from the exception. And depending on
how that recovery worked, you know, whether it worked, you may or may not need to notify the
application that the recovery occurred. So there are two levels where you
have to deal with exception handling.
One of them is perhaps within something like a PM-aware file system,
or if you're doing some kind of redundancy within that layer.
And the other one is farther up where the application may have to be notified. And then we get into some details
of exactly how processors handle memory errors.
And there are three properties involved there.
An exception is contained
if we know the exact memory location
that generated the exception.
It's precise if we know
that the instruction execution
can be resumed from the point of the exception.
And it's live if we know that we can resume from that point
without doing a restart.
So it takes actually somewhat special handling of exceptions.
And depending on the processor,
it may not have this type of exception handling,
in which case every time you get a memory exception,
you'll perhaps always get a restart.
And then that, once again,
affects how applications recover from errors
because the exception comes along
essentially with an application restart
and special processing
that has to be done during the restart.
So there are dependencies to get graceful error handling.
There are dependencies on the processor itself in order to make that easier.
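As one hedged illustration of the application-side half of that registration: on Linux, an uncorrectable memory error on a mapped persistent-memory page is typically delivered to the process as a SIGBUS, so the application registers a handler for it; the details vary by operating system and processor, and the handler below is a sketch only.

```c
/* Minimal sketch (Linux-specific assumption): register a SIGBUS handler
 * so the application is told when a mapped address takes a memory error. */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

static void pm_error_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx; (void)info;
    /* info->si_addr identifies the faulting location; a real handler
     * would record it (in an async-signal-safe way) and decide whether
     * the containing data structure can be rebuilt.  Here we simply
     * treat the error as unrecoverable and restart. */
    static const char msg[] = "persistent memory error, restarting\n";
    write(STDERR_FILENO, msg, sizeof(msg) - 1);
    _exit(1);
}

/* Call once at startup, after the persistent memory has been mapped. */
int register_pm_error_handler(void)
{
    struct sigaction sa = { 0 };
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = pm_error_handler;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGBUS, &sa, NULL);
}
```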
Yeah?
So 1, 2, and 3, are they ongoing, or are they triggered by events?
OK.
Triggered by events.
The one caveat is item 1 is kind of registration for the event.
So that would happen as a file system starts up
or something like that.
But then items 2 and 3 are triggered by the event.
Yeah.
Yeah.
[Audience question, inaudible.]
You have to restart if the error was not live or if it was not precise.
If the error was precise and live but not contained, then there may be some ways that you can avoid the restart,
but they require so much overhead
that you might end up restarting anyway.
So pragmatically, if you don't get all these three properties,
you're probably going to have to restart.
So that's a picture of what the programming model is, how it works for persistent memory, and
some of the interesting aspects where applications need some help in order to do this well. So in ongoing work, I now refer specifically to some of the
NVM library work that Andy talked about this morning.
And that occurs up here.
So the programming model sits right here between
the application and the file system, right, the PM-aware file
system; it would be right in this boundary.
The library, the NVM library,
which allows you to create persistent memory data structures,
sits above the NVM programming model.
And it uses the programming model so that that library can operate
on multiple operating systems and with different kinds of hardware in a consistent manner.
But then the library goes to the effort of hiding some of those
difficult scenarios from the application.
So that's how the library makes it simpler for applications to use the programming model:
by encapsulating these things as, essentially, persistent memory data structure libraries.
We have completed an atomicity white paper about persistent memory libraries and their
potential transactional nature. The NVM library is an example of one of these,
and that white paper is in final review at this point.
So we're hoping within a month or so we'll have it released.
And it's tied to the NVM library as an example.
There are other ways to do it, basically, and in the future there may be other libraries,
but the NVM library is one example.
The other area where we've had ongoing work,
and we released the white paper earlier this year,
is in high availability.
If you're storing your persistent data in persistent memory,
you might want some of those same features that you have with storage solutions such as redundancy, RAID, basically some kind of
high availability solution.
In order to do that and avoid, for example, all single points of failure, you have to
get the data to another server, let's say another node that's separate.
And the way we envision doing that
is to use remote direct memory access as the protocol.
We view it as likely to be the lowest latency protocol
for that purpose.
But what we've found as we start drilling into that
is that today's remote direct memory access protocol lacks some optimization for accessing persistent memory.
In particular, it's actually difficult to know when the data becomes persistent on a remote node when you're using RDMA. So that has led to an interesting white paper on the subject and some additional work across
the industry in the RDMA community to say, you know, how can we resolve this problem
and get, essentially, mirroring of persistent memory across servers using RDMA to be as low latency as possible,
while assuring remote durability
and appropriate error handling, right?
So a number of those same issues now start to come up again
when you talk about how you remotely access persistent memory.
And there's quite a bit of presentation
tomorrow and Tuesday on both of these two topics:
on experiences with the NVM library, and on how we are working towards evolving
persistent memory access using RDMA, and some of the interesting technical aspects that come up in that problem.
So trying to stand back a little bit, especially in the sort of persistent memory library area,
I like to model this as a kind of a journey from where we have been, and for the most part still are,
towards being able to fully use persistent memory. And a
first step, and some of this has already been happening, you know, some in open source and
some elsewhere, is to create a persistent memory aware file system that
can run faster because it knows it's using persistent memory even if the application
is still just using it as a file system.
Another step in that journey is what we're seeing with the NVM library, where the application
inserts a library that manages persistent data structures for
it, and that library is aware of persistent memory and uses a persistent-memory-aware
file system.
The final stage of this evolution which is probably still years away although there's
been some experimentation on this is suppose your compiler is aware of persistent memory and your application
can easily use it directly because the work that used to be done in the persistent memory
library is now built directly into the language that your compiler is using. There are some
people at HP Labs and some people in Oracle who have done experiments
on how would you extend the language in order to do that, and there are some prototypes
of that.
But that type of innovation in language, and usually today it's happening in C or C++,
those take a long time to kind of mature and become standardized and ultimately adopted.
So I view this as a sort of dual stack scenario where you'll have
applications that don't understand persistent memory and use block
access, regardless of whether they're accessing a disk drive or even persistent memory; you
can still access it like a RAM disk. So a lot of applications may never move from that.
On the other hand, applications to get maximum benefit from persistent memory can evolve
into a persistent memory programming model domain, use libraries or language extensions
ultimately to take full advantage of it.
And in fact, even if they don't have persistent memory, there are ways of coordinating RAM
access with disk drives, obviously much slower than persistent memory, but would still allow
you to use the memory map model. So it's a classic dual stack scenario where you have
PM-aware applications and some that aren't, and you have some systems that have persistent
memory and some that don't. And all of those permutations can be made to work, you know, but to different levels of performance, obviously.
Let's see, I think I've already covered this.
And here's a specific detail on that remote access work.
Checking over this slide, I want to highlight this part up here.
What you start to discover, especially when you're dealing with remote access,
is that because of the somewhat looser ordering constraints that you can get from
the sync semantics, you may have to recover to a recent consistency point rather than to the exact
load or store instruction, or store instruction specifically,
where the exception or error occurred, all right? So for that reason, inside the white paper on remote access, we go into a remote access taxonomy of different types
of remote access systems and recoverability requirements, which then really kind of drive
the requirements for how an RDMA, for example, has to behave in order to ensure that ultimately the application can recover from a failure
and get the kind of consistency that it needs after the failure.
So that work, the white paper, generates requirements, essentially, and some modeling for remote access, in particular remote flush, in effect,
in order to make sure that everything volatile in the chain is flushed out.
And as a result, as I mentioned, there are multiple parties in the industry,
including the OpenFabrics Alliance, the InfiniBand Trade Association, and several vendors, who are now looking at this problem and starting to do work on how to optimize
RDMA for persistent memory to solve these problems.
Yeah.
Chris.
When a node fails, even if the data is flushed, PM is not accessible, right?
Yeah, so, you know, you have to think in terms of a...
Okay, the question is, when a node fails,
even if the data is flushed,
the persistent memory in the node is no longer accessible,
at least not at that point.
That's right.
So now you have to start the line of reasoning that you have
when you've got, let's say,
a disk array that has redundancy as a storage analog, right, and you've got application
failover as well.
You can say, okay, I have a system here that can tolerate one failure.
If that failure was a whole server, I've lost one whole copy of my persistent memory, but
I can continue operating with the other one, you know, as long as I've only had one failure.
And then when this one is repaired, I will need to reestablish my redundancy, right?
Or if I lost a piece of my persistent memory,
I may be able to just go to the other, still functioning, server and restore that.
All right.
So, you know, yes, you get into, you know, all of those permutations of, you know, media failure and server failure with failover.
But those are very familiar.
Yeah.
Yeah, their storage does that, right?
Let's see.
It depends on a lot of things.
I think, okay, thank you.
The question is whether you can take persistent memory out of one system and put it in another.
You know, it's in some sense removable.
So let me go through the stack of obstacles. The first is today persistent memory usually appears as NVDIMMs and they're not necessarily
accessible from the edge of a server. They're inside it. So you've got to pull out a server board and manipulate the DIMMs, right?
So it's not quite the same type of repair as you would get with removable media.
Okay, so that said, if you either fix that or ignore that, you say, okay, you know,
it's likely that, you know, your data is not necessarily constrained to one DIMM, right?
You probably are going to have to move a group.
And then you're going to have to move the configuration information
that tells the system how to present that group
in such a way that the application would recognize it.
So if you do enough of those things, yes,
you could probably get that same kind of volume mobility
that you can get from disk arrays.
But actually, the constraints with disk arrays are ultimately similar;
you kind of have to move everything, right?
So, yeah, I think we will get there in terms of solutions,
but I don't think we've gone very far down that path up to this point.
Yeah, a question.
Is there any work being done on plug-in ability on the physical level,
like connectors or something?
Even if I had a failover, I'd still get my system back eventually without having to take it apart.
Yeah.
So the question is whether there's any work being done on hot plug, let's say memory hot plug, right?
It's essentially what it boils down to, so you get that same kind of removability benefit.
It's all proprietary.
So I'd say that there is some work going on there, but I don't know of any that's specifically
memory in a standard environment. So the question is that, you know, it sounds like in this environment when you get a failure
you may have to backtrack is what I call it.
You may have to actually recover from some, you know, historical point, you know, or state
of the system rather than an instantaneously current one and then perhaps roll forward
or move forward from that state.
And yes, that's right.
And that's why we talk so much about transactions in the persistent memory context because basically
as soon as you try and do that sort of thing, you're going to want some kind of transactional
model. Perhaps a lightweight one would work well, and the NVM library provides that sort of thing, right? So yes, you're
going to end up recovering from a recent consistency point, especially if you're in a high availability
situation, and you're probably going to want a transaction formalism as a way of doing
that. So, you know, the bad news is, yes, that means applications
would have to pay attention to it. The good news is that there is actually a lot of prior work done
on transactions. We basically know how to do them. So we just have to apply that.
Any other questions? I think that's it.
All right. Thanks.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further
with your peers in the developer community.
For additional information about the Storage Developer Conference, visit storagedeveloper.org.