Storage Developer Conference - #1: Preparing Applications for Persistent Memory
Episode Date: April 4, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 1.
Today we hear from Doug Voigt, Distinguished Technologist with HPE,
as he presents Preparing Applications for Persistent Memory
from the 2015 Storage Developers Conference.
Probably most of you, or all of you perhaps, went to Andy Rudoff's talk.
I'm one of Andy's partners in crime on the NVM programming model stuff.
You know, persistent memory is actually a theme for this year's SDC.
And so there are a number of talks about it. I'll mention a range of them at the end.
And I've looked at some of them.
There's a little bit of overlap,
but we each take a different perspective slightly.
And when we drill down, we drill down in different ways.
So even though a lot of the boilerplate and the topic is the same,
there's a lot of information from the various talks that you have available.
I'll start with this slide, a variation on a slide that you've seen Jim Handy use.
It may have shown up a couple of times.
But this is how I like to introduce the inflection point, the disruption that we're talking about.
And I think if you were at the keynotes this morning, I don't really need to go into this much more.
But just to make sure, I'll kind of level the playing field here. For a long time, in large, very large superminicomputers and mainframes,
it's been acceptable in NUMA, non-uniform memory access systems,
to tolerate latencies of up to about 200 nanoseconds in a large memory system.
Of course, some memory is an order of magnitude faster than that, but when you put it in some kind of a memory fabric or network in a large or
mainframe type computer, it can get up into that range. So that's been kind of acceptable
for a long time, regardless of the persistent memory technology.
And then there's another threshold that I put around two microseconds,
although it is processor architecture and processor technology specific.
If you've asked for something like an I/O,
you know, something that you think is going to take a little bit of time,
at what point will you wish that you had context switched?
You can sit there and wait for it and block one core or one thread, waiting for something
that you think is going to be pretty quick.
But if it actually takes longer, you'll wish you'd context switched.
Or if you made the wrong decision the other way, you'll say, darn, I just context switched
and it's already done.
So that's this threshold here.
When I talk about context switch,
people hopefully realize that that's when you give up the processor
to wait for an I/O or something like that.
So in my view, this is a key disruptor
when persistence comes below this range.
Because that's where you'll say,
well, not only do I not need to do a context switch
and I'm willing to give up one core
while waiting for this thing to finish,
but I'm actually willing to block my whole write pipeline
while waiting for this thing to finish, for example.
That's when it's down in the NUMA range.
And when you're in between,
and I think we will see some technologies
that are in between,
that's an interesting trade-off.
Because it's somewhat more unknown territory
when you're in that middle range.
So I think people
can perhaps see how this makes a lot
of difference to the way you might write an application
if you're going to try to get full benefit from persistent memory
that is down in this sort of lower range.
So I think people realize that's the motivation here.
That's why we think there's a disruption.
And in response to that, we've realized, and this was also in the keynotes, that there's some work that applications might need to
do to get more and more benefit. This is where Andy talked about moving up the stack, right,
getting more disruptive for higher return. So, realizing that, we've been working on the programming model, and this is just a quick rundown of where we're at on the SNIA NVM programming model.
We got a version 1.1 out this year.
It contains content about both block and file, and with or without persistent memory.
And here are just a few of the things that are addressed in the specification.
And then it makes a strong case for, really drives, this idea
that memory-mapped files are a good way to access persistent memory, at least as a bridge until other methods
perhaps are developed. But even so, it's a way for applications in the context of almost
all the existing infrastructure to get that direct access to persistent memory. And this
is the same thing that Andy was talking about.
And he mentioned the idea of a programming model as opposed to an API.
So I think people understand that.
Still going through the basic background, the same picture that Andy used,
with the same reasoning.
The behavioral model of the programming model is at these squiggly lines.
And here I've got some additional notes
about the kinds of things that are now
in the persistent memory part of the programming model.
This volume part is a kind of discovery abstraction.
It's the way the kernel would say, tell me about the persistent memory capacity that you have and where it is.
That's what this is for.
And then this says, allow someone to memory map the persistent memory
based on the authorization that they have.
So I think this has been reasonably well covered,
especially if people have gone to the keynote.
So going a bit further into what the programming model says and what we further have discovered
while working through a whole bunch of use cases
and some boundary conditions and stuff like that
that you run into when you try to do this thing that seems relatively straightforward, perhaps at first.
First of all, drilling a little bit into the map and sync paradigm,
this is a pre-existing method of doing memory map files.
It works better with persistent memory because the old way, when you did a sync,
that's when you would write your variables
from DRAM out to the disk
after having memory mapped the
DRAM. But now with persistent memory, you
memory map the persistent memory
and you just have to make sure that your
data is flushed out of the
processor's volatile caches
in order
to make sure that it is actually persistent.
So the mapping associates memory addresses
with the persistent memory that is where your files are stored.
The sync ensures that modifications become persistent,
but now we run into the first catch,
which is that the sync does not preserve order.
The processor has kind of a will of its own about its caches. It thinks it owns them.
And if they get full, it might start flushing things out of its caches at will,
in order to continue operating with high performance; that's the purpose of
it having a cache. So just because you haven't yet said you want to sync something that
you've written doesn't mean that it hasn't been
written. It may have been written.
But other things in this block
of memory that you say you want to sync may not
have been written yet so they get written
during the sync. So you
cannot assume that you have
any sort of order guarantee
as a result of doing sync. All
that the sync semantic says is
by the time this sync is over,
all of the writes that you've issued
in the address range of the sync
have made it to persistent memory.
That's the whole semantic, right?
So that's something to be very careful with,
and I'll show you an example
that kind of toys with this order question a little bit more. So that's something
to be very aware of. There are proposals out there for alternative syncs that do more,
and those are pretty much all in the research domain at this point; some
have been prototyped, and there's not really any industry conclusion yet
about what other sync behaviors there should be and what they should look like,
other than some of the things I'll say later that are also in Andy's persistent memory library.
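Just to make the map-and-sync paradigm concrete before moving on, here is a minimal C sketch, assuming a file on a PM-aware file system; the path is hypothetical and error handling is abbreviated.

    /* Minimal map-and-sync sketch. Assumes /mnt/pmem/data is a
     * file, at least one page long, on a PM-aware file system;
     * the path is hypothetical and error handling is abbreviated. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 4096;
        int fd = open("/mnt/pmem/data", O_RDWR);
        if (fd < 0)
            return 1;

        char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (base == MAP_FAILED)
            return 1;

        /* Ordinary stores through the mapping... */
        strcpy(base, "hello, persistent memory");

        /* ...are not guaranteed persistent, or ordered, until a
         * sync. All msync() promises is that by the time it
         * returns, stores in this range have reached persistence. */
        msync(base, len, MS_SYNC);

        munmap(base, len);
        close(fd);
        return 0;
    }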
So another area, and Andy has sometimes touched on this in his presentations,
but I don't think he mentioned it much today, is the question of pointers.
So you've got a data structure.
It's like any other data structure except that it's being stored in persistent memory.
That's kind of the thesis here.
And it may have references to other data structures.
And in theory, they could be anywhere.
So how do you actually make the reference?
And there have been sort of two schools of thought about that.
One school of thought says that once you've got universal memory,
what you want is one huge global virtual address space
so that everything you could ever want to access is in
one virtual address space, and every time you go to access
something, its virtual address is the same. So you can actually use
a virtual address in a persistent memory data structure to point
to some other persistent memory data structure, regardless of where it is.
So that's one school of thought, is you want a global virtual
address space.
There are some current practices that
get in the way of that.
It turns out that with current operating systems,
you don't always necessarily get the same virtual address
when you go to memory map something.
In fact, there are security related practices that intentionally mix up
the virtual addresses. So this all kind of turns into some controversy. The other school
of thought is that you should always use an offset from a relocatable base. So if I memory
mapped a file and the file has a bunch of data structures in it, then maybe all of my
pointers amongst those data structures should basically be offsets relative to the file I'm in.
Or you can imagine maybe that's not quite enough,
but that sort of gives you the idea of this sort of relocatable base.
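To illustrate that second school of thought, here is a small sketch of offset-based pointers against a relocatable base; all of the names are invented for illustration.

    /* Offset-based "pointers" relative to a relocatable base,
     * i.e., wherever the file happened to be mapped this time.
     * All names here are invented for illustration. */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint64_t pm_off_t;        /* offset from mapping base */

    struct node {
        int      value;
        pm_off_t next;                /* offset of next node; 0 means null */
    };

    /* Resolve an offset using this process's mapping base. */
    static inline struct node *off_to_ptr(void *base, pm_off_t off)
    {
        return off ? (struct node *)((char *)base + off) : NULL;
    }

    /* Convert a pointer back to an offset before storing it. */
    static inline pm_off_t ptr_to_off(void *base, struct node *p)
    {
        return p ? (pm_off_t)((char *)p - (char *)base) : 0;
    }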
Continuing through the laundry list, we've done a lot of investigation into failure atomicity.
And here's the thing about atomicity.
You might say, well, gee, don't we already have atomicity?
We have compare and swap.
We have various types of atomic operations. But the problem is that historically,
since we've never had persistent memory,
all of those atomicity features,
even the ones that are built into the processor,
are about making sure that all the processes have a consistent view of all the data.
So the atomicity that you can get today
is actually about inter-process consistency,
and not about making sure that
your persistent memory was written
atomically,
so that if you get a power
fail at some instant, when you come back,
the thing that you wrote, like a pointer
that you thought you were writing atomically
so that it's either the new pointer or the old pointer,
you want that proposition when you come back from a power
loss, which has to do with exactly how that variable was written to memory,
which never mattered before,
because if you had a power loss before, it would be gone.
So there's actually a lot of logic that's starting to emerge
about how to achieve failure atomicity
as opposed to inter-process atomicity.
And it turns out that the techniques available for getting failure atomicity
are processor architecture specific.
Some processors might guarantee that a cache line or something like that
is fully written before they will stop writing in the event of a power loss.
Logic like that gets into the mix.
So the phrase that we've started using in the spec
is atomicity of fundamental data types,
which is actually the most common convention.
And it's one that is built into the C language.
There's a way for you to declare that a variable update should
be atomic with respect to how it's written,
up to some fundamental data type.
Fundamental data types are like integers and pointers
and basic things like that.
So that's kind of the building block
that you have for atomicity.
And it really has nothing to do with when you did a sync.
It has everything to do with how the data got written
to memory by the processor, whenever it was written.
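Here's a tiny sketch of what that declaration looks like in C11; keep in mind the language guarantee is about how the store is issued, and whether a torn store can reach persistent memory on power failure is still the processor-specific part.

    /* Sketch: C11 atomics let you declare that updates to a
     * fundamental data type are performed as a single store.
     * In practice the variable would live in mapped persistent
     * memory; a plain global is used here for illustration. */
    #include <stdatomic.h>
    #include <stdint.h>

    _Atomic uint64_t fill;            /* e.g., a log's fill pointer */

    void commit(uint64_t new_fill)
    {
        /* One aligned store of a fundamental data type: the
         * building block for failure atomicity. */
        atomic_store(&fill, new_fill);
    }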
And then finally, the other area is exception handling.
Of course, if you're using persistent memory,
you no longer get a status. You no longer are doing a command with the data transfer and a response.
You're just writing some data, let's say.
And if there's a failure, you get an exception.
And it turns out there's a lot to how exceptions are handled on different processors.
If you have a low-end processor, you may have no opportunity to actually intercept that exception and do something about it without basically rebooting, restarting the system. On other processors, they do a lot more for you and give you perhaps enough information to effect a recovery from an application point of view during an exception. So this has been another area
where we've started doing a lot more work
on exception handling.
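To give a flavor of what intercepting such an exception can look like: on Linux today, an error on a load from a mapped range typically arrives as a SIGBUS, so a minimal sketch, under that assumption, might look like this.

    /* Minimal sketch of intercepting a memory error on a mapped
     * range. On Linux this typically arrives as SIGBUS; how much
     * you can actually recover is processor and platform specific. */
    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    static void pm_sigbus_handler(int sig, siginfo_t *info, void *ctx)
    {
        /* info->si_addr is the faulting address; a real handler
         * would map it back to a file and range, attempt restore
         * from redundancy, and possibly trigger backtracking. */
        (void)sig; (void)info; (void)ctx;
        const char msg[] = "memory error on mapped range\n";
        write(STDERR_FILENO, msg, sizeof(msg) - 1);
        _exit(1);   /* placeholder: restart-style recovery */
    }

    void install_handler(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_sigaction = pm_sigbus_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGBUS, &sa, NULL);
    }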
And one of the things that comes into that work
is the question of, you know,
if I've lost something,
some piece of data has become inaccessible,
perhaps in its current location,
but maybe I have some redundancy or I have some options.
I can maybe restore a specific piece of data.
But then the question is, how does that fit in with all the rest of the data
that was being written in some somewhat loose order?
So this leads you into the potential for backtracking.
In the event that my redundancy is not quite up to date
with where my processor currently is,
you start having to account for the possibility
that the application may have to do something like abort a transaction
in order to use the restored data
because of the way
it's managing its consistency.
So these are some of the forefront issues
that we've uncovered in the process of working
through the spec.
And several of these are continuing
to get attention as we refine the spec and we're writing additional white papers about different aspects.
So I go through this to give you a little bit more detail on one of the things Andy was saying is that we really can't afford to turn the whole industry into a bunch of deep system programmers.
What we really need to do is make this easier for applications.
So there's one way to illustrate that,
and this is, I think, consistent with the messages
that we've talked about already this morning,
is that we're on an interesting journey
from this kind of a stack that everybody already knows
and loves to something where we have
persistent memory available.
And the first and easiest thing to do
is to accelerate the middleware.
Since the file system was already
written by file system programmers,
they may know enough esoteric stuff
to get some acceleration from persistent memory at their level,
and then present their current functionality, without modification from the abstraction point of view.
So this is already happening. Actually, little pieces of all of these things are already happening.
But then the next step would be to say, okay, let's have a persistent memory library that
encapsulates a whole bunch of those issues that I listed earlier and allows the application
to ignore them because it's accessing a library that implements a persistent memory data structure that's native.
It's fully aware of how persistent memory works.
It gets full advantage of it, but it's encapsulated inside a data structure library,
which may use persistent memory directly.
It may still use the file system.
The application can still use the file system.
So here we start to see the dual stack.
This is a classic dual stack scenario. Some
applications will never learn how to use persistent memory, so even in the end there's still a
file system, for example. Now this is independent of the evolution to things like object storage
and stuff. I'm not writing those off, I just kept it simple for this talk. It's a classic, just like the IPv4, IPv6 transition.
It will take a long, long time.
In fact, it will never actually finish.
It's a classic dual-stack scenario.
I think it makes sense to anticipate it,
thinking about it that way.
Finally, what makes it really easy for applications
is when the language is evolved to the point
where you can do things like open a block of code
with a declaration that you want the modifications
that occurred inside that block of code to be atomic,
all of them together.
And then the application can just go on and write its code
the way it always did, having made this
declaration. The compiler,
perhaps in combination
with some libraries,
does what's needed to make that
work to the application's
expectation.
That's what's happening in the languages
category.
I think you can see
how this is just another model of the horizons, the evolution
that we've sort of started to embark on as we make it easier and easier for applications
to get benefit from persistent memory.
So I want to talk quite a bit more about what this persistent memory library is. Andy talked
about it; he gave an overview of it. I'm going to drill into a couple of things, and I'm going
to not back up right now. Thank you.
This is the same picture that Andy used, one of them, where here's where we're talking about.
We've got a PM-aware file system.
It can do memory mapping.
We're sticking a library in right there.
So let me give a really trivial example
that I think will shed some light for people
who don't already visualize what I'm talking about, about what's different
about a persistent memory data structure. This example is an append-only log, and there is one of these inside Andy's PMEM library.
And you may have allocated some fixed amount of persistent memory for the log,
just to keep it simple.
And part of it is filled.
So here I have, let's say, some sort of integer or something that points out to what part of the log is filled.
And the way I append something to the log
is I create a new log entry
in this free space. So that's this
work in progress here
in the free part of the log.
And since the int hasn't changed
yet, if I lose power or something,
this will just disappear
because it was never
committed. And then
the act of committing is to
bump my fill pointer. So now, all of a sudden, when I was
done with my work in progress, I bumped my fill pointer.
And one of the tricks that I think we're going to
see used more and more at least at first is to do a sync
after having filled in my work in progress to make sure
it's persistent.
And then knowing that it's persistent,
have a section of code where I only update the pointer,
and then I sync again.
So I know for sure that nothing happened between these two syncs other than this change.
So I've now used two syncs
to leverage the atomicity of my update
to my fill pointer into something bigger,
which is a whole record in a log.
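In code, that two-sync append looks roughly like this; a minimal sketch, with the layout and names invented for illustration rather than taken from the library.

    /* Two-sync append to a log that lives in a memory-mapped
     * persistent memory file. Layout and names are invented. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LOG_CAPACITY (1 << 20)

    struct pm_log {
        uint64_t fill;                /* bytes in use; the commit point */
        char     data[LOG_CAPACITY];
    };

    /* msync() requires a page-aligned address, so round down. */
    static void pm_sync(void *addr, size_t len)
    {
        uintptr_t pgsz = (uintptr_t)sysconf(_SC_PAGESIZE);
        uintptr_t page = (uintptr_t)addr & ~(pgsz - 1);
        msync((void *)page, len + ((uintptr_t)addr - page), MS_SYNC);
    }

    int log_append(struct pm_log *log, const void *rec, size_t len)
    {
        if (log->fill + len > LOG_CAPACITY)
            return -1;

        /* 1. Work in progress, in space the log does not own yet. */
        memcpy(&log->data[log->fill], rec, len);

        /* 2. First sync: the record is persistent but invisible,
         *    because fill has not changed. */
        pm_sync(&log->data[log->fill], len);

        /* 3. Commit: a single store of a fundamental data type. */
        log->fill += len;

        /* 4. Second sync: the commit point itself is persistent. */
        pm_sync(&log->fill, sizeof(log->fill));
        return 0;
    }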
So when you say sync,
you really mean a cache line flush from the CPU?
Yeah.
Now, there may be fancier
syncs, you know, but
yes, that's basically the minimum.
So,
hopefully this gives people a concrete idea
and although this is kind of
a lame example, it's
real.
You know, you could
imagine building linked lists and trees and hash tables that leverage
this basic concept that says you want to do your work not in place. You want to do your
work in a place that is not exposed and then do an update to a fundamental data type that
commits it. To the extent that you can do that, this is like the oldest trick in the book.
I'm an old array guy.
I've built embedded systems that have had persistent memory
in their cache for a long time.
And this is how they work.
So let's go further, though.
So it's not that you always have to avoid updating in place.
The other way to do it is to take a pre-image,
the classic rollback scenario where you say,
OK, I'm about to update something,
but I want to be able to abort a transaction that
has this update in it.
So I take a snapshot of my physical memory in this area
before I start modifying it,
and I record that to persistent memory.
And now if I have a power loss before I commit,
I can use that to set whatever work I was doing back to the way it was when it started.
This is another old trick.
Databases have used this forever.
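Here's a minimal sketch of the pre-image idea; the record layout and names are invented, and the comments note where the syncs would go.

    /* Pre-image (undo log) sketch: snapshot the bytes about to be
     * modified, persist the snapshot, then modify in place. On
     * recovery after a power loss with no commit, copy it back.
     * Layout and names are invented for illustration. */
    #include <stdint.h>
    #include <string.h>

    struct undo_rec {                 /* lives in persistent memory */
        uint64_t target_off;          /* where the bytes came from */
        uint64_t len;
        uint64_t valid;               /* nonzero: rollback applies */
        char     image[256];
    };

    void begin_update(struct undo_rec *u, void *pm_base,
                      void *target, size_t len)
    {
        memcpy(u->image, target, len);                 /* pre-image */
        u->target_off = (uint64_t)((char *)target - (char *)pm_base);
        u->len = len;
        /* sync the image and fields here, THEN set valid and sync
         * again: the same two-step commit as the append-only log */
        u->valid = 1;
        /* ...now modify *target in place; clear valid (and sync)
         * when the enclosing transaction commits */
    }

    void recover(struct undo_rec *u, void *pm_base)
    {
        if (u->valid)                 /* update never committed */
            memcpy((char *)pm_base + u->target_off, u->image, u->len);
    }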
So that's the other way to do it,
but it may be more efficient, when you can,
to avoid modifying in place by using
newly allocated space, but that
means that the allocation itself
has to be
in some sense transactional,
right? Because the trick I did in
my simple example is the
allocation was just a simple
move, so both the allocation
and the commit were in the same
change to my integer.
They were together. Now,
if you're actually allocating new space,
it's not just an append-only
log. You had to go get some new
persistent memory space. It got allocated.
There's some records about
the allocation of the space, and those records
now have to be tied
to your transactional behavior.
Because if you lose power, you can't
afford to have that space not get reclaimed.
Somebody has to remember.
So this is another thing that now gets
wrapped up in the PMEM library.
It understands how to do atomic
allocation and it couples that allocation to the persistent memory data structures that it encapsulates.
So another reason to use that type of approach.
It also allows you to form groups of data structures. I talked earlier about the pointer problem, you know, and the approach that the persistent memory library uses there is to say, okay, I'm going to have a pool of persistent memory
that I know about and I'm going to make sure that I know how to resolve pointer
references within that pool. Alright so it may be a file, you know, that's the
easiest way to think of it. It doesn't have to be
only a file, but it's something that you can catalog, that you can manage with some notion
of some common root. So you know that all of the references within this range of data
structures are tied together through some root and so other addresses can be resolved. So that's another thing that's encapsulated
inside the persistent memory library.
And then finally, it's great that I can make
modifications to individual data structures atomic,
and I may be able to do that without some kind of formal
transaction, such as I did with my append-only log.
But then I want to write a lot of programs
and manipulate multiple data structures atomically together.
So I still need something larger to track scenarios
where I've done that.
So several persistent memory data structures
implemented by the library that are each atomic in their own
right can now be made atomic together
by participating in a transaction that is also known to and built into the persistent memory library.
What that transaction does is, if you did need to modify something in place,
you can declare that, and it will take care of your pre-images and rollbacks for you, and
it takes care of when to do your syncs or flushes to get into the persistence domain at the right time.
So this is how the persistent memory library
makes it easier for applications, right?
It's by taking over that group of issues.
So use that.
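To make that concrete, here's roughly what such a transaction looks like using the libpmemobj interfaces from the open-source NVM Library Andy described; treat it as a sketch rather than exact API documentation, and note the object handles and sizes are made up.

    /* Rough sketch of a libpmemobj-style transaction. The pool
     * handle and object sizes are made up; details of the real
     * API may differ from this sketch. */
    #include <libpmemobj.h>

    void update_two_structures(PMEMobjpool *pop, PMEMoid a, PMEMoid b)
    {
        TX_BEGIN(pop) {
            /* Declare ranges we will modify in place; the library
             * records pre-images and handles rollback for us. */
            pmemobj_tx_add_range(a, 0, 64);
            pmemobj_tx_add_range(b, 0, 64);

            /* ... ordinary stores to those ranges go here ... */

            /* Allocation inside the transaction is atomic too: if
             * we abort or lose power, the space is reclaimed. */
            PMEMoid c = pmemobj_tx_alloc(128, 0);
            (void)c;
        } TX_ONABORT {
            /* modifications are rolled back from pre-images */
        } TX_END
    }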
That kind of gives you a clue about where we think we're headed. Here's a pointer to where the persistent memory library exists in open source.
And I just kind of summarized, and Andy already flashed up a more complete summary of the operations,
the objects in the persistent memory library.
It's got some basic assist functions
if you want to do these things more on your own.
It's got some whole persistent memory data structures,
and then it's got the notion of a persistent memory object
that takes care of those additional things
and starts to introduce a notion of type safety
among the different manipulations
of different persistent memory objects.
There is research going on about language extensions. There's actually probably quite
a lot of it. I would highlight two. Why do language extensions? I mentioned the example earlier. It should be more convenient.
The logic that can take place inside a block of code that has some sort of language structure around it
can actually be more sophisticated.
Compilers can be pretty smart.
They can track a lot of stuff for you.
You actually might be surprised at how much stuff they're already tracking
in order to make things happen correctly.
And you don't have to pay any attention to that.
And ultimately, it may be safer, you know,
because, you know, it can do type checking built in.
You know, it's a much more, you know,
direct extension of the language that you already know.
All right, so these are some of the motivations
for ultimately moving to a language environment.
There are two, I point out here, pointers to two public works.
One of them came from HP Labs in which language support is added for failure atomic code sections
that are based on existing critical sections.
The logic here says, you know, in order to do correct concurrent programming,
I had to lock something anyway.
I have a critical section.
What if I say that critical section is also the thing that contains
my critical modifications to my data structures?
So it couples those things and makes it easy to do both at the same time.
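Very roughly, the shape of that idea is sketched below; the notion that the compiler treats the critical section as failure-atomic lives only in the comments here, because the actual syntax belongs to the research work, not to this sketch.

    /* Conceptual sketch of a failure-atomic critical section: the
     * lock that already makes this correct for concurrency also
     * delimits the failure-atomic region. In the research systems
     * a compiler/runtime makes that so; nothing in plain C does. */
    #include <pthread.h>

    struct node { struct node *next; };

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    void push(struct node **head, struct node *n)
    {
        pthread_mutex_lock(&list_lock);
        /* Between lock and unlock: after a power loss, either all
         * of these stores to persistent memory are visible, or
         * none are, because the section is failure-atomic. */
        n->next = *head;
        *head = n;
        pthread_mutex_unlock(&list_lock);
    }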
Then there's another from Oracle.
Gee, that thing is pretty persistent.
I should have told it not to remind me again.
In which there's a set of language extensions,
and sometimes these are built into precompilers
that allow you to do some management of NVM regions using files,
implement transactions and locks and heap management
in the context of a language.
So there are some common themes.
There are some differences in implementation
and the direction that people are trying out. So there's activity in all of those columns,
right? Accelerating things like file systems or as Andy mentioned JVMs, creating libraries
that make it easier and working towards what kind of language extensions
would make sense ultimately.
So I'm going to segue to another area
that we've been working on.
We have a white paper out
from the NVM Programming Model group
called Remote Access for High
Availability.
There's a work in progress draft
of that paper available on the
SNIA website.
And it starts going into
more of the high availability problem
and it was referenced
at least once or twice this morning
and it was referenced
in the Microsoft talk yesterday
on the subject.
So here, what we suggest is that through a combination,
and Andy showed one of these, right?
A combination of a library and an improved PM-aware file system,
you achieve RAID or erasure coding is for redundancy.
The advantage of doing some activity in the library
is that you don't have to switch into kernel space
in order to get it done.
But you can't necessarily do everything in user space
depending on what you've run into,
what you're trying to recover from.
I want to talk for a minute about high durability
versus high availability.
I think high availability is actually a pretty well-defined
industry term.
High durability may not be quite as well-established
in the industry.
But what I mean is that if you have high durability,
you have the ability to recover data after a failure that
affected your data.
But you have no guarantee about your ability
to access that data shortly after the failure.
You can get it back eventually.
And local mirroring is an example of that.
If the failure was an NVDIMM
failure and you're doing something like a store
that has some sort of a machine
right here nearby
that mirrors NVDIMMs,
then you have redundancy. But if your whole
server fails,
you may not have access to whatever
you have in that pair of DIMMs
until the server comes back.
But you do have two copies,
and you might even be able to move those DIMMs to another server
and get your data back that way.
So it is highly durable,
but you may not be able to access it.
On the other hand, with high availability, it's both.
It's highly durable, and you have a method of assuring that you can access it
in spite of some number of failures, perhaps only one.
And for that, you have to basically get to another node, another server.
So that involves network communication, and thus higher overhead. So one of the things that we're anticipating
is that people may start leveraging that sync operation
to achieve some level of higher availability,
especially since there's some network overhead today
in achieving that.
So that's what we've started to elaborate on in this white paper.
And say, okay, why not during the sync use RDMA to do a remote sync?
Now, this is agnostic to how you did RDMA,
and in fact, there's debate about whether the optimal implementation is RDMA that adheres to today's RDMA specification.
It may be some extension to RDMA,
but we're using RDMA as the way of exploring this use case.
So in the persistent memory programming model,
there's this function called optimized
flush, which extends the sync. It's really pretty much the same as a sync, except that
it allows you to list multiple ranges. The current sync semantics that are backward compatible
only have one memory range. So in optimized flush, I can say I want to do a sync of all
of these memory ranges.
It's still not atomic, but it applies the sync semantics to multiple ranges.
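Rendered as a C interface, optimized flush might look something like this; the spec defines the semantics, not a binding, so both the prototype and the naive fallback below are illustrative only.

    /* Illustrative rendering of optimized flush: sync semantics
     * applied to a list of ranges at once. The signature is
     * invented; the spec defines behavior, not a binding. */
    #include <stddef.h>
    #include <sys/mman.h>

    struct pm_range {
        void  *addr;
        size_t len;
    };

    /* By the time this returns, every store previously issued to
     * any listed range has reached persistence. No ordering among
     * the stores is implied, and the operation is not atomic. */
    int pm_optimized_flush(const struct pm_range *r, size_t n)
    {
        /* Naive fallback: one msync per range (which requires
         * page-aligned addresses); a real implementation would
         * use CPU cache-flush instructions instead. */
        for (size_t i = 0; i < n; i++)
            if (msync(r[i].addr, r[i].len, MS_SYNC) != 0)
                return -1;
        return 0;
    }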
So the really nice thing would be if I had a remote-optimized flush,
which could do the same thing while hopefully minimizing network latency.
However, today the problem is achieving remote durability. The
way RDMA works is
you don't
know how far the write has gotten by the time
you get a response to the write.
If you surround it with
a command response queue
and you send a command and then you do
an RDMA interaction and then you get a response
which is the way the whole thing was really conceived
in the first place,
that's well defined
and you can do a lot of stuff that way.
But that has a lot of round trips.
Here all we want to do is blast
something over there and get it persistent
by the time we get to the end of the sync and move on.
How fast can we do
that?
So that's kind of the thought experiment here.
And when you get to the remote side,
even if your RDMA has made it to the remote RNIC on the other side, now how do you assure that it has made it all the way to the persistent memory on the other side? And it turns out
in today's processor architecture, you have to go through the processor in order to get
from PCIe, where the RNIC is connected, to the memory.
Who is going to make sure that anything the processor did on that path
caching wise is in fact
flushed? Well, there are
some configuration things you can do
and I think there's a talk
later tomorrow where some
of that type of stuff may be described.
But ideally we would want to be able to say,
get this RDMA write flushed to persistent memory,
just like you did on the optimized flush locally.
But you would like to do that as efficiently as possible
remotely.
Today, you have to interrupt the processor to do that,
unless you've used some configuration options that
may also be architecture specific.
So exploring that space about how to achieve remote durability is kind of the first order requirement here.
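One workaround that gets discussed today, sketched below with libibverbs, is to follow the RDMA write with a small RDMA read of the same region, relying on PCIe ordering to push the write data ahead of the read completion; whether that data has actually reached the persistence domain, rather than just the memory path, is exactly the open question. The sketch assumes an already-connected queue pair and registered memory.

    /* Sketch: RDMA write followed by a tiny "flushing" read of
     * the same region. Assumes a connected queue pair and
     * registered memory; error handling is omitted, and whether
     * this reaches the persistence domain is platform specific. */
    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    int rdma_write_then_flush(struct ibv_qp *qp,
                              void *laddr, uint32_t lkey, uint32_t len,
                              uint64_t raddr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)laddr,
            .length = len,
            .lkey   = lkey,
        };
        struct ibv_send_wr wr, *bad;

        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* the data */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.wr.rdma.remote_addr = raddr;
        wr.wr.rdma.rkey        = rkey;
        if (ibv_post_send(qp, &wr, &bad))
            return -1;

        /* The read cannot complete until prior writes on the PCIe
         * path are pushed ahead of it; its completion is used as
         * the "remotely durable" signal in this scheme. */
        sge.length    = 1;
        wr.opcode     = IBV_WR_RDMA_READ;
        wr.send_flags = IBV_SEND_SIGNALED;
        return ibv_post_send(qp, &wr, &bad);
    }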
There has been a proposal of a new RDMA completion type in the RDMA protocol stack.
And this new completion type would delay the RDMA completion
until the data is, in fact, persistent in the remote domain.
So this is now becoming a sort of industry-recognized problem,
to say how do we do efficient remote durability
in order to manage high availability for persistent memory.
So people are starting to pitch in.
The industry is starting to innovate as only it can
in terms of how to solve this problem.
So even if you do solve that problem,
we still have this order issue,
the lack of any ordering statement or semantics in sync.
So if you think of it at the application level,
if you want to recover from failure, you're going to have to have both robust local and remote error handling to do that. The application ends up having to be aware of how to recover from data that was retrieved from another node.
And it comes back to that same ordering problem.
I mentioned the potential for backtracking recovery.
And if you're trying to do high availability,
right now we don't actually have a way to do that without teaching the application how to do backtracking recovery. Because the thing it got back, well okay,
here's the reason, ultimately the reason is because it's way too expensive to only do
one write at a time and always know the order it was in, right? The sync semantic that disregards
order is more efficient. So to the extent that that's the case,
you may have to do backtracking,
which gets you into the question of consistency.
When you start talking about consistency,
you're really talking about an application-specific assertion.
There are a number of ways of treating consistency
that have grown out of the storage remote replication capability,
the disaster recovery capability,
except those are on millisecond-to-second time frames.
So now...
Okay, I'll do the right thing this time.
All right.
So there's a line of thinking here that says,
if my application knows how to do backtracking to a recent consistency point and is willing to lose a little bit of work in that process,
then I can start applying the concept that I already know about for disaster recovery,
but on several orders of magnitude faster time scale.
In order to do that type of recovery, I perhaps have a recovery point objective even for my local, perhaps in-rack or even in-blade-chassis, type of high availability. So you're very quickly running down a slippery slope
into basically applications that use transactions
and are able to recover to a recent consistency point
in order to make the flow of high availability information
to a nearby node efficient.
So that's where this slope kind of leads.
I think I've already pretty much talked about this.
The next question is, is the application capable
of handling backtracking during an exception
or does it basically have to restart? Did it have any choice?
If the problem is that I need
high availability and I'm mirroring across
servers and one of the servers failed, I'm basically
going to have to now restart
that application on another server.
So depending on the
failure scenario, you might have a choice
of doing it without restarting, or you might not actually have a choice.
The scenario may have forced a restart.
And this is where transactions start coming in really handy
in terms of how to bury all of these problems
inside something that's well understood to be a reliable way to recover.
So that's a deeper tour
of this little journey.
Some of
the things we've discovered, some of the responses,
the work
in progress. It's really interesting to
see the industry
start responding
to the issues
that we're having, to rally
around a problem.
So this is what we've kind of embarked on,
and my belief is that this NVM programming model
has helped to stimulate some of this thinking,
helped to sort of organize it,
describe it perhaps in a more and more consistent way
so that players in the industry can
develop an ecosystem that actually solves
this range of problems.
Questions?
Discussion? Oh, related
talks. Go ahead with questions first.
So, with some of the failure modes you're talking about,
recovering from the loss of a DIMM,
that's kind of like having an uncorrectable memory error in traditional RAM,
which you don't normally try and recover from.
So obviously you're thinking that these devices have a higher failure rate
than the regular system.
Is that not true?
No, I'm not necessarily assuming that.
What I'm assuming is that when someone wants 99.9999% high availability,
that RAM doesn't give you that.
Yeah.
I meant mirroring between systems for the availability,
but within a system,
because you were kind of showing both models there.
Okay, so the, yeah,
you probably wouldn't use them both at the same time.
No. Alright, so
I was just illustrating kind of
the raising of the bar from durability
to availability.
And I have one other little question.
Okay.
I presume you would tend to keep
lock-type structures out of the system.
Probably, yeah.
Because, well,
first of all to the extent
that they're extremely dynamic
you may not want them in that type of memory
and also you may not want them to be
persistent because if anybody's ever started
thinking through the persistent lock
problem, it's like, gosh, I might not want that. So yes, yeah. So there's a fundamental assumption here,
that applications will need to know what memory is persistent and what memory is not from their
abstraction point of view, right? As Andy pointed out, you can use persistent memory as if it were volatile.
That's fine, but the application is going to end up with two pools,
one that it knows will come back and one that it knows will not come back.
So that's a fundamental assumption that I think is pretty well accepted in the industry
is the application is going to have to at least know that.
Your last chart, the evolution chart. Does it seem like things are getting simpler as you go along, or does it just seem to me that the programming is getting more complicated?
It depends on where you stand. It depends on where you stand in this picture.
What I'm suggesting is that it gets simpler
to do very high-performance transactional manipulations
from the point of view of an application,
or at least not any more complex.
But if you stand down here in the file systems
or in the libraries that do those things,
or in the compiler itself, there is more complexity that you're dealing with.
So that's why I would frame it that way.
But the idea is to relegate that complexity to the people who are steeped in that knowledge,
rather than the whole world.
Right.
[Audience question, partially inaudible, about whether a driver is involved.] Is that correct?
So the simple answer is yes,
but it's not clear that the application itself
ever deals with the driver.
We have to be very careful when we involve software.
There's a sync implementation.
Would you say the sync implementation is part of a driver?
Right now, it's part of the file system protocol.
Would you say that doing a flush during a sync
involves a driver?
Today, you wouldn't really say that.
So we have to be very careful
exactly what the role of the driver is.
So that's why the simple answer is yes,
but you have to tease it apart a little bit.
I had a question on the RDMA for failure recovery.
Of course, the further away something gets, the higher the latency.
So for what you're talking about here, I think what's reasonable in terms of remote... Okay, what's not reasonable is persistent memory-based disaster recovery across the continent.
Forget it.
No, that will never happen.
Well, somebody could try it, but it wouldn't be competitive.
Definitely.
Well, that's true.
Yeah, that's right.
If you, as a man, are an island, then perhaps.
So let's talk about rack scale or maybe a couple of rack scale.
Those are the sort of things
that people are thinking in terms of.
Gotcha, okay.
There's more in other talks.
You know, there are talks about NVDIMM hardware.
That one's coming up today I believe. Persistent memory management is next.
So a lot of related talks here. This is a regular theme for us this year.
We have a talk tomorrow that Dominic and I are giving on how to measure
persistent memory performance with a pure workload generator.
And there are talks related to remote access and failure recovery
from Tom Talpey and Chet Douglas.
Those are tomorrow.
So you can learn a lot more about this area.
We also have a couple of other application-related talks
from Intel and Pure.
Well, Pure is not as, well, it's an application of persistent memory.
It depends on where you stand as to what you view as an application.
Obviously, Andy's keynote, and you'll be able to pick up those slides.
And
then there was a bunch of stuff about persistent memory from the pre-conference that took place
on Sunday, so that stuff's out there too. So we're hitting it hard this year. There
was a lot of information that you can get. I didn't even mention them all. I didn't realize
that Microsoft was going to do a talk that was related to it. They have several, actually. So, you know, that's
not even all. Okay,
thanks.
Thanks for listening. If you have questions about the material presented in this podcast, be sure
and join our developers mailing list by sending an email
to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further
with your peers in the developer community.
For additional information about the Storage Developer Conference,
visit storagedeveloper.org.