Storage Developer Conference - #2: Managing the Next Generation Memory Subsystem
Episode Date: April 11, 2016...
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC
Podcast. Every week, the SDC Podcast presents important technical topics to the developer
community. Each episode is hand-selected by the SNIA Technical Council from the presentations
at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast.
You are listening to SDC Podcast Episode 2.
Today we hear from Paul von Behren, software architect with Intel,
as he presents Managing the Next Generation Memory Subsystem
from the 2015 Storage Developer Conference.
My name is Paul von Behren. I'm at Intel, and I'm going to talk about managing memory and a lot of new concepts.
I'm not going to get into a lot of detail. I won't have any XML dumps on the slides or the normal stuff that makes most people scared of management software.
It's kind of conceptual, but trying to set up why things are changing,
why this is more interesting.
Traditionally, you don't manage memory.
It's just there. It has capacity. That's about it.
It's getting a lot more interesting.
Okay.
I've got to get rid of this. Hold on.
How's that, better?
So, also, I normally work
on the programming model side of things.
The architect who has been doing our manageability software
at Intel did a lot of this work
and pretty well put together the slides himself.
But he accepted a new job a few weeks ago and is in training this week.
So I'm presenting on his behalf, but I was involved with reviews
and with the work that we had done.
So what I'm going to do is talk about memory technologies,
things that are coming up that change the picture.
This includes NVDIMMs, but some other things as well:
concepts and practices
for next generation memory,
and emerging management standards,
open source code, and documentation.
And emerging is a big word here,
because we're just getting started.
We're trying to get some things in place,
trying to get some understanding.
We're also looking for feedback, particularly from the operating system vendors,
the NVDIMM vendors, and management software vendors.
This is an opportunity; we'd like people to get involved in the discussion.
Let's talk about
memory technologies and management. What kind of use cases are there?
First off, we'll talk about
the current way of doing things.
What do we know about processor cache?
Not a lot.
It's pretty transparent.
We know how much is there.
We know some information about
what's being dedicated per core.
But for the most part, you don't really manage it.
It's just there.
There's also the memory related to...
I can't even read my laptop here.
Let me scoot around here.
Memory controllers, channels, slots, and DIMMs.
This becomes a little bit more important
because of NUMA locality, but again,
for a lot of applications, they don't really care at all.
Also BIOS interleave: you frequently need to set this up
in the BIOS, and it can be very confusing.
There's frequently a range of settings,
including an "I don't want to think about it" option,
frequently one of the BIOS options, just because in a lot of cases,
letting the hardware do the best thing is the right answer.
But it becomes more interesting as we have different kinds of memory moving forward.
The memory and channel clock speed, which is the speed of installed memory:
there's information about that that's available now.
Typically it doesn't get in your way, because typically the differences in speed across different kinds of DIMMs in any particular system are
relatively small. And again, we're talking about potentially larger differences now
that become more important to manage. Then there's redundancy: ranks, rank sparing, mirroring.
So a lot of people don't realize this,
but high-end servers allow raid-like features of memory.
They have for a while.
If you have the cash to fully populate
very high-end servers and have your capacity divided in half,
you'll be very interested in the mirroring features.
So it's not available in all that many systems
and not a lot of people using it,
but those that do value it very much.
Another level of redundancy for certain types of computing
is always critical.
I mentioned here SSD cache for HDDs.
We've been talking about the role of persistent memory;
we're not necessarily in this case thinking specifically about SSDs,
but the way SSDs work as a cache now
is something we see growing into the use of persistent memory in the future.
So I've got this notation here, NVDIMM-N,
and I have a couple of other NVDIMM-hyphen-somethings that are coming up.
JEDEC is a big force in DIMMs.
They are the standardization body at the hardware level around DIMMs.
They have announced interest in supporting new NVDIMM standards. As far as I know,
they are not quite released yet, but there has been quite a bit of information out. And by the
way, my understanding is they'll be released real soon now. So we can expect more on this. But
there's been a lot of information at Flash Memory Summit and some of the other SNIA events.
So I'll be using their terminology just because it's kind of helpful
to have names to talk about these different classes of NVDIMMs.
So the NVDIMM-N is non-volatile memory created by combining
volatile and non-volatile media with a power source,
the non-volatile today typically being flash.
And what you see in terms of your software activity is DRAM speeds, with triggering mechanisms like ADR.
That's Intel's asynchronous DRAM refresh.
I'm not even going to try to go into it.
It's a mechanism to trigger a flush of the volatile memory into flash because there's a power-down situation going on.
There's platform support, BIOS support,
that makes NVDIMM-Ns work.
There are standards that were put in place very recently
that allow the operating system to tell the difference
between different types of DIMMs.
There is interleaving, separately from DRAM; it
has features much like typical DRAM interleaving features, but it's
being handled differently, because it doesn't make any sense to interleave
volatile and non-volatile DIMMs. So they have different rules. Again, an
opportunity for management software.
I mentioned the BIOS uniquely identifying the different types.
So the kinds of use cases for this flash-backed DRAM hardware: how to trigger the save, how to trigger the restore. This is mostly transparent, but sometimes there are
hooks into the power supplies or power source
to specifically manage this or at least tune it.
Monitor save and restore status so
that you can understand what the condition is at any moment.
Monitor energy source, flash health, save readiness.
So it's a lot of diagnostic type of data,
but from the operating system perspective,
it's basically treated like volatile memory,
other than the fact that it can have these management cases.
Standard NVDIMM use cases apply as well.
So these are the kinds of things we touched on
in the previous slide.
They're mostly common to DIMMs in general.
You want to be able to replace a failed DIMM,
update firmware, and decommission,
erase sensitive persistent content.
Now that's a very new persistent memory concept.
Since volatile memory didn't hold data,
you really didn't have to worry about this.
It's becoming a huge issue around persistent memory.
The next type that JEDEC has classified, and that are in the market now,
are NVDIMMs that act like block devices,
basically SSDs.
It's a DIMM form factor.
Custom BIOS uniquely identifies NVDIMM-F capability in the system address map.
So this is the information that the BIOS provides to the operating system
about the topology of the hardware.
So they identify this stuff so the operating system doesn't try to use it for volatile memory;
it realizes it's a different access approach.
It coexists with DRAM.
Some system DRAM may be used for caching;
that's another characteristic of this.
Sometimes they take part of the standard system memory.
Sometimes the DRAM they use is on the NVDIMM-F modules themselves.
A custom driver presents the DIMMs as standard block devices to the operating system. Better
than SSD performance. That's really, you know, one of the big factors here; the benefit
is the performance. The other is that it's transparent. Everybody knows how to do file systems and do I/O to disks;
those work exactly the same with these.
The kinds of use cases that come up are monitoring flash spares,
wear, and other drive health attributes. They're using SMART,
sometimes with variations, as SMART tends to have,
but providing data on the health of the device.
Standard block device partitioning and formatting:
these are built-in operating system management features,
but they still apply.
Software RAID can be used with these kinds of devices.
Standard NVDIMM use cases apply as well.
Update firmware, decommission, erase sensitive data,
and backing up the data.
In the case of the flash-backed NVDIMMs,
you didn't have to worry about backup
because they had the built-in backup capability.
With these, you would back up the data
just like you would any other
block device. NVDIMM-P: this is proposed. My understanding is this is not going to be used
in the upcoming JEDEC standards, but it's trying to capture this idea of a combination of non-volatile
memory fast enough for direct memory controller access.
I used MC all over the place because I was running out of space here.
That's the memory controller.
Directly accessible DRAM and NAND.
So it has persistent media capabilities as well;
it's recognized as persistent media,
not just a transparent backup. Near-DRAM speeds, directly accessed by the memory controller, very large capacities, may
be multi-mode capable: byte and/or block addressable. So multi-mode capable starts to throw a big wrinkle into the world of manageability.
Rather than the management stack thinking just in terms of monitoring and reporting what's there,
if the device provides the ability to let the administrator make choices,
now in addition to passive management reporting, you have to have active management; the administrator
gets to make choices here. So this becomes a much more complicated and interesting management problem.
Here I just wanted to point out, and it's, you know, kind of lost in the paragraph there, a very
key concept: the BIOS uniquely identifies volatile and non-volatile memory regions. I mentioned this before, and how
the information about the
memory topology is reported.
So the use cases here
are that you can configure
RAS
and performance characteristics via the
BIOS, and configure block
and direct access devices via
the driver.
So traditionally, if you had any choices about your memory configuration, like interleaving,
you really could only do them from the BIOS.
The standards are being extended to the point where there's the possibility of having software control over some, but not all, of these characteristics.
And we figured it's a pretty drastic shift to reduce the size of volatile memory
for a management command on a running system,
but increasing it might not be too bad.
And changing whether unused persistent memory is byte or block addressable
shouldn't be an impact either.
So there's a subset of configuration changes you can make
to your memory configuration on a live system.
And again, many of the previous use cases apply.
Replace a failed interleaved DIMM, update firmware, and we never get away from decommissioning
and erasing sensitive data.
So, these are the types of things that
complicate the memory management problem.
I touched on this a little bit before,
and I should just expand.
NUMA, if you're not familiar, is the affinity where you
get better performance using memory that's
local to a socket for processes that are running on cores
on that socket.
In the case of both DRAM and flash,
is there any concern about the relative reliability or endurance between those two technologies?
I know DRAM has very good endurance. Is there a question about that? There are
likely to be differences in
endurance characteristics of
these NVDIMMs.
It really depends on the types
of technology being used, but
yes, there are issues there.
In terms of the management software, being able to monitor
things and get statistics
and see,
to some extent, a lot of what's being considered
is very close to what's being done with SSDs.
So it's a somewhat known problem,
but it's a new management issue for memory.
Yes?
Isn't keeping the hands off the memory controller
and letting the hardware figure it out directly at odds
with NUMA?
We did a test where we wrote some really high-performance code using a conventional programming
framework called OpenCL. We ran it on two sockets. We ran it on one socket. We actually got better
performance on a single socket than we did on two sockets.
Sometimes, depending on the nature of your
software, you may be able to predict things. Also, operating systems will, you know,
let you set up and configure how the NUMA configuration is being used, or how an application
is tied into it. So you can say, I'm setting things up
so that the memory for this and the active processes
are all tied to a specific core.
If you don't do that, then you have a choice
between the operating system's Monte Carlo choices
versus the hardware's choices, if you say you want to use NUMA.
And having the hardware guess at it might be the
right answer.
So yeah, it can be tricky.
Right now, NUMA really seems to be an HPC feature.
That's where you get the most use of it.
They're more interested in that fine-grained control.
For most applications, like I said, letting the server run in UMA mode might be the best choice.
And it's certainly the most common choice,
as far as I can tell.
So I was done with that slide.
Oh, I was talking about NUMA.
The other thing that's being introduced now,
which is another memory management use case, is that HPC-oriented CPUs
are now introducing in-package or on-package memory, which is basically soldered to the
CPU. With today's NUMA configurations, there's a performance penalty if you go off socket.
With some of these HPC configurations, you can go off socket to standard DRAM,
but a different socket can't get to another socket's in-package memory.
So now you have another type of behavior that, again, mostly HPC applications have to be aware of. So there's work being done on providing libraries that
can combine information about the topology of the system,
understanding of the types of memory,
and be able to help software make policy decisions
about what regions of memory it should avoid
if it has certain goals.
And that's one of the things that Doug was talking about,
this open source library.
We've taken some of the software we had for faking volatile memory over persistent memory and are working it into a common library
so that there's a single place that applications can go
and get some of this information.
Does this library allow explicit programming control
over where to place your data in memory?
Yeah, yes.
So it's kind of moving towards a memory flow at the moment.
Yeah.
I mean, hopefully it's a little more flexible than that,
but your application, to get the most performance,
will take advantage of this.
And like I said, right now, that's
essentially HPC software.
Yes?
Can I ask, with these different modes,
how do you mingle things like DMA and RDMA?
So we have the N and the others, the three different flavors.
Is there any comment on how we might
push data into those from I/O devices using DMA?
Well, the NVDIMM-F is a block device.
Right.
So, yeah, that's being handled differently.
DMA should work with the byte addressable persistent memory
part.
Exactly how this rolls out relative to the hardware
is still a little bit unclear.
Because when you're acting like a block device,
or part of your capacity is acting like a block device,
and part of it's acting like memory,
it's not really clear what the right answer is.
But so.
I see. I think publicly Intel introduced some instructions
for flushing.
When you do a DMA, you don't necessarily
know if the cache has got the flushed data.
Right.
No.
Thank you.
Any other questions?
Yes.
So do you expect to have regions of the cache
lockable, like programmer control?
Instead of just doing cache lock,
just do a chunk of the cache.
They can explicitly lock and explicitly evict.
So this would make it easier to be able to move data in, right?
So there's a whole bunch of compiler optimizations
you can do by hand if you really care.
In my world, I really care.
So it's like the optimizations.
So if you think of multi-level caches,
that's what we're talking about, right?
To just have different sizes.
So if we're doing cache blocking at every level,
one of the things that would be nice
is if you could just block or lock the whole chunk.
Right.
Do you expect these kinds of memory systems to have that kind of capability?
Not just a line, but a block?
I think what is done at times to achieve that
is to make sure that the application's requirements are met by the hardware
and not have anything ever move.
So, you know, this is common in dedicated systems that are doing one thing.
So if you basically have one application or one service,
you know, that there's a little memory left over for the operating system
and the application gets everything else,
then it doesn't have to worry about memory moving around.
But in terms of hardware support, I'm not sure.
Yes?
Yeah, on your memory management
use cases, they show the NVDIMM writing back
to the flash when the DIMM is in a failure mode.
But does it say anything about an external energy source,
like a supercap?
Are there plans to move forward with that?
Which one is that?
Are you going to provide that for the Type N or the Type F,
or actually the Type P?
Because the Type N does have supercaps.
Right.
I'm wondering if it has management software
to detect that.
I'm just wondering, going forward,
is that going to be supported?
Who owns the management software?
I'll get to that, who owns it, soon.
In just a couple slides here.
In terms of the supercap, that's something that's interesting from a management point of view.
A lot of it's pretty opaque right now or vendor-specific, and that's one of the other challenges here.
But that's true with all devices.
I think the management solutions will be a combination of what can be done generically across a variety
of vendor solutions along with the vendor-specific stuff. So, you know, I think that's common for
just about any kind of hardware device; there's a little bit of each. But the
supercap, again, I've seen some of the JEDEC work; as far as I
know it's not publicly available,
and I shouldn't be speaking too much about it
other than what they talked about at the Flash Memory Summit.
But I don't recall seeing that they were too specific
about the power source information.
Other than that, they're also talking about
some manageability interfaces,
something else that could be monitored
if the vendor wanted to provide that.
But I don't think they were necessarily describing
exactly what had to be there.
So, next generation memory concepts: what we're anticipating for the next generation.
Memory is not a monolithic resource.
I've mentioned this.
There are different kinds of memory.
They have different features, different quality of service, different characteristics.
Traditionally, memory has been treated as a monolithic resource.
Even though NUMA has been in place for quite a while on systems,
there's an awful lot of software around manageability that knows nothing about NUMA.
So these are a variety of things that
are kind of forcing us to think about a change in the way
we approach memory.
So that's probably the biggest takeaway.
Multiple types of devices plugged into the memory bus.
Devices may coexist or require their own channel.
They may work cooperatively or may be segregated.
BIOS recognizes distinct device characteristics.
Management tools need to differentiate memory types and manage accordingly.
So there are slightly different issues between management software and
applications that are using the data,
but some of the stuff overlaps,
and I expect to see
more and more awareness of these differences,
and how to
react to issues
and also to optimize.
Configuration required.
I mentioned this earlier: what's bubbling up soon
is memory systems, NVDIMM-type
systems, that have different types of
capabilities, and the administrator can make choices.
So volatile versus persistent, interleaved, mirrored sets, those kinds of choices
that you would have; block access, byte access; cooperative relationships,
NUMA affinity being the main one, but there could be other kinds of relationships that you may want to be aware of.
Constraints: topology restrictions, operating system support, workload requirements.
And the workload requirements really tie into the ability to make choices around the types of configurations, the choices that you have with NVDIMMs. Andy talked about cases where the block-addressable pseudo-SSDs
are the best fit for the way existing applications work.
This may be a good starting point.
As applications get smarter about working with the persistent memory model,
you should be able to get better performance
with a PMEM-aware application on PMEM hardware.
And so I think there will be transition times
where the new version of the database starts supporting PMEM
and you can switch over your hardware.
Is it possible that the use of persistent memory
would simplify the amount of devices
on the memory bus?
At this point, it doesn't seem like it.
But it may not make things worse.
But it certainly opens the opportunity for things
to get pretty complicated.
I mean, there's been a real significant attempt
to allow different kinds of devices to be used
on the memory bus with NVDIMMs, which adds some complications,
particularly population rules.
It was touched on in one of the previous slides there.
Things have to be on the same channel, or they must be on different channels,
and there's things like that that you have to be aware of.
Moving on, persistent handles. So the idea here is similar to other kinds of
devices. There will be names for the persistent memory devices; you may
not be able to choose them one by one.
It's really kind of a choice between the operating system, driver design, and hardware choices.
But you're going to have something, just like with block devices, things that look like file names that are actually representative of names of devices. So for the DAX work that's in Linux, the PMEM devices or namespaces that are discovered
are just /dev/pmem0, pmem1, pmem2.
So it's a very simple namespace.
And then on top of that, you mount a PMEM-aware file system,
and then you have names that you can set for the files
that represent the blobs of persistent memory that get used by applications.
So file systems and the drivers support exactly this type of behavior for other resources,
very similar to what you see with disks.
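To make that concrete, here is a minimal sketch of what an application sees once a PMEM-aware file system is mounted; the mount point /mnt/pmem and the file name are illustrative assumptions, not something from the talk:

```c
/* Minimal sketch: create a file on a DAX-mounted file system and map it.
 * /mnt/pmem is an assumed mount point of a PMEM-aware file system
 * sitting on top of a /dev/pmemN namespace. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 4096;
    int fd = open("/mnt/pmem/myblob", O_CREAT | O_RDWR, 0644);
    if (fd < 0 || ftruncate(fd, len) != 0) {
        perror("open/ftruncate");
        return 1;
    }

    /* With DAX, the mapping is backed directly by the NVDIMM capacity. */
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    strcpy(buf, "hello, persistent memory");

    /* Ask the kernel to make the stores durable. */
    msync(buf, len, MS_SYNC);

    munmap(buf, len);
    close(fd);
    return 0;
}
```

The file name acts as the persistent handle: as long as the namespace and the file system stay intact, the blob is still there after a reboot.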
They may be able to allocate and label a region of persistent memory. So in some cases, with NVDIMM devices,
we expect the capacity you get for a device to be exactly the capacity of one DIMM,
or maybe all of them formed into one.
But there's, and I'll mention it a little later in more detail,
an ACPI standard, NFIT, that provides a way for basically a separation of those,
kind of similar to the way that storage arrays will allow you to configure a blob of stuff
together at the RAID level and then divide it up into a bunch of virtual logical units. Deallocation when done, modify if needed.
Options are there that may or may not apply to the technology.
Data management needed.
Persistence creates reliability, serviceability, and security concerns.
Interleaving NVDIMMs complicates failure domains.
If you've
worked with servers where you needed to worry about interleaving, what's there
is already complicated, so this makes it even more so. Yes, Tom?
But you're also looking at maybe a self-encryption function as well?
Yeah, the need for something is very clear.
The right answer is a little bit unclear.
We've started some discussions in the NVM Programming TWG
in SNIA of, if hardware provided 10 encryption keys for NVDIMMs,
would that be enough?
would that be enough?
Consider you're running 20,000 containers on your server.
Is 10 keys enough?
Does that even make sense to think that way?
Are there going to be niche cases where you absolutely
need one or zero, but you probably
couldn't take advantage of more than that?
Is software encryption the right answer?
So it's a lot of considerations.
I was just kind of getting started with that,
but that really wasn't a primary concern of the TWG when we got started.
But the more you start thinking about this,
this is something we really need to have an answer for as an industry.
When the power goes off, the data doesn't go away.
Right.
Andy had all those pictures of core.
I remember my first job; I got out of college and went to Sperry.
I was immediately swallowed up into Unisys.
We had a beta customer that was having all kinds
of funky problems with their storage, with torn writes.
What a totally new concept.
And I had to debug with somebody at the Air
Force. It's great to have a highly confidential site as a beta customer;
he had to basically get cleared, he had to have
intelligence people in the room with him before he could answer my questions
on the phone. And they needed a quiet place to work,
so they used a room that was originally
built for the core memory.
If they had a power loss in that data center,
they'd grab all the core memory out of the servers
and lock it up in this concrete vault.
And it was like a panic room.
So that's where he called me from.
At the time, you would just hang out and have coffee
and phone calls.
But it was that close to that era.
I didn't work with that;
it was just afterwards.
Failed server.
Need to migrate NVDIMMs to a new server.
This also gets tricky.
Your new server may not have the same kind of socket configuration that you need,
but if it does, assuming you have one that's exactly the same, you should be able to move DIMMs and reinsert them,
paying attention to the same population rules, things like that, that may be useful in some cases.
But you probably need some help from the software to tell you what strategy you need to take.
Repurposing NVDIMMs, if you decide you're done with them in their current use,
then you have to worry about the encryption problem and then how to clear it, make sure that any sensitive data is gone.
Optimization is hard.
I talked about NUMA a bit, and there's probably nothing really new on this slide versus what I said already.
The operating systems are already doing a lot of the tricky work with today's NUMA.
They will, again, schedule threads to run optimally for the memory.
That's really easy to do with volatile memory, because
you get to make all those decisions fresh since the last time the system was
booted. Anything that was there before is gone
after the reboot. It gets a lot more complicated when you have persistent
memory.
Now you've started an application running,
you've got its threads running on
cores on socket 2,
which made perfect sense
at the time, and then the thing
decides to open up a
persistent memory file
with affinity to socket 1.
What do you do?
In the case of
typical NUMA
right now, you can have some non-optimal I/Os,
or you can try to have the application set up, again, using utilities that are available, to set up that affinity ahead of time.
So generally, we think that existing tools will suffice with persistent memory versus volatile memory and NUMA now.
But the on-socket memory is a brand new headache.
It's going to be a real challenge to get that right and optimal. Things will work; it will
be trickier to get things optimal.
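For today's NUMA, the utilities and affinity setup mentioned above look roughly like this. A minimal sketch using libnuma, assuming it is installed and linked with -lnuma; the node number is purely illustrative:

```c
/* Minimal libnuma sketch: pin the current thread to a node and allocate
 * memory on that same node, so threads and data share an affinity.
 * Node 0 is illustrative. Build with: gcc numa_pin.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }

    int node = 0;                    /* illustrative target node */
    numa_run_on_node(node);          /* schedule this thread on that node */

    size_t len = 1 << 20;
    char *buf = numa_alloc_onnode(len, node);  /* memory local to the node */
    if (buf == NULL)
        return 1;

    memset(buf, 0, len);             /* touch it so pages actually get placed */
    printf("allocated %zu bytes on node %d\n", len, node);

    numa_free(buf, len);
    return 0;
}
```

Persistent memory complicates this because the file's affinity is fixed before the application starts: the data may already live on a socket other than the one the threads were placed on.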
Public resources. What's going on with standards and emerging work and all that, and open source.
So I have a little chart here of kind of a stack,
and just using this as a reference for where the standards are.
Again, I have been loosely following the JEDEC work.
I believe that the impact is going
to be really a combination of where the blue boxes are here
and the physical access to the DIMMs themselves
is where their NVDIMM work is going on.
But they're going to be something
very important to monitor.
I just didn't have public information yet to share.
So they're missing from my discussion here, but they're really a big force.
But looking at the top, from the kind of high-level user space, there are end user tools,
command line tools, XML representations, integration with management libraries. At the kernel level, in the case of Linux, they have sysfs,
ioctls, hooks into the drivers.
These are going to be extended, the way they have been for other kinds of devices, as ways
for management software to figure out what's going on.
Out of band: this is through BMCs, that kind of communication being done
not through the normal operating system interfaces,
so you can monitor things even before the OS is even installed.
You can monitor things when the system appears to be offline.
Jim mentioned this morning the need for a power reset;
it's one of the other tricks that's provided with BMCs.
So IPMI is kind of the way the world is right now,
and Redfish is the emerging standard for an out-of-band interface to servers;
we're working with that activity to get NVDIMMs plugged into the model.
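As a rough sketch of what such an out-of-band query looks like, here is a minimal example using libcurl against a Redfish service. The BMC address, credentials, and the Memory collection path are illustrative assumptions, since the NVDIMM parts of the Redfish model were still being defined at the time of this talk:

```c
/* Minimal sketch: query a Redfish service out of band for memory inventory.
 * The BMC hostname, credentials, and resource path are hypothetical.
 * Build with: gcc redfish_mem.c -lcurl */
#include <curl/curl.h>
#include <stdio.h>

int main(void)
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (!curl)
        return 1;

    /* Hypothetical BMC endpoint and memory collection path. */
    curl_easy_setopt(curl, CURLOPT_URL,
                     "https://bmc.example.com/redfish/v1/Systems/1/Memory");
    curl_easy_setopt(curl, CURLOPT_USERPWD, "admin:password");
    /* Many BMCs ship self-signed certificates; a real tool would verify them. */
    curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, 0L);

    /* The JSON response (a collection of memory resources) goes to stdout. */
    CURLcode rc = curl_easy_perform(curl);
    if (rc != CURLE_OK)
        fprintf(stderr, "request failed: %s\n", curl_easy_strerror(rc));

    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return 0;
}
```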
BIOS.
ACPI.
ACPI 6.0 came out earlier this year.
I mentioned earlier they introduced NFIT.
NFIT, like I said, adds this level of indirection
between the physical topology and the logical topology.
So if your PMEM devices support this,
you can create what appears to be a pool of capacity
from one or multiple NVDIMMs
and divide it up into multiple namespaces.
This is not a software construct.
This is not a software array.
This is being done at the BIOS level.
Once the configuration is done and saved off,
there's not like a RAID manager or a partitioning manager going on.
Once it's done, it's fairly passive in the firmware.
But you can configure this stuff at the BIOS level.
But again, you have options of configuring this stuff from
software as well if the devices support it.
You don't have to use that, so there are also going to be NVDIMMs that have much simpler,
really read-only configurations.
Hardware: look at registers, firmware interfaces, it's all part of the fun puzzle here. So for the BIOS work that's going on,
I mentioned BIOS tables that describe the NVDIMM resources to the operating system. Actually,
these are the same kind of tables that talk about all kinds of hardware, motherboard resources,
or system resources really. So the data structures describing them
are put into memory, and the operating system can read them.
So this has been extended to have multiple kinds of devices
for memory, not just one.
It also includes, you know, the NFIT,
this ability I was just talking about,
kind of a level of indirection.
DSMs or device specific methods.
There's a link to the examples paper here.
So ACPI 6.0 really opened up a lot of functionality related to NVDIMMs at a higher level than what
JEDEC is doing; this is mostly the level where you'd have interaction with the operating system,
though operating systems will probably be working directly with some of the JEDEC interfaces as well.
At the kernel level, the Linux PMEM driver,
which is known as DAX, direct access,
has gone upstream in kernel 4.0.
This whole universe of the BIOS support that I was just talking about,
that's pretty well all in place in 4.2 kernel, which is also now upstream. There's going to be
patches and tweaks and things like that for a while, but the basic structure is in place. You
can use it. As far as I know, there's no standard distros that provide it enabled yet. But wait a couple of weeks, and it might have changed.
Linux is a very fast-moving community.
So those of us that are playing with this stuff now,
it's not terribly difficult to do a kernel build that
enables these features.
One of the cool things that's provided there is a little hook.
So the BIOS table I was talking about
representing different memory types is the E820 table.
And if you identify yourself as an NVDIMM type,
then Linux, with these changes in,
will just not put your memory into the volatile memory pool.
So there's a special syntax for a kernel command line option
that says, I want to create a physical memory address range
and have you pretend it had an NVDIMM E820 type.
And when you do that, you just have a range of memory.
It's really volatile memory.
You know it's volatile memory;
it's the stuff that was there yesterday.
It didn't change any features, but the system treats it as persistent memory.
You can use the DAX features, you can
mount a file system, you can use mmap, it will not do the paging that Andy had talked
about earlier, and you can start looking at how you adapt your software to work with persistent
memory. Until you reboot, that is; then it's not persistent anymore. But it's still a very useful feature
to evaluate how your software works today.
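Part of adapting your software is the flushing that the earlier question touched on: after storing into a DAX mapping, a PMEM-aware application flushes the affected cache lines and fences before it considers the data durable. Here is a minimal sketch using the long-standing CLFLUSH instruction via compiler intrinsics (the newer optimized flush instructions work the same way conceptually); the mapped buffer is assumed to come from an mmap of a DAX file as shown earlier:

```c
/* Minimal sketch: make a store to DAX-mapped persistent memory durable by
 * flushing the cache lines it touched, then fencing. The buffer is assumed
 * to be an mmap'd region of a DAX file; 64 is a typical cache line size. */
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence */
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64

static void flush_range(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHE_LINE)
        _mm_clflush((const void *)p);   /* evict the line toward the media */
    _mm_sfence();                       /* order flushes before what follows */
}

void persistent_write(char *pmem_buf, const char *msg)
{
    size_t len = strlen(msg) + 1;
    memcpy(pmem_buf, msg, len);   /* ordinary store into the mapped region */
    flush_range(pmem_buf, len);   /* push it out of the CPU caches */
}
```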
The PRD is a Git repo
which picks up the kernel
and adds emerging work in this area.
So Intel is managing this until all of the bits and pieces
get in place.
But it's very easy to grab a tarball from it, or a zip file,
I guess.
Or you can use Git and download a copy of this.
It already has the not-quite-approved pieces patched in, so you can get a kernel up
and running pretty quickly.
The namespace spec
is
available through pmem.io.
That's the same place that the
NVML library is.
It's just different areas in the same site.
There's also the device writer's
guide.
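To give a flavor of the NVML library mentioned here, this is a minimal sketch of its libpmem map-write-persist pattern. The file path is an illustrative assumption, and the call names reflect the library's interface as I understand it, not necessarily the exact version discussed in the talk:

```c
/* Minimal libpmem sketch: map a file on a DAX file system, write to it,
 * and make the write durable. The path and size are illustrative.
 * Build with: gcc nvml_demo.c -lpmem */
#include <libpmem.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t mapped_len;
    int is_pmem;

    /* Create (or open) a 4 KiB pool file and map it. */
    char *addr = pmem_map_file("/mnt/pmem/demo", 4096, PMEM_FILE_CREATE,
                               0644, &mapped_len, &is_pmem);
    if (addr == NULL) {
        perror("pmem_map_file");
        return 1;
    }

    strcpy(addr, "hello from libpmem");

    if (is_pmem)
        pmem_persist(addr, mapped_len);   /* flush CPU caches to the media */
    else
        pmem_msync(addr, mapped_len);     /* fall back to msync semantics */

    pmem_unmap(addr, mapped_len);
    return 0;
}
```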
A low-level Linux-only library. This is a little
command-line tool for some NVDIMM management services, just kind of a starting point. In
addition to that, Intel has been defining experimental standards for CIM-based management.
So we worked in two different groups, because right now the server side of CIM-based management is done in DMTF,
while the storage side is done in SNIA, in the SMI-S standards. So we realized that it really makes sense
to update related stuff on both sides
in order to cover this.
So there's a read-only model for memory
that has been updated to include
more information about NUMA topology,
because that was just missing in the previous version,
but also the possibility that memory might be there, DIMMs might be there,
but not be part of the volatile memory pool.
So it doesn't have all of the configuration choices because the model there was much simpler before
and we didn't really want to make it that much more complicated.
But at least you can recognize that there's this much capacity
and DIMMs are attached to the system, but only this much is part of the
volatile memory pool. To understand the rest of it, you have to look at the
SNIA models, and those are in two layers. These profiles tend to be small pieces
that you piece together depending
upon the capabilities of your hardware.
So one of them is just managing the system address map
and allocating the system address map.
So basically, it says how you take your capacity
and expose it as block-addressable persistent,
byte-addressable persistent, or volatile.
So you can do that level of tweaking, and that's about it with that one.
The other is around the persistent memory regions,
following the same NFIT model that ACPI describes.
If you're interested in those, they're emerging,
and with anything emerging, you can get more information if you're an insider.
In 2016, you should be able to see public versions of the SMI-S 1.7 spec, including these profiles,
from the same place where all of the in-review SNIA technical specs are,
through the URL here.
If you're an SMI TWG
member, this is the working group
that deals with the SMI-S stuff,
you can see the emerging
work now, and there are actually ballots
going on approving
these right now,
and there's a URL for those;
that would be the latest.
I got an older version of this work
made public for review
several months ago.
But as with anything emerging,
a lot has changed in that time.
So it's significantly different than...
Really, the model is about the same.
The names have gone through significant change.
People reviewed them and said,
I don't like that name, and changed this around.
A couple things got combined.
But that's available.
It's publicly available now.
So all those URLs are there.
Just a real one-minute description.
I mentioned that Intel is looking at an implementation of this.
We have implemented the CIM model.
We're kind of going through validation now.
We are really using the same general model outside of CIM as well.
So when you look at the CLI we have, because it's
really not available yet, it will look a lot like the way that the CIM model works.
And the idea is that depending upon the context,
depending upon the operating system,
there's no perfect answer for manageability tools right now.
For the software that is CIM-enabled,
there's a lot of desire because there's a place
where you can drive standards and have discussion.
The rest of it just kind of evolves to meet the problem.
But at the moment we're going through several approaches.
We're also actively looking now at a Redfish adaptation.
A few months ago we didn't know what Redfish was, and I think six months ago probably not
much of anybody did.
It's become a big force in server management.
So we've been monitoring that and realized that's a simpler type of thing.
It's much more focused.
But we're trying to shoehorn the same model in there
and work with that community as well.
So trying to drive standards where we can.
But like I said, there's going to be cases
where there's going to be vendor-specific management.
There's just always an aspect of hardware management
that really isn't worth standardizing,
because it's really vendor-specific.
Then you kind of get to a layer above that,
and some generic rules apply.
So we anticipate a combination of both of those for a while.
Any other questions?
All right. So it's about moving
things between systems
and maintaining their identity.
So if you have something
which maps on one system,
is it not particular to that system?
Yeah, it's just as much fun
as moving a bunch of disks
in an array group.
Yeah.
Obviously, there is metadata
on the device itself,
which kind of identifies the.
Yes.
Some of the information that is in some of these guides
and stuff I mentioned, that reference there, talks about how
metadata is laid out.
Some of the identity information is actually
built in
at the hardware level as well, so there are, you know,
IDs that could be used. But it's kind of a combination of both of those, along with the population
rules that are server-specific.
And so, like I said, if you have a spare copy of the same server that you're pulling the DIMMs out of, chances are pretty good you can repopulate your data.
Otherwise, take backups a lot.
Any other questions?
All right.
Well, thank you very much.
Thanks for listening. If you have questions about the material presented in this podcast, be sure and join our developers mailing list
by sending an email to developers-subscribe at snia.org.
Here you can ask questions and discuss this topic further
with your peers in the developer community.
For additional information about the Storage Developer Conference,
visit storagedeveloper.org.