Storage Developer Conference - #26: Persistent Memory Quick Start Programming Tutorial

Episode Date: November 10, 2016

...

Transcript
Discussion (0)
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org slash podcast. You are listening to SDC Podcast Episode 26. Today we hear from Andy Rudoff, Architect of Data Center Software at Intel Corporation, as he presents Persistent Memory Quick Start Programming Tutorial from the 2016 Storage Developer Conference. So I'm Andy Rudoff. I work at Intel.
Starting point is 00:00:50 And this is a rapid introduction to programming with persistent memory. And in particular, I'm going to focus on the non-volatile memory libraries, or NVML, which is only one such solution to this problem. So persistent memory is a new thing. It's an emerging, relatively new thing. And I just want to make it clear that I don't think we know what the right answer is for programming to persistent memory yet. I think there's a lot of right answers. And this is one of those things where communities like the Linux community really flourish
Starting point is 00:01:23 because you just throw the ingredients out there and you let them goof around with it. And they come up with all this stuff that nobody ever thought of and then after about five years a bunch of it dies out and three or four solutions tend to get, become the primary focus. So this is just one of those solutions. So, you know, I'm not trying to undersell the library. I think we actually have done some nice stuff here, but I don't want anybody to think that this is the one and only true solution yet because this is all areas where we're learning. So actually, if you sit in this room long enough, Doug Voigt will be talking about what persistent memory is in the programming model.
Starting point is 00:02:04 I think we may have done these talks maybe in the wrong order. But I thought I would do the two-slide version of it here, which is, you know, for all of my life and well before that, when a program is running, it pretty much bifurcates data into two areas. There's the stuff that sits up here in memory. It's mapped right into the application's address space, and the stuff that sits in storage, and it's either accessed through a file system or sometimes not with a file system, but usually there is. And so, you know, this is just like, this is what you would teach your grandmother
Starting point is 00:02:38 about computers, right? There's just these two basic tiers. But what's exciting about this is that now we're moving into this world where there are three tiers. And I always make kind of a big deal out of calling this a third tier, just to make it clear that I don't consider persistent memory the replacement for storage. And I don't consider persistent memory the replacement for memory. It really is not either of those. Maybe someday. Maybe there will be a day when we decide systems only have to have persistent memory in them.
Starting point is 00:03:07 They don't need DRAM anymore. But that day is pretty far away, in my opinion. And maybe there's some day where we'll decide, you know, we don't need storage anymore. We'll just use persistent memory. That day is still pretty far away. So I think of it as three tiers. Okay. And with this tier, you get the load store access of memory with persistent memory
Starting point is 00:03:25 and all the other stuff that you can do with memory like DMA to it and stuff like that. But it keeps its contents when the power is off. So it really is kind of this interesting animal that hasn't been around since the 1950s when we had core memory. So I guess it has been around for a long time. Just hasn't been used in a long time. So what's hard about this? Aren't we just done? Here's another tier. Use the load and store instructions of your computer. You know, say star P equals value and it stores there. We're done. And the persistent memory programming model gives us a way of getting to this stuff for an application.
Starting point is 00:04:09 Easy, right? The problem is that if you can store things in memory and have them be persistent, then the first thing you're going to start asking for are transactions. Because you'll be in the middle of making some change to a data structure and the program crashes or the machine crashes, you come back and now
Starting point is 00:04:28 that data structure is half changed. Okay? So we avoid that in memory all the time because we put locks around it and it's volatile. So when you're holding the lock, nobody sees the half done change and if you crash, nobody sees the half done change because the whole thing's gone. But with persistent memory, if you crash, you've torn it. You've torn the operation. And now you have this inconsistent state.
Starting point is 00:04:50 So that's what's hard. That's what the rest of this talk is about, is how to make that problem easier to deal with. So this actually shows the whole persistent memory programming model kind of in a nutshell over here. But the whole point is that an application after getting the persistent memory mapped into its address space, it never calls through the file system again. It doesn't need to, right? Now it has direct access.
Starting point is 00:05:18 And that's like I say, that sounds really cool but it has some hard problems. So this is where we're putting the library. It goes right here. It's in user space. The application calls it. It's optional. If you don't want to use the library, you get to solve all these problems yourself.
Starting point is 00:05:34 That's great. No problem with that. And what we've done is the library's on this website called pmem.io. And we call it the NVM libraries, NVML, but right now it's a suite of six libraries. There are a few more coming. In fact, one I'm going to talk about on Tuesday, I think, in a talk here.
Starting point is 00:05:54 And you can see three of the libraries are actually transactional and then there are a couple of other libraries. I'm going to talk about what they are in a minute. But just a couple more background points about the library. We had to start somewhere, so we started out making the library for 64-bit Linux. There's a team of people, one of which is sitting right here, which is porting it to Windows. And it's actually going really, really well, much faster than I thought it was going to go and being done by several companies.
Starting point is 00:06:28 Microsoft's involved, Intel's involved, and some other companies are involved. And so all of that's being done out on GitHub open source and so you can actually go play with it today. You can pull it over and start playing with the stuff that works on Windows if you wanted to. As far as the Linux stuff, we've already delivered our 1.0 version a while back and several distros have picked it up already. And I'm going to show you an
Starting point is 00:06:57 example of one in a second here. So we're making progress on Linux and Windows anyway. We actually got a question on the mailing list a couple days ago about, I tried to compile this on OSX and it didn't work. And much to my surprise, somebody replied and said, oh, I have patches that make it work on OSX. It's like, whoa, okay. How do you get persistent memory in a, I don't know, whatever. So people are looking at it and picking it up.
Starting point is 00:07:24 Okay, so let's talk about what those individual libraries are, right? I talked about this little list of six libraries here. Let's talk about libpmemobj first. I'm guilty for the name, pmemobj, and now I hate it. I actually at the time I was thinking, you know, obj like object storage, like, you know, variable sized blob, not obj, like object storage, like, you know, variable-sized blob, not obj, like Java object or something like that. It really just means that you want to be able to manage variable-sized blobs of persistent memory. Probably we should have
Starting point is 00:07:55 called it something like libpm transactions or something. But this is the library where we have general purpose persistent memory transactions, and this next part is actually a pretty important part of the library. Yeah, SW. Good question. Are these, did you make any changes to the slides versus what's posted online? Did I make any changes in the slides versus what's posted online? No, but these are not posted online. Okay. But you're going to?
Starting point is 00:08:22 Yes, but we are going to post these online. Yes. And. Okay. But you're going to? Yes. But we are going to post these online. Yes. And for the pre-conference things, actually, I never got asked for the slides. But I guess they do have a place to put them, right? Male Speaker 2. Yeah, yeah. Good.
Starting point is 00:08:36 So I will deliver these. And so, you know, everything you see on the slides will be available to you. Male Speaker 2. And there will still be changes to them. And I still have a chance to fix a few typos during the time. Male Speaker 2. time. Okay. So the persistent memory-aware allocator actually turns out to be pretty important. If you think about it, the persistent memory model says call mmap and you get persistent memory.
Starting point is 00:08:59 So, you know, remember, we're talking about terabytes of this stuff, right, like three terabytes per socket. So here I am on my two-socket system. I have six terabytes of this stuff, I call mmap, and I get back an address of six terabytes of memory. Huh. What's the first thing I'm going to want to do? Well, I've got to carve it up somehow, you know, I want to put my favorite struct in the memory or something like that, I need an allocator.
Starting point is 00:09:19 Why can't I just use libsys-malik? Because libsys-maloc has volatile memory assumptions all through it, right? If I tried to put the malloc library on top of persistent memory, the first time the program crashed and came back, the heap would be in this horrible corrupt state. It's not, malloc is not a file system-y kind of, you know, consistent data structure. So that's what we implemented here. So you could actually think of libpmemob just like a little user space file system. It has allocation rules, it has a journal,
Starting point is 00:09:47 it has all sorts of undo logs and redo logs and complicated things to keep it always consistent. We also took some common operations and just made them atomic. We kind of differentiated in our man page between transactions, which are these kind of big flexible things where you can take a bunch of operations and make them transactional,
Starting point is 00:10:06 and an atomic operation. An atomic operation is like, we have one that says, allocate this size memory and put it on this doubly linked list, atomically. If any of that thing that I just said gets interrupted, none of it happened. And then, kind of the most important part of the library is the application doesn't have to worry about the hardware details. You don't have to worry about all the cache flushing and stuff like that. The library does all that for you. You don't have to worry about what size of a store is actually atomic in the hardware. It's 8 bytes, by the way, so not a, not, there aren't any really big atomic stores.
Starting point is 00:10:41 And so the library takes those little 8-by byte atomic stores and makes, you know, gives you megabytes of atomics if you want it. The library knows how to do that. We have lots of examples. And so because this is actually just a 50 minute talk and, you know, there's only so much I can go through in that amount of time, the best thing I'm going to be able to do for you, I'll show you a little code snippets but I'll point to the examples. They're actually in the source tree. This is on GitHub.
Starting point is 00:11:08 PMM.io is just a domain name that goes to GitHub, and you can click to the tree there, and I'll show you how to do that. But we have examples there of B-trees and hash tables and a Space Invaders game that is always persistent and stuff like that. So this is probably the library you want. If you want to use the NVML collection of libraries, this is probably what you're looking for. So this is the data path then, you know, a little more explicitly than I drew before. Here's the application. It has some persistent memory here. This is libpmem-obj right here.
Starting point is 00:11:41 And it's built on another small library called libpmem, which you saw in that first list. So libpmem is just this tiny little library that does nothing but the minimal things about what instructions are there for flushing to persistence, what special instructions are there for going around the cache and stuff like that. All that stuff where you don't want to have to deal with the assembly language, that's in libpmem. It's a tiny library. We wrote it in a few days. And libpmem-obj
Starting point is 00:12:09 is built on top of that. And so libpmem-obj is the one that gives you all these transactions. But if you didn't, if you want to do your own transactions, for example, you probably would still just use libpmem yourself, just because you don't want to have to go and figure out all those weird assembly language things that the PMEM does. One more point here is that doing transactions on storage takes some time. And if you think about it, one of the things that a transaction involves is a store barrier. You're going to make a change, so maybe you do write-ahead logging, and you make that change to a journal first.
Starting point is 00:12:49 You need to know that that change is on the journal before you consider it done. Right? So you need a store barrier. So on storage, a store barrier can only be done at block level. So even if you want to change one byte, you're signed up for a block. Well, this is one of the cool things about persistent memory, right? You can do it at any level. Well, of course, cache line is really what happens because the computer organizes memory in cache lines. But still, it's smaller than a block. So it's much faster. You get these really fast transactions, you
Starting point is 00:13:20 know, hundreds of thousands of these transactions a second is no big deal for persistent memory. But the downside is, in order to do this, the app had to change. It had to change to use one of our new APIs. Again, I want to make sure that it's clear that that's not the only way. We have a whole bunch of things that are transparent to the app, right? We're going to use persistent memory to make the storage programming model work. It'll be fast, right? This is just the fastest and that's why we talk so much about this one. Actually changing your app, re-architecting it to use that third tier is going to be the fastest. But it's
Starting point is 00:13:53 not the only way to go. If you just want to say, sorry, my app's written already, I just want to use persistent memory and pretend that it's a normal file system, that's going to work, okay? So this is not the only answer. Now here's a normal file system, that's going to work. Okay? So this is not the only answer. Now here's a really cool thing, though, we can do if we do change the API. So today, if you're using storage and you want to introduce replication,
Starting point is 00:14:17 you just do it in the kernel, right? You just intercept all the storage operations and you replicate them. And the application has no idea this is happening. Most applications have no idea that this is happening. Right? That's very cool. And this is, it's been done this way now for decades.
Starting point is 00:14:32 Well, we took away the control point in the kernel here. The kernel, no kernel code runs. Right? You're doing loads and stores. There's no kernel code here. So what if we wanted to do replication? Well, we have an API. We have a library.
Starting point is 00:14:51 So sure enough, one of the things that libpmmobs can do is replicate. Now, in the current version that's out on GitHub today, it replicates to a local replica. But in the version that's actually, a lot of this code is already out there. It's just not, we just, we haven't released it yet because we're still testing it but yeah we haven't tagged it in a release yet but we have this ability now to do replication over RDMA to a remote persistent memory so here the app hasn't changed at all same API but some administrative steps happened and suddenly your entire persistent memory pool is replicated so that's that's another advantage of the library. Replication is transparent to the application. Okay, so then just before I go into more of the details about how you
Starting point is 00:15:33 actually use this library, let me say quickly what some of the other libraries are. We have two others that I mentioned on the list of libraries, libpmemlog and libpmemblock. And these are for persistent memory resident log files and persistent memory block devices, essentially. Okay, so you can append transactionally to a PMEM log by calling this API, or you can just transactionally write to a big array of fixed blocks, like a block device, using this library right here. In fact, today on Linux and Windows, because this is
Starting point is 00:16:16 shipped on both OSes already, they include the ability to use just a normal file system on top of persistent memory. And to do that, they make the block writes atomic because file systems have baked into them, like NTFS has baked into it this notion that you won't get a torn write by a power failure. But persistent memory writes are done by, like, mem copy. So if you get a power failure in the middle of mem copy, it's a torn write. There's nobody's going to finish that block for you. So we came up with an algorithm for making these blocks atomic, and that's the algorithm.
Starting point is 00:16:49 We put it in this library. The same algorithm is on Windows in the kernel and on Linux in the kernel. So it's actually kind of getting a lot of play, this one algorithm. So you could use libpmilob for any of this because it's the general purpose one. These are just a little faster for these specific usages because they know they only have to do one little thing. And then I mentioned libpmem is kind of a helper library. It's used by all the other libraries that are persistent. It has its
Starting point is 00:17:22 own version of flush and memcopy, no transactions to it. But most likely what you want is libpmemoption. Okay. Finally, it's worth mentioning that three of our libraries are for volatile usages. Okay. So this sometimes hurts your brain. Why do I talk about the volatile use of persistent memory? But remember, I told you before, some random company out there has announced that they expect to have like 3 terabytes of this per socket. So if that random company is right about this, they're going to be huge capacities, right?
Starting point is 00:17:59 And they said it was going to be cheaper than DRAM. So you could easily imagine that people start using this stuff as another tier of just volatile memory because it happens to be cheaper than DRAM. So to put together a 6-terabyte DRAM system today is really expensive. It's kind of prohibitively expensive.
Starting point is 00:18:21 I don't know of a lot of systems with that much memory in them. There are a few. So we hope that doing that with persistent memory will be easier in the future. And what these libraries do is they just give you the familiar malloc and free interfaces, like the libc malloc, but they give it for the pool of persistent memory.
Starting point is 00:18:41 We started out with these two here. One is just a malloc and free library and one is transparent to the application where it just magically replaces your calls to malloc and free with these calls to libvmem. Then we collaborated with another group inside of Intel who is making a library for allocating things like high bandwidth memory and memory from different NUMA nodes. And we decided, their library was called libNUMA, and our library was called libVMEM, and we decided, well, this is silly. We're all just talking about different kinds of memory. So we put all the code together and we called it libMEMKIND.
Starting point is 00:19:15 It's also open sourced on GitHub and also being delivered in several distros. And this is actually the focus of all of our volatile stuff from here on out. So if you haven't started with one of these libraries, Memkind is the one you should start with for volatile usages. Okay, so where is this NVM programming model that I say we're building on? It's in the OS. And so Doug later on is going to show you a diagram that I just sort of shrink it down, but it looks like this.
Starting point is 00:19:45 And the whole point is that the OS's shipping now support this, right? So Linux and Windows both have this model. We're putting NVML on both of these things. And I just have to make this point one more time that using NVML is a convenience. It's not a requirement, right? The programming model itself have added if you want to just have the raw persistence. Yeah? I think there's a minor typo there where you should say the SNEA NVM programming model.
Starting point is 00:20:13 That's true. This should say, where is the SNEA NVM programming model? It's an industry standard. It is. It's an industry standard. We created it with a technical work group that's part of SNEA. And Doug Voigt is the chair of that group, and he's going to talk later. So thank you.
Starting point is 00:20:32 I will actually fix that. So if you just, you know, want to use persistent memory today, what are your choices? If you just want to start playing with this programming model. Well, one is you could go out and buy. There are a few products out on the market that are NVDIMMs. By the current terminology, the ones that are persistent memory are called NVDIMM-Ns. And those are the DIMMs that you can actually use in the way that I'm talking here where they're byte addressable. There are other kinds of NVDIMMs that look like little disks that are on the DIMM form factor. That's not what we're talking about, right?
Starting point is 00:21:07 We're talking about load store accessible stuff. But if you just want to start playing with this and you don't have any persistent memory yet, you could just decide to emulate persistent memory, and this is how we do almost all of our development today. We just use DRAM. So we put as much DRAM as we can afford into a system, and we carve off part of it and say, let's pretend that that's persistent memory. Okay? And so the programming model that I was referring to, the SNEA NVM programming
Starting point is 00:21:35 model, builds on the idea of memory mapped files. So actually, anything that you do with memory mapped files will work. And you can use that for development. And it'll work just fine to get your code working. All the NVML libraries, they work fine on any memory mapped files will work and you can use that for development. And it'll work just fine to get your code working. The whole, all the NVML libraries, they work fine on any memory mapped files. I run them on my laptop all the time. I don't have persistent memory. I don't emulate persistent memory. I just run them on memory mapped files. It works fine. But of course, you'll get that non-optimal performance because every time I hit one of the store barriers, it'll call msync and msync flushes pages from a cache, from the page cache. So it's much slower. About, I don't know, 400 or 500% slower than what you would expect.
Starting point is 00:22:11 On the other hand, it's, you can just pretty much use any 64-bit Linux and it'll work. But if you really want to do some benchmarking and actually, you know, use some DRAM as if it's persistent memory. We have a blog entry here on PMEM.io that describes how to emulate persistent memory. And actually, I should just be able to like flip over. Here it is. So you can just take a look at this in our PMEM.io and it explains to you what kernel you need and how to configure the kernel.
Starting point is 00:22:46 These are all these, like, configure screens that you have for building a Linux kernel. And it'll tell you eventually when you get down to the bottom here, there's all sorts of stuff about figuring out what addresses you want to make persistence, but eventually you'll end up over here mounting a device like DevPMem0, which is your emulated persistent memory. On the other hand, if you don't want to build a kernel and all that, what I recommend doing is using a distro like Fedora24. So Fedora24, easily available today, comes with a kernel with all this stuff built in it already. Yay!
Starting point is 00:23:26 And it works great. And I use it for all my development today. Not only does it have DAX and PMEM, which are the things that you need for emulating persistent memory, but NVML is also part of the distro too. Okay. Or you could get Windows. Yes. That's true. But it doesn't have NVML just yet, but it will. Or you could get Windows, yes.
Starting point is 00:23:47 That's true, but it doesn't have NVML just yet, but it will. So I couldn't help myself but to show you what I do with Fedora here. So here I am. I'm sitting on my machine at the Fedora 24 prompt, and I just installed it, and I typed dnf install libpmemobj-devel, and sure enough it found libpmemobj and asked me if I want to install it. So what's the dash-devel? So if you just want to run an application that uses libpmemobj,
Starting point is 00:24:12 then the package that you need would just be called libpmemobj, and of course the application would probably already have a dependency on that package, and it would have gotten installed when you installed the app. But if you want to do development, so you want the man pages, and you want the header files, then you want the dash develop version right here. So I answer yes and sure enough, there it is installing everything. This whole thing took just, I don't know, a couple of seconds. It really went fast. But notice how when I asked for libpmemobjdevel, I also got libpmemdevel, right? Because it, like I say, one library builds on the other one.
Starting point is 00:24:56 There's another package I want to tell you about. It's called NVML tools. It's again, it's also in Fedora and soon to be RHEL and SLESS and several other distros. Just using Fedora here because they got everything out first. Not unexpected, right? And so there's this great little command called pmempool and and you can see in the man pages, it has a lot of subcommands. So these are actually broken into separate man pages. There's a pmempool info, check, create, dump, and so on. And especially when you're doing development, pmempool info can be kind of handy because
Starting point is 00:25:38 it prints out a bunch of useful information about the pool. So what is a pool? I'm using the term pool and file almost interchangeably here. It's just a file with a persistent memory data structure in it. So like libpmemobj will make one of your files into a pmemobj pool.
Starting point is 00:25:56 libpmemlog would make your file into a pmem log pool. If you type pmem pool info on that file, it'll tell you which libraries you're using with it and when it was created and a whole bunch of other stuff. I actually have an example coming up of what that prints. But why do we use the term pool instead of file? Because we also have a thing called a pool set, which is a file which has a list in it of a bunch of other files. And that's because on the current Linux implementations, a device like dev-pmem0 can only be all of the persis-as big as all of the persistent memory on a single socket.
Starting point is 00:26:31 All right. So I told you they could have like three terabytes per socket. So if you had a two socket system and you had six terabytes and you wanted to make one big six terabyte pool, you'd have to use a dev-pmem0 for one socket, dev pmem1 for the other socket, but NVML would gladly join those together for you and treat it like one big pool. Okay, so you don't have to worry that it's, you know, split in two. Okay, so that's the tools package. So at this point, I've given you the basics of why we have NVML, why we have what persistent memory is good for, hopefully, and how to get yourself a system all set up where you
Starting point is 00:27:13 have NVML and you have the tools. So what I thought I'd do then is move into three quick start examples because remember the name of the tutorial was a quick start tutorial. So I came up with three things. These are kind of unrelated, three unrelated examples. They're short and they make a few points but point you at source if you want to go and look more, right? And so all of the things that I'm going to show you,
Starting point is 00:27:38 you can just pull them out of the GitHub tree and play with them, build them. Complain to me on email when they don't work, stuff like that. And so the three I'm going to do here, the first is, remember I said it's very interesting to just use our library as a memory allocator because you can't just use the libc malloc. So we have a persistent memory safe memory allocator. And even if you don't want to use transactions, you can just use it as an allocator. And you can see here I just have these kind of, in my diagram, I'm just showing this big pool of memory with these kind of random blobs all over it that have been allocated. They don't even seem to point to each other or anything. And I'll show you how that works. On the other hand, usually you want to find kind of the root
Starting point is 00:28:19 of your data structure and use it from there. So we have a special allocation, which we call the root object, which is at a well-known location. You can always get back to it. So after your program crashes or exits even and comes back and it wants to start over with your persistent memory data structures, it usually has to start somewhere. So it starts with the root element. And I'm going to show you how that works in the second example.
Starting point is 00:28:43 And then in the third example, I kind of put two of these things together and then show you in addition to that some transactions. And we have a hash map example that is nicely transactional and everything. And again, I'm just going to be showing you code snippets. So we're not going to do too much of this just because in the amount of time we have, there's only so much code I can throw up on a slide and have it be valuable. Okay, so these are the three examples. So let's go to the first example, persistent memory safe allocation. Okay, so here are some code snippets and I'm going
Starting point is 00:29:19 to talk about a snippet and then I'll go to a slide for some more detail, then I'll come back to this slide and talk about the next bit. This is actually just the function signature for our allocator. So, malloc has one of the simplest function signatures in the world, if you'll remember, if you're a C programmer. Oh, by the way, I should say, why am I doing all this stuff in C? Because again, we had to start somewhere but actually on GitHub right now, we have Python that's actually crawling along pretty well right now, built on top of these libraries. And so it's much simpler than this, right?
Starting point is 00:29:51 Like in the Python dictionary, when it's a persistent dictionary, you just say dictionary equals value, and that happens transactionally. So the Python runtime takes care of it all. But how is the Python runtime written? It was written using these libraries, right? So this is what I expect to happen. I think C programmers and kind of early adopters and people who are implementing higher-level languages,
Starting point is 00:30:18 they're going to be the ones who are going to use NVML. And then the people who use Python and Java and so on, they'll benefit from this much more language integrated way, right? But we didn't want to wait for like changes to the C language for this, right? So that's why these libraries work the way they work. And that's really going to become apparent when I start talking about these macros. Sorry, we didn't want to modify the compiler. In fact, that was one of our basic design goals of NVML. It had to work today. People needed something, right? Yes, thank you. People needed something. But,
Starting point is 00:30:50 yeah, so that's the good news. The bad news is sometimes to make something work, you need to use macro magic, and people hate macro magic. Sorry. It was either that or force people to use C++ for the first time. Now we do have C++ bindings in the library now, but we started out with C. So this is the function signature for our allocator. It takes, you know, when you first open up the pool, you get back this opaque handle, which I call pop in all my examples for the object pool, PMAMOB pool pointer, something like that.
Starting point is 00:31:25 I don't know. I just call it POP. And let me tell you a little bit about this type, PMEMOID. So a PMEMOID is an object ID. So this is another important point about persistent memory. Remember, the persistent memory model is that you open up a file and you memory map it. So how do you guarantee that it shows up at the same location every time? What if your data structure is a tree and it has pointers in it? Those pointers are going to be invalid if the mapping comes up at a different location the next
Starting point is 00:31:54 time. So why wouldn't the mapping just be the same every time? Well, first of all, for security, you know, there's this thing called address space randomization that's part of every OS just about these days, where they map things at a slightly different address every time so that a virus can't just depend on the location of some data structures. But even if you tried to force something to be at the same location every single time, that would be pretty hard to do because shared libraries change their sizes and things like that and these mappings move around. So you kind of have to just assume that these mappings can
Starting point is 00:32:27 be different every time. So instead of pointers in our library, we use object IDs. So if you want one node to point to another node, its little next pointer isn't a normal C pointer. It's an object ID. And object IDs are how we look up everything inside the pool. Now in reality, we still wanted it to be fast,
Starting point is 00:32:46 so these object IDs are implemented as relative pointers, relative to the beginning of the pool. So they're still very fast. One thing an Intel chip can do very fast is add an offset to a pointer. So we just add the beginning of the pool to all the pointers when they get dereferenced. It's actually quite fast. Okay, so instead of like malloc,
Starting point is 00:33:08 malloc returns the thing that you got allocated. We just return whether it got an error or not, and you have to pass a pointer to where the thing that got allocated goes, just to make the interface a little cleaner. Here's the size of what you're allocating. You're also allowed to supply a type number. This can be an arbitrary number that you supply. So if you want to say everything that I'm allocating in this function is going
Starting point is 00:33:30 to be type 37, then they get kind of magically tagged with type number 37. And then I'm going to talk in a few minutes about we give you a little iterator that says now I need to iterate through every object that's type 37. Okay? And then this is one of our coolest features of our allocator. You're allowed to pass in a constructor function. Okay, so imagine this. I'm going to allocate some data structure, and I'm going to fill it in.
Starting point is 00:34:04 And if anything goes wrong while I'm filling it in, I need to print an error and then free it. So that's got to be transactional, right? Because if I die in the middle of that, I'm halfway through it. Well, without the complexity of transactions, we can do all this stuff by allowing you a constructor. So what happens with a constructor is pmemobj_alloc allocates the size that you asked for
Starting point is 00:34:28 and hands it to the constructor function. And if anything goes wrong in the constructor function or the program crashes, or the constructor function returns an error, then it's as if you never allocated it. So it's atomic. It's all kind of this one atomic allocator, but you get to supply part of the code in the atomic path.
Starting point is 00:34:49 I put this on a slide. So here's just a, this is like a very quickly thrown together example just to show you how to use the allocator. Here's my little struct of things that I want to live in persistent memory. And this is really, you know, this is how we do persistent memory programming.
Starting point is 00:35:09 We use native data structures, and they live right there in the memory, and you access them in place, right? It's not like storage, where you might take a data structure like this and serialize it when you put it out on storage and then deserialize it when you bring it back in. We don't do any of that.
Starting point is 00:35:22 This is in place, right? So if the compiler sticks some sort of funky alignment stuff in there, that's part of the data structure. And if you bring it to another machine that has a different ABI, different alignment restrictions and things like that, we will refuse to open the pool. So we at least catch it. So we store in the pool header enough information about the alignment of every data structure, of every base type,
Starting point is 00:35:48 so that if you go to another machine that doesn't have the same alignment restrictions, we'll at least catch it and say, sorry, this pool, your data structures aren't going to line up where you think they are here. So we at least catch that for you. I just wanted to make a point out of that. So here I am declaring one of these PMEM OIDs. I call it stuff OID because my struct is a that. So here I am declaring one of these PMEMOIDs. I call it stuffoid because my struct is a stuff. And here I am doing the allocation. I'm going to, you know, the allocation is going to show up right here if there's no
Starting point is 00:36:15 error. It's going to put it in that variable. And so what I get back is this thing called stuffoid. And I can't just dereference it, right? PMEMOID isn't something you can put a star in front of it. I can't dereference it. Before I can dereference it, right? PMEMOID isn't something you can put a star in front of it. I can't dereference it. Before I can dereference it, I have to get a direct pointer. This is a runtime pointer by calling PMEMOBS direct.
Starting point is 00:36:33 And now that is a pointer to the struct in place. So this is the ugly part. There are kind of two ways of referring to every data structure in persistent memory. One is by its object ID, which is in this variable stuffoid, and then one is by a direct runtime pointer, which you get back from the direct call here. So it's simple, size-based allocation, but it's typeless. In other words, I've just taken all the types in C and turned everything that lives in persistent memory
Starting point is 00:37:08 into a void star. So not so friendly, right? All that type checking just went away. All that type checking that we've all depended on for all those years. Yuck. But I wanted to give you a simple example. This is the simplest example.
Starting point is 00:37:22 It's typeless allocation. And here I am, I can make changes to it right here and I can ask the library to make sure those changes are persistent. And this bottom thing here, pmemob persist, by the way, this is a library function that will automatically figure out, is it real persistent memory? Am I allowed to just flush caches to make this persistent? Or are you just playing around with memory map files, in which case I should call msync? This library figures that out for you, right? So you don't have to worry about that kind of stuff. But the type thing is kind of ugly.
Starting point is 00:37:56 The fact that I had to cast this, the fact that everything is just really a void star, that's a problem. So what do we do about that? Well, let me go back to my snippets here. What we decided was that having an object ID was useful, but what we really want is a typed object ID. So we call it a TOID. And a TOID here is something that is an object ID, but it is associated with a certain type.
Starting point is 00:38:21 So I've taken these examples from our PM invaders, persistent memory invaders. It's a space invaders game that's in our examples directory and it's actually very cool. All of the asteroids and all of the ships and every bullet and everything is a persistent object. So you can take this space invaders game and you can play it. It runs in an ASCII terminal. It's all ASCII art. You can play this thing and then you can like kill it with a signal or control C or whatever. And then when you start it up again, it's exactly in the same state as it was before. We use it as a test. It actually found a whole bunch of bugs when we were first designing the library, right? So we thought it was cool to put it out there as one of our things. And in fact, our Python work right now, the first thing
Starting point is 00:39:00 the Python guy did was make a Python version of PM invaders. So do with that what you will. I think the summary there is that we're a bunch of nerds. So this layout, series of layout macros here, makes a bunch of, defines a bunch of type numbers. Remember I said that this PMM object, Alec, can take a type number? So, this just defines a bunch of type numbers which we do again with a bunch of macro magic and actually some, you know, compiler extensions to the language like GCC extension and so on. But the point is, then you end up with things like this. Instead of using that direct call that I pointed out before, we do it with these little tiny macros, D underscore RW for read reference,
Starting point is 00:39:51 D underscore RO for read-only reference. And it takes the variable that you pass it, figures out what type it is, and allows you to dereference a field of that type. So now you have the type checking back,. But you have to, when you convert from an object ID to a local runtime pointer, you have to use these macros and that will give you the type checking. Yeah. Did any of this work on a platform other than Intel? Oh, good question. Will this work on any platform other than Intel? And, you know, it is my intention that
Starting point is 00:40:26 the answer be yes, but nobody has submitted any patches yet to our library for any other platform yet. So, honestly, the GCC stuff will probably work just fine. All the GCC tricks we're using will probably work fine. My guess is that the bulk of the porting to any other platform would be in libpm where it just knows how to do things like flush CPU caches and stuff like that in non-temporal stores. So my Intel brethren should shut their ears for a minute, but I will consider NVML a success
Starting point is 00:40:59 when I see the first pull request for something like ARM. And I would welcome it. Because we've been very careful not to do anything that we think prevents these other architectures from being implemented. I was wondering, you know, like the basic examples with the map and, you know, just basic, no, just, no jokes, but the basic data patterns are all off. Yeah. That should all just work. I mean, the Linux community
Starting point is 00:41:27 imposes some amount of sanity when you put something in. You know, so all the stuff that went into the kernel, you know, usually has to at least build on all the other architectures. I don't know how much of it
Starting point is 00:41:39 could be tested just yet because I don't, like, I don't know if there are NVDIMMs. We used NVDIMMs to test a lot of the Linux stuff in the kernel, but I don't know if there are NVDIMMs. We used NVDIMMs to test a lot of the Linux stuff in the kernel, but I don't know if there are NVDIMMs on these other architectures available yet. I was literally asking about the. OK.
Starting point is 00:41:54 The smarter. Once you get into the NV pad, you're talking about very machine-specific stuff. Yeah, yeah. I was wondering if the generic stuff would be more. Yeah, as far as we know it is. But I think we're missing the existence proof.
Starting point is 00:42:08 So, but I welcome it. I really do. Okay, so now with the type stuff, this pmamobj alloc in lowercase becomes pmobj new in uppercase and you can see it still takes a name of a constructor function and the thing that you're allocating but now it does all this type checking stuff for you. And like I said before, there's also a way of looping through everything with the same type. So in that picture that I was showing you where these things aren't really connected to anything, you could use this to just do a bunch of random allocations and then to find your stuff again, just say, okay, now give me everything with this type, right? So why would you do that? Let's say
Starting point is 00:42:52 that you wanted to make a hash table that lived in DRAM for performance but all of the values were in persistent memory. That's perfectly doable with this library. Every time you allocate a value, just allocate. Don't even bother keeping a pointer to it. We'll do that for you. We have a list. We have a heap manager. We know where everything's allocated.
Starting point is 00:43:12 And then every time you start up, you just say, oh, time to build my hash table again. You loop through everything and you do it. And it's okay if you're willing to pay that startup time. But instead, what you probably want is to be able to find, and I'm just going to kind of skip ahead here, find the root object. The root object is this idea of a special well-known object that we can find. And I'm going to do a full example of this, only I'm going to do it really fast because
Starting point is 00:43:39 we're already starting to run out of time. In this example, which by the way is on this, there's a URL on this next slide here that shows you, I wrote this example just for today. So I put it on GitHub. But in this example, I defined a layout name. And this is a name that says, what's the layout of this pool?
Starting point is 00:44:03 It's a user-supplied string, and we'll check it for you. So even if you have like six programs that are using libpmemobj, if they're using it for different things, for different data structures, you want this kind of check at open time to say, is this the hello example layout? And we'll tell you if it's not, if you're opening up the wrong pool. And then here's the data structure that I'm going to put in here. It's just a length and a buffer with some sort of text in it.
Starting point is 00:44:27 And I'm calling this the root data structure, myRoot. So I put that in layout.h. And then here we are. I've created a program that's gonna write. This is my hello world example, right? So I've created a program that's gonna write to the root object, one that's going to read the root object. There's the layout.h I just told you about.
Starting point is 00:44:47 I'm showing you what make does here just because I wanted you to understand how you build with the different libraries here. But I'm not going to spend any time on it now because you can always go look at this slide. So here I am in the hello write thing. This program takes a file name as an argument and it creates a pool. So this PMOB create here is how you create a pool. It's creating a file of a certain size. We have a minimum size that we support so there's a macro for that. And after creating the pool,
Starting point is 00:45:20 it gets you back one of those pop pointers I mentioned before. And then here I am saying, okay, give me the root object. And the root object, you tell it what the size of the root object is and if it's already there, you get it back. But if it's not already there, you get it back, for the first time you get it back to initialized, it's all zeros. So your code can pretend like the root object is there all the time. But it's up to you to maintain what's in it. And then here I am using one of our PMEMOB memcopy variants. This does a memcopy of this string into the buffer in the root object and makes it persistent. So this is an important function just to kind of mention. It's actually doing three things.
Starting point is 00:46:07 One is it knows how to memcopy in the optimal way. So for persistent memory, everything that you copy goes into the CPU caches, and if you want it to be persistent, you have to flush it out of the CPU caches. So if you're copying a huge range, it's better above a certain threshold to use these non-temporal instructions that go around the cache, and mem copy function does that automatically?
Starting point is 00:46:27 But remember these are these are this is an Atomic Operation here these these these things are atomic and so if there's a reason why it has to make a copy somewhere else Then it will Anyway here. I am running the program. I write to the file. I read to the file. I get back the string that I wrote.
Starting point is 00:46:48 The full example is on here, but I'm going to keep skipping ahead because we're down to the last five minutes. Yes, sir? Oh, thanks. That's true. I want to make sure we have enough time for all of you to get up and mill out. Okay. So I just wanted to show as the last part of this example that we do have this PMEM pool command.
Starting point is 00:47:17 And so as part of my example, I created this thing called my file. And if I just type PMEM pool info my file, I get to see that it's a pmemobj file. I get to see all sorts of stuff you probably don't care about unless you're debugging the library. But it's, it dumps all sorts of interesting things. In fact, here's all this stuff about what kind of layout the machine architecture is and stuff like that. And the header itself has all sorts of checksums in it that this command tells you if they're okay or not. So, PMEM pool can be a little bit of a debugging tool.
Starting point is 00:47:49 What I didn't have time to go into today, but I want you to make sure you know it's there is that we also have modifications to Valgrind. So you can run all this stuff under Valgrind with the PMEM checking on and it'll tell you if you've done anything weird like you created this data structure in persistent memory and you made changes to it but you didn't ever flush those changes out or you know or you flush them twice and all sorts of things like that. Yeah. What's the total overhead for one of these pools? As far as memory usage, I'm hesitating because if you do lots of big allocations, it's not that bad.
Starting point is 00:48:35 You'll get a few percentage points of it. If you tend to get things pretty fragmented, it's like any memory allocator. You can get a lot of overhead just out of the fragmentation. So how much data do you store for? For every call, we store an extra cache line, so an extra 64 bytes right now. So say only 64 bytes? Currently, yeah. We're actually trying to even improve on that. Yeah. So because it turned out we thought we needed like eight pointers
Starting point is 00:49:06 with each allocation, but we've only turned out to use like three or four of them. So the rest of them, we're saving them in case we need them. But it could be we could actually even improve on that a little bit. 64 per allocation. Per allocation.
Starting point is 00:49:18 Per allocation. Yeah. All right. Let's go into the last example here, which is the transactional example. And again, I'm just going to do small snippets and point you to the source. The source for this example is in our examples directory here. You can see it on GitHub. But if you just navigate into examples, you'll see there are examples for every library. And this example is under the libpmemobj examples and and in particular it's the hash map example.
Starting point is 00:49:46 And what I'm going to show you here is that now I can have these structs like we talked about, and I can manipulate them in between these begin and end macros. So this begins a transaction, and this ends a transaction. So this is a bunch of C macro magic. And what is it really doing? It's actually doing set jump and log jump, right? But the point is, is that the stuff between these curly braces here either happens completely or not at all in the face of a crash
Starting point is 00:50:17 or a machine crash or a program crash, okay? So you can literally begin a transaction, change, you know, 500 megabytes worth of stuff and end the transaction. And if you crash before ending that transaction, those 500 megabytes are restored to their original value. Okay. One last point is that there's a version of these begin and end macros that take locks. And that's because most of these programs today are multi-threaded programs. And we found that the places where you make your program MT safe almost always line up with the places where you make transactions for power fail safety. And so really, these macros give you two types of atomicity.
Starting point is 00:51:01 The macro itself gives you power fail atomicity, but the fact that you can pass a lock to it and have it grab the lock for you and automatically release the lock for you gives you multi-thread atomicity. And so we found that actually this TxBegin lock is probably the most common way that we do transactions in all of our examples. Yeah? Do you have the multi-process atomicity? Oh, the multi-process question. So currently the libraries don't support multi-process access, just multi-threads, right?
Starting point is 00:51:28 And honestly, we're looking for requirements of whether the multi-process thing is actually important to people. Because I'd like to hear more about that afterwards. Because most of the applications we're dealing with are just multi-threaded. There's only one of them, one major application we've dealt with that's multiprocess, and they're doing their own library, so they didn't need us. But maybe you know of another. So let's talk.
Starting point is 00:51:56 It's not impossible. It's just that we haven't been pushed into it by a requirement yet. Okay, so I gave you this kind of quick smattering of three ways the library might be used. The slides have a way of pointing you to the code examples. It is C is there today. C++ is actually out there on GitHub today.
Starting point is 00:52:18 It's working. And it's a lot cleaner than those C macros, because we could use a lot of the C++ tricks on type safety. There are lots of examples, and we also have full man pages and blog entries out there that walk you through other tutorials. There are new libraries coming. In fact, I'm going to talk about one on Tuesday if you're interested in it. There are new language supports coming. The Python stuff is out there already. The Java stuff is underway, and we're open sourcing it sometime this month or next month. It hasn't gone through the open source process yet.
Starting point is 00:52:49 Big company. But there is a Google group called PMEM where we talk about all this stuff. And we welcome your participation. And with that, I'm three minutes over. So I think I'll end. And you guys can find me and ask more questions outside if you like. Thank you. Thanks for listening. If you have questions about the material presented in this
Starting point is 00:53:13 podcast, be sure and join our developers mailing list by sending an email to developers-subscribe at snea.org. Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storagedeveloper.org.
