Storage Developer Conference - #46: Building on The NVM Programming Model – A Windows Implementation

Episode Date: June 5, 2017

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcast. You are listening to SDC Podcast Episode 46. Today we hear from Paul Luse, Principal Engineer, Intel, and Chandra Kumar Konopi, Senior Software Engineer with Microsoft, as they present Building on the NVM Programming Model, a Windows Implementation, from the 2016 Storage Developer Conference.
Starting point is 00:00:56 All right, so we're going to talk about NVML, which a lot of you folks probably already know about, but believe it or not, we're moving it to Windows now, so we're going to have a Windows version of NVML. My name is Paul Luse. I'm a software engineer at Intel. I've been there over 20 years, most of it doing storage software stuff. I recognize a few folks here in the room.
Starting point is 00:01:17 I did a lot of the early NVM Express work. If you're familiar with the Open Fabrics Alliance Windows driver, I was the chair of that first working group, and there was a team of four or five of us that wrote that and put that out initially. So it's good to see that's still plugging along and cruising. And I took a little time off and did some OpenStack Swift work.
Starting point is 00:01:35 And now I'm back on this Windows NVML implementation with my partner in crime over here, Chandra. Hi, I'm Chandra. I work in NTFS, and I've been enabling DAX file system mode in NTFS. And I also work with Paul on moving NVML to Windows. And I think we've got to manually click. Sure.
Starting point is 00:02:03 All right, so let's talk about what we're going to talk about. This is more sort of a high-level talk. We wanted to give you guys an overview on really what we're doing in Windows land. NVML as a library, as a lot of you probably know, has been around for a couple of years, following on the programming model. So for an agenda today, we, of course, have to do some level setting on the programming model. Hopefully some of you were here earlier for Doug's talk or many other talks or Andy's talk yesterday
Starting point is 00:02:30 or sticking around for Andy's talk afterwards. And there's lots of deep dive stuff in the programming model. We're just going to kind of remind you what it is and where it's relevant when we talk about NVML, the library. And then, of course, we need to talk about NVML, the library itself. This is not a deep dive talk in the library. And then of course we need to talk about NVML library itself. This is not a deep dive talk in the library. There are other talks that you can hear about the library or you can talk to any of us afterwards. Like I said earlier, we're really here to focus on the fact that we're doing this for windows and
Starting point is 00:02:57 give you some idea of what that means, who's involved, what kind of process we're following, some of the architectural decisions that we had to make early on, how are we going to approach this, and of course the status on where we're at, and when we're going to be done, when you can actually use this library in Windows. So that's what we hope to get out of today, and if we do our jobs correctly, you guys will all walk away being experts on everything on this outline. There will be a quiz.
Starting point is 00:03:31 Okay, so it doesn't say SNEA NBM programming model SW, but it does say open, so that's close. It doesn't say the Intel programming model. So the SNEA NBM programming model, probably no news here to anybody in the room, but started back, I don't know what, four ago four or five years ago right lots of companies put forth the the specification links down below and as Doug talked about in the last in the last talk it's not an API right it's a program model specification keep plugging forward and very similar slide to at least 16 other presentations floating model specification. So let's keep plugging forward.
Starting point is 00:04:05 And very similar slide to at least 16 other presentations floating around out there. Couple of small tweaks that we made for the Windows side about what we're going to go through. We'll just sort of highlight the main three paths that show up in this version of the slide. Of course, your traditional application through the file system down through your NVDIMM driver. We've got your second path through with the application through a PM-aware file system.
Starting point is 00:04:30 And in Windows, it's also called DAX. Windows uses the term storage class memory to refer to persistent memory. So you may see SCM floating around in our slides. But DAX is the implementation and involves NTFS today and the cache manager and the memory manager. Chandra did a lot of that work himself personally, so if you have any really hard questions for him, that'd be awesome. And then really the focus of this talk is the third path, the application direct path is what we call it. And that's of course where once you've gone through the control mechanisms
Starting point is 00:05:04 of the file system, you've done your file mapping, you've got everything set up, and you've got load store access to memory. Now you've got direct access to your hardware. So QNVML. Well, first let's talk about the status of OS enablement of the programming model itself. So I'm not sure how many people know,
Starting point is 00:05:24 but Windows has persistent memory support right now. It's available in all Windows SKUs, starting with the anniversary update and Server 2006, specifically within TFS with DAX mode. But you can do everything on Windows today that you can do on Linux. And then on Linux support, we heard it was somewhere around 4.4, but we don't know where Windows guys, so could have been earlier, could have been later.
Starting point is 00:05:51 Somebody here probably knows better. And then both of these refer to the JETIC-defined NVDIMM devices. Okay, also something that maybe a lot of folks don't know, and I'll talk more about our Windows community a little bit later, it's not just an Intel Microsoft thing, we've got a couple other companies involved and we're beating the bushes trying to get more, but HP Enterprise is doing some stuff above and beyond what Microsoft has done for NBM support, specifically
Starting point is 00:06:22 around Workstation 2012 R2. So they've got the ability to provide early access to NBM technologies in this older OS without any kernel changes from Microsoft. So these are kernel mode drivers that HP Enterprise has provided and able to provide access for the Workstation folks. So one real specific thing, there's no DAX file system support. Like I said, Microsoft is not behind these changes. But this does use a subset of NVML based on the work that we're doing on the Windows side.
Starting point is 00:06:55 And we've got a lot more information that you can get out of this link here. And these slides are posted. So if you want to learn how to do some early access NVM stuff with releases prior to 2016, then you can follow that link and get them. Okay so now NVML, and I think this is probably maybe the third time this slide has been up today or at least something very similar. So hopefully everybody knows the NVML library is something that was created not to implement the persistent programming model, the SNEA persistent programming model, but to assist. So it's there to help application developers better take advantage of this programming model.
Starting point is 00:07:38 So it really came out of input that we had heard on common pain points out in the industry from various customers that we were talking to. And we're going to go through some of those details, exactly what it does. But it involves, of course, hardware abstraction, making things easier. There's definitely some value-added functionality for application developers that aren't used to dealing with transactions or required atomicity, things that used to be dealt with by the file system or somebody beneath them that they didn't care about, and then some performance improvements as well or performance enhancements over maybe standard ways of doing this.
Starting point is 00:08:19 Okay, so I'm going to turn it over to Chandra now, and he's going to go through several of the NVML libraries and show you a couple of examples of what they look like under Windows. Thank you, Paul. Okay, NVM libraries. So it is just a set of convenience libraries that work on top of the open NVM programming model to solve common problems for application developers. So that is the way to look at it, one way or another. So it is in user space, for now at least.
Starting point is 00:08:50 And it is not a single library, it's a bunch of libraries, as I initially said. There is one core library which does the trivial functionalities like pmm persist, to make sure that the data that we store is persisted, and testing whether the region is backed by persistent memory, directly backed by persistent memory, such functionalities. And there are other libraries as you can see that, libpmemobj, blog, log, they depend on the core library and provide more advanced support.
Starting point is 00:09:19 Each one has a key focus like pmem log will provide logging APIs, et cetera. And uh, again, this, this adds value on top of that file system as we can see, and it's a user mode library. Okay, let's now look at each library and what's the key focus point. So PMM is again the core library that is going to provide functionalities like flushing or processing the data that is written, testing whether the region is DAX mapped or not. It's a core and every other library depends on that one. PMEM block. This provides block atomicity. If applications are used to or
Starting point is 00:10:00 relay on sector atomicity, maybe this is something that might help those applications. Because it doesn't provide generic transactional support, just an atomicity, whether a block is written or not, this is more efficient than other transactional libraries. And PMM log provides APIs for logging on a persistent memory, like you create a file, log file and keep appending to it on a persistent memory. That's PMM log for applications that need that. And PMEM obj. This one library might be the one most of you need or most of the app developers need
Starting point is 00:10:34 because this provides generic transactional support. We will see the challenges later and then see more into what I mean by generic transactional support. Maybe you already heard from other talks like Paul said. But this is the one that most of you might need. And we have VMEM and VMLock. This essentially provides virtual memory from a persistent memory. We think it is a little, I don't know, we see less interest here. So if you believe that you might be needing this functionality, please give us feedback.
Starting point is 00:11:08 So these are the group of libraries and their overview. And you know, some of you experts out there might be saying, but wait, isn't there more? I was out on GitHub and I saw more than just this. And yeah, there is. So the point is the scope of what we're doing in the Windows side is limited and we're behind where the Linux side is at because of course we're two years behind in development.
Starting point is 00:11:29 But I'll talk more about that in a minute. But if you look out on GitHub and see a remote persistent memory library, it's not a mistake that it's not here. It is there for Linux, it's just not there for Windows. Okay, so with DAX file system, the efficient way or efficient path is to memory map a file and then do load or store directly on the PMM device. So now let's look at how the code looks like for this model. And I think this is the open programming model if you already heard that.
Starting point is 00:12:01 So we create a file or open a file, we create a file mapping object, map a view of the file, then we load or store on the view directly. Whenever we access the view, we are accessing the persistent device — that's why it is efficient. No OS software is in the way. And when we want to persist the data that we previously wrote, we use either FlushViewOfFile or FlushFileBuffers, depending on whether we need the metadata of the file to be persisted or not. So FlushFileBuffers will persist all dirty data associated with the file and the metadata, so it is less efficient than FlushViewOfFile if all an application needs is to persist a given region. So generally applications might prefer to use FlushFileBuffers less often than FlushViewOfFile. You can model your application accordingly.
Starting point is 00:12:54 This model that we see here, this piece of code, it works on any file system, even non-DAX file system as well. It just works. But this is the efficient path in case of DAX file system. That is where we'm looking at it. So Chandra, what's the granularity of the flush operations here? Can you come back and pull? Are we flushing on cache line?
Starting point is 00:13:15 Are we flushing on pages? Okay, so this flush view of file, it's a good question. So the intention of flush view of file is to make sure that the data that is there in the virtual memory goes to the disk, all the way to the disk, through the system that always provides. It works on the page granularity. We have to flesh the page, write the page out to the file which is on disk. That is the intended operation of the flesh your file.
Starting point is 00:13:39 Flesh file buffers, it fleshes all the pages corresponding to the file. Also it makes sure that the logs that are written, which is still in volatile memory, gets persisted. So that if you extend a file and then write to it, when you call flush file buffers, it makes sure that the file size is also persisted and the data. If the file size is not persisted, even if we persist the data, it is immaterial.
Starting point is 00:14:01 We are not able to read it back. So does it answer the question, Paul? I have another question, though. Sure. It's easy to say. So you're saying we need to flush a page for doing 18 bytes, but you also say it's not atomic. What does that mean? Thank you. Yeah. Here, we are writing more than 8 bytes, and it's not really atomic. By that, what I mean is, if the machine crashes after the string copy, part of the string could be persisted and part is not, based on what cache lines
Starting point is 00:14:35 are flushed. We are looking at only this thread, maybe there is some other thread operating on the same file and it is issuing a a flush after the string copy and though our flush is not executed, part of the string could make it to the disk while others are still sitting in the cache lines which doesn't reach the memory and thereby the disk. So that's what I meant by atomic operation. Is it... That's a bummer. I wonder what we're going to do about that.
Starting point is 00:15:05 Does the problem make sense? Okay. So I'll conveniently assume that the problem makes sense, and let's move on. So first let's see how libpmem is going to fit here. As you can see, most of the code remains the same — the CreateFile, CreateFileMapping, MapViewOfFile, all the same — except for how we persist the data. Previously we were using FlushViewOfFile; here we are using pmem_persist.
Starting point is 00:15:33 And why PMM persist? That's a question. So in Dax, we know that the region is Dax mapped, that is, we are directly writing to the persistent memory, and all we need to do is flush the data that is there in the processor cache and that will take care of persisting it. And PMM persist just does that. Sorry, yeah, it just do that. Flushes the processor cache to make sure the data reaches the DIMM which is enough to make sure it persists. Unlike the flush view of file, it has to issue an I.O. for the page, which is there in the oratel memory,
Starting point is 00:16:08 through the I.O. path to reach the disk, which is two different operations all together. It's just that both provide the same guarantee in case of DAX, but this is much efficient. So that's why we would suggest or prefer to use PMEM persists. And again, as we can still see,
Starting point is 00:16:24 the string copy is more than 8 bytes and even in this case it is not atomic meaning that if the machine crashed after the copy but before we persist part of the string could have been flushed from the cache to the DIM and part is not. So we'll still have the open problem. And the picture here, yeah, you have a question? So in the previous case, there is no, if the file system is not PM aware... It may or may not be. Okay. So if the file system is PM aware, will there still be a mapping into memory? Okay, the question is, in this case, if the file system is PMM aware, will there be a mapping into memory? I think the question assumes if there is mapping into a volatile memory and copying data.
Starting point is 00:17:10 No, there is no copy of the data. The mapping is created such that the virtual region directly points to the physical persistent media. So we don't have another copy.. The question is can't flush view of file be enlightened to detect the reason that you are flushing is backed by a position memory, it's a DAX file system, and then work optimally. So to do that, we need a check. So given a region, we need to check whether it is DAX backed, DAX mapped, or regular memory
Starting point is 00:17:56 mapped. And that is a little costly as well. We need to make a call into kernel mode to detect whether this region is DAX mapped or not. It needs a kernel switch from user mode to, we need to switch to kernel mode and then get that info back. Yeah, those are the really the two key differences, right? Is that the context switch and if, and if you remember the previous slide, we talked about being page granular, we're flushing a page. This is a,
Starting point is 00:18:26 the next slide. So besides not having a kernel mode transition here, this is cache line granularity. So those are the two big differences between those two cases. We've actually, Chandra will show you a little performance demo in a minute about what the impact is on doing it one of those two ways. Okay, so thank you Paul. That's a good point, those two things. OK. So thank you. That's a good point.
Starting point is 00:18:45 There's two things. These are the reasons why we prefer PMEM persist. Primarily efficient. And one way to think about it is they both are for different purpose altogether. This is just to flush the browser caches. And this other thing has to make sure that it has to issue an IO for the page to be returned to disk.
Starting point is 00:19:03 It's just worked for DAX mode as well. That's it. And now this picture is again reminding us that libpmum is a value add and top of the DAX file system support that is provided with OS. Just as a reminder. But still not atomic? Yeah, still not atomic. Yeah.
Starting point is 00:19:21 So again, the OS, as we discussed, the OS crashes in between. Based on what cache lines are flushed, part of the string can make it to the DIMM and other part might not. That's the state. Now here we see that libpmm-obj, where it fits and what a little overview. This is the library that most of you might again need. That's why we are dedicating a slide to this library.
Starting point is 00:19:48 And it depends on the PMM and the DAX file system support, that is the layering, and it provides generic transactional support. You say begin a transaction, allocate memory, and then end the transaction. You might ask why we need a transaction for allocating memory. Or you might already have guessed why we need that. Just to reiterate, when we allocate a memory from persistent memory or persistent heap,
Starting point is 00:20:16 once the memory is allocated, heap loses that region that we allocated. And we have to assign the address to a persistent variable or a variable that will be persistent. And later we can use it in the same instance of the application, or if the mission reboots, even after that, we can reach the region from the variable which has the address.
Starting point is 00:20:35 If the mission were to crash, after we allocate and before we assign it to a variable, we essentially lose the region. It is neither in the heap, nor we have a reference to it to reach it. So this is the problem. It is neither in the heap nor we have a reference to it to reach it. So this is the problem. This is one of the challenges of programming with position memory.
Starting point is 00:20:51 So we need a transaction such that once we allocate and before we assign it to a variable, if the machine were to crash, it has to be rolled back and given back to the heap. So that's why we need the transaction support. And let's see more about this lippymemorg. Okay, so here we see the, we are back to our string example,
Starting point is 00:21:13 which Paul was rightly questioning. Why is it still not atomic? Still not atomic. It's a good question, and we have an answer here. So this is how it looks like using lippymemorg. We use tx begin and tx end to indicate the beginning and end of the transaction and within that we use TX mem copy to copy a region of memory into the buffer. And this is atomic, meaning if the machine has to crash, even when you're halfway through
Starting point is 00:21:41 the copy, it won't be part of the string, it will be either all or nothing. Leap MMOps takes care of it for us, like writing a log record and then rolling back in case of a crash, all is taken care. So yeah, and we might not need to use TX mem copy, we can use regular string copy as well, this is not a requirement, or we can use any load or store instructions, but before modifying a region, say if you are modifying 10 bytes of a buffer that we allocated, we have to add the region into the transaction so that libmememob can roll it back. We say that this region is going
Starting point is 00:22:18 to be modified and then we modify the region. So in case of a crash, we can roll back to the original state. The picture below is a peek into how objects are allocated and used using lippmmorgs. It helps the next slide we are going to see. Whenever we allocate an object, we get an OID, an object identifier, which is essentially a pair of the pool that we allocated or you can assume it's a root object or something like that, a pool that we allocated and the relative offset of this object. Just again a reminder that this being persistent memory which can be, that is a file is DAX mapped, it can be unmapped and the application can be closed. Later again when it is mapped, it can be mapped to a different virtual address.
Starting point is 00:23:08 So we can't uniquely identify an object with a pointer. That wouldn't be sufficient. So we always need a relative offset from the beginning of something which is unique. So you can simply assume that it's a pair of the pool object come with a relative offset. That's the object ID. And that is how allocations are made using lip. And that's what this picture is trying to tell us. Okay. Now again, the string copy is atomic is already solved one problem and the code doesn't
Starting point is 00:23:35 look much complex other than macros, which is unknown. Maybe. Yeah. In the previous slide when you were giving the character name with name names, there are still 18 characters. Oh, sorry, one moment. Just click on the little red thing. Oh, okay. Sorry.
Starting point is 00:24:02 You're going to pass 18. Yeah. So whenever you're going to flush, you're going to flush the size of the page. That is with flush view of files. That is true. So what is it, PMEMPERSIST? So PMEMPERSIST is cache line aligned, 8 bytes. So, yeah.
Starting point is 00:24:18 It might help just to say a few more words about this because we don't have any actual real application code in here, but this is meant to be the application, if that's not obvious. It's just all we're showing are macros that are implemented by the library, right? So the library gives you this macro and this macro and this macro and you can put any number of operations inside of those two macros and have them be treated as a transaction. And in this case, TxMemCopy will eventually, internally the library will result to a pmem-persist that we saw a minute ago. Sure.
Starting point is 00:24:51 Could you come again? I don't know. Andy, you want to help us out? Yeah. So just by its name, hardware transactional memory is about feasibility, not persistence. So hardware transactional memory doesn't cover different systems. So it doesn't solve the problem. The problem is about making something atomic in the case of power failure. Sure. Well, this is atomicity both thread and persistence.
Starting point is 00:25:45 It's sending. So I'm not sure if you still have the slide with the two red arrows pointing at the macro. Is that coming now? Yes, Andy. So on that macro, there's two types of atomicity being provided. The begin macro itself provides power fail atomicity, and the lofted case provides a runtime, you know, multi-thread. And hardware has actually been really quite fun. Cool. Thanks, Andy. and hardware transaction. Cool. Thanks Andy. You know where to find him.
Starting point is 00:26:21 Okay. So in the last slide, slide up, we, we discussed that, um, we need not use the macros. We can use regular string copy or as a months as well. This slide will shed more light on that one. So this is just a singly linked list in session. The single list linked list is on persistent memory. That's the case we are going to see. So, okay, so this line, I will, okay, let us, let me first walk through. We again have transaction
Starting point is 00:26:46 begin, TX begin, and TX end, just similar, and here we are allocating a region out of persistent memory, and we are adding data to the node that we allocated, and we're fixing the next pointer to be the next of the root points, and finally we are changing the root. This is when we are touching the existing variable. Roots head to the new pointer that we just allocated, new node that we just allocated. And these are all the new node. Here all the three lines, we are updating the new node. So if the transaction were to abort, there is nothing to worry about. If the new node went back to the heap, we are all good. So we made allocation, we are updating the new node, the new node went back to the heap, we are all good. So we made allocation,
Starting point is 00:27:25 we are updating the new node, and if it goes back to the heap, we are all good. Whereas here, we are updating the root object, which is an existing object. So before updating it, we need to make sure that we say,
Starting point is 00:27:35 hey, this region, the root node, is being updated, added to the transaction. That's what we are doing here. And then we change the root's head, so that in case of a crash after that, before the transaction could commit, we know how to roll back.
Starting point is 00:27:50 And this TXZ lock, this takes care of allocating memory from the heap, and in case of an abort, it will give it back to the heap. It will roll back what it did. So this is kind of safe, and it doesn't have the problem that we previously thought that leak. Male Speaker 1 You know, one other thing that's probably
Starting point is 00:28:09 worth pointing out here, Chandra, if you weren't going to go through it, is these macros here? Chandra Nathanael- Sure. Male Speaker 1 Do you want me to cover those or? Chandra Nathanael- Yes, Paul. I mean, come on. Male Speaker 1 So I'm not sure if any of you saw in Doug's talk or if you attended Andy's thing yesterday, one of the problems with persistent memory programming is finding your stuff after you've gone through a reset or a reboot, right? Where do you find things? And the notion of relative pointers versus
Starting point is 00:28:35 absolute pointers and even closing your application and reopening it, you could get a different base for whatever you've mapped. So what this is showing is how the NVML solves that problem with these object identifiers. So you as an application developer reference all of your objects through object identifiers and macros like this one, which I think would stand for dereference read write. So you basically pass this macro, one of those object identifiers, and it essentially converts it to a pointer. So then you can use standard notation to then get to an element within that structure. So that's what those funny-looking macros are doing in there,
Starting point is 00:29:12 is using object identifiers to convert to more C-style pointers. Any questions on any of that? I was expecting that someone will question, why don't you speak about lock? Do you want to speak about it as well? Well, actually, Andy sort of covered that in answering the question about hardware transactions, but maybe some people might not have heard without the mic, right, that the TxBegin, this is actually two things in one, TxBegin lock. There's a TxBegin macro that is just there for power fail atomicity, but we've got this macro
Starting point is 00:29:47 that also adds a lock in where you pass in the lock to handle multi-thread atomicity. So really that lock is protecting you in two ways, or that macro is protecting you in two ways. Yeah, I mean, it's a convenience variant for TX begin and TX end, TX begin basically. The lock will be released when the transaction commits to robots.
Starting point is 00:30:05 It will be acquired in the beginning. Okay, so we briefly discussed that we will be seeing the performance comparison of PIMMAM Persist and FlashView of file. Here is the demo that, okay. Toby shared in IDF. Toby is our DAX and Nvml PM in Microsoft. And here we are writing the 64 bytes at random offset and after every 64 write
Starting point is 00:30:35 we show a flush and we're just measuring the perf of those operations once with flush view of file and other at other time with Pman persist and the latency is almost four times better. The detailed, uh, info of each operation that is right versus flush is in a different XML because this tool list everything, this is an internal tool and it lists the details in an XML but we get, we still get an overview that just by changing the operations,
Starting point is 00:31:07 your latency improved by like four times. And you can probably see the right show up where you can see these in back, but the IOPS as well, right? So there's flush view file at 1.3 million, and there's PM persist at 2.9 million. And that's because of the two things that we mentioned earlier, that no transition needed here. And primarily this is page granularity. And yeah, so there's no use of the two things that we mentioned earlier that no transition needed here and primarily this is page granularity and yeah, so there's no user to kernel switch. PmumPers is just more granular. And the other thing which is, again,
Starting point is 00:31:35 I would like to mention is they both are for different purpose. PmumPers is just flushes the cache lines and because that's what it's need in DAX world or DAX environment. Whereas flushView of file, it's totally different requirement. We have to make sure that the page that is there in volatile memory reaches disk, which is way different from PMEM persist. And yeah. And one more thing, though it is not related to this performance comparison, LibPMEM has variations for memcpy.
Starting point is 00:32:01 And it uses optimal instructions based on the size of the region that we are going to copy to. For example, it will know whether it should use a regular copy and then flush, or prefer a non-temporal store, based on the region size and many other factors. And even pmem_persist depends on the processor; it can use the best instruction supported by the processor. Okay.
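The size-based dispatch described here can be sketched roughly as follows. The enum, function name, and threshold are all illustrative assumptions, not libpmem's actual internals, which also weigh alignment and CPU feature support:

```c
#include <assert.h>
#include <stddef.h>

/* For small regions, a regular copy followed by cache-line flushes is
 * cheapest; for large regions, non-temporal stores that bypass the
 * cache avoid polluting it and make the separate flush unnecessary. */
enum copy_strategy { COPY_THEN_FLUSH, NON_TEMPORAL_STORE };

static enum copy_strategy choose_copy_strategy(size_t len)
{
    const size_t nt_threshold = 256;    /* illustrative cutover point */
    return len >= nt_threshold ? NON_TEMPORAL_STORE : COPY_THEN_FLUSH;
}
```

The point is that the application just calls one memcpy variant; the library picks the instruction sequence underneath.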
Starting point is 00:32:28 Overall, it's a short bunch of slides, but this is what we wanted to convey. For DAX, the efficient path is to use memory map and then load and store instructions. And NVML will make it very convenient to use that model of programming with DAX. We already saw one problem with persistent memory, and when you get into it, you might run into many other problems, and this will be very handy.
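The page-versus-cache-line granularity point from the demo can be made concrete with a little rounding arithmetic. The helper name is hypothetical; it just counts how many bytes a flush of a given range touches at a given power-of-two granularity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A cache-line flush path like pmem_persist touches 64-byte lines,
 * while a page-based path like FlushViewOfFile works on whole 4 KiB
 * pages, so a 64-byte store pays for 4096 bytes of writeback plus the
 * user-to-kernel transition. gran must be a power of two. */
static size_t bytes_flushed(uintptr_t addr, size_t len, size_t gran)
{
    uintptr_t first = addr & ~(uintptr_t)(gran - 1);                    /* round down */
    uintptr_t last = (addr + len + gran - 1) & ~(uintptr_t)(gran - 1);  /* round up */
    return (size_t)(last - first);
}
```

For the 64-byte random writes in the demo, the cache-line path flushes 64 or 128 bytes per write while the page path always flushes at least 4 KiB, which is consistent with the roughly 4x latency gap shown.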
Starting point is 00:32:56 And we also want you to know that pmem.io is the place where you get more details about NVML. And libpmemobj is the most important library for general application developers. And I'll hand it over to Paul to walk through how we have been doing this. Okay, yeah, this is kind of a fun part, too.
Starting point is 00:33:19 So we're going to step back from the technical stuff a little bit and talk about sort of how we're organized, what we're doing, what tools we're using, who we are. It's awesome how far we've come in just a few months. And a lot of us have worked in open source communities before. And we're definitely calling this a community. This is not a couple of companies collaborating. We're still very small. But this is very reminiscent for me of when we did the NVM Express driver for Windows. It was initially just three companies and I think, I don't know, five or six of us that had to work under NDA for a few months, but as soon as we got that restriction lifted,
Starting point is 00:33:53 we were wide open to build our community, to pick our tools, our communications methods, our ground rules. We were able to establish everything starting with just that small handful of people. And it's still going strong today with a gazillion people involved, still using many of the same processes that we started years ago. So I kind of see us doing that right now. We started from scratch. It was just basically me and Chandra, and we've got many more people involved that I'll mention here in a minute. And we've got some tools put together, and we've got some things that are growing, and some communication stuff that we're doing.
Starting point is 00:34:27 So we wanted to kind of share that with you of where we're at, and hopefully we'll be as big as the OFA Windows group in a couple of years from now. So right now the community is HP Enterprise, HP Labs, Intel, and Microsoft. And we'd love to add more companies to that list. And we've still got a lot of work to do, so there's definitely no shortage. We are a time-zone-friendly group, even though there are maybe 12 of us total right now. Almost everybody's in a different time zone. We've got Texas, Washington, Arizona, Poland, Brazil, Taiwan, California. We're just all over the place.
Starting point is 00:34:59 So it's a small group but widely dispersed, so we're able to take in new developers from pretty much any geo. Our goal right now is to be code complete by the end of this year, and what that really means is getting all of the framework ported, all of the testing framework. A big part of this project, since it's open source, is the ability to do automated unit testing and automated system and functional testing as much as possible, so that every time anybody new comes in and touches the code, we know that it's fully regressed before it goes into the code base.
Starting point is 00:35:32 So obviously all of that has to get done on Windows before we can actually start making changes to the library, other than the core stuff that makes it functional, to add things that Windows people might specifically be interested in. So that's what we meant by code complete. We want everything to be working. We won't necessarily have added any new value or anything that Windows people might specifically want.
Starting point is 00:35:54 Somebody raise their hand back there? No? Yeah. Go back one slide. Slide 16. So your first objective is maintaining identical APIs for all OSs. And this is... Oh.
Starting point is 00:36:02 Yeah. Well, I'm glad you pointed that out, SW. I should have worn my glasses. I totally skipped these four bullets. And they're really important, and that's one of the reasons it was specifically mentioned, especially in Doug's talk about avoiding APIs. This is not a programming model. This is a library, right? So we do talk about APIs, and we do have to define the APIs. And one of the first decisions that we made, obviously with significant input from Microsoft because this is a Windows library, was that
Starting point is 00:36:45 we wanted zero changes in the API. I mean, absolutely identical, down to the function prototype, down to whether it's camel case or not. We wanted exactly the same thing. So as an application developer using NVML, if you want to write your application for Windows or Linux, the signatures are spot on. You won't see a single difference. And so far we've been able to maintain that, and I don't see any reason why we're not going to be able to maintain that moving forward. That's definitely a goal of the library, because it's an implementation. Which of course means maximizing our shared code and documentation. And I'll show you our directory structure layout here in a minute so you can sort of see where things are falling out and know that there's nothing up our sleeves.
Starting point is 00:37:26 It actually is laid out this way. We use the same repository. So if you go to GitHub today, via pmem.io, and go poking around the source code, you're looking at the Windows code too. We're working in real time off the same code. It's just that when the Linux stuff is released, none of the Windows-specific things are tagged. We don't have a Windows-specific release tagged, and we're not building Windows-specific releases right now. But all of the source and the work-in-progress test code is available right now. Yeah, I'm really glad you brought that up. I don't know how I missed an entire section of that slide.
Starting point is 00:38:05 Okay, so like I said, the fun part, right? Starting a new community. Jumping into a working community is really cool too, right? That is, if it's a good community; if it sucks, then it's no fun to work there. But for the most part, all your decisions are made, and you just gotta learn how to work with the machinery and understand what's happening. But we got to start from scratch, right? Minus, well, we had to use GitHub,
Starting point is 00:38:25 because that's what the Linux library is in. But the things that we're using that were already common there are GitHub, and we use Reviewable for all of our code reviews. And then we have a mostly homegrown test framework. I say mostly; I think it really is mostly homegrown. And it's a combination of bash scripts and some Perl and some C code. There's got to be one other language in there somewhere.
Starting point is 00:38:54 It's kind of this mesh of things, and it's really well structured, but it was not an original design goal to have it be multi-OS. So one of the big decisions that we had to come up with when we started this effort, besides the API decision, was what are we going to do about the test framework? It's not going to run on Windows. So we identified the maximum amount of shared code that we could possibly come up with and figured out how to wrap it in PowerShell. So we are converting basically everything that's Bash
Starting point is 00:39:26 on the Linux side to PowerShell on the Windows side. And really everything else, including the Perl and of course the C code, is for the most part untouched. Where we find incompatibilities in the C code, we will make every effort not to conditionally compile it out for Windows or conditionally compile in something else. We'll make an effort to refactor it so that it uses something common between the two OSes. Of course, we can't do that everywhere. We do have a few conditional #ifdef _WIN32
Starting point is 00:39:56 kind of things floating around in the test code. Almost none of that is in the library, though; that's almost all limited to test code. Then you'll see in a minute, when we show the directory structure, there are some obvious things that are big chunks of code that have to be Windows, right? All of our threading implementation, memory mapping, that kind of stuff that is just night and day different, has different implementations behind the scenes. And then on the Windows side, we're using Visual Studio and AppVeyor, which is our continuous integration build environment, which actually I didn't even know existed when we started doing this.
Starting point is 00:40:28 We weren't sure what we were going to do for CI. But we do have a system now set up where we are integrated with GitHub. Reviewable was already integrated, and now AppVeyor is as well. So as a developer, you make your pull request and you instantly get both the Linux library tests being run under Travis CI and the Windows tests being run under AppVeyor CI, and then you can see your results, and the maintainers can go through and look at all the unit tests and make sure they're passing before they even bother jumping into Reviewable. So AppVeyor is pretty cool. And then Trello,
Starting point is 00:41:01 I'll talk about that on the next slide. That's really one of our key tools that we can use because we're so small. As soon as we get to be a bigger community, then we'll probably drop that. But you'll see in a second why it's extremely helpful for what we're trying to do. Process-wise, again, because we're really small, we do have a weekly meeting. And it's very helpful because we're trying to move on a tight timeline. And we're trying to port over 160 tests, I think, plus a lot of the framework, the framework that's written in bash. So we've got a lot of well-known tasks that we have to get done, so it helps to talk face to face. We've also got an IRC
Starting point is 00:41:40 channel that we don't use a whole lot, again because we're having weekly meetings. But when we stop having meetings, I think we'll find ourselves on IRC quite a bit more. And then the link below covers Trello, and we'll just go ahead and flip to the next slide so you'll see what that looks like. So if you haven't used Trello before, it's a super easy, basic way for anybody (for us, we're using it as developers) to collaborate where you've got pretty well-known tasks to deal with. The way we run our weekly meeting is by going through this Trello board.
Starting point is 00:42:11 Each one of these things is called a card. We start off with our discussions column, so anybody in the community can throw out a discussion topic, and that just guarantees that we talk about it in the meeting. It doesn't mean that you can't talk about it before then. We've got email and IRC and all that stuff. But this is just a way to say, oh yeah, it's not important, but I want to make sure we talk about it next time. We've got sort of our wiki or reference page there.
Starting point is 00:42:34 So we've got all sorts of cards on how to port certain things a certain way so that we're all doing things consistently. We've got a couple of columns for backlog. So we have generic backlog tasks, and we have backlog tasks that are focused around the tests that we're porting, so we can get an idea of where we are. We have to port the tests first. We don't want anybody touching code until it's testable.
Starting point is 00:42:53 We've got lots of tests to get through first before we start moving on to non-test-related items. The next column is just our mirror of GitHub issues. We don't have any yet because we're still porting the test code. And then moving on down, we've got sort of doing and review and done and closed and all that kind of stuff. But this is kind of, you can see,
Starting point is 00:43:13 this is the agenda of how our workflow works in our community. We just go through one item by one item and see who's got roadblocks, who's questioning a certain implementation, how it should be converted. Has somebody hit this before? And so far, it's been working really well. I think this was before we had counters on here, but right now we've got about 150 backlog
Starting point is 00:43:35 items completed, and probably about 80 more to go that we know of, before we know that the test code has been completely ported. And then we'll deal with whatever issues arise and whatever else we need to do specific to Windows. All right, like I said, I was going to dump out the directory structure here that you'd get if you went off and cloned the repo and looked at it locally. So these are all the Linux libraries here, which are now the Linux and Windows libraries. And then in the common directory, which originally was intended for code that was common between
Starting point is 00:44:10 the libraries, it now has two meanings: it's code that's common between the libraries, and common between the OSes that the library supports. So we've got a few Windows-specific things down here. You can see we've got, like, our pthread implementation for Windows down here. And then we've also got this sort of leftover Windows directory up at the root that we're going to migrate over. We're going to slowly migrate that all over into this common area. So basically the layout will be all one directory structure regardless of the OS,
Starting point is 00:44:40 and there will be a few Windows-specific files floating in the common area, and that's it. So everything else is untouched. And the way we laid out our Visual Studio solution, we've got our main solution at the top, which includes individual projects for everything related to the library. So we've got right now five or six library projects, right, to build the actual libraries. At least a handful of internal tools that are used by the tests, or even one or two tools that can be used by developers, are built as separate projects.
Starting point is 00:45:19 And then we have this nice long laundry list of subprojects. Each one contains a suite of tests specific to an area of functionality. So each one of those can be anywhere from one test to 30 tests, depending on what unit of functionality we're trying to test with it. But that's essentially how we build and debug all of the test framework and all the libraries as well. To date, neither Chandra nor anybody else on the team has really had to make any significant changes to the library. It's really been all changes to the test code. Part of that is because there was some prep work done by the maintainer of the entire
Starting point is 00:45:54 library up front, when he knew that Windows was coming. He changed some of the library to make it a little bit more friendly for Windows (not Windows-specific), but he made those changes in advance. So really we've just been focused on getting this test code ready so we can start with real implementations. Okay, status and plan. Like I said, we're really active in development now. We have full meetings every week. We've got lots of pull requests going in every week, lots of reviews happening. I think out of the 12 of us working on the code, there are probably 40 things in flight right now,
Starting point is 00:46:30 either being worked on or in review, ready for the maintainer to kick back for a revision. We are nearing the end of implementing the test code, which we know is probably going to produce more backlog items for us once we get everything testable. And then our plan is to continue to maintain this whole thing on pmem.io, and we will start building the Windows version of NVML and providing that as a tagged release, probably an MSI package or something, in the same repo.
Starting point is 00:47:04 And then we've just got some of the references here. So that's kind of what we're doing, where we're at, and where we're going. And again, if anybody is interested and wants to help out, we've still got lots of work to do. All you've got to do is contact me or Chandra. Both of our contact details are on here, and we will get you going.
Starting point is 00:47:21 Any questions? All right, one question. Can you comment on support for other architectures? Yeah, I think that question was asked once the other day too. The library is certainly built to handle other architectures, but nobody has brought forth a patch to support other architectures. So there's nothing in the architecture of the library that would prevent supporting anything else. But nobody's brought that forward.
Starting point is 00:47:47 And if they do, it will be welcomed with open arms. Is it packaged up for common distributions? Yeah, it is. I'm not sure what the current status is, but Andy does. Yeah, it's been picked up by Fedora, and it'll be in RHEL 7.3. It's in openSUSE, and then it'll be in one of the upcoming SLES releases. We haven't submitted it yet to Debian.
Starting point is 00:48:15 We have all the packaging stuff working; we just haven't finished some of the legwork of getting it in on the Linux side. Yeah, and on the Windows side, we'll continue to see closer and tighter integration with Microsoft as we make it through the rest of this process. Okay, thanks everybody. Thanks for listening.
Starting point is 00:48:40 If you have questions about the material presented in this podcast, be sure to join our developers mailing list by sending an email to developers-subscribe at snia.org. Here you can ask questions and discuss this topic further with your peers in the developer community. For additional information about the Storage Developer Conference, visit storagedeveloper.org.
