Storage Developer Conference - #123: The NVRAM Standard

Episode Date: April 6, 2020

...

Transcript
Starting point is 00:00:00 Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast, episode 123. My name is Bill Gervasi. I'm the principal systems architect for Nantero. And since they paid for my flight up here, you are going to get a little bit of a pitch on Nantero technology. But fundamentally, I'm up here speaking on behalf of the JEDEC organization.
Starting point is 00:00:59 I'm the chairman of the JEDEC Non-Volatile Memory Committee. And what I'm going to do is talk about some of the work that we're doing in the Non-Volatile Memory Committee and how it applies to the technology that Nantero develops. So I'll try to keep the sales pitch down to a minimum. So the process that we're going to go through over the next 45 minutes: we're going to talk about the data processing challenges, which is going to be pretty much candy for you guys.
Starting point is 00:01:28 We're going to go into a little bit about checkpointing, the rationale behind why we do this checkpointing. We're going to look at the standard pyramid that you've seen too many times before, but then we're going to start talking about changes in that pyramid to address the idea of data persistence. Finally, I'm going to roll all this together once you've seen the rationale behind what we're working on
Starting point is 00:01:54 and then talk about a new standard that's in development in JEDEC called the NVRAM standard that will start bringing some of these technologies into the applications that you guys want to develop. So here's the candy, right? You guys all know that data processing is absolutely wonderful until something goes wrong. And that's what the bunch of us get paid to deal with
Starting point is 00:02:20 is when things get screwed up, how do we get back where we wanted to be? And as we know, the weakest link in this process is the fact that since 1971, we've been dependent on a technology that is dynamic. The D in a DRAM stands for the fact that when power goes away, so does your data. And the volatile nature of this DRAM is driving the system's architectures to deal with these failure mechanisms because they happen, unfortunately, far too often. And when they happen, it costs us all a lot. The cost of system failure is sometimes staggering,
Starting point is 00:03:03 just in terms of the lost business opportunities and so forth. And so what you see is that it actually pays for itself to look at this problem and come up with unique ways to solve it. So how did we deal with this? Well, one of the key things that we do in all of our system architectures is checkpointing. And most of you guys probably already know this, but the idea of checkpointing is you're going to take critical data that you can't afford to lose and set it aside.
Starting point is 00:03:34 Now, obviously, that can't be done on a cycle-by-cycle basis today. So you have to pick your points at which you're going to save that data and then have a structured way of recovering. And so you're in your DRAM which you're going to save that data and then have a structured way of recovering. And so you're in your DRAM, you're running along, you get to the point where you say, boy, I have some important data here. It's a bank transaction or it's a stock transaction. Let's go ahead and throw that checkpointed data off to storage, which traditionally was hard drives and then moved into other channels as well. So now you have something that you can get to. And now you're going to keep running,
Starting point is 00:04:11 and you're going to checkpoint every once in a while. You keep running, you keep checkpointing. So what's happening to your system while you're doing this? Well, you're essentially taking machine cycles that could have been used for other purposes, and you're throwing those machine cycles at this checkpointing process. So it's going to degrade your system performance. It's going to burn a lot of power, because now you're not talking just over the DRAM channel, but now you have to wake up your IO subsystem, throw that data out over a higher-powered subsystem in through a certies into a device that's going to then store that permanently. And it fundamentally is just not at all
Starting point is 00:04:49 what we would like to be doing with our system architectures if we had a choice. So why do we do it? Well, we do it because it's how we avoid loss. And so, you know, you're running along, you hit a checkpoint, and you come along and all of a sudden you get a system failure.
Starting point is 00:05:06 Your memory just crashed, something doesn't look right. What are you going to do? So you're going to go over, restart your system, go back to your storage mechanism, get your checkpoint, bring it back into the system so that you can then continue operating. This has a lot of ripple effects. So, for example, we know that we have to deal with these system failures. continue operating. This has a lot of ripple effects. So, for example, we know that we have to deal with these system failures, and we know that we have data that needs to be persistent,
Starting point is 00:05:40 but the storage mechanisms that are provided to us affect our access granularity. So what that means is that when you're talking to memory and your latency is 30 nanoseconds, for example, that's a factor, but 30 nanoseconds you can pretty much ignore. But when you have to go out over the I.O. channel, well, first you have to trap to the operating system, go out through the I.O. channel, go through that SERDES, shovel a packet across to a translator that's going to turn that into signals that it can control a flash device. We're talking about many orders of magnitude of degradation in performance, and as a function of that,
Starting point is 00:06:16 that's why we invented, among other things, the file system, so that you can block up those transactions and get a big chunk of data that you can move so that you can live with the latency hit of going out over that IO channel. Has anyone seen this pyramid before? Yeah, okay. I get that this is kind of vanilla because we see this pyramid at every damn presentation
Starting point is 00:06:44 that we ever attend. But what I did want to get into was, so of vanilla because we see this pyramid at every damn presentation that we ever attend. But what I did want to get into was, so what can we do about this? We understand that these tears of memory happen. And we do also know that there are things that are changing that. I'm sure somebody in the room has also heard of Optane, for example, as one of those types of memory that now wants to come in and affect that pyramid. And the purpose of all these ideas is what we want to do is move persistence closer to the CPU. You want to take that solid data and start bringing it closer and closer with lower and
Starting point is 00:07:19 lower latency so that you can get a better persistence. So how did we start out? Well, like I said, we started out with rotating disks. And we went from that to the SSD. That was a pretty big improvement in speed because you no longer had to deal with rotational latencies and that sort of thing of the rotating media. And then we got a little smarter and we came up with even higher performance variations like the NVMe.
Starting point is 00:07:46 But these things were all sitting out on that I-O channel, which meant fundamentally you had to take that latency hit to go to the I-O subsystem to get to these devices. And what we really wanted was we wanted some kind of a holy grail. What we really wanted was a system that was going to solve all of our problems. And what was that going to be? That holy grail is end-to-end data persistence. What we really want is for power to fail at any time and we don't give a shit.
Starting point is 00:08:29 So we go back to our pyramid, and we do know that things like the Optane and the NBDIM products came in and defined a new tier of memory, storage class memory. And that was kind of nice, because it sat in that wasteland between the CPU and the SSD.
Starting point is 00:08:49 So it was kind of a nice enhancement. But all it did was just make the problem scope a little bit narrower. It really didn't address the problem of what could you do if you could replace the DRAM with a persistent option. And that's what you would call memory class storage. A memory first that has storage characteristics. So with memory class storage entering the hierarchy, now let's take a look at what we can do with these systems. And why am I bringing this up now? Well, one of the questions you can ask yourself is, when is the last time you heard of somebody developing a volatile memory? Been a while, hasn't it?
Starting point is 00:09:33 Everybody in the industry is focusing on these non-volatile architectures. You have the 3D cross points, the MRAMs, the phase change memories, the resistive RAM, and of course the guys that paid me to come up here and teach you this, Nantero with the carbon nanotube memory technology. So these are all in the pipeline. We don't know if all of them will succeed. We also don't know if all of them are going to hit the goals that we're setting here, except that, of course, mine will.
Starting point is 00:10:03 So where are we in the evolution? Well, I mean, we started out with vacuum tubes, and then we went on up to core memory. And then the DRAM made our lives a little more complicated. And I think now we're in the next stage of evolution, which is to move on to the NVRAM. And you notice that I intentionally avoid Sneha's argument that persistent memory is a term that we can use. Sneha wanted to change storage class memory to persistent memory. The problem with that is that that term just really doesn't mean anything more.
Starting point is 00:10:39 It just means that the data sticks around on a power pail. So what we really need is that we need another distinction between a deterministic and a non-deterministic permanent memory. And that's why I'm introducing this new term. It's not like I'm really big into three-letter acronyms, but as an engineer, I have to be. The other thing is that we need to talk about persistence. What is persistence?
Starting point is 00:11:05 Well, it's not the same for everything. And you know that DRAMs and SRAMs have no persistence whatsoever. You have devices like RNRAM or the EFI RAM that have very, very long persistences. And then you have a bunch of them that are kind of in the middle. And so we need to address all of these questions about the definition of persistence before we know that we can resolve some of these questions. So what is right endurance? Right endurance is one of those things that determines how persistent a memory is. Because the idea is that if that thing's going to start breaking down
Starting point is 00:11:42 when you access it, at some point, the breakout is going to happen, and you're going to need to go through and do an operation such as wear leveling to go in there and correct for the fact that you've reached a limit in terms of how many rights that device can endure. I was actually a little scared when I did the research on how bad these problems are because I started looking at things like the number of cycles that are available for some of these technologies. For single level cell, the endurance is pretty bad. For things like triple level cells, you have incredibly bad right endurance characteristics.
Starting point is 00:12:27 And then you have this other weird thing that happens. When you have devices that wear out, you're no longer guaranteed your capacity. And so you have to do things like over-provisioning in order to guarantee a certain amount of capacity in a device. And this is all statistical. If a given device violates this, how do you know? So this is a system level problem
Starting point is 00:12:50 that you periodically have to go and pull these devices to make sure that they're not filling up, that they're not wearing out to the point where the guaranteed capacity is no longer there. It's a pretty big deal, right? Add on top of that sensitivity to temperature. I was looking
Starting point is 00:13:07 through some statistics here and things like in order to hit a 52 week guaranteed persistence, you had to keep your temperatures down to 40 degrees centigrade. That's pretty cold, actually. Most data centers, they tell the DRAMs, you have to be prepared to operate at 95 degrees centigrade. And you look at where 95 degrees centigrade is on that chart, well, guess what? You're not even going to find it. So this is really serious stuff.
Starting point is 00:13:37 And again, you guys live and breathe and eat this stuff. But now let's try to apply that to where we really want to be, that holy grail. Take a second and think about the DRAM protocol, because that's where we want to be. If we want to replace the DRAM, we really need to live in the DRAM world and offer the same level of performance as a DRAM with a non-volatile memory in order to deliver to the industry what it really needs to affect this next evolution of systems technology. And the DRAM interface is fully deterministic. You issue a read or write command, and that data had better be there 15 nanoseconds
Starting point is 00:14:21 later or things start breaking. So now if you tried to introduce a device that had to go off and do wear leveling, what's happening? Well, what's happening is that at the time that the memory controller thinks that the data is ready, you're not there. So what this says is that you cannot have endurance limits on a device that are visible to the memory controller. Otherwise, you are not a memory class storage. So memory class storage needs to be the full speed of a DRAM with no endurance limits
Starting point is 00:14:56 and fully deterministic 100% of the time. So now I'm going to start throwing a bunch of terms out because they're all kind of related. What's the relationship between an NVRAM and a memory class storage? Well, today there's no difference. So the specification that I'm going to be introducing you to in a couple of minutes here, the NVRAM standard is memory class storage for now. However, we also know that once you guys get used to the idea of having persistent memory on the channel and it's not a DRAM anymore, we're going to start changing the protocols. We're essentially going to take over the industry and start determining protocols that are friendly to these new non-volatile memories.
Starting point is 00:15:49 So speaking of personal experience, our technology, the carbon nanotube technology, is built using a cross-point architecture with 64 kilobits in a cross-point supplied a bit at a time. When we take that DRAM protocol that has bank groups, bank addresses, rows, columns, chip IDs, all of that stuff, all those address bits, those are completely artificial to us. So we anticipate that in the future, as we get this predominant in the industry,
Starting point is 00:16:22 what we're going to be doing is introducing new protocols to allow you to do some new cool stuff. Like if you're making an artificial intelligence engine and you don't care about banks and bank groups, we can add to the protocol new ways to get at that data much more quickly. And that's going to be for the next generation of controllers and memories. But, you know, of course we have to do baby steps. We have to kick the DRAM out first, and so let's focus on that.
Starting point is 00:16:53 So storage class memory is not a memory class storage device. And graphically shown, this is basically showing you that you have Flash today being dominant for the storage. You have these phase change memories and the 3D cross points and all that. They're coming into the wasteland as a storage class memory. But again, none of them have DRAM replacement because they're all slower and they all have endurance limits. So a memory class storage device must come in at that level here. A memory class storage device must have the full performance of a DRAM or better. And it has to have DRAM equivalent
Starting point is 00:17:36 endurance. In other words, unlimited endurance. And again, to replace DRAM, it has to have the same capacity or higher than a DRAM. So what is that holy grail looking like? Well, it has to have full speed, non-volatility, the unlimited ride endurance, wide temperature range is going to be a nice plus, scalability. The fabrication has to be capable to be built anywhere, has to be in the power envelopes and the cost envelopes you all expect. If we can achieve that, if I can show you that this is possible, what you have is a drop-in replacement for a DRAM,
Starting point is 00:18:17 even at the module level, that your system will operate as before, but with the added benefit of data persistence. So what are these technologies? Well, like I said, Nantero NRAM is one of the technologies that I've brought into this new standardization effort. But I'm also working with the other guys. As a chairman of JEDEC, I have guys working with me that are bringing in phase change memories and magnetic memories
Starting point is 00:18:44 and resistive memories into this specification. Now the current generation devices can't do that, but they all have stuff in process. And what I'm trying to do is to enable the industry with a single standard that you controller guys can design to and get multi-source technologies for that. And then what we'll do is beat the snot out of each other in the marketplace based on price and performance. So it's based on the DDR5 SDRAM specification, which I don't know if any of you guys are planning on going to the JEDEC training in November, but this is essentially what they'll be teaching at the JEDEC event,
Starting point is 00:19:24 is this new DDR5 SD-RAM. What I'm writing in the non-volatile committee is an addendum to the DDR5 standard called the DDR5 non-volatile random access memory addendum to JESD 79-5. And what this says is a DDR5 NVRAM is just like a DRAM, but in addition to that, no refresh is required. Self-refresh can be a true power-off because there are no cells to refresh.
Starting point is 00:20:02 So you can actually turn power-off completely to the device, which you cannot do to a DRAM today in self-refresh. Some of the timings are going to be different just because you can't expect a resistive RAM or a magnetic RAM to have exactly the same
Starting point is 00:20:19 Rastakast timings and so forth. But they all need to be deterministic and they all need to be within the same scope. Again, the focus is that you guys can design one memory controller that can talk to an SDRAM and or a NVRAM. There are also differences in data persistence.
Starting point is 00:20:43 And what I'm going to do is I'm going to show you some of these, but this is driven by customers. Some customers are okay with data persistence being defined as if power fails, just make sure the data is consistent. Other guys in higher reliability environments say, I need to know exactly when that data is committed to the non-volatile array. And then some people are in between.
Starting point is 00:21:05 So we're allowing for variations in the market to drive us all as suppliers to the market. And then there's another problem that I don't know if you guys were aware, but DRAM kind of dies out at 32 gigabit. Were you aware of that? When you read the DDR5 spec, you're going to see that the DDR5 specification stops at 32 gigabit.
Starting point is 00:21:29 And if you try to get a DRAM supplier to give you a commitment today for a 128 gigabit monolithic chip, you're universally going to get a no right now. And the NVRAMs don't have this limit because it's built on a whole new technology. So one of the things we needed to address was what happens if you want to go beyond 32 gigabits per chip? The other thing is that these technologies are not identical. My NRAM is not exactly the same as a spin transfer device, or I guess they're spin technologies now. They're not identical. So there are going to be some subtle differences, and that's the point of a specification, is
Starting point is 00:22:12 to allow those differences in a standardized way so that you guys that design controllers can look at that and figure out what the differences are in the timings, things like requirements for pre-charge, things like what are the available persistence definitions. But they are all going to be in a common spec. So you have this one common core of features and then a few warts on that to describe the slight differences between the technologies. How do you determine this?
Starting point is 00:22:44 You guys know about the SPD, serial presence detect? It's a configuration EPROM that's on every standard memory module, and it holds the configuration parameters for a device. So those profile 1, 2, 3, 4 features can be expressed in the SPD saying, this technology is the N-RAM, and here is its rasticast delay, here is its right recovery time, and so forth. So let's look at some of these differences. Right now, with a DRAM, what do you have to do? Well, you're running along merrily, and 3.9 microseconds goes by, and all of a sudden you say, I had to go to sleep for 350 nanoseconds to do a refresh cycle.
Starting point is 00:23:32 When you have a persistent core, that no longer is necessary. So that 350 nanoseconds, which incidentally is 15% of your data bandwidth, goes away. But if you want an NVRAM to plug into a DRAM socket invisibly, you have to accept and decode the refresh command and then just no op it. But it has to be a part of the command protocol. And again, that's the purpose of having an NVRAM specification is that you guys know, yeah, you can issue the refresh command and on the next clock you can do some real work. Or you can just drop the refresh command completely.
Starting point is 00:24:16 So what else can you do? Well, what about self-refresh mode? Well, self-refresh mode is an interesting one because it actually has two functions. One of the functions of the self refresh command is that's how you put this device into a lower power state where it just periodically goes in it internally refreshes its content so that you can come back up and keep running. But the second function of the self refresh command is that's how you allow people to change frequency. Because a side effect of going into self-refresh mode
Starting point is 00:24:48 is it resets the DLL inside the device to resynchronize to the new frequency. We need one of those functions for the NVRAM. We want you to be able to change operation frequency, but we don't need that refresh operation. So that means we're not burning any power while we're in self-refresh mode. You can still go into self-refresh,
Starting point is 00:25:12 we just shut everything off and then wait for you to come and exit back out. DRAM also has an interesting characteristic in that the way a DRAM works is you have two functions, activate and pre-charge. The activate function takes the content of the core,
Starting point is 00:25:34 brings it out to sense amplifiers that you can then interact with. However, they also need a pre-charge operation to restore the array. Well, why? It's because they have a destructive activation.
Starting point is 00:25:51 When you read the contents of a memory cell, it discharges the capacitor. So you have to restore the content. You don't need that with a non-volatile device. And so what ends up happening is instead of having this complex thing about activates and reads and writes and precharges and then reactivates, now you have a whole new model which is from the idle
Starting point is 00:26:13 state, you can directly do reads and writes. There's no concept of activation and precharge anymore. It becomes the ideal load store memory. So architecturally speaking, I think you're starting to get the idea that there's a whole lot of cool new stuff that you can do with these devices now that these new functions are coming online. And that's why I'm talking about the
Starting point is 00:26:38 evolution of the NVRAM standard so that in the future we can go to protocols that take advantage of some of these new features. The persistence definitions, I talked about these a little bit. The three definitions that we have so far are the intrinsic, which means immediately on every write content is committed to the internal array. There's an extrinsic version where it requires a flush command that came from the nvdim-p protocol. And then the third one, like I alluded to earlier, on reset, you save it away.
Starting point is 00:27:13 So this is what those look like. The intrinsic persistence, which, by the way, is what Nantero NRAM does, says that every time you do a write, within 46 nanoseconds, that data is committed to the internal non-volatile array, and power can fail,
Starting point is 00:27:29 and the most you can lose is 46 nanoseconds of data. The extrinsic guys say, we're going to do a bunch of buffering on chip for various reasons, and then what we're going to require is that periodically you issue a flush command that will then take the internal buffers and commit them to the
Starting point is 00:27:49 non-volatile arrays. For those guys, your data persistence is measured from so many nanoseconds after the last flush command has been issued. If you're writing applications, you need to know these differences because you need to know these differences
Starting point is 00:28:05 because you need to know when you can free up your requesting application to go off of the active queue and go on to reserve. And then finally, this is kind of like the NVDIMM-N. This particular persistence model says that just go ahead and let the device worry about this and only worry about power fail. That's good enough. And on power fail, it'll go and it flushes its contents. So you only need to keep the power to the device for a few microseconds after the power occurs, the power fail occurs. So I talked about this limitation of DDR5 SD RAM to 32 gigabits.
Starting point is 00:28:47 That's not very far down the road. And if I were a systems guy, this would worry me quite a bit. We're changing the protocol. We're enhancing the DDR5 protocol to enable up to 128 terabits per device. And the way we're doing that is we're taking the standard DDR5 protocol which looks like this right now. You do an activate, you do reads and writes,
Starting point is 00:29:12 you do another activate, do a bunch of reads and writes, and that's how you access your data. What we're doing is adding a row extension command to the DDR5 protocol. So it's a superset of the DDR5 protocol. It's an optional thing until you need to address more than 32 gigabits per chip. And what this does is the
Starting point is 00:29:34 row extension adds another 12 bits of addressing. So how does that really work? So you think about how the DDR5 devices are organized today. You have a row that represents a bank buffer. You have 32 banks, and then you have a column selector across that. With the DDR5 NV RAM, it's exactly the same thing. You still have 32 banks. You still have a row association, but now you have the row extension bits that are being added to the association to those 32 bank buffers.
Starting point is 00:30:13 So here's a sample command sequence. You see a bunch of activates, you see row extensions scattered in there, you see some reads and writes, and you can think through some examples of this. So if you do a row extension A, that's going to latch some high order address bits. So the next time when you do an activation, so say you activate bank W with a specific row value, now that row value gets added to the row extension bits. And when you
Starting point is 00:30:45 do a read from bank W, that is going to address the bank buffer that is addressed by the combination of the extension bits and the activate bits. Yeah? Is it additive
Starting point is 00:31:01 or is it concatenated? It's concatenated. You don't need an add? Right. it concatenated? It's concatenated. You don't need an add? Right. Just concatenated. So that's an example of row extension. Now if you want to replace that row, now let's say we issue another row extension command
Starting point is 00:31:19 that changes the latch for this. By the way, Nathan, I don't want to pick on you just because you're old, but you're old enough to remember expanded memory? That's where I got this idea. I stole this from the old
Starting point is 00:31:33 expanded memory spec because I knew the patents had run out. Okay. Bad ideas never die. Yeah. And so now when you do the row extension, and now you do another activation.
Starting point is 00:31:49 So you're replacing bank W with a new activation. So now the combination of the new row extension bits plus the row information from the activation command is consistent. So now the page buffer is represented by that new address combination. And now when you do the read, you get the data from the new one. So it works just like a DRAM, but you have this little latch that adds some extended address bits.
Starting point is 00:32:24 So now a little pitch on what we work on. So our carbon nanotube memory, it's implemented as a cross point. And then what we're going to do is we're going to take these cross points of carbon nanotube memory cells and we're going to put a DDR4 or a DDR5 in front of it. And what we're doing is translating the
Starting point is 00:32:46 DDR protocol to our internal structure. So, I'd like to say we're hitting every one of those check marks along the way. We're expecting to be at 512 gigabits per die in the DDR 5 generation. We're cheaper than a DRAM. We work at any temperature
Starting point is 00:33:04 extreme you can imagine. And I have this funny little story about the 12,000 years of data retention. And I was meeting with guys from the office of the director of national intelligence a couple of weeks ago. And I threw a slide up that looked a lot like this. This guy from the U.S. government,
Starting point is 00:33:25 who, by the way, never introduced himself. I have no idea who this guy is. He's just in a suit, in the meeting, and I was told not to ask if somebody doesn't offer who they are. This guy says, well, our data shows that carbon nanotube connections should last a million years.
Starting point is 00:33:42 Why are you only saying 12,000 years? I had to admit, the guy had me stumped. I'm not usually at a loss for words, but he got me. So, is this an evolution or is this a revolution? Let's think about the old paradigms. So, in fact, those of you that have been in the hackathon this week, you've seen a whole lot of this stuff, right? That you have these applications that are running along
Starting point is 00:34:13 doing load stores to local memory, and the important thing is that that task that's running stays on the active task list. When you have to go to the file system to do a block transfer, like a checkpoint, you're no longer on the active list. You have to trap to the operating system before you can then take this block of data and throw it down the channel to the storage device.
Starting point is 00:34:47 That context switch is what's killing our performance, and it's also the argument as to why we're not seeing the performance numbers from the storage class memories that you would expect from the raw throughput rate differences between PCIe and the DRAM channel. We're not seeing those huge improvements in performance because we're losing stuff by trapping to the OS and
Starting point is 00:35:16 taking your task off of the active task list. You have to then satisfy your I.O. before putting the task back on the active list. That's what's killing us. That's what I think we're going to be satisfy your I.O. before putting the task back on the active list. That's what's killing us. And that's what I think we're going to be able to solve. The great thing is that I actually don't want to diss the guys that are proposing DAX because DAX is the way we're going to migrate to these new concepts.
Starting point is 00:35:41 DAX gives you that option to still trap to the operating system, still take that performance hit, but at least operate at DRAM speeds by using a move command instead of some IO F read, F write. So this is important.
Starting point is 00:35:59 DAX mode does do a lot of the right things because now what we can do is we can take our existing applications that are doing all this checkpointing, we can mount the storage class memory devices locally, or you can put this new memory class storage in there. And now you can do checkpointing at DRAM speeds, but eventually you can eliminate the checkpointing completely.
Starting point is 00:36:29 Once you're convinced that our technology is solid enough that on power fail you're not going to lose your data, we can start to eliminate the checkpointing altogether. And that, I'm going to argue, is the revolutionary home run. So where does that lead? Well, you know, we look at this storage architecture that we've been making, but now, all of a sudden, we can imagine that with memory class storage, we can envision that if the application fits in main memory
Starting point is 00:37:01 and you don't need to do checkpointing, all you need is a network connection. So this, again, is that revolutionary idea that we can move towards completely storage-free systems. And so I'm going to argue that I think I've made the point that we're looking at a revolutionary change, not just an evolutionary change in systems architecture because it's beyond just the memory impact, but now it is affecting our entire systems architectures.
Starting point is 00:37:29 So we're looking back at that data persistence spectrum, and we started off with tape and moved to hard drives, SSDs, and NVMEs. Storage class memory is a step in the right direction. But there is memory class storage coming into the picture. And that's what I'd like to introduce to you guys today. But I also want to point out, we're not stopping there. We're still looking at the cache and the registers also. And when we can speed up the non-volatile memory technologies
Starting point is 00:38:00 to replace that, I think we'll have found the holy grail. And it's not just main memory for servers. This technology has a home in pretty much everything you can imagine. It has homes in NVMe storage. You still want to have lots and lots of cheap bits, and we're not expecting flash to go away anytime soon. But what you can imagine is a memory class storage device attached to an NVMe controller
Starting point is 00:38:29 that has the flash memory, but now it no longer has to worry about power fail in the NVMe either. You can eliminate, I don't know if you've looked at an NVMe design lately, but about a quarter of the PCB space is consumed by tantalum capacitors that are giving the energy necessary to ensure the data persistence on that link. Another one is the AI and deep learning architectures.
Starting point is 00:39:00 These architectures are inhibited today by the fact that they must use von Neumann architectures so that you can do checkpointing quickly in and out because the backpropagation of the AI architectures means you have a lot of valuable data that's in those AI devices, and you have to spoon it out sometimes. With memory class storage,
Starting point is 00:39:22 they can just leave the data there. Backpropagation no longer is a punishment. You just leave the data there. Back propagation no longer is a punishment. You just leave the data there as long as you want, and now your checkpointing can be more along the order of hours or days instead of every few milliseconds. Finally, CXL. I don't know about you guys, but I think that CXL is a pretty important vector for the industry,
Starting point is 00:39:45 and it's very likely going to be the next interconnect standard for not just CPUs, but application accelerators and storage devices. And this might be the opportunity for that NBRAM standard to start breaking free of the DRAM standard. Because you can imagine a CXL interface on one side going directly to addressing our non-volatile memory tiles on the other side. And no longer having to go through a translation layer of
Starting point is 00:40:17 DDR5 compatibility. So it's pretty damn exciting. I think the wave's coming. And so I want to highlight that what we're looking at here is we're looking at how for decades we've let power fail mechanisms drive our architectures, and we've come up with this system of checkpointing
Starting point is 00:40:38 to deal with all of these errors. We balance stuff in our memory tiers to put stuff in the right places in there. But now we have a new way of moving persistence a lot closer to the processor. And this DDR5 NVRAM standard that I'm driving in Jetix
Starting point is 00:40:57 is going to be the way that you guys see documented how to talk to these new classes of devices so that we can evolve applications to exploit memory class storage. And with that, I'd like to thank you for your time and open the floor to some questions. Nathan, you don't get the first question all the time. Okay.
Starting point is 00:41:21 I have a bunch, so you may want to... Okay, well, you get one. You get one. You get one. Okay. Pick one. A while back, you were talking about consistency of latency and spec, which presumably takes managed non-valuable storage and makes it so that they don't need the spec.
Starting point is 00:41:41 But then, a little bit after that, you said, but the latency of the spec might actually be worse than the latency of traditional DRAM. So the question is... Or better. Right. But the question is, if a storage class can
Starting point is 00:41:58 guarantee a worse case latency, does that mean a spec then? Yeah, and I've gone through this in other versions of my talk where I spend more time on Nantero NRAM. So for example, what I can tell you for Nantero NRAM,
Starting point is 00:42:15 because of that data persistence thing that I told you about where we guarantee data persistence within 46 nanoseconds, we extended RAS to CAS from 15 to 23 nanoseconds. We extended Rastakas from 15 to 23 nanoseconds. However, we took access time from
Starting point is 00:42:31 15 nanoseconds and dropped it down to 12. So we made the trade-off to make Rastakas a little longer and access time from read and write a little shorter. So the overall number comes out pretty close to the same, but it's just we juggled it differently,
Starting point is 00:42:50 and it was because of the customer that wanted to know when they could free up their real-time application and take it off of the active task list. So we made that trade-off. Now, that will be documented in the SPD that our TRCD is 23 instead of 15, that our TAA is 12 instead of 15. Does that 46 nanosecond window create something that the application has to understand that is there?
Starting point is 00:43:18 Is there a window where there's yet another data consistency error window that you still have to manage. It doesn't go away. They have to understand our part of it because what I'm documenting is a chip. So the transom. Five minutes? Okay. So bringing the data into the balls of our chip
Starting point is 00:43:42 is what we measure from in terms of writing a spec. What they have to comprehend is that they might have to go through data buffers at the edge of the card, flight time of 12 inches of trace on the motherboard. Then on their processor, what's their L1, L2, L3 strategy? And where do they have their fences in their cache coherency inside their processor? They have to understand all of the stuff that leads up to the balls of our chip. We simply document the final mile. But they have to do all the other math.
Starting point is 00:44:17 But once they know that math, what they can do is say, this processor cannot come off of the active process list until it knows that the data is persistent for the next 12,000 years or a million years. Then I can release it off of the list. And if that number is 150 nanoseconds total for all those other factors, so be it. 150 nanoseconds after it's been issued, if no other system fault is detected, it can go off the active list.
Starting point is 00:44:52 Now it's Nathan's turn. Yesterday, in this very room, the memory guy over there, yeah, you. Yeah, you. You dirty memory guy yeah bummer
Starting point is 00:45:26 yeah yeah So what is it that you're talking about when you're doing NDRAMs, N-RAMs, to be able to avoid that wear of the leg? So what are some of the tricks that you could pull if you have an endurance limit? Okay, I've worked with some of those companies. Spin Transfer, for example. They have a wear out, or now they, for example. They have a wear out. Or now they're called Spin. They have a wear out.
Starting point is 00:45:48 So one of the ways you can do that is you can put poison on the bus and do a retry. Or you can put a worst case number in that always allows them to do their homework. So they could do a number of things. They could do, say, a write recovery time of 60 nanoseconds. They could do some other tricks to give themselves time to do their homework. Another trick that a different supplier is using, they're going to require the refresh command. But they have no cells to refresh. What are they doing require the refresh command. But they have no cells to refresh.
Starting point is 00:46:25 What are they doing with the refresh command? They're using that 350 nanoseconds to go do their wear leveling and cleanup. Another guy, precharge command. He's going to require the precharge command. Why? There's nothing to precharge. He's going to use the precharge command to do his internal housekeeping. So all of these guys are going to use different tricks to try to fit into the DDR5 protocol.
Starting point is 00:46:50 And do I feel great about enabling my competitors? No. But I recognize that in order for me to create a mass market of 100 billion units, I have to accept all of these architectures in the spec and allow them all their ways to fit into that to allow systems to evolve. Any other
Starting point is 00:47:15 questions? Anything? Jim? You get the internal data. I'll make one final comment. Sure. User to kernel transition, etc. My experience is that
Starting point is 00:47:35 most of the overhead is not the transition. It's just Google crashing code in the file system. And most of that overhead is latency of the file system. And most of that overhead is latency of the file system talking to memory. Okay. It's a transform
Starting point is 00:47:54 of I have something that the user wants a name. Yeah. I want to make that name into an address. And that's not an easy thing to do. Yeah. I'm out of time. But the quick answer to that is that that's not an easy thing to do. Yeah, and I'm out of time. But the quick answer to that is that that's a little different than what I'm getting from the customers I talk to who say that if that were true, they would see a 1,000 to 1 improvement of a storage class memory module
Starting point is 00:48:22 versus a PCIe-based module, and that's not what they're seeing. So that's the short answer, is that that's different than the data I get. So thank you for your time, and let's hand it over to the next speaker. Thanks for listening. If you have questions about the material presented in this podcast,
Starting point is 00:48:40 be sure and join our developers mailing list by sending an email to developers-subscribe at snea.org. Here you can ask questions and discuss this topic further with your peers in the storage developer community. For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.
