Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, episode 123.
My name is Bill Gervasi. I'm the principal systems architect for Nantero.
And since they paid for my flight up here, you are going to get a little bit of a pitch on Nantero technology.
But fundamentally, I'm up here speaking on behalf of the JEDEC organization.
I'm the chairman of the JEDEC Non-Volatile Memory Committee.
And what I'm going to do is be talking about some of the work that we're doing in the Non-Volatile Memory Committee
and how it applies to the technology that Nantero develops.
So I'll try to keep the sales pitch down to a minimum.
So the process that we're going to go through over the next 45 minutes,
we're going to talk about the data processing challenges,
which is going to be pretty much candy for you guys.
We're going to go into a little bit about checkpointing,
the rationale behind why we do this checkpointing.
We're going to look at the standard pyramid
that you've seen too many times before,
but then we're going to start talking about changes in that pyramid
to address the idea of data persistence.
Finally, I'm going to roll all this together
once you've seen the rationale behind what we're working on
and then talk about a new standard that's in development in JEDEC
called the NVRAM standard
that will start bringing some of these technologies
into the applications that you guys want to develop.
So here's the candy, right?
You guys all know that data processing is absolutely wonderful
until something goes wrong.
And that's what a bunch of us get paid to deal with:
when things get screwed up,
how do we get back to where we wanted to be?
And as we know, the weakest link in this process is the fact that since 1971,
we've been dependent on a technology that is dynamic.
The D in DRAM stands for the fact that when power goes away, so does your data.
And the volatile nature of this DRAM is driving our system architectures
to deal with these failure mechanisms, because they happen, unfortunately, far too often.
And when they happen, it costs us all a lot. The cost of system failure is sometimes staggering,
just in terms of the lost business opportunities and so forth.
And so what you see is that it actually pays for itself
to look at this problem and come up with unique ways to solve it.
So how did we deal with this?
Well, one of the key things that we do in all of our system architectures is checkpointing.
And most of you guys probably already know this,
but the idea of checkpointing is you're going to take critical data
that you can't afford to lose and set it aside.
Now, obviously, that can't be done on a cycle-by-cycle basis today.
So you have to pick your points at which you're going to save that data
and then have a structured way of recovering. And so you're in your DRAM, you're running along, you get to the point where you say,
boy, I have some important data here. It's a bank transaction or it's a stock transaction.
Let's go ahead and throw that checkpointed data off to storage, which traditionally was
hard drives and then moved into other channels as well.
So now you have something that you can get to. And now you're going to keep running,
and you're going to checkpoint every once in a while. You keep running, you keep checkpointing.
So what's happening to your system while you're doing this? Well, you're essentially taking
machine cycles that could have been used for other purposes, and you're throwing those
machine cycles at this checkpointing process. So it's going to degrade your system performance.
It's going to burn a lot of power, because now you're not talking just over the DRAM channel,
but now you have to wake up your I/O subsystem, throw that data out over a higher-powered
subsystem through a SerDes into a device that's going to then store that permanently.
And it fundamentally is just not at all
what we would like to be doing
with our system architectures if we had a choice.
So why do we do it?
Well, we do it because it's how we avoid loss.
And so, you know, you're running along,
you hit a checkpoint,
and you come along and all of a sudden
you get a system failure.
Your memory just crashed, something doesn't look right.
What are you going to do?
So you're going to go over, restart your system, go back to your storage mechanism, get your checkpoint,
bring it back into the system so that you can then continue operating.
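To make that loop concrete, here is a minimal sketch of the checkpoint/restore pattern in C. The state structure and file path are hypothetical stand-ins, and a real system would write checkpoints atomically (for example, write-then-rename) rather than overwriting in place.

```c
#include <stdio.h>

/* Hypothetical critical state we can't afford to lose. */
struct state {
    long txn_id;
    double balance;
};

/* Checkpoint: push the critical data out to persistent storage. */
static int checkpoint(const struct state *s, const char *path)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    /* fclose flushes stdio buffers; a real system would also fsync. */
    return (fclose(f) == 0 && n == 1) ? 0 : -1;
}

/* Restore: after a failure, pull the last checkpoint back into DRAM. */
static int restore(struct state *s, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return (n == 1) ? 0 : -1;
}
```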
This has a lot of ripple effects.
So, for example, we know that we have to deal with these system failures,
and we know that we have data that needs to be persistent,
but the storage mechanisms that are provided to us affect our access granularity. So what that means is that when you're talking to memory
and your latency is 30 nanoseconds, for example, that's a factor, but 30 nanoseconds you can pretty much ignore.
But when you have to go out over the I/O channel, well, first you have to trap to the operating system, go out through the I/O channel, go through that SerDes,
shovel a packet across to a translator that's going to turn that into signals
that can control a flash device.
We're talking about many orders of magnitude
of degradation in performance,
and as a function of that,
that's why we invented, among other things,
the file system,
so that you can block up those transactions
and get a big chunk of data that you can move so that
you can live with the latency hit of going out over that I/O channel.
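As a back-of-the-envelope illustration of why blocking up transactions helps (the numbers here are invented for the sketch, not measurements), the fixed cost of trapping to the OS and crossing the I/O path gets amortized over the block size:

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative costs only: one DRAM access, versus one trip
     * through the OS trap, SerDes, and translation layer. */
    const double mem_access_ns  = 30.0;
    const double io_fixed_ns    = 100000.0;  /* per-call overhead */
    const double io_per_byte_ns = 0.25;      /* streaming cost    */

    for (long block = 512; block <= 65536; block *= 4) {
        double ns_per_byte = (io_fixed_ns + io_per_byte_ns * block) / block;
        printf("%6ld-byte blocks: %9.2f ns/byte over I/O (DRAM access: %.0f ns)\n",
               block, ns_per_byte, mem_access_ns);
    }
    return 0;
}
```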
Has anyone seen this pyramid before?
Yeah, okay.
I get that this is kind of vanilla because we see this pyramid at every damn presentation
that we ever attend.
But what I did want to get into was, so what can we do about this?
We understand that these tiers of memory exist.
And we do also know that there are things that are changing that.
I'm sure somebody in the room has also heard of Optane, for example,
as one of those types of memory that now wants to come in and affect that pyramid.
And the purpose of all these ideas is to move persistence closer to the CPU.
You want to take that solid data and start bringing it closer and closer, with lower and
lower latency, so that you can get better persistence. So how did we start out?
Well, like I said, we started out with rotating disks.
And we went from that to the SSD.
That was a pretty big improvement in speed because you no longer had to deal with rotational latencies
and that sort of thing of the rotating media.
And then we got a little smarter
and we came up with even higher performance variations
like the NVMe.
But these things were all sitting out on that I/O channel,
which meant fundamentally you had to take that latency hit to go to the I/O subsystem to get to these devices.
And what we really wanted was we wanted some kind of a holy grail.
What we really wanted was a system that was going to solve all of our problems.
And what was that going to be?
That holy grail is end-to-end data persistence.
What we really want is for power to fail at any time
and we don't give a shit.
So we go back to our pyramid,
and we do know that things like the Optane
and the NVDIMM products came in
and defined a new tier of memory,
storage class memory.
And that was kind of nice,
because it sat in that wasteland
between the CPU and the SSD.
So it was kind of a nice enhancement.
But all it did was just make the problem scope a little bit narrower.
It really didn't address the problem of what could you do if you could replace the DRAM with a persistent option.
And that's what you would call memory class storage: a memory first that has storage characteristics. So with memory class
storage entering the hierarchy, now let's take a look at what we can do with these systems.
And why am I bringing this up now? Well, one of the questions you can ask yourself is,
when is the last time you heard of somebody developing a volatile memory?
Been a while, hasn't it?
Everybody in the industry is focusing on these non-volatile architectures.
You have the 3D cross points, the MRAMs, the phase change memories, the resistive RAM,
and of course the guys that paid me to come up here and teach you this,
Nantero with the carbon nanotube memory technology.
So these are all in the pipeline.
We don't know if all of them will succeed.
We also don't know if all of them are going to hit the goals that we're setting here,
except that, of course, mine will.
So where are we in the evolution?
Well, I mean, we started out with vacuum tubes, and then we went on up to core memory.
And then the DRAM made our lives a little more complicated.
And I think now we're in the next stage of evolution, which is to move on to the NVRAM.
And you notice that I intentionally avoid SNIA's argument
that persistent memory is a term that we can use.
SNIA wanted to change storage class memory to persistent memory.
The problem with that is that that term just really doesn't mean anything anymore.
It just means that the data sticks around on a power fail.
So what we really need is that we need another distinction
between a deterministic and a non-deterministic permanent memory.
And that's why I'm introducing this new term.
It's not like I'm really big into three-letter acronyms,
but as an engineer, I have to be.
The other thing is that we need to talk about persistence.
What is persistence?
Well, it's not the same for everything.
And you know that DRAMs and SRAMs have no persistence whatsoever.
You have devices like NRAM or FeRAM that have very, very long persistence.
And then you have a bunch of them that are kind of in the middle.
And so we need to address all of these questions about the definition of persistence
before we know that we can resolve some of these questions.
So what is write endurance? Write endurance is one of those things that determines
how persistent a memory is. Because the idea is that if that thing's going to start breaking down
when you access it, at some point, the breakdown is going to happen,
and you're going to need to go through and do an operation such as wear leveling
to go in there and correct for the fact that you've reached a limit
in terms of how many writes that device can endure.
I was actually a little scared when I did the research on how bad these problems
are because I started looking at things like the number of cycles that are available for some of
these technologies. For single-level cell, the endurance is pretty bad. For things like triple-level
cells, you have incredibly bad write endurance characteristics.
And then you have this other weird thing that happens.
When you have devices that wear out,
you're no longer guaranteed your capacity.
And so you have to do things like over-provisioning in order to guarantee a certain amount of capacity in a device.
And this is all statistical.
If a given device violates this,
how do you know?
So this is a system level problem
that you periodically have to go and poll these
devices to make sure that they're
not filling up, that they're not wearing out
to the point where the guaranteed
capacity is no longer there. It's a pretty
big deal, right?
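Here is a minimal sketch of that system-level polling; query_spare_capacity() is a hypothetical stand-in for whatever health interface (SMART-like attributes, vendor log pages) a real device exposes.

```c
#include <stdio.h>

/* Hypothetical stand-in for a device health query, e.g. a SMART-like
 * attribute reporting how many spare blocks remain after wear-out. */
static long query_spare_capacity(int device_id)
{
    (void)device_id;
    return 128;  /* pretend the device reports 128 spare blocks */
}

/* System-level check: is the guaranteed capacity still intact? */
static int capacity_guaranteed(int device_id, long min_spare_blocks)
{
    return query_spare_capacity(device_id) >= min_spare_blocks;
}

int main(void)
{
    /* Poll periodically; alert before over-provisioning runs out. */
    if (!capacity_guaranteed(0, 64))
        fprintf(stderr, "device 0: spare capacity below threshold\n");
    return 0;
}
```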
Add on top of that
sensitivity to temperature. I was looking
through some statistics here and things like in order to hit a 52 week guaranteed persistence,
you had to keep your temperatures down to 40 degrees centigrade. That's pretty cold, actually.
Most data centers, they tell the DRAMs,
you have to be prepared to operate at 95 degrees centigrade.
And you look at where 95 degrees centigrade is on that chart,
well, guess what?
You're not even going to find it.
So this is really serious stuff.
And again, you guys live and breathe and eat this stuff.
But now let's try to apply that to where we really want to be,
that holy grail. Take a second and think about the DRAM protocol, because that's
where we want to be. If we want to replace the DRAM, we really need to live in the DRAM
world and offer the same level of performance as a DRAM with a non-volatile memory in order
to deliver to the industry what it really needs
to affect this next evolution of systems technology. And the DRAM interface is fully
deterministic. You issue a read or write command, and that data had better be there 15 nanoseconds
later or things start breaking. So now if you tried to introduce a device that had to go off and do wear leveling,
what's happening?
Well, what's happening is that at the time that the memory controller thinks that the data is ready,
you're not there.
So what this says is that you cannot have endurance limits
on a device that are visible to the memory controller.
Otherwise, you are not a memory class storage.
So memory class storage needs to be the full speed of a DRAM with no endurance limits
and fully deterministic 100% of the time.
So now I'm going to start throwing a bunch of terms out because they're
all kind of related. What's the relationship between an NVRAM and a memory class storage?
Well, today there's no difference. So the specification that I'm going to be introducing
you to in a couple of minutes here, the NVRAM standard is memory class storage for now. However,
we also know that once you guys get used to the idea of having persistent memory on the channel
and it's not a DRAM anymore, we're going to start changing the protocols. We're essentially going
to take over the industry and start determining protocols that are friendly to these new non-volatile memories.
So speaking of personal experience,
our technology, the carbon nanotube technology,
is built using a cross-point architecture
with 64 kilobits in a cross-point supplied a bit at a time.
When we take that DRAM protocol that has bank groups, bank addresses,
rows, columns, chip IDs, all of that stuff, all those address bits,
those are completely artificial to us.
So we anticipate that in the future, as we get this predominant in the industry,
what we're going to be doing is introducing new protocols to allow you to do some new cool
stuff. Like if you're making an artificial intelligence engine and
you don't care about banks and bank groups,
we can add to the protocol new ways to get
at that data much more quickly. And that's going to be
for the next generation of controllers and memories.
But, you know, of course we have to do baby steps.
We have to kick the DRAM out first, and so let's focus on that.
So storage class memory is not a memory class storage device.
And graphically shown, this is basically showing you that you have Flash today being dominant
for the storage.
You have these phase change memories and the 3D cross points and all that.
They're coming into the wasteland as a storage class memory.
But again, none of them can replace DRAM, because they're all slower and they all have endurance limits.
So a memory class storage device must come in at that level here. A memory class storage
device must have the full performance of a DRAM or better. And it has to have DRAM equivalent
endurance. In other words, unlimited endurance. And again, to replace DRAM, it has to have the same capacity or higher than a DRAM.
So what is that holy grail looking like?
Well, it has to have full speed, non-volatility, the unlimited write endurance,
wide temperature range is going to be a nice plus, scalability.
It has to be capable of being fabricated anywhere,
and it has to be in the power envelopes and the cost envelopes you all expect.
If we can achieve that, if I can show you that this is possible,
what you have is a drop-in replacement for a DRAM,
even at the module level, that your system will operate as before,
but with the added benefit of data persistence.
So what are these technologies?
Well, like I said, Nantero NRAM is one of the technologies
that I've brought into this new standardization effort.
But I'm also working with the other guys.
As a chairman of JEDEC, I have guys working with me
that are bringing in phase change memories and magnetic memories
and resistive memories into
this specification. Now the current generation devices can't do that, but they all have stuff
in process. And what I'm trying to do is to enable the industry with a single standard that you
controller guys can design to and get multi-source technologies for that. And then what we'll do is beat the snot out of each other in the marketplace
based on price and performance.
So it's based on the DDR5 SDRAM specification,
which I don't know if any of you guys are planning on going to the JEDEC training in November,
but this is essentially what they'll be teaching at the JEDEC event,
is this new DDR5 SDRAM.
What I'm writing in the non-volatile committee is an addendum to the DDR5 standard called the
DDR5 non-volatile random access memory addendum to JESD79-5.
And what this says is a DDR5 NVRAM is just like a DRAM,
but in addition to that,
no refresh is required.
Self-refresh can be a true power-off
because there are no cells to refresh.
So you can actually turn power-off
completely to the device, which you cannot
do to a DRAM today in self-refresh.
Some of the timings
are going to be different
just because you can't expect
a resistive RAM or a magnetic RAM
to have exactly the same
RAS-to-CAS timings and so forth.
But they all need to be deterministic
and they all need to be within the same scope.
Again, the focus is that you guys
can design one memory controller
that can talk to an SDRAM
and/or an NVRAM.
There are also differences in data persistence.
And what I'm going to do is
I'm going to show you some of these,
but this is driven by customers.
Some customers are okay with data persistence being defined as
if power fails, just make sure the data is consistent.
Other guys in higher reliability environments say,
I need to know exactly when that data is committed to the non-volatile array.
And then some people are in between.
So we're allowing for variations in the market
to drive us all as suppliers to the market.
And then there's another problem
that I don't know if you guys were aware,
but DRAM kind of dies out at 32 gigabit.
Were you aware of that?
When you read the DDR5 spec,
you're going to see that the DDR5 specification stops at 32 gigabit.
And if you try to get a DRAM supplier to give you a commitment today for a 128 gigabit monolithic chip,
you're universally going to get a no right now.
And the NVRAMs don't have this limit because it's built on a whole new technology.
So one of the things we needed to address was what happens if you want to go beyond 32 gigabits per chip?
The other thing is that these technologies are not identical.
My NRAM is not exactly the same as a spin transfer device, or I guess they're spin technologies now.
They're not identical.
So there are going to be some subtle differences, and that's the point of a specification, is
to allow those differences in a standardized way so that you guys that design controllers
can look at that and figure out what the differences are in the timings, things like requirements
for pre-charge, things like what are the available persistence definitions.
But they are all going to be in a common spec.
So you have this one common core of features
and then a few warts on that
to describe the slight differences between the technologies.
How do you determine this?
You guys know about the SPD, serial presence detect?
It's a configuration EEPROM that's on every standard memory module, and it holds the
configuration parameters for a device. So those profile 1, 2, 3, 4 features can be expressed in the SPD saying, this technology is the NRAM,
and here is its RAS-to-CAS delay, here is its write recovery time, and so forth.
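As a hypothetical sketch of what such an SPD profile record might carry; the field names and encodings here are invented for illustration and are not the actual JEDEC SPD layout.

```c
#include <stdint.h>

/* Hypothetical SPD-style profile for an NVRAM module. Field names
 * and encodings are illustrative only, not the real JEDEC layout. */
struct nvram_spd_profile {
    uint8_t media_type;      /* e.g. 1 = NRAM, 2 = MRAM, 3 = PCM, 4 = ReRAM */
    uint8_t persistence;     /* 1 = intrinsic, 2 = flush-based, 3 = on power fail */
    uint8_t trcd_ns;         /* RAS-to-CAS delay, e.g. 23 instead of 15 */
    uint8_t taa_ns;          /* read access time, e.g. 12 instead of 15 */
    uint8_t twr_ns;          /* write recovery time */
    uint8_t needs_precharge; /* does this media still require PRECHARGE? */
};
```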
So let's look at some of these differences. Right now, with a DRAM, what do you have to do?
Well, you're running along merrily, and 3.9 microseconds goes by,
and all of a sudden you say,
I had to go to sleep for 350 nanoseconds to do a refresh cycle.
When you have a persistent core, that no longer is necessary.
So that 350 nanoseconds, which incidentally is 15% of your data bandwidth,
goes away. But if you want an NVRAM
to plug into a DRAM socket invisibly, you have to accept and decode the refresh command and then
just no-op it. But it has to be a part of the command protocol. And again, that's the purpose
of having an NVRAM specification is that you guys know, yeah, you can issue the refresh command
and on the next clock you can do some real work.
Or you can just drop the refresh command completely.
So what else can you do?
Well, what about self-refresh mode?
Well, self-refresh mode is an interesting one
because it actually has two functions.
One of the functions of the self-refresh command is that's how you put this device into a lower-power state, where it just periodically goes in and internally refreshes its content so
that you can come back up and keep running. But the second function of the self refresh command
is that's how you allow people to change frequency.
Because a side effect of going into self-refresh mode
is it resets the DLL inside the device
to resynchronize to the new frequency.
We need one of those functions for the NVRAM.
We want you to be able to change operation frequency,
but we don't need that refresh operation.
So that means we're not burning any power
while we're in self-refresh mode.
You can still go into self-refresh,
we just shut everything off
and then wait for you to come and exit back out.
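Here is a sketch of how an NVRAM might decode those two commands for socket compatibility. The command names are placeholders, not the actual DDR5 encodings.

```c
/* Sketch: an NVRAM decoding DDR5-era commands. It must still accept
 * REFRESH and SELF-REFRESH for socket compatibility, but with no
 * cells to refresh it can repurpose or no-op them. */
typedef enum { CMD_READ, CMD_WRITE, CMD_REFRESH, CMD_SELF_REFRESH_ENTRY } cmd_t;

static void nvram_decode(cmd_t cmd)
{
    switch (cmd) {
    case CMD_REFRESH:
        /* Persistent core: accept the command, do nothing, and be
         * ready for real work on the next clock. */
        break;
    case CMD_SELF_REFRESH_ENTRY:
        /* Keep only the second function of self-refresh: permit a
         * frequency change (DLL resync on exit); otherwise power
         * down completely, since nothing needs refreshing. */
        break;
    case CMD_READ:
    case CMD_WRITE:
        /* Serviced directly; see the activate/precharge discussion. */
        break;
    }
}
```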
DRAM also has an interesting characteristic
in that the way a DRAM works is
you have two
functions, activate and pre-charge.
The activate function
takes the content of the core,
brings it out to
sense amplifiers that you can
then interact with.
However, they also need a pre-charge
operation to restore
the array.
Well, why?
It's because they have a destructive activation.
When you read the contents of a memory cell, it discharges the capacitor.
So you have to restore the content.
You don't need that with a non-volatile device.
And so what ends up happening is instead of having this complex
thing about activates and reads
and writes and precharges and then reactivates,
now you have a whole new model
which is from the idle
state, you can directly do reads
and writes.
There's no concept of activation and precharge
anymore.
It becomes the ideal load
store memory.
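Here is a schematic comparison of the two access models; this is an illustration of the protocol difference, not real controller code.

```c
typedef enum { BANK_IDLE, BANK_ACTIVE } bank_state_t;

/* DRAM model: destructive reads force activate/precharge bracketing. */
static void dram_access(bank_state_t *st)
{
    if (*st == BANK_IDLE)
        *st = BANK_ACTIVE;  /* ACTIVATE: row copied to sense amps     */
    /* READ/WRITE against the sense amplifiers here ...               */
    *st = BANK_IDLE;        /* PRECHARGE: restore the row to the core */
}

/* NVRAM model: non-destructive core, so from idle you just read or
 * write directly. No activate, no precharge; ideal load/store memory. */
static void nvram_access(void)
{
    /* READ/WRITE directly against the persistent array. */
}
```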
So architecturally speaking, I think you're starting to get the idea that there's a whole lot of cool new stuff that you can do with these
devices now that these new functions are coming online. And that's why I'm talking about the
evolution of the NVRAM standard so that in the future we can go to protocols that take advantage of some of these
new features. The persistence definitions, I talked about these a little bit. The three
definitions that we have so far are the intrinsic, which means immediately on every write content is
committed to the internal array. There's an extrinsic version where it requires a flush command,
which came from the NVDIMM-P protocol.
And then the third one,
like I alluded to earlier,
on power fail, you save it away.
So this is what those look like.
The intrinsic persistence,
which, by the way,
is what Nantero NRAM does,
says that every time you do a write,
within 46 nanoseconds,
that data is committed to the internal non-volatile array,
and power can fail,
and the most you can lose is 46 nanoseconds of data.
The extrinsic guys say,
we're going to do a bunch of buffering on chip
for various reasons,
and then what we're going to require
is that periodically you issue a flush command that will then
take the internal
buffers and commit them to the
non-volatile arrays.
For those guys, your
data persistence is measured from
so many nanoseconds after
the last flush command
has been issued.
If you're writing applications, you need
to know these differences,
because you need to know when you can free up your requesting application to go off of the active queue
and go on to reserve. And then finally, this is kind of like the NVDIMM-N. This particular
persistence model says that just go ahead and let the device worry about this and only worry about power fail. That's good enough.
And on power fail, it'll go and it flushes its contents.
So you only need to keep the power to the device
for a few microseconds after the power fail occurs.
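Here is a sketch of how application code might reason about the three persistence profiles. The 46-nanosecond figure is the speaker's NRAM number; the rest of the interface is hypothetical.

```c
#include <stdbool.h>

/* The three persistence definitions in the draft NVRAM addendum. */
enum persistence_profile {
    PERSIST_INTRINSIC,  /* committed within a fixed window per write */
    PERSIST_EXTRINSIC,  /* committed only after a FLUSH command      */
    PERSIST_POWER_FAIL  /* device self-saves when power fails        */
};

/* Hypothetical helper: may the application consider its last write
 * durable, and release the requesting task from the active queue? */
static bool write_is_durable(enum persistence_profile p,
                             long ns_since_last_write,
                             bool flush_completed)
{
    switch (p) {
    case PERSIST_INTRINSIC:
        return ns_since_last_write >= 46;  /* speaker's NRAM window */
    case PERSIST_EXTRINSIC:
        return flush_completed;            /* NVDIMM-P-style flush  */
    case PERSIST_POWER_FAIL:
        return true;  /* device handles it; hold power a few microseconds */
    }
    return false;
}
```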
So I talked about this limitation of DDR5 SDRAM to 32 gigabits.
That's not very far down the road.
And if I were a systems guy, this would worry me quite a bit.
We're changing the protocol.
We're enhancing the DDR5 protocol to enable up to 128 terabits per device.
And the way we're doing that
is we're taking the standard DDR5 protocol
which looks like this right now.
You do an activate, you do reads and writes,
you do another activate, do a bunch of reads and writes,
and that's how you access
your data.
What we're doing is adding a row extension
command to the DDR5
protocol. So it's a superset
of the DDR5 protocol. It's an optional
thing until you need to address more than 32 gigabits per chip. And what this does is the
row extension adds another 12 bits of addressing. So how does that really work? So you think about
how the DDR5 devices are organized today. You have a row that represents a bank buffer.
You have 32 banks, and then you have a column selector across that.
With the DDR5 NVRAM, it's exactly the same thing.
You still have 32 banks.
You still have a row association,
but now you have the row extension bits that
are being added to the association to those 32 bank buffers.
So here's a sample command sequence.
You see a bunch of activates, you see row extensions scattered in there, you see some
reads and writes, and you can think through some examples of this.
So if you do a row extension A,
that's going to latch some high order address bits.
So the next time when you do an activation,
so say you activate bank W with a specific row value,
now that row value gets added to the row extension bits. And when you
do a read from bank W,
that is going to address
the bank buffer
that is addressed by
the combination of
the extension bits and the activate bits.
Yeah?
Is it additive
or is it concatenated?
It's concatenated.
You don't need an add?
Right.
Just concatenated.
So that's an example of row extension.
Now if you want to replace that row,
now let's say we issue another row extension command
that changes the latch for this.
By the way, Nathan,
I don't want to pick on you
just because you're old,
but you're old enough
to remember expanded memory?
That's where I got this idea.
I stole this from the old
expanded memory spec
because I knew the patents
had run out.
Okay.
Bad ideas never die.
Yeah.
And so now you do the row extension,
and then you do another activation.
So you're replacing bank W with a new activation.
So now the combination of the new row extension bits
plus the row information from the activation command
is consistent.
So now the page buffer is represented by that new address combination.
And now when you do the read, you get the data from the new one.
So it works just like a DRAM,
but you have this little latch that adds some extended address bits.
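Since the answer above is concatenation, the effective row address is just the latched extension bits placed above the activate's row bits. A minimal sketch; the 12-bit width is from the talk, while the row-bit count is a placeholder.

```c
#include <stdint.h>

/* Latched by the ROW EXTENSION command: 12 extra high-order bits. */
static uint32_t row_ext_latch;

static void cmd_row_extension(uint32_t ext)
{
    row_ext_latch = ext & 0xFFF;  /* 12 bits, per the talk */
}

/* On ACTIVATE, the effective row is a pure concatenation: extension
 * bits above the row bits, no arithmetic add involved. ROW_BITS is
 * a placeholder width, not the actual DDR5 row address size. */
#define ROW_BITS 18

static uint64_t effective_row(uint32_t activate_row)
{
    return ((uint64_t)row_ext_latch << ROW_BITS)
         | (activate_row & ((1u << ROW_BITS) - 1));
}
```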
So now a little pitch on what we work on.
So our carbon nanotube memory,
it's implemented as a cross point.
And then what we're going to do
is we're going to take these cross points
of carbon nanotube memory cells
and we're going to put a DDR4 or DDR5 interface in front of it.
And what we're doing is translating the
DDR protocol to our internal structure.
So, I'd like to say
we're hitting every one of those check marks
along the way. We're
expecting to be at
512 gigabits per die in the
DDR5 generation. We're
cheaper than a DRAM. We work at any temperature
extreme you can imagine.
And I have this funny little story
about the 12,000 years of data retention.
And I was meeting with guys
from the office of the director of national intelligence
a couple of weeks ago.
And I threw a slide up that looked a lot like this.
This guy from the U.S. government,
who, by the way, never introduced himself.
I have no idea who this guy is.
He's just in a suit, in the meeting,
and I was told not to ask
if somebody doesn't offer who they are.
This guy says,
well, our data shows that carbon nanotube connections
should last a million years.
Why are you only saying 12,000 years?
I had to admit, the guy had me stumped.
I'm not usually at a loss for words, but he got me.
So, is this an evolution or is this a revolution?
Let's think about the old paradigms.
So, in fact, those of you that have been in the hackathon this week,
you've seen a whole lot of this stuff, right?
That you have these applications that are running along
doing load stores to local memory,
and the important thing is that that task that's running
stays on the active task list.
When you have to go to the file system to do a block transfer, like a checkpoint,
you're no longer on the active list.
You have to trap to the operating system
before you can then take this block of data
and throw it down the channel to the storage device.
That context switch is what's killing our performance,
and it's also the argument as to why we're not seeing the performance numbers from the storage class memories
that you would expect from the raw throughput rate differences between PCIe
and the DRAM channel.
We're not seeing those huge
improvements in performance
because we're losing
stuff by trapping to the OS and
taking your task off of the active
task list. You have to then
satisfy your I/O before putting
the task back on the active list.
That's what's killing us.
And that's what I think we're going to be able to solve.
The great thing is that I actually don't want to diss the guys that are proposing DAX
because DAX is the way we're going to migrate to these new concepts.
DAX gives you that option to still trap
to the operating system, still take that
performance hit, but at least
operate at DRAM speeds by using
a memory move instruction
instead of some
I/O fread, fwrite.
So this is important.
DAX mode does do a lot of the right things
because now what we can do is we can take
our existing applications that are doing
all this checkpointing,
we can mount the storage class memory devices
locally, or you can put this new memory class storage in there.
And now you can do checkpointing at DRAM speeds,
but eventually you can eliminate the checkpointing completely.
Once you're convinced that our technology is solid enough
that on power fail you're not going to lose your data,
we can start to eliminate the checkpointing altogether.
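Here is a minimal sketch of that DAX-style direct access on Linux: map a file from a DAX-mounted filesystem and checkpoint with plain stores instead of fread/fwrite. The mount point is hypothetical, and the explicit flush remains necessary until you trust the media's intrinsic persistence.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* Hypothetical file on a DAX-mounted filesystem backed by
     * persistent memory; loads and stores go straight to the media. */
    int fd = open("/mnt/pmem/ckpt", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return 1;
    if (ftruncate(fd, 4096) != 0)
        return 1;

    uint64_t *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    p[0] = 42;  /* checkpoint with a plain store: no trap, no fread/fwrite */

    /* Until the media's intrinsic persistence is trusted, flush
     * explicitly; with memory class storage this step can go away. */
    msync(p, 4096, MS_SYNC);

    munmap(p, 4096);
    close(fd);
    return 0;
}
```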
And that, I'm going to argue, is the revolutionary home run.
So where does that lead?
Well, you know, we look at this storage architecture that we've been making,
but now, all of a sudden, we can imagine that with memory class storage,
we can envision that if the application fits in main memory
and you don't need to do checkpointing,
all you need is a network connection.
So this, again, is that revolutionary idea
that we can move towards completely storage-free systems.
And so I'm going to argue that I think I've made the point
that we're looking at a revolutionary change,
not just an evolutionary change in systems architecture
because it's beyond just the memory impact, but now it is affecting our entire systems architectures.
So we're looking back at that data persistence spectrum, and we started off with tape and
moved to hard drives, SSDs, and NVMe drives.
Storage class memory is a step in the right direction.
But there is memory class storage coming into the picture.
And that's what I'd like to introduce to you guys today.
But I also want to point out, we're not stopping there.
We're still looking at the cache and the registers also.
And when we can speed up the non-volatile memory technologies
to replace that, I think we'll have found the holy grail.
And it's not just main memory for servers.
This technology has a home in pretty much everything you can imagine.
It has homes in NVMe storage.
You still want to have lots and lots of cheap bits,
and we're not expecting flash to go away anytime soon.
But what you can imagine is a memory class storage device
attached to an NVMe controller
that has the flash memory, but now it no longer
has to worry about power fail in the NVMe either.
You can eliminate those. I don't know if you've looked
at an NVMe design lately, but about a quarter
of the PCB space is consumed by tantalum capacitors
that provide the energy necessary
to ensure data persistence on that link.
Another one is the AI and deep learning architectures.
These architectures are inhibited today
by the fact that they must use von Neumann architectures
so that you can do checkpointing quickly in and out
because the backpropagation of the AI architectures
means you have a lot of valuable data
that's in those AI devices,
and you have to spoon it out sometimes.
With memory class storage,
they can just leave the data there.
Backpropagation no longer is a punishment.
You just leave the data there as long as you want,
and now your checkpointing can be more along the order of hours or days
instead of every few milliseconds.
Finally, CXL.
I don't know about you guys,
but I think that CXL is a pretty important vector for the industry,
and it's very likely going to be the next interconnect standard for not just CPUs, but application accelerators and storage devices.
And this might be the opportunity for that NVRAM standard to start breaking free of the DRAM standard. Because you can imagine
a CXL interface on one side
going directly to
addressing our non-volatile memory
tiles on the other side.
And no longer having to go through
a translation layer of
DDR5 compatibility.
So it's pretty damn exciting.
I think the wave's coming.
And
so I want to highlight that what we're looking at here
is we're looking at how for decades we've let power fail mechanisms
drive our architectures,
and we've come up with this system of checkpointing
to deal with all of these errors.
We balance stuff in our memory tiers to put stuff in the right places in there.
But now we have
a new way of moving
persistence a lot
closer to the processor.
And this DDR5 NVRAM
standard that I'm driving in JEDEC
is going to be the way that you guys
see documented how to talk to
these new classes of devices
so that we can evolve applications to exploit memory class storage.
And with that, I'd like to thank you for your time
and open the floor to some questions.
Nathan, you don't get the first question all the time.
Okay.
I have a bunch, so you may want to...
Okay, well, you get one.
Okay.
Pick one.
A while back, you were talking about consistency of latency in the spec,
which presumably takes managed non-volatile storage and makes it so that they don't meet the spec.
But then, a little bit after that, you said,
but the latency of the spec
might actually be worse than the latency
of traditional DRAM.
So the question is...
Or better. Right.
But the question is, if a
storage class memory can
guarantee a worst-case
latency,
does that meet the spec then?
Yeah, and I've gone through this
in other versions of my talk
where I spend more time on Nantero NRAM.
So for example, what I can tell you
for Nantero NRAM,
because of that data persistence thing
that I told you about
where we guarantee data persistence
within 46 nanoseconds,
we extended RAS-to-CAS from 15 to 23 nanoseconds.
However, we took
access time from
15 nanoseconds and dropped it
down to 12.
So we made the trade-off to make
RAS-to-CAS a little longer and
access time for read and write a little
shorter.
So the overall number comes out pretty close to the same,
but it's just we juggled it differently,
and it was because of the customer that wanted to know
when they could free up their real-time application
and take it off of the active task list.
So we made that trade-off.
Now, that will be documented in the SPD
that our tRCD is 23 instead of 15,
and our tAA is 12 instead of 15.
Does that 46 nanosecond window create something that the application has to understand that is there?
Is there a window where there's yet another data consistency error window that you still have to manage?
It doesn't go away.
They have to understand our part of it
because what I'm documenting is a chip.
So the transom.
Five minutes?
Okay.
So bringing the data into the balls of our chip
is what we measure from in terms of writing a spec.
What they have to comprehend is that they might have to go through data buffers at the edge of the card,
flight time of 12 inches of trace on the motherboard.
Then on their processor, what's their L1, L2, L3 strategy?
And where do they have their fences in their cache coherency inside their processor?
They have to understand all of the stuff that leads up to the balls of our chip.
We simply document the final mile.
But they have to do all the other math.
But once they know that math, what they can do is say,
this process cannot come off of the active process list
until it knows that the data is persistent for the next 12,000 years or a million years.
Then I can release it off of the list.
And if that number is 150 nanoseconds total for all those other factors, so be it.
150 nanoseconds after it's been issued,
if no other system fault is detected,
it can go off the active list.
Now it's Nathan's turn.
Yesterday, in this very room,
the memory guy over there,
yeah, you.
So what is it that you're talking about when you're doing NVRAMs, to be able to avoid that wear leveling?
So what are some of the tricks that you could pull
if you have an endurance limit?
Okay, I've worked with some of those companies.
Spin Transfer, for example, or now they're called Spin.
They have a wear-out.
So one of the ways you can do that is you can put poison on the bus and do a retry.
Or you can put a worst-case number in that always allows them to do their homework.
So they could do a number of things.
They could do, say, a write recovery time of 60 nanoseconds.
They could do some other tricks to give themselves time to do their homework.
Another trick that a different supplier is using,
they're going to require the refresh command.
But they have no cells to refresh.
What are they doing with the refresh command?
They're using that 350 nanoseconds to go do their wear leveling and cleanup.
Another guy, precharge command.
He's going to require the precharge command.
Why?
There's nothing to precharge.
He's going to use the precharge command to do his internal housekeeping.
So all of these guys are going to use different tricks to try to fit into the DDR5 protocol.
And do I feel great about enabling my competitors?
No.
But I recognize that in order for me to create a mass market of 100 billion units,
I have to accept all of these architectures in the spec and
allow them all their
ways to fit into that
to allow systems to evolve.
Any other
questions?
Anything?
Jim?
You get the internal data.
I'll make one final comment.
Sure.
User to kernel transition, etc.
My experience is that
most of the overhead is not the transition.
It's just
the code in the file system.
And most of that overhead
is latency of the file system
talking to memory.
Okay.
It's a transform
of: I have something that the user
wants, a name. Yeah.
I want to make that name into an address.
And that's not an easy thing to do.
Yeah.
I'm out of time.
But the quick answer to that is that that's a little different than what I'm getting from the customers I talk to
who say that if that were true, they would see a 1,000 to 1 improvement of a storage class memory module
versus a PCIe-based module, and that's not what they're seeing.
So that's the short answer,
is that that's different than the data I get.
So thank you for your time,
and let's hand it over to the next speaker.
Thanks for listening.
If you have questions about the material
presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further with your peers in the storage developer community.
For additional information about the Storage Developer Conference, visit www.storagedeveloper.org.