Storage Developer Conference - #176: Persistent Memories Without Optane, Where Would We Be?
Episode Date: November 10, 2022
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, episode number 176.
I'm going to talk today on persistent memory.
Without Optane, where would we be?
And first of all, I want to apologize for Jim Handy, my colleague, with whom I've done a report
(you can see some flyers on some of the tables here); he wasn't able to come today.
Emerging memories have gotten a couple of big boosts over the past few years.
One came in the form of Intel's Optane products,
and the other from the migration of CMOS logic to nodes that NOR flash, and now SRAM, look like
they can't support. Although these appear to be two very different spheres, a lot of the work
undertaken to support Intel's Optane products, also known as 3D XPoint, will lead
to improved use of persistent memories
on processors of all kinds.
In this presentation, we're going to review emerging memory technologies and their roles
in replacing other on-chip memories; the developments through SNIA and other organizations that
were fostered by Optane memory but are usable in other areas of computing; the emergence of
new near and far memory paradigms (in fact, yesterday I saw a medium paradigm
mentioned as well, so near, medium, and far memory) that have spawned
interface protocols like CXL and OMI; and the emergence of chiplets and their
potential role in the evolution of persistent
processor caches. So we're going to go into some of these things here, and here's an outline of
what I'm going to address. We'll talk, first of all, about the rise and fall of Intel's Optane,
the legacy we have because of the development work that was done with Optane.
We'll look at some ideas, concepts about future chips and persistence.
We'll talk a bit about some of these memory technologies.
Won't do the nanotubes today, Bill. I'm sorry about that.
How future processors will benefit.
Then finally a summary, and we'll open it up for some Q&A if we've got time.
So first let's talk about the rise and fall of Intel's Optane.
So it died, but why?
We're going to have a historical recap, actually from prehistory, 1970 to 2022.
We'll explain some of the economies of scale and the role Intel's losses played in its decision. And this is kind of a cautionary tale for anybody who wants to make a
standalone memory technology that competes against the big boys like DRAM and NAND flash.
And Optane, of course, is Intel's name for 3D XPoint memory; I think I mentioned that
before. And 3D XPoint memory is just another kind of phase change memory.
Intel actually pushed phase change memory for most of the company's life.
This magazine cover shows a 1970 article that Gordon Moore co-authored on the first phase change memory that Intel was introducing.
The company finally introduced a commercial phase change memory in 2007, when it appeared that NOR flash would not scale past 65 nanometers.
Samsung introduced a competing product at that time, but both products were later discontinued,
so they didn't really catch on. Finally, in 2015, Intel and Micron together introduced 3D XPoint memory, which is the basis of Intel's Optane product line. It was touted as a new layer
in the memory and storage hierarchy to fill the gap
between NAND SSDs and DRAM. So let's take a look at this chart. Jim actually came up with this: a log-log plot for looking at the memory and storage hierarchy,
different from the typical pyramid diagram that people use to explain the hierarchy.
In the pyramid, the width at each level shows how much capacity that layer tends to store;
the farther down the pyramid a layer sits, the cheaper it is,
and up the pyramid you get less capacity but higher cost.
This chart looks at it in a different way. It plots the speed of the various memory and storage layers,
all the way from tape to the CPU's L1 cache,
against price per gigabyte, on a log-log plot.
So it's price per gigabyte versus performance,
the data-rate performance.
In any other format, if you didn't do log-log, you'd simply have a huge L1 orb,
with everything else crowded down into the lower left corner, where you couldn't see anything.
Log-log blows it out so you can get a look at it.
And this is how computers are designed today.
They use multiple layers, and each layer is faster than the next cheaper layer
and cheaper than the next faster layer. Optane was introduced to fill the growing gap between
NAND SSDs and DRAM. So when 3D XPoint was introduced, it appeared to fit this hierarchy
well. It checked off a number of boxes which seemed like they were important. It's a thousand
times faster than NAND flash, so it would be faster than NAND flash and slower than DRAM.
That seems to fill a box, so check, right? It's one-tenth the die size of DRAM, so you would
intuitively suppose that it could be built for one-tenth or so of DRAM's cost. That one seems to
be checked as well, right? It also has a thousand times NAND's endurance, which is important since memory gets written many more
times than storage is. Give it a check in the reliability box for that as well. So from all this,
it seems clear that it would fit well into this memory storage hierarchy and fill this void,
you know, this potential void between DRAM and NAND flash.
There are important things to think about here, though,
and this is where the lessons are.
Despite the fact that the die was one-tenth the size of DRAM's,
Intel never could get the cost as low as DRAM's,
not even close,
in actual production cost.
That's because economies of scale
are a key part of a chip's cost.
Look at it this way: if you made only one chip, what would it cost?
Probably tens of millions of dollars, to get all the equipment up and running to make that one chip, right?
If you made only one wafer of chips, a chip might cost about a hundred thousand dollars, amortizing over the
scale of things. If you made ten thousand wafers per month, as Intel and Micron were doing,
each chip might cost about $80. It's only by making 300,000 to 400,000 wafers per month
that the cost would become competitive with DRAM, at $40 or so. And that was the problem.
They never reached that scale. High volumes drive down production costs. So how do you get a product to sell in high volume?
You sell it for a low price so that people start using it and find it attractive.
But the production cost won't actually drop to the point where you can sell it for more than it costs to make until you get to high volume.
It's a chicken-and-egg problem.
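(To make the shape of that arithmetic visible, here's a toy amortization model in C. The fixed-cost, per-chip, and chips-per-wafer figures are hypothetical, chosen only so the output lands near the talk's rough $80 and $40 numbers, not real fab economics.)

```c
#include <stdio.h>

/* Toy amortization model: per-chip cost is the variable cost plus the
 * fab's fixed cost spread over every chip produced that month. All
 * figures below are hypothetical and illustrative only. */
int main(void)
{
    const double fixed_monthly   = 200e6; /* hypothetical fab fixed cost, $/month */
    const double variable_chip   = 39.0;  /* hypothetical per-chip materials, $   */
    const int    chips_per_wafer = 500;   /* hypothetical                         */
    const int    volumes[]       = { 1, 10000, 400000 }; /* wafers per month     */

    for (int i = 0; i < 3; i++) {
        double chips = (double)volumes[i] * chips_per_wafer;
        /* ~1 wafer: hundreds of thousands per chip; 10,000 wafers: ~$80;
         * 400,000 wafers: ~$40, roughly the talk's figures. */
        printf("%6d wafers/month -> ~$%.0f per chip\n",
               volumes[i], variable_chip + fixed_monthly / chips);
    }
    return 0;
}
```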
So Intel knew this, and they actually made a conscious decision to lose money
to drive down the price, hoping that, with a lower-cost product,
they could build the market, find applications,
and scale up their demand and manufacturing volume.
So let's look at NAND flash as an example, though,
of what happened for a successful standalone memory,
one that was able to get below the cost of another memory, in this case DRAM. So let's take a look at this.
What this slide shows is that single-level cell NAND, which back in the early days of NAND
was all you could get, and which is its most expensive kind, has always had a die size about half as large as a DRAM's
when both are made in the same process geometry
and both have the same number of bits on them. These boxes are a little complicated,
because I have to compare the die size of one 8-gigabit NAND against two 2-gigabit DRAMs, and so on, so bear with me here. At 54 nanometers,
a 4-gigabit NAND is about the size of a 2-gigabit DRAM. At 50 nanometers, an 8-gigabit NAND
is about the size of two 2-gigabit DRAMs. At 48 nanometers, an 8-gigabit NAND is about the size of four 1-gigabit DRAMs.
At 44 nanometers, an 8-gigabit NAND is about the size of two 2-gigabit DRAMs. The big point here is
that die sizes aren't everything. Even with that advantage, in the early days,
NAND flash was a lot more expensive than DRAM.
And this is actually a historical chart of NAND and DRAM price per gigabyte, covering 2000 to 2014.
The NAND line is dotted in the early days because these are WSTS figures,
which are sort of the gold standard for looking at what's happening in the industry,
and WSTS didn't report NAND prices prior to 2004.
That said, the numbers Jim had from his time at Dataquest and Semico support this dotted line.
So there was a clear trend in the early days:
the price of NAND flash was going down at roughly the slope shown there,
with some oscillations on top.
So this is a semi-log plot,
and that keeps the data from looking like a hockey stick that drops to near zero at the left
and then hugs the x-axis for the rest of the chart.
Again, with log scales we can see these exponential-type functions much better
and get a lot more information out of them.
Also, on a semi-log plot,
constant growth shows up as a straight line.
The general trend of these lines
is the price reduction you'd expect from Moore's Law,
but NAND flash has actually always moved faster than Moore's Law.
It has been scaling very fast,
and that of course accelerates the capacity increases
and the price declines.
So the circle at 2004 is where NAND flash prices crossed below DRAM's.
Suddenly, NAND flash found a place in the memory and storage hierarchy,
because it could start to be cheaper than DRAM.
It now made sense to have SSDs in computers; around 2004,
you started to have the economics that made that possible.
Before that, NAND was used for USB flash drives, MP3 players, cameras,
SD cards, things of that sort:
lightweight, low-capacity storage, mostly.
So this slide shows how many terabytes of NAND shipped during the same period, compared to DRAM.
It's also a semi-log plot, again for the same reason, so we can see the scale:
there's no hockey stick, and steady growth again appears as a straight line.
This circle is drawn at the same point as in the prior chart, 2004, when the price crossover happened.
At that time, NAND's terabyte shipments were one-third as large as DRAM's.
This tells us that the economies of scale for flash were finally allowing this chip,
which is half as big as a DRAM,
to finally match DRAM's cost.
Yes, Steve.
So here, this is the quarterly terabyte shipments.
The black line is NAND flash,
and the red is DRAM.
So you see there,
in 2004, you had the price crossover.
By roughly 2006, we were starting to ship more capacity
in NAND flash than in DRAM.
So again, this was a successful memory that was able to move into more applications because it reached those
economies of scale. That crossover point translates into about one-tenth of DRAM's total wafer production.
The same would have been true for Optane: to get to cost parity with DRAM, its wafer production volume
would have to be one-tenth that of DRAM, or about 30 to 40 times the volume
it actually reached. There's another example of this in the struggle 3D NAND had in
becoming viable versus planar NAND, which was around before 3D NAND started to appear in about 2013.
3D NAND took about three years to become cost-competitive with planar NAND; before that, it didn't have the volume
to get its cost down below that of the earlier planar process.
So by our estimates, based in part on Intel's reported financials,
Optane's petabyte shipments approached 1/30th of DRAM's petabyte shipments.
Again, the technology was trying to get its cost below,
you know, Optane wanted to be below the cost of DRAM. And our 3D XPoint forecast has always
been unabashedly optimistic, so the gap is probably wider than that, in fact. Even so, we don't see
Optane's volume allowing it to reach cost parity with DRAM until about 2028. Our guess is that
Intel didn't want to subsidize the technology for that long. They reached the limit of what
they were willing to do. Plus, they had a bad quarter. So here's a timeline of key 3D XPoint
events; Intel's actions appear in the upper half and Micron's in the lower half. Around 2015, there was a big
announcement of 3D XPoint: an early announcement just before the Flash Memory Summit
that year, and then a big to-do at the Flash Memory Summit and at Intel's Developer Forum about
3D XPoint. In 2016, Intel's partner Micron announced their QuantX products. The first Optane drives were shipped in 2017 by Intel.
And by the end of 2018, the first Optane DIMMs were shipping.
A little later on, the first processor support for Optane DIMMs was announced, in early 2019.
Towards the end of 2019, the first QuantX SSD from Micron was demonstrated.
Then in early 2020, Micron ended the Intel relationship.
They never actually shipped any SSDs in any kind of volume;
they demonstrated things and made announcements, but they never actually shipped.
In 2020, Intel discontinued their consumer Optane SSDs,
which had been popular in gaming and other applications
that liked a lot of speed and a lot of memory.
In early 2021, Micron killed their 3D XPoint effort,
and later in 2021, Micron sold the Lehi fab,
which had been used to make most of the 3D XPoint that Intel used in their Optane products.
Yes?
Do you know what happens to the production equipment?
I do not know what happens to the production equipment.
You know, some of this production equipment would be useful for other things,
so I imagine it's been sold and repurposed for other applications.
Now, Intel did have a facility in New Mexico
where they were also making some quantity of Optane memory,
so they had some capacity to make memory themselves,
but not at the scale of the Lehi facility.
Then in 2022, in July, actually I think it was July 28th
if I remember the date right, just after we finished our report,
yes, Intel announced that Optane was winding down.
So that's the timeline of what happened there,
just to give you an idea of the scale of things.
Again, it's sort of a cautionary tale to give us some idea of where things are going.
Now, from Intel's reported financials, you can estimate their 3D XPoint losses.
When the company sold off its NAND business, it disclosed Optane's losses for the four quarters
of 2020, and those losses closely match the ones we had estimated for that year. This chart is based on the losses in Intel's storage unit,
which included Optane; since other companies were making money selling SSDs,
we took those losses as being mostly attributable to Optane.
The sum of all the losses in this chart is about $6.8 billion.
Since there are losses we didn't capture, we're confident that the company's
total Optane losses were greater than $7 billion, from 2014 all the way through
2022. Now, certain factors that should have increased Optane's volume failed to materialize,
and these were a large part of why it never reached that manufacturing volume.
Optane SSDs were supposed to drive broad usage and ramp volumes hard, but end users
didn't accept the added cost for Optane's performance boost, so that market never developed.
The Optane DIMM, which
should have been the big volume driver, needed special processor support, and that processor
support wasn't offered on Intel's server processors until two generations later than originally
planned. That seriously slowed Optane's ramp, prolonged the losses, and made it easier for Intel to decide to kill it.
So what are the lessons we can learn from this? Expect losses until volume ramps, if you've got
a new technology like this, and be prepared to support them. A small die size doesn't matter
if the manufacturing volume isn't big enough. Losses might be larger than expected, too;
you have to anticipate that in order to reach a low enough price point.
And supporting elements may delay adoption:
things may not happen the way you hoped they would,
or they get delayed, that kind of thing.
Optane's processor support was delayed,
and application software wasn't widely available when this started.
But that also leads into: what did we gain from Optane? What is Optane's legacy
to us? Actually, it's considerable. Let's talk about what Optane has led to.
Optane left a legacy of advancements, and I'm going to address each of these in turn.
It led to new programming paradigms, new CPU instructions, new approaches to two-speed memory, new near-memory
bus concepts, new approaches to large memory, and new thinking about the security concerns of
persistent memory, where the data is still there when the power goes off. The new programming
paradigm is related to SNIA's own NVM programming model, which actually started
before the announcement of 3D XPoint; but 3D XPoint, in the emergence of a commercial
persistent memory, did an awful lot, I think, to stimulate that effort. This model characterizes
various types of access to persistent memory, that is, non-volatile memory: a persistent-memory-aware kernel, a persistent-memory-aware file system, and ultimately direct access to persistent
memory from the file system or directly from an application. One of Optane's key features is
that it brings persistence closer to the processor. This makes it easier to recover from power
failures, since the memory retains its
values. Before this, it wasn't possible to write the entire memory to disk or an SSD as power was
failing; now, all that has to be written is the dirty lines in the cache, and the new CPU instructions take care of that. This means that everything except the registers can be persisted in the event of a power failure.
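(To make that concrete, here's a minimal sketch of the direct-access style the model describes, combined with the new flush instructions. It assumes a Linux system with a DAX-capable file system mounted at /mnt/pmem (the path and file name are hypothetical), a recent glibc for MAP_SYNC, and an x86 CPU with CLWB, compiled with -mclwb. A real program would more likely use a library such as PMDK rather than hand-rolling this.)

```c
#define _GNU_SOURCE          /* for MAP_SYNC / MAP_SHARED_VALIDATE */
#include <fcntl.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CACHE_LINE 64

/* Flush the cache lines backing [addr, addr+len), then fence, so the
 * preceding stores are known to be on the persistent media. */
static void persist(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHE_LINE)
        _mm_clwb((void *)p);   /* write the line back, keep it cached */
    _mm_sfence();              /* order the flushes before continuing */
}

int main(void)
{
    int fd = open("/mnt/pmem/example.dat", O_RDWR);  /* hypothetical file */
    if (fd < 0) return 1;

    /* MAP_SYNC requests true load/store access with no page cache. */
    char *pmem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (pmem == MAP_FAILED) return 1;

    strcpy(pmem, "survives power loss"); /* ordinary stores to the media */
    persist(pmem, 64);         /* only the dirty line has to be flushed */

    munmap(pmem, 4096);
    close(fd);
    return 0;
}
```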
Yes, Steve. That was actually never implemented.
You have to worry about the processor cache.
Until it gets to the device controller, then it's safe.
But before then, it's not.
You have inconsistent...
You've been playing with this.
I've heard from people who say that that first stage is not...
Suppose eADR, which has only been implemented one time,
and eADR is one...
By the way, nobody online can hear you right now.
I'm going to try to paraphrase.
So I think the point that Steve,
one of our designated hecklers, was making is that implementations of saving all the data into persistent memory may have some issues and were often not done consistently.
So anyway, that was his comment, just for those in the audience. Thank you, Steve. Let's see. Okay. So ADR, asynchronous DRAM refresh: it's a strange
term that Intel coined for DRAM that self-powers and self-refreshes when power is lost. What ADR
really means is persistent DRAM, either on an NVDIMM or an Optane DIMM, more
formally known as an Optane DC Persistent Memory Module, or PMM.
A combination of fast memory (DRAM) and slow memory (like Optane DIMMs) is often referred
to as NUMA, non-uniform memory access.
Now, this chart shows that any time you interrupt the CPU,
which has always been done for storage, whether SATA, NVMe, or older I/O channels, of course,
it does a context switch: the CPU stops everything for 100 microseconds or so while it pushes all of its status, registers, and so on onto the stack. If you're accessing 100-nanosecond
Optane memory,
you don't want to perform an interrupt every time the way you would
for an SSD access, because you'd be slowing everything down by a thousand
times. For Optane, it makes more sense to poll, that is, to have
the processor run a software idle loop while it's waiting. This is bringing a
lot of focus onto the way interrupts have been handled in the past,
and it should result in changes to interrupt handling in the future.
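(As a hedged illustration of the difference, here's what a polling wait looks like in C; the completion flag and its location are hypothetical, standing in for whatever status word a real device exposes.)

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical completion flag: a word the device (or its driver) sets
 * to nonzero in a shared or memory-mapped location when a request
 * finishes. */
static _Atomic uint32_t completion_flag;

/* Interrupt-driven waiting costs a context switch, roughly 100
 * microseconds of saved state and scheduling. For a ~100 ns Optane
 * access, it's far cheaper to poll: spin in a little idle loop until
 * the device says it's done. */
static inline void wait_for_completion(void)
{
    while (atomic_load_explicit(&completion_flag,
                                memory_order_acquire) == 0) {
        /* busy-wait; a production spin loop would add a CPU pause or
         * yield hint here to be polite to the other hyperthread */
    }
}
```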
So again, it's one of these things that the existence of fast, persistent memory
led us to consider and think about, no matter how successful the total implementation was.
So another approach is to make the DRAM bus transactional: you send
a number of requests without waiting for the results, and as the results come in, the DRAM
tells the CPU what's heading its way. Intel did this by adapting the DDR4 bus to Optane. A few
signals were added to unused DDR4 pins, and those are represented by that far-left arrow on this diagram.
Nothing else in the DDR4 bus changed, so the socket can be populated with either DDR4
or Optane memory. Unfortunately, that approach has to track the change from DDR4 to the now more
common DDR5 and beyond, so it's a good thing that CXL has absorbed
OpenCAPI's OMI, the Open Memory Interface,
a memory channel that can be used for DDR4, DDR5, Optane,
and pretty much anything else;
OpenCAPI is now part of the CXL standards group.
So Optane has also made people think of ways to expand memory
that are not limited by capacitive loading and pin count, as memory was prior to CXL. This takes some extra interfacing, such as
a switched fabric, which slows it down, so it has become "far memory," a new term. Near memory
is the stuff that doesn't suffer from this delay and is directly attached to the processor.
And yesterday there was at least one session where they were talking about medium memory, with CXL 2.0 and CXL 3.0,
where you actually have a switched fabric, providing the far memory. CXL is backed by a big consortium, and that will result
in widespread adoption in enterprise and data center applications, likely starting around
next year, 2023. CXL is an alternative protocol that runs on the standard PCIe physical layer.
It enables pooling of heterogeneous memory, where parts of different memory devices can be accessed by different hosts.
This allows flexible allocation of memory resources in composable infrastructure.
CXL supports accelerators near the memory or even in the memory itself,
and we've heard a lot of talk about computational storage here, for example.
So some of that computational storage work is related to this development of CXL, and it probably owes something to Optane as well.
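(One concrete way this surfaces to software today: on Linux, CXL-attached memory typically appears as a CPU-less NUMA node. Here's a minimal sketch of allocating from such a node with libnuma; that libnuma is installed and that node 1 happens to be the far-memory node are both assumptions about the platform.)

```c
#include <numa.h>      /* libnuma; link with -lnuma */
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    /* Node 1 is assumed here to be the CPU-less node backed by far
     * (CXL-attached) memory; the node number is platform-specific. */
    int far_node = 1;
    size_t len = 1 << 20;
    void *buf = numa_alloc_onnode(len, far_node);
    if (buf == NULL) return 1;
    memset(buf, 0, len);  /* touch the pages so they land on that node */
    numa_free(buf, len);
    return 0;
}
```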
CXL can support more complex memory sharing tasks, too. The 64-gigatransfer-per-second PCIe 6.0 physical layer supports switched network fabrics, enabling much
greater scaling of heterogeneous memory pools, more sophisticated memory allocation, and
composable memory as well. Persistence, though, leads to potential security issues.
The top bullets show how that has been handled in the past: physical destruction,
secure erase, crypto erase, AES encryption.
The bottom bullets show how Optane approaches it. The Optane DIMM is the first-ever encrypted
DIMM. Volatile memories don't need AES encryption, since they lose their contents as soon as
power is removed, although there have been stories about people chilling the DRAM
to try to recover whatever was in
it, right? So there are special ways that AES encryption is handled. When Optane is used in
memory mode, it just looks like a big honking DRAM, so users don't expect persistence. Since
that's the case, Intel's drivers simply lose the AES key for Optane every time the power is lost.
Then, just like DRAM, Optane comes up with random contents.
If Optane is being used in App Direct mode, in which applications take advantage of its
persistence, the data must become available again once power is restored, so the data has to persist
and be accessible. Optane does this by storing the key on the module itself and requiring a passcode
before the CPU can read the
key. This way, the module's contents cannot be read unless the reader has the passcode, so at
least that's some level of security. If that's not enough, Optane will also do things that are
featured on military SSDs: it will erase and overwrite all addresses upon command, so that's another option you could use.
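(Here's a hypothetical sketch, in C, of the two key-handling policies just described. This illustrates the logic only; it is not Intel's actual firmware or driver code.)

```c
#include <stdbool.h>
#include <string.h>

typedef unsigned char aes_key_t[32];

enum dimm_mode { MEMORY_MODE, APP_DIRECT };

/* Hypothetical module state: in App Direct mode, the media key lives on
 * the module and is released only against the right passcode; in Memory
 * Mode, the key is simply discarded at every power loss. */
struct pmem_module {
    enum dimm_mode mode;
    aes_key_t media_key;
    char passcode[33];
};

void on_power_loss(struct pmem_module *m)
{
    if (m->mode == MEMORY_MODE)
        memset(m->media_key, 0, sizeof m->media_key); /* contents now look random, like DRAM */
    /* APP_DIRECT: key is retained so the data survives the power cycle */
}

bool release_key(const struct pmem_module *m, const char *passcode,
                 aes_key_t out)
{
    if (m->mode != APP_DIRECT) return false;
    if (strcmp(passcode, m->passcode) != 0) return false; /* no passcode, no key */
    memcpy(out, m->media_key, sizeof(aes_key_t));
    return true;
}
```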
So that legacy leaves us better prepared for future
processors. NOR and SRAM scaling look like they're stopping or slowing down, and emerging memories
will take over as the embedded memory on systems-on-chip, certainly for NOR, likely for a lot of
SRAM. That emerging memory will be persistent. Optane's legacy helps to support emerging-memory caches and registers:
persistence closer to the CPU, or even within the CPU; ultimately, perhaps even the registers could
become persistent over time with new memory technologies. Mixed memory speeds, for instance
SRAM versus emerging memory, will also be a characteristic of future memory and storage
hierarchies: fast and slow, MRAM, various other things.
It also leads to thinking about security with persistent memory,
hopefully producing well-conceived security protocols. So everything in this chart is
normalized to SRAM's cost at 500 nanometers; that's half a micron.
This is showing some of the scaling issues with NOR flash and SRAM.
The chart is log-log again, because processes move exponentially:
every process node is about 70% the linear size of the prior one,
and costs move proportionally to the size of the device at that node.
NOR flash memory has a problem after the 28-nanometer node.
NOR flash is the non-volatile memory that the industry uses to store code in microprocessors and microcontrollers,
ASICs, and other systems-on-chip, and it stops scaling at about 28 nanometers.
That causes the cost declines to cease, as the red line shows by abruptly going horizontal.
According to papers presented at IEEE conferences over the past few years,
the cell area, and thus the cost, of SRAM cells stopped shrinking at about 14 nanometers,
so SRAM cost reductions from the shrink
stop at that point.
SRAM also has several transistors per cell,
while the emerging memory candidates
have a single transistor.
Now, maybe a different-size transistor,
but again, it's one transistor versus several.
For applications such as AI inference that need lots of memory, the emerging non-volatile memories can provide the same capacity in a smaller die, and thus cost less than SRAM.
For this slide, we assume the new-technology wafer costs about five times as much as the NOR or SRAM wafer (the black line) at the present time. As long as a new technology can move two process nodes past where NOR or SRAM stopped scaling,
it eventually becomes cheaper, despite the higher wafer cost.
And again, this is different from a standalone memory;
this is embedded memory we're talking about here.
You're building the chips anyway;
the question is just which memory you put on them.
So the whole scaling story is very different.
This is why so many people have invested so much money to fund the research on new memory
technologies.
The bottom line: new memories are inevitable,
and they must gain acceptance for chips to continue to scale in price.
Now, to show what this means in chip size and cost, I'm going to present an illustrated
graphic.
First of all, this is a photograph of an Intel processor chip made on a 45-nanometer
process.
You can see two very different parts of the layout.
The less regular part of the chip at the top is the logic, and the lower half, with its
very regular patterns, is the SRAM used for the on-chip caches.
In the past, the entire chip would scale with process shrinks,
as illustrated here. As processes moved from node to node, from 45 nanometers to 32, to 22,
and finally to 14 nanometers, the die area and cost would be half that of the prior
generation. It follows a nice scaling curve. Now, this assumes that the SRAM keeps pace with the
process technology, though,
and the prior slide showed us that this doesn't happen after the 14-nanometer node.
So, let's take another look.
Okay. So, let's use the same chip again,
keeping in mind that the top half is logic,
which will scale in proportion to the process,
and the bottom half is SRAM, which will scale less aggressively.
You can see in this series that the logic half of the chip gets pretty small towards the end,
accounting for only about 20% of the total die area
at the smallest process nodes,
yet the SRAM doesn't scale, resulting in a pretty large chip at the end.
In fact, the die-size trend starts to level off at the end.
And the black lines only indicate height;
the actual area of the chip goes something like the square of this number.
So you can see this actually becomes a fairly considerable difference.
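(A quick way to see the effect: each node shrinks linear dimensions to about 70%, so area roughly halves per node, since 0.7 squared is about 0.49. The toy loop below assumes a 50/50 logic/SRAM split and that SRAM stops scaling immediately; both numbers are illustrative, not taken from the slide.)

```c
#include <stdio.h>

/* Logic keeps halving in area per process node; the SRAM half of the
 * die stays fixed once it stops scaling, so the total die area levels
 * off instead of tracking the scaling curve. */
int main(void)
{
    double logic = 0.5, sram = 0.5;   /* fractions of the original die */
    for (int node = 1; node <= 4; node++) {
        logic *= 0.49;                /* ~0.7 linear shrink, squared    */
        /* sram stays put: it stopped scaling */
        printf("node %d: die area %.2f of original (logic %.3f, SRAM %.2f)\n",
               node, logic + sram, logic, sram);
    }
    return 0;
}
```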
SRAM is causing the chip to be larger and thus more expensive, and a mature emerging memory
technology could replace it at that point, resulting in a smaller, cheaper chip
that provides the same amount of memory. So an emerging memory will take over as the embedded
memory in SoCs, and that emerging memory will be persistent. The options are MRAM and resistive RAM, and
several of the foundries making these embedded chips are now offering those as options. It could
also be ferroelectric memory, phase change memory, carbon nanotubes
(I didn't get to mention carbon nanotubes), or various other things. The Optane legacy line at the bottom shows that all those things we just discussed,
like the SNIA programming model, will be useful in supporting all of these new memory technologies.
Now let's look quickly at some of these memories. There are a lot of memory types
vying to replace NOR flash and SRAM, and they all share a number of attributes. They all have a single-element
bit cell that promises to scale smaller than current technologies, supporting
small, inexpensive dice and potentially 3D stacking. They also promise to be easier to use
than flash memory, by supporting write-in-place with no need for a block erase, and they have more
symmetrical read and write speeds. Finally, they're all non-volatile, that is, persistent:
data doesn't disappear when the power is lost.
They can all be used as persistent memory.
New memories are necessary for Moore's law scaling to continue.
These technologies include ferroelectric RAM,
magnetic RAM, resistive RAM, and phase change memory,
such as Intel's Optane memory.
We may not have seen the last of phase change memory.
We'll see.
I'm not going to spend a lot of time on this chart, but it compares important characteristics of conventional memories, such as DRAM, SRAM, NOR flash, and NAND flash, with the new
non-volatile memory technologies, in particular ferroelectric RAM, resistive RAM, magnetic random
access memory, and phase change memory. The higher endurance and performance of MRAM in particular may make it possible to replace
SRAM and DRAM as MRAM production volumes increase and as new techniques come into play, such as spin-orbit torque or
voltage-controlled magnetic anisotropy, which promise lower
write power and faster speeds.
So MRAM is already shipping, in fairly low volumes. Everspin has a partnership
with GlobalFoundries, which builds 300-millimeter wafers for Everspin, and GlobalFoundries offers
MRAM to its other customers for embedded memory applications in systems-on-chip.
For example, Everspin's MRAM is a standalone device used as cache memory
in IBM's FlashCore modules. Another company, Renesas, is shipping an MRAM chip that it inherited through its acquisition of IDT.
Avalanche and Honeywell are shipping some MRAM for military and aerospace applications.
And other foundries now offer MRAM options to their system-on-chip clients; that includes
TSMC and also Samsung. Resistive RAM is also in production, but in a quiet way. It has been shipping from Adesto,
now part of Dialog Semiconductor, since 2013. And actually, Dialog is now part of Renesas,
so the fish are eating each other. Adesto licensed its CBRAM technology to GlobalFoundries, to be
offered as an embedded non-volatile memory option on its 22FDX platform and future platforms.
ARM announced the spin-out of Cerfe Labs, which is developing and licensing
new types of non-volatile memory based on correlated electron materials, or CeRAM,
in a joint development project with Symetrix Corporation.
Weebit Nano recently announced early production of their resistive RAM devices,
shortly before the Flash Memory Summit.
Other companies currently ship resistive RAM in both commercial and military/aerospace applications,
and leading foundries are supporting resistive RAM
as another alternative to embedded NOR flash.
Finally, we come to the oldest emerging memory technology,
ferroelectric memory, or FRAM.
Surprisingly enough, this technology predates
the development of the integrated circuit.
The photo on this slide, published by Bell Labs in 1955, shows a single SBT crystal with
vertical and horizontal metal traces that could be used as a non-volatile
memory. From the perspective of unit shipments, FRAM has also shipped more
than all the other emerging memory technologies combined, having found its
way into over 4 billion chips. It's most commonly used in RFID cards because of its extraordinarily low write-energy requirements.
It basically harvests energy from the radio wave that's used to interrogate it.
Until recently, FRAMs were based on unfriendly materials, lead and bismuth in particular,
that semiconductor fabs are not really keen on putting into their production processes.
But in 2011, NaMLab in Dresden, Germany, found that a crystalline form of hafnium oxide
has strong ferroelectric properties.
This created a new life for ferroelectric materials.
Hafnium oxide is a common material used for high-K dielectrics
in modern CMOS semiconductor processing.
Fab managers understand how to make it in high volume.
Besides its use in ferroelectric memories,
this form of hafnium oxide is being investigated
for various other applications.
For instance, in DRAM,
it has been used to produce a 3D DRAM
similar in process to a standard 3D NAND flash,
or to increase the DRAM's retention time,
using the ferroelectric properties
so that you don't have to refresh it as often.
So since SRAM will be replaced by a new memory technology,
caches will ultimately use a new memory as well.
Eventually, even a CPU's registers could migrate to this new technology,
bringing persistence into the CPU.
That will require elements of the SNIA NVM programming model
that we talked about earlier.
Also, fast and slow memories
will sit side by side in a memory hierarchy,
taking advantage of the approach
that Optane required for mixing DRAM and 3D XPoint.
This table shows how this will play out,
although the timelines aren't necessarily precise.
So where do you go to learn about this stuff?
Well, fortunately, Jim and I have a report
that we just finished, and much of the information in this presentation is drawn from that report.
It's available for people to look at and purchase. It describes the entire emerging memory ecosystem:
the technologies, the companies, the markets, and the support requirements, looking at both embedded
and discrete devices. It's 241 pages, with 36 tables and 259 figures.
You can visit the URLs at the bottom to learn more.
It's available now.
But before we finish here, how will future processors benefit from these
developments?
In particular, let's look at chiplets, which are becoming an interesting topic right now.
So chiplets are a way to get past a barrier to continued Moore's Law scaling.
In Gordon Moore's original paper, he said that three things made the number of transistors per chip increase.
First, shrinking the process geometries.
Second, increasing the die sizes.
Third, cleverness.
Die size has reached a limit, thanks to the way optical lithography works:
there's a maximum reticle size that limits just how large a processor chip can get.
All leading-edge processors are at the maximum size that can fit into the scanner's reticle;
you can't make a bigger die.
To get past this limitation, the industry has decided to put multiple chips into a single package.
This opens up new opportunities, since these chiplets can be made using different processes.
So how does this work?
So here's that same processor photo we showed earlier in the presentation.
Once again, there's a logic side of the chip and a memory side of the chip.
As long as the memory side uses SRAM, it won't scale with finer processes.
It sure would be great to use another technology,
but SRAM is about the only memory you can make in a high-speed CMOS logic process.
So instead, let's make the processor in a high-speed logic process,
one chip for that, and then use another process to build some more economical SRAM,
and another couple of processes to build a DRAM and some MRAM, to give you very high cache capacity
and some persistent cache. You can't build either DRAM or MRAM on a logic process,
and you can build SRAM more cheaply if you don't use a high-speed logic process. So let me remind you again of the economies of scale and how they play in here.
So the deal is that SRAM made on a processor chip is big and wretchedly expensive.
So a chiplet can be pretty expensive and still compete against SRAM built on a processor
chip.
Let me give you an example of this.
So the current cost of the SRAM portion of a CPU is about half the cost of the CPU chip. You saw
how big it was. If the chip costs $200 to produce and half the chip is SRAM, then that
SRAM costs $100. A server cache is, for example, 64 kilobytes of L1, 1 megabyte of L2, and 1.5 megabytes of L3, which totals about 2.6
megabytes. The SRAM cost is then $100 divided by 2.6 megabytes, or about $38 per megabyte,
built into the processor. A 4-megabit discrete SRAM chip,
that's half a megabyte, retails for about $6, which comes to about
$12 per megabyte. So you can see some of the economics here. DRAM is currently selling for
about $3 a gigabyte, or 0.3 cents per megabyte; that's three tenths of a cent.
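(As a quick sanity check on that arithmetic, using the talk's ballpark figures, which are rough estimates rather than measured data:)

```c
#include <stdio.h>

/* Reproduces the rough cost-per-megabyte comparison above; every input
 * is one of the talk's ballpark estimates. */
int main(void)
{
    double chip_cost    = 200.0;              /* $ to produce the CPU chip      */
    double sram_dollars = chip_cost * 0.5;    /* half the die is SRAM           */
    double cache_mb     = 0.0625 + 1.0 + 1.5; /* 64 KB L1 + 1 MB L2 + 1.5 MB L3 */
    double discrete     = 6.0 / 0.5;          /* $6 for a 4 Mb (0.5 MB) SRAM    */
    double dram         = 3.0 / 1024.0;       /* $3/GB DRAM, per megabyte       */

    printf("on-die SRAM  : ~$%.0f per MB\n", sram_dollars / cache_mb); /* ~$39 */
    printf("discrete SRAM: ~$%.0f per MB\n", discrete);                /* ~$12 */
    printf("DRAM         : ~$%.4f per MB\n", dram);                    /* ~$0.003 */
    return 0;
}
```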
A memory chiplet will reduce the cache's cost: it gives you cache fairly close to the processor, but at a
lower cost. Now, a chiplet memory can be orders of
magnitude more costly than DRAM, of course, or NAND, and eventually even a
non-volatile memory like MRAM will be more expensive than those. But the costs still go down when you use chiplets, even if the chiplet memory is
significantly more costly than high-volume NAND or DRAM. That economics, plus the cost of trying to build everything on the most modern process,
is what's driving chiplets, and it's also driving new ways to connect chiplets, like the UCIe interface.
And so you're going to have a persistent cache. It's going to happen. And when you do, the ecosystem
will already be in place because of Optane. Hooray! And chiplets will accelerate this transition. So, in summary: Optane could not harness the
economies of scale. It was a grand effort and a cool technology, and a lot of people loved it, but it never
achieved enough scale, never found enough applications to drive it. Some things were
delayed, and that slowed everything down. It didn't make it within the envelope of what Intel
was willing to spend to make it happen. The Optane effort, though, generated a great legacy. CXL, part of that legacy,
opens new vistas in data center architectures. Emerging memories are here, and they are
persistent. Future processors will have persistent caches and, later, persistent registers. Persistence will become
ubiquitous. Optane's legacy is going to benefit tomorrow's processors,
and chiplets are going to accelerate that transition. And probably you're going to see
more chiplet talks, I would guess, at the Storage Developer Conference next year,
maybe even at the Flash Memory Summit. So with that, I guess we've got a little bit
more time for questions if anyone has any. Yes, sir.
Well, let's see. So first of all, it uses like six transistors. Oh, sorry.
So the question was, what keeps SRAM from scaling the way that logic does?
Well, first of all, SRAM uses about five or six transistors per cell.
That's how it retains the data until the power goes off,
and that makes it a big cell anyway,
so it's ripe for a change there.
But I can't say right offhand
why it doesn't scale below 14 nanometers; that's apparently what people are finding.
Yeah, they are.
Those were ISSCC conference papers, you know, over a few years.
I did notice something recently, though: in an IEEE roadmap effort, the,
what is it, More Moore chapter, there was some talk of SRAM scaling.
Jim and I are taking a look at what they're projecting there,
which may change some of those numbers.
We just have to investigate and find out.
But until that point, it did look like there were limits to how far SRAM could scale.
Yes, Andy.
What is an NVDIMM?
An NVDIMM is, you know, a DIMM where you actually have a battery.
Right, yeah.
Right now, it's a popular approach to solving some of these persistence issues.
Ultimately, though, you may be able to do something without a battery.
If the volume gets high enough in some of these other technologies, which it isn't right now,
or if you don't have room to put a board in, that's what's going to drive
some of these other technologies. But NVDIMMs are certainly there. And the supercaps, yeah.
Yeah, yeah. You know, I don't do that directly, but it certainly is an option to create,
as long as the battery still works, a non-volatile memory technology that can back up your main memory.
Not as good as on PC, but it's still a viable solution.
Well, and the other thing on CXL is that there are an awful lot of CXL-based SSDs, sometimes with a lot of DRAM, that people are talking about now.
And also CXL-attached DRAM
with a battery.
Yes.
Yeah.
Yeah.
Yes, Steve.
Oh, sorry, I didn't repeat your question.
The question was with regard to NVDIMMs,
which I just answered.
Go ahead, Steve.
It's never going to be cheaper than DRAM.
An NVDIMM is DRAM plus a battery plus flash.
You can never, well, that was one of the things about Optane:
it was supposed to be cheaper, and they couldn't actually make it cheaper.
But DRAM plus a battery plus a flash is never going to be cheaper
than DRAM without them.
So you'll never get the capacity increase
per dollar that you were supposed to get with persistent memory.
But there are some places where,
if you're talking about the previous question,
there are some cases where the NVDIMM
does serve a function, you know,
despite the cost...
Oh, yeah, no, I believe you; you just won't get the capacity.
Yeah, yeah.
Okay, thanks.
Maybe one more, I think. Yeah, go ahead.
Sorry. Oh, I'll take yours too.
Go ahead.
You said that CXL brought some
coherence to the fabric market.
Do you see some
convergence, like UCIe, for the
chiplet market as well?
Hopefully. I think that's what the UCIe guys are hoping will happen.
Right now, it's been kind of a Wild West,
with AMD and Intel doing their own things.
So getting some kind of agreement,
especially so you can bring third-party chips into the architecture
and have them work together, creating interoperability,
I think is extremely important to building that ecosystem.
And the question was about UCIe and its
place in the future, if you will.
I think I phrased that okay.
SW?
Not a question. One of the things that
I think you maybe
didn't emphasize enough
is that it really has been
all over the software and the operating systems.
Oh, yeah.
When Optane came out, there was a lot of work on what to do about removing that huge latency in the I/O side of the stack.
It was never a problem before.
So SW just pointed out something he thought I didn't emphasize enough,
which was the importance of software, and operating systems in particular,
in being able to work with persistence,
and also in creating awareness of some of the latencies
inherent in the way we do things today, which could be improved with future technology.
The improvements have been made, yeah.
Sorry: the improvements that have been made in the OS stacks to deal with some of the latency issues, which emerged in the process of updating the software.
I think that's probably all we have time for.
Thank you very much. I appreciate it.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further
with your peers in the storage developer community.
For additional information about the Storage Developer Conference,
visit www.storagedeveloper.org.