Storage Developer Conference - #176: Persistent Memories Without Optane, Where Would We Be?
Episode Date: November 10, 2022
Transcript
Hello, everybody. Mark Carlson here, SNIA Technical Council Co-Chair. Welcome to the
SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage
developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage
Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts.
You are listening to SDC Podcast, episode number 176.
I'm going to talk today on persistent memory.
Without Optane, where would we be?
And first of all, I want to apologize for Jim Handy, my colleague, with whom I've done a report
(you can see some flyers on some of the tables here); he wasn't able to come today.
Emerging memories have gotten a couple of big boosts over the past few years.
One came in the form of Intel's Optane products,
and the other from the migration of CMOS logic to nodes that NOR flash, and now SRAM, look like
they can't support. Although these appear to be two very different spheres, a lot of the work
undertaken to support Intel's Optane products, also known as 3D XPoint, will lead
to improved use of persistent memories
on processors of all kinds.
In this presentation, we're going to review emerging memory technologies and their roles
in replacing other on-chip memories; the developments through SNIA and other organizations that
were fostered by Optane memory but are usable in other areas of computing; the emergence of
new near and far memory paradigms (in fact, yesterday I saw a medium paradigm
mentioned as well, so near, medium, and far memory) that have spawned
interface protocols like CXL and OMI; and the emergence of chiplets and their
potential role in the evolution of persistent
processor caches. So we're going to go into some of these things here, and here's an outline of
what I'm going to address. We'll talk, first of all, about the rise and fall of Intel's Optane,
the legacy we have because of the development work that was done with Optane.
We'll look at some ideas, concepts about future chips and persistence.
We'll talk a bit about some of these memory technologies.
Won't do the nanotubes today, Bill. I'm sorry about that.
How future processors will benefit.
Then finally a summary, and we'll open it up for some Q&A if we've got time.
So first let's talk about the rise and fall of Intel's Optane.
So it died, but why?
We're going to have a historical recap, actually from prehistory, 1970 to 2022.
We'll explain some of the economies of scale and the role Intel's losses played in its decision. And this is kind of a cautionary tale for anybody who wants to make a
standalone memory technology that competes against the big boys like DRAM and NAND flash.
And Optane, of course, is Intel's name for 3D XPoint memory; I think I mentioned that
before. And 3D XPoint memory is just another kind of phase change memory.
Intel actually pushed phase change memory for most of the company's life.
This magazine cover shows a 1970 article that Gordon Moore co-authored on the first phase change memory that Intel was introducing.
The company finally introduced a commercial phase change memory in 2007, when it appeared that NOR flash would not scale past 65 nanometers.
Samsung introduced a competing product at that time, but both products were later discontinued,
so they didn't really catch on. Finally, in 2015, Intel and Micron together introduced 3D XPoint memory, which is the basis of Intel's Optane product line. It was touted as a new layer
in the memory and storage hierarchy to fill the gap
between NAND SSDs and DRAM. So let's take a look at this chart. Jim actually came up with this: a log-log plot for looking at the memory and storage hierarchy,
different from the typical pyramid diagram that people use to explain the hierarchy.
In the pyramid, the width at each level shows how much capacity that layer tends to store;
the farther down the pyramid a layer sits, the cheaper it is,
and up the pyramid you get less capacity but higher cost.
This chart looks at it in a different way. It plots the speed of the various memory and storage layers,
all the way from tape to the CPU's L1 cache,
against price per gigabyte, on a log-log plot.
So it's price per gigabyte versus performance,
the data-rate performance.
In any other format, if you didn't do log-log, you'd simply have a huge L1 orb,
with everything else crowded down into the lower left corner, where you couldn't see anything.
Log-log blows it out so you can get a look at it.
And this is how computers are designed today.
They use multiple layers, and each layer is faster than the next cheaper layer
and cheaper than the next faster layer. Optane was introduced to fill the growing gap between
NAND SSDs and DRAM. So when 3D XPoint was introduced, it appeared to fit this hierarchy
well. It checked off a number of boxes which seemed like they were important. It's a thousand
times faster than NAND flash, so it would be faster than NAND flash and slower than DRAM.
That seems to fill a box, so check, right? It's one-tenth the die size of DRAM, so you would
intuitively suppose that it could be built for one-tenth or so of DRAM's cost. That one seems to
be checked as well, right? It also has a thousand times NAND's endurance, which is important since memory gets written many more
times than storage is. Give it a check in the reliability box for that as well. So from all this,
it seems clear that it would fit well into this memory storage hierarchy and fill this void,
you know, this potential void between DRAM and NAND flash.
There are important things to think about here, though,
and this is where the lessons are.
Despite the fact that the die was one-tenth the size of DRAM's,
Intel never could get the cost as low as DRAM's,
not even close,
in actual production cost.
That's because economies of scale
are a key part of a chip's cost.
Look at it this way: if you made only one chip, what would it cost?
Probably tens of millions of dollars, to get all the equipment up and running to make that one chip, right?
If you made only one wafer of chips, a chip might cost about a hundred thousand dollars, amortizing over the
scale of things. If you made ten thousand wafers per month, as Intel and Micron were doing,
each chip might cost about $80. It's only by making 300,000 to 400,000 wafers per month
that the cost would become competitive with DRAM, at $40 or so. And that was the problem.
They never reached that scale. High volumes drive down production costs. So how do you get a product to sell in high volume?
You sell it for a low price so that people start using it and find it attractive.
But the production cost won't actually drop to the point where you can sell it for more than it costs to make until you get to high volume.
It's a chicken-and-egg problem.
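(To make the shape of that arithmetic visible, here's a toy amortization model in C. The fixed-cost, per-chip, and chips-per-wafer figures are hypothetical, chosen only so the output lands near the talk's rough $80 and $40 numbers, not real fab economics.)

```c
#include <stdio.h>

/* Toy amortization model: per-chip cost is the variable cost plus the
 * fab's fixed cost spread over every chip produced that month. All
 * figures below are hypothetical and illustrative only. */
int main(void)
{
    const double fixed_monthly   = 200e6; /* hypothetical fab fixed cost, $/month */
    const double variable_chip   = 39.0;  /* hypothetical per-chip materials, $   */
    const int    chips_per_wafer = 500;   /* hypothetical                         */
    const int    volumes[]       = { 1, 10000, 400000 }; /* wafers per month     */

    for (int i = 0; i < 3; i++) {
        double chips = (double)volumes[i] * chips_per_wafer;
        /* ~1 wafer: hundreds of thousands per chip; 10,000 wafers: ~$80;
         * 400,000 wafers: ~$40, roughly the talk's figures. */
        printf("%6d wafers/month -> ~$%.0f per chip\n",
               volumes[i], variable_chip + fixed_monthly / chips);
    }
    return 0;
}
```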
So Intel knew this, and they actually made a conscious decision to lose money
to drive down the price, hoping that, with a lower-cost product,
they could build the market, find applications,
and scale up their demand and manufacturing volume.
So let's look at NAND flash as an example, though,
of what happened for a successful standalone memory,
one that was able to get below the cost of another memory, in this case DRAM. So let's take a look at this.
What this slide shows is that single-level cell NAND, which back in the early days of NAND
was all you could get, and which is its most expensive kind, has always had a die size about half as large as a DRAM's
when both are made in the same process geometry
and both have the same number of bits on them. These boxes are a little complicated,
because I have to compare the die size of one 8-gigabit NAND against two 2-gigabit DRAMs, and so on, so bear with me here. At 54 nanometers,
a 4-gigabit NAND is about the size of a 2-gigabit DRAM. At 50 nanometers, an 8-gigabit NAND
is about the size of two 2-gigabit DRAMs. At 48 nanometers, an 8-gigabit NAND is about the size of four 1-gigabit DRAMs.
At 44 nanometers, an 8-gigabit NAND is about the size of two 2-gigabit DRAMs. The big point here is
that die sizes aren't everything. Even with that advantage, in the early days,
NAND flash was a lot more expensive than DRAM.
And this is actually a historical chart of NAND and DRAM price per gigabyte, covering 2000 to 2014.
The NAND line is dotted in the early days because these are WSTS figures,
which are sort of the gold standard for looking at what's happening in the industry,
and WSTS didn't report NAND prices prior to 2004.
That said, the numbers Jim had from his time at Dataquest and Semico support this dotted line.
So there was a clear trend in the early days:
the price of NAND flash was going down at roughly the slope shown there,
with some oscillations on top.
So this is a semi-log plot,
and that keeps the data from looking like a hockey stick that drops to near zero at the left
and then hugs the x-axis for the rest of the chart.
Again, with log scales we can see these exponential-type functions much better
and get a lot more information out of them.
Also, on a semi-log plot,
constant growth shows up as a straight line.
The general trend of these lines
is the price reduction you'd expect from Moore's Law,
but NAND flash has actually always moved faster than Moore's Law.
It has been scaling very fast,
and that of course accelerates the capacity increases
and the price declines.
So the circle at 2004 is where NAND flash prices crossed below DRAM's.
Suddenly, NAND flash found a place in the memory and storage hierarchy,
because it could start to be cheaper than DRAM.
It now made sense to have SSDs in computers; around 2004,
you started to have the economics that made that possible.
Before that, NAND was used for USB flash drives, MP3 players, cameras,
SD cards, things of that sort:
lightweight, low-capacity storage, mostly.
So this slide shows how many terabytes of NAND shipped during the same period, compared to DRAM.
It's also a semi-log plot, again for the same reason, so we can see the scale:
there's no hockey stick, and steady growth again appears as a straight line.
This circle is drawn at the same point as in the prior chart, 2004, when the price crossover happened.
At that time, NAND's terabyte shipments were one-third as large as DRAM's.
This tells us that the economies of scale for flash were finally allowing this chip,
which is half as big as a DRAM,
to finally match DRAM's cost.
Yes, Steve.
So here, this is the quarterly terabyte shipments.
The black line is NAND flash,
and the red is DRAM.
So you see there,
in 2004, you had the price crossover.
By roughly 2006, we were starting to ship more capacity
in NAND flash than in DRAM.
So again, this was a successful memory that was able to move into more applications because it reached those
economies of scale. That crossover point translates into about one-tenth of DRAM's total wafer production.
The same would have been true for Optane: to get to cost parity with DRAM, its wafer production volume
would have to be one-tenth that of DRAM, or about 30 to 40 times the volume
it actually reached. There's another example of this in the struggle 3D NAND had in
becoming viable versus planar NAND, which was around before 3D NAND started to appear in about 2013.
3D NAND took about three years to become cost-competitive with planar NAND; before that, it didn't have the volume
to get its cost down below that of the earlier planar process.
So by our estimates, based in part on Intel's reported financials,
Optane's petabyte shipments approached 1/30th of DRAM's petabyte shipments.
Again, the technology was trying to get its cost below,
you know, Optane wanted to be below the cost of DRAM. And our 3D XPoint forecast has always
been unabashedly optimistic, so the gap is probably wider than that, in fact. Even so, we don't see
Optane's volume allowing it to reach cost parity with DRAM until about 2028. Our guess is that
Intel didn't want to subsidize the technology for that long. They reached the limit of what
they were willing to do. Plus, they had a bad quarter. So here's a timeline of key 3D XPoint
events; Intel's actions appear in the upper half and Micron's in the lower half. Around 2015, there was a big
announcement of 3D XPoint: an early announcement just before the Flash Memory Summit
that year, and then a big to-do at the Flash Memory Summit and at Intel's Developer Forum about
3D XPoint. In 2016, Intel's partner Micron announced their QuantX products. The first Optane drives were shipped in 2017 by Intel.
And by the end of 2018, the first Optane DIMMs were shipping.
A little later on, the first processor support for Optane DIMMs was announced, in early 2019.
Towards the end of 2019, the first QuantX SSD from Micron was demonstrated.
Then in early 2020, Micron ended the Intel relationship.
They never actually shipped any SSDs in any kind of volume;
they demonstrated things and made announcements, but they never actually shipped.
In 2020, Intel discontinued their consumer Optane SSDs,
which had been popular in gaming and other applications
that liked a lot of speed and a lot of memory.
In early 2021, Micron killed their 3D XPoint effort,
and later in 2021, Micron sold the Lehi fab,
which had been used to make most of the 3D XPoint that Intel used in their Optane products.
Yes?
Do you know what happens to the production equipment?
I do not know what happens to the production equipment.
You know, some of this production equipment would be useful for other things,
so I imagine it's been sold and repurposed for other applications.
Now, Intel did have a facility in New Mexico
where they were also making some quantity of Optane memory,
so they had some capacity to make memory themselves,
but not at the scale of the Lehi facility.
Then in 2022, in July, actually I think it was July 28th
if I remember the date right, just after we finished our report,
yes, Intel announced that Optane was winding down.
So that's the timeline of what happened there,
just to give you an idea of the scale of things.
Again, it's sort of a cautionary tale to give us some idea of where things are going.
Now, from Intel's reported financials, you can estimate their 3D XPoint losses.
When the company sold off its NAND business, it disclosed Optane's losses for the four quarters
of 2020, and those losses closely match the ones we had estimated for that year. This chart is based on the losses in Intel's storage unit,
which included Optane; since other companies were making money selling SSDs,
we took those losses as being mostly attributable to Optane.
The sum of all the losses in this chart is about $6.8 billion.
Since there are losses we didn't capture, we're confident that the company's
total Optane losses were greater than $7 billion, from 2014 all the way through
2022. Now, certain factors that should have increased Optane's volume failed to materialize,
and these were a large part of why it never reached that manufacturing volume.
Optane SSDs were supposed to drive broad usage and ramp volumes hard, but end users
didn't accept the added cost for Optane's performance boost, so that market never developed.
The Optane DIMM, which
should have been the big volume driver, needed special processor support, and that processor
support wasn't offered on Intel's server processors until two generations later than originally
planned. That seriously slowed Optane's ramp, prolonged the losses, and made it easier for Intel to decide to kill it.
So what are the lessons we can learn from this? Expect losses until volume ramps, if you've got
a new technology like this, and be prepared to support them. A small die size doesn't matter
if the manufacturing volume isn't big enough. Losses might be larger than expected, too;
you have to anticipate that in order to reach a low enough price point.
And supporting elements may delay adoption:
things may not happen the way you hoped they would,
or they get delayed, that kind of thing.
Optane's processor support was delayed,
and application software wasn't widely available when this started.
But that also leads into: what did we gain from Optane? What is Optane's legacy
to us? Actually, it's considerable. Let's talk about what Optane has led to.
Optane left a legacy of advancements, and I'm going to address each of these in turn.
It led to new programming paradigms, new CPU instructions, new approaches to two-speed memory, new near-memory
bus concepts, new approaches to large memory, and new thinking about the security concerns of
persistent memory, where the data is still there when the power goes off. The new programming
paradigm is related to SNIA's own NVM programming model, which actually started
before the announcement of 3D XPoint; but 3D XPoint, in the emergence of a commercial
persistent memory, did an awful lot, I think, to stimulate that effort. This model characterizes
various types of access to persistent memory, that is, non-volatile memory: a persistent-memory-aware kernel, a persistent-memory-aware file system, and ultimately direct access to persistent
memory from the file system or directly from an application. One of Optane's key features is
that it brings persistence closer to the processor. This makes it easier to recover from power
failures, since the memory retains its
values. Before this, it wasn't possible to write the entire memory to disk or an SSD as power was
failing; now, all that has to be written is the dirty lines in the cache, and the new CPU instructions take care of that. This means that everything except the registers can be persisted in the event of a power failure.
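(To make that concrete, here's a minimal sketch of the direct-access style the model describes, combined with the new flush instructions. It assumes a Linux system with a DAX-capable file system mounted at /mnt/pmem (the path and file name are hypothetical), a recent glibc for MAP_SYNC, and an x86 CPU with CLWB, compiled with -mclwb. A real program would more likely use a library such as PMDK rather than hand-rolling this.)

```c
#define _GNU_SOURCE          /* for MAP_SYNC / MAP_SHARED_VALIDATE */
#include <fcntl.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define CACHE_LINE 64

/* Flush the cache lines backing [addr, addr+len), then fence, so the
 * preceding stores are known to be on the persistent media. */
static void persist(const void *addr, size_t len)
{
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHE_LINE)
        _mm_clwb((void *)p);   /* write the line back, keep it cached */
    _mm_sfence();              /* order the flushes before continuing */
}

int main(void)
{
    int fd = open("/mnt/pmem/example.dat", O_RDWR);  /* hypothetical file */
    if (fd < 0) return 1;

    /* MAP_SYNC requests true load/store access with no page cache. */
    char *pmem = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    if (pmem == MAP_FAILED) return 1;

    strcpy(pmem, "survives power loss"); /* ordinary stores to the media */
    persist(pmem, 64);         /* only the dirty line has to be flushed */

    munmap(pmem, 4096);
    close(fd);
    return 0;
}
```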
Yes, Steve. That was actually never implemented.
You have to worry about the processor cache.
Until it gets to the device controller, then it's safe.
But before then, it's not.
You have inconsistent...
You've been playing with this.
I've heard from people who say that that first stage is not...
Suppose eADR, which has only been implemented one time,
and eADR is one...
By the way, nobody online can hear you right now.
I'm going to try to paraphrase.
So I think the point that Steve,
one of our designated hecklers, was making is that implementations of saving all the data into persistent memory may have some issues and were often not done consistently.
So anyway, that was his comment, just for those in the audience. Thank you, Steve. Let's see. Okay. So ADR, asynchronous DRAM refresh: it's a strange
term that Intel coined for DRAM that self-powers and self-refreshes when power is lost. What ADR
really means is persistent DRAM, either on an NVDIMM or an Optane DIMM, more
formally known as an Optane DC Persistent Memory Module, or PMM.
A combination of fast memory (DRAM) and slow memory (like Optane DIMMs) is often referred
to as NUMA, non-uniform memory access.
Now, this chart shows that any time you interrupt the CPU,
which has always been done for storage, whether SATA, NVMe, or older I/O channels, of course,
it does a context switch: the CPU stops everything for 100 microseconds or so while it pushes all of its status, registers, and so on onto the stack. If you're accessing 100-nanosecond
Optane memory,
you don't want to perform an interrupt every time the way you would
for an SSD access, because you'd be slowing everything down by a thousand
times. For Optane, it makes more sense to poll, that is, to have
the processor run a software idle loop while it's waiting. This is bringing a
lot of focus onto the way interrupts have been handled in the past,
and it should result in changes to interrupt handling in the future.
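(As a hedged illustration of the difference, here's what a polling wait looks like in C; the completion flag and its location are hypothetical, standing in for whatever status word a real device exposes.)

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical completion flag: a word the device (or its driver) sets
 * to nonzero in a shared or memory-mapped location when a request
 * finishes. */
static _Atomic uint32_t completion_flag;

/* Interrupt-driven waiting costs a context switch, roughly 100
 * microseconds of saved state and scheduling. For a ~100 ns Optane
 * access, it's far cheaper to poll: spin in a little idle loop until
 * the device says it's done. */
static inline void wait_for_completion(void)
{
    while (atomic_load_explicit(&completion_flag,
                                memory_order_acquire) == 0) {
        /* busy-wait; a production spin loop would add a CPU pause or
         * yield hint here to be polite to the other hyperthread */
    }
}
```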
So again, it's one of these things that the existence of fast, persistent memory
led us to consider and think about, no matter how successful the total implementation was.
So another approach is to make the DRAM bus transactional: you send
a number of requests without waiting for the results, and as the results come in, the DRAM
tells the CPU what's heading its way. Intel did this by adapting the DDR4 bus to Optane. A few
signals were added to unused DDR4 pins, and those are represented by that far-left arrow on this diagram.
Nothing else in the DDR4 bus changed, so the socket can be populated with either DDR4
or Optane memory. Unfortunately, that approach has to track the change from DDR4 to the now more
common DDR5 and beyond, so it's a good thing that CXL has absorbed
OpenCAPI's OMI, the Open Memory Interface,
a memory channel that can be used for DDR4, DDR5, Optane,
and pretty much anything else;
OpenCAPI is now part of the CXL standards group.
So Optane has also made people think of ways to expand memory
that are not limited by capacitive loading and pin count, as memory was prior to CXL. This takes some extra interfacing, such as
a switched fabric, which slows it down, so it has become "far memory," a new term. Near memory
is the stuff that doesn't suffer from this delay and is directly attached to the processor.
And yesterday there was at least one session where they were talking about medium memory, with CXL 2.0 and CXL 3.0,
where you actually have a switched fabric, providing the far memory. CXL is backed by a big consortium, and that will result
in widespread adoption in enterprise and data center applications, likely starting around
next year, 2023. CXL is an alternative protocol that runs on the standard PCIe physical layer.
It enables pooling of heterogeneous memory, where parts of different memory devices can be accessed by different hosts.
This allows flexible allocation of memory resources in composable infrastructure.
CXL supports accelerators near the memory or even in the memory itself,
and we've heard a lot of talk about computational storage here, for example.
So some of that computational storage work is related to this development of CXL, and it probably owes something to Optane as well.
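(One concrete way this surfaces to software today: on Linux, CXL-attached memory typically appears as a CPU-less NUMA node. Here's a minimal sketch of allocating from such a node with libnuma; that libnuma is installed and that node 1 happens to be the far-memory node are both assumptions about the platform.)

```c
#include <numa.h>      /* libnuma; link with -lnuma */
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    /* Node 1 is assumed here to be the CPU-less node backed by far
     * (CXL-attached) memory; the node number is platform-specific. */
    int far_node = 1;
    size_t len = 1 << 20;
    void *buf = numa_alloc_onnode(len, far_node);
    if (buf == NULL) return 1;
    memset(buf, 0, len);  /* touch the pages so they land on that node */
    numa_free(buf, len);
    return 0;
}
```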
CXL can support more complex memory sharing tasks, too. The 64-gigatransfer-per-second PCIe 6.0 physical layer supports switched network fabrics, enabling much
greater scaling of heterogeneous memory pools, more sophisticated memory allocation, and
composable memory as well. Persistence, though, leads to potential security issues.
The top bullets show how that has been handled in the past: physical destruction,
secure erase, crypto erase, AES encryption.
The bottom bullets show how Optane approaches it. The Optane DIMM is the first-ever encrypted
DIMM. Volatile memories don't need AES encryption, since they lose their contents as soon as
power is removed, although there have been stories about people chilling the DRAM
to try to recover whatever was in
it, right? So there are special ways that AES encryption is handled. When Optane is used in
memory mode, it just looks like a big honking DRAM, so users don't expect persistence. Since
that's the case, Intel's drivers simply lose the AES key for Optane every time the power is lost.
Then, just like DRAM, Optane comes up with random contents.
If Optane is being used in App Direct mode, in which applications take advantage of its
persistence, the data must become available again once power is restored, so the data has to persist
and be accessible. Optane does this by storing the key on the module itself and requiring a passcode
before the CPU can read the
key. This way, the module's contents cannot be read unless the reader has the passcode, so at
least that's some level of security. If that's not enough, Optane will also do things that are
featured on military SSDs: it will erase and overwrite all addresses upon command, so that's another option you could use.
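(Here's a hypothetical sketch, in C, of the two key-handling policies just described. This illustrates the logic only; it is not Intel's actual firmware or driver code.)

```c
#include <stdbool.h>
#include <string.h>

typedef unsigned char aes_key_t[32];

enum dimm_mode { MEMORY_MODE, APP_DIRECT };

/* Hypothetical module state: in App Direct mode, the media key lives on
 * the module and is released only against the right passcode; in Memory
 * Mode, the key is simply discarded at every power loss. */
struct pmem_module {
    enum dimm_mode mode;
    aes_key_t media_key;
    char passcode[33];
};

void on_power_loss(struct pmem_module *m)
{
    if (m->mode == MEMORY_MODE)
        memset(m->media_key, 0, sizeof m->media_key); /* contents now look random, like DRAM */
    /* APP_DIRECT: key is retained so the data survives the power cycle */
}

bool release_key(const struct pmem_module *m, const char *passcode,
                 aes_key_t out)
{
    if (m->mode != APP_DIRECT) return false;
    if (strcmp(passcode, m->passcode) != 0) return false; /* no passcode, no key */
    memcpy(out, m->media_key, sizeof(aes_key_t));
    return true;
}
```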
So that legacy leaves us better prepared for future
processors. NOR and SRAM scaling look like they're stopping or slowing down, and emerging memories
will take over as the embedded memory on systems-on-chip, certainly for NOR, likely for a lot of
SRAM. That emerging memory will be persistent. Optane's legacy helps to support emerging-memory caches and registers:
persistence closer to the CPU, or even within the CPU; ultimately, perhaps even the registers could
become persistent over time with new memory technologies. Mixed memory speeds, for instance
SRAM versus emerging memory, will also be a characteristic of future memory and storage
hierarchies: fast and slow, MRAM, various other things.
It also leads to thinking about security with persistent memory,
hopefully producing well-conceived security protocols. So everything in this chart is
normalized to SRAM's cost at 500 nanometers; that's half a micron.
This is showing some of the scaling issues with NOR flash and SRAM.
The chart is log-log again, because processes move exponentially:
every process node is about 70% the linear size of the prior one,
and costs move proportionally to the size of the device at that node.
NOR flash memory has a problem after the 28-nanometer node.
NOR flash is the non-volatile memory that the industry uses to store code in microprocessors and microcontrollers,
ASICs, and other systems-on-chip, and it stops scaling at about 28 nanometers.
That causes the cost declines to cease, as the red line shows by abruptly going horizontal.
According to papers presented at IEEE conferences over the past few years,
the cell area, and thus the cost, of SRAM cells stopped shrinking at about 14 nanometers,
so SRAM cost reductions from the shrink
stop at that point.
SRAM also has several transistors per cell,
while the emerging memory candidates
have a single transistor.
Now, maybe a different-size transistor,
but again, it's one transistor versus several.
For applications such as AI inference that need lots of memory, the emerging non-volatile memories can provide the same capacity in a smaller die, and thus cost less than SRAM.
For this slide, we assume the new-technology wafer costs about five times as much as the NOR or SRAM wafer (the black line) at the present time. As long as a new technology can move two process nodes past where NOR or SRAM stopped scaling,
it eventually becomes cheaper, despite the higher wafer cost.
And again, this is different from a standalone memory;
this is embedded memory we're talking about here.
You're building the chips anyway;
the question is just which memory you put on them.
So the whole scaling story is very different.
This is why so many people have invested so much money to fund the research on new memory
technologies.
The bottom line: new memories are inevitable,
and they must gain acceptance for chips to continue to scale in price.
Now, to show what this means in chip size and cost, I'm going to present an illustrated
graphic.
First of all, this is a photograph of an Intel processor chip made on a 45-nanometer
process.
You can see two very different parts of the layout.
The less regular part of the chip at the top is the logic, and the lower half, with its
very regular patterns, is the SRAM used for the on-chip caches.
In the past, the entire chip would scale with process shrinks,
as illustrated here. As processes moved from node to node, from 45 nanometers to 32, to 22,
and finally to 14 nanometers, the die area and cost would be half that of the prior
generation. It follows a nice scaling curve. Now, this assumes that the SRAM keeps pace with the
process technology, though,
and the prior slide showed us that this doesn't happen after the 14-nanometer node.
So, let's take another look.
Okay. So, let's use the same chip again,
keeping in mind that the top half is logic,
which will scale in proportion to the process,
and the bottom half is SRAM, which will scale less aggressively.
You can see in this series that the logic half of the chip gets pretty small towards the end,
accounting for only about 20% of the total die area
at the smallest process nodes,
yet the SRAM doesn't scale, resulting in a pretty large chip at the end.
In fact, the die-size trend starts to level off at the end.
And the black lines only indicate height;
the actual area of the chip goes something like the square of this number.
So you can see this actually becomes a fairly considerable difference.
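(A quick way to see the effect: each node shrinks linear dimensions to about 70%, so area roughly halves per node, since 0.7 squared is about 0.49. The toy loop below assumes a 50/50 logic/SRAM split and that SRAM stops scaling immediately; both numbers are illustrative, not taken from the slide.)

```c
#include <stdio.h>

/* Logic keeps halving in area per process node; the SRAM half of the
 * die stays fixed once it stops scaling, so the total die area levels
 * off instead of tracking the scaling curve. */
int main(void)
{
    double logic = 0.5, sram = 0.5;   /* fractions of the original die */
    for (int node = 1; node <= 4; node++) {
        logic *= 0.49;                /* ~0.7 linear shrink, squared    */
        /* sram stays put: it stopped scaling */
        printf("node %d: die area %.2f of original (logic %.3f, SRAM %.2f)\n",
               node, logic + sram, logic, sram);
    }
    return 0;
}
```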
SRAM is causing the chip to be larger and thus more expensive, and a mature emerging memory
technology could replace it at that point, resulting in a smaller, cheaper chip
that provides the same amount of memory. So an emerging memory will take over as the embedded
memory in SoCs, and that emerging memory will be persistent. The options are MRAM and resistive RAM, and
several of the foundries making these embedded chips are now offering those as options. It could
also be ferroelectric memory, phase change memory, carbon nanotubes
(I didn't get to mention carbon nanotubes), or various other things. The Optane legacy line at the bottom shows that all those things we just discussed,
like the SNIA programming model, will be useful in supporting all of these new memory technologies.
Now let's look quickly at some of these memories. There are a lot of memory types
vying to replace NOR flash and SRAM, and they all share a number of attributes. They all have a single-element
bit cell that promises to scale smaller than current technologies, supporting
small, inexpensive dice and potentially 3D stacking. They also promise to be easier to use
than flash memory, by supporting write-in-place with no need for a block erase, and they have more
symmetrical read and write speeds. Finally, they're all non-volatile, that is, persistent:
data doesn't disappear when the power is lost.
They can all be used as persistent memory.
New memories are necessary for Moore's law scaling to continue.
These technologies include ferroelectric RAM,
magnetic RAM, resistive RAM, and phase change memory,
such as Intel's Optane memory.
We may not have seen the last of phase change memory.
We'll see.
I'm not going to spend a lot of time on this chart, but it compares important characteristics of conventional memories, such as DRAM, SRAM, NOR flash, and NAND flash, with the new
non-volatile memory technologies, in particular ferroelectric RAM, resistive RAM, magnetic random
access memory, and phase change memory. The higher endurance and performance of MRAM in particular may make it possible to replace
SRAM and DRAM as MRAM production volumes increase and as new techniques come into play, such as spin-orbit torque or
voltage-controlled magnetic anisotropy, which promise lower
write power and faster speeds.
So MRAM is already shipping, in fairly low volumes. Everspin has a partnership
with GlobalFoundries, which builds 300-millimeter wafers for Everspin, and GlobalFoundries offers
MRAM to its other customers for embedded memory applications in systems-on-chip.
For example, Everspin's MRAM is a standalone device used as cache memory
in IBM's FlashCore modules. Another company, Renesas, is shipping an MRAM chip that it inherited through its acquisition of IDT.
Avalanche and Honeywell are shipping some MRAM for military and aerospace applications.
And other foundries now offer MRAM options to their system-on-chip clients; that includes
TSMC and also Samsung. Resistive RAM is also in production, but in a quiet way. It has been shipping from Adesto,
now part of Dialog Semiconductor, since 2013. And actually, Dialog is now part of Renesas,
so the fish are eating each other. Adesto licensed its CBRAM technology to GlobalFoundries, to be
offered as an embedded non-volatile memory option on its 22FDX platform and future platforms.
ARM announced the spin-out of Cerfe Labs, which is developing and licensing
new types of non-volatile memory based on correlated electron materials, or CeRAM,
in a joint development project with Symetrix Corporation.
Weebit Nano recently announced early production of their resistive RAM devices,
shortly before the Flash Memory Summit.
Other companies currently ship resistive RAM in both commercial and military/aerospace applications,
and leading foundries are supporting resistive RAM
as another alternative to embedded NOR flash.
Finally, we come to the oldest emerging memory technology,
ferroelectric memory, or FRAM.
Surprisingly enough, this technology predates
the development of the integrated circuit.
The photo on this slide, published by Bell Labs in 1955, shows a single SBT crystal with
vertical and horizontal metal traces that could be used as a non-volatile
memory. From the perspective of unit shipments, FRAM has also shipped more
than all the other emerging memory technologies combined, having found its
way into over 4 billion chips. It's most commonly used in RFID cards because of its extraordinarily low write-energy requirements.
It basically harvests energy from the radio wave that's used to interrogate it.
Until recently, FRAMs were based on unfriendly materials, lead and bismuth in particular,
that semiconductor fabs are not really keen on putting into their production processes.
But in 2011, NaMLab in Dresden, Germany, found that a crystalline form of hafnium oxide
has strong ferroelectric properties.
This created a new life for ferroelectric materials.
Hafnium oxide is a common material used for high-K dielectrics
in modern CMOS semiconductor processing.
Fab managers understand how to make it in high volume.
Besides its use in ferroelectric memories,
this form of hafnium oxide is being investigated
for various other applications.
For instance, in DRAM,
it has been used to produce a 3D DRAM
similar in process to a standard 3D NAND flash,
or to increase the DRAM's retention time,
using the ferroelectric properties
so that you don't have to refresh it as often.
So since SRAM will be replaced by a new memory technology,
caches will ultimately use a new memory as well.
Eventually, even a CPU's registers could migrate to this new technology,
bringing persistence into the CPU.
That will require elements of the SNIA NVM programming model
that we talked about earlier.
Also, fast and slow memories
will sit side by side in a memory hierarchy,
taking advantage of the approach
that Optane required for mixing DRAM and 3D XPoint.
This table shows how this will play out,
although the timelines aren't necessarily precise.
So where do you go to learn about this stuff?
Well, fortunately, Jim and I have a report
that we just finished, and much of the information in this presentation is drawn from that report.
It's available for people to look at and purchase. It describes the entire emerging memory ecosystem:
the technologies, the companies, the markets, and the support requirements, looking at both embedded
and discrete devices. It's 241 pages, with 36 tables and 259 figures.
You can visit the URLs at the bottom to learn more.
It's available now.
But before we finish here, how will future processors benefit from these
developments?
In particular, let's look at chiplets, which are becoming an interesting topic right now.
So chiplets are a way to get past a barrier to continued Moore's Law scaling.
In Gordon Moore's original paper, he said that three things made the number of transistors per chip increase.
First, shrinking the process geometries.
Second, increasing the die sizes.
Third, cleverness.
Die size has reached a limit, thanks to the way optical lithography works:
there's a maximum reticle size that limits just how large a processor chip can get.
All leading-edge processors are at the maximum size that can fit into the scanner's reticle;
you can't make a bigger die.
To get past this limitation, the industry has decided to put multiple chips into a single package.
This opens up new opportunities, since these chiplets can be made using different processes.
So how does this work?
So here's that same processor photo we showed earlier in the presentation.
Once again, there's a logic side of the chip and a memory side of the chip.
As long as the memory side uses SRAM, it won't scale with finer processes.
It sure would be great to use another technology,
but SRAM is about the only memory you can make in a high-speed CMOS logic process.
So instead, let's make the processor in a high-speed logic process,
one chip for that, and then use another process to build some more economical SRAM,
and another couple of processes to build a DRAM and some MRAM, to give you very high cache capacity
and some persistent cache. You can't build either DRAM or MRAM on a logic process,
and you can build SRAM more cheaply if you don't use a high-speed logic process. So let me remind you again of the economies of scale and how they play in here.
So the deal is that SRAM made on a processor chip is big and wretchedly expensive.
So a chiplet can be pretty expensive and still compete against SRAM built on a processor
chip.
Let me give you an example of this.
So the current cost of the SRAM portion of a CPU is about half the cost of the CPU chip. You saw
how big it was. If the chip costs $200 to produce and half the chip is SRAM, then that
SRAM costs $100. A server cache is, for example, 64 kilobytes of L1, 1 megabyte of L2, and 1.5 megabytes of L3, which totals about 2.6
megabytes. The SRAM cost is then $100 divided by 2.6 megabytes, or about $38 per megabyte,
built into the processor. A 4-megabit discrete SRAM chip,
that's half a megabyte, retails for about $6, which comes to about
$12 per megabyte. So you can see some of the economics here. DRAM is currently selling for
about $3 a gigabyte, or 0.3 cents per megabyte; that's three tenths of a cent.
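(As a quick sanity check on that arithmetic, using the talk's ballpark figures, which are rough estimates rather than measured data:)

```c
#include <stdio.h>

/* Reproduces the rough cost-per-megabyte comparison above; every input
 * is one of the talk's ballpark estimates. */
int main(void)
{
    double chip_cost    = 200.0;              /* $ to produce the CPU chip      */
    double sram_dollars = chip_cost * 0.5;    /* half the die is SRAM           */
    double cache_mb     = 0.0625 + 1.0 + 1.5; /* 64 KB L1 + 1 MB L2 + 1.5 MB L3 */
    double discrete     = 6.0 / 0.5;          /* $6 for a 4 Mb (0.5 MB) SRAM    */
    double dram         = 3.0 / 1024.0;       /* $3/GB DRAM, per megabyte       */

    printf("on-die SRAM  : ~$%.0f per MB\n", sram_dollars / cache_mb); /* ~$39 */
    printf("discrete SRAM: ~$%.0f per MB\n", discrete);                /* ~$12 */
    printf("DRAM         : ~$%.4f per MB\n", dram);                    /* ~$0.003 */
    return 0;
}
```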
A memory chiplet will reduce the cache's cost: it gives you cache fairly close to the processor, but at a
lower cost. Now, a chiplet memory can be orders of
magnitude more costly than DRAM, of course, or NAND, and eventually even a
non-volatile memory like MRAM will be more expensive than those. But the costs still go down when you use chiplets, even if the chiplet memory is
significantly more costly than high-volume NAND or DRAM. That economics, plus the cost of trying to build everything on the most modern process,
is what's driving chiplets, and it's also driving new ways to connect chiplets, like the UCIe interface.
And so you're going to have a persistent cache. It's going to happen. And when you do, the ecosystem
will already be in place because of Optane. Hooray! And chiplets will accelerate this transition. So, in summary: Optane could not harness the
economies of scale. It was a grand effort and a cool technology, and a lot of people loved it, but it never
achieved enough scale, never found enough applications to drive it. Some things were
delayed, and that slowed everything down. It didn't make it within the envelope of what Intel
was willing to spend to make it happen. The Optane effort, though, generated a great legacy. CXL, part of that legacy,
opens new vistas in data center architectures. Emerging memories are here, and they are
persistent. Future processors will have persistent caches and, later, persistent registers. Persistence will become
ubiquitous. Optane's legacy is going to benefit tomorrow's processors,
and chiplets are going to accelerate that transition. And probably you're going to see
more chiplet talks, I would guess, at the Storage Developer Conference next year,
maybe even at the Flash Memory Summit. So with that, I guess we've got a little bit
more time for questions if anyone has any. Yes, sir.
Well, let's see. So first of all, it uses like six transistors. Oh, sorry.
So the question was, what keeps SRAM from scaling the way that logic does?
Well, first of all, SRAM uses about five or six transistors per cell.
That's how it retains the data until the power goes off,
and that makes it a big cell anyway,
so it's ripe for a change there.
But I can't say right offhand
why it doesn't scale below 14 nanometers; that's apparently what people are finding.
Yeah, they are.
Those were ISSCC conference papers, you know, over a few years.
I did notice something recently, though: in an IEEE roadmap effort, the,
what is it, More Moore chapter, there was some talk of SRAM scaling.
Jim and I are taking a look at what they're projecting there,
which may change some of those numbers.
We just have to investigate and find out.
But until that point, it did look like there were limits to how far SRAM could scale.
Yes, Andy.
What is an NVDIMM?
An NVDIMM is, you know, a DIMM where you actually have a battery.
Right, yeah.
Right now, it's a popular approach to solving some of these persistence issues.
Ultimately, though, you may be able to do something without a battery.
If the volume gets high enough in some of these other technologies, which it isn't right now,
or if you don't have room to put a board in, that's what's going to drive
some of these other technologies. But NVDIMMs are certainly there. And the supercaps, yeah.
Yeah, yeah. You know, I don't do that directly, but it certainly is an option to create,
as long as the battery still works, a non-volatile memory technology that can back up your main memory.
Not as good as on PC, but it's still a viable solution.
Well, and the other thing on CXL is that there are an awful lot of CXL-based SSDs, sometimes with a lot of DRAM, that people are talking about now.
And also CXL-attached DRAM
with a battery.
Yes.
Yeah.
Yeah.
Yes, Steve.
Oh, sorry, I didn't repeat your question.
The question was with regard to NVDIMMs,
which I just answered.
Go ahead, Steve.
It's never going to be cheaper than DRAM.
An NVDIMM is DRAM plus a battery plus flash.
You can never, well, that was one of the things about Optane:
it was supposed to be cheaper, and they couldn't actually make it cheaper.
But DRAM plus a battery plus a flash is never going to be cheaper
than DRAM without them.
So you'll never get the capacity increase
per dollar that you were supposed to get with persistent memory.
But there are some places where,
if you're talking about the previous question,
there are some cases where the NVDIMM
does serve a function, you know,
despite the cost...
Oh, yeah, no, I believe you; you just won't get the capacity.
Yeah, yeah.
Okay, thanks.
Maybe one more, I think. Yeah, go ahead.
Sorry. Oh, I'll take yours too.
Go ahead.
You said that CXL brought some
coherence to the fabric market.
Do you see some
convergence, like UCIe, for the
chiplet market as well?
Hopefully. I think that's what the UCIe guys are hoping will happen.
Right now, it's been kind of a Wild West,
with AMD and Intel doing their own things.
So getting some kind of agreement,
especially so you can bring third-party chips into the architecture
and have them work together, creating interoperability,
I think is extremely important to building that ecosystem.
And the question was about UCIe and its
place in the future, if you will.
I think I phrased that okay.
SW?
Not a question. One of the things that
I think you maybe
didn't emphasize enough
is that it really has been
all over the software and the operating systems.
Oh, yeah.
When Optane came out, there was a lot of work on what to do about removing that huge latency in the I/O side of the stack.
It was never a problem before.
So SW just pointed out something he thought I didn't emphasize enough,
which was the importance of software, and operating systems in particular,
in being able to work with persistence,
and also in creating awareness of some of the latencies
inherent in the way we do things today, which could be improved with future technology.
The improvements have been made, yeah.
Sorry: the improvements that have been made in the OS stacks to deal with some of the latency issues, which emerged in the process of updating the software.
I think that's probably all we have time for.
Thank you very much. I appreciate it.
Thanks for listening.
If you have questions about the material presented in this podcast,
be sure and join our developers mailing list
by sending an email to developers-subscribe@snia.org.
Here you can ask questions and discuss this topic further
with your peers in the storage developer community.
For additional information about the Storage Developer Conference,
visit www.storagedeveloper.org.