Storage Developer Conference - #198: Riding the Long Tail of Optane’s Comet - Emerging Memories, CXL, UCIe, and More

Episode Date: January 11, 2024

...

Transcript
Starting point is 00:00:00 Hello, this is Bill Martin, SNIA Technical Council Co-Chair. Welcome to the SDC Podcast. Every week, the SDC Podcast presents important technical topics to the storage developer community. Each episode is hand-selected by the SNIA Technical Council from the presentations at our annual Storage Developer Conference. The link to the slides is available in the show notes at snia.org/podcasts. You are listening to SDC Podcast Episode 198. I'm Jim Handy. Tom Coughlin was supposed to present this with me, but Tom's having an interesting year. He's been elected to be the 2024 president of the IEEE. And as a result, this year,
Starting point is 00:00:54 he's president-elect, which means that he ends up spending more time on an airplane than he does at home. So it's really hard to keep track of him. He and I committed to do this. We were going to speak to you together, and then, about a month ago, maybe two months ago, Tom got told that his schedule wouldn't allow it. So here you are. You've got me. Hopefully I'll do a good job for you.
Starting point is 00:01:17 The name of the presentation is Riding the Long Tail of Optane's Comet. I'm going to talk about Optane, a little bit about its history, what came out of Optane, and the new things that it's allowing computing to do just because they put so much focus and effort into it. So here's the agenda. I'll go over Optane's history, talk a little bit about today's alternatives to Optane. I'll talk extensively about the legacy that Optane left behind and then flow from that into CXL, which is kind of a natural offshoot of this, and then from CXL to UCIe, the chiplet interface standard. And then I'll talk a little bit about the future.
Starting point is 00:02:00 So I'm not going to go step by step through this, but this is Optane's seven-year life, or at least the visible seven years of it. Actually, the first phase change memory that Intel introduced was in 1969. There was a paper by Gordon Moore and a guy who actually puts blog posts on one of my blogs, The Memory Guy, a fellow by the name of Ron Neale, who was the lead author on that paper, talking about how they had made the world's first phase change memory. Intel played with that, and even introduced a NOR-flash-compatible product back in about, I don't know, 2006, I think it was. And then finally landed on this 3D XPoint memory that they introduced in 2015 with Micron.
Starting point is 00:02:49 This slide shows everything Intel did at the top, everything Micron did in the lower part. And so you see that, you know, in 2015 there was the announcement of 3D XPoint. And then you had Optane announced. I didn't put that in there, but QuantX, which was Micron's version, got announced. It went on through Optane Memory,
Starting point is 00:03:11 which was not really memory. It was a fast SSD to go along with a hard drive. And then the Optane DIMMs came in about 2019, and that was when things really should have taken off, but they didn't. So that ended up causing some financial problems for Intel. Micron sat back and watched all of these financial problems and they said, we're not participating until it becomes profitable. And then they finally said, this is never going to
Starting point is 00:03:34 become profitable. And they jumped out of the marketplace. Intel came to that realization last year in July. A very interesting little tidbit is that they announced what they called the wind-down of Optane exactly seven years to the day after they had announced that 3D XPoint existed. So in that seven-year time span, we had Optane come through all of those changes. And they drove a lot of change to the industry, but I'll talk about that in a minute. First, I'm going to talk about the alternatives for people who were really hoping for this to happen. And the first alternative in this table is Optane. Intel and Micron built a whole ton of Optane, and they've got wafers in some wafer vault in, I think it's Rio Rancho, New Mexico, just waiting, dying to be packaged up so that people can use them in DIMMs.
Starting point is 00:04:29 And so anybody who wants Optane DIMMs, or the persistent memory modules, as they call them, can still get those. They'll be able to get them for a while. I see somebody taking pictures of the slides. These slides will be available online shortly, as will a video of this entire presentation, if I'm speaking too quickly for you, which I hope I'm not, but I do that sometimes. Anyway, the columns in this are: is it persistent? What is its speed relative to DRAM?
Starting point is 00:04:57 And so I put 30% down for Optane because it's about a third the speed of DRAM. What does it cost compared to DRAM? And the 3D XPoint strategy was to price it at half as much as DRAM. Now, because it came in these very large modules, 256 gigabyte and 512 gigabyte modules, they priced it at half of what Samsung's 256 and 512 gigabyte DRAM modules cost. And Samsung's charging exorbitant amounts for those. So don't look for Optane's price to be 50% that of a 16 gigabyte DRAM module. It's nowhere near that. It's significantly higher. But still, if you're looking at the same size DRAM module, then it costs about 50% of that. And the issue with it is that it's winding down. You've got NVDIMM-N, which has been around for a long time. I'll talk more about that later.
Starting point is 00:05:50 But basically, it's a DRAM that's got some backup NAND flash in case there's a power failure. It is persistent. It has the same speed as DRAM. So I put 100% in the speed table. And I say it costs 200% of what DRAM does, but I've heard actually that it's more like five times as much. And the big problem with it is that it requires a battery, which is not the most reliable component in order to do that power down backup thing. Everspin, the company that is the lead in the MRAM business, and MRAM, in case you don't know, is magnetic RAM. It's memory technology that isn't in the mainstream, but it uses magnetic bits that are accessed the same way as a DRAM. And it's persistent. It's the same speed as DRAM,
Starting point is 00:06:39 but it costs about 10 times as much as DRAM. So that's really expensive. It's $1,000 for an 8 gigabit chip. Or no, wait a minute, for a 1 gigabit chip. So, you know, it might even be more than 1,000%. The big problem with it is that it's not 100% compatible with DRAM. And so changes have to be made to the host system to be able to accommodate the fact that it doesn't need to be refreshed. Sounds like a big benefit, but it ends up getting in the way. Fast SSDs, you know, that's something that some people are proposing as an alternative to Optane, and Kioxia and Samsung, mostly Kioxia, talk about some of
Starting point is 00:07:17 their SSDs as being storage class memory, which if you take the most broad definition of storage class memory, then yes, it is. But what it's come to be known lately is something that's like DRAM speeds, and it's certainly not that. And so I say that it's 0.1% of the speed of DRAM. It's a thousandth as fast. It costs 20% as much, so it's really cheap, but it's slow. And then the last, you know, most obvious answer is additional DRAM. So it's not persistent. So that's a problem. Same speed as DRAM, same cost as DRAM. And the big problem that it runs into is bus loading. I'm going to talk about each one of these one by one. So
Starting point is 00:07:58 this is Optane. It's still around. The current inventory is fulfilling needs, and it probably will for a number of years. There is ongoing low-level demand, and that's going to keep Intel in that business. They're just not talking about it anymore. And there's an awful lot of support already in place, thanks to a number of SNIA members, because SNIA put together the persistent memory programming standard, and that ended up being what caused Optane to work in the applications in which it works. You have the NVDIMM-N, and I talked about this before. It's a bunch of DRAM. The larger chips are NAND flash chips. It costs about twice as much as DRAM does because you've
Starting point is 00:08:38 got extra stuff on there. There's also a microcontroller, and when the power goes out, the microcontroller moves all of the DRAM's data into the NAND flash. When the power comes back on again, the microcontroller moves all of the NAND flash data back into the DRAM so you can start over with a warm start. But it does require a backup power source, which a lot of people don't like. You have to find a place in your system to stick this particular thing in a PCIe slot, with huge capacitors on it. This isn't shown to scale.
Starting point is 00:09:07 That thing is actually much, much larger than it really looks. Or you need to find a place to bolt a battery onto the side of the chassis for the server. So a lot of people dislike that, for reliability concerns, etc. You've got the MRAM DIMM, and unfortunately, I couldn't slant this picture the same way as all the other pictures and have the logos right side up. Everspin's logo is actually upside down on all of these. Production for this started in 2017. They haven't seen a big enough market to warrant making a DDR4 or a DDR5 version, so they only have a DDR3 version. Like I said before, it requires changes to processors because it doesn't refresh, and it costs more than 100 times as much as DRAM.
Starting point is 00:09:52 So that's problematic in its own right, but it does do the trick, and it's a nice, fast application. And then I just wanted to point out that MRAM is slowly being adopted in the enterprise. And so IBM is using it in their FlashCore modules. They use it instead of DRAM. And the reason why is because the DRAM is required to store information that would need to be reloaded every time that the SSD rebooted. And they thought that they could probably do better than that, and they could also protect data in flight a whole lot better.
Starting point is 00:10:33 So the translation tables, buffers for write coalescing, which would be the data in transition, things that are being written to the SSD but haven't yet made it into the NAND flash because NAND flash is so slow. It's an easy way to protect data in flight, and it's a very easy way to get into persistence if you want to have that in your SSD. But also we're seeing increasing consumer adoption of MRAM in things like medical applications and vehicles, health monitors, that kind of stuff. So what
Starting point is 00:11:07 that's going to do is it's going to cause the number of wafers that have MRAM on them to grow at a pretty substantial rate. That's going to drive down the costs, because scale is huge in the semiconductor market. The more wafers you make, the cheaper they are to make. And that was one of the things that stood very much in the way of Optane's ever being able to become profitable. And then finally, we believe that the economies of scale will reduce prices as a result of that huge consumer demand. And then finally, SSDs. And I put SSD. Really? Well, yes. And Kioxia and Samsung are both advocating this. They have special NAND
Starting point is 00:11:48 chip architectures for that. You can only warrant doing that. You can only pay for it if you've got a huge NAND volume to support that. And typically they use SLC, single level cell NAND, which is a whole lot faster than multi-level cell NAND, the two bits per cell, three bits per cell. But the way that the market goes, the volume is very low for SLC NAND. The economies of scale play a part here too. And so SLC NAND is about six times as expensive as MLC NAND, which is more expensive than TLC. And then the question is, which performs better, a fast and small DRAM or a great huge NAND flash? Well, that's an interesting thing. And I made a presentation, I don't know how many years ago, at a conference called MemCon about this, where I showed this.
Starting point is 00:12:39 First of all, I wrote a book on cache memory design. And so this kind of feeds into that: whether a cache memory does a good job or a bad job depends on how much locality there is in what's going on with your memory. And this is also true of virtual memory: virtual memory will do fewer page swaps if you've got high locality in your code. High locality is represented by that white line.
Starting point is 00:13:06 This is kind of an abstract concept, but it's how many accesses happen in a small address range, versus having a wide address range with your accesses smeared across an awful lot of it. So the red shows what happens when you've got your address accesses smeared across a very wide range. The white is when you've got them very tightly clustered in a single range. Now, this might be your DRAM in the system.
Starting point is 00:13:32 And you see that it does an okay job with the red. But everything that's outside of where the DRAM is, is going to end up causing a page swap or, you know, something like that. And, you know, the white does better because it's got higher locality. Let's say you doubled the amount of DRAM in your system. All of a sudden, with the white, you know, almost everything in the white is being taken care of by the DRAM. And so, you know, that ends up being a really good solution, but it's still kind of a so-so solution
Starting point is 00:14:05 for the thing that doesn't have very high locality. And this would be an awful lot of databases and AI-type programs. And so an alternative that people who really think through this problem think about is: what if you went back to the original amount of DRAM and you put in a great big, slow NAND flash memory, an SSD? And this is why SSDs
Starting point is 00:14:28 have become so popular, is because you can do that, and you can see that the red is very well taken care of there. And the lower height of this means that, you know, you've got slower access. It's, like I say, kind of an abstract chart. And with the white, you do have that part that's not covered that is not going to do as well. But, you know, overall, if you have a huge slow memory, it's going to do you a really good job when it's matched with a small amount of fast memory. But once again, depending on how localized your references are in there.
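Just to make that locality argument concrete, here is a toy sketch, not from the talk, of a small fast tier managed as an LRU in front of a much bigger, slower one. The sizes and access patterns below are made up for illustration; the point is only that a modest fast tier catches almost everything when accesses are tightly clustered, and very little when they are smeared across a wide range.

```python
import random
from collections import OrderedDict

def fast_tier_hit_rate(addresses, capacity):
    """Fraction of accesses served by an LRU-managed fast tier of `capacity` lines."""
    tier = OrderedDict()
    hits = 0
    for addr in addresses:
        if addr in tier:
            hits += 1
            tier.move_to_end(addr)          # mark as most recently used
        else:
            tier[addr] = True
            if len(tier) > capacity:
                tier.popitem(last=False)    # evict the least recently used line
    return hits / len(addresses)

random.seed(1)
accesses = 200_000
clustered = [random.randrange(20_000) for _ in range(accesses)]     # high locality: the "white" curve
smeared = [random.randrange(2_000_000) for _ in range(accesses)]    # low locality: the "red" curve

for capacity in (10_000, 20_000, 100_000):
    print(f"fast tier of {capacity:>7} lines: "
          f"clustered hit rate {fast_tier_hit_rate(clustered, capacity):.0%}, "
          f"smeared hit rate {fast_tier_hit_rate(smeared, capacity):.0%}")
```

Growing the fast tier barely moves the smeared number, while the clustered pattern is essentially satisfied once the tier covers its working set, which is the same story the two curves on the chart are telling.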
Starting point is 00:15:06 So, you know, that's the argument that's being used in favor of using SSDs as an alternative to Optane memory, is that if you have a big SSD, it might be able to get you the same amount of performance that you get with Optane. Okay, the final thing on that table that I showed you was to put more DRAM into the system. And that's been problematic for a number of years.
Starting point is 00:15:35 Large DRAMs end up adding capacitance. They load down the memory channel if you put multiple banks on there, or even if you put multiple chips into a single DIMM. And part of the reason why those large Samsung DIMMs are so expensive is because they take care of that by mounting the chips on top of each other through an arcane approach called through-silicon vias. Adding memory channels increases the power and the pin count on the processor, because, you know, instead of putting a whole lot of DIMMs on a single channel, you just put out, you know, multiple channels that
Starting point is 00:16:09 have a single DIMM on each one. Well, that means that the processor has to drive all of those pins. And so it consumes a lot of power and adds a lot of pins. And so this is something that limits the processor's ability to use the power for more productive uses. IBM's been trying to find solutions for this for years. And in their Power architecture, they use something called OMI, the Open Memory Interface. It's a non-DDR interface. And so they'll take DDR memories, stick them on something like a DIMM, but larger, with a controller. And then that controller is what talks to the processor through the processor's PCI port. So that's the OMI interface. The OMI interface has now been folded into CXL, a part of CXL; they acquired the rights for this. But CXL, the original CXL, is about adding slower memory to the memory channel through CXL.
Starting point is 00:17:08 And so one of the nice things about that is it allows you to have much larger memories, but more importantly to the hyperscale data centers is that it allows you to have disaggregated memory, so that you can now treat memory the same way you treat storage or servers. You virtualize the memory and you can assign different servers different amounts of memory depending on what they need. It requires memory tiering, and because of that, it can accept different speeds of memories. So back in that table, I showed you that Optane was about a third the speed of DRAM. This would take care of
Starting point is 00:17:47 that without having to have a special interface for it. And so we'll talk about CXL in a little bit. Just bringing us back, this is the exact same table that I showed you before, and, you know, we've got Optane winding down, but, you know, by golly, it's still there. You've got NVDIMM if you want to pay for it; MRAM DIMM, once again, if you want to pay for it, and also if you can work with the DDR3 interface and your processor can handle no refreshes. Fast SSDs, there are pluses and minuses to that, and then added DRAM. So let's talk a little bit about what happened with Optane to support all of this. And probably the most important thing that I mentioned before is the SNIA persistent memory programming model. And that was just the start of things.
Starting point is 00:18:39 It allows for you to have hierarchical tiers. You can have different speeds of memory in there. But there are other tiers that are starting to appear in the memory area. So, for example, GPUs, which are widely used for artificial intelligence, use high bandwidth memory. This is memory that's really tightly coupled to the processor. It has to be within two millimeters of the processor chip. And so it's always packaged inside the GPU's package.
Starting point is 00:19:12 And it stacks. So once again, it uses this expensive technology that Samsung uses for the large DRAM DIMMs. DDR, you know, of course, that's still going to be used for a number of years. And then CXL. We're seeing memory disaggregation happening where servers don't have to have more memory just in case a large program comes along. If a large program or a program that requires a large
Starting point is 00:19:38 memory space comes along, then CXL allows them to borrow memory space from a shared pool. And then finally, it allows memories to move into the chiplet. And so you'll see this model being used for persistent memory caches, which emerging memories support. And I'll talk about emerging memories in a little bit. Another thing that, or I'm sorry, so Optane's legacy gave you a fresh look at memory. And I just have this as the old way and the new way. I was thinking of doing a build for this.
Starting point is 00:20:13 But the old way was all DRAM ran at one speed. And now you can have mixed memories running at mixed speeds over the CXL channel. The second one is that persistence is something storage does and not what memory does, and it's slowed down because of context switches. Each transaction with storage requires an interrupt, and that interrupt slows down the overall access. With the new way of looking at things, it's okay to have persistence in memory, and it's okay not to use context switches to get to it.
Starting point is 00:20:42 And I'll talk about that in a moment. Then I say memory is only put on the memory channel. And the one below it, only memory is put on the memory channel. I love English. So, memory is only put on the memory channel: you don't put memory onto the storage interfaces because that slows it down. But now you've got four channels that you can do that with: HBM, DDR, CXL, and UCIe, which is coming, the chiplet interface. And then the bottom one, only memory is put on the memory channel: now you can put memory-semantic SSDs or maybe even other things onto the CXL channel and communicate with them as if they're memory. And CXL just hides all of that.
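As a concrete illustration of that new way of looking at persistence, here is a minimal sketch, not from the talk, of the load/store style of persistent memory programming that the SNIA model describes: map a file that lives on persistent media into the address space, store to it directly, and make it durable with an explicit flush, rather than going through the storage stack and a context switch for every transaction. The path below is hypothetical, and a real application would more likely use a library such as PMDK than raw mmap, but the shape is the same.

```python
import mmap
import os

# Hypothetical file on a DAX-mounted persistent memory filesystem.
PATH = "/mnt/pmem0/example.dat"

fd = os.open(PATH, os.O_CREAT | os.O_RDWR, 0o600)
os.ftruncate(fd, 4096)

# Map the file straight into the address space. On a DAX filesystem the
# stores below land in the persistent media itself, not in a page-cache copy.
buf = mmap.mmap(fd, 4096)

buf[0:11] = b"hello pmem!"   # a plain store: no read()/write() call, no interrupt
buf.flush(0, 4096)           # an explicit flush (msync) makes the update durable

buf.close()
os.close(fd)
```

On an ordinary filesystem the same code still runs; the DAX mapping is what removes the extra copy and keeps the store path as short as the persistent memory column on the latency chart below suggests.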
Starting point is 00:21:18 And CXL just hides all of that. So this is what I was talking about with context switches. This is a SNEa slide from years ago back when they were working with the persistent memory programming model and you've got orders of magnitude of speed and so you've got the columns hard drive sat ssd and nvme ssd and then persistent memory which was obtained before it was announced. And the green area at the bottom is where you'd want to use polling of actually having the processor just go back and check and say, you ready yet? You ready yet? You ready yet?
Starting point is 00:21:54 In a loop, because that's more efficient than doing a context switch. Up at the top, the pink part, you'd naturally use a context switch because that's the fastest and most efficient way of communicating with a hard drive, a SATA SSD, or an NVMe SSD. In between, there's this kind of a funny color thing, and that's where you can't really decide which one is which. And CXL is really designed mostly for that lower green band where you don't want to use context switches. NVMe is really good for SSDs. And SATA, that's fine for hard drives. So you don't need to have a fast interface for that. But CXL is a good place to put persistent memory. So let's talk about CXL. First, though, I'm going to talk about how Intel forced the DDR bus to accept Optane. You've got Optane that's running at a
Starting point is 00:22:49 third the speed of DRAM on the same bus as DRAM, and nobody's going to want to slow down their DRAM bus to a third of its speed so that everything will run at the same speed, which is what the DDR bus was designed for. So they said, okay, we'll put some extra hooks into this. We'll call it DDR-T, and it will handle both fast and slow memory. It uses a transactional protocol for the slow writes, because an Optane write takes about twice the time of a read. So it will dispatch a write, and then it will get back a response from the Optane module saying that the write has been completed. It's based on a standard DDR4 interface, and so it had some modified control signals. I didn't bring a laser pointer here; I guess I can point with this thing. But the modified control signals are one of these. I think it's this one here. I didn't make it big
Starting point is 00:23:43 enough that you could read it. But the red line and the blue line are pretty much the same signals, the big arrows. There are other arrows that pass through; those are pretty much the same signals for all of these things. And there were just a handful of signals that were different. They went onto unassigned pins in the JEDEC DDR4 standard. And so the timing, the protocols, all of that were the same for DDR-T as they were for DDR4. And so that allowed you to put DRAM and Optane into the same sockets.
Starting point is 00:24:16 If you had two sockets per channel for your memory channels, then you could put DRAM in one socket and Optane in the other, which was what they recommended for their users to do. But the trouble is that every time JEDEC would come up with a new DDR interface, Intel would have to follow it with a redesign of DDR-T to support it. So that was a big headache for them. CXL solved that problem. It removed not only the requirement to have a new bus that matched the DDR, but it also allowed you to use different kinds of memory with one processor. Right now, I know a guy who looks at it and he says, oh, is that one of Intel's DDR4 processors, or is that one of Intel's DDR5 processors? And he categorizes Intel's processors by the DRAM they're able to use.
Starting point is 00:25:10 With CXL, you can use both DRAM interfaces. CXL allows far memory, which is the memory that's on the other side of the CXL channel, to use any interface. And OMI is a faster version of CXL that is used for near memory, and it allows the same kind of a thing. I'm not going to talk much about OMI here, but if you'd like to talk with me about it later, I'll be over in the persistent memory lab in Salon 8, and I'll take any questions there. CXL also supports memory disaggregation. I have a nice animated slide I didn't bring with me for that,
Starting point is 00:25:49 but basically if you've got one application that needs a huge memory, you don't have to load up all of your servers with a huge memory. You can have that huge memory be in a pool somewhere else and then just be assigned to whichever server needs it. So memory pools can be dynamically allocated. Data sets can be moved from processor to processor. This is shared memory, and this is something that's even more elaborate that's
Starting point is 00:26:13 only available in the third generation of CXL. And it also paves the way for UCIe, the chiplet interface. So let's just talk about that. Any memory using any server: you have a DDR4 server, and you're going to have some DDR4 DRAM, and so you put in a channel between the two of them. That makes sense; that's the way things are always done. With DDR5, you do the same thing. You know, you have the DDR5 talk to the DDR5. You know, these typically would be on different server motherboards, but CXL gives you the ability, through CXL channels instead of DDR channels, for the DDR5 server to talk to the DDR4 DRAM, and it doesn't have to be already connected to the DDR4 server, and it allows the DDR4 server
Starting point is 00:27:00 to use DDR5 DRAM. So that's a nice thing by itself. But these each have to be separate CXL channels. You can also put different kinds of memory, and I'm listing some emerging memory technologies, which I'll talk about in a minute, on these. So this is MRAM, resistive RAM, ferroelectric memory; Optane is one. I have a question up here. So when the DDR4 server picks up the DDR5, is there some silicon in the CXL? Yeah, it would be through a CXL channel. Yeah, so
Starting point is 00:27:36 CXL is basically the voltage levels and the signaling of PCIe, but with a different protocol layered on top, because the PCIe protocol covers an awful lot of bases, and that makes it a little bit slow. And so CXL has narrowed that down. And without getting into it too much, it will handle either one by doing some handshaking at the front and saying, are you a PCIe device or are you a CXL device? And then it will do that. There's no extra silicon on the server side.
Starting point is 00:28:13 There's extra silicon on the DRAM side, because you need to have something that speaks PCIe, which is a CXL controller, over there. And, you know, Marvell, Microchip, I think Samsung makes their own, and a whole lot of companies are going to come out with that. I would expect pretty much any SSD controller company to come out with a CXL controller at some point, because it's going to use the same PCIe that they already use on their NVMe SSD controllers. So yeah, these other technologies, they're going to require
Starting point is 00:28:39 controllers of their own, but you could talk to any of those memories with any of that, or you could even put flash memory in there and talk to it with that. Now, if somebody were to do this, and this is with a CXL 1.0 kind of an interface, then you would need to have a separate CXL channel for every one of these arrows on here. And the people putting together CXL said, okay, that's not really the optimal solution. Let's get rid of those and stick a switch in the middle and then just have everybody talk to the switch. So this is CXL 2.0, and it just allows everybody to do that. Now the switch does add yet another delay, but it does allow fast access to all of these. These delays are, I think, sub-10-nanosecond delays. And then if you want to get more complex
Starting point is 00:29:25 and do a fabric, then CXL 3.0 gives you this, which allows the switches to connect different hosts to each other, to different memory arrays. It also allows memory to be shared between two processors, and for it to be coherently shared, so that the cache in one processor is not going to have a stale copy while the cache in the other processor has a fresh copy of something that's supposed to be fresh in the main memory. So it takes care of that. So I mentioned the near memory at the CPU. When you build a server, you are always going to have DRAM right next to the CPU, and it's going to be attached by a DDR interface. It's just your extended memory, your slower memory, which is called far memory, which is going to be communicated
Starting point is 00:30:10 with via CXL. So near memory at the CPU, far memory on CXL, and CXL can support all kinds of memory applications. Large memories for, you know, doing this disaggregation of memory; it can do memory pools, memory sharing for trading messages, and memory fabrics. I happen to think that the memory sharing thing is a pretty cool thing, because I watch my son play video games, and there are times when he changes the scene in the video game and it takes a long time to load. That's going to go away, because he's not going to be moving data from the processor memory to the GPU memory over an NVMe channel. It's going to be moved by CXL. So that'll speed up an awful lot. And then I say there are no memory interface dependencies. There really is one. The CXL controller on the memory side is going
Starting point is 00:31:03 to have to understand the kind of memory it's talking to, but it's not a big sacrifice. So that leads to UCIe, and the UCIe people said, well, let's just take chiplets and put UCIe on them, or put CXL on them. And I put this here. This is, you know, for people old enough to remember it, Chiclets gum was phased out in 2006, but it was something that I grew up with. And it's a gum that's in a candy coating. And I said, oh, that name's too close to Chiclets, so I'll just doctor that picture a little bit. But what it's for is things like this.
Starting point is 00:31:37 This is Intel's Ponte Vecchio server processor. And you can see those little gold squares. Okay, first of all, around the gold squares is this heavy white silver line. That is where the lid of the package on the processor gets glued on. So you wouldn't see this if you were to buy a Ponte Vecchio CPU module. You would see instead just this big, almost square thing of metal on top that said, you know, whatever the processor number is and the Intel logo and all that. But if you peeled that off, then you would see all these little gold squares. The gold squares are separate chips. Some of them are
Starting point is 00:32:16 memories. Some of them are logic chips. I believe that the one in the upper right-hand corner and the one in the lower left-hand corner are IO drivers. The two largest chips, I believe, are the processing chips for the thing. And then the square ones are probably HBM DRAM modules. So, and Intel says that they're going to be doing that. They're going to be introducing their first client processor using a chiplet approach sometime early next year, I believe they said. I can't remember. I think it's called Stony Brook or something. So one of the nice things is that you can have multiple sources for these chips. Right now, HBM is largely supplied by SK Hynix, which is one of three leading DRAM manufacturers. But the other
Starting point is 00:33:05 two, Samsung and Micron, are trying very hard to get into that market and take some away from SK Hynix. UCIe is really cool for memories because it allows the processors to use, or I'm sorry, allows the processor designers to use a logic process to build the logic out of, and to use a memory process to build the memory out of. Right now, with the older process technologies, you'll have SoCs, microcontrollers, ASICs, and that kind of stuff built in a logic process. And that limits designers to only using SRAM,
Starting point is 00:33:41 which can be built out of logic transistors, and NOR flash, which is the only other memory that does well in a logic process. Okay, for multiple reasons, NOR flash is going away. And for a reason I'll tell you about shortly, the SRAM is also threatened with going away. And so what are they going to use in the future? Well, they could use DRAM, MRAM, resistive RAM, FRAM, phase change memory. They're all something that could be a whole lot cheaper than SRAM and could migrate through processes a whole lot better than NOR flash. And if they did
Starting point is 00:34:18 that, then they'd get significant die area and cost reductions, but it would drive them to using a chiplet approach. But one of the nice things is that it would commoditize chiplets. Chiplets right now are not widely used, and so they're sole sourced, and having the memory built on the chip itself is also a sole source kind of a thing. If you have chiplets, then all of a sudden this memory behaves like a commodity memory, DRAM or that kind of memory, where everybody who's building it is competing on price to try to get the business. And so the price goes way down. And that can only really happen if you have the same chiplet used by multiple memory companies and sourced by multiple sources. And so if Intel, AMD, NVIDIA, anybody else who builds processors is using the same chiplet, then the market gets big
Starting point is 00:35:15 and there'll be multiple sources for it. That'll get up the volume, that'll get the cost down. And then Micron, SK Hynix, Samsung, Kioxia, Western Digital might all jump into this market and say we want a piece of that. We're going to compete on price and bam, you know, the prices go way down. Now, I talked about how I was going to talk about SRAM. This is something that makes a whole lot more sense
Starting point is 00:35:41 to a chip designer than it's probably going to make to any of you people. But it's a graph of the area of SRAM in F squared, which is proportional to the size of the transistors on the chip. And so if an SRAM, you know, let's say that bottom line is 500 times the size of a transistor on a chip, then a typical SRAM at whatever, 14 nanometers or 10 nanometers is going to be about 450 times the size of a transistor on there. When you get up towards the three nanometer area, all of a sudden with the Samsung process, that one bit of SRAM is going to be as big as a thousand transistors are in the logic. Now that's something that as it goes off into the future, that's going to be a
Starting point is 00:36:38 really bad problem. What this is driving is, first of all, a very large area of a processor chip being SRAM, which is really not using it to its best purpose, but it's also driving the cost up for these chips more than it needs to be. And so it's going to cause an emerging memory technology, probably MRAM, if things stay the way that they are today, to become the cache memory in standard processor chips. Maybe not all of the cache memory. Maybe it will be the L2 cache. But like that diagram that I showed you where I had the two curves,
Starting point is 00:37:16 red curve and the white curve, you're going to see that the size of the caches, the L2 caches, is going to just grow exponentially on these processors once chiplets start being used and once something like MRAM starts being used for that because a very large cache can do such a good job even though it's really slow. So we'll see large capacity future caches
Starting point is 00:37:40 using emerging memories in order to drive out the cost. So chiplet memory can be persistent. I think, you know, I've already said that a number of times. And what that means is that you can have persistent code cache, persistent data cache, which, you know, is a new thing. And then software will need to be written that really takes advantage of that. So that's going to require some re-architecture of that. The NVM programming model, I think, is a good basis for this. There will be security concerns. And I was just talking to John Geldman, who's running the security session right now.
Starting point is 00:38:15 He says otherwise he'd be in here. And, you know, what if persistent memories with persistent caches were to fall into the wrong hands? And, you know, the cache lines, how do you handle that? Do you erase cache lines when they need to be invalidated? Should memory communications and NVM data at rest be encrypted? You know, these are all big questions that are going to have to be answered. And John says, oh, yeah, we're on top of that. Yes, the question in the back. Okay, so you were more announcing what's going on
Starting point is 00:39:16 in the storage security areas. Yeah, you haven't figured this problem out yet, but you're working on it, right? Okay, okay, and so you're waiting for a product. Yeah, it's always nice if you can put together the standard before the product comes out, but the product and the market are really the things that drive it. Okay, so I guess I've taken care of all that.
Starting point is 00:39:41 And so off in the future. You know, because of things outside of mainstream computing, we're seeing emerging memory falling into place. And that's mostly because in microcontrollers and ASICs and things that use NOR flash, NOR flash can't be built on processes that are smaller than 28 nanometers. And as I said with that other chart, SRAM is growing very unattractive. There's already some use of emerging memories.
Starting point is 00:40:13 MRAM, I told you about with IBM and some other places, is being used in the enterprise. And there's really strong growth in consumer applications for MRAM, which is going to drive the economies of scale. And so we're expecting that the increased consumption will cause the prices to go down because of the economies of scale. And then the technical benefits will fall into the hands of SNIA members. So fast, very low power, less messy than flash, and they're all persistent. And we've got a report that we wrote on this. It's the four types of memory that are on there: MRAM, the magnetic one, phase change memory, which is
Starting point is 00:40:53 Optane, resistive RAM, which the benefit for that is it can go into a cross point just like phase change memory, so it can be really cheap. And then ferroelectric memory, which is something that can be built on current processes. And all of these new memories are persistent. They have a small single element cell. So I got to my last bullet first. They're all persistent. So they can be used as persistent memory, but they also use a single element cell. A single element cell, as opposed to SRAM, which was so problematical because it uses six transistors. These all use a single transistor or even a diode type select mechanism. And so they can be made very small and they can be stacked into 3D. And so the promise, the reason why people have been researching these things since the 1960s is because they can be
Starting point is 00:41:42 built much smaller than DRAM or NAND flash. And as long as they can be built smaller, then in theory they should be able to be built fast... I'm sorry, cheaper. And cheaper is good. It drives the market. They also allow write in place. You're not going to have all of this nastiness in flash of block erase, page write, garbage collection, or any of that kind of stuff, because of the fact that you can just write over existing data. And they also offer much more symmetrical read-write speeds. Usually the writes are less than 10 times as slow as the reads, and so, you know, very often they'll take only two times or three times the time. So that means that they're very fast memories in comparison to NAND flash, and they're much easier to use.
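Here is a toy sketch, not from the talk, of why write in place matters. It compares updating a few bytes in a byte-addressable memory against a NAND-style update, where the whole block has to be erased and reprogrammed before a page can change. Real SSDs hide this behind a flash translation layer and garbage collection rather than literally rewriting a block on every update, so the numbers are only meant to illustrate the write amplification involved.

```python
PAGE_BYTES = 4096
PAGES_PER_BLOCK = 64
BLOCK_BYTES = PAGE_BYTES * PAGES_PER_BLOCK

def update_in_place(media, offset, data):
    """Byte-addressable persistent memory: just overwrite the bytes."""
    media[offset:offset + len(data)] = data
    return len(data)                       # bytes physically written

def update_nand_style(media, offset, data):
    """NAND-style update: copy the block out, erase it, and program it back with the change."""
    block_start = (offset // BLOCK_BYTES) * BLOCK_BYTES
    block = bytearray(media[block_start:block_start + BLOCK_BYTES])
    local = offset - block_start
    block[local:local + len(data)] = data
    media[block_start:block_start + BLOCK_BYTES] = block   # erase + reprogram the whole block
    return BLOCK_BYTES                     # bytes physically written

media = bytearray(BLOCK_BYTES * 4)
payload = b"16 bytes of data"

print("write in place wrote   ", update_in_place(media, 5000, payload), "bytes")
print("NAND-style update wrote", update_nand_style(media, 5000, payload), "bytes")
```

A 16-byte change costs 16 bytes of writing in one case and a quarter of a megabyte of erasing and programming in the other, which is where the garbage collection overhead and the asymmetric write behavior of NAND come from.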
Starting point is 00:42:32 You know, this is our view of what the revenue is going to be like out through 2030. It's kind of a small font on there. I'm sorry about that. It's actually 2032 that we go out to with our forecast in the report.
Starting point is 00:42:52 And you can see the DRAM and NAND flash, they're growing, but this is a log chart, so it doesn't really show too much. But you've got really fast growth going on in MRAM, and we're expecting to see that happen over time. It might not be MRAM. It might be one of these other technologies, and we're expecting to see that happen over time. It might not be MRAM. It might be one of these other technologies, but there is going to be that. And this is just a plug for our report on that. Each one of those subway lines on there is a
Starting point is 00:43:15 different type of technology. You've got MRAM, phase change, and stuff shown off on the right-hand side, and then all of these different options of them that are being explored right now. And eventually, one of these is going to win out, and we cover all of them so that we'll always have a winner. Oh, and the report's now available. There's a URL on the slide, which will be available to you. So we're almost to the end of our 50 minutes here. I'll just say Optane, in its short life, created a great legacy. It's created a programming model and new architectures, and there are many Optane alternatives that you can use with these.
Starting point is 00:43:58 You know, all of them have their disadvantages, but that was in the table. CXL has opened the door to new memory architectures, and so processors no longer need to be tied down to a single DDR4 or DDR5 interface or a single memory type. UCIe takes CXL's strengths and makes them available to chiplets, and chiplets are the way that future processors are going to be made. So we think that emerging memories are going to really solve a lot of problems tomorrow through these changes. And with that, I'm going to open it up to more questions.
Starting point is 00:44:37 I'll try to remember to repeat the questions, which I haven't been doing so far, so please keep them relatively short. All the way in the back row. Okay, so the question was, instead of using CXL, why didn't we use NVMe over Fabric? And the short answer is that CXL has been designed to be significantly faster than NVMe. NVMe still does a context-switching protocol. It still uses interrupts, and that's very fast for NAND flash. It wasn't
Starting point is 00:45:08 really fast enough to make Optane look very good. And it's much too slow for anybody to want to put DRAM on. And the main reason why the hyperscale data centers want CXL is because they would like to put DRAM in shared pools. Any other questions? Well, you've all been very easy. As I said before, I'm going to be going over to the hardware lab where they've got CXL over there in Salon 8. But anyway, I'll be over there. If you have any other questions you'd like to talk to me about them one-on-one, I'll just go over there and talk with you about them. So thank you very much. Thanks for listening. For additional information
Starting point is 00:45:54 on the material presented in this podcast, be sure to check out our educational library at snia.org/library. To learn more about the Storage Developer Conference, visit
