Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x08: CXL Brings Flexibility to Server Configurations with Astera Labs
Episode Date: December 19, 2022
The key to building an ideal server is balance, and this is why CXL is important, since it helps overcome the inflexible nature of modern server architectures. In this episode of Utilizing CXL, Stephen Foskett and Craig Rodgers talk with Ahmad Danesh of Astera Labs about the company's CXL-based memory expansion technology. Although there are many CXL memory expansion chips coming to market, the industry is keen on interoperability testing to make sure everything works as expected. These products are differentiated based on their reliability, performance, and security, including trusted hardware and encryption. This is especially important with AMD Genoa having recently been launched and Intel Sapphire Rapids coming very soon. Security is very important as memory moves further from the CPU and is shared and pooled among multiple servers. Ahmad also suggests that CXL memory can perform as well as local memory, so specialized memory tiering software might not always be needed. But this technology also allows other types of memory to be used, including non-DDR5 DRAM and potentially future non-DRAM memory.
Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms
Guest Host: Ahmad Danesh, Sr. Director, Product Management, Astera Labs https://www.linkedin.com/in/ahmaddanesh/
Follow Gestalt IT and Utilizing Tech
Website: https://www.UtilizingTech.com/
Website: https://www.GestaltIT.com/
Twitter: https://www.twitter.com/GestaltIT
LinkedIn: https://www.linkedin.com/company/1789
Tags: #UtilizingCXL #MemoryExpansion #Genoa #Epyc #AMD #CXL
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on CXL, a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
And joining me today as my co-host is Craig Rodgers.
Hi, I'm Craig Rodgers. You can
find me on Twitter at CraigRodgersMS. So Craig, you and I have spent a lot of time talking about
CXL technology. We recently published a great white paper at gestaltit.com focused on server
architecture. And one of the things that we noticed in there is that really the key to server architecture is balance and flexibility.
And more than anything, it seems, you know, we like to geek out about number of cores and megahertz and gigabytes and all that kind of stuff.
But more than anything, it's about building the right system for the job.
Would you agree?
Absolutely.
You need to scope your requirements
properly. And if previously memory has been an issue where you couldn't scale up enough,
we now have technology coming that's going to let us work around that.
Yeah, I recently did a little bit of a thought experiment on that one because, you know,
looking at the number of DIMM slots, the one DIMM per channel memory architecture that is increasingly popular, and of course,
the sizes of DIMMs that are available, that really does sort of paint people into a corner in
terms of how much memory they can provision in a server for a given amount of money. And that's
one of the big use cases, I think, that has been driving CXL technology so far,
and at least in the initial rollout, is it's all about memory. So the other day over at Serve the
Home, we saw a great article about memory expansion using the technology of a company
called Astera Labs. So we decided to invite Ahmad Danesh in here from Astera to join us and talk about this
awesome flexibility that you can get in terms of memory expansion using CXL.
So Ahmad, welcome to the show.
Hi, Stephen.
Hi, Craig.
Thank you for having me today.
I'm excited to talk to you today and always excited to talk about CXL.
So my name is Ahmad Danesh.
I'm the Senior Director of Product Management at Astera Labs,
responsible for our memory connectivity group.
I focus 100% of my time on CXL and memory expansion, memory pooling technologies,
and I've been working with the CXL Consortium since it first started back in 2019.
Were you previously working with some of the other
cache-coherent interconnect technologies, or
did you come in with CXL? I was, yeah. Originally I was working on OMI, or
OpenCAPI, with a lot of experience as well in Gen-Z, and it's interesting seeing how
the industry has really consolidated together right now. The CXL
Consortium is essentially bringing all of these solutions together at once, where OpenCAPI and Gen-Z have been kind of folded under the CXL Consortium now as well, with all of their assets being owned by the consortium.
Yeah, it seemed like that all was coming and was exciting and promising, but it just didn't quite catch fire in terms of implementation,
in terms of practical applications. And that was kind of disappointing, honestly, because
we've been talking about a lot of this stuff for a while. What makes this time different from those
other times? Yeah, really good question. I think the fundamental difference between OMI, Gen-Z, and CXL, and why CXL has really kind of got
that momentum, is that CXL resides over the PCIe physical layer.
It's natively supported by CPUs today.
It became really easy for everybody to adopt.
So you get to leverage the same physical infrastructure that you have today and just
change the protocol so you can get cache-coherent memory access. It's been very good that CXL has built on top of PCIe,
because there's already so much that was standardized.
You know, manufacturing processes were already able to work with cards
of those sizes. So we're not having to reinvent the wheel, so to speak,
to gain this additional functionality.
The standard, you know,
what the consortium has come up with is great.
It's fantastic.
And it's great to see so many different people,
so many different companies all working together
towards that same goal.
You know, I think it's gonna really help adoption.
You've been in the news there recently
with a rather successful funding round.
Are you gonna use that money to help drive new products,
as you already have products on your website?
How are you planning to leverage that investment?
Yeah, that investment is definitely going to be leveraged here
as we move forward. We're essentially positioning ourselves; we're past
kind of that startup phase and really in the scale-up phase, where we're not only expanding in terms of the number of product lines that we support today.
We have three major product lines that we've announced.
So you can expect a lot more to come from us soon.
We've scaled out in terms of our R&D centers as well, opened up offices, R&D offices in Vancouver and Toronto recently and scaled out our team significantly now.
I noticed you also have services that you're selling.
Are you working with other companies to help them adopt and work with CXL as well, then?
Yeah, we develop the actual silicon itself. We develop a lot of board level solutions as well.
And for customers who are looking to purchase our silicon
and implement their own custom solutions, custom board solutions,
we provide a lot of services to really be able to help them ramp into production
as part of that.
Really, a service that we do for free is what we call our cloud
scale interoperability lab as well, where we actually do a lot of interoperability
testing inside of our labs so that it becomes really easy for customers to adopt, right?
With CXL and all that technology being new,
it requires a lot of interoperability,
a lot of close partnerships with the CPU and memory vendors.
So it really requires a shift in terms of the responsibilities of who's doing
that testing, right?
It's no longer a CPU-DDR subsystem.
It's now a CPU-CXL-DDR subsystem.
And it requires a new set of collaboration with a
lot of industry partners. So those are a lot of the types of services we provide here.
Isn't it interesting that interoperability testing and things like that, people might think,
oh, well, it's all about the technology. But in terms of making it a reality, it's really all
about making sure that the systems really work as expected.
And so things like that are often overlooked.
I remember, you know, in my history in storage as well, the technologies that had good interoperability testing and communication between vendors, even competing vendors, those were the technologies that took off, whereas the ones that were much more closely held
and relied on basically recommendations from integrators or OEMs or something,
those were the ones that were a lot slower to catch on.
It's really great to hear that you're doing that,
because, of course, there are competing products
in this same CXL memory expansion space.
Yeah, thank you, because you absolutely hit the nail on the head there.
You take a look at a lot of these competing technologies,
and what's really required, when we talk about the scale
at which we need to deploy these solutions
and memory being such a fundamental component to the OS, is that
the reliability, availability, serviceability features
of a memory controller are significantly
important here to make sure that the data centers can kind of maximize their uptime and get that
user experience where it needs to be. Intel and AMD, of course, have had generations of being able
to improve on their reliability, right? So the close partnerships we have with them have actually
helped us really kind of lead the industry here, what we're doing on reliability as well.
And we're going to be seeing obviously EPYC 4 and Genoa arriving, which is going to enable your products to actually hit the market at a much larger scale with current server vendors. And I'm sure you're probably already working
with a lot of the hyperscalers,
given the services you're offering.
It's interesting, the hyperscalers have almost had
something similar to CXL level functionality,
through proprietary means or they've had smaller options.
It's great to see that they're all working now with CXL.
The standardization always helps.
Yeah, the standardization is key here.
There's a certain amount of industry standardization
that needs to happen
so you can get the right ecosystem getting together.
But there's obviously ways
you want to differentiate yourself as well, right?
If everyone just builds exactly what the standard expects, you won't be able to really kind of differentiate and really provide that added value.
As you noted, you know, with AMD's EPYC processors and Intel's next-generation Xeon, CXL 1.1 is ready to start deploying, right?
And we're working with a lot of the hyperscale customers to really kind of fit this memory expansion technology, the next generation
of memory pooling as well, to kind of hit that wave in the next year. So when customers are
looking at these products, what are the big differentiators? You mentioned that you do need
to have product differentiators as well as just product. What are the big product differentiators
that they should be looking at with memory expansion and memory pooling? Yeah, good question. So when you take a look at what it takes to deploy
memory, the baseline is reliability, availability, serviceability, the RAS features that you need there.
And so without going into a lot of specifics of exactly what we do there, we do actually provide
a lot of customization that we put within our silicon specific to what some of the CSPs have needed. That goes a little bit beyond what a lot of people
expect to have as a baseline product. And so that reliability becomes kind of that cornerstone of
making sure that you can deploy this at scale. The second is performance, right? What everyone
is used to today is local memory attached to a CPU or perhaps
remote memory if you're going over a NUMA hop. And so having CXL memory that's, you know,
at or below that NUMA-hop latency is kind of the industry expectation here, because that way
it doesn't need a lot of software complexity, right? If your latency is too high, then you need
to get into the realm of memory-tiering types of solutions and
having software that's aware of that higher latency.
And so performance is really the second category there. The third, I'd say,
is really on security, talking about end-to-end security as well. Of course, you have standard
things like you need secure boot, you need authentication capabilities, as you do with a lot of the silicon that deploys in the data center.
But we have additional security features to actually solve the end-to-end security requirements for these solutions, especially with the increase in expectations for confidential compute in the hyperscale data centers. And what solutions then have you come up with to provide that insight
and provide that layer of security?
Yeah, so that security comes in with encryption.
So when we take a look at how servers are going to be deployed here,
the memory actually gets encrypted as well.
So you're not only protecting the actual CXL link,
you're actually protecting the actual memory itself as well.
How do you differentiate between which process or server is allowed access to that memory?
Yeah, so the way it's done, we won't go too much into the NDA information here, but the intention here is that when you're moving memory away from the CPU and now it's over a CXL link, it is easier for potential hackers to get access to that.
Over a CXL link, you could be snooping on it across the memory interface.
But when you take a look at it from how a lot of memory is used, where you have actual
virtual machines that are instantiated, you want to be able to protect against software
attacks as well.
You want to be able to protect against one VM accessing another VM's memory.
And so being able to provide per VM memory encryption is really key here as we move into the next generation of solutions.
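To make that key-separation idea concrete, here is a minimal, purely illustrative sketch in Python using the third-party cryptography package. This is a toy model of per-VM key isolation, not how Astera Labs' silicon or CXL memory encryption is actually implemented; the VM names and page contents are made up for illustration.

    # pip install cryptography
    # Toy model: each VM gets its own AES-256-GCM key, so data written by one VM
    # cannot be decrypted with another VM's key.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    vm_keys = {"vm-a": AESGCM.generate_key(bit_length=256),
               "vm-b": AESGCM.generate_key(bit_length=256)}

    def write_page(vm: str, page: bytes) -> tuple[bytes, bytes]:
        nonce = os.urandom(12)                      # unique per write in this toy model
        return nonce, AESGCM(vm_keys[vm]).encrypt(nonce, page, None)

    def read_page(vm: str, nonce: bytes, ciphertext: bytes) -> bytes:
        return AESGCM(vm_keys[vm]).decrypt(nonce, ciphertext, None)

    nonce, ct = write_page("vm-a", b"tenant A secret data")
    print(read_page("vm-a", nonce, ct))             # VM A reads its own page
    try:
        read_page("vm-b", nonce, ct)                # VM B's key fails the integrity check
    except InvalidTag:
        print("vm-b cannot decrypt vm-a's memory")

Real memory encryption engines work inline in hardware with different modes and key management, but the isolation property being described, that one tenant's key is useless against another tenant's data, is the same.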
Yeah, that's really interesting because all we need, or ironically, all we don't need, is a bunch of stories coming out saying that CXL is a giant memory security risk.
CXL enables all this snooping, isn't secure for multi-tenancy, and all these things that people are going to say if there is an exploit in the wild that allows that sort of thing, right?
Yeah, absolutely. And then you can imagine this getting significantly more complex when we're talking about memory pooling, where you have multiple CPUs that are being given access to
the same device memory, and being able to have the security capabilities of making sure that one CPU
is not accessing another CPU's memory, and they have all the security checks in place is critical.
Otherwise, as you said, people are going to take a look at CXL and, hey, you know what? It didn't hit the mark here. And security would be a really big gap that would prevent CXL
from being adopted. One of the things you mentioned in there, though, that caught my ear was this idea
that CXL can deliver memory that performs close enough to NUMA-enabled server memory that you
don't necessarily need to treat it as hierarchical.
Did I misunderstand that?
Or is that something that you guys are going to be able to deliver?
Yeah, that's exactly what we're delivering.
So today's software understands both local memory
as well as kind of that one NUMA hop level latency.
And so being below that performance benchmark allows existing software
to just work, plug and play. And we've actually been able to show that with a lot of our demos
at industry events, doing different benchmarks to show that, you know what, if we actually compare
our performance with CXL attached memory, even against local memory, we're performing usually
in the range of about 96 to 98%
of the performance with certain benchmarks. Specifically in that case, it was the Memtier
benchmark that you compare against local memory. And so it really comes to show that CXL is ready
to be adopted, right? We're hitting the performance benchmarks we need. We're hitting the reliability
we need. And with the CXL 1.1 CPUs kind of ramping into production soon,
we're going to start seeing CXL, at least 1.1 for memory expansion, being adopted very quickly here.
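That plug-and-play point is easier to picture with a concrete example: on Linux, CXL-attached expansion memory typically shows up as a CPU-less NUMA node, so ordinary NUMA-aware tooling can already see it and allocate from it. Here is a minimal sketch, assuming a Linux host with the standard sysfs layout; whether a memory-only node is actually CXL depends on the platform.

    # List NUMA nodes and flag CPU-less ones, which is how CXL/expansion
    # memory generally appears to the OS. Reads only standard sysfs files.
    from pathlib import Path

    NODE_ROOT = Path("/sys/devices/system/node")

    for node_dir in sorted(NODE_ROOT.glob("node[0-9]*")):
        cpulist = (node_dir / "cpulist").read_text().strip()
        meminfo = (node_dir / "meminfo").read_text()
        # First meminfo line looks like: "Node 1 MemTotal:  268435456 kB"
        total_kb = int(meminfo.split("MemTotal:")[1].split()[0])
        kind = "memory-only (candidate expansion node)" if not cpulist else "CPU + memory"
        print(f"{node_dir.name}: {total_kb / 2**20:.1f} GiB, cpus=[{cpulist or 'none'}] -> {kind}")

An application or orchestrator can then bind allocations to that node with standard tools such as numactl or libnuma, which is the sense in which existing software "just works" when the latency stays near a normal NUMA hop.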
Yeah, that's really exciting because that was one of the things I think that a lot of people were sort of scratching their heads over is,
especially, you know, coming from the wave with Optane memory with the third generation Xeon CPUs and how that was
really fundamentally a different performance characteristic than system memory.
And I know that a lot of people have been, I guess, maybe conditioned or just assumed
that memory attached over the PCI Express bus would also be similarly, maybe not quite
the same performance
characteristics, but differentiated from main system memory. We've certainly heard that as
well from some of the memory software companies. We've seen companies coming to market with drivers.
We've seen what's going on in Linux kernel with regard to CXL-based memory. But man, if even,
maybe not all the time, maybe not for pooled memory or shared memory or something,
but if some memory could be indistinguishable from system memory,
that would be a really exciting advancement.
Yeah, and it's gonna be exciting to see
as we get into more of these performance benchmarks
and seeing some of the actual end user applications
taking advantage of it over the next few months here. So we're going to be doing some more demos and
industry events you can look forward to there. But this isn't to also take away from what a lot
of people are doing for memory tiering. Even our solution, we actually have not disclosed the exact
memory type, but we actually support a different non-DRAM based memory solution
as a second tier.
So there is a space for that.
Now, when you get to that space, though, you want to be able to put cost-effective memory
there, right?
You don't want to put really high expensive DRAM and use it as a lower tier.
If you're going to have a lower tier, have a cheaper memory behind
that tier. And then that's really where that software can really bring a lot of advantages.
Yeah, that's pretty exciting. I suspect that I know, but I guess we'll have to wait until
it's official before we mention anything. But yeah, there are other alternatives to DRAM.
But one of the other things we've heard about as well is the idea that perhaps you use different types of DRAM in CXL. So for
example, maybe if the systems are going to DDR5, maybe DDR4 goes on the expansion boards as a
different tier, or even DDR3, I don't know. Have you experimented with that whole concept as well?
Yes, we have as well.
You hit the nail on the head again.
When we take a look at how memory is being deployed today,
each CPU generation has a very specific DDR rate that they support.
CXL allows us to kind of decouple whatever memory you put behind that.
So the metric that you look for is, well, I have a certain amount of bandwidth, I have a certain amount of capacity, I have a certain latency sensitivity, and a certain cost target.
And so you can put different types of memory behind it and just really look at it from what does it provide me from the CXL interface perspective?
I have an x4, an x8, or an x16 connection.
It gives me a certain amount of bandwidth. How many channels of DDR or another type of memory solution do I need on the back end to utilize that x4, x8, or x16 connection?
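To put rough numbers on that back-end sizing question, here is a back-of-the-envelope sketch using nominal peak rates only, assuming PCIe/CXL Gen5 signaling and DDR5-4800; these are not figures for any specific device.

    # How many DDR5-4800 channels roughly match a CXL (PCIe Gen5) link width?
    PCIE_GEN5_GBPS_PER_LANE = 4.0               # ~4 GB/s per lane, per direction, at 32 GT/s
    DDR5_4800_CHANNEL_GBPS = 4800e6 * 8 / 1e9   # 64-bit channel at 4800 MT/s ~= 38.4 GB/s

    for lanes in (4, 8, 16):
        link_gbps = lanes * PCIE_GEN5_GBPS_PER_LANE
        channels = link_gbps / DDR5_4800_CHANNEL_GBPS
        print(f"x{lanes}: ~{link_gbps:.0f} GB/s per direction ~= {channels:.1f} DDR5-4800 channels")

By that arithmetic, an x16 link lines up with roughly two DDR5-4800 channels, which is exactly the kind of matching exercise being described: how many channels of whatever memory you choose do you need behind a given link width.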
You mentioned earlier that some of your benchmarks were within 96% of motherboard memory.
I'm imagining how much easier the last 20 years would have been if storage had gone that way.
But with the availability of DDR5 en masse at the moment, I think, certainly, looking at DDR4 and 3 options, it's going to be good.
Because I'm sure not all workloads require DDR5 speeds.
There's going to be workloads that just would be happy enough on DDR3 or 4.
So it's a really cost-effective way of approaching it,
given that RAM is the biggest cost component of any server.
Yeah, and that's, I think, really, Craig,
that's really the exciting thing here from an applications perspective
is what would you do if you could have all the memory you need? And when I'm saying you, I mean
the, you know, the application developers, what would you do if you could have the right amount
of memory? If going beyond, you know, so many slots times 32 or 64, you know, whatever size
your DIMMs are, if that wasn't the choice you were making, and if instead
you were saying, I want to have the right amount of memory in this server and keep things in memory,
I think that's the real question here. And that's the real opportunity as well.
Yeah, absolutely. And when you say the right amount of memory, a lot of applications today
from a capacity perspective have enough memory, right? But then what they need is more bandwidth.
And so one of the interesting things to actually take a look at CXL for is rather than using,
you know, more expensive DIMMs that are twice the capacity, use your cheaper DIMMs,
put them behind CXL. You get the same amount of capacity, but you actually get a lot more bandwidth.
And so you kind of get to play with CXL as this tool now to kind of get that sweet spot
of bandwidth and capacity that a specific application needs.
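As a rough illustration of that trade-off, here is a sketch with made-up round numbers, not a costed configuration, and assuming the expander's back end can keep its CXL link busy.

    # Same total capacity, two ways: fewer large DIMMs directly attached,
    # or smaller DIMMs split between direct channels and an x16 CXL expander.
    DDR5_CHANNEL_GBPS = 38.4     # DDR5-4800, per channel
    CXL_X16_GBPS = 64.0          # ~PCIe Gen5 x16, per direction

    direct_only = {"capacity_gb": 8 * 64, "bandwidth_gbps": 8 * DDR5_CHANNEL_GBPS}
    direct_plus_cxl = {"capacity_gb": 8 * 32 + 8 * 32,
                       "bandwidth_gbps": 8 * DDR5_CHANNEL_GBPS + CXL_X16_GBPS}

    print("direct only:      ", direct_only)
    print("direct + CXL card:", direct_plus_cxl)

Both configurations land at 512 GB, but the second adds the CXL link's bandwidth on top of the direct channels, which is the "same capacity, more bandwidth" point being made here.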
From an architecture standpoint, it's really interesting now to see the change that's going
to be coming to solutions architects who are used to composing, you know,
a certain amount of storage, RAM, you know, capacity, and then they would refactor that into servers.
Now it's, you know, we're going towards rack level composability.
So how you amass those resources at rack scale is going to change completely.
You know, you'll still be able to do it the old way.
You'll be able to do it with CXL. There'll be hybrid solutions. It's going to
open up a lot of new architectural possibilities that simply haven't existed. Clear initial
winners are going to be HPC workloads. You know, I'd imagine those users will probably
be very early adopters as well.
Yeah, HPC is definitely an early adopter here. And we even saw some HPC applications launch with Gen-Z-based, or FPGA-based Gen-Z, solutions years ago as well.
And so we're definitely going to see this influx of CXL kind of taking over that space here. CXL 1.1 was really kind of thought of, when we were launching it, as a point-to-point
solution, right? It provides memory expansion. As we got into 2.0, we started thinking about,
well, okay, we need switches in this path here. But CXL 2.0 essentially enabled getting to a one
CXL switch hop. And then 3.0 starting to go, well, let's actually do
multiple switch hops. Let's get the fabric level solutions and get really close to essentially
where Gen-Z was, where we can create large fabrics of, I'll call them, resources. And those
resources aren't just memory, right? It can be compute, it can be GPUs, accelerators, NICs,
it could be storage. You could actually even have PCIe devices, PCIe
NVMe devices in a CXL fabric. And that fundamentally then solves kind of that composable, disaggregated
architecture that a lot of people in the industry have been trying to solve for multiple years.
So in terms of, I guess, practicality here, at SC22 there were a few companies with Astera Labs
technology on display. You know, it was shown, of course, with Genoa from AMD, which is the first
production server platform to come out with CXL, as well as Intel's Sapphire Rapids. Now, of course,
there's sort of an NDA component here in terms of Sapphire Rapids,
but people are talking about it.
People are looking at it.
People are showing it.
What is this product going to look like
when it hits the market?
Yeah, so when we take a look at CXL,
the first wave of these products
is really going to be focused in on memory expansions
to unlock the bottlenecks
that are occurring for a lot of the AI and machine learning applications. In-memory databases are
kind of a very big target for that amount of capacity that's needed. Because when we take a
look at AI/ML, the complexity of the models, the size of the models have grown so large over the
last number of years here that we just need a lot more capacity, a lot more bandwidth for those. So the products are really going to be looking like for this first wave,
add-in card level solutions, where it provides the flexibility where you can essentially plug
in an add-in card with DIMMs into this slot, where the existing DIMMs that are being used
elsewhere anyway, that the hyperscale customers
and the server OEMs are purchasing anyway, just get plugged in behind that type of card, a
CXL add-in card, similar to kind of a CEM form factor. But a lot of them are actually going to
be more custom solutions as well that provide DIMM connectivity and a slightly different server
architecture to deploy memory expansion with that flexibility that we talked about.
And so you can plug in a 16-gigabyte or 32-gigabyte DIMM
all the way up to, in our case, 512 gigabytes per DIMM,
depending on what they need.
So rather than building an EDSFF type of drive
where you're limited to maybe an x4 or an x8 CXL
with a fixed capacity,
the first products are likely going to be launching here with DIMM-based solutions for that flexibility and
performance that's needed. Yeah, I think that's going to be pretty cool because I think the idea
that you could buy an add-in card that looks a little like a PCIe card and has some DIMM slots
on it and you can fill it up with DIMMs is pretty awesome. So, Ahmad, that all sounds really cool.
When are we going to see this stuff hitting the market?
I mean, is it available now in products,
or is this something that's happening in 2023?
Yeah, so we've been kind of in pre-production phase here,
scaling out to a number of customers already.
So we've been shipping quite a number of products,
both from a silicon perspective, as well as from our board level solutions to kind of hit this ramp
here. So we're expecting the first CXL 1.1 wave to really start ramping in mid to late 2023 here,
as we start seeing the CPUs getting into that production phase as well.
Yeah. And I mean, that's the thing. We've all been holding our breath for the last couple of years waiting for Sapphire
Rapids and now Genoa as well, and hoping that we'll be able to finally get our hands on
this technology.
But it really is here.
People are showing it.
People are demonstrating it.
They're in production.
It's pretty cool.
Well, thank you so much for this quick overview.
I really appreciate some of the thought-provoking aspects of this conversation, especially the idea that maybe CXL memory might not necessarily be hierarchical. And of course, it can also be hierarchical with different memory types. I love that thought. And can't wait to get my hands on this stuff. Before we go, where can people connect with you and continue this
conversation? Well, Stephen and Craig, thank you so much for having me today. I'm always excited
to talk about CXL. Looking forward to talking to you again soon. And please visit asteralabs.com
for more information. Thanks a lot. And as for me, you can find me at gestaltit.com. While you're
there, just look in the sidebar for our server architecture at data center scale white paper, where you'll see Craig and I talking quite a lot about data center architecture considerations as well as CXL in the future.
Thank you for listening to the Utilizing CXL podcast, part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe and rate the podcast in your favorite podcast application.
You can also find us at youtube.com slash gestaltit video if you prefer to watch this on video.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
For show notes and more episodes, go to utilizingtech.com or find us on Twitter and other social media platforms at Utilizing Tech.
Thanks for listening. And we'll see you again on January 2nd,
since we're taking a little bit of a holiday break.