Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x13: Enabling CXL in Heterogeneous Compute with Arm
Episode Date: January 30, 2023
Although the emergence of CXL in server CPUs is big news, the inclusion of this technology in Arm processor IP is just as important. In this episode of Utilizing CXL, Eddie Ramirez of Arm joins Craig Rodgers and Stephen Foskett to discuss CXL in the Arm-powered ecosystem. Arm develops processor IP that is used in CPUs as well as supporting processors throughout the datacenter. We begin with a discussion of CXL 1.1, which brings memory expansion to Arm CPUs. But Arm is also delivering CXL 2.0, which would allow memory pooling to increase the utilization of memory, and thus overall system efficiency. The next step is true heterogeneous compute, with accelerators like GPUs and DPUs sharing memory with CPUs in a flexible fabric that can leverage CXL. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms Guest: Eddie Ramirez, VP of Marketing at Arm: https://www.linkedin.com/in/eddie-ramirez-41233a1/ Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing architecture.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Craig Rodgers.
Hi, Stephen. Good to be here again.
Looking forward to our upcoming conversation
around heterogeneous computing and CXL.
Absolutely. Craig and I have worked together
quite a lot on various things,
including a recent white paper on cloud and data center
architecture. But both of us are also quite aware that architecture is much more than the CPU.
Although we're very excited to see AMD and Intel announcing CXL support in their
mainstream server CPU lineup, we've also been talking to a lot of vendors
who are developing peripheral connectivity and switching chips, software that enables CXL and
composability management. Because, of course, it doesn't matter if the host supports it,
it matters if the whole ecosystem supports it. And there's one company that we really were looking forward to talking to because they have a lot more presence
in the data center, I think, than this focus on the x86 CPU world would have you think. Right, Craig?
For sure. For sure. ARM devices are used in almost every server.
People don't realize it's not just an Intel or an AMD chip in there.
There's likely ARM as well.
Absolutely. And of course, there's been a lot of attention to ARM CPUs coming to the data center and cloud. But whether it's the CPU or peripherals or memory, everything has ARM chips.
So that's why we're really excited to have Eddie Ramirez from ARM joining us today.
Welcome to the show, Eddie.
Thank you so much, Stephen.
Happy to be here.
Great to be talking to you and Craig today.
Yeah, it's really good. Ever since I saw your presentation at the CXL forum, where you talked about bringing CXL to the ARM platform, I was very, very excited.
Because as I said, I understand that ARM chips, yes, there are ARM CPUs. Yes, they're making waves in cloud and data center. But the fact that ARM is so, so important in the world
of heterogeneous compute and in the world of basically everything else that's happening
within the data center, it's really, really critical that you all are involved. And it's
great to see that you are. I wonder if you can give us, just from the start, a bit of a
roadmap of what CXL is to ARM, and where you are working on it?
Sure, no problem. Let me just do a quick
introduction of myself. Eddie Ramirez, Vice President of Marketing for the Infrastructure
Line of Business. And so I'm part of the business unit that's really looking at how to enable Arm
and a robust ecosystem around Arm to be able to deliver solutions within the data center, the cloud, 5G infrastructure, and networking infrastructure.
And for us, CXL is something that we feel is going to be very transformational to these market segments.
ARM plays a kind of a unique role.
We're an IP provider, and so we are actually providing a lot of the processor IP that goes into making not only like server processors,
and you see partners like Ampere, you see partners like NVIDIA, and even cloud providers like AWS,
who are now building their own server SoCs, utilizing ARM and this Neoverse platform of IP
to build those solutions. But ARM itself is also found in a lot of other places
within the server.
You see vendors who build BMC chips, right?
These are the chips that help provide
manageability interfaces and capabilities using ARM cores.
We're also in several of the storage devices.
And what's now becoming quite interesting
is the accelerators, right?
We talk about this heterogeneous move in terms of democratizing compute somewhat.
And an example of that would be like the SmartNIC and DPUs, where most of those are using ARM
cores to offload a lot of the, what I would kind of consider infrastructure tasks that
a server does and offloading that from the main processor to these
accelerated devices. And CXL now brings a kind of a fabric and a protocol together that is common
right throughout the industry. It's a standard that so many folks are working on that can really
help these devices talk to each other and also be able to provide real composability
in the future of the data center.
And so we at Arm are very interested in trying to move that forward, enable these vendors
that are building these solutions, right, to be able to integrate CXL 2.0, 3.0, and
future technologies into their hardware.
It's interesting there that you started with CXL 2.0 and then obviously leading to 3.0.
2.0 is obviously allowing memory pooling across multiple hosts.
Is that the voice of your customers saying, we need this level of functionality?
What you're seeing with CXL 1.1
is really the enablement, right, of memory expansion, right?
And that provides a lot of value, right?
I don't want to at all discount the value
of memory expansion. Because
if you think about it, the way that folks have been building servers up to this point for these,
you know, high memory workloads is they've actually been adding multiple server sockets.
And you get to a point where you get like a four socket server, where the whole goal of that server
was really the memory. So the extra CPU cores go unutilized, right? Nobody wants to spend money
and not actually get that return. So now you're able to independently increase the memory without
actually adding more CPU sockets. And we see, for example, that that's going to be very important within the
ARM ecosystem, because a lot of the vendors who are deploying ARM-based server SoCs are doing
that in very high core counts. You have, for example, Ampere with 128 cores per socket, right?
So suddenly the core counts have expanded so significantly over the last five years
that you want the memory to catch up. And now you're able to do that with 1.1 and with memory
expansion. So I think you'll see 2023 is the year where folks actually start maturing the
memory expansion solutions and bringing those to market with CXL 1.1.
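To make that concrete for readers: on Linux, a CXL 1.1 memory expander is typically exposed to software as a memory-only NUMA node with no CPUs, so ordinary NUMA APIs can place data on it. Here's a minimal sketch in C using libnuma; the node number (1) and the presence of libnuma are assumptions for illustration, not details from the episode:

```c
// Minimal sketch: treat CXL-attached expansion memory as a CPU-less NUMA node.
// Assumes libnuma is installed (compile with: gcc demo.c -lnuma) and that the
// expander is exposed as node 1 (hypothetical; check `numactl --hardware`).
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    int cxl_node = 1;            // hypothetical CXL expansion node
    size_t size = 1UL << 30;     // 1 GiB

    // Allocate pages bound to the far-memory node.
    void *buf = numa_alloc_onnode(size, cxl_node);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(buf, 0, size);        // touch the pages so they are actually placed
    printf("1 GiB placed on node %d\n", cxl_node);
    numa_free(buf, size);
    return 0;
}
```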
But we're now working with partners, right, who are designing SoCs for the next gen.
And that is really going to target CXL 2.0.
And with 2.0, I think you now will bring this extra use case of memory tiering, where a tier of pooled memory or far memory is now available
to the users.
And that's really also powerful because what you tended to find is that within the data
center itself, much of the memory actually goes unutilized. There's a great paper by Microsoft claiming that for most of the VMs that run on the Azure cloud, almost 50% of the memory footprint they're paying for is never touched. So that tells you that there is probably another mechanism for how to optimize memory so that you don't have
to allocate it all up front and pay for it all up front. And that's what the memory pooling in
CXL 2.0 hopefully brings to market. So really excited to be working across the industry within
the ecosystem to bring those solutions to market.
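The arithmetic behind that pooling argument is worth sketching. The numbers below are illustrative assumptions only, not figures from the episode or the Microsoft paper, but they show why pooling stranded memory changes the provisioning picture:

```c
// Back-of-envelope sketch of why memory pooling helps. All numbers here are
// illustrative assumptions, not measurements from any real deployment.
#include <stdio.h>

int main(void) {
    int hosts = 16;
    double dram_per_host_gb = 1024.0; // all-local provisioning baseline
    double touched_fraction = 0.5;    // ~half the footprint is never touched
    double burst_headroom   = 0.2;    // shared pool sized for occasional spikes

    double baseline = hosts * dram_per_host_gb;

    // Pooled design: local DRAM covers the hot half of each host's footprint,
    // and a shared CXL pool absorbs bursts instead of every host
    // over-provisioning on its own.
    double pooled = hosts * dram_per_host_gb * touched_fraction
                  + hosts * dram_per_host_gb * burst_headroom;

    printf("All-local DRAM:   %6.0f GB\n", baseline);
    printf("Local + CXL pool: %6.0f GB (%.0f%% of baseline)\n",
           pooled, 100.0 * pooled / baseline);
    return 0;
}
```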
And one of those areas where we see a lot of this really interesting work in the ecosystem
is not just in CXL, but also in OCP.
When we think about heterogeneous workloads at the moment, you know, we have different types of processor cores alongside GPUs and NPUs, and now we're extending that out via CXL. Given how well ARM has performed on an increasingly recognized metric, performance per watt, which is becoming much more prevalent in purchasing decisions, enabling CXL on ARM will potentially give a better place to run certain types of workloads that your processors are very good at and handle very well.
Craig, I think you've hit it on the nail, right? These systems can all get faster,
right? But if they get faster and are consuming more power, then it's going to be difficult to deploy them, right? Because everybody cares about TCO at the end of the day. And to give an example, you can move to DDR5, but if you look at the power consumption of DDR5 versus DDR4, you're going to pay for that within your opex budget. And we've recognized that as well.
And I think that's been part of the reason why
the market has been looking at Arm solutions,
because we are bringing the same power efficiencies
that we've brought to the smartphone mobile space
and bringing that into the infrastructure space.
And so power efficiency is definitely one of the key value props for Arm within the data center space. And it's also one of those that is driving this exploration and adoption of
heterogeneous solutions, right? Is there a lower cost way of increasing the compute footprint
using devices that could be the processor, that could be SmartNICs and DPUs, that could be other accelerators as well. So I think you hit it on the nail. That
is what's going to be driving a lot of the exploration and interest in heterogeneous compute.
Yeah, to continue on that, it does seem that the architecture strategy of basically every major compute vendor, as well as all of these other vendors in the space, whether they're ISVs or cloud vendors, is to increasingly build heterogeneous systems that bring in specialized accelerators for different workloads.
So whether it's NVIDIA, who is building a sort of a mesh of GPU and CPU, which is based on ARM IP,
or Intel and AMD, who are increasingly leveraging DPUs as a way to offload data processing from their
x86 CPUs, which in some cases also use ARM IP, it does seem that heterogeneous compute is the
future. And if you look beyond CXL for memory, and if you look at CXL as a way to enable sort of scalable, heterogeneous systems that transcend the traditional architecture of a server, where you've got a CPU and memory and expansion cards and so on, if you look at CXL as a way to sort of break that down and build a different kind of system, I think that we can all agree that that different kind of system is going to include both CPU and special purpose acceleration for all sorts of
things, whether it's a data processor, whether it's doing things like encryption and compression,
whether it's doing things like ML acceleration or traditional GPU tasks,
or whether it's some sort of specialized processor. And when you're building one of
those specialized processors, it's very likely that companies are going to be looking to Arm IP as a way to bring that to market. Because of course, you need a processor on that thing that exists on the other side
of the network.
Is that a way to think about heterogeneous compute that's more concrete for people?
That is, that's exactly what we're working on today.
We're working with a lot of our silicon partners, right, who are deploying or looking to deploy
silicon with accelerators and how to make that easy for
them to do that. And at the same time, how to make it easy for them to intersect CXL capabilities.
So part of our Neoverse platform is, you know, we're known for the IP processor cores, but a big
part of our Neoverse platform is actually our interconnect. And you can think about this as the fabric that connects multiple cores together.
We've spent quite a lot of time and energy on making the most capable, lowest latency fabric possible. And it's within that interconnect, what we call our CMN products, that we are enabling the ability
of landing CXL features, right?
And particularly trying to land those CXL features
as quickly as possible for our customers.
And so that's a key part of our Neoverse IP portfolio
that we work with partners and engage with partners
to really deliver this promise of
accelerated compute. And it's important to understand too, I want to make sure that,
you know, it's easy to look at all of enterprise tech as a horse race and to have people say,
oh, it's ARM versus Intel, it's ARM versus AMD. It's not just ARM versus, it's ARM with.
I mean, you guys are
partnering with them. And I think that that's a very, very powerful thing. And that's one thing
that I've really loved to see about the CXL group, the CXL consortium, just the CXL community,
is that it literally is every company in the industry. And they're all working together in
sort of a positive, productive way. And they're building something that isn't
selfish, that isn't centered around their thing. They're building something that's interoperable.
And that's really exciting. Yeah. And actually, how quickly CXL has grown,
right, is fantastic, right? We joined CXL as a board member early on, we've been actively working in the consortium since 2019.
And, you know, obviously one of the great things is that it leveraged PCIe, which also has a great ecosystem built around it, right?
So they didn't try to start from scratch. They're building on top of what are already great standards within the server space today.
But you now have almost 150 partners that are participating in the forum.
We're already at the 3.0 spec that was announced at the Flash Memory Summit. And now you're seeing the convergence of things like Gen-Z and OpenCAPI into CXL. And so it's really created the ability for so many partners to contribute and then provide
innovations on top of it.
And so that I think has been really helpful in order to see the rapid pace of what
you're seeing CXL coming to the market now. And we're still in the early stages, by the way,
right? I mean, a lot of what we saw at Flash Memory Summit or at OCP were a lot of concept products, right? Now we're looking at all things that can be possible, right?
Some very interesting things,
like people are adding ARM cores to these memory pools.
And now you're thinking about like,
wow, I didn't think that maybe I could do
this offload capability in the memory pool itself. You're
seeing SSD vendors who will now look at, hey, I might support both NVMe and CXL, right, and offer
a different way of delivering persistent memory options. So a lot of this wouldn't happen
if we didn't all have kind of a common language to speak.
Right. And I think CXL is providing that.
The collaboration between all of the companies involved in the consortium, I think, will contribute a lot to the eventual success of CXL.
The fact that there's been an avoidance of doing it on a proprietary basis.
You know, there's a lot of companies that could have done it that way, and have in the past. But the fact that
everybody's agreeing on a standard opens it up, you know, to all of the existing consortium
members. And I'm sure we'll see more that don't even exist yet. You know, it's the openness of
the standard is great. It has to be a village. It has to be
open. And I think that that's the lesson that we've seen, because as you mentioned, a couple
of other standards there that kind of were aiming to do a lot of the same things. And of course,
there's been a lot of standards about, you know, interconnects and fabrics as well. And many of
them haven't worked because it's been so centric to a single architecture or
a single vendor or something. And this is the opposite of that, which I think is really exciting.
No, I agree. And with that, obviously, as CXL grows, and the number of use cases that it's trying to tackle grows, you get more voices in the room and you have to adequately be
able to accommodate the interests of all of these parties. And so I think the CXL Forum has played
a really good role in being able to grow membership but still progress the spec at a cadence that is beneficial for the industry. And I think CXL is kind of helping by planning releases in line with other industry standards that also need to come together, right?
Leveraging the existing PCIe trust, adoption, standards, form factors, connectors, it was
an absolute no-brainer for CXL to be done as an extension of PCIe and probably again
another factor that will help adoption.
Yeah.
And even when you think about the memory attached devices that are coming to market, they're leveraging the form factors that NVMe drives actually first pioneered, right? Because again, NVMe runs over PCIe, right? So that physical interface and the form factors can be leveraged. And so now we're not having to reinvent the wheel on what is the pin connector for these
memory attached devices, right?
Because we did that with NVMe and with those devices prior.
And so we're leveraging what we feel is kind of state of the art for a slightly different
application, but still very relevant
From a product management standpoint, it also helps because so many existing vendors are already used to working with those slots. You know, we're just adding features in your chipsets. It doesn't require a full main system board redesign. It could be an evolution to us.
Well, yeah, they're using known parts and components.
Yeah. I think what now becomes really interesting is, you know, how does the server evolve, right? What does the server of the future look like when you have more options on how to design that server?
And I really like to think about the design point as the rack now, right?
Not the individual server, right?
So when you get to tiered memory, right, you can think about, hey, I'm now actually pulling
some of these DIMMs, right, from having to reside on the server itself to now they can reside on another
like memory appliance, right? And what does that do? How does that change things, right? And how
do you make that efficient? And so this is what becomes exciting, right, in terms of the innovation
that happens when you kind of create that common language, right, for the fabrics of how these things interconnect.
And I think that's the really exciting kind of point
we have right now in the marketplace.
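When DIMMs move out to a memory appliance like that, the operating system or a tiering daemon has to decide which pages live in the near tier and which in the far tier. One mechanism available on Linux today is move_pages(2); here's a sketch of demoting cold pages to a far-memory node, where the node number and the "cold buffer" are hypothetical stand-ins, not details from the episode:

```c
// Sketch of cold-page demotion with move_pages(2), the kind of mechanism a
// memory-tiering daemon might use. Node 1 as the far CXL tier is hypothetical.
// Compile with: gcc demote.c -lnuma
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = 256;
    char *buf = aligned_alloc(page, npages * page);
    if (!buf) return 1;
    for (size_t i = 0; i < npages * (size_t)page; i++)
        buf[i] = 1;                   // fault the pages in on the near tier

    void **pages  = malloc(npages * sizeof *pages);
    int   *nodes  = malloc(npages * sizeof *nodes);
    int   *status = malloc(npages * sizeof *status);
    for (size_t i = 0; i < npages; i++) {
        pages[i] = buf + i * page;    // one entry per page to migrate
        nodes[i] = 1;                 // hypothetical CPU-less CXL node
    }
    // Ask the kernel to migrate this process's pages to the far tier.
    if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
    else
        printf("first page now on node %d\n", status[0]);

    free(pages); free(nodes); free(status); free(buf);
    return 0;
}
```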
And you add to the fact that we're right at the beginnings
of artificial intelligence, right, becoming widespread, right? And I think what
people are looking at is that, you know, one of the biggest challenges with AI inferencing and
deep learning models is they rely on a lot of data to make those inferences, right? So these models
get smarter when there are more parameters.
When there are more parameters,
that's more data that you're trying to pull in.
And so at the same time, what's really exciting
is this work that we're doing on innovative hardware design
is really going to bring the ability of increasing AI workloads
and the benefits of AI to a broader set of the marketplace today,
right? It won't just be cloud providers that need these huge data centers to run AI models,
because we're trying to make that available to a broader set of the market. So I think that's, you know, the really interesting intersection of technologies at this time that makes it an exciting time to be in tech and be in hardware and software.
Yeah, I completely agree
about the machine learning processing and so on.
I mean, if you imagine a future
where memory can be pooled and shared at rack scale,
where there
are special purpose accelerators deployed throughout the rack, and where you can compose
those into a processing system for the application at hand, you basically are going to come up
with a system that is just unimaginable today in terms of the amount of resources that it has, in terms of the number of processors, the amount of memory,
the amount of memory bandwidth that it would have.
And also you're removing the need to move data around quite so much,
because if you can pool and share memory,
then you can basically queue it up with one system,
process it with another system, and then output it with a
third system. And all of them are accessing this at PCIe 5, 6, 7, you know, speed and latency,
which is really, really remarkable. I want to wrap up with one thing with you, though,
while we've got you. You actually just sort of, you know, really kind of kicked my brain in gear
with your talk about revolutionizing computing. And I wonder if you, you know, obviously you can't talk about future products.
And these aren't really future products because revolutions happen in the future, in the far future.
Where do you envision this going?
I think that a lot of people listening are going to be like, when do I have my CXL Raspberry Pi?
When do I have my CXL phone, iPad, whatever? But beyond that, what is CXL plus ARM? What is it going to do that's going
to make servers completely different? That's a good question. And I'll give you a couple of examples, right? The first one is when you get to a point where you can compose a system,
the real excitement is, can you compose it on the fly, right? And so if you have a workload
and the workload is then able to understand the options, right, of hardware and accelerators that are
available, what you would like to see is that you can compose these resources, right, and do it
dynamically. And so the idea that an accelerator is talking peer-to-peer to another accelerator,
and maybe you have a chain of these,
and the processor itself is actually being called upon
when needed, right?
It doesn't actually need to be the coordination point
for all of these accelerators.
I mean, that's what we're looking at in CXL 3.0, right?
How do you enable peer-to-peer accelerator communication?
Because you can think about that as really exciting.
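None of the fabric-manager APIs for that dynamic composition are settled yet, so purely as a conceptual sketch, with every name below invented for illustration rather than taken from any real interface, composing a system on the fly amounts to leasing typed resources out of a rack-level pool:

```c
// Toy model of dynamic composition; not a real CXL fabric-manager API.
// A rack holds typed resources; a "composed node" is just a set of leases.
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

enum res_type { CPU_CORES, MEMORY_GB, GPU, DPU };

struct resource { enum res_type type; int amount; bool leased; };

static struct resource rack[] = {
    { CPU_CORES, 128,  false },
    { MEMORY_GB, 1024, false },
    { MEMORY_GB, 2048, false },
    { GPU,       4,    false },
    { DPU,       2,    false },
};

// Lease the first free resource of a given type. In a real system this would
// be a fabric-manager request that binds a CXL device to a host.
static struct resource *lease(enum res_type t) {
    for (size_t i = 0; i < sizeof rack / sizeof rack[0]; i++)
        if (rack[i].type == t && !rack[i].leased) {
            rack[i].leased = true;
            return &rack[i];
        }
    return NULL;
}

int main(void) {
    // Compose a node for one workload: cores, far memory, and a GPU.
    struct resource *cores = lease(CPU_CORES);
    struct resource *mem   = lease(MEMORY_GB);
    struct resource *gpu   = lease(GPU);
    if (cores && mem && gpu)
        printf("composed node: %d cores, %d GB memory, %d GPUs\n",
               cores->amount, mem->amount, gpu->amount);
    return 0;
}
```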
And then you think about each of these areas where even memory has compute, right?
And now you have in-memory compute that's really helping, so that you don't just have the entire fabric swamped because you're moving data back and forth, right?
You really want to have that data movement happen because it's adding value to the system, right?
And so now you can think about that as well.
And then you wanna see how do you expand this, right,
into a broader array of systems, right?
And I think you're gonna see that.
I think you're gonna see 5G networking equipment
starting to take advantage of CXL as well,
right? I think you're going to see, you know, CXL be integrated into DPUs and into the way switches
are built. And so I suppose just quickly then, you know, you mentioned there a few use cases,
you know, if we looked at the server side, we would have a rough idea around AMD and Intel's expected cadence around jumping from CXL 1.1 to 2.0 to 3.0. Say that it was around a two-year refresh pattern. What way are Arm looking at that cadence in terms of having that achievable, composable rack, which doesn't really come until around CXL 3.0?
What would you expect the cadence to be then on your product side?
Well, this is what's interesting, right?
Because again, because we don't do the end products,
we are at the heart of enabling customers to land these features
rather than having to wait on one vendor
to do it, they can do it themselves. The real power of ARM, right, has been that we enabled
a partner like Fujitsu to build an HPC chip and land HBM years before you saw it in the x86 camp, right? You actually saw AWS land PCIe Gen 5 and DDR5 with Graviton3. They've been in production for about nine months now, right? So this ability of landing these features, and not having to wait on a set cadence, is actually what's been part of the success for Arm, right?
Is we're enabling these partners
to determine the cadence for themselves, right?
And so I always kind of try to say,
I don't think we're setting this like two year cadence window
because our partners are gonna kind of disrupt that.
And that's what we're trying to help as well.
Yeah, absolutely.
And I think that it is really exciting to see Arm
not just right there with Intel and AMD and Marvell and,
you know, all these other companies, but maybe even in the lead
in terms of delivering this, you know, CXL 2.0, and, you know, promising future editions as well. But I will say I want my CXL Raspberry Pi. So maybe you can lean on
them to deliver that.
When we get ready for these announcements, Stephen and Craig, I'd love to come back on and be the first to tell you about them.
Yeah, right on.
We will definitely look forward to that.
Well, thank you so much, Eddie.
This has been a wonderful discussion.
I really appreciate your candor
and your enthusiasm
about the technology specifically
and just generally talking about the future,
where this goes,
because that's why we're doing this.
We're not being paid to do this.
We're doing this because we're excited
about where CXL can lead data center architecture, and I can tell that you are as well. So as we wrap, where can people connect with you and continue this conversation?
So again, feel free to reach out to me on LinkedIn. I always want to give a shout out to the OCP Composable Memory Workgroup. I've been a
participant in that and always welcome more folks to join us and collaborate there. I think that's
the space where you're going to see a lot of standardization, right, around how these CXL
solutions come to market. And I think that'll benefit lots of folks.
And not just myself, but several folks at ARM
have given talks around CXL at CXL forums, at OCP,
so I'd encourage folks to look at those talks
that are up on our YouTube channel as well.
Yeah, and we'll include some links to that in the show notes
if anybody's interested in learning more,
because there's a lot more technical detail there
about specifically the announcements and so on
with the different IP blocks that ARM has put together.
So thank you very much for joining us,
and thank you all for listening to Utilizing CXL,
part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe.
You'll find us in your favorite podcast application.
And please do consider leaving us a review or a rating.
We always love to see those.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
For show notes and more episodes, go to utilizingtech.com
or find us on Twitter at Utilizing Tech.
Thanks for listening and we'll see you next week.