Utilizing Tech - 4x11: Taking CXL To the Next Level with IntelliProp
Episode Date: January 16, 2023

CXL is being rolled out in production and more reliability, scalability, and security features are being added all the time. This episode of Utilizing Tech focuses on enterprise-grade CXL with John Spiers, CEO of IntelliProp, talking about the ongoing evolution of CXL. IntelliProp is bringing a CXL fabric solution to market that enables memory expansion outside the server over a fabric. CXL 3.0 introduces memory fabrics, but it will take more development to bring features like high availability, routing, peer-to-peer, failover, and re-routing while preserving cache coherency and enabling management, and IntelliProp is working to bring all of this to the spec. As we discussed in our last episode with Dan Ernst from Microsoft Azure, CXL delivers acceptable memory latency, and John expects that this will continue with fabrics to some extent. IntelliProp will profile memory and enable tiered memory that matches application needs. The company is also working to enable advanced features, including sharing memory between systems, in association with the CXL Consortium.

Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Nathan Bennett: https://www.twitter.com/vNathanBennett

Guest: John Spiers, CEO of IntelliProp: https://www.linkedin.com/in/johnwspiers/

Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789

Tags: #UtilizingCXL #CXLFabric #MemoryLatency @UtilizingTech #CXL @Intelliprop
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Nathan Bennett.
Hey, Stephen, how are you? Pretty good. Glad to be recording some episodes here in 2023 after the Intel Sapphire
Rapids announcement, which is pretty awesome.
We've been talking quite a lot about bringing CXL to market and making it real.
I mean, that's the point of utilizing tech.
And one of the aspects of that that occurred to me is that, you know, a lot of this stuff to the outside, especially can look a
little bit, I don't know, just experimental, new. Even with the, you know, Genoa and Sapphire
Rapids supporting it, there's still some holes. You know, we're still looking at CXL 1.1. But I
think there's a lot of promise here to bring this thing to the enterprise. What do you think? Absolutely. We're seeing adoption from platforms, especially Intel and
AMD. I mean, as you've mentioned, that's kind of like step one. Now we're seeing additional
expansion from, hey, this is something conceptual to something purposeful, something actually
physical in our hands to this is a platform that multiple other people can now
start developing on. And then it'll just continue to grow from there. It's like the dominoes are
falling finally, and we're going to see more and more of those dominoes continue to fall.
Yeah, absolutely. And that's, I think, where we're going here with this episode. So today,
we've got a special guest here, somebody that I've known for a long time, John Spiers, CEO of IntelliProp,
to talk about how they're working with the CXL Consortium to basically add in a lot of
enterprise class features, high availability, security, reliability, sharing of resources,
and so on, and really push this spec to the next level. Welcome to the show, John.
Thanks, guys. I really appreciate you taking the time to do this, and I'm very excited to talk about CXL and what IntelliProp is doing in the space.
Our history has been developing ASICs throughout all these years, as Stephen highlighted earlier,
and since 2017, we've been 100% focused on
building our CXL fabric. And we're really excited that we have that FPGA version today.
And customers are working on running workloads on it in our lab. And we also have a few pods
in the field where customers are testing it.
So John, tell me a little bit specifically about the product. What is the nuts and bolts? Like, what are you making? An HBA, a switch, a chip, what is it?
Our main focus is building a CXL fabric. And as you know, the first versions of CXL were really
focused on expanding memory within the server, inside the server. And so, adding, you know, devices that look like SSDs,
but they're memory devices that plug into the front of the chassis
and connect via CXL.
So that's really what you're going to see kind of as the first wave of adoption.
We're focused on external expansion.
So, you know, connecting CXL servers to a switch fabric
and then that switch fabric in turn is connected to arrays of CXL devices. So, expanding outside
the server at the rack level, at the multi-rack level, and across the data center. So we're focused on
developing that core switching fabric. In order to connect
to that switching fabric, you need to have an HBA in your host. And what that does is it tells the
CPU's CXL that's on the motherboard that, you know, we plumb all that memory that's out there
connected to the fabric up to the server through that card. So the server
sees the memory just growing. It doesn't know we're managing it externally in a fabric.
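To make that "the server just sees its memory grow" idea concrete, here is a minimal Python sketch of the kind of address-window translation such an HBA performs, stitching fabric-attached devices into one growing host address range. All class names, fields, and numbers here are hypothetical illustrations, not IntelliProp's actual design.

```python
# Illustrative sketch only: models an HBA exposing fabric-attached memory
# as one contiguous host physical address range. All names are hypothetical.

from dataclasses import dataclass

@dataclass
class FabricWindow:
    host_base: int      # start of the range the host CPU sees
    size: int           # bytes in this window
    device_id: int      # which fabric-attached memory device backs it
    device_offset: int  # offset within that device

class HbaAddressMap:
    """Translates host physical addresses to (device, offset) on the fabric."""

    def __init__(self):
        self.windows: list[FabricWindow] = []
        self.next_base = 0x1_0000_0000  # pretend expansion starts at 4 GiB

    def attach(self, device_id: int, size: int) -> FabricWindow:
        """Hot-add a fabric memory device; the host sees memory 'just grow'."""
        win = FabricWindow(self.next_base, size, device_id, 0)
        self.windows.append(win)
        self.next_base += size
        return win

    def translate(self, host_addr: int) -> tuple[int, int]:
        """Route a host access to the backing fabric device."""
        for w in self.windows:
            if w.host_base <= host_addr < w.host_base + w.size:
                return w.device_id, w.device_offset + (host_addr - w.host_base)
        raise ValueError(f"address {host_addr:#x} is not fabric-backed")

hba = HbaAddressMap()
hba.attach(device_id=7, size=256 * 2**30)   # a 256 GiB fabric device
print(hba.translate(0x1_0000_1000))          # -> (7, 4096)
```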
Yeah. And I think that from the, well, some listeners may be amazed at the idea of external
memory and the idea of pooled memory. If so, maybe listen to some more episodes of the show.
But I think those of you that have listened to a lot of episodes of the show might say,
cool, that's what CXL version 3 does, right?
But of course, I think there's a lot more that you need to do in order to make this
thing a reality, right?
Yeah, version 3 is really focused on being able to pool these devices together within a server and virtualize them to some extent.
It also has some ability to do external expansion to the server, but it's not a fully functioning fabric yet. The fabric working group, which is part of CXL, is working on building all the other features
and capabilities that make it a fully functioning, high-availability fabric. As you know,
to be enterprise class, you need failover capability, high availability. So if
a switch goes down, you're still up and running. If a port goes down, there's path failover and you're still up and running. You also need things
like security. You need packet-based routing, and you need to be able to connect switches together, daisy chain switches together, so you can expand beyond what can connect into a single switch and expand outside the rack. All these features and capabilities are in the works, and we're excited that IntelliProp is kind of leading that effort and has a lot of that already incorporated into our switch design.
What about taking maybe a tiered approach?
Is that something that y'all are investigating as well?
Being able to provide a higher level of performance
for some workloads, whereas other workloads
are lower performance.
Is that something that y'all are looking at providing
as well with this platform?
We kind of characterize different tiers of memory.
And so, you know, you have on-processor memory, obviously,
which is the fastest memory, you know,
the kind of the one to 40 nanosecond kind of memory.
And then you have the DIMM, you know,
memory that's on the motherboard connected into your DIMM sockets.
And then you have CXL connected devices within the server,
and then you have memory devices connected outside the server.
Each one of these has a latency profile.
We're trying to hit that one NUMA hop latency,
and our modeling and design has successfully achieved that,
which is 140 nanoseconds to 240 nanoseconds.
And we're excited that with a single switch hop,
we can achieve those types of latencies.
Now, if you go past the rack or do a multiple switch hop
through multiple switches,
obviously each switch hop adds latency.
But the way we're looking at this is
applications require different latencies, and I believe you'll even see
flash devices with a CXL interface. And, you know, flash devices are in the
10 to 40 microsecond kind of latency range.
But for some applications, that's just fine for memory.
And it really depends on what you're doing.
And so our fabric actually has a capability
to keep all these statistics
on all the different memory tiers.
And then the application can assign itself
to the memory that it needs based on whatever latency profile is required. So, you know, we're excited that, you know, we see
the kind of the space playing out as multiple tiers of memory. You'll have
non-volatile tiers, volatile tiers, and it'll all be managed and byte addressable
and make your programming models very simple and work with a variety of applications.
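As a rough illustration of that tier-matching idea, here is a small hypothetical Python sketch built around the latency figures quoted in this episode (on-package memory at roughly 1 to 40 nanoseconds, one switch hop at 140 to 240 nanoseconds, CXL-attached flash at 10 to 40 microseconds). The tier table and selection function are invented for illustration and are not IntelliProp's API.

```python
# Hypothetical sketch of latency-based tier selection, using the rough
# figures quoted in the episode. Not a real API.

MEMORY_TIERS = [
    # (tier name, typical latency in nanoseconds)
    ("on-package",          40),        # ~1-40 ns
    ("local DIMM",          100),       # motherboard DIMM sockets
    ("CXL in-server",       170),       # CXL device in a front drive bay
    ("CXL fabric, 1 hop",   240),       # ~140-240 ns, one switch hop
    ("CXL-attached flash",  40_000),    # ~10-40 us, fine for some workloads
]

def pick_tier(max_latency_ns: int) -> str:
    """Return the slowest acceptable tier, keeping faster memory free."""
    acceptable = [(name, lat) for name, lat in MEMORY_TIERS
                  if lat <= max_latency_ns]
    if not acceptable:
        raise ValueError("no tier meets this latency requirement")
    return max(acceptable, key=lambda t: t[1])[0]

print(pick_tier(250))        # -> 'CXL fabric, 1 hop'
print(pick_tier(100_000))    # -> 'CXL-attached flash'
```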
So if I'm hearing you correctly, and correct me if I'm wrong here, because I just want to take something
you said and boil it down to a little bit of brass tacks.
Customers are going to be able to take their workloads and say, okay, so this workload gets the in-the-box direct memory,
whereas this workload may get a switch fabric type memory and this one may get a multi-switch fabric memory.
Are you seeing that as the tiered approach?
Yeah, the data will be tiered across multiple classes of memory,
and you'll see
management frameworks that make it all seamless. And memory can be dynamically allocated based on
latency requirements. And it's all connected to a single manageable fabric, and
the idea is to make it very simple and easy to deploy.
That's fantastic.
So right now in CXL, it's very exciting,
and there's a bunch of different numbers being thrown across, right?
So we've had the Sapphire Rapids announcement.
We've had the Genoa announcement.
And so we're seeing platforms coming up for CXL,
and we're seeing numbers like Interconnect One being
available on some of these platforms. And, you know, CXL 3 is coming out and all
these different things, and PCIe 5 and PCIe 6. Where are y'all targeting this
platform to start really getting its traction and moving towards the
market? I hear you say that there are some customers
that are currently working with it or some pods out there.
Where are y'all seeing this actually hitting its stride?
Yeah, so obviously the major pain point
is applications with large data sets.
So there's a lot of data analytics applications out there.
There's AI applications. There's databases. As you know, databases are real memory hogs. In
fact, you know, a lot of databases are spread across multiple nodes because the
amount of memory isn't sufficient in a single node. So you have to have multiple
nodes and then slice and dice the data across multiple nodes,
process that data and then stitch it all back together.
So CXL and having a single memory pool
and having multiple servers or GPUs accessing the same data
in the same pool makes it much more efficient,
makes it faster.
You could even argue that it requires fewer compute nodes to deliver the results. So yeah, we're working with
customers: hyperscalers, defense agencies, you know, higher ed
institutions that are working on HPC systems. They're all excited about what CXL is going to bring to the market.
Do you think, as somebody who's been in the storage industry, and I'm seeing a lot of
analogies here to like Fiber Channel and iSCSI and the things that we all went through with
building storage fabrics, do you think that there's going to come a time when system memory
is going to have the kind of advanced features that storage has in storage-connected devices, things like snapshots, mirrors, clones, you know,
sharing, dynamic allocation, and also, you know, literally multiple nodes
sharing the same memory areas. Do you think that's all coming to memory? And is this
what you're working toward as well? Yeah, most definitely. I mean, we see, you know, it's very, very similar to SAN and NAS.
I mean, if you remember the early days of storage, everybody stuffed disk drives in a server.
One server had too many disk drives.
This one didn't have enough.
You couldn't reach over and grab the surplus disk in
the other server. Well, the same thing's happening with memory, and it's very difficult to
utilize memory across multiple compute domains. If you put it in a central pool, like you did disks and
SSDs, now all your servers can share it. And you'll see a lot of the same features
and capabilities. First of all, to make it enterprise class, you need high availability,
path failover. You need it very secure, so you need encryption, packet-based routing. You need
to be able to daisy chain switches together to expand within the rack, across racks, across the data center.
And of course, all this needs to be highly available and performant,
just like SAN and NAS are today, right?
And so, to your point, a lot of these features like remote replication, snapshots,
deduplication, I don't know.
I mean, some of this stuff may be prohibitive because it adds too much latency.
But then again, you know, those types of technologies could benefit memory.
At the end of the day, memory is storage, right?
It's volatile storage.
And, you know, it needs to be much faster than non-volatile storage.
But again, there's different tiers.
A lot of people even use, you know, disks or SSDs as a RAM cache, right, for their applications today.
And so, you know, you'll see a lot of the same features and capabilities.
In fact, these same features and capabilities, I think, are required to make it a viable
solution to the enterprise.
Yeah, so I think a lot of that is really awesome. And just like NAS and SAN and all these different network solutions that we see out there, it brought other complexities that
we had to kind of navigate like, okay, well, where is it? How is it running? And how do we make sure
that if it fails, nothing bad happens? What are you all trying to kind of catch at the curb,
making sure that you're learning from those experiences
and implementing that within your platform?
Some of our history, you know, our history started with Gen-Z.
Now Gen-Z is part of the CXL Consortium, and Gen-Z was designed to be a highly
available, enterprise-class fabric that
connected memory. And so a lot of these high-end features and things that you're talking about,
like path failover and redundancy and the way we do encryption and port-based routing
and things like that, were all designed in Gen-Z to work at scale.
Even having 802.3 as a transport within the fabric allows you to go rack to rack.
And so a lot of these features and technologies were developed for years in Gen-Z.
And the exciting part about it is Gen-Z is now part of the CXL Consortium.
And so as you work in what's called the fabric group that's building these switch fabrics
or designing the spec for these switch fabrics, there's talk of, well, we can grab that from
Gen-Z.
Oh, Gen-Z has that.
We can pull that in.
And so a lot of that's happening.
It's very exciting.
We think we're ahead of the game because we have a lot of that incorporated already.
You know, when IntelliProp really started, we chaired the Gen-Z Consortium.
So we're kind of ahead on what all those features do and what they're capable of and have incorporated some of that stuff into our design already.
We see CXL 3.0 kind of being the starting point for fabric, fully functional fabrics.
You'll probably see a 3.1, some dot releases associated with 3.0 that add specific features.
But I think we're kind of predicting that a fully functional, fully featured, enterprise-class fabric might not happen until CXL version 4.
But we're working real hard and
working with the consortium to make sure that happens. Yeah, I was going to go there next.
Thanks for bringing that up because I know that the consortium is really, really interested in
bringing these features in. As you mentioned, all the Gen Z work is now part of the CXL effort.
And I think that that's tremendous. I'm really excited to think that you don't have to start from scratch. You don't have to re-engineer it all.
And you can, and folks like yourselves who were working on Gen Z can bring your knowledge,
your experience, and of course, things like protocols and patents even into CXL. Is that
right? I mean, that's really what you're working on in the fabrics group.
Yeah, that's exactly right.
We're working on a switch fabric that has targeted latencies
that we talked about earlier
and has a lot of these high-end features to make it enterprise class so that it can be deployed in, you know, clouds, public clouds, private clouds, high-end enterprises, and be kind of a general purpose switch fabric, much like fiber channel switch fabrics are today.
In your experience in doing this stuff, I mean, what do you think are the big features that are needed in order to bring fabrics to fruition?
You know, what are the kind of low-hanging fruit or maybe the longer reaches that they need to add to the CXL spec?
Well, the first thing is downward compatibility to older versions of CXL.
Well, we've heard that that's going to be in there.
I mean, that seems to be a very strong emphasis from the CXL consortium.
Yeah.
And so at first it wasn't with 3.0.
And so we're excited that that's changing.
Being able to support peer-to-peer sharing of memory.
So GPUs like to communicate among themselves and share memory,
and we call that peer-to-peer.
So it needs to support that.
Multi-level switching, so being able to connect switches together,
have multiple levels of switching is needed to expand beyond one switch.
802.3 as a transport.
I think that's very important because this is very high-speed technology,
and the cable, just the cable itself introduces latency.
So, you know, I've heard numbers like for every six inches of cable
with Gen 6 PCIe, you'll need a retimer.
Well, you know, that's kind of ridiculous.
So, you know, I mean, and that makes people instantly think, well, you know, this isn't
a viable solution.
Maximum scalability may be, you know, constrained to a rack.
Well, you know, if you have 802.3 as a transport, now you can go rack to rack and across data centers without having to worry about having retimers everywhere.
And so, you know, we think that's required.
You need path failover and hot plug.
As you know, hot plug is, you know, hot plugging new switches, hot plugging new memory arrays, taking memory arrays offline without the system
crashing. Just like storage today, you can hot plug drives, hot plug arrays sometimes, and, you
know, hot plug ports on switches without the system crashing. There's also, you know, shared
memory pools across compute domains. So, you know, if you have a pool and have multiple
servers sharing that same data set,
you can't have them step on each other and corrupt memory.
And so you need features that allow you
to share the same memory pool and have coherency, have ordering and all those types
of things.
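To make that "stepping on each other" problem concrete, here is a toy Python sketch of exclusive ownership over regions of a shared pool. Real CXL coherency is handled in hardware at cache-line granularity; everything here, including the names, is a simplified hypothetical illustration.

```python
# Toy illustration of multi-host access to one shared memory pool.
# Real CXL coherency is done in hardware; this just shows the idea of
# refusing writes from a host that doesn't hold ownership of a region.

class SharedPool:
    def __init__(self, regions: int):
        self.owner = [None] * regions   # which host may write each region
        self.data = [0] * regions

    def acquire(self, host: str, region: int) -> bool:
        """Grant exclusive write ownership if the region is free."""
        if self.owner[region] is None:
            self.owner[region] = host
            return True
        return False

    def write(self, host: str, region: int, value: int) -> None:
        if self.owner[region] != host:
            raise PermissionError(f"{host} does not own region {region}")
        self.data[region] = value

    def release(self, host: str, region: int) -> None:
        if self.owner[region] == host:
            self.owner[region] = None

pool = SharedPool(regions=4)
assert pool.acquire("server-a", 0)
assert not pool.acquire("server-b", 0)   # can't step on server-a's region
pool.write("server-a", 0, 42)
pool.release("server-a", 0)
```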
And then, in-band hardware acceleration for fabric management. There's a big debate about having in-band management versus out-of-band management. In the storage world, out-of-band
management is the only way to go, right? Because you don't want management in your data path.
Well, with CXL, if you're building it at a chip level, you want it in-band because out-of-band has that effect of slowing things down.
Being in-band and part of the ASIC framework makes it very fast and
efficient and makes it scalable. It's very hard to scale this stuff without in-band management.
So that's a big one itself. And then robust security, hardware-enforced security, is needed. There's elements of that in the spec already,
but we think that it needs more.
And then having packet-based routing,
you want to be able to route your packets
without decrypting them.
So you encrypt the data but not the header
so you can route the packet
without decrypting the whole thing. Because if you're encrypting and decrypting on both
sides of the switch, you know, you're adding a lot of latency. So there are little subtle features. Look, I
covered, you know, a half dozen or so of them, but there's even more than that. This stuff's complex. As you guys know,
you know, SANs and NAS were very complex too. It took the industry years to get it right.
I think it's going to take years to get this right as well, but we're excited to be kind of
leading that effort. And we think our first silicon will be a viable enterprise class
fabric switch.
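The encrypt-the-payload-but-not-the-header idea John describes maps naturally onto AEAD ciphers, where the routing header travels as authenticated but unencrypted associated data. Here is a minimal sketch using the Python cryptography package with AES-GCM; the packet layout is invented for illustration and is not the CXL wire format.

```python
# Sketch of header-in-the-clear packet protection with AES-GCM.
# The header stays readable for switch routing but is still authenticated,
# so tampering with it makes decryption fail. Layout is illustrative only.

import os
import struct
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

def seal(src: int, dst: int, payload: bytes) -> bytes:
    header = struct.pack(">HH", src, dst)      # plaintext routing header
    nonce = os.urandom(12)
    ct = aead.encrypt(nonce, payload, header)  # header is AAD: bound, not hidden
    return header + nonce + ct

def route(packet: bytes) -> int:
    """A switch reads the destination without touching the ciphertext."""
    _, dst = struct.unpack(">HH", packet[:4])
    return dst

def open_packet(packet: bytes) -> bytes:
    header, nonce, ct = packet[:4], packet[4:16], packet[16:]
    return aead.decrypt(nonce, ct, header)     # fails if header was altered

pkt = seal(src=1, dst=42, payload=b"cache line")
assert route(pkt) == 42
assert open_packet(pkt) == b"cache line"
```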
Yeah. On that note, I know that y'all have already rolled out an FPGA of the ASIC,
which is great because it gives you the chance to make changes and so on and
really kind of test it out and see how it's working and make sure that you're
really, you know, kind of ready to go. When are we going to start seeing silicon ASICs in the market?
So you're going to see a lot of ASICs around CXL 2.0, 3.0 that allow you to expand memory in the server and maybe even do a direct attached array kind of thing.
So DAS, or I guess you could call it DAM, direct attached memory. We're calling our, you know, memory array, network attached memory, NAM,
you know, which is kind of a follow-on to the SAN and NAS world, right?
But I think those technologies are probably a year out.
You know, we're going to have our first silicon
towards the end of this year, Q1 next year,
and have customers start testing it.
We also have a network attached memory array design
that you can connect to your switch.
So imagine an HBA, kind of like a fiber channel HBA,
connected to the switch,
and then multiple NAM arrays connected to the switch,
all shared by the servers.
That's incredible. How big would such a thing get, theoretically? How much memory could you
put into something like that? Well, you know, you can imagine a 24-drive 2U, although memory
generates a lot of heat, more so than an SSD.
So there's thermal issues with high-density arrays of memory.
But we're working on solving those thermal issues. But you can imagine a 24-drive 2U.
And so if they're 256 gigabyte or 512 gigabyte modules, right? You can do the math on that, right?
But you can get fairly large arrays.
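Doing that math, assuming John's illustrative 24-drive 2U shelf:

```python
# Back-of-the-envelope capacity for the 2U NAM shelf John describes.
drives = 24
for module_gb in (256, 512):
    total_tb = drives * module_gb / 1024
    print(f"{drives} x {module_gb} GB modules = {total_tb:.0f} TB per 2U shelf")
# -> 24 x 256 GB modules = 6 TB per 2U shelf
# -> 24 x 512 GB modules = 12 TB per 2U shelf
```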
So, John, we've just seen the announcement of AMD Genoa and Intel Sapphire Rapids,
which are the first two platforms to support CXL in production,
which is awesome because these are going to take over the entire industry in the next year. What's your reaction to these server platforms now that all the work that you've been
doing is finally in the hands of end users? Yeah, we're very excited because CXL is not
really viable until the servers support it. So now the servers do support it. And people ask,
well, it's only version 1.1 on a server. What does that really
mean? Well, what that means is it's capable of connecting to memory within the server,
you know, CXL memory within the server. So Samsung, Micron, SK Hynix, all these guys are
building the memory modules that plug in like an SSD in front of the server.
And so you'll be able to hot add memory to the server through a drive bay, which is very exciting.
And so that essentially buys you expansion within the server, memory expansion within the server outside the DIMM slots.
And that's what CXL 1.1 is all about, expanding memory within the server.
What we do is we come in and put our card in there with our ASIC on it.
And now you can connect to a memory fabric with higher versions of CXL.
We go grab that memory that's out there in the pool connected to our switch,
and we present it to that chip on the motherboard, that 1.1 chip, and it sees its memory pool just
grow. It doesn't know that 15 NAM arrays connected to the switch aren't drives plugged into the
server. It knows, you know, it sees it as the same thing.
So we're plumbing all that external memory up into the server
and CXL 1.1 can access it, make it all byte addressable,
just like it does the stuff that's inside the server.
And so the spec of the, you know,
these servers isn't that much of a concern.
You know, ultimately, say 10 years down the road, maybe CXL version 5 is supported by
the server, right, for example. Well, now you won't need an HBA in the server anymore. You can
connect directly to the fabric switch and everything will just work. But that'll be a ways down the
road. Yeah. So just to be clear, these platforms, AMD's and Intel's new server platforms, even though, I mean, I don't want people to be spooked, even though it says 1.1, that's going to be forward compatible with a lot of the things we've been talking about, right?
Yeah, very much so.
Excellent. Well, thank you so much. This has been a great update. It's great to kind of hear where the fabrics aspect of CXL is going. It fits really nicely into some of the episodes
we've recorded recently.
We did have AMD join us.
Also, I'll call out the last episode
where Dan Ernst from Microsoft Azure
talked about what you talked about with memory latency,
and the fact that the memory latency
is actually pretty good and pretty useful with CXL.
Yeah, for most applications, it works just fine.
So, John, thank you so much for joining us here on Utilizing CXL.
As we wrap, where can people connect with you
and continue the conversation on CXL and other advanced topics?
You can access our website at intelliprop.com.
And I'm john.spiers at intelliprop.com.
And, you know, we're an early stage startup.
We haven't invested a lot of money on our website.
But we have some of our early FPGA-based products depicted there and talk about, you know, what our capabilities are.
And then we have a white paper and other collateral material and some videos actually showing, for example, at Supercomputing 22,
we were composing memory into a server using Liqid's software.
So the beauty of our solution is we have an API, a rich API, and a lot of these composable management frameworks, like Liqid's,
can talk to our APIs, which allows them to compose
memory into servers. And we're
working with members and some others on that front.
But yeah, so there's videos
of us composing memory
from our FPGA fabric
into servers, and that's pretty exciting.
It is, absolutely.
Nathan,
anything new? What's going on with you?
Anyone can find me on Twitter at VNathanBennett and at Mastodon at VNathanBennett at AWSCommunity.social.
And I'm also trying to spin up a bunch of YouTube videos, so people can find me on YouTube at VNathanBennett.
And as for me, you can find me at SFoskett on most social media, including the Twitter and the Mastodon. Also, I will point out that we are putting together a CXL-themed Tech Field Day in March.
So if you go to techfieldday.com, you'll see that we have announced Tech Field Day 27.
It's going to be March 8th, my birthday, and 9th.
And hopefully we're going to have some of the folks in the CXL community, some of the companies you've just heard about presenting to the Tech Field Day audience, including folks like myself and Nathan. And we'll
put the video live stream of that on LinkedIn. We'll upload the video to YouTube. And if you'd
like to be part of that, please drop me a line. You can find me, as I said, at SFoskett on Twitter.
That's probably the easiest way to find me. So thanks for listening to the Utilizing CXL podcast, part of the Utilizing Tech podcast series. If you enjoyed this discussion, please do subscribe
in your favorite podcast application and give us a rating and a nice review, if you've got one in
you. This podcast was brought to you by gestaltit.com, your home for IT coverage from across
the enterprise. For show notes and more episodes, though, go to utilizingtech.com or find us on
Twitter at utilizingtech.
Thanks for listening, and we'll see you next week.