Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x5: How CXL Can Optimize Infrastructure for Machine Learning with Gerry Fan of Xconn Technology
Episode Date: November 28, 2022. Today's AI and ML systems use proprietary interconnects, which limits the choices available to customers. CXL technology promises to enable greater interoperability, and this is the focus for Xconn Technology. In this episode of Utilizing CXL, Gerry Fan of Xconn joins Stephen Foskett and Craig Rodgers to discuss the ways that CXL can improve machine learning processing. The CXL Consortium is working with nearly every company in the IT industry to bring this promise to life, but we need hardware and software to enable memory pooling, device sharing, and more. The initial CXL products enable right-sizing memory, regardless of the specific architectural details of the CPU chosen. The next addition will be disaggregated and pooled memory using CXL switches, and this is coming to market in the next year or so. This will enable massive pools of memory on-demand for intensive applications. Xconn promises to make memory pooling available to CXL 1.1 hosts as well, and is working on a fabric manager to enable this. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms Guest: Gerry Fan, Cofounder CEO, Xconn Technology. Connect on LinkedIn: https://www.linkedin.com/in/gerry-fan-5769608/ Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on CXL, a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Craig Rodgers.
Hi, I'm Craig Rodgers. I'm a solutions architect and you can find me on Twitter at CraigRodgersMS.
So Craig, you listened in, I'm sure, for the last few seasons of Utilizing Tech when we focused on machine learning and artificial intelligence topics.
And I was pleased to have you join us here while we're talking about CXL because there is a little bit of a crossover between machine learning and CXL, right?
For sure. CXL is going to open up a world of possibilities in how we interact now with these
AI devices that are increasingly hitting the market. It's exciting to see what way that could
go. And I think there are two areas to me that are really relevant to AI,
and those are memory expansion, right-sizing memory and providing all the memory a system
needs at the right time. And that's coming sooner. And then in the future, more flexibility in terms
of allocating devices like tensor processors and so on as needed to build the right kind of AI
processing capabilities. But right now, I think we need to start thinking about how does CXL affect
AI and ML architecture. And that's why we decided to invite on Gerry Fan from Xconn Technology to
join us. Welcome to the show, Gerry.
I'm glad to be here, Stephen. This is Gerry Fan. I'm a co-founder and CEO of Xconn. My background is in high-performance switching and computing, and I've been in the industry for over 25 years. We started the company a couple of years ago to address these performance-related issues in AI and machine learning, and also in the memory expansion area.
Yeah, so we met you at the CXL Forum at OCP Summit, well, virtually, as well as the previous CXL Forum.
And it was interesting to see that you're really focused on that.
I really wanted to create a bit of a tie between the previous seasons of utilizing AI and now utilizing CXL.
So talk to us a little bit more.
What did you mean just now and generally about optimizing for machine learning?
So typically there are two challenges for AI systems. One is the data exchange rate between the GPU and the CPU, and the second is how the memory gets utilized. CXL technology addresses both of these challenges. It has cache coherency to manage the communication between the GPU and CPU, and also among the GPUs themselves. And it has CXL.mem, with very low latency, to enable all these GPUs to share a common pool of memory. These two main advantages really propel attractive applications in the AI and machine learning area.
What are the specific challenges though
that you're able to address here?
I mean, what don't we have that we will have
once CXL comes to bear?
So right now, if you look at AI and machine learning systems, they are basically using proprietary interconnects, dominated mainly by NVIDIA. That limits the ability of system vendors to deploy their own specific type of system architecture.
CXL is a standard across the industry, and that opens up opportunities for the system vendors and also for chip vendors like us; we produce CXL switches. Basically, a switch serves as a hub connecting the GPUs and CPUs, enabling the system vendors to develop these next-generation CXL-based AI and ML systems.
So it's really about interoperability and flexibility and customer choice, I suppose.
So customers will be able to maybe mix and match or at the very least to choose the solution
that fits them best, without having to worry about the proprietary interconnect making the choice for them.
Absolutely.
Obviously, you've been working with vendors then. NVIDIA have the likes of NVLink, you know, their own proprietary method for communicating between devices. I'm sure you've been working with them to help them move over to the likes of CXL, and it also helps validate your CXL switching platform.
I can't go very deep into our conversations with NVIDIA, but at a high level, all the companies are looking very deeply into the CXL area, including NVIDIA. The cloud providers and system vendors definitely would like to invest in CXL technology so that they can build the next generation of AI and ML systems based on it, because that gives them a performance advantage. It also helps in terms of cost, because they don't have to rely on proprietary protocols, especially as speeds get higher and higher. Having wide industry adoption, with the whole community and ecosystem building around it, is more productive.
That's a really good point.
You know, the CXL Consortium has had so many companies now sign up and agree to do things this way.
I just called out NVIDIA as one example of, you know,
a couple of hundred that we know are going to be adopting this.
So I'm sure you've had numerous conversations
with many companies around that.
Yeah, we have the conversations
with many of these members in the CXL Consortium.
And that's one of the very exciting things
to see that the entire industry is united
and behind this technology.
And from the CPU vendors to the switch vendors
to the CXL memory device vendors,
and also the end customers, the cloud providers or others providing any type of data processing service, they're all going to benefit a lot from the collaboration in this CXL industry.
Yeah, it is wonderful to see the number of companies that are part of this, including basically every company that people have heard of. Craig mentioned NVIDIA, and of course Intel and AMD and ARM, and everybody's in there.
And another thing I think from a technical standpoint that's nice is that since CXL is so
closely based on and coupled to PCI Express, it really leverages a lot of development on that side as well, as we heard
and discussed at the CXL forum.
The improvements to PCIe that are coming are rolling out now, rolling out in the next generation
platforms and the platforms after that will really help to make this technology more real.
But as somebody who's deeply involved in this,
what more do we need?
It's not just PCI Express.
We need all sorts of things,
controller chips and switch chips and software.
What more do we need to make this thing real?
Yeah, that's a very good question.
So basically this concerns the entire ecosystem, and we pretty much need collaboration from the software vendors, from the processor vendors, from the switch vendors, and also from the GPU vendors and the CXL memory device vendors. All of these things need to be put together to build a system which can enable this memory pooling, so we can have AI and ML systems with more efficient data transfer between the GPU and the CPU, and also among the GPUs, and with more effective utilization of the memory in these types of systems.
So the collaboration is across the board.
And that's why we have this consortium: it puts together people with different backgrounds, from companies working in different areas, to communicate with each other, understand each other's needs, and drive this deployment forward. And because the system is so involved, that's the reason the CXL spec has different stages, with different features getting put in, so that these systems can be developed over time. It's just like PCIe: you have different generations, and with each generation you keep adding new ECNs or new features. CXL is going to go through a similar process.
And you are absolutely correct.
CXL is built on top of PCIe. I think that's one of the best things to happen for adoption of this technology, because PCIe is everywhere. People feel very comfortable building something on top of something they know so well. Whereas if you have something brand new that nobody understands, it takes a few years for people to understand what's going on, and they start hesitating to adopt. So that's one great thing about CXL.
So it seems that the first practical application of this technology
is memory expansion. And specifically, that is being used to overcome the limits of system
memory buses. So for those who are listening who aren't familiar, most CPUs can access a number of
memory channels. Typically, it's three or four memory channels.
And into each of those channels, you can put a number of memory modules, or DIMMs, into those slots.
Usually, it's three, two, or now one slot per channel.
And those only come in certain sizes. So if you want to maximize your system performance, you're going to
put four, you know, memory modules into your system, so that you're using all four channels.
And, you know, you only have certain choices, they're based on binary numbers. If you want a
specific amount of memory, or if you want to add memory after you initially buy the system,
you have to basically replace those with
something bigger. And sometimes that can be a lot bigger and a lot more expensive.
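To make that concrete, here's a rough illustrative sketch; the channel count and DIMM sizes below are hypothetical, not tied to any particular CPU:

```python
# Illustrative only: enumerate the total capacities available when a
# hypothetical 4-channel CPU is fully populated with identical DIMMs.
CHANNELS = 4
DIMM_SIZES_GB = [16, 32, 64, 128]  # DIMMs come in power-of-two sizes

for size_gb in DIMM_SIZES_GB:
    print(f"{CHANNELS} x {size_gb} GB DIMMs = {CHANNELS * size_gb} GB total")

# Prints 64, 128, 256, 512 GB -- nothing in between. If a workload needs,
# say, 300 GB, the only option is to swap every DIMM for the next size up
# and pay for 512 GB.
```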
So the initial use case for CXL from companies like Samsung and SK Hynix is essentially memory expansion modules that will allow you to add, you know, maybe not quite as fast, but the right amount of memory to the system. And that has a lot of relevance to machine learning and other big data applications. Gerry, you guys are involved in this right off the bat, right?
Absolutely. That's what we targeted at the very beginning, when we built up the company and built the product. Because, as you mentioned, there are a lot of big applications which consume a huge amount of memory, which some people call in-memory computing or near-memory computing. And it's also true that these CPUs only have a limited number of DDR channels, so there's only a limited amount of DDR memory capacity you can connect directly to the CPUs.
And one of the reasons, obviously, is that adding new channels is very challenging for the CPU vendors.
That's the reason they are reluctant to do that.
So that creates a big challenge: if you want to run a big application and your memory capacity is limited to, say, 512 gigabytes or even one terabyte, for some applications that's not going to be enough. So the CXL switch we're developing enables you to overcome this barrier. We have many ports, and memory expansion is one of the memory use cases: you connect memory devices from, for example, Samsung or SK Hynix or Micron underneath our switch to profoundly expand the capacity of the memory, so that the memory can reach something like 30 terabytes.
And for such a big amount of memory, you definitely don't want it to be owned by one host or one CPU. You want this memory pool to serve many CPU hosts, so that sharing and pooling become possible. In that case, when one CPU is idling, another CPU can start executing its own job. That's what they call memory disaggregation, with multiple hosts taking advantage of this memory pooling. The memory utilization improves because the CXL switch we developed enables these types of applications, and it reduces the TCO for the big cloud providers tremendously. So that is one of the big trends in the first wave of CXL applications going on in the industry.
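As a back-of-the-envelope illustration of that TCO point, with invented numbers rather than figures from Xconn:

```python
# Why pooling reduces stranded memory: each host no longer has to be
# provisioned for its own worst-case peak. All numbers are invented.
HOSTS = 16
PEAK_GB = 1024      # occasional per-host peak demand
AVERAGE_GB = 400    # typical per-host working set

dedicated_gb = HOSTS * PEAK_GB            # every host sized for its own peak
pooled_gb = HOSTS * AVERAGE_GB + PEAK_GB  # shared pool: sum of averages plus
                                          # headroom for one peak, assuming
                                          # peaks rarely coincide

print(f"Dedicated DRAM: {dedicated_gb} GB")  # 16384 GB
print(f"Pooled DRAM:    {pooled_gb} GB")     # 7424 GB
print(f"DRAM avoided:   {dedicated_gb - pooled_gb} GB")
```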
And this isn't some far off thing.
I mean, we've already heard from,
the reason I mentioned Samsung and SK specifically
is because they've already announced products.
Micron, of course, is right there as
well, as you mentioned. And Intel and AMD, like I said, have already announced that they are
working on this. Now, they haven't officially confirmed it, but I think it's an open secret
in the industry that this is all coming next year: memory expansion on these systems. But a lot of the other things that you described, this idea of a shared memory pool, when does that come, unofficially? I know that you're not announcing things or anything, but when could we expect to see that kind of shared memory?
The true value of this CXL memory
is going to be coming from the memory sharing and pooling.
And so that's why we are working with industry leaders
in the cloud area and also in the system area, to enable them to build a POC and to analyze and understand what the performance improvement is, and also what it means in terms of TCO.
And as you said, Stephen, all the technology is right there. The CPU vendors have this capability. And of course we have our first silicon, and it's running in our lab. For the memory devices, you have SK Hynix, Samsung, and Micron over there, so people can build these systems and evaluate multi-host sharing performance. So we are working with the companies which are leaders in this area to explore the performance and many useful aspects of these systems.
You've touched on a couple of points there that Stephen and I have discussed, I think in both private and public conversations, around the efficiencies. You know, memory is
such a huge cost component of any server and making the best use of that memory is a real
benefit to the overall TCO. And we had also surmised that, you know,
the likes of the hyperscalers would be very early adopters
of this technology.
And that rings true that you're saying
you're already working with them.
It would make sense for them,
given the sheer volume of resources that they would have,
you know, they would certainly stand to gain the most.
In terms of your product,
I've done a wee bit of research before we came on here. Obviously we know CXL 1.1 is coming out with the next generation of servers. Your switch is already operating on CXL 2.0. Can you tell us any reasons or any thoughts about why you jumped straight to 2.0? Now, obviously we know it's backwards compatible, but what made you jump straight away up to 2.0 there on your CXL switch?
Yeah, that's a great question.
As you know, CXL 1.1 basically does not have the switch functionality. It's just point-to-point connections. And CXL 2.0 is more like PCIe.
So you can build a PCIe-like type of switch.
Because we are a switch company, in theory we cannot use 1.1, and that's why we do 2.0. Having said that, we understand that what the CPU vendors have right now is mainly 1.1, and we have to make things work with that. That's why our chip is very uniquely positioned in this area: we can enable today's 1.1 hosts to function as if they were behind a switch. Our chip does this type of virtualization to allow a 1.1 host to share the CXL devices along with other 1.1 hosts. We have already developed a fabric manager that lets you carve up those resources through the switch to the CXL 1.1 servers now coming to market.
So even though they're only on 1.1, they're already able to gain access to external RAM through that 2.0 functionality. I just wanted to highlight that even though servers are only on 1.1 now, we already have access to more advanced features through the likes of your 2.0 switch.
Right. So basically our switch enables a 1.1 host to access all the CXL devices connected underneath our chip. And our chip does all the memory allocation, all this kind of virtualization work.
So you have a fabric manager then.
How are customers going to be able to interact
with that fabric manager to allocate off resources to hosts?
Yeah, that's a good question.
So basically, we develop a fabric manager.
And basically, that fabric manager is provisioning our chip, right?
So our fabric manager software will interact with the system level software.
And for example, that's what I said,
the ecosystem need to be from the different area,
is a collaboration of the vendors from different area.
And we as a switch, we're providing the software
like Fabric Manager to managing managing our chip to managing how to
allocate memory for our chip for example the cloud providers this is that this uh
big software company and to develop some interface to talking with our fabric manager to instruct our fabric manager what they
want us to do in terms of how to manipulating all the memories. There's probably a GUI available
however primarily you're expecting the hyperscalers to want to do that initially through an API
given the size and scale they would want to operate on.
Right, so we provide the hooks to these system software vendors, and they develop their engine to go through our API to fully control our chip and achieve this memory allocation and these types of operations.
So long-term though, do you think that the software that does fabric management and controls this hardware, do you think that that's going to be developed
management and controls this hardware, do you think that that's going to be developed
by independent companies?
Or do you think that's gonna be rolled
into operating systems and other kind of
system-wide resources?
From our understanding, this is mainly going to be developed by the chip vendors. We partner with a management software company to develop this fabric manager. Our customers can leverage what we have in our fabric manager, which we will eventually provide to them as open source. So they are able to either integrate it into their high-level system software, or they can basically go through the API and talk to our fabric manager. It's up to them how to fully take advantage of the fabric manager we develop.
But it sounds like this is something where maybe
at least the basic functionality may be integrated
into system resources or system-wide or
global applications.
But if you want more advanced features,
you'll probably use a proprietary
or specific software package.
That's correct.
One of the things I want to highlight is that system software companies don't want to change their applications. They don't want to touch too much of their existing software. That's the beauty of this CXL-based memory disaggregation: they can keep all their high-level software while gaining full control of the pooled memory system.
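To give a feel for what "going through the API" might look like, here is a purely hypothetical sketch; the base URL, endpoints, and field names are all invented for illustration and do not describe Xconn's actual interface:

```python
# Hypothetical sketch of orchestration software driving a CXL switch's
# fabric manager over REST. Every endpoint and field here is invented.
import requests

FM = "http://fabric-manager.local:8080/api/v1"  # hypothetical address

# Discover the memory devices attached below the switch.
devices = requests.get(f"{FM}/memory-devices").json()
print(f"{len(devices)} expander(s) in the pool")

# Ask the fabric manager to bind 512 GB from the pool to host port 3,
# so that the CXL 1.1 host on that port simply sees more memory.
resp = requests.post(f"{FM}/allocations", json={
    "host_port": 3,
    "capacity_gb": 512,
    "interleave_ways": 2,  # spread across two devices for bandwidth
})
resp.raise_for_status()
print("Allocation created:", resp.json()["id"])
```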
Another question that occurs to me from an application perspective.
A lot of the people that listened to the Utilizing AI podcast were more on the AI application side. Is CXL memory,
is this pooled memory or expanded memory, is this going to appear to be part of the regular memory
or is it going to be some special kind of memory that you have to deal with differently?
I believe there are two types. One appears as normal memory, managed close to the kernel, that kind of thing. The other is called device DAX, and that's more like a device type. So to answer your question, it's just like any other memory. The only differences are a somewhat longer latency and a much higher capacity.
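On Linux, for example, both presentations can be seen today. Here is a hedged sketch, assuming a kernel with CXL support and an expander configured either as kernel-managed memory or as device DAX; the device path is illustrative:

```python
# Sketch of the two ways CXL memory typically surfaces on Linux.
import glob
import mmap
import os

# Mode 1: kernel-managed "system RAM" -- the expander shows up as a
# CPU-less NUMA node, and ordinary allocations can be steered to it.
for node in sorted(glob.glob("/sys/devices/system/node/node*")):
    with open(f"{node}/cpulist") as f:
        cpus = f.read().strip()
    print(node, "has CPUs" if cpus else "memory-only (possibly CXL)")

# Mode 2: device DAX -- the expander shows up as a character device
# (path illustrative) that an application maps explicitly.
if os.path.exists("/dev/dax0.0"):
    fd = os.open("/dev/dax0.0", os.O_RDWR)
    buf = mmap.mmap(fd, 2 * 1024 * 1024)  # map 2 MiB of expander memory
    buf[:5] = b"hello"
    buf.close()
    os.close(fd)
```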
And I think we've already touched on some of that with Intel Optane, because some of those engineering challenges had already been solved in the way that Optane was presented to systems?
You probably know that Optane is a great technology, but it seems like it's not going to get widely adopted. Of course, Optane has a higher capacity, but in terms of cost it's much higher. And also, there aren't enough vendors building these types of devices. So the attractive thing about CXL-based memory
is that you have the adoption endorsement
from all the major DRAM vendors,
the big three, right?
Micron, SK, and Samsung.
And that really can help this ecosystem build up much faster.
So looking forward, I guess, what's the pitch?
If you were to encounter somebody who was in the machine learning space
and they said, oh, CXL, I've heard about that.
What does that give me?
What is your sort of pitch?
What is your way to make them excited about this technology?
I think it's from two sides. One is connectivity. If you are a system vendor, you definitely want to build AI and machine learning systems based on the CXL interconnect, because that will be much more cost effective and also give you high performance. And the second pitch is from the memory pooling perspective, because HBM gives you great performance, but it is very expensive and the capacity is low. So you definitely want to think about memory pooling using CXL, so that the GPUs can use HBM as a sort of cache, but for the massive amount of data beyond that, they can use those CXL memories. That way they will have much larger capacity to use. So those are the two areas I would emphasize for AI.
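The HBM-as-cache idea can be sketched as a toy placement policy; the tier sizes and the decision rule below are invented purely to illustrate the shape of it:

```python
# Toy placement policy: hot data that fits goes to HBM, everything
# else spills to the much larger CXL pool. Numbers are illustrative.
HBM_GB, CXL_GB = 80, 4096  # small fast tier vs. huge capacity tier

def place(tensor_gb: float, hot: bool, hbm_free_gb: float) -> str:
    """Return the tier a buffer should live in."""
    if hot and tensor_gb <= hbm_free_gb:
        return "HBM"
    return "CXL"

print(place(10, hot=True, hbm_free_gb=40))    # HBM: hot and it fits
print(place(500, hot=True, hbm_free_gb=40))   # CXL: too big for the cache
print(place(10, hot=False, hbm_free_gb=40))   # CXL: cold data
```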
Great.
Yeah.
I mean, who doesn't want more memory?
There are so many memory hungry applications out here and it's nice to have a technology
that can address that need.
So thank you so much, Jerry,
for joining the conversation here,
utilizing CXL and utilizing AI.
And before we go,
where can people connect with you
and learn more about CXL
and about what Xconn is doing?
Sure, they can go to our website, www.xcon-tech.com, and they can also write to me, send me an email. I will channel it to the right person.
Great, thanks a lot.
And how about you, Craig? Where can people find you?
Hi, you can find me at CraigRodgersMS on Twitter. You can find me on LinkedIn as Craig Rodgers,
and right here on Utilizing Tech CXL podcast.
And as for me, you'll find me at S Foskett
on most social media networks.
And of course, you'll find me hosting podcasts
at gestaltit.com.
And you'll find those on YouTube as well.
Just go to YouTube slash gestaltit video.
So thank you very much, everyone,
for listening to the Utilizing CXL podcast, part of the Utilizing Tech series.
If you enjoyed this discussion, please do subscribe and consider leaving us a rating or review in your favorite application.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
But for show notes and more episodes of this podcast, go to utilizingtech.com
or find us on Twitter at utilizingtech.
Thanks for listening and we'll see you next week.