Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x5: How CXL Can Optimize Infrastructure for Machine Learning with Gerry Fan of Xconn Technology
Episode Date: November 28, 2022. Today's AI and ML systems use proprietary interconnects, which limits the choices available to customers. CXL technology promises to enable greater interoperability, and this is the focus for Xconn Technology. In this episode of Utilizing CXL, Gerry Fan of Xconn joins Stephen Foskett and Craig Rodgers to discuss the ways that CXL can improve machine learning processing. The CXL Consortium is working with nearly every company in the IT industry to bring this promise to life, but we need hardware and software to enable memory pooling, device sharing, and more. The initial CXL products enable right-sizing memory, regardless of the specific architectural details of the CPU chosen. The next addition will be disaggregated and pooled memory using CXL switches, and this is coming to market in the next year or so. This will enable massive pools of memory on-demand for intensive applications. Xconn promises to make memory pooling available to CXL 1.1 hosts as well, and is working on a fabric manager to enable this. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms Guest: Gerry Fan, Cofounder CEO, Xconn Technology. Connect on LinkedIn: https://www.linkedin.com/in/gerry-fan-5769608/ Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on CXL, a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Craig Rodgers.
Hi, I'm Craig Rodgers. I'm a solutions architect and you can find me on Twitter at CraigRodgersMS.
So Craig, you listened in, I'm sure, for the last few seasons of Utilizing Tech when we focused on machine learning and artificial intelligence topics.
And I was pleased to have you join us here while we're talking about CXL because there is a little bit of a crossover between machine learning and CXL, right?
For sure. CXL is going to open up a world of possibilities in how we interact now with these
AI devices that are increasingly hitting the market. It's exciting to see what way that could
go. And I think there are two areas to me that are really relevant to AI,
and those are memory expansion, right-sizing memory and providing all the memory a system
needs at the right time. And that's coming sooner. And then in the future, more flexibility in terms
of allocating devices like tensor processors and so on as needed to build the right kind of AI
processing capabilities. But right now, I think we need to start thinking about how does CXL affect
AI and ML architecture. And that's why we decided to invite on Gerry Fan from Xconn Technology to
join us. Welcome to the show, Gerry.
I'm glad to be here, Stephen. This is Gerry Fan. I'm a co-founder and CEO of Xconn. My background is in high-performance switching and computing, and I've been in the industry for over 25 years. We started the company a couple of years ago to address these performance-related issues in AI and machine learning, and also in the memory expansion area.
Yeah, so we met you at the CXL Forum at OCP Summit, well, virtually, as well as the previous CXL Forum.
And it was interesting to see that you're really focused on that.
I really wanted to create a bit of a tie between the previous seasons of utilizing AI and now utilizing CXL.
So talk to us a little bit more.
What did you mean just now and generally about optimizing for machine learning?
So typically there are two challenges for AI systems. One is the data exchange rate between the GPU and the CPU, and the second is how the memory gets utilized. CXL technology addresses both of these challenges. It has cache coherency to manage the communication between the GPU and CPU, and also among the GPUs themselves. And it has CXL.mem, with very low latency, to enable all these GPUs to share a common pool of memory. These two main advantages really propel attractive applications in the AI and machine learning area.
What are the specific challenges though
that you're able to address here?
I mean, what don't we have that we will have
once CXL comes to bear?
So right now, if you look at AI and machine learning systems, they are basically using proprietary interconnects, dominated mainly by NVIDIA. That limits the ability of system vendors to deploy their own specific type of system architecture.
CXL is a standard across the industry, and that opens up opportunities for the system vendors and also for chip vendors like us; we produce CXL switches. Basically, a switch serves as a hub connecting the GPUs and CPUs, enabling the system vendors to develop these next-generation CXL-based AI and ML systems.
So it's really about interoperability and flexibility and customer choice, I suppose.
So customers will be able to maybe mix and match or at the very least to choose the solution
that fits them best, without having to worry about the proprietary interconnect making the choice for them.
Absolutely.
Obviously, you've been working with vendors then. NVIDIA have the likes of NVLink, you know, their own proprietary method for communicating between devices. I'm sure you've been working with them to help them move over to the likes of CXL, and it also helps validate your CXL switching platform.
I can't go very deep into our conversations with NVIDIA, but at a high level, all the companies are looking very deeply into the CXL area, including NVIDIA. The cloud providers and system vendors definitely would like to invest in CXL technology so that they can build the next generation of AI and ML systems based on it, because that gives them a performance advantage. It also helps in terms of cost, because they don't have to rely on proprietary protocols, especially as speeds get higher and higher. Having wide industry adoption, with the whole community and ecosystem building around it, is more productive.
That's a really good point.
You know, the CXL Consortium has had so many companies now sign up and agree to do things this way.
I just called out NVIDIA as one example of, you know,
a couple of hundred that we know are going to be adopting this.
So I'm sure you've had numerous conversations
with many companies around that.
Yeah, we have the conversations
with many of these members in the CXL Consortium.
And that's one of the very exciting things
to see that the entire industry is united
and behind this technology.
And from the CPU vendors to the switch vendors
to the CXL memory device vendors,
and also the end customers, the cloud providers or others providing any type of data processing service, they're all going to benefit a lot from the collaboration in this CXL industry.
Yeah, it is wonderful to see the number of companies that are part of this, including basically every company that people have heard of. Craig mentioned NVIDIA, and of course Intel and AMD and ARM, and everybody's in there.
And another thing I think from a technical standpoint that's nice is that since CXL is so
closely based on and coupled to PCI Express, it really leverages a lot of development on that side as well, as we heard
and discussed at the CXL forum.
The improvements to PCIe that are coming are rolling out now, rolling out in the next generation
platforms and the platforms after that will really help to make this technology more real.
But as somebody who's deeply involved in this,
what more do we need?
It's not just PCI Express.
We need all sorts of things,
controller chips and switch chips and software.
What more do we need to make this thing real?
Yeah, that's a very good question.
So basically this concerns the entire ecosystem, and we pretty much need collaboration from the software vendors, from the processor vendors, from the switch vendors, and also from the GPU vendors and the CXL memory device vendors. All of these things need to be put together to build a system which can enable this memory pooling, so we can have AI and ML systems with more efficient data transfer between the GPU and the CPU, and also among the GPUs, and with more effective utilization of the memory in these types of systems.
So the collaboration is across the board.
And that's why we have this consortium: it puts together people with different backgrounds, from companies working in different areas, to communicate with each other, understand each other's needs, and drive this deployment forward. And because the system is so involved, that's the reason the CXL spec has different stages, with different features getting put in, so that these systems can be developed over time. It's just like PCIe: you have different generations, and with each generation you keep adding new ECNs or new features. CXL is going to go through a similar process.
And you are absolutely correct.
CXL is built on top of PCIe. I think that's one of the best things to happen for adoption of this technology, because PCIe is everywhere. People feel very comfortable building something on top of something they know so well. Whereas if you have something brand new that nobody understands, it takes a few years for people to understand what's going on, and they start hesitating to adopt. So that's one great thing about CXL.
So it seems that the first practical application of this technology
is memory expansion. And specifically, that is being used to overcome the limits of system
memory buses. So for those who are listening who aren't familiar, most CPUs can access a number of
memory channels. Typically, it's three or four memory channels.
And into each of those channels, you can put a number of memory modules, or DIMMs, into those slots.
Usually, it's three, two, or now one slot per channel.
And those only come in certain sizes. So if you want to maximize your system performance, you're going to
put four, you know, memory modules into your system, so that you're using all four channels.
And, you know, you only have certain choices, they're based on binary numbers. If you want a
specific amount of memory, or if you want to add memory after you initially buy the system,
you have to basically replace those with
something bigger. And sometimes that can be a lot bigger and a lot more expensive.
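To make that concrete, here's a rough illustrative sketch; the channel count and DIMM sizes below are hypothetical, not tied to any particular CPU:

```python
# Illustrative only: enumerate the total capacities available when a
# hypothetical 4-channel CPU is fully populated with identical DIMMs.
CHANNELS = 4
DIMM_SIZES_GB = [16, 32, 64, 128]  # DIMMs come in power-of-two sizes

for size_gb in DIMM_SIZES_GB:
    print(f"{CHANNELS} x {size_gb} GB DIMMs = {CHANNELS * size_gb} GB total")

# Prints 64, 128, 256, 512 GB -- nothing in between. If a workload needs,
# say, 300 GB, the only option is to swap every DIMM for the next size up
# and pay for 512 GB.
```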
So the initial use case for CXL from companies like Samsung and SK Hynix is essentially memory expansion modules that will allow you to add, you know, maybe not quite as fast, but the right amount of memory to the system. And that has a lot of relevance to machine learning and other big data applications. Gerry, you guys are involved in this right off the bat, right?
Absolutely. That's what we targeted at the very beginning, when we built up the company and built the product. Because, as you mentioned, there are a lot of big applications which consume a huge amount of memory, which some people call in-memory computing or near-memory computing. And it's also true that these CPUs only have a limited number of DDR channels, so there's only a limited amount of DDR memory capacity you can connect directly to the CPUs.
And one of the reasons, obviously, is that adding new channels is very challenging for the CPU vendors.
That's the reason they are reluctant to do that.
So that creates a big challenge: if you want to run a big application and your memory capacity is limited to, say, 512 gigabytes or even one terabyte, for some applications that's not going to be enough. So the CXL switch we're developing enables you to overcome this barrier. We have many ports, and memory expansion is one of the memory use cases: you connect memory devices from, for example, Samsung or SK Hynix or Micron underneath our switch to profoundly expand the capacity of the memory, so that the memory can reach something like 30 terabytes.
And for such a big amount of memory, you definitely don't want it to be owned by one host or one CPU. You want this memory pool to serve many CPU hosts, so that sharing and pooling become possible. In that case, when one CPU is idling, another CPU can start executing its own job. That's what they call memory disaggregation, with multiple hosts taking advantage of this memory pooling. The memory utilization improves because the CXL switch we developed enables these types of applications, and it reduces the TCO for the big cloud providers tremendously. So that is one of the big trends in the first wave of CXL applications going on in the industry.
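As a back-of-the-envelope illustration of that TCO point, with invented numbers rather than figures from Xconn:

```python
# Why pooling reduces stranded memory: each host no longer has to be
# provisioned for its own worst-case peak. All numbers are invented.
HOSTS = 16
PEAK_GB = 1024      # occasional per-host peak demand
AVERAGE_GB = 400    # typical per-host working set

dedicated_gb = HOSTS * PEAK_GB            # every host sized for its own peak
pooled_gb = HOSTS * AVERAGE_GB + PEAK_GB  # shared pool: sum of averages plus
                                          # headroom for one peak, assuming
                                          # peaks rarely coincide

print(f"Dedicated DRAM: {dedicated_gb} GB")  # 16384 GB
print(f"Pooled DRAM:    {pooled_gb} GB")     # 7424 GB
print(f"DRAM avoided:   {dedicated_gb - pooled_gb} GB")
```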
And this isn't some far off thing.
I mean, we've already heard from,
the reason I mentioned Samsung and SK specifically
is because they've already announced products.
Micron, of course, is right there as
well, as you mentioned. And Intel and AMD, like I said, have already announced that they are
working on this. Now, they haven't officially confirmed it, but I think it's an open secret
in the industry that this is all coming next year: memory expansion on these systems. But a lot of the other things that you described, this idea of a shared memory pool, when does that come, unofficially? I know that you're not announcing things or anything, but when could we expect to see that kind of shared memory?
The true value of this CXL memory
is going to be coming from the memory sharing and pooling.
And so that's why we are working with industry leaders
in the cloud area and also in the system area, to enable them to build a POC and to analyze and understand what the performance improvement is, and also what it means in terms of TCO.
And as you said, Stephen, all the technology is right there. The CPU vendors have this capability. And of course we have our first silicon, and it's running in our lab. For the memory devices, you have SK Hynix, Samsung, and Micron over there, so people can build these systems and evaluate multi-host sharing performance. So we are working with the companies which are leaders in this area to explore the performance and many useful aspects of these systems.
You've touched on a couple of points there that Stephen and I have discussed, I think in both private and public conversations, around the efficiencies. You know, memory is
such a huge cost component of any server and making the best use of that memory is a real
benefit to the overall TCO. And we had also surmised that, you know,
the likes of the hyperscalers would be very early adopters
of this technology.
And that rings true that you're saying
you're already working with them.
It would make sense for them,
given the sheer volume of resources that they would have,
you know, they would certainly stand to gain the most.
In terms of your product,
I've done a wee bit of research before we came on here. Obviously we know CXL 1.1 is coming out with the next generation of servers. Your switch is already operating on CXL 2.0. Can you tell us any reasons or any thoughts about why you jumped straight to 2.0? Now, obviously we know it's backwards compatible, but what made you jump straight away up to 2.0 there on your CXL switch?
Yeah, that's a great question.
As you know, CXL 1.1 basically does not have the switch functionality. It's just point-to-point connections. And CXL 2.0 is more like PCIe.
So you can build a PCIe-like type of switch.
Because we are a switch company, in theory we cannot use 1.1, and that's why we do 2.0. Having said that, we understand that what the CPU vendors have right now is mainly 1.1, and we have to make things work with that. That's why our chip is very uniquely positioned in this area: we can enable today's 1.1 hosts to function as if they were behind a switch. Our chip does this type of virtualization to allow a 1.1 host to share the CXL devices along with other 1.1 hosts. We have already developed a fabric manager that lets you carve up those resources through the switch to the CXL 1.1 servers now coming to market.
So even though they're only on 1.1, they're already able to gain access to external RAM through that 2.0 functionality. I just wanted to highlight that even though servers are only on 1.1 now, we already have access to more advanced features through the likes of your 2.0 switch.
Right. So basically our switch enables a 1.1 host to access all the CXL devices connected underneath our chip. And our chip does all the memory allocation, all this kind of virtualization work.
So you have a fabric manager then.
How are customers going to be able to interact
with that fabric manager to allocate off resources to hosts?
Yeah, that's a good question.
So basically, we develop a fabric manager.
And basically, that fabric manager is provisioning our chip, right?
So our fabric manager software will interact with the system level software.
And for example, that's what I said,
the ecosystem need to be from the different area,
is a collaboration of the vendors from different area.
And we as a switch, we're providing the software
like Fabric Manager to managing managing our chip to managing how to
allocate memory for our chip for example the cloud providers this is that this uh
big software company and to develop some interface to talking with our fabric manager to instruct our fabric manager what they
want us to do in terms of how to manipulating all the memories. There's probably a GUI available
however primarily you're expecting the hyperscalers to want to do that initially through an API
given the size and scale they would want to operate on.
Right, so we provide the hooks to these system software vendors, and they develop their engine to go through our API to fully control our chip and achieve this memory allocation and these types of operations.
So long-term though, do you think that the software that does fabric management and controls this hardware, do you think that that's going to be developed
management and controls this hardware, do you think that that's going to be developed
by independent companies?
Or do you think that's gonna be rolled
into operating systems and other kind of
system-wide resources?
From our understanding, this is mainly going to be developed by the chip vendors. We partner with a management software company to develop this fabric manager. Our customers can leverage what we have in our fabric manager, which we will eventually provide to them as open source. So they are able to either integrate it into their high-level system software, or they can basically go through the API and talk to our fabric manager. It's up to them how to fully take advantage of the fabric manager we develop.
But it sounds like this is something where maybe
at least the basic functionality may be integrated
into system resources or system-wide or
global applications.
But if you want more advanced features,
you'll probably use a proprietary
or specific software package.
That's correct.
One of the things I want to highlight is that system software companies don't want to change their applications. They don't want to touch too much of their existing software. That's the beauty of this CXL-based memory disaggregation: they can keep all their high-level software while gaining full control of the pooled memory system.
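To give a feel for what "going through the API" might look like, here is a purely hypothetical sketch; the base URL, endpoints, and field names are all invented for illustration and do not describe Xconn's actual interface:

```python
# Hypothetical sketch of orchestration software driving a CXL switch's
# fabric manager over REST. Every endpoint and field here is invented.
import requests

FM = "http://fabric-manager.local:8080/api/v1"  # hypothetical address

# Discover the memory devices attached below the switch.
devices = requests.get(f"{FM}/memory-devices").json()
print(f"{len(devices)} expander(s) in the pool")

# Ask the fabric manager to bind 512 GB from the pool to host port 3,
# so that the CXL 1.1 host on that port simply sees more memory.
resp = requests.post(f"{FM}/allocations", json={
    "host_port": 3,
    "capacity_gb": 512,
    "interleave_ways": 2,  # spread across two devices for bandwidth
})
resp.raise_for_status()
print("Allocation created:", resp.json()["id"])
```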
Another question that occurs to me from an application perspective.
A lot of the people that listened to the Utilizing AI podcast were more on the AI application side. Is CXL memory,
is this pooled memory or expanded memory, is this going to appear to be part of the regular memory
or is it going to be some special kind of memory that you have to deal with differently?
I believe there are two types. One appears as normal memory, managed close to the kernel, that kind of thing. The other is called device DAX, and that's more like a device type. So to answer your question, it's just like any other memory. The only differences are a somewhat longer latency and a much higher capacity.
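On Linux, for example, both presentations can be seen today. Here is a hedged sketch, assuming a kernel with CXL support and an expander configured either as kernel-managed memory or as device DAX; the device path is illustrative:

```python
# Sketch of the two ways CXL memory typically surfaces on Linux.
import glob
import mmap
import os

# Mode 1: kernel-managed "system RAM" -- the expander shows up as a
# CPU-less NUMA node, and ordinary allocations can be steered to it.
for node in sorted(glob.glob("/sys/devices/system/node/node*")):
    with open(f"{node}/cpulist") as f:
        cpus = f.read().strip()
    print(node, "has CPUs" if cpus else "memory-only (possibly CXL)")

# Mode 2: device DAX -- the expander shows up as a character device
# (path illustrative) that an application maps explicitly.
if os.path.exists("/dev/dax0.0"):
    fd = os.open("/dev/dax0.0", os.O_RDWR)
    buf = mmap.mmap(fd, 2 * 1024 * 1024)  # map 2 MiB of expander memory
    buf[:5] = b"hello"
    buf.close()
    os.close(fd)
```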
And I think we've already touched on some of that with Intel Optane, because some of those engineering challenges had already been solved in the way that Optane was presented to systems?
You probably know that Optane is a great technology, but it seems like it's not going to get widely adopted. Of course, Optane has a higher capacity, but in terms of cost it's much higher. And also, there aren't enough vendors building these types of devices. So the attractive thing about CXL-based memory
is that you have the adoption endorsement
from all the major DRAM vendors,
the big three, right?
Micron, SK, and Samsung.
And that really can help this ecosystem build up much faster.
So looking forward, I guess, what's the pitch?
If you were to encounter somebody who was in the machine learning space
and they said, oh, CXL, I've heard about that.
What does that give me?
What is your sort of pitch?
What is your way to make them excited about this technology?
I think it's from two sides. One is connectivity. If you are a system vendor, you definitely want to build AI and machine learning systems based on the CXL interconnect, because that will be much more cost effective and also give you high performance. And the second pitch is from the memory pooling perspective, because HBM gives you great performance, but it is very expensive and the capacity is low. So you definitely want to think about memory pooling using CXL, so that the GPUs can use HBM as a sort of cache, but for the massive amount of data beyond that, they can use those CXL memories. That way they will have much larger capacity to use. So those are the two areas I would emphasize for AI.
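The HBM-as-cache idea can be sketched as a toy placement policy; the tier sizes and the decision rule below are invented purely to illustrate the shape of it:

```python
# Toy placement policy: hot data that fits goes to HBM, everything
# else spills to the much larger CXL pool. Numbers are illustrative.
HBM_GB, CXL_GB = 80, 4096  # small fast tier vs. huge capacity tier

def place(tensor_gb: float, hot: bool, hbm_free_gb: float) -> str:
    """Return the tier a buffer should live in."""
    if hot and tensor_gb <= hbm_free_gb:
        return "HBM"
    return "CXL"

print(place(10, hot=True, hbm_free_gb=40))    # HBM: hot and it fits
print(place(500, hot=True, hbm_free_gb=40))   # CXL: too big for the cache
print(place(10, hot=False, hbm_free_gb=40))   # CXL: cold data
```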
Great.
Yeah.
I mean, who doesn't want more memory?
There are so many memory hungry applications out here and it's nice to have a technology
that can address that need.
So thank you so much, Jerry,
for joining the conversation here,
utilizing CXL and utilizing AI.
And before we go,
where can people connect with you
and learn more about CXL
and about what Xconn is doing?
Sure, they can go to our website, www.xcon-tech.com, and they can also write to me, send me an email. I will channel it to the right person.
Great, thanks a lot.
And how about you, Craig? Where can people find you?
Hi, you can find me at CraigRodgersMS on Twitter. You can find me on LinkedIn as Craig Rodgers,
and right here on Utilizing Tech CXL podcast.
And as for me, you'll find me at S Foskett
on most social media networks.
And of course, you'll find me hosting podcasts
at gestaltit.com.
And you'll find those on YouTube as well.
Just go to YouTube slash gestaltit video.
So thank you very much, everyone,
for listening to the Utilizing CXL podcast, part of the Utilizing Tech series.
If you enjoyed this discussion, please do subscribe and consider leaving us a rating or review in your favorite application.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
But for show notes and more episodes of this podcast, go to utilizingtech.com
or find us on Twitter at utilizingtech.
Thanks for listening and we'll see you next week.