In The Arena by TechArena - Delivery of the CXL Vision with the CXL Consortium's Kurtis Bowman and Kurt Lender
Episode Date: November 28, 2023
TechArena host Allyson Klein sits down with the marketing co-chairs of the CXL Consortium at SC’23 to discuss the introduction of the new 3.1 spec, the emergence of true CXL 2.0 solutions, and what comes next from this disruptive standard that will redefine data center infrastructure.
Transcript
Welcome to the Tech Arena,
featuring authentic discussions between
tech's leading innovators and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and today I'm delighted to be
joined by Kurtis Bowman. He is a co-chair of the marketing working group of the CXL Consortium.
Welcome to the program, Kurtis. How are you doing? I'm doing great. Thanks, Allyson, for having us
on Tech Arena. I'm delighted to have you here. I've been writing about CXL since the inception of Tech Arena.
It's a technology that I'm very excited about, and I absolutely wanted to have you on the show this week.
Why don't we just get started with introductions?
Can you tell me a little bit about your background and how you made it into the CXL marketing co-chair role?
Sure. So if we go back in time a little bit,
there were a lot of different organizations trying to do the same thing around memory expansion and
memory improvements. And so back in the early 2015, 2016 timeframe, I was working with the
Gen-Z Consortium and helping to drive that.
And then Intel looked at it and said, hey, we've got this thing called CXL that would be, we think, good for the industry as well.
And at the time I was with Dell, so Intel reached out to Dell and said, hey, we'd like you to be an early adopter.
So I actually was the second member of CXL, with Jim Pappas being the first.
And so then other companies obviously joined in.
We did all the work that was needed to create it.
But that's what started the journey to the 1.0 spec, which actually released prior to incorporation of CXL.
Then 1.1 was our first official release after incorporation, and from there on to 2.0, then 3.0 last year and 3.1 this year.
Now, as CXL was being developed, I was still at Intel at that time,
and I remember finding out about this technology
and realizing it was going to be a real game changer.
Let's talk about why it's such a game changer
and what came before and what CXL has introduced to server platforms.
Can you just give us a basic definition?
Yeah, so the big hole that we had in the industry was when you had a processor, it had direct attached memory.
And expanding beyond the reach of just that direct attached memory was very costly or impossible,
depending on what you did, right? So go back to when there were 8 and 16 gigabyte DIMMs,
you started to fill out your slots.
And if you remember back then, there were two DIMMs per channel
and sometimes even three DIMMs per channel.
But when you maxed that out, you were done.
You couldn't go beyond that.
And you could max it out in capacity or you could max it out in bandwidth. And so the real need was this space where I need more memory bandwidth, or more memory capacity, than I can afford.
Because there were always specialty DIMMs, right?
I could get 128 or 256 gig DIMMs, but I really had to open my pocketbook and pay for those. So with CXL, it filled that need of being able to expand CPU memory in a way that gave you capacity and bandwidth simultaneously.
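To put rough numbers on that ceiling, here's a quick back-of-envelope sketch in Python. The platform shape (eight channels, two DIMMs per channel) and the DIMM sizes are illustrative assumptions, not figures from the conversation.

```python
# Rough capacity ceiling for a hypothetical socket with 8 memory channels
# populated at 2 DIMMs per channel (illustrative numbers only).
channels = 8
dimms_per_channel = 2
commodity_dimm_gb = 64
specialty_dimm_gb = 256

ceiling_commodity = channels * dimms_per_channel * commodity_dimm_gb
ceiling_specialty = channels * dimms_per_channel * specialty_dimm_gb

print(f"Ceiling with {commodity_dimm_gb} GB DIMMs:  {ceiling_commodity} GB")  # 1024 GB
print(f"Ceiling with {specialty_dimm_gb} GB DIMMs: {ceiling_specialty} GB")   # 4096 GB

# Past the commodity ceiling, the only lever is pricier high-capacity DIMMs;
# the channel count, and therefore the bandwidth, stays exactly the same.
```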
And another key was that it reused the slots that were already in the computer. If you remember the transitions as we went from PCI and then PCI-X and PCIe, everybody had to decide, am I going to do a computer that has two of these slots and three of those slots?
And the device vendors had to decide, am I doing a card that plugs into one, or doing two cards so I can plug into both of those types of connectors?
It made it hard on the industry for those transitions.
CXL plugs into the same slot and uses the same electrical connections that PCIe does. And so
as an OEM or an ODM, you can put down one slot and then let the processor decide if it's talking to a CXL device or a PCIe device. And so adoption has become very
easy. It really was up to CPU vendors, and then device vendors could decide, you know, this is
a device that's better for CXL or better for PCIe. Now, the industry worked together on specifications,
and then last year Genoa hits the market, Sapphire Rapids hits the market. There are two different platforms
offering CXL, and let the games begin in terms of innovation. I've worked in industry standard consortia, so I know how exciting it is when products start
hitting the market and it's real. What have you seen from folks in terms of innovative solutions, and where have the initial designs focused in terms of the value proposition for the customer?
Yeah, so CXL talks to and supports accelerators and memory as endpoints.
The accelerators already had a pretty good foothold in PCIe connectivity.
And so we haven't seen as much interest there.
Most of the innovation has been around this memory space because, again, it was an unsolved
problem.
And what we've seen is people who are adding plug-in cards, think of PCIe plug-in cards,
and they can plug them right in.
But they're also doing modules that plug into the front, because PCIe came out to the front of the box
with NVMe devices.
So now I can make an EDSFF form factor module, right,
and plug it right into the front of my box,
just like I'm used to adding storage capacity and bandwidth.
I now can do that same thing to add memory capacity.
So that's one of the real innovative ways that they can do this.
And with 1.1 and 2.0, that really is the expansion that we expected.
Inside the box memory expansion.
Then as we start to think about 3.0 and 3.1, that's where it becomes a fabric.
And that's where I expect we'll start to see real
expansion outside the box. So I know that your background is in performance. When you look at
this memory expansion with modules with CXL, how do you look at the performance trade-offs between
standard DDR memory connected via a memory channel versus CXL-attached memory?
Yeah, so that's a layered question.
So let me take it that way.
If you look at unloaded latency, which is what most people talk about,
the unloaded latency going direct from the processor to its near memory in that same NUMA node,
you see somewhere on the order of 100 nanoseconds of latency.
Now compare that to a dual processor system: jumping from CPU zero to CPU one, I'm going to add about 100 more nanoseconds of latency.
The same thing occurs when you jump from your CPU to that CXL-attached memory.
You get about 100 nanoseconds of extra latency.
That's in an unloaded case, right?
So you're really not doing anything but going out and talking to that memory and having it come
back, the system otherwise idle.
Now, when you get into a situation where you're really busy, so think of a database workload
or an AI workload where you're working on a large model.
Now you've got lots of transactions going from
multiple cores in your CPU and you start to really utilize your memory channels. When that happens,
now your latency is actually much higher on those locally attached memories than it is in an idle
situation. You can imagine it being 300, 500, even 600 nanoseconds.
So if I add CXL memory, I add memory channels.
Those memory channels are about 128 gigabytes per second of bandwidth. So that's equivalent to about two DDR5 channels per CXL link.
So I get my memory bandwidth and because I've added more channels and I can
spread my accesses across more of those memory channels,
I reduce my latency as well.
And so in those environments, you really get performance improvement.
And in some of the work that we've done,
we've seen a 10X throughput improvement by adding CXL memory, just by adding two cards of CXL memory.
And we've seen about a 20 to 50% reduction in latency.
So these are real, tangible results that you can get.
Now, that's very dependent on your application. So if your
application is memory bandwidth or memory capacity limited, you're going to see those improvements.
If your application is a compute intensive application, you're not going to see benefits
from expanding your memory. Same as if you added more directly attached memory.
Nice.
So that's the layered answer.
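For readers who want to sanity-check those figures, here's a back-of-envelope sketch in Python. The latency values simply restate the approximate numbers quoted above; the bandwidth math assumes a PCIe Gen5 x16 link for the CXL device and DDR5-4800 channels, which are illustrative assumptions rather than details from the conversation.

```python
# Approximate unloaded latencies discussed above (real platforms vary).
local_ddr_ns = 100           # CPU to its directly attached DDR in the same NUMA node
cross_socket_extra_ns = 100  # extra hop to the other socket's memory
cxl_extra_ns = 100           # extra hop to CXL-attached memory

print(f"Local DDR:      ~{local_ddr_ns} ns")
print(f"Remote socket:  ~{local_ddr_ns + cross_socket_extra_ns} ns")
print(f"CXL-attached:   ~{local_ddr_ns + cxl_extra_ns} ns")

# Bandwidth, assuming a PCIe Gen5 x16 link and DDR5-4800 channels.
gen5_lane_gbps = 32 / 8            # 32 GT/s per lane ~= 4 GB/s raw, per direction
cxl_per_dir = 16 * gen5_lane_gbps  # ~64 GB/s each direction for an x16 link
cxl_bidir = 2 * cxl_per_dir        # ~128 GB/s counting both directions
ddr5_channel = 4.8 * 8             # 4800 MT/s * 8 bytes = 38.4 GB/s per channel

print(f"CXL x16 Gen5 per direction: ~{cxl_per_dir:.0f} GB/s")
print(f"CXL x16 Gen5 bidirectional: ~{cxl_bidir:.0f} GB/s")
print(f"DDR5-4800 channel:          ~{ddr5_channel:.1f} GB/s")
print(f"DDR5 channels per direction: ~{cxl_per_dir / ddr5_channel:.1f}")
```

Under those assumptions, one x16 link works out to roughly one and a half to two DDR5 channels per direction, which is in the same ballpark as the "about two DDR5 channels" figure mentioned above.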
Thanks for that explanation, Kurtis.
The performance characteristics are really interesting.
And I think that the memory space is one that seems to be very interesting
when I talk to customers as well about their desires for CXL-enabled solutions.
But I know that you're going further.
You mentioned the fabric, and I want to talk about that. At this point, I'd like to introduce our second guest, Kurt Lender,
who is also a marketing co-chair in the CXL Consortium. Welcome, Kurt. Why don't you just
go ahead and introduce yourself and give the audience a little bit about your background?
Sure. Yeah. Hi, I'm Kurt Lender. I am in the CXL Consortium; I'm co-chair of the marketing work
group. I work at Intel, where I'm an IO strategist on a technology solutions team,
and I have been working with CXL basically since day one, since its inception in that sense. And
over the years, I've been enabling PCIe and its ecosystem for over a decade or so now.
This week was Supercomputing, and CXL really lit up the show floor, I think.
Tell me about the news, Kurt, that you guys delivered, and why you captured attention more broadly than you have in the past at this important event
for data center compute. Sure. Two reasons. We actually had a booth and we were showing CXL 2.0
products. So if you look at the last couple of years, you know, two or three
years ago, it was PowerPoints we were talking about; last year, engineering samples. This year, we're very close to rolling out production. So that was one thing
on the show floor. Kurtis and I have also been actively briefing analysts. We rolled out our 3.1 specification,
which augments 3.0 in a few different key areas. One is fabric enhancements. We basically moved with 2.0, sorry, 3.0, where we actually
introduced multi-level switching. But with it, we actually went to port-based routing. So
instead of the tree structure of PCIe, where you have to go down and then back up to the host all
the time, you can actually go not only north and south, but east and west, thereby minimizing hops and latency.
So that was one key thing.
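To make the hop-count point concrete, here is a toy model in Python. It is not how the spec encodes routing; it simply contrasts a PCIe-style tree, where peer-to-peer traffic has to climb to a common parent, with a fabric that adds a direct east-west link between switches.

```python
from collections import deque

def hops(adj, src, dst):
    """Breadth-first hop count between two nodes in an undirected graph."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Tree-style topology: two devices hang off different switches, so traffic
# between them must go up to the shared root and back down.
tree = {
    "root":     ["switch_a", "switch_b"],
    "switch_a": ["root", "dev_1"],
    "switch_b": ["root", "dev_2"],
    "dev_1":    ["switch_a"],
    "dev_2":    ["switch_b"],
}

# Fabric-style topology: same nodes, plus an east-west link between the switches.
fabric = {node: list(links) for node, links in tree.items()}
fabric["switch_a"].append("switch_b")
fabric["switch_b"].append("switch_a")

print("tree  :", hops(tree,   "dev_1", "dev_2"), "hops")  # 4: up to the root and back down
print("fabric:", hops(fabric, "dev_1", "dev_2"), "hops")  # 3: straight across the switches
```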
And there were a couple other things that we could talk about, but I'll jump to security.
That was the next piece that we added.
We went and added a trusted security protocol.
We had done IDE for security of in-flight information for 3.0.
The trusted security protocol basically is comprehensive.
It looks at the configuration and makes sure that the configuration is secure.
It actually allows for VMs to be secure in a multi-tenant environment, which we didn't
have before.
So we basically are covering security of data at rest, data in flight, and all the configuration that is needed for a trusted environment.
That's fantastic.
We haven't really talked
about the fabric.
And the fabric is something
that's super interesting to me.
So why don't we just start
with a question?
And I'll go back to you, Kurt,
on when you look at what customers are trying to do
with CXL, why is the moment that a fabric is introduced so interesting in terms of the core
capabilities of what you can do with data center infrastructure? Yeah, if you look at it, I wish
I had the slide in front of me that shows it so well. We can actually start doing composable systems,
and CXL 3.x allows 4,096 devices. So the configurations that can be done,
tree structures, mesh, ring, star, butterflies, you can do all of this with 3.1. And the permutations
that the HPC market and AI developers can explore will be immense in that sense.
And I think that's where a lot of the interest was this week at Supercomputing.
When you look at the 2.0 products that you demoed, Kurtis, what do you think captured the attention of the SC crowd?
And what were the core capabilities that were different than what we've seen with 1.1?
Just let me be open: really nothing ever came out of 1.1. It was a proof of concept type
of environment. So it was FPGAs and whatnot. So 2.0 brings the ability to not just do
the point-to-point connection that 1.1 described, but also take it through a single layer of switch to expand that.
And so the thing that caught people's attention and kind of got them to wander into the booth was this idea of expanding memory that we've been talking about.
What kept them in the booth was the expanse of what was available. So we had vendors like Cadence and Synopsys showing off their IP that's available, that you can put into an FPGA or full silicon.
We had the telemetry vendors like VIAVI and Teledyne showing off their solutions to be able to test and make sure that you were in compliance with the spec.
And then we had lots of vendors showing off silicon that allowed you to go from CXL to DDR4 or DDR5 memory on the bus.
And then finally, some kind of holistic vendors showing off full solutions,
being able to add and remove memory from systems at will.
And then software is obviously the last piece that you have to put together.
So all of those are here this year showing off that you can do CXL solutions with the processors that are available in your environment now.
And you just need to have the workload that requires that memory expansion.
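As a concrete illustration of what "adding memory to a system" looks like from the software side, here is a small sketch of how that expanded capacity commonly surfaces on a Linux host: CXL-attached memory typically shows up as an extra, often CPU-less, NUMA node that standard tooling (numactl, libnuma) can then target. The sysfs paths below are the standard node entries, but whether a given node is actually CXL-backed depends on the platform, so treat the labeling here as a heuristic rather than a definitive check.

```python
# List NUMA nodes and flag CPU-less ones, which is how CXL memory expanders
# are commonly exposed on Linux (platform-dependent; illustrative only).
from pathlib import Path

for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
    cpus = (node / "cpulist").read_text().strip()
    memtotal_kb = next(line.split()[-2]
                       for line in (node / "meminfo").read_text().splitlines()
                       if "MemTotal" in line)
    label = "possibly CXL/expander memory" if not cpus else "CPU-attached"
    print(f"{node.name}: cpus=[{cpus or '-'}] MemTotal={memtotal_kb} kB ({label})")
```

An application that wants to keep hot data in near memory and push colder, capacity-hungry allocations to the expander can then bind those allocations to the appropriate node with numactl or libnuma.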
Very cool. When you look forward, Kurt, and you think about how you started with acceleration and memory as the two use cases, and you've delivered spec after spec working in an industry standard fashion, how far out in time are we before we start seeing full composability of solutions and a much different view of how
infrastructure is delivered, with banks of CPUs, banks of accelerators, banks of memory?
Yeah, there's different opinions. I guess my opinion would be we'll see that starting when
3.1 products start rolling out. And that'll be in like maybe two, three, four years type timeframe,
probably three or four, I guess, is the more accurate timeframe.
And to be honest, one of the discussions I've had
with many folks on the floor is that CXL isn't a step function.
It's going to be this continuum.
So we're seeing, and Kurtis talked about, all the different memory expansions
that you can do today.
I'll call them the simple use cases,
but the pooling is coming
and there's a lot of development going on there.
Then there's the fabric approach
and all the disaggregation you can do.
That development may even start
in the 2.0 environment,
but it will really ripple out in that 3.0 environment
in about three, four or five years type thing.
So that's where it will really start.
And you'll start seeing that dynamic shift in the industry.
That's awesome.
Yeah.
And I might add just a little to that: this is going to be a rolling
rollout. We're going to see this come out and you're
going to see the hardware coming in for it.
And that will allow for building in the environment.
And then the fabric manager will come in behind that to really make it easy to scale out across kind of the bank of items that you were talking about, CPUs, memory and accelerators.
Behind that, then you'll start to see the OSs and then the applications. And so I think there's an adoption curve across that three to five year time frame Kurt was talking about.
You'll find the people who are tip of the spear coming in maybe as early as two years, really driving to get to where the laggards come in once the application knows how to take care of this.
Yeah, that makes a lot of sense.
One thing I'll add on top of that is, it was interesting.
I had a lot of universities stop by and, you know, I've always been impressed by all the
universities that attend SC.
They were very interested.
They're interested in Kurtis's birds of a feather session and asked a lot of questions.
So they're going to start this development work too.
And then they'll get out. The graduates will start getting out in the industry and start
taking that with them. It's just going to feed on itself in that sense.
That's super exciting. One of the things that I've been really impressed with about the CXL Consortium is
just how well run it's been in terms of being a true industry standard. I've been involved
in some other I/O standards that can get challenging. And Kurtis, you mentioned PCI-X and PCIe, which took me back to the bus wars of
more years ago than I care to mention. What comes next for you guys in terms of what
the consortium itself will do? And how do you keep that spirit of open industry innovation going as you move forward into the next
challenges? I wonder if Kurtis will want to jump in first, but anyway: technically we don't know yet,
but we're going to have a face-to-face, and our technical task force will tell us where 4.0 will
go. There are many things they can work on. I will note that the trusted environment work I talked
about was in a sort of single trusted environment, right?
When we look at the pooling and sharing that are coming, there are going to be multiple environments,
or two or more trusted environments, coming together, right?
That was pushed past this release to make sure we got the first rev out.
So I'm not saying that's going to be it, but that's certainly a candidate. And on security,
Kurtis's words are that it's a journey; in that sense, we're on that journey with security.
There's probably a lot of other things, but technically the industry will tell
us where the next thing will be.
Yeah.
And maybe I'll take a little time to tell the story.
So there were four competing standards when we got going.
That included Gen-Z, CCIX, and OpenCAPI, along with CXL.
The momentum in the industry that we've developed is because we were able to bring all of those organizations together under one roof, under the CXL banner. And what that did was give the consortium a lot of IP and a lot of smart minds
to drive that IP. And so when you think about CXL taking the best from each of those and bringing
it in, it really gives us a lot of avenues to go down. Gen-Z was around fabric. You can see a lot of the
fabric work that's being done now. OpenCAPI was
around very efficient memory interfaces. CCIX was about bringing on accelerators and how they come
in. So I think when you look at the ability of CXL to become that unifying body, it really makes
it so that I don't know that we'll end up with big contentions inside the group.
There will be priorities that go out.
And maybe our biggest challenge as we look forward is we've got multiple CPU architectures and multiple GPU architectures that have to be considered.
And so those may take longer to hash through.
In fact, when you look at the 3.0 release and then 3.1, what you really
see is 3.1 is a completion of 3.0, where it completed the fabric, it completed the security,
it completed some memory expansion pieces that we needed. And it did that because they took a
little bit longer to work through with the technical people to make sure that it did work across environments.
I think as long as CXL can keep that, hey, we're working together, we're advancing the industry
in a way that everybody can play, then we won't see a lot of trouble as we go forward. It'll just
be opportunities to improve ourselves with all the IP that we have. Well, guys, thank you so much for being on the
show today. You guys work at two fierce competitors. And I think that you guys being on this podcast
together are indicative of the great industry collaboration around this important industry
standard. So thank you for taking time out of what could arguably be the most exhausting week
of the year at Supercomputing,
to share this story on the Tech Arena.
I only have one more question for you guys,
and that's where can folks find out more about the new spec,
about what the consortium will be delivering this year,
and to engage with the team if they have questions?
Yeah.
So the easiest place to find out is our website, and that's cxlconsortium.org. But we also have channels on YouTube, LinkedIn, and Twitter. I think
those are the main places, the places you're thinking of, Kurt? Yeah. And then we generally try to
do a quarterly webinar. Actually, I think we're doing a 3.1 webinar in January. So people can catch
that in that sense. And yeah, white papers, videos. I think I will try to upload as many of
our presentations from SC here on our website. There's a resource library there and it's super useful.
Fantastic. Thank you so much for being here today. It was a pleasure. And I would love to have you guys on the show in the future as well. You're always welcome back on the Tech Arena.
Thanks, Allyson.
Yep. Thanks, Allyson.
Thanks for joining the Tech Arena. Subscribe and engage at our website,
thetecharena.net. All content is copyright by the Tech Arena.