Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x1: What's Next for CXL after Memory Expansion?
Episode Date: October 24, 2022

The first CXL products have emerged, with Samsung delivering memory and storage expanders and MemVerge supporting big memory with their software. In this episode, Stephen Foskett discusses current and emerging CXL products with Julie Choi of Samsung, Steve Scargall of MemVerge, Shalesh Thusoo of Marvell, and George Apostol of Elastics.cloud. This special episode of Utilizing CXL was recorded live at CXL Forum in New York, with the entire industry watching. Once memory expansion is delivered, where do we go next? Marvell is working to support the new protocol in chipsets, and Elastics.cloud is developing CXL fabric switches. Everyone is ready for Intel and AMD to release their next-generation server chips, which natively support CXL, and the CXL Consortium is already working on the next release!

Links: UtilizingTech.com, Utilizing Tech on Twitter, GestaltIT.com

Guests and Hosts:
Stephen Foskett, Publisher of Gestalt IT and Organizer of Tech Field Day. Find Stephen's writing at GestaltIT.com and on Twitter at @SFoskett.
Julie Choi, Head of New Product Business Marketing, Samsung Memory. Connect with Julie on LinkedIn.
Steve Scargall, Senior Product Manager and Software Architect, MemVerge. Connect with Steve on LinkedIn.
Shalesh Thusoo, VP of CXL Product Development, Marvell. Connect with Shalesh on LinkedIn.
George Apostol, CEO and Cofounder, Elastics.cloud. Connect with George on LinkedIn.

Date: 10/24/2022, @SFoskett, @MemVerge, @Elastics_cloud, @Samsung, @MarvellTech
Transcript
Welcome to Utilizing CXL, the podcast about this emerging CXL technology.
I'm your host, Stephen Foskett from Gestalt IT, and we are meeting with a number of exciting people in the industry here at the CXL Forum in New York City.
We had the opportunity to sit down and have a bit of a discussion about what's happening with CXL, where it stands, and where it's going.
And so we just decided to jump on stage and record this live.
So we are in the World Trade Center area of New York.
We are at the CXL Forum.
We will be doing more of these CXL Forum events in the future.
And, of course, this is the first real episode of Utilizing CXL, the CXL-focused podcast season from Utilizing Tech.
So if you'd like to learn more about this podcast or see our previous seasons focused on AI, just go to UtilizingTech.com.
But before we get into the conversation, let's meet who's on the panel today. Good to meet you. This is Julie Choi from the Samsung Memory Business Unit, based in Korea.
I am head of new product business marketing. Yes.
I'm Steve Scargall. I'm the senior product manager and software architect at MemVerge.
Yeah, I'm Shalesh Thusoo. I'm the VP of CXL product development at Marvell.
And I'm George Apostol, founder and CEO at Elastics.cloud.
So this is a really unusual opportunity for us, because Frank Berry from MemVerge just put us together here. Essentially, I just gave an opening introduction here to the CXL Forum and kind of laid out where CXL is from an end-user perspective and where it's going.
And the goal of this podcast, this whole season, is going to be to try to explain CXL to the IT architect audience and to help them understand where this technology is going. And yet, the opportunity to get all of you together for a
conversation was too good to pass up. So which of you, I guess I don't want to make you fight, but
which of you is working on the thing that's most real in the market? I think it might be Samsung,
right? You guys are in the market with a product.
So tell us a little bit about where you stand
with your product now.
Well, in fact, I think Samsung is leading here, as in other areas of the industry.
With the CXL interface, we actually launched the first product, a memory expander, last year.
And then this year, we have also launched the CXL Semantic SSD as a storage device as well.
So I think we are in first place, as a pioneer developing this market.
Yes.
So one of the first hardware products to support CXL.
That's right, yes. A CXL expander with 512 gigabytes of memory, yeah.
And I know that MemVerge is here already with software that supports CXL as well, right? Correct, yeah.
So we've been doing memory tiering and memory
snapshotting for a number of years now, initially
with the Optane persistent memory, but
yeah, CXL just dovetails right in
from our perspective, right?
We can continue doing the tiering across that
for unmodified applications to give those applications
the bandwidth performance that they're looking for
on the capacity as well.
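For readers who want to picture what tiering for unmodified applications means in practice: on current Linux kernels, a CXL memory expander typically appears as a CPU-less NUMA node, and a tiering layer keeps hot pages in local DRAM while demoting cold pages to the CXL node. Here is a toy sketch of that placement policy in Python; the names and the page-granularity model are illustrative only, not MemVerge's implementation.

```python
# Toy two-tier page-placement policy: the hottest pages live in the DRAM
# tier, and the rest are demoted to a (hypothetical) CXL-attached tier.
# Real tiering software works on OS pages and hardware access counters;
# this sketch only models the decision logic.

def place_pages(access_counts, dram_capacity):
    """Given per-page access counts, return (dram_pages, cxl_pages).

    The hottest pages, up to dram_capacity, go to DRAM; the rest to CXL.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    dram = set(ranked[:dram_capacity])
    cxl = set(ranked[dram_capacity:])
    return dram, cxl

counts = {"a": 100, "b": 3, "c": 57, "d": 1}
dram, cxl = place_pages(counts, dram_capacity=2)
# "a" and "c" are hottest, so they stay in DRAM; "b" and "d" go to CXL
```

Real tiering systems drive this decision from hardware access counters or page-fault sampling, and migrate pages with mechanisms like the kernel's NUMA demotion path.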
And of course, we wouldn't get very far
without you guys as well.
So tell us where you stand.
Yeah, so Marvell is already demonstrating
in an FPGA CXL-based memory pooling.
So we're doing expansion and pooling with a unified architecture.
And coming up soon, we'll be doing it with the silicon.
But today, we are demonstrating in an FPGA.
At Elastics.cloud, we're focused on a smart, intelligent switch system-on-a-chip.
And we are leveraging the capabilities of CXL to extend the capabilities of the switch.
So today we have things running in FPGA.
We can demonstrate symmetric multiprocessor memory pooling.
And at later shows, we'll start to unveil
more and more of our functionality in FPGA
with the expectation of having silicon next year. Yeah. So from the external perspective, then,
we actually pretty much went in order here, right? Because we can't do anything without hardware.
And so we have a soon-to-ship product from Samsung for memory expansion and storage.
We have a shipping product in terms of software
that enables that product to work.
We also have support in,
I know that there's support in the Linux kernel,
or at least developing support in the Linux kernel.
We're getting support from the server and CPU platforms as well.
We heard this morning that we're going to have support in the next-generation server platforms from the two big CPU makers in Q1.
And of course, for the folks here on my left,
you're working on basically moving this forward
into the next generation. So we hear, you know,
Marvell, you are doing the chips that are going to glue the next generation products together,
and you're prototyping those in FPGA, and then we're hearing that we're also working on switching
fabrics. So essentially, right here, we have a cross-section of the current state and the future
state, and of course, all of these products are going to be working together. From the perspective of
the future, let me go backwards here and ask you, when do we get CXL fabrics
in the market? So I believe we'll start to see solutions come out in the second
half of next year.
And those will be CXL version 3 products, right?
Well, so the product that we're creating, we're calling it 2.x because it's more than 2, but not all of the features of 3.
The 3 spec is still evolving, and as you know, it takes time to do silicon.
But one of the things that we're also tracking is the processors
and the speed of those processors coming out.
So even though 3.0 is 64 gigatransfers per second,
I think most of the devices coming out you'll see in the next year or so
are going to be at 32 gigatransfers because that's where the processors are today.
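As a rough sanity check on those link rates (editorial arithmetic, not a figure from the panel): a x16 link at 32 GT/s with PCIe 5.0's 128b/130b encoding carries about 63 GB/s per direction before protocol overhead, and 64 GT/s roughly doubles that. Note that 64 GT/s signaling (PCIe 6.0) actually uses PAM4 with flit encoding rather than 128b/130b, so the doubled figure is only an approximation.

```python
def link_bandwidth_gbps(gigatransfers_per_s, lanes=16,
                        payload_bits=128, encoded_bits=130):
    """Approximate raw one-way link bandwidth in GB/s.

    Each lane carries one bit per transfer; 128b/130b encoding means
    128 payload bits ride in every 130 transferred bits (PCIe 5.0 era).
    Protocol overhead (flits, headers) would reduce this further.
    """
    bits_per_s = gigatransfers_per_s * 1e9 * lanes * (payload_bits / encoded_bits)
    return bits_per_s / 8 / 1e9

print(round(link_bandwidth_gbps(32), 1))  # about 63 GB/s at 32 GT/s
print(round(link_bandwidth_gbps(64), 1))  # about 126 GB/s at 64 GT/s
```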
Yeah, and I think that that's important.
You know, we're excited about CXL, the prospects of CXL.
And certainly, memory expansion is really happening
and is really important for people
who need to right-size memory or need
to access more and more memory in their servers.
But in the future, I think that we need to kind of
temper our enthusiasm and realize that it's gonna be
a little while before some of this rack scale architecture
and disaggregated systems and composable systems
and all that comes.
And in order to get there, we need what basically
you guys are working on as well.
You know, Marvell, we need controllers.
We need chips that support this thing.
How does the development process work to create sort of a generic product
that can be used by a lot of different products, a lot of different OEMs?
How does the development process work, and how are you doing?
Yeah, CXL is a brand-new technology, everyone is really looking at different use cases for it, and the use cases are evolving by the month.
So what we did at Marvell is we actually created an architecture that addresses everything from expansion to pooling to switching, something that can scale from CXL 1.1 all the way to future CXL devices.
There are some thoughts on how it even goes to CXL 3.0.
Now, what we have found is that we need to show how this technology really helps end users and their applications, because even six months ago it was not very well understood what CXL can do.
And in fact, six months from now, we predict that what people will be doing with it is
something that no one's talking about today, right?
So we are providing various tools to enable the technology and enable use cases, even the creation or understanding of CXL use cases going forward.
These tools start with FPGA, but the FPGA really
is representing our silicon architecture and our silicon.
We're not ready to announce our silicon yet,
but it's coming up pretty soon associated with that.
And we feel that's the best way to really enable
both the early adopters, as you said,
is basically the cloud data centers,
but also the rest of the industry beyond the cloud data centers, who need to actually try something out, feel it out, before they can actually get more advanced use cases out there.
Yeah.
Well, it's very smart, and I think that most of the companies in this space are implementing
an FPGA before they try to produce ASICs for these things, because it's just, you know, it's still fairly new, but the
FPGA gives you the ability to have real product that really works, at least, you
know, in labs and testing and kind of building out those products and yet
react as demands change or as the protocols change, right? Right. And given this is a new technology, and multiple host companies have all signed up for CXL, everyone has a first-generation implementation.
So one of the things that we have done is ensure that we actually can operate across the industry.
And not only with multiple hosts, but also by partnering with memory vendors like Samsung, to understand whether it will or will not interoperate with our solutions in the future.
And I want to talk to you a little bit about software support, because as I said in my introduction, I see software as the potential key that unlocks CXL, the entire CXL market,
or the roadblock that stops it from being implemented, because if we don't have software support
for this new hardware,
essentially what we're going to end up with
is point solutions for certain specific use cases
that are supported by software,
by proprietary software for that solution.
But if we have a general software layer
that allows us to, for example, in the case of MemVerge, to unlock the
potential for hierarchical or tiered memory, then that means that a whole category of the industry
has opened up, right? I mean, essentially, MemVerge would be able to support any
kind of tiered or hierarchical memory system, right? That's exactly right, yeah. So our vision
is to work with all the hardware vendors, whether it be switch guys,
device guys, and unify
this from an application or a software perspective,
right? So, you know, the operating system
Linux has got some CXL tiering
built into it today, although it needs
to mature a little bit, but our tiering
solution, snapshotting solution
will work across any of this
stuff. And it goes beyond
that, right? I mean, it's not just at the device level, but also at the host level.
You've also got the orchestration,
or I guess what the CXL spec calls the fabric manager, right?
So somebody has to tell the device guys, the switch guys,
what devices, what ports I need for a particular instance,
and go make that request, go make it happen,
and then present that up to the application.
Now, could that be an external host?
Sure, that could be done.
Should it be the application itself?
More likely, probably, right?
So the application requests the resources,
sends the request out,
and it all magically happens under the covers, right?
And that's really where we are with MemVerge.
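To make the fabric-manager role concrete, here is a toy model of an orchestrator binding pooled memory devices to a requesting host. Every name in it is hypothetical; the real CXL fabric manager speaks a spec-defined command interface to switches and devices, not Python.

```python
# Toy model of a fabric manager binding pooled memory devices to hosts.
# Hypothetical names and data structures throughout; this only models
# the bookkeeping a fabric manager performs on behalf of applications.

class FabricManager:
    def __init__(self, devices_gb):
        # device_id -> capacity in GB, initially all unbound
        self.free = dict(devices_gb)
        self.bound = {}  # host -> list of (device_id, capacity_gb)

    def request(self, host, needed_gb):
        """Bind free devices to `host` until `needed_gb` is covered.

        Returns the list of (device, capacity) granted, or None if the
        pool cannot satisfy the request.
        """
        granted = []
        for dev, cap in sorted(self.free.items()):
            if needed_gb <= 0:
                break
            granted.append((dev, cap))
            needed_gb -= cap
        if needed_gb > 0:
            return None  # not enough pooled capacity
        for dev, _ in granted:
            del self.free[dev]
        self.bound.setdefault(host, []).extend(granted)
        return granted

fm = FabricManager({"mem0": 256, "mem1": 256, "mem2": 512})
grant = fm.request("host-a", 400)  # binds mem0 and mem1, 512 GB total
```

The point of the sketch is the division of labor: the application (or an external orchestrator) states a capacity requirement, and the fabric manager decides which pooled devices and switch ports satisfy it.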
And initially, MemVerge was enabling Optane-based memory, but of course, it seems to me that the product was always designed with the idea in mind that there could be all sorts of different types of memory on different buses and different ways of connecting and so on.
And, you know, where do you think memory will reside in future systems? Do you think that it's going to be an expansion card in the server?
Do you think we're going to have a tray of memory in a rack?
I mean, is it going to be connected via fabric?
Is it going to be shared?
What does the future of big memory with CXL look like?
I think it's going to be a combination of all of it, right?
I mean, depending on the latencies, as you add switches, right, we add that latency factor,
which might be okay for a lot of applications,
but for some it's not.
So I think we're still going to need the local DRAM,
we're still going to need some local high bandwidth memory,
maybe some local CXL,
but then we can definitely expand out over the fabric
to pools and sharing and everything else.
So I think that's going to be an exciting thing.
So to your point, you know,
we may end up with memory trays or arrays, memory arrays, right,
versus disk-based arrays, right, where we just plug it into the fabric somewhere and we can access it from anywhere.
Yeah, and that's where I wanted to go here with Samsung as well.
Memory trays, memory arrays, that probably sounds pretty good to a company that's focused on memory chips.
Yeah.
But as Steve mentioned, at this moment there will be different market positionings for the memory expander.
Some customers will require higher capacity. In that case, we will support an expander type.
And some customers will require persistent characteristics. In that case, we will support the dual-mode interface that can support persistent memory characteristics. So we are looking at all those types together, yes. Yeah, and I was actually
going to go there next. So Samsung is, of course, a huge maker of DRAM, but also a huge maker of flash memory. And you are clearly focused on bringing both of
these products to market, which are two somewhat different products, but could be used in, well,
it's hard to think about exactly how all that's going to play out. But how do you see this? Do
you see it more as DRAM or more as persistent or a combination for different use cases?
We are actually looking at all three types, I should say. This is quite a strategic question from our perspective, because for CXL memory, or the CXL expander, some customers are asking for a low-cost version. In that case, that could also cannibalize our market, so we have some threats as well from a memory perspective.
However, at the same time, customers are asking for more memory that they can adopt flexibly, whenever they need it. With AI and ML processing and larger memory sizes, they will have more controllability on their side. So as a premium type, we are looking at the 512-gigabyte memory size for the memory expander.
And then on the persistent memory side, we are looking at the CXL Semantic SSDs.
And also, for some customers who really want a low-cost version of the memory, we are looking at whether we can support CXL memory with a low-cost version.
Yes, so I can say we have
all three different products at this moment. Yeah. Yeah, and that's interesting. You know,
low cost would be a very, it's interesting to think of that as a different market,
but the high capacity might lead us to one of the things that you're enabling down here,
which is sharing memory, sharing access to memory
between different CPUs and accelerators and CPUs and so on.
And that's something that you guys are enabling in software, right?
Yeah, yeah.
So we'll be, you know, working with the vendors
to bring memory pooling, memory sharing to market.
But, you know, again, the software has to be enabled to do that, right?
You know, the locking that needs to occur, multi-reader, multi-writer,
all that stuff needs to occur.
You could go off and modify the application to go do that,
but that's a lot of effort,
given the millions of applications that are out there, right?
So if we can add that shim in between the unmodified application
and the hardware and allow the applications to do that natively
and transparently, that's one thing that we're looking at at MemVerge.
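The multi-reader, multi-writer coordination Steve describes is exactly what such a shim has to supply. As a minimal illustration, here is a classic readers-writer lock in Python; a real shim for cross-host shared CXL memory would need atomics and lock state in the shared memory region itself, not one process's threading primitives.

```python
import threading

class RWLock:
    """Minimal readers-writer lock: many concurrent readers, exclusive writers.

    Illustrative only. This simple version can starve writers under a
    steady stream of readers; production locks add fairness policies.
    """
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:          # readers wait out active writers
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:        # last reader wakes waiting writers
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers:  # writers need exclusivity
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()
```

Interposing this kind of coordination transparently, underneath unmodified applications, is the job of the software layer being discussed.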
So when do you suppose that we're going to get to this point
where we have shared memory?
In other words, multi-access, multi...
I don't know what the right word for it is,
but when do we get there versus just having it be an expansion?
Yeah, so I think we will be enabling all of those aspects of it starting next year.
And starting with FPGAs to actually show the concept
on how it can be enabled.
I do think that deploying it at scale
is probably about a couple of years away.
Because people will have to get comfortable with all the RAS capabilities,
the security capabilities.
Although the silicon is now being built for that,
the silicon actually has a responsibility to be even more RAS-capable than a single server, because the blast radius is larger.
So we are putting in lots of functionality in there and
security functionalities.
Now, on sharing specifically: 3.0 does define some ways of doing it, and they are really well defined in 3.0.
But before 3.0, with 2.0 plus some software, and MemVerge is one example with other people working on it as well, it can be done. It does require a lot more software and careful orchestration with the hardware.
And I would say that you're working on CXL switching,
but I think when people hear this,
they may think that that is only this future use case
in terms of sharing with multiple nodes
and rack scale and everything.
But CXL switches don't just live
in rack scale architecture, right?
I mean, we're going to see these things
pretty much everywhere from inside servers
as well as outside servers, right?
Yes, and that's what we're driving for.
I think one of the advantages of being in a startup
is we can move quickly.
So we have an FPGA product today where we are demonstrating sharing.
We have customers that will be deploying that FPGA using the expanded and extended memory
and the ability to share that memory amongst multiple processors.
So that's the beginning of it.
But in my slide deck, I have a slide called the Evolution of Composability.
And it starts with the memory that we have today.
Then the persistent memory that Julie talked about is the next thing.
And then coming from that is going
to be the processors and accelerators.
So CPU, GPU, DPU, whatever XPU comes next is part of the heterogeneous architecture
that you need today in order to be able to process those large AI ML workloads that we're seeing.
And so we'll be able to enable that as well, and then to be able to scale that both inside the box and inside the rack and rack to rack.
So we're looking at all those solutions
to get to that ultimate level of rack level composability.
And as you talked about in your presentation,
we view composability as the way to create
these right size virtual servers that
fit the workload. And so we're on a path towards getting there. Well, I think that that's actually
a really great summary of sort of the state of where we're at right now. So, you know, it's great
that we were able to have you all join us here for what is, I guess, the first episode of where
we're going with this podcast, because we've gotten a pretty good look
at sort of where CXL is today with memory expansion
products shipping today from Samsung,
with software enablement from MemVerge today,
and then where it's going in the future,
whether it's controllers or switch hardware,
which eventually enables us, as you just said, to get to some sort of rack-scale or
even multi-rack scale solution. And I think that those of you listening, keep tuned into this.
There's a lot of exciting development happening here. And again, as I said in my presentation
today, and as I'm going to say basically on every episode of this, this is an industry-wide consortium. There is basically no company in the industry that isn't
involved in this, or at least looking at being involved in this. There's no company in the
industry that's not looking to adopt CXL. And I think that what that means is that we could end up with a
really transformative technology thanks to the efforts of the people in the
industry that are working on developing products but also based on end users who
are excited about the possibilities that this brings in terms of transforming the
architecture of their servers, transforming their data centers and
delivering the right product at the right cost, right-sizing memory and that sort of thing,
as well as the potential in the future
of using this technology to deliver capabilities
that they've never had in terms of shared memory
and enabling XPUs and rack scale and everything.
So we're going to be talking about this
on the Utilizing CXL
podcast going forward. Please do subscribe. You'll find us in all of your favorite podcast
applications, as well as on YouTube at utilizingtech.com, where, as I said, we've got
three seasons talking about artificial intelligence and machine learning. And now we're going to have
at least one season talking about CXL technology.
So if you'd like to be part of this,
if you'd like to follow along,
please do check out utilizingtech.com.
We're, of course, also talking about it on Gestalt IT.
But for those of you here with me on the panel,
where can we continue the conversation with you
relative to your products and thoughts on the industry?
Yeah, so you can follow us on our website.
We've got a lot of content and information,
and we'll be appearing at various CXL get-togethers in the coming years.
So elastics.cloud is where you can find us on the web.
You'll be seeing basically a demonstration at the OCP forum coming up
next week.
And you can also get a lot of information from marvell.com about where we are going in the future.
And coming up, you'll see a lot more information from Marvell.
Great.
Thank you.
Yeah, and you'll find a lot of CXL-focused blogs,
white papers, solution briefs, et cetera, on memverge.com.
So I encourage you to get in contact
and see where we're going from the software ecosphere perspective.
Yeah, it will be a great honor to support all of you who are interested in the CXL memory expander.
Thank you.
And we'll include links to all these things
in the show notes for the episode as
well. So thank you very much for listening to this, the first full episode of Utilizing CXL.
If you enjoyed this discussion, please do subscribe because we'll be talking to folks
like this in the industry on a weekly basis at utilizingtech.com. This podcast is brought to
you by gestaltit.com, your home for IT coverage from across the enterprise,
including a weekly tech news show called The Rundown,
hosted by me.
For show notes and more episodes,
go to utilizingtech.com
or follow us on Twitter at Utilizing Tech.
I'm Stephen Foskett.
You can find me on Twitter at sfoskett.
Thanks for listening, and we'll see you next week.