Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x08: CXL Brings Flexibility to Server Configurations with Astera Labs
Episode Date: December 19, 2022
The key to building an ideal server is balance, and this is why CXL is important, since it helps overcome the inflexible nature of modern server architectures. In this episode of Utilizing CXL, Stephen Foskett and Craig Rodgers talk with Ahmad Danesh of Astera Labs about the company's CXL-based memory expansion technology. Although there are many CXL memory expansion chips coming to market, the industry is keen on interoperability testing to make sure everything works as expected. These products are differentiated based on their reliability, performance, and security, including trusted hardware and encryption. This is especially important with AMD Genoa having recently been launched and Intel Sapphire Rapids coming very soon. Security is very important as memory moves further from the CPU and is shared and pooled among multiple servers. Ahmad also suggests that CXL memory can perform as well as local memory, so specialized memory tiering software might not always be needed. But this technology also allows other types of memory to be used, including non-DDR5 DRAM and potentially future non-DRAM memory.
Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms
Guest Host: Ahmad Danesh, Sr. Director, Product Management, Astera Labs https://www.linkedin.com/in/ahmaddanesh/
Follow Gestalt IT and Utilizing Tech
Website: https://www.UtilizingTech.com/
Website: https://www.GestaltIT.com/
Twitter: https://www.twitter.com/GestaltIT
LinkedIn: https://www.linkedin.com/company/1789
Tags: #UtilizingCXL #MemoryExpansion #Genoa #Epyc #AMD #CXL
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on CXL, a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
And joining me today as my co-host is Craig Rodgers.
Hi, I'm Craig Rodgers. You can
find me on Twitter at CraigRodgersMS. So Craig, you and I have spent a lot of time talking about
CXL technology. We recently published a great white paper at gestaltit.com focused on server
architecture. And one of the things that we noticed in there is that really the key to server architecture is balance and flexibility.
And more than anything, it seems, you know, we like to geek out about number of cores and megahertz and gigabytes and all that kind of stuff.
But more than anything, it's about building the right system for the job.
Would you agree?
Absolutely.
You need to scope your requirements
properly. And if previously memory has been an issue where you couldn't scale up enough,
we now have technology coming that's going to let us work around that.
Yeah, I recently did a little bit of a thought experiment on that one because, you know,
looking at the number of DIMM slots, the one DIMM per channel memory architecture that is increasingly popular, and of course,
the sizes of DIMMs that are available, that really does sort of paint people into a corner in
terms of how much memory they can provision in a server for a given amount of money. And that's
one of the big use cases, I think, that has been driving CXL technology so far,
and at least in the initial rollout, is it's all about memory. So the other day over at Serve the
Home, we saw a great article about memory expansion using the technology of a company
called Astera Labs. So we decided to invite Ahmad Danesh in here from Astera to join us and talk about this
awesome flexibility that you can get in terms of memory expansion using CXL.
So Ahmad, welcome to the show.
Hi, Stephen.
Hi, Craig.
Thank you for having me today.
I'm excited to talk to you today and always excited to talk about CXL.
So my name is Ahmad Danesh.
I'm the Senior Director of Product Management at Astera Labs,
responsible for our memory connectivity group.
I focus 100% of my time on CXL and memory expansion, memory pooling technologies,
and I've been working with the CXL Consortium since it first started back in 2019.
Were you previously working with some of the other
cache-coherent interconnect technologies, or
did you come in with CXL? I was, yeah. Originally I was working on OMI, or
OpenCAPI, with a lot of experience as well in Gen-Z, and it's interesting seeing how
the industry has really consolidated together right now. The CXL
Consortium is essentially bringing all of these solutions together at once, where OpenCAPI and Gen-Z have been kind of folded under the CXL Consortium now as well, with all of their assets being owned by the consortium.
Yeah, it seemed like that all was coming and was exciting and promising, but it just didn't quite catch fire in terms of implementation,
in terms of practical applications. And that was kind of disappointing, honestly, because
we've been talking about a lot of this stuff for a while. What makes this time different from those
other times? Yeah, really good question. I think the fundamental difference between OMI, Gen-Z, and CXL, and why CXL has really kind of got
that momentum, is that CXL resides over the PCIe physical layer.
It's natively supported by CPUs today.
It became really easy for everybody to adopt.
So you get to leverage the same physical infrastructure that you have today and just
change the protocol so you can get cache-coherent memory access. It's been very good that CXL has built on top of PCIe,
because there's already so much that was standardized.
You know, manufacturing processes were already able to work with cards
of those sizes. So we're not having to reinvent the wheel, so to speak,
to gain this additional functionality.
The standard, you know,
what the consortium has come up with is great.
It's fantastic.
And it's great to see so many different people,
so many different companies all working together
towards that same goal.
You know, I think it's gonna really help adoption.
You've been in the news there recently
with a rather successful funding round.
Are you gonna use that money to help drive new products,
as you already have products on your website?
How are you planning to leverage that investment?
Yeah, that investment is definitely going to be leveraged here
as we move forward. We're essentially positioning ourselves; we're past
kind of that startup phase and really in the scale-up phase, where we're not only expanding in terms of the number of product lines that we support today.
We have three major product lines that we've announced.
So you can expect a lot more to come from us soon.
We've scaled out in terms of our R&D centers as well, opened up offices, R&D offices in Vancouver and Toronto recently and scaled out our team significantly now.
I noticed you also have services that you're selling.
Are you working with other companies to help them adopt and work with CXL as well, then?
Yeah, we develop the actual silicon itself. We develop a lot of board level solutions as well.
And for customers who are looking to purchase our silicon
and implement their own custom solutions, custom board solutions,
we provide a lot of services to really be able to help them ramp into production
as part of that.
Really, a service that we do for free is what we call our cloud
scale interoperability lab as well, where we actually do a lot of interoperability
testing inside of our labs so that it becomes really easy for customers to adopt, right?
With CXL and all that technology being new,
it requires a lot of interoperability,
a lot of close partnerships with the CPU and memory vendors.
So it really requires a shift in terms of the responsibilities of who's doing
that testing, right?
It's no longer a CPU-DDR subsystem.
It's now a CPU-CXL-DDR subsystem.
And it requires a new set of collaboration with a
lot of industry partners. So those are a lot of the types of services we provide here.
Isn't it interesting that interoperability testing and things like that, people might think,
oh, well, it's all about the technology. But in terms of making it a reality, it's really all
about making sure that the systems really work as expected.
And so things like that are often overlooked.
I remember, you know, in my history in storage as well, the technologies that had good interoperability testing and communication between vendors, even competing vendors, those were the technologies that took off, whereas the ones that were much more closely held
and relied on basically recommendations from integrators or OEMs or something,
those were the ones that were a lot slower to catch on.
It's really great to hear that you're doing that,
because, of course, there are competing products
in this same CXL memory expansion space.
Yeah, thank you, because you absolutely hit the nail on the head there.
You take a look at a lot of these competing technologies,
and what's really required, when we talk about the scale
at which we need to deploy these solutions
and memory being such a fundamental component to the OS, is that
the reliability, availability, serviceability features
of a memory controller are significantly
important here to make sure that the data centers can kind of maximize their uptime and get that
user experience where it needs to be. Intel and AMD, of course, have had generations of being able
to improve on their reliability, right? So the close partnerships we have with them have actually
helped us really kind of lead the industry here, what we're doing on reliability as well.
And we're going to be seeing obviously EPYC 4 and Genoa arriving, which is going to enable your products to actually hit the market at a much larger scale with current server vendors. And I'm sure you're probably already working
with a lot of the hyperscalers,
given the services you're offering.
It's interesting, the hyperscalers have almost had
something similar to CXL level functionality,
through proprietary means or they've had smaller options.
It's great to see that they're all working now with CXL.
The standardization always helps.
Yeah, the standardization is key here.
There's a certain amount of industry standardization
that needs to happen
so you can get the right ecosystem getting together.
But there's obviously ways
you want to differentiate yourself as well, right?
If everyone just builds exactly what the standard expects, you won't be able to really kind of differentiate and really provide that added value.
As you noted, you know, with AMD's EPYC processors and Intel's next-generation Xeon, CXL 1.1 is ready to start deploying, right?
And we're working with a lot of the hyperscale customers to really kind of fit this memory expansion technology, the next generation
of memory pooling as well, to kind of hit that wave in the next year. So when customers are
looking at these products, what are the big differentiators? You mentioned that you do need
to have product differentiators as well as just product. What are the big product differentiators
that they should be looking at with memory expansion and memory pooling? Yeah, good question. So when you take a look at what it takes to deploy
memory, the baseline is reliability, availability, serviceability, the RAS features that you need there.
And so without going into a lot of specifics of exactly what we do there, we do actually provide
a lot of customization that we put within our silicon specific to what some of the CSPs have needed. That goes a little bit beyond what a lot of people
expect to have as a baseline product. And so that reliability becomes kind of that cornerstone of
making sure that you can deploy this at scale. The second is performance, right? What everyone
is used to today is local memory attached to a CPU or perhaps
remote memory if you're going over a NUMA hop. And so having CXL memory that's, you know,
at or below that NUMA-hop latency is kind of the industry expectation here, because that way
it doesn't need a lot of software complexity, right? If your latency is too high, then you need
to get into the realm of memory-tiering types of solutions and
having software that's aware of that higher latency.
And so performance is really the second category there. The third, I'd say,
is really on security, talking about end-to-end security as well. Of course, you have standard
things like you need secure boot, you need authentication capabilities, as you do with a lot of the silicon that deploys in the data center.
But we have additional security features to actually solve the end-to-end security requirements for these solutions, especially with the increase in expectations for confidential compute in the hyperscale data centers. And what solutions then have you come up with to provide that insight
and provide that layer of security?
Yeah, so that security comes in with encryption.
So when we take a look at how servers are going to be deployed here,
the memory actually gets encrypted as well.
So you're not only protecting the actual CXL link,
you're actually protecting the actual memory itself as well.
How do you differentiate between which process or server is allowed access to that memory?
Yeah, so the way it's done, we won't go too much into the NDA information here, but the intention here is that when you're moving memory away from the CPU and now it's over a CXL link, it is easier for potential hackers to get access to that.
Over a CXL link, you could be snooping on it across the memory interface.
But when you take a look at it from how a lot of memory is used, where you have actual
virtual machines that are instantiated, you want to be able to protect against software
attacks as well.
You want to be able to protect against one VM accessing another VM's memory.
And so being able to provide per VM memory encryption is really key here as we move into the next generation of solutions.
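To make that key-separation idea concrete, here is a minimal, purely illustrative sketch in Python using the third-party cryptography package. This is a toy model of per-VM key isolation, not how Astera Labs' silicon or CXL memory encryption is actually implemented; the VM names and page contents are made up for illustration.

    # pip install cryptography
    # Toy model: each VM gets its own AES-256-GCM key, so data written by one VM
    # cannot be decrypted with another VM's key.
    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.exceptions import InvalidTag

    vm_keys = {"vm-a": AESGCM.generate_key(bit_length=256),
               "vm-b": AESGCM.generate_key(bit_length=256)}

    def write_page(vm: str, page: bytes) -> tuple[bytes, bytes]:
        nonce = os.urandom(12)                      # unique per write in this toy model
        return nonce, AESGCM(vm_keys[vm]).encrypt(nonce, page, None)

    def read_page(vm: str, nonce: bytes, ciphertext: bytes) -> bytes:
        return AESGCM(vm_keys[vm]).decrypt(nonce, ciphertext, None)

    nonce, ct = write_page("vm-a", b"tenant A secret data")
    print(read_page("vm-a", nonce, ct))             # VM A reads its own page
    try:
        read_page("vm-b", nonce, ct)                # VM B's key fails the integrity check
    except InvalidTag:
        print("vm-b cannot decrypt vm-a's memory")

Real memory encryption engines work inline in hardware with different modes and key management, but the isolation property being described, that one tenant's key is useless against another tenant's data, is the same.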
Yeah, that's really interesting because all we need, or ironically, all we don't need, is a bunch of stories coming out saying that CXL is a giant memory security risk.
CXL enables all this snooping, isn't secure for multi-tenancy, and all these things that people are going to say if there is an exploit in the wild that allows that sort of thing, right?
Yeah, absolutely. And then you can imagine this getting significantly more complex when we're talking about memory pooling, where you have multiple CPUs that are being given access to
the same device memory, and being able to have the security capabilities of making sure that one CPU
is not accessing another CPU's memory, and they have all the security checks in place is critical.
Otherwise, as you said, people are going to take a look at CXL and, hey, you know what? It didn't hit the mark here. And security would be a really big gap that would prevent CXL
from being adopted. One of the things you mentioned in there, though, that caught my ear was this idea
that CXL can deliver memory that performs close enough to NUMA-enabled server memory that you
don't necessarily need to treat it as hierarchical.
Did I misunderstand that?
Or is that something that you guys are going to be able to deliver?
Yeah, that's exactly what we're delivering.
So today's software understands both local memory
as well as kind of that one NUMA hop level latency.
And so being below that performance benchmark allows existing software
to just work, plug and play. And we've actually been able to show that with a lot of our demos
at industry events, doing different benchmarks to show that, you know what, if we actually compare
our performance with CXL attached memory, even against local memory, we're performing usually
in the range of about 96 to 98%
of the performance with certain benchmarks. Specifically in that case, it was the Memtier
benchmark that you compare against local memory. And so it really comes to show that CXL is ready
to be adopted, right? We're hitting the performance benchmarks we need. We're hitting the reliability
we need. And with the CXL 1.1 CPUs kind of ramping into production soon,
we're going to start seeing CXL, at least 1.1 for memory expansion, being adopted very quickly here.
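That plug-and-play point is easier to picture with a concrete example: on Linux, CXL-attached expansion memory typically shows up as a CPU-less NUMA node, so ordinary NUMA-aware tooling can already see it and allocate from it. Here is a minimal sketch, assuming a Linux host with the standard sysfs layout; whether a memory-only node is actually CXL depends on the platform.

    # List NUMA nodes and flag CPU-less ones, which is how CXL/expansion
    # memory generally appears to the OS. Reads only standard sysfs files.
    from pathlib import Path

    NODE_ROOT = Path("/sys/devices/system/node")

    for node_dir in sorted(NODE_ROOT.glob("node[0-9]*")):
        cpulist = (node_dir / "cpulist").read_text().strip()
        meminfo = (node_dir / "meminfo").read_text()
        # First meminfo line looks like: "Node 1 MemTotal:  268435456 kB"
        total_kb = int(meminfo.split("MemTotal:")[1].split()[0])
        kind = "memory-only (candidate expansion node)" if not cpulist else "CPU + memory"
        print(f"{node_dir.name}: {total_kb / 2**20:.1f} GiB, cpus=[{cpulist or 'none'}] -> {kind}")

An application or orchestrator can then bind allocations to that node with standard tools such as numactl or libnuma, which is the sense in which existing software "just works" when the latency stays near a normal NUMA hop.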
Yeah, that's really exciting because that was one of the things I think that a lot of people were sort of scratching their heads over is,
especially, you know, coming from the wave with Optane memory with the third generation Xeon CPUs and how that was
really fundamentally a different performance characteristic than system memory.
And I know that a lot of people have been, I guess, maybe conditioned or just assumed
that memory attached over the PCI Express bus would also be similarly, maybe not quite
the same performance
characteristics, but differentiated from main system memory. We've certainly heard that as
well from some of the memory software companies. We've seen companies coming to market with drivers.
We've seen what's going on in Linux kernel with regard to CXL-based memory. But man, if even,
maybe not all the time, maybe not for pooled memory or shared memory or something,
but if some memory could be indistinguishable from system memory,
that would be a really exciting advancement.
Yeah, and it's gonna be exciting to see
as we get into more of these performance benchmarks
and seeing some of the actual end user applications
taking advantage of it over the next few months here. So we're going to be doing some more demos and
industry events you can look forward to there. But this isn't to also take away from what a lot
of people are doing for memory tiering. Even our solution, we actually have not disclosed the exact
memory type, but we actually support a different non-DRAM based memory solution
as a second tier.
So there is a space for that.
Now, when you get to that space, though, you want to be able to put cost-effective memory
there, right?
You don't want to put really high expensive DRAM and use it as a lower tier.
If you're going to have a lower tier, have a cheaper memory behind
that tier. And then that's really where that software can really bring a lot of advantages.
Yeah, that's pretty exciting. I suspect that I know, but I guess we'll have to wait until
it's official before we mention anything. But yeah, there are other alternatives to DRAM.
But one of the other things we've heard about as well is the idea that perhaps you use different types of DRAM in CXL. So for
example, maybe if the systems are going to DDR5, maybe DDR4 goes on the expansion boards as a
different tier, or even DDR3, I don't know. Have you experimented with that whole concept as well?
Yes, we have as well.
You hit the nail on the head again.
When we take a look at how memory is being deployed today,
each CPU generation has a very specific DDR rate that they support.
CXL allows us to kind of decouple whatever memory you put behind that.
So the metric that you look for is, well, I have a certain amount of bandwidth, I have a certain amount of capacity, I have a certain latency sensitivity, and a certain cost target.
And so you can put different types of memory behind it and just really look at it from what does it provide me from the CXL interface perspective?
I have an x4, an x8, or an x16 connection.
It gives me a certain amount of bandwidth. How many channels of DDR or another type of memory solution do I need on the back end to utilize that x4, x8, or x16 connection?
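To put rough numbers on that back-end sizing question, here is a back-of-the-envelope sketch using nominal peak rates only, assuming PCIe/CXL Gen5 signaling and DDR5-4800; these are not figures for any specific device.

    # How many DDR5-4800 channels roughly match a CXL (PCIe Gen5) link width?
    PCIE_GEN5_GBPS_PER_LANE = 4.0               # ~4 GB/s per lane, per direction, at 32 GT/s
    DDR5_4800_CHANNEL_GBPS = 4800e6 * 8 / 1e9   # 64-bit channel at 4800 MT/s ~= 38.4 GB/s

    for lanes in (4, 8, 16):
        link_gbps = lanes * PCIE_GEN5_GBPS_PER_LANE
        channels = link_gbps / DDR5_4800_CHANNEL_GBPS
        print(f"x{lanes}: ~{link_gbps:.0f} GB/s per direction ~= {channels:.1f} DDR5-4800 channels")

By that arithmetic, an x16 link lines up with roughly two DDR5-4800 channels, which is exactly the kind of matching exercise being described: how many channels of whatever memory you choose do you need behind a given link width.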
You mentioned earlier that some of your benchmarks were within 96% of motherboard memory.
I'm imagining how much easier the last 20 years would have been if storage had gone that way.
But with the availability of DDR5 en masse at the moment, I think, certainly, looking at DDR4 and 3 options, it's going to be good.
Because I'm sure not all workloads require DDR5 speeds.
There's going to be workloads that just would be happy enough on DDR3 or 4.
So it's a really cost-effective way of approaching it,
given that RAM is the biggest cost component of any server.
Yeah, and that's, I think, really, Craig,
that's really the exciting thing here from an applications perspective
is what would you do if you could have all the memory you need? And when I'm saying you, I mean
the, you know, the application developers, what would you do if you could have the right amount
of memory? If going beyond, you know, so many slots times 32 or 64, you know, whatever size
your DIMMs are, if that wasn't the choice you were making, and if instead
you were saying, I want to have the right amount of memory in this server and keep things in memory,
I think that's the real question here. And that's the real opportunity as well.
Yeah, absolutely. And when you say the right amount of memory, a lot of applications today
from a capacity perspective have enough memory, right? But then what they need is more bandwidth.
And so one of the interesting things to actually take a look at CXL for is rather than using,
you know, more expensive DIMMs that are twice the capacity, use your cheaper DIMMs,
put them behind CXL. You get the same amount of capacity, but you actually get a lot more bandwidth.
And so you kind of get to play with CXL as this tool now to kind of get that sweet spot
of bandwidth and capacity that a specific application needs.
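As a rough illustration of that trade-off, here is a sketch with made-up round numbers, not a costed configuration, and assuming the expander's back end can keep its CXL link busy.

    # Same total capacity, two ways: fewer large DIMMs directly attached,
    # or smaller DIMMs split between direct channels and an x16 CXL expander.
    DDR5_CHANNEL_GBPS = 38.4     # DDR5-4800, per channel
    CXL_X16_GBPS = 64.0          # ~PCIe Gen5 x16, per direction

    direct_only = {"capacity_gb": 8 * 64, "bandwidth_gbps": 8 * DDR5_CHANNEL_GBPS}
    direct_plus_cxl = {"capacity_gb": 8 * 32 + 8 * 32,
                       "bandwidth_gbps": 8 * DDR5_CHANNEL_GBPS + CXL_X16_GBPS}

    print("direct only:      ", direct_only)
    print("direct + CXL card:", direct_plus_cxl)

Both configurations land at 512 GB, but the second adds the CXL link's bandwidth on top of the direct channels, which is the "same capacity, more bandwidth" point being made here.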
From an architecture standpoint, it's really interesting now to see the change that's going
to be coming to solutions architects who are used to composing, you know,
a certain amount of storage, RAM, you know, capacity, and then they would refactor that into servers.
Now it's, you know, we're going towards rack level composability.
So how you amass those resources at rack scale is going to change completely.
You know, you'll still be able to do it the old way.
You'll be able to do it with CXL. There'll be hybrid solutions. It's going to
open up a lot of new architectural possibilities that simply haven't existed. Clear initial
winners are going to be HPC workloads. You know, I'd imagine those users will probably
be very early adopters as well.
Yeah, HPC is definitely an early adopter here. And we even saw some HPC applications launch with Gen-Z-based, or FPGA-based Gen-Z, solutions years ago as well.
And so we're definitely going to see this influx of CXL kind of taking over that space here. CXL 1.1 was really kind of thought of, when we were launching it, as a point-to-point
solution, right? It provides memory expansion. As we got into 2.0, we started thinking about,
well, okay, we need switches in this path here. But CXL 2.0 essentially enabled getting to a one
CXL switch hop. And then 3.0 starting to go, well, let's actually do
multiple switch hops. Let's get the fabric level solutions and get really close to essentially
where Gen-Z was, where we can create large fabrics of, I'll call them, resources. And those
resources aren't just memory, right? It can be compute, it can be GPUs, accelerators, NICs,
it could be storage. You could actually even have PCIe devices, PCIe
NVMe devices in a CXL fabric. And that fundamentally then solves kind of that composable, disaggregated
architecture that a lot of people in the industry have been trying to solve for multiple years.
So in terms of, I guess, practicality here, at SC22 there were a few companies with Astera Labs
technology on display. You know, it was shown, of course, with Genoa from AMD, which is the first
production server platform to come out with CXL, as well as Intel's Sapphire Rapids. Now, of course,
there's sort of an NDA component here in terms of Sapphire Rapids,
but people are talking about it.
People are looking at it.
People are showing it.
What is this product going to look like
when it hits the market?
Yeah, so when we take a look at CXL,
the first wave of these products
is really going to be focused in on memory expansions
to unlock the bottlenecks
that are occurring for a lot of the AI and machine learning applications. In-memory databases are
kind of a very big target for that amount of capacity that's needed. Because when we take a
look at AI/ML, the complexity of the models, the size of the models have grown so large over the
last number of years here that we just need a lot more capacity, a lot more bandwidth for those. So the products are really going to be looking like for this first wave,
add-in card level solutions, where it provides the flexibility where you can essentially plug
in an add-in card with DIMMs into this slot, where the existing DIMMs that are being used
elsewhere anyway, that the hyperscale customers
and the server OEMs are purchasing anyway, just get plugged in behind that type of card, a
CXL add-in card, similar to kind of a CEM form factor. But a lot of them are actually going to
be more custom solutions as well that provide DIMM connectivity and a slightly different server
architecture to deploy memory expansion with that flexibility that we talked about.
And so you can plug in a 16-gigabyte or 32-gigabyte DIMM
all the way up to, in our case, 512 gigabytes per DIMM,
depending on what they need.
So rather than building an EDSFF type of drive
where you're limited to maybe an x4 or an x8 CXL
with a fixed capacity,
the first products are likely going to be launching here with DIMM-based solutions for that flexibility and
performance that's needed. Yeah, I think that's going to be pretty cool because I think the idea
that you could buy an add-in card that looks a little like a PCIe card and has some DIMM slots
on it and you can fill it up with DIMMs is pretty awesome. So, Ahmad, that all sounds really cool.
When are we going to see this stuff hitting the market?
I mean, is it available now in products,
or is this something that's happening in 2023?
Yeah, so we've been kind of in pre-production phase here,
scaling out to a number of customers already.
So we've been shipping quite a number of products,
both from a silicon perspective, as well as from our board level solutions to kind of hit this ramp
here. So we're expecting the first CXL 1.1 wave to really start ramping in mid to late 2023 here,
as we start seeing the CPUs getting into that production phase as well.
Yeah. And I mean, that's the thing. We've all been holding our breath for the last couple of years waiting for Sapphire
Rapids and now Genoa as well, and hoping that we'll be able to finally get our hands on
this technology.
But it really is here.
People are showing it.
People are demonstrating it.
They're in production.
It's pretty cool.
Well, thank you so much for this quick overview.
I really appreciate some of the thought-provoking aspects of this conversation, especially the idea that maybe CXL memory might not necessarily be hierarchical. And of course, it can also be hierarchical with different memory types. I love that thought. And can't wait to get my hands on this stuff. Before we go, where can people connect with you and continue this
conversation? Well, Stephen and Craig, thank you so much for having me today. I'm always excited
to talk about CXL. Looking forward to talking to you again soon. And please visit asteralabs.com
for more information. Thanks a lot. And as for me, you can find me at gestaltit.com. While you're
there, just look in the sidebar for our server architecture at data center scale white paper, where you'll see Craig and I talking quite a lot about data center architecture considerations as well as CXL in the future.
Thank you for listening to the Utilizing CXL podcast, part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe and rate the podcast in your favorite podcast application.
You can also find us at youtube.com slash gestaltit video if you prefer to watch this on video.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
For show notes and more episodes, go to utilizingtech.com or find us on Twitter and other social media platforms at Utilizing Tech.
Thanks for listening. And we'll see you again on January 2nd,
since we're taking a little bit of a holiday break.