Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x13: Enabling CXL in Heterogeneous Compute with Arm
Episode Date: January 30, 2023
Although the emergence of CXL in server CPUs is big news, the inclusion of this technology in Arm processor IP is just as important. In this episode of Utilizing CXL, Eddie Ramirez of Arm joins Craig Rodgers and Stephen Foskett to discuss CXL in the Arm-powered ecosystem. Arm develops processor IP that is used in CPUs as well as supporting processors throughout the datacenter. We begin with a discussion of CXL 1.1, which brings memory expansion to Arm CPUs. But Arm is also delivering CXL 2.0, which would allow memory pooling to increase the utilization of memory, and thus overall system efficiency. The next step is true heterogeneous compute, with accelerators like GPUs and DPUs sharing memory with CPUs in a flexible fabric that can leverage CXL. Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms Guest: Eddie Ramirez, VP of Marketing at Arm: https://www.linkedin.com/in/eddie-ramirez-41233a1/ Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing architecture.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Craig Rodgers.
Hi, Stephen. Good to be here again.
Looking forward to our upcoming conversation
around heterogeneous computing and CXL.
Absolutely. Craig and I have worked together
quite a lot on various things,
including a recent white paper on cloud and data center
architecture. But both of us are also quite aware that architecture is much more than the CPU.
Although we're very excited to see AMD and Intel announcing CXL support in their
mainstream server CPU lineup, we've also been talking to a lot of vendors
who are developing peripheral connectivity and switching chips, software that enables CXL and
composability management. Because, of course, it doesn't matter if the host supports it,
it matters if the whole ecosystem supports it. And there's one company that we really were looking forward to talking to because they have a lot more presence
in the data center, I think, than this focus on the x86 CPU world would have you think. Right, Craig?
For sure. For sure. ARM devices are used in almost every server.
People don't realize it's not just an Intel or an AMD chip in there.
There's likely ARM as well.
Absolutely. And of course, there's been a lot of attention to ARM CPUs coming to the data center and cloud. But whether it's the CPU or peripherals or memory, everything has ARM chips.
So that's why we're really excited to have Eddie Ramirez from ARM joining us today.
Welcome to the show, Eddie.
Thank you so much, Stephen.
Happy to be here.
Great to be talking to you and Craig today.
Yeah, it's really good. Ever since I saw your presentation at the CXL forum, where you talked about bringing CXL to the ARM platform, I was very, very excited.
Because as I said, I understand that ARM chips, yes, there are ARM CPUs. Yes, they're making waves in cloud and data center. But the fact that ARM is so, so important in the world
of heterogeneous compute and in the world of basically everything else that's happening
within the data center, it's really, really critical that you all are involved. And it's
great to see that you are. I wonder if you can give us, just from the start, a bit of a
roadmap of what CXL is to ARM, and where you are working on it?
Sure, no problem. Let me just do a quick
introduction of myself. Eddie Ramirez, Vice President of Marketing for the Infrastructure
Line of Business. And so I'm part of the business unit that's really looking at how to enable Arm
and a robust ecosystem around Arm to be able to deliver solutions within the data center, the cloud, 5G infrastructure, and networking infrastructure.
And for us, CXL is something that we feel is going to be very transformational to these market segments.
ARM plays a kind of a unique role.
We're an IP provider, and so we are actually providing a lot of the processor IP that goes into making not only like server processors,
and you see partners like Ampere, you see partners like NVIDIA, and even cloud providers like AWS,
who are now building their own server SoCs, utilizing ARM and this Neoverse platform of IP
to build those solutions. But ARM itself is also found in a lot of other places
within the server.
You see vendors who build BMC chips, right?
These are the chips that help provide
manageability interfaces and capabilities using ARM cores.
We're also in several of the storage devices.
And what's now becoming quite interesting
is the accelerators, right?
We talk about this heterogeneous move in terms of democratizing compute somewhat.
And an example of that would be like the SmartNIC and DPUs, where most of those are using ARM
cores to offload a lot of the, what I would kind of consider infrastructure tasks that
a server does and offloading that from the main processor to these
accelerated devices. And CXL now brings a kind of a fabric and a protocol together that is common
right throughout the industry. It's a standard that so many folks are working on that can really
help these devices talk to each other and also be able to provide real composability
in the future of the data center.
And so we at Arm are very interested in trying to move that forward, enable these vendors
that are building these solutions, right, to be able to integrate CXL 2.0, 3.0, and
future technologies into their hardware.
It's interesting there that you started with CXL 2.0 and then obviously leading to 3.0.
2.0 is obviously allowing memory pooling across multiple hosts.
Is that the voice of your customers saying, we need this level of functionality?
What you're seeing with CXL 1.1
is really the enablement, right, of memory expansion, right?
And that provides a lot of value, right?
I don't want to at all discount the value
of memory expansion. Because
if you think about it, the way that folks have been building servers up to this point for these,
you know, high memory workloads is they've actually been adding multiple server sockets.
And you get to a point where you get like a four socket server, where the whole goal of that server
was really the memory. So the extra CPU cores go unutilized, right? Nobody wants to spend money
and not actually get that return. So now you're able to independently increase the memory without
actually adding more CPU sockets. And we see, for example, that that's going to be very important within the
ARM ecosystem, because a lot of the vendors who are deploying ARM-based server SoCs are doing
that in very high core counts. You have, for example, Ampere with 128 cores per socket, right?
So suddenly the core counts have expanded so significantly over the last five years
that you want the memory to catch up. And now you're able to do that with 1.1 and with memory
expansion. So I think you'll see 2023 is the year where folks actually start maturing the
memory expansion solutions and bringing those to market with CXL 1.1.
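To make that concrete for readers: on Linux, a CXL 1.1 memory expander is typically exposed to software as a memory-only NUMA node with no CPUs, so ordinary NUMA APIs can place data on it. Here's a minimal sketch in C using libnuma; the node number (1) and the presence of libnuma are assumptions for illustration, not details from the episode:

```c
// Minimal sketch: treat CXL-attached expansion memory as a CPU-less NUMA node.
// Assumes libnuma is installed (compile with: gcc demo.c -lnuma) and that the
// expander is exposed as node 1 (hypothetical; check `numactl --hardware`).
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }
    int cxl_node = 1;            // hypothetical CXL expansion node
    size_t size = 1UL << 30;     // 1 GiB

    // Allocate pages bound to the far-memory node.
    void *buf = numa_alloc_onnode(size, cxl_node);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }
    memset(buf, 0, size);        // touch the pages so they are actually placed
    printf("1 GiB placed on node %d\n", cxl_node);
    numa_free(buf, size);
    return 0;
}
```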
But we're now working with partners, right, who are designing SoCs for the next gen.
And that is really going to target CXL 2.0.
And with 2.0, I think you now will bring this extra use case of memory tiering, where a tier of pooled memory or far memory is now available
to the users.
And that's really also powerful because what you tended to find is that within the data
center itself, much of the memory actually goes unutilized. There's a great paper by Microsoft claiming that for most of the VMs that run on the Azure cloud, almost 50% of the memory footprint they're paying for is never touched. So that tells you that there is probably another mechanism for how to optimize memory so that you don't have
to allocate it all up front and pay for it all up front. And that's what the memory pooling in
CXL 2.0 hopefully brings to market. So really excited to be working across the industry within
the ecosystem to bring those solutions to market.
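The arithmetic behind that pooling argument is worth sketching. The numbers below are illustrative assumptions only, not figures from the episode or the Microsoft paper, but they show why pooling stranded memory changes the provisioning picture:

```c
// Back-of-envelope sketch of why memory pooling helps. All numbers here are
// illustrative assumptions, not measurements from any real deployment.
#include <stdio.h>

int main(void) {
    int hosts = 16;
    double dram_per_host_gb = 1024.0; // all-local provisioning baseline
    double touched_fraction = 0.5;    // ~half the footprint is never touched
    double burst_headroom   = 0.2;    // shared pool sized for occasional spikes

    double baseline = hosts * dram_per_host_gb;

    // Pooled design: local DRAM covers the hot half of each host's footprint,
    // and a shared CXL pool absorbs bursts instead of every host
    // over-provisioning on its own.
    double pooled = hosts * dram_per_host_gb * touched_fraction
                  + hosts * dram_per_host_gb * burst_headroom;

    printf("All-local DRAM:   %6.0f GB\n", baseline);
    printf("Local + CXL pool: %6.0f GB (%.0f%% of baseline)\n",
           pooled, 100.0 * pooled / baseline);
    return 0;
}
```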
And one of those areas where we see a lot of this really interesting work in the ecosystem
is not just in CXL, but also in OCP.
When we think about heterogeneous workloads at the moment, you know, we have different types of processor cores alongside GPUs and NPUs, and now we're extending that out via CXL. Given how well ARM has performed on an increasingly recognized metric, performance per watt, which is becoming much more prevalent in purchasing decisions, enabling CXL on ARM will potentially give a better place to run certain types of workloads that your processors are very good at and handle very well.
Craig, I think you've hit it on the nail, right? These systems can all get faster,
right? But if they get faster and are consuming more power, then it's going to be difficult to deploy them, right? Because everybody cares about TCO at the end of the day. And to give an example, you can move to DDR5, but if you look at the power consumption of DDR5 versus DDR4, you're going to pay for that within your opex budget. And we've recognized that as well.
And I think that's been part of the reason why
the market has been looking at Arm solutions,
because we are bringing the same power efficiencies
that we've brought to the smartphone mobile space
and bringing that into the infrastructure space.
And so power efficiency is definitely one of the key value props for Arm within the data center space. And it's also one of those that is driving this exploration and adoption of
heterogeneous solutions, right? Is there a lower cost way of increasing the compute footprint
using devices that could be the processor, that could be SmartNICs and DPUs, that could be other accelerators as well. So I think you hit it on the nail. That
is what's going to be driving a lot of the exploration and interest in heterogeneous compute.
Yeah, to continue on that, it does seem that the architecture strategy of basically every major compute vendor, as well as all of these other vendors in the space, whether they're ISVs or cloud vendors, is to increasingly build heterogeneous systems that bring in specialized accelerators for different workloads.
So whether it's NVIDIA, who is building a sort of a mesh of GPU and CPU, which is based on ARM IP,
or Intel and AMD, who are increasingly leveraging DPUs as a way to offload data processing from their
x86 CPUs, which in some cases also use ARM IP, it does seem that heterogeneous compute is the
future. And if you look beyond CXL for memory, and if you look at CXL as a way to enable sort of scalable, heterogeneous systems that transcend the traditional architecture of a server, where you've got a CPU and memory and expansion cards and so on, if you look at CXL as a way to sort of break that down and build a different kind of system, I think that we can all agree that that different kind of system is going to include both CPU and special purpose acceleration for all sorts of
things, whether it's a data processor, whether it's doing things like encryption and compression,
whether it's doing things like ML acceleration or traditional GPU tasks,
or whether it's some sort of specialized processor. And when you're building one of
those specialized processors, it's very likely that companies are going to be looking to Arm IP as a way to bring that to market. Because of course, you need a processor on that thing that exists on the other side
of the network.
Is that a way to think about heterogeneous compute that's more concrete for people?
That is, that's exactly what we're working on today.
We're working with a lot of our silicon partners, right, who are deploying or looking to deploy
silicon with accelerators and how to make that easy for
them to do that. And at the same time, how to make it easy for them to intersect CXL capabilities.
So part of our Neoverse platform is, you know, we're known for the IP processor cores, but a big
part of our Neoverse platform is actually our interconnect. And you can think about this as the fabric that connects multiple cores together.
We've spent quite a lot of time and energy on making the most capable, lowest latency fabric possible. And it's within that interconnect, what we call our CMN products, that we are enabling the ability
of landing CXL features, right?
And particularly trying to land those CXL features
as quickly as possible for our customers.
And so that's a key part of our Neoverse IP portfolio
that we work with partners and engage with partners
to really deliver this promise of
accelerated compute. And it's important to understand too, I want to make sure that,
you know, it's easy to look at all of enterprise tech as a horse race and to have people say,
oh, it's ARM versus Intel, it's ARM versus AMD. It's not just ARM versus, it's ARM with.
I mean, you guys are
partnering with them. And I think that that's a very, very powerful thing. And that's one thing
that I've really loved to see about the CXL group, the CXL consortium, just the CXL community,
is that it literally is every company in the industry. And they're all working together in
sort of a positive, productive way. And they're building something that isn't
selfish, that isn't centered around their thing. They're building something that's interoperable.
And that's really exciting. Yeah. And actually, how quickly CXL has grown,
right, is fantastic, right? We joined CXL as a board member early on, we've been actively working in the consortium since 2019.
And, you know, obviously one of the great things is that it leveraged PCIe, which also has a great ecosystem built around it, right?
So they didn't try to start from scratch. They're building on top of what are already great standards within the server space today.
But you now have almost 150 partners that are participating in the forum.
We're already at the 3.0 spec that was announced at the Flash Memory Summit. And now you're seeing the convergence of things like Gen-Z and OpenCAPI into CXL. And so it's really created the ability for so many partners to contribute and then provide
innovations on top of it.
And so that I think has been really helpful in order to see the rapid pace of what
you're seeing CXL coming to the market now. And we're still in the early stages, by the way,
right? I mean, a lot of what we saw at Flash Memory Summit or at OCP were a lot of concept products, right? Now we're looking at all things that can be possible, right?
Some very interesting things,
like people are adding ARM cores to these memory pools.
And now you're thinking about like,
wow, I didn't think that maybe I could do
this offload capability in the memory pool itself. You're
seeing SSD vendors who will now look at, hey, I might support both NVMe and CXL, right, and offer
a different way of delivering persistent memory options. So a lot of this wouldn't happen
if we didn't all have kind of a common language to speak.
Right. And I think CXL is providing that.
The collaboration between all of the companies involved in the consortium, I think, will contribute a lot to the eventual success of CXL.
The fact that there's been an avoidance of doing it on a proprietary basis.
You know, there's a lot of companies that could have done it that way, and have in the past. But the fact that
everybody's agreeing on a standard opens it up, you know, to all of the existing consortium
members. And I'm sure we'll see more that don't even exist yet. You know, it's the openness of
the standard is great. It has to be a village. It has to be
open. And I think that that's the lesson that we've seen, because as you mentioned, a couple
of other standards there that kind of were aiming to do a lot of the same things. And of course,
there's been a lot of standards about, you know, interconnects and fabrics as well. And many of
them haven't worked because it's been so centric to a single architecture or
a single vendor or something. And this is the opposite of that, which I think is really exciting.
No, I agree. And with that, obviously, as CXL grows, and the number of use cases that it's trying to tackle grows, you get more voices in the room and you have to adequately be
able to accommodate the interests of all of these parties. And so I think the CXL Forum has played
a really good role in being able to grow membership but still progress the spec at a cadence that is beneficial for the industry. And I think CXL is kind of helping by planning releases in line with other industry standards that also need to come together, right?
Leveraging the existing PCIe trust, adoption, standards, form factors, connectors, it was
an absolute no-brainer for CXL to be done as an extension of PCIe and probably again
another factor that will help adoption.
Yeah.
And even when you think about the memory attached devices that are coming to market, they're leveraging the form factors that NVMe drives actually first pioneered, right? Because again, NVMe runs over PCIe, right? So that physical interface and the form factors can be leveraged. And so now we're not having to reinvent the wheel on what is the pin connector for these
memory attached devices, right?
Because we did that with NVMe and with those devices prior.
And so we're leveraging what we feel is kind of state of the art for a slightly different
application, but still very relevant
From a product management standpoint, it also helps because so many existing vendors are already used to working with those slots. You know, we're just adding features in your chipsets. It doesn't require a full main system board redesign. It could be an evolution to us.
Well, yeah, they're using known parts and components.
Yeah. I think what now becomes really interesting is, you know, how does the server evolve, right? What does the server of the future look like when you have more options on how to design that server?
And I really like to think about the design point as the rack now, right?
Not the individual server, right?
So when you get to tiered memory, right, you can think about, hey, I'm now actually pulling
some of these DIMMs, right, from having to reside on the server itself to now they can reside on another
like memory appliance, right? And what does that do? How does that change things, right? And how
do you make that efficient? And so this is what becomes exciting, right, in terms of the innovation
that happens when you kind of create that common language, right, for the fabrics of how these things interconnect.
And I think that's the really exciting kind of point
we have right now in the marketplace.
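When DIMMs move out to a memory appliance like that, the operating system or a tiering daemon has to decide which pages live in the near tier and which in the far tier. One mechanism available on Linux today is move_pages(2); here's a sketch of demoting cold pages to a far-memory node, where the node number and the "cold buffer" are hypothetical stand-ins, not details from the episode:

```c
// Sketch of cold-page demotion with move_pages(2), the kind of mechanism a
// memory-tiering daemon might use. Node 1 as the far CXL tier is hypothetical.
// Compile with: gcc demote.c -lnuma
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    size_t npages = 256;
    char *buf = aligned_alloc(page, npages * page);
    if (!buf) return 1;
    for (size_t i = 0; i < npages * (size_t)page; i++)
        buf[i] = 1;                   // fault the pages in on the near tier

    void **pages  = malloc(npages * sizeof *pages);
    int   *nodes  = malloc(npages * sizeof *nodes);
    int   *status = malloc(npages * sizeof *status);
    for (size_t i = 0; i < npages; i++) {
        pages[i] = buf + i * page;    // one entry per page to migrate
        nodes[i] = 1;                 // hypothetical CPU-less CXL node
    }
    // Ask the kernel to migrate this process's pages to the far tier.
    if (move_pages(0, npages, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");
    else
        printf("first page now on node %d\n", status[0]);

    free(pages); free(nodes); free(status); free(buf);
    return 0;
}
```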
And you add to the fact that we're right at the beginnings
of artificial intelligence, right, becoming widespread, right? And I think what
people are looking at is that, you know, one of the biggest challenges with AI inferencing and
deep learning models is they rely on a lot of data to make those inferences, right? So these models
get smarter when there are more parameters.
When there are more parameters,
that's more data that you're trying to pull in.
And so at the same time, what's really exciting
is this work that we're doing on innovative hardware design
is really going to bring the ability of increasing AI workloads
and the benefits of AI to a broader set of the marketplace today,
right? It won't just be cloud providers that need these huge data centers to run AI models,
because we're trying to make that available to a broader set of the market. So I think that's, you know, the really interesting intersection of technologies at this time that makes it an exciting time to be in tech and be in hardware and software.
Yeah, I completely agree
about the machine learning processing and so on.
I mean, if you imagine a future
where memory can be pooled and shared at rack scale,
where there
are special purpose accelerators deployed throughout the rack, and where you can compose
those into a processing system for the application at hand, you basically are going to come up
with a system that is just unimaginable today in terms of the amount of resources that it has, in terms of the number of processors, the amount of memory,
the amount of memory bandwidth that it would have.
And also you're removing the need to move data around quite so much,
because if you can pool and share memory,
then you can basically queue it up with one system,
process it with another system, and then output it with a
third system. And all of them are accessing this at PCIe 5, 6, 7, you know, speed and latency,
which is really, really remarkable. I want to wrap up with one thing with you, though,
while we've got you. You actually just sort of, you know, really kind of kicked my brain in gear
with your talk about revolutionizing computing. And I wonder if you, you know, obviously you can't talk about future products.
And these aren't really future products because revolutions happen in the future, in the far future.
Where do you envision this going?
I think that a lot of people listening are going to be like, when do I have my CXL Raspberry Pi?
When do I have my CXL phone, iPad, whatever? But beyond that, what is CXL plus ARM? What is it going to do that's going
to make servers completely different? That's a good question. And I'll give you a couple of examples, right? The first one is when you get to a point where you can compose a system,
the real excitement is, can you compose it on the fly, right? And so if you have a workload
and the workload is then able to understand the options, right, of hardware and accelerators that are
available, what you would like to see is that you can compose these resources, right, and do it
dynamically. And so the idea that an accelerator is talking peer-to-peer to another accelerator,
and maybe you have a chain of these,
and the processor itself is actually being called upon
when needed, right?
It doesn't actually need to be the coordination point
for all of these accelerators.
I mean, that's what we're looking at in CXL 3.0, right?
How do you enable peer-to-peer accelerator communication?
Because you can think about that as really exciting.
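None of the fabric-manager APIs for that dynamic composition are settled yet, so purely as a conceptual sketch, with every name below invented for illustration rather than taken from any real interface, composing a system on the fly amounts to leasing typed resources out of a rack-level pool:

```c
// Toy model of dynamic composition; not a real CXL fabric-manager API.
// A rack holds typed resources; a "composed node" is just a set of leases.
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

enum res_type { CPU_CORES, MEMORY_GB, GPU, DPU };

struct resource { enum res_type type; int amount; bool leased; };

static struct resource rack[] = {
    { CPU_CORES, 128,  false },
    { MEMORY_GB, 1024, false },
    { MEMORY_GB, 2048, false },
    { GPU,       4,    false },
    { DPU,       2,    false },
};

// Lease the first free resource of a given type. In a real system this would
// be a fabric-manager request that binds a CXL device to a host.
static struct resource *lease(enum res_type t) {
    for (size_t i = 0; i < sizeof rack / sizeof rack[0]; i++)
        if (rack[i].type == t && !rack[i].leased) {
            rack[i].leased = true;
            return &rack[i];
        }
    return NULL;
}

int main(void) {
    // Compose a node for one workload: cores, far memory, and a GPU.
    struct resource *cores = lease(CPU_CORES);
    struct resource *mem   = lease(MEMORY_GB);
    struct resource *gpu   = lease(GPU);
    if (cores && mem && gpu)
        printf("composed node: %d cores, %d GB memory, %d GPUs\n",
               cores->amount, mem->amount, gpu->amount);
    return 0;
}
```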
And then you think about each of these areas where even memory has compute, right?
And now you have in-memory compute that's really helping, so that you don't just have the entire fabric swamped because you're moving data back and forth, right?
You really want to have that data movement happen because it's adding value to the system, right?
And so now you can think about that as well.
And then you wanna see how do you expand this, right,
into a broader array of systems, right?
And I think you're gonna see that.
I think you're gonna see 5G networking equipment
starting to take advantage of CXL as well,
right? I think you're going to see, you know, CXL be integrated into DPUs and into the way switches
are built. And so I suppose just quickly then, you know, you mentioned there a few use cases,
you know, if we looked at the server side, we would have a rough idea around AMD and Intel's expected cadence around jumping from CXL 1.1 to 2.0 to 3.0. Say that it was around a two-year refresh pattern. What way are Arm looking at that cadence in terms of having that achievable, composable rack, which doesn't really come until around CXL 3.0?
What would you expect the cadence to be then on your product side?
Well, this is what's interesting, right?
Because again, because we don't do the end products,
we are at the heart of enabling customers to land these features
rather than having to wait on one vendor
to do it, they can do it themselves. The real power of ARM, right, has been that we enabled
a partner like Fujitsu to build an HPC chip and land HBM years before you saw it in the x86 camp, right? You actually saw AWS land PCIe Gen 5 and DDR5 with Graviton3. They've been in production for about nine months now, right? So this ability of landing these features, and not having to wait on a set cadence, is actually what's been part of the success for Arm, right?
Is we're enabling these partners
to determine the cadence for themselves, right?
And so I always kind of try to say,
I don't think we're setting this like two year cadence window
because our partners are gonna kind of disrupt that.
And that's what we're trying to help as well.
Yeah, absolutely.
And I think that it is really exciting to see Arm
not just right there with Intel and AMD and Marvell and,
you know, all these other companies, but maybe even in the lead
in terms of delivering this, you know, CXL 2.0, and, you know, promising future editions as well. But I will say I want my CXL Raspberry Pi. So maybe you can lean on
them to deliver that.
When we get ready for these announcements, Stephen and Craig, I'd love to come back on and be the first to tell you about them.
Yeah, right on.
We will definitely look forward to that.
Well, thank you so much, Eddie.
This has been a wonderful discussion.
I really appreciate your candor
and your enthusiasm
about the technology specifically
and just generally talking about the future,
where this goes,
because that's why we're doing this.
We're not being paid to do this.
We're doing this because we're excited
about where CXL can lead data center architecture, and I can tell that you are as well. So as we wrap, where can people connect with you and continue this conversation?
So again, feel free to reach out to me on LinkedIn. I always want to give a shout out to the OCP Composable Memory Workgroup. I've been a
participant in that and always welcome more folks to join us and collaborate there. I think that's
the space where you're going to see a lot of standardization, right, around how these CXL
solutions come to market. And I think that'll benefit lots of folks.
And not just myself, but several folks at ARM
have given talks around CXL at CXL forums, at OCP,
so I'd encourage folks to look at those talks
that are up on our YouTube channel as well.
Yeah, and we'll include some links to that in the show notes
if anybody's interested in learning more,
because there's a lot more technical detail there
about specifically the announcements and so on
with the different IP blocks that ARM has put together.
So thank you very much for joining us,
and thank you all for listening to Utilizing CXL,
part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe.
You'll find us in your favorite podcast application.
And please do consider leaving us a review or a rating.
We always love to see those.
This podcast is brought to you by gestaltit.com, your home for IT coverage from across the enterprise.
For show notes and more episodes, go to utilizingtech.com
or find us on Twitter at Utilizing Tech.
Thanks for listening and we'll see you next week.