Utilizing Tech 4x19: Enabling Mass Adoption of CXL-Attached Memory with Rambus
Episode Date: March 13, 2023

Moving memory and other resources off the system bus to CXL is exciting, but how do we ensure that these systems will be reliable, available, and serviceable? This episode of Utilizing Tech features Mark Orthodoxou, VP of Strategic Marketing for Datacenter Products at Rambus, discussing with Stephen Foskett and Craig Rodgers the technology and standards required for mass adoption of CXL-attached memory. Rambus brings decades of experience and a breadth of technology to the deployment of memory in high-performance and highly-available systems. As a CXL Consortium member, Rambus is bringing this experience to CXL, enabling the technology across the ecosystem. Memory expansion with CXL is being deployed today, and memory pooling over CXL fabrics is coming, but it is disaggregation and rack-scale architecture that will ultimately be the result of the adoption of CXL.

Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms

Rambus Representative: Mark Orthodoxou, VP of Strategic Marketing - Datacenter Products: https://www.linkedin.com/in/mark-orthodoxou-94b189/

Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789

Tags: #UtilizingCXL, #CXLAttachedMemory, #Datacenter, @RambusInc, @UtilizingTech
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as co-host is Craig Rodgers.
Hi, Stephen. Good to see you again.
Are you looking forward to talking about what Rambus is doing in the CXL space?
Excellent. Yes, indeed I am.
Many of us are very familiar with Rambus over the years.
We've encountered the company, and I think a lot of us know that Rambus
is responsible for some of the basic structure of how server memory is constructed these days.
But I think some people might have some questions about sort of where that goes into CXL. But I
mean, if you think about it, it makes a lot of sense that a company like Rambus would be
in the CXL space, because in order to make this stuff real, there's going to
need to be a lot of focus on the things that they have dealt with in the past, reliability,
availability, serviceability, security, basically making this thing enterprise grade, right?
Absolutely. Given Rambus's history and experience with memory, and with CXL first leveraging memory to enter the market, it makes absolute sense.
So let's then bring in our guest. Welcome, Mark Orthodoxou, VP of Strategic Marketing for Rambus. Welcome to the show. It's good to have you here.
Thanks, Stephen. Yeah, like you said, my name is Mark Orthodoxou, VP of Strategic Marketing for Data Center Products here at Rambus. It's nice to talk to you guys today.
So, Mark, you heard what we had to say there. You know, what are the, let's start right off the bat
with the sort of things that might hold back memory across CXL from really taking
root in, well, basically big production environments? Yeah, that's a great question.
And one that I feel like, you know, as an ecosystem, we don't necessarily talk enough
about because let's face it, for the first time ever, we're talking about taking system memory, sticking it off a serial interface, moving the memory controller off the CPU.
And what are the implications of that?
I mean, the biggest implication, of course, is that that has to be reliable.
And you said it in your intro there.
I think the industry as a whole needs to kind of really focus in that area. If you think about
the main CPU players, the Intels, the AMDs of the world that have for many, many years honed their
memory controllers to work with system memory, mostly in the form of RDIMMs, that's a lot of
man years of effort that have gone into ensuring that there's the right ECC technology, that there's no silent
data corruption under most every use case, things of that nature. A lot of that work now has to be
recreated by third parties. And that is not an easy task. Now, companies like Rambus, who have been in
the memory business for a long time, are uniquely capable of doing that with some of the insights we've gained working with the memory suppliers.
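To make that ECC point concrete, here is a toy single-error-correcting Hamming(7,4) encoder/decoder in C. It is purely illustrative: production DRAM and CXL controllers use far stronger schemes (SECDED, chipkill-style symbol correction), and nothing here reflects Rambus's actual designs.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy Hamming(7,4): corrects any single flipped bit in a 7-bit codeword. */
static uint8_t encode(uint8_t d) {      /* d holds 4 data bits */
    uint8_t d1 = d & 1, d2 = (d >> 1) & 1, d3 = (d >> 2) & 1, d4 = (d >> 3) & 1;
    uint8_t p1 = d1 ^ d2 ^ d4;          /* parity over positions 3,5,7 */
    uint8_t p2 = d1 ^ d3 ^ d4;          /* parity over positions 3,6,7 */
    uint8_t p3 = d2 ^ d3 ^ d4;          /* parity over positions 5,6,7 */
    /* codeword bit positions 1..7: p1 p2 d1 p3 d2 d3 d4 */
    return p1 | (p2 << 1) | (d1 << 2) | (p3 << 3) | (d2 << 4) | (d3 << 5) | (d4 << 6);
}

static uint8_t decode(uint8_t c) {      /* returns the corrected 4 data bits */
    uint8_t b[8];
    for (int i = 1; i <= 7; i++) b[i] = (c >> (i - 1)) & 1;
    uint8_t s1 = b[1] ^ b[3] ^ b[5] ^ b[7];
    uint8_t s2 = b[2] ^ b[3] ^ b[6] ^ b[7];
    uint8_t s3 = b[4] ^ b[5] ^ b[6] ^ b[7];
    int syndrome = s1 | (s2 << 1) | (s3 << 2);  /* 0 = clean; else error position */
    if (syndrome) b[syndrome] ^= 1;             /* correct the single flipped bit */
    return b[3] | (b[5] << 1) | (b[6] << 2) | (b[7] << 3);
}

int main(void) {
    uint8_t data = 0xB;                 /* 4-bit payload: 1011 */
    uint8_t cw = encode(data);
    cw ^= 1 << 4;                       /* inject a single-bit fault at position 5 */
    printf("sent 0x%X, recovered 0x%X\n", data, decode(cw));  /* both 0xB */
    return 0;
}
```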
But that's one major element. The second piece of it is security.
As you also mentioned in your intro, I mean, now we're moving memory a little further from the CPU.
Security becomes a concern, and that has to be dealt with appropriately. And then I think more broadly, you talk about
just the standards work that's required to ensure everybody is aligned on the foundational
requirements for this type of memory attachment. A lot of work needs to be done there too. And
RAMBUS is heavily involved in all those areas. You mentioned there about standards. It's so important for something
this big and this open. With so many members of the CXL consortium, it's so important that this
is all properly standardized. And for adoption to take place, it's going to have to be standardized. Companies won't simply
throw it into a data center on the assumption that it's going to work.
They're going to want to know how it works, how reliably it works,
and how well they can maintain that with serviceability moving forward.
The standards activities are
quite involved now around CXL attachment, of memory in particular; not strictly speaking only memory, but a lot of the focus is in that area.
I mean, everybody knows the CXL consortium, of course, which really is the foundational standards body that's essentially delivering the protocol.
Rambus, heavily active in that area.
As of January of this year, we actually sit on the board of directors of the CXL Consortium, and we're the only company building memory silicon that does so.
And that's one obviously major thrust.
So foundationally, all the ecosystem players have to come together to make sure that the protocol itself is suitable for the application.
But that's just one part of it. A lot of other standards bodies now are very involved by necessity, because, as you pointed out, how do we use this stuff in production as things really ramp up over the next couple of years? And that has at least two other major components.
One is: what are the form factors and the standard, most important requirements for the controllers that will implement this type of technology? So, you know, JEDEC is very active in that, and Rambus has been working closely with JEDEC for many, many years; that standards work is just as important here as it has been in the RDIMM industry. Then a new work group was formally formed, the Composable Memory Systems work group within OCP. And what is that group doing? Well, that group is essentially
building an open source fabric manager for the purposes of managing all this tiered memory,
which is going to be foundational for its use as we go forward. In the Linux community, there's kernel work to ensure open source kernels can manage this tiered memory as well.
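To illustrate how that tiered memory surfaces today: on current Linux kernels, a CXL memory expander typically appears as a CPU-less NUMA node. Below is a minimal sketch using libnuma that finds such a node and places an allocation on it; it assumes such a node exists on the machine, and node numbering varies by system.

```c
#include <numa.h>
#include <stdio.h>
#include <string.h>

/* Build with: gcc far_node.c -lnuma
 * Find a memory-only (CPU-less) NUMA node, the usual guise of a CXL
 * expander under Linux, and explicitly allocate from it. */
int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not supported on this system\n");
        return 1;
    }
    int far_node = -1;
    for (int n = 0; n <= numa_max_node(); n++) {
        struct bitmask *cpus = numa_allocate_cpumask();
        long long free_b;
        /* memory-only node: has capacity but an empty CPU mask */
        if (numa_node_to_cpus(n, cpus) == 0 &&
            numa_bitmask_weight(cpus) == 0 &&
            numa_node_size64(n, &free_b) > 0)
            far_node = n;
        numa_free_cpumask(cpus);
    }
    if (far_node < 0) {
        printf("no CPU-less node found; no far-memory tier visible\n");
        return 0;
    }
    size_t len = 1 << 20;                      /* 1 MiB on the far tier */
    void *buf = numa_alloc_onnode(len, far_node);
    if (buf) {
        memset(buf, 0, len);                   /* touch to actually commit pages */
        numa_free(buf, len);
    }
    printf("allocated and freed %zu bytes on node %d\n", len, far_node);
    return 0;
}
```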
So all this standards activity has to come together.
The ecosystem has to agree on all these things for this to be deployed in any kind of meaningful volume in a ubiquitous way.
And, you know, Rambus is heavily involved in all those groups, as are many of our
customers and partners. So it's a very exciting time. It's really positive, I think, to see such
broad activity across all these standards bodies. It speaks to the importance of the technology.
Yeah, I think that's what's been interesting to us here on the podcast is just the number of
companies, the breadth of the companies that are involved, and all of the different resources and ideas and technologies that are
being brought together. And it's really, I think, important that companies like Rambus are there as
well because of the long history, not just in terms of patents and technology, but even just understanding of
this market. I mean, Rambus is a company that has worked in the enterprise server industry, in cloud servers, heck, even the Nintendo 64, everything, for so long. And having that kind of background and bringing that to the table is
important because, you know, for me as a storage guy, one of the things that kills me is, you know,
people kind of come in and sit down and they're kind of like, oh, how hard could storage be?
We'll just, you know, we'll just store some data. It'll be fine. Well, it's the same thing with
memory. I mean, people come in, you know, if they don't know the memory business and you don't have the background in this and you don't
understand what it takes to make memory reliable and to handle the issues that are going to happen
and to predict what issues are going to happen, because we've seen this before and we'll see it
again. You don't ask the right questions and then you don't have something that's reliable.
You don't have something that's highly available. And so it's good to have that kind of background
being brought to the table.
So what is it, specifically, that Rambus is bringing
in terms of understanding of system memory to CXL?
Yeah, that's a great question, Stephen.
Thank you.
I mean, well, first of all,
I can't claim any responsibility
for what we were doing back with Nintendo all those years ago.
I'm about a year and a half new to Rambus.
But I mean, the reason why I joined the team was because I feel that Rambus brings something very unique to this ecosystem. The industry has been working on the serialization of memory for many years now, all the way from back when we were talking about things like CCIX and OpenCAPI, or OMI, and Gen-Z.
And CXL clearly has sort of taken the lead
from an ecosystem adoption standpoint.
And what Rambus brings is rather unique.
First of all, as you pointed out,
we have a great deal of historical contribution
to the memory ecosystem through inventions and involvement with
the memory suppliers by supplying chipsets that live in the RDIMMs that today ship ubiquitously
in the data center. And we also have this rather unique pool of IP to draw from that we ourselves
develop and sell to third parties in the form of CXL controller IP,
PCIe controller IP, SerDes, security blocks, and then a very strong SoC design team within
Rambus that's able to take all these pieces and put them together to do something I think pretty special. And so when you look at the memory
ecosystem and you think about companies that really kind of know the nuts and bolts of what
it is to take things like DRAM and make them consumable on a mass scale, Rambus is rather
unique. Now, when you think about the different ways that memory is going to be deployed with CXL,
there's a whole bunch of different variations on that. And that's also important to consider
in answering your question, because there's a great deal of complexity, as I mentioned,
in recreating the memory controller, taking it off the CPU and putting it external, and then
attaching RDIMMs to that. You have to consider all the reliability things
that any CPU would need to consider.
But there's another type of use case as well,
which is where DRAM is populated directly down on a board
next to something like a CXL memory controller
that necessitates an understanding of capabilities
that are very analogous to what today you find
in things like an RCD that lives on an RDIMM.
And very few companies have the understanding and the background to make that a productizable thing as well.
There's a lot of
acquisitions leading up to this point, and a lot of them were focused on enabling us in this area.
One of the more recent ones is an acquisition we did of a company called Hardent. They're based
in Montreal, Canada, and their expertise is in the areas of ECC and compression. And it kind of hints at the types of things that we feel are necessary to have a very deep understanding of
to make this thing successful. So those are a few things in answer to your question.
Yeah. And it's interesting too, that you bring up sort of non-memory topics as well.
Because of course, the differentiator for CXL memory
is that it's going over PCI Express.
It's probably going to be going over fabrics.
There's probably going to be other types of IO
passing alongside it.
There's a lot of questions that go beyond
sort of what needed to be asked
when you're talking about
memory on a system bus, isn't there? So I think it's important to bring in those kinds of things,
bring in that kind of understanding, maybe it's storage-based or it's interconnect GPU-based,
those kinds of understandings and meld them with how memory needs to work, right?
Right. Well, I mean, exactly, 100%. I mean, when you take a step back and you look at the big picture here, I mean, what is CXL
offering? Well, it's offering a new attach point for memory. And that's one of its often-talked-about use cases. Why is that important? Well, that's important because today, I mean, there's just lots of documentation and papers and talks about how important memory is from a cost standpoint,
how the need for memory is growing, and what are the tools that are available in the absence of
CXL to the system architect. It's really, you have HBM and you have various types of direct
attached DDR. And then you have storage, right? There was an attempt to bridge that gap a little
bit with things like 3D XPoint. That's not really an option now.
So what are the tools that we have in the toolbox?
Well, CXL creates a whole bunch of those tools by creating these memory tiers.
And as soon as you have these memory tiers, which, to your point, is very analogous to
storage in the past, you have a whole bunch of different things you need to think about.
You need to think about what the use case is from a latency standpoint, for example, when deciding whether or not you encrypt that memory. You need to think about
things like storage techniques that are not entirely different from an FTL style layer you
might see in flash SSDs today when you want to do some kind of post-processing on a
cooler tier of memory in order to use that memory more efficiently. There's a whole bunch of
enterprise storage concepts that become very applicable to the memory space. Just pure data
integrity through silicon and end-to-end in a system is an art that was really, in many ways,
perfected in storage that is very applicable to the memory space. And I think carrying all those
ideas forward in a deliberate manner, right, we can't bite off too much too quickly, is going to
be how this market actually is realized,
how people actually start to find reliable ways
to use these new types of memory tier
to solve some of these problems,
given, again, today, their only option is
throw more HBM at it or throw more direct-attached DDR at it,
which is fraught with all kinds of limitations.
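As a concrete taste of those storage-style tiering ideas applied to memory: on Linux, software can explicitly demote a cold page to a far-memory NUMA node with move_pages(2). A minimal sketch follows; it assumes node 1 is the CXL/far tier (adjust for your topology), and a real tiering layer would pick pages based on access-hotness tracking rather than demoting unconditionally.

```c
#include <numaif.h>    /* move_pages(2), MPOL_MF_MOVE; link with -lnuma */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Demote one resident page to an assumed far-memory node. */
int main(void) {
    long pagesz = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    if (posix_memalign(&buf, pagesz, pagesz) != 0) return 1;
    memset(buf, 0xAB, pagesz);          /* touch so the page is resident */

    void *pages[1]  = { buf };
    int   nodes[1]  = { 1 };            /* assumed CXL/far node on this box */
    int   status[1] = { 0 };
    if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE) == 0)
        printf("page now on node %d\n", status[0]);
    else
        perror("move_pages");           /* e.g., node 1 absent, or no permission */
    free(buf);
    return 0;
}
```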
There's been huge parallels
between where
we were with storage and where we're going with memory. And one of the first things we dealt with, you know, we had huge amounts of wasted storage in physical servers, you know, that just weren't using it. And the solution was pooling it with virtualization. Now we're doing the same with memory. We're pooling the memory to let us mitigate that wasted resource.
And it's an expensive resource to waste.
But, you know, where we started off with storage with basic RAID and then to erasure coding, you know, storage has evolved over time.
And, you know, SSDs and NVMe, it'd be interesting to see where memory goes now. That pooling is going to be a foundation that other services will build on, things that we haven't even thought of yet. Nobody was thinking of erasure coding on a
physical server. It'd be very interesting to see where it
goes. Yeah, I fully agree. And there's a lot of
conversations being had, both in closed
door meetings and in standards discussions around exactly that universe of application space.
And there are a lot of different thoughts on the right way to architect such a system. I mean,
as you pointed out in storage, we saw that evolution happen in the form of different
types of storage media with different types of interfaces, migrating to things like JBODs
or JBOFs that attach to a number of servers.
Will we see similar things for memory?
I think we will.
Will they live on the other end of a fabric?
I think eventually they will. There's a lot of things to work through to get to that point. So I think that's an inevitable
place that we will get to. I do think we need to crawl, walk, run a little bit as an industry.
And again, I mean, just sort of the way we started this conversation. I mean, there are some foundational things to address in order to really convince the ecosystem
that we have a reliable way to use this memory
with the correct performance profile
and the right software infrastructure to manage it.
And then that will continue to grow.
And we will eventually reach that end of the rainbow, where we're able to effectively utilize memory directly and remotely, and with reliability, so that architects truly have all the tools in their toolbox to architect servers and racks and data centers the way they would like to, depending on the workloads they have, which are themselves widely varied.
We're starting off with memory pooling,
and we believe that that's going to be quite rapidly and widely adopted.
We don't know about, say, the future of CXL, you know, features in 3, 4, 5, et cetera. And I'm getting these visions in my head of, you know, design and user experience. You have companies like Rambus now that are creating the tools, and then we're going to have companies who are actually building the CXL solutions. Where do you see those solutions going longer term? After memory pooling, what do you believe would be the next adoption down the CXL roadmap?
Well, there's a number of ways that we can look at this, a few different directions. So, as you point out, memory pooling is broadly talked about. It seems to deliver a pretty obvious value proposition, which is: use that expensive resource more efficiently, free up capital to purchase more memory, and eventually, over time, more broadly become a foundational new component of what has often been talked about, where companies would love to get to a truly sort of heterogeneous, composable rack where they can plug and play things wherever they like.
And that means also the attachment of GPUs and accelerators, the ability to do direct transfers from memory to memory, from cache to cache across those heterogeneous components.
So that is, I think, one certain direction that the ecosystem is trying to move towards after, strictly speaking, pooling of memory.
I think the pooling of memory is talked about most in the near term in terms of that sort of far memory, because it is such an expensive resource and there's never been a solution for this before. Whereas we do have some solutions right now for disaggregation of things like GPUs and accelerators over PCIe, and there's also some proprietary solutions out there, as most people know. But putting that all together is kind of then the next step. You can't really do that effectively, strictly speaking, over, for example, PCIe. You need something like CXL and the protocol enhancements that it brings to make that work.
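To sketch what that orchestration step might look like in software, here is a deliberately hypothetical fabric-manager flow in C. Every name in it (fm_device, fm_list_devices, fm_bind_region) is invented for illustration; this is not the CXL Fabric Manager API or any shipping product, just the shape of the flow: discover pooled devices, then bind capacity to a host, which would online it as a new memory tier.

```c
#include <stdio.h>

/* All names below are hypothetical, for illustration only. */
typedef struct { int id; unsigned long long free_bytes; } fm_device;

/* Stubbed inventory standing in for real fabric discovery. */
static int fm_list_devices(fm_device *out, int max) {
    fm_device pool[] = { {0, 1ULL << 38}, {1, 1ULL << 37} };  /* 256 GiB, 128 GiB */
    int n = max < 2 ? max : 2;
    for (int i = 0; i < n; i++) out[i] = pool[i];
    return n;
}

/* Stub: a real manager would program device decoders over the fabric. */
static int fm_bind_region(int device_id, int host_id, unsigned long long bytes) {
    printf("bind %llu bytes of device %d to host %d\n", bytes, device_id, host_id);
    return 0;   /* the host would then hot-add the region as a NUMA node */
}

int main(void) {
    fm_device devs[16];
    int n = fm_list_devices(devs, 16);
    unsigned long long want = 1ULL << 36;            /* request 64 GiB for host 3 */
    for (int i = 0; i < n; i++) {
        if (devs[i].free_bytes >= want &&
            fm_bind_region(devs[i].id, 3, want) == 0) {
            printf("host 3 got capacity from device %d\n", devs[i].id);
            return 0;
        }
    }
    fprintf(stderr, "no pooled capacity available\n");
    return 1;
}
```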
But the other thing that I'll say is, you know, going back to direct attached memory, right now we talk a lot about,
well, you have this many, you know, DRAM channels on a CPU from an Intel or an AMD or pick your
favorite CPU company. And CXL runs over PCIe, which is a ubiquitous interface with an increasing number of lanes per CPU, allowing it to be used to do interesting things. You can take that CXL memory and you can add more memory
bandwidth or more memory capacity or introduce new types of memory over that additional interface.
But I would suggest also that if you look far enough out, the CPUs themselves today,
what are they growing to? More than 6,000 pins, something like that.
Like I've talked to customers who say they can't conceivably go two DIMMs per channel on a socket because they're physically running out of space in the server if it's a two-socket server, for example. In fact, maybe even a two-socket server becomes hard to build because these things get so large. So I would say that there's also a possibility down the road that serial memory, as opposed to delivering a new memory tier, is just the mechanism to attach memory to a CPU, whether it's hot, whether it's warm, whether it's cool, whether it's cold. And I would point to a company like IBM, who has pioneered sort of this thought experiment, which is serializing memory on their POWER series. So is that a direction that we could also go? And that's
going to require new standards work in terms of form factors of memory components, as well as
perhaps the protocol itself. So there's a number of different directions that we could go beyond the direct memory
attachment with CXL that we talk about quite a bit as an industry right now and pooling,
right?
There's those two sort of different paths, which I think are very powerful and very
interesting and speak to why we're all making these investments right now.
Yeah. And speaking of that, I think we can all kind of understand that Rambus is able to bring IP to the market. And I think you've already maybe announced some product in that area, or maybe, you know, we've heard about it, but where does the product picture go for you?
Are you basically going to be enabling with IP, or are there going to be Rambus products?
Yeah, great question. As you know, Stephen, we haven't made any announcements yet in the area
of specific CXL products. We do have lots of information available about the IP
that we make available to the ecosystem to build products of their own. All I can tell you right
now is that we're doing some pretty interesting stuff in this area. And hopefully we'll be talking
about it a little bit more openly soon. You do get a bit of a hint for the direction we're headed if you look at some of the publications that we've issued on our CXL initiative.
But the rest of it, we'll have to wait for a future discussion before I can talk more about it.
Well, I can't wait to hear what that is. But in terms of IP, at least, it's great to know that you all are there to basically support these products that are coming to market with, you know, the Rambus technology. And I guess for now, that is one of the approaches that a lot of companies are taking: some of them are providing IP and some of them are providing, you know, a product.
And they're all meeting together in the CXL market. Another question that I would have
relating to Rambus, though, is, you know, there's a whole thicket of intellectual property here.
How does that work? I mean, we're outsiders. We don't see
how it goes. How does it work to bring technology from 30 different companies together into a
standard? Yeah, that's an interesting problem space, but one that we've navigated successfully
in the past, and again, I'll go back to examples like storage, right?
I mean, the situation was no different with SAS or SATA. It was no different with NVMe, or even PCIe itself. At some point, the opportunity for technology innovation and the need for product to enable it becomes so clear that you kind of
have to sit at the table with your partners and your customers to ensure that we're all investing
R&D dollars in a manner that we can deliver some ROI. And then the only way you can really achieve
that is consensus at some level.
And so, yes, is there an art in enabling standardization while leaving an opportunity for innovation?
Absolutely. But, you know, it certainly is possible. It's been done before.
And I think actually, to the credit of, you know, the end users of these products, and that's the cloud service providers, that's the server OEMs, they put a great deal of importance on participating in and driving that work in a collaborative way. To the point where, you know, they will probably say that they don't want to partner with a company that, you know, is fighting or resisting that. Nobody wants to put something proprietary that,
you know, is sort of defined in a dark room into a system that gets broadly deployed.
And so it's really almost a mandatory part of the pre-sale cycle and of the industry enablement cycle. I mean, if you want to
make product to address these things, you just need to collaborate in that way. And everybody
sees it. It makes for some very interesting and heated discussions sometimes in standards bodies.
But at the end of the day, it lands in a constructive place. And we'll navigate
it just like we navigated in the past for some of those standards that I mentioned, by example.
There's one thing you mentioned there that actually set off a light bulb moment in my head,
and that was around the serialization of memory access. And I hadn't really thought of it that
way. And again, I'm getting storage flashbacks, you know, of parallel ATA and Ultra2 Wide SCSI with 80 pins, and, you know, now it's down to a few pins on a SAS cable. And obviously the performance and latency and everything, from where parallel was to how it is now in serial, has improved. So it'd be very interesting to see if memory access took that trend.
I mean, at the end of the day, what is a CXL interface
but a memory interface, at least in one incarnation,
a memory interface that consumes one third the pins of, for example, a DDR5 channel?
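A rough back-of-envelope check on that pin claim, using approximate public figures (these numbers are an illustration, not from the episode):

```latex
% Approximate, illustrative figures only.
\begin{align*}
\text{DDR5-4800 channel:}\quad
  & 4.8\,\mathrm{GT/s} \times 8\,\mathrm{B} \approx 38\,\mathrm{GB/s}
    \ \text{over roughly } 64\ \text{DQ} + {\sim}50\ \text{CA/clk/ctrl} \approx 115\ \text{signal pins} \\
\text{CXL x8 on PCIe 5.0:}\quad
  & 32\,\mathrm{GT/s} \times 8\ \text{lanes} \div 8 \approx 32\,\mathrm{GB/s}\ \text{per direction}
    \ \text{over } 8 \times 4 = 32\ \text{signal pins} \\
\text{Pin ratio:}\quad
  & 32 / 115 \approx \tfrac{1}{3}
\end{align*}
```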
How is that not a good thing for the compute ecosystem?
And in the long run, if we can get the latency down to where it needs to be,
the reliability down to where it needs to be,
it seems to me that that could be quite ubiquitous.
And I would also point out that when you couple it with standards like UCIe and other things, it also contributes
to the disaggregation of the CPU itself at a chiplet level, because now you have new ways to
plug and play different architectural components right at a package level, which again,
introduces all kinds of interesting possibilities. Absolutely. And I think that, like you said, Mark,
one of the
things that I'm going to bring back in here to kind of look forward to where CXL goes, I think
that one of the biggest problems that CXL is solving is what you mentioned earlier about basically
the physical constraints that we're facing due to the system architecture choices that we made a decade ago.
And so we've got too many pins, we've got too many lines,
we've got too many memory channels,
and we've been trying to overcome this.
CXL allows us to overcome that in a way
that not only means that the physical footprint
of the motherboard can be redesigned and re-engineered,
but in a way that allows us to
basically blow up the whole concept of a server and maybe get to this kind of rack scale architecture.
And I think that the coolest thing is that in the long term, that opens the door to basically more
compute, more memory, more storage. It opens the door to building systems that are
basically impractical given the constraints of current architecture. And that, I think,
is really where everything changes. Because there's all these dominoes that are going to
fall down. As soon as we don't have all these ranks and ranks of memory stacked across that server,
we can move that stuff around. We can adjust pooling. We can adjust power. We can, you know,
basically adjust, you know, how we place the CPUs. We can do all these things. And a lot of that is
being held back by memory, by memory channels. And I think that this changes it. Yeah, I mean, absolutely. And I mean, if anybody is wondering if the requirement for more memory
is going to go away, I encourage them to type that question into ChatGPT and see what it tells you.
The point being that, you know, AI is not going to be slowing down anytime soon. And that's a
big memory driver. I think on one of your other episodes,
to highlight the point that you just made,
you were talking to Dan Ernst at Microsoft
and he said something like,
computer architecture is far from done.
So there you go.
I fully agree with him.
And I think that supports your point.
Absolutely not.
And I think that that's really exciting
to see where this is going.
Well, this has been a wonderful conversation.
I really appreciate having you on here.
Also, I'm really glad to hear that you're a listener to Utilizing Tech.
Where can people keep up the conversation with you?
Where can they connect with you?
Yeah, thanks, Steve.
And I certainly hope that I can connect with some of your viewers.
The best opportunity coming up is I'll be delivering a keynote at MemCon 23 down in
Mountain View, California on March 29th.
So feel free to grab me at the event.
Please come to the session and talk to me afterwards.
Also, you can reach me on LinkedIn.
And for what Rambus is up to, please do visit rambus.com.
Thanks again, Steve and Craig, for having me today.
It was a lot of fun. Hope to do it again.
Great. Well, thanks for joining us.
And Craig and I will be a part of the Tech Field Day event that's happening.
Actually, it will have happened by the time this episode is published.
So please check out youtube.com/TechFieldDay for video recordings.
That features a presentation from the CXL Consortium and a few of the other companies in the CXL space.
So we look forward to having you join us there.
Thank you for listening to Utilizing CXL, part of the Utilizing Tech podcast series.
If you enjoyed this discussion, please do subscribe in your favorite podcast application or on YouTube.
Also, please do give us a rating and a review because, of course, that's always welcome. This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise. For show notes and more episodes,
go to utilizingtech.com or find us on Twitter or Mastodon at Utilizing Tech.
Thanks for listening and we'll see you next week.