Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x4: Implementing CXL with George Apostol of Elastics.cloud
Episode Date: November 14, 2022

Experts call CXL the lifeblood of composable datacenter infrastructure, and for good reason. It has unlocked tremendous possibilities and is reshaping server architecture for good. Over 190 companies, including some of the biggest names in the industry, are involved, and as new versions of CXL roll out, the technology is clearly taking steps toward maturity. But the market needs more CXL-based technologies to kick-start its evolution. In this episode of Utilizing CXL, hosts Stephen Foskett and Craig Rodgers join George Apostol, Founder and CEO of Elastics.cloud, to talk about this transition, the need for CXL solutions, and what Elastics.cloud is bringing to market.

Hosts: Stephen Foskett: https://www.twitter.com/SFoskett Craig Rodgers: https://www.twitter.com/CraigRodgersms

Guest: George Apostol, Cofounder and CEO, Elastics.cloud. Connect on LinkedIn: https://www.linkedin.com/in/geapostol/

Follow Gestalt IT and Utilizing Tech Website: https://www.UtilizingTech.com/ Website: https://www.GestaltIT.com/ Twitter: https://www.twitter.com/GestaltIT LinkedIn: https://www.linkedin.com/company/1789
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season, we're focusing on CXL, a new technology that promises to revolutionize server architecture.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
On this episode, I'm joined by co-host Craig Rodgers. Welcome, Craig.
Thank you, Stephen. I'm Craig Rodgers. You can find me at @CraigRodgersMS on Twitter.
And we are here to talk about CXL and how it's going to transform and change the way we interface with components in the future.
So, Craig, you and I were both part of the CXL forum recently at OCP Summit.
And during that, we saw a really great explosion of technology,
a great amount of support from various different companies.
We also saw the progress that this technology has made
from the very basic initial products,
which are rolling out kind of now as you're hearing this,
but also where it's going in the future.
We heard from the CXL Consortium. We heard from the PCI SIG. We learned about the next versions of CXL.
And we saw what this all promises. But in order to get there, we're going to need some new
technology, right? Indeed. The equipment we need to interface
with these components in a different way
simply isn't on the market right now,
and that has created a lot of opportunities.
Yep, and that's why in this episode,
we decided to invite on one of the speakers
from the CXL forum, and frankly,
a technology pioneer in his own right,
George Apostol from elastics.cloud.
Welcome, George.
Thank you. It's nice to be here.
So tell us a little bit about yourself and your background and how you got into this technology.
Yeah, so back maybe 20 years ago, I was the vice president of engineering at a company
called PLX Technology. And at PLX, we pioneered PCI Express switches
that are pretty much in the market today.
So from a career standpoint,
I've spent a lot of time designing these systems
and designing systems connected to various kinds of components
that have sort of transformed throughout the years
in terms of not only their functionality,
but also performance.
And so as the performance of the compute elements and the IO elements has gone up, the need for greater performance at the system level has become apparent.
And now PCI Express is not serving the need anymore, which is the impetus for CXL and why you see such a
great adoption of it, because, you know, the need for better performance and better utilization, right, has become very, very key in the data center space today.
I actually caught your presentation at the CXL Forum. It was very interesting, the way you have taken a system-on-a-chip approach to
create your products. Is it fair to say that's going to be the bedrock, you know, the foundation
for your CXL switching products moving forward? Yeah, because as we looked at it, you know,
as Stephen said, we've started, and the CXL Consortium will also say, right, it's kind of a crawl, walk, run technology that they're looking at. A lot of people are doing the crawling, but we're looking ahead to see, okay, when we want to run, what is it going to take to do that, right? And we believe being able to not just
connect the devices together but to be able to control and manage, to reconfigure, to compose,
all of that is going to take more than just enabling the connectivity.
So we're building an SoC to be able to do all of that.
And our SoC just happens to have a 256-lane switch on it.
A happy coincidence.
It's interesting as well that, you know, CXL 1.1 will be coming out with Sapphire Rapids, and our initial gains in terms of operational efficiencies are all going to be around RAM and memory pooling. You know, it's the biggest gain we'll be able to make in terms of efficiency. More than 50% of a server's cost right now is RAM.
And if that RAM isn't being utilized completely, it's wasted money.
You know, it's negative on the TCO of your overall platform.
But I think you're working on stuff currently for CXL version 3, which is even going to allow future components to be integrated with GPUs,
storage, AI modules, et cetera.
Can you tell us any more about that?
Yeah.
So as we started to look at this technology, we started looking at what is the evolution
of composability, as I call it, right?
So you're correct. Right now everyone's focus is on memory and RAM because of the cost issues. But once you've developed a shared, pooled, disaggregated RAM
solution, right, that works, the next step is going to be: how do you do that in a tiered memory fashion? Because you can't put everything
in DRAM. And so we're looking at what are the different, you know, mid-tier and then longer
tier technologies around memory and storage that can be created, you know, in order to, again,
get more efficiency on the cost side.
So after you've got this pool of DRAM, you may have a pool of some mid-level type of
memory, depending upon what it is, and going all the way to SSD.
So now you have this ability to share all of this data to be able to get efficient on
where your large data sets get put within that tiered structure.
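As a rough illustration of that placement idea, here is a minimal sketch of a tier-selection policy. All tier names, capacities, latencies, and prices are hypothetical, and this is a conceptual example rather than anything Elastics.cloud has described:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    capacity_gb: int     # hypothetical pool size
    latency_ns: int      # rough order-of-magnitude access latency
    cost_per_gb: float   # illustrative $/GB

# Illustrative tiers: local DRAM, CXL-pooled DRAM, and SSD.
TIERS = [
    Tier("local-dram", 512, 100, 4.00),
    Tier("cxl-dram", 4096, 300, 3.00),
    Tier("ssd", 65536, 100_000, 0.10),
]

def place(dataset_gb: int, max_latency_ns: int) -> Tier:
    """Pick the cheapest tier that fits the data set and meets the latency bound."""
    candidates = [t for t in TIERS
                  if t.capacity_gb >= dataset_gb and t.latency_ns <= max_latency_ns]
    if not candidates:
        raise ValueError("no tier satisfies the request")
    return min(candidates, key=lambda t: t.cost_per_gb)

print(place(1024, 1_000).name)  # a 1 TB working set with a 1 us bound -> "cxl-dram"
```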
Then after that, now you've got these heterogeneous computes that are going to be used in order to be able to
process those workloads more efficiently.
And then once you have that, then you've got to figure out how do you scale that?
How do you scale that in the box?
How do you scale that in the rack and rack to rack?
So we believe that CXL is going to be able to scale easily within the box, with challenges in the rack and challenges rack to rack, but that's going to be the scope of that.
When you start going rack to cluster and cluster to cluster, that's where Ethernet lives and it's
going to live forever. So we believe that CXL, just like PCI Express and Ethernet, are going to
coexist probably for the next 20 years.
So as these things evolve, what we're looking at is how do you create these pooled resources?
And then how do you execute on true composability?
Where a workload comes in, you can specify the compute, memory, storage, networking that's required for that
workload, compose a virtual server, execute that workload, put the resources back and do the next
one, right? And do that at a speed that is unheard of today, right? Literally on the order of
microseconds, we believe ultimately we'll be able to do that. So to do all these pieces, right, you're going to need to have intelligence where these devices
are being connected.
You can't have a single point of control because the resources are too big to do that.
You'll just create more bottlenecks.
And so this problem has to get sort of disaggregated in the same way that the components are disaggregated. How they are managed also has to be disaggregated, right? So these are the things that we're looking ahead at and trying to figure out: how are these
next generation composable infrastructures, composable architectures going to work?
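To make that compose/execute/release loop concrete, here is a minimal, hypothetical sketch of a pool allocator that carves out a virtual server per workload and returns the resources afterwards. None of these names or structures come from Elastics.cloud; it is only a conceptual illustration:

```python
from dataclasses import dataclass

@dataclass
class Resources:
    cpus: int
    memory_gb: int
    gpus: int

class Composer:
    """Toy composer over a shared, disaggregated resource pool."""

    def __init__(self, pool: Resources):
        self.pool = pool

    def compose(self, need: Resources) -> Resources:
        # Carve resources out of the shared pool for one workload.
        if (need.cpus > self.pool.cpus or need.memory_gb > self.pool.memory_gb
                or need.gpus > self.pool.gpus):
            raise RuntimeError("insufficient pooled resources")
        self.pool.cpus -= need.cpus
        self.pool.memory_gb -= need.memory_gb
        self.pool.gpus -= need.gpus
        return need

    def release(self, vs: Resources) -> None:
        # Put the resources back for the next workload.
        self.pool.cpus += vs.cpus
        self.pool.memory_gb += vs.memory_gb
        self.pool.gpus += vs.gpus

composer = Composer(Resources(cpus=256, memory_gb=8192, gpus=16))
vs = composer.compose(Resources(cpus=32, memory_gb=1024, gpus=2))
# ... execute the workload on the composed virtual server ...
composer.release(vs)
```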
Yeah, that's really the question, isn't it? Because it seems to me, especially after
attending the CXL forum and seeing what is currently being developed by the server and memory manufacturers,
that the concept of CXL-based tiered memory is not really in question.
I mean, these things are going to be delivered.
There's going to be multiple providers of the physical components.
There's going to also be support for it in software.
And that's all being worked on.
The question is, I guess, long term, where does this go? I want to zoom in there, George, on one of the things that you talked about, maybe we can talk about the rest too. But let's start with this
whole world of tiered memory and what those systems look like. So a system that has a CXL-based memory expander in the short term, in 2023, is going to run
some kind of software that allows the application to access that expanded memory that is off
the memory bus.
And then you'll have basically a server with more memory than you normally would, and then
that can be used for some kind of big memory application. But going forward, once we kind of get beyond that, this whole world of tiered memory, I think
this may be eye-opening to listeners, it may be eye-opening to others, but it really ought not to
be, because already processors have cache memory, they have DRAM. As I said, now they're going to have CXL memory expanders.
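One concrete detail worth adding here (not from the episode itself): Linux typically surfaces a CXL Type 3 memory expander as a CPU-less NUMA node, so a hedged sketch of application-level access could go through libnuma via ctypes. The node number is hypothetical for this machine:

```python
import ctypes

# Bind to the system's libnuma; assumes numactl/libnuma is installed.
libnuma = ctypes.CDLL("libnuma.so.1")
libnuma.numa_alloc_onnode.restype = ctypes.c_void_p
libnuma.numa_alloc_onnode.argtypes = [ctypes.c_size_t, ctypes.c_int]
libnuma.numa_free.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

CXL_NODE = 2      # hypothetical: the expander showing up as CPU-less node 2
SIZE = 1 << 30    # 1 GiB

if libnuma.numa_available() < 0:
    raise RuntimeError("NUMA is not supported on this system")

buf = libnuma.numa_alloc_onnode(SIZE, CXL_NODE)  # pages come from the CXL node
if not buf:
    raise MemoryError("allocation on the CXL node failed")
# ... use the buffer, then hand it back ...
libnuma.numa_free(buf, SIZE)
```

The same placement is possible without code changes by launching a process under `numactl --membind=2`.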
Many of the people listening may be familiar with Optane persistent memory modules that gave you tiered hierarchical memory in the third generation Ice Lake systems from Intel.
And those things are going to continue in the next generation and beyond that, thanks to CXL.
What do these systems look like?
I mean, what do you think that a big memory system is going to really look like short term?
Well, so, you know, one of the things that we see immediately: of course, we're developing along with the spec as things go along. So today, as an example, we have an
FPGA card that connects to a CXL slot in a server, and that FPGA card has memory on it. So that's
sort of the basic expanding memory within the server.
Right. And even just with that, we can see a significant amount of performance improvement for these workloads and databases that are much bigger than what can fit into the available memory space.
So, you know, we've run a Redis database that basically is very full as we start to do queries, and it starts to swap. Today it swaps into and out of SSD.
So, you know, we ran that test, and then we said, okay, instead of swapping out of SSD, let's swap directly to our FPGA memory connected through CXL. And mind you, it's an FPGA that's not as fast as what an
ASIC does. But even with just that, we saw a 20x improvement across the board in bandwidth and
latency and the number of operations per second, simply because the access time for that is orders
of magnitude faster than what an SSD is, right? And so this is why we see these applications that are coming up.
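As a hedged sketch of the kind of before/after measurement being described, the harness below (hypothetical keys and connection details; it does not reproduce the 20x FPGA result) could be run once with the overflow tier on SSD swap and again with it on CXL-attached memory:

```python
import time
import redis  # redis-py client

r = redis.Redis(host="localhost", port=6379)

N = 100_000  # scale up until the data set exceeds available DRAM
for i in range(N):
    r.set(f"key:{i}", "x" * 1024)  # ~1 KB values

start = time.perf_counter()
reads = 0
for i in range(0, N, 10):  # sampled reads that push into the overflow tier
    r.get(f"key:{i}")
    reads += 1
elapsed = time.perf_counter() - start

print(f"{reads / elapsed:,.0f} ops/s, {elapsed / reads * 1e6:.1f} us/op")
```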
These applications have varying performance tiers required for them.
If you look at autonomous driving as an example, if you're driving and you're updating the
map while you're doing that, while you're driving, then that's sort of critical
data, right? But if you're parked at a light and not really doing anything, you don't need to have
the updated information as quickly. And when you look at these 12K cameras that are all
looking at this data, it becomes very important
how you're going to store that, how you're going to use that, and how those updates are going to
happen. Same thing with AR, VR, same thing with Web 3.0, right? There is a tiered structure, and this is
why, you know, Microsoft and the Linux community are all looking at how do they add semantics,
right, into the operating systems
so that you can use these tiered types of memory environments. And again, it's all about optimizing
cost for the system that you're creating and the workloads that you want to process.
I think Optane filled a very good gap there. You know, you mentioned multiple orders of magnitude between NVMe and system RAM.
Optane was slap bang in the middle. It was a perfect middle tier in terms of latency there.
The potential applications for increased memory in a single server are huge across the board.
And it's great that it's the first problem being addressed.
But moving on to the longer term, what do you think the next wave of devices coming onto the CXL bus will be?
Well, I mean, I think if you look at the manufacturers
of these components across the board,
they're all transforming to a CXL interface today. And I think this is going to depend on the systems designers as well as the end users as to what they're going to require from a performance and cost standpoint in order to dial the right solutions in.
And do you think the server manufacturers are going to be able to increase their cadence
in terms of how often they upgrade to the next PCI Express or CXL bus now?
There was a huge gap between PCI Express 3 and 4, and now 5 is around the corner.
In order to get to CXL 3, we need PCIe 6 and 7 to hit the market.
Right.
So again, not necessarily, right?
The way the spec is being created is you can have CXL 3.0 functionality at Gen 5 speeds,
right? So it just depends on where the systems guys want to intersect,
you know, the various parts of the spec.
And I believe that a lot of that functionality will be done
in a Gen 5 environment because going to Gen 6 speeds
is another sort of change architecturally as well as electrically, right?
And so there's some work that has to get done there.
But I think right now,
a lot of people are already comfortable
with Gen 5 speeds.
It's interesting then that you're saying,
you know, as long as we're willing to accept
the performance hit in effect
from not moving to six or seven quickly,
we can have that level of
functionality. And your solution then must have a software component, an orchestration component, that controls this allocation of resources. What challenges have you met there?
Well, so
one of the things that we want to make sure of, because we've seen this with other technologies like InfiniBand, for example, right, which is a great technology,
but it was a heavy lift in order to implement it, especially on the software side, right? And so
what we want to ensure is that, you know, the last few decades of work that's been done
on network orchestration, service
orchestration, resource orchestration, all the way up to Kubernetes and the management of
workloads, we want to make sure that we're not changing any of that paradigm, right? And so
what we have in our chip is what we call our resource manager, which is responsible for the locality of devices
that are connected to our device, right?
And then being able to provide that information to the upper layers of the software, depending
upon where they're needed.
And there are varying de facto standards.
I wouldn't say that there is a standard. I think there are things that people use that we are working with to be compatible with, so that they'll seamlessly be able to access, you know, what we're doing inside of our chip in order to manage those resources. There is a big software component, and the things that we're going to be able to do,
because we have all of the knowledge of what's connected to us and how the traffic is flowing,
I think is going to be eye-opening from a server architecture perspective as well.
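A toy sketch of the kind of locality inventory such a resource manager might publish up the stack. Every name and field here is hypothetical, not Elastics.cloud's actual interface:

```python
import json

# What a switch-resident resource manager might report about attached devices.
inventory = {
    "switch_id": "sw0",
    "devices": [
        {"port": 0, "type": "cpu-host", "host": "node-a"},
        {"port": 1, "type": "memory", "capacity_gb": 512, "media": "ddr5"},
        {"port": 2, "type": "memory", "capacity_gb": 1024, "media": "ddr4"},
        {"port": 3, "type": "accelerator", "kind": "gpu"},
    ],
}

def devices_of(kind: str) -> list[dict]:
    """What an orchestrator (say, a Kubernetes device plugin) might query."""
    return [d for d in inventory["devices"] if d["type"] == kind]

print(json.dumps(devices_of("memory"), indent=2))
```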
One of the things I tried to do when I was at Samsung, previous to starting this company,
was to be able to characterize
sort of where all the traffic was going in a server.
That seemingly simple task was not simple at all.
There's a really big challenge to do that
because the visibility wasn't there, right?
But now if I have this intelligent switch
to which all the components are connected,
I have to count the packets anyway
to make sure they're going in and out, and with that I can, you know, gather information on traffic flows, how things are going, right? This is the same way Ethernet did it, and this is how QoS was born, right, from an Ethernet standpoint. So, you know, we're taking those lessons learned and saying, how can we move that to managing the flow of data at the nanosecond level, right? Moving data around is a significant part of the performance bottleneck
that we have today. If we have a shared pool of memory where all the devices that need that data
are accessing that same device, then we're going to minimize the number of buffer transforms,
the number of copies, right? And that in itself is going to create significantly better overall
system performance. And so, you know, for us, it's really about optimizing that system from a hardware connectivity
perspective, but also on how the software manages those resources in order to be able
to get the data to where the compute can use it and operate on it.
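The packet-counting idea reduces to per-port flow counters that double as telemetry. A minimal sketch, with an entirely hypothetical structure rather than real switch firmware:

```python
from collections import defaultdict

# (source port, destination port) -> bytes forwarded
flow_bytes: defaultdict[tuple[int, int], int] = defaultdict(int)

def on_packet(src_port: int, dst_port: int, length: int) -> None:
    """Called per forwarded packet; the same counters feed traffic telemetry."""
    flow_bytes[(src_port, dst_port)] += length

# Simulated traffic between attached devices.
on_packet(0, 1, 256)
on_packet(0, 1, 256)
on_packet(3, 1, 4096)

hottest = max(flow_bytes, key=flow_bytes.__getitem__)
print(f"hottest flow {hottest}: {flow_bytes[hottest]} bytes")
```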
So it sounds as though we're going to gain a whole new layer of monitoring insight in terms of looking at performance, you know, areas that we previously haven't been able to see easily.
Absolutely. Yeah. And I think that's going to be, I mean, ultimately, I tell my team, right, we're going to be a five- or six-to-one software-to-hardware company when all is said and done.
But all of it is enabled by the performance and the connectivity that we get from the hardware.
But now that we have this visibility, the controllability, the granularity of control over these devices,
I believe there's a whole slew of applications and new kinds of things that we're going to be able to do architecturally
in order to improve overall performance at scale. And I think that's the key thing,
is to do it at scale. This is one thing that I think a lot of people aren't privy to,
is the details and the amount of work that Google has done in order to do things at
scale. And a lot of the things that we're talking about, they've already implemented in many, many
different things. And so what we're saying is how do we take that scale-up technology and bring it
into the rack and bring it into the boxes and the rack that can then be used at not just the cloud
layer, but at the fog layer
and at the edge layer where we're seeing, again, tremendous growth in equipment that needs to go
out there simply because you can't move data around. So let's talk about that, George. I think
that from a practical application standpoint, what will this allow? What will this enable that we can't do now? And as you mentioned,
I mean, I think the real difference is that current server architecture is really limited
basically by the box, by the constraints of things like memory channels, number of sockets,
number of PCI channels, that sort of thing. Once we're not
constrained by that, once we have a different kind of computing, a different shape of computing,
how will that change the applications that are being run on systems? Will we still be
sharding things? Will we still be developing microservices and tiny containers? Or will we
move back to a more monolithic compute environment?
Well, I don't think that you will get to a monolithic compute environment because I think
the heterogeneous compute environment is here to stay.
These specialized processors, which I first heard Dave Patterson and John Hennessy talk about, are becoming the reality of how these systems are working. The scale-up model, you know, because of Dennard scaling, has sort of hit a limit, so we need to scale out in terms of the kinds of processing elements that we use to process the data.
Right. But, you know, I think one of the things that we looked at was just in the memory configurations alone, right. We call it the $80 billion problem.
And it's because of, you know, the recent study that has come out that everyone's talking about in the industry, which said, you know, 25% of memory in the server is stranded.
That means it never even gets accessed because there's just not enough
resources to be able to get to that part of the memory.
And then 50% of a lot of the, you know,
the virtual machines and structures that are within there also don't get used
because there's a performance bottleneck in terms of being able to use all of those, right? And so, you know, if you look at the amount of dollars that are spent every day in putting memory into these servers that gets stranded, and that gets put into a rack, and you just count the dollars, right? It turns out, at an average of 400,000 racks per year being deployed, it's an $80 billion savings that we're going to see just by being able to pool memory.
This is why there's so much interest in doing this.
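Back-solving from the figures quoted here is straightforward arithmetic, though the study's actual methodology isn't given in the episode, so treat the implied per-rack number as illustrative only:

```python
claimed_savings = 80e9       # the "$80 billion problem"
racks_per_year = 400_000     # average deployment rate quoted above
stranded_fraction = 0.25     # "25% of memory in the server is stranded"

savings_per_rack = claimed_savings / racks_per_year           # $200,000
implied_memory_spend = savings_per_rack / stranded_fraction   # $800,000

print(f"${savings_per_rack:,.0f} recoverable per rack; implies "
      f"~${implied_memory_spend:,.0f} of memory per rack, which suggests the "
      f"savings accumulate over the installed base rather than one year's racks")
```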
And fundamentally, what that's going to do is, one, it's going to free the memory and be cheaper,
but you're also going to get better performance, because now the elements that have to access that memory can do it through greater bandwidth of connections that go into that memory, which you can't put into a server because of
the limitations of the power that can fit into the box, right? So if you've got a server appliance, you can have thousands of processing elements
accessing that pooled memory
and processing that pooled memory, right?
Where you really couldn't do that before
when it was compartmentalized in these boxes.
I think there's another interesting aspect there,
just flipping back to your tiered memory approach. There's nothing that says it has to be
DDR5 in the expansion. It could be DDR4, it could be DDR3, and now you're driving potentially
further significant savings. If a workload doesn't need high-speed RAM, it just needs a lot of it, that's a fantastic use case of older technology.
Yeah, and again, that's why I'm saying that the systems designers will be able to now have options in how these things are composed.
And it's not just the cookie-cutter granularity that you get today in the data center.
And this disaggregation is also going to help, as you talked about before, about how do you do upgrades.
Today, the server never gets touched once it gets deployed until it gets thrown away.
Now, if you've got these disaggregated resources, you can add more memory to it and don't have to disturb everything else that's happening.
You can add new memory, right, and new processing elements and, you know, different ways of
upgrading the various pieces of the server, right, compute, memory, storage, networking,
and software without having to bring down, right, entire aisles of racks and machines
in order to make that happen. I think that's going to make the management of these devices in the data center a lot easier.
More complex from the technical side, but easier from the physical side of having to go through and change things.
And that's going to lower overall costs again significantly.
Data center guys hate the truck roll, right?
And we'll be able to do things with much more efficiency.
So from your perspective, do you think that it would be better for that software to be standardized and live in the operating system or something like that?
Or do you think that it would be better to have more custom architecture to the software as well, or maybe even integrate it into the applications?
Yeah, so I mean, I think it's a tiered problem, just as creating the tiered access. Now you've
got to be able to create the tiered usability. And I think, again, this is why we need to really work together from an industry
standpoint to figure out where those control points need to be, right? So that they can be
optimized appropriately. In some cases, it may be just perfectly fine to let the hardware figure
out where the tier needs to be, right? But in other cases, you want to have maybe even more
granularity and more control on the application side that's going to make sure that your high-priority data is where it needs to be.
Yeah, that's great.
And I think that it reflects the reality of the challenge that a generic piece of operating system software is not really going to cut it.
We're going to need to have more integration higher in the stack
in order to make best use of these things,
and also in order to react as systems are recomposed and reconfigured.
So to wrap up this episode,
we're going to be asking a similar question to most of our guests,
at least here in this season.
And what I want to do is I want to kind of take a
step back and think about optimistically. We mentioned, for example, Craig mentioned just now
the strange concept, but a very realistic one, that CXL memory could actually open the door to
using lower performance memory modules because they're on the other side of the CXL bus.
Is there some other unexpected or surprising way that you think that CXL will change
technology and computing? Well, again, I think we go back to my early days working at Xerox and then
being fortunate to interact with a lot of the pioneers at Xerox PARC who were looking at different kinds of technologies.
And that's where I first heard the word composability.
This was back in the 80s.
And again, the concept has been around for a long time.
And the reason is because we don't want to have to continue to recreate
architectures over and over and over again as we start to improve, right? If we can have an
architecture that, you know, that can grow, you know, with the individual component technology
as it evolves, then we're better able to use that technology, right? And then ultimately
that will lead to the greater performance, right? And we're being pushed to do that now
because all the data that's being created now is a lot of it is machine generated data.
And people don't want to lose that data, right? We can't store it all because we can't create enough bits to store it,
but people don't want to lose the value of that data.
And so in order to extract that value,
we've got to be able to really optimize how we run these AI ML workloads on this data
and bring the cost to where it's reasonable to do it. And I think this is what CXL is enabling that we didn't have with PCI Express.
And that is the ability to share components, the ability to have peer-to-peer communications between those components,
and then ultimately leads to the ability to do what I call true composability.
So composability is not just connecting devices together.
It's how you control those devices and how you manage those devices all the way up the stack,
as we've been talking about, in order to ensure that we're optimizing the system level performance at the right points.
Well, thank you so much for this conversation, George.
It's been great catching up with you
and connecting with you here.
Before we go, where can people connect with you
and continue this conversation on CXL technology?
Yeah, so, I mean, we're on the web at elastics.cloud.
And I think the next show we'll be at is Supercomputing 22 in Texas.
And there we're actually looking at demonstrating a multi-server setup, meaning four or more servers, if we can get them, connected and sharing memory all together.
And so we're starting to scale this thing out so that we can start to demonstrate what the power of this is
as you start to move up the stack on the software side and managing the workloads that use those memories.
Well, that's great. I can't wait to see that.
One thing I'll mention is that all three of us were present at the CXL Forum, and the videos of those presentations
are going to be online shortly,
maybe by the time this episode is published.
So just use your favorite search engine,
look for CXL Forum and look for elastics.cloud
or the panel and so on.
And you'll find video recordings of that event as well.
Thank you for listening to the Utilizing CXL podcast,
part of the Utilizing Tech series.
You can find more episodes of this podcast
in your favorite podcast application.
Just search for Utilizing CXL or Utilizing Tech,
and you can find episodes of our previous iteration, Utilizing AI. For show
notes and more episodes, go to UtilizingTech.com. Thanks for listening, and we'll see you next week.