In The Arena by TechArena - CXL-Enabled Memory Innovation with MemVerge
Episode Date: March 28, 2023
TechArena host Allyson Klein chats with MemVerge founder and CEO Charles Fan about his company's disruptive vision for breaking through data center memory limitations and what the CXL standard will bring to infrastructure innovation.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and today I'm delighted to be joined by Charles Fan, founder and CEO of MemVerge. Welcome to the program, Charles.
It's wonderful to have you on the show. Doing great. Thank you for having me here, Allyson.
Why don't we just start with an introduction of MemVerge? I'm very familiar with you, so let's just ground the audience on the company and why you decided to found MemVerge.
Yeah. So MemVerge is a company that was founded in 2017. We started the company because we think the world needs more memory, and as it needs more memory to handle more data, it needs more memory software as well. So we started the company to develop software specifically for memory, to enable data-intensive applications.
Now, you've been delivering disruptive memory technologies
since the inception of MemVerge.
Why is this so critical for customers
and why does the world need more memory right now?
Yeah, because modern applications tend to be data-intensive. They are processing more and more data, faster and faster. A good example would be the AI and machine learning applications that we are hearing more and more about, like ChatGPT and so on. To train an AI model, they have to process billions or trillions of data samples and deal with large models that have billions or even trillions of parameters. All of this requires a very low-latency store for the data. Classically, storage systems were built as the store for data, but they are too slow now. The world needs a more memory-centric architecture where all of this data can be stored and processed in real time.
When you take a look at the solutions
that you've been delivering in market,
and you mentioned ChatGPT, different AI and ML models,
what trends do you see across the types of applications
and the types of industries that are looking at this?
Or is the trend horizontal and it's pretty much everyone?
Yeah, I think it is affecting more and more industries and verticals. Over the last six years, we have had the pleasure of working with a number of customers across different verticals. I'll give some examples. We've been working with customers in the financial sector who have trading platforms that deal with large amounts of data and need to perform real-time analytics at very low latency. A more memory-centric architecture, both for the databases and for the applications, becomes critical for them. So we've been working with a few customers in that space, applying our big memory technology to enable their trading databases and applications.
Another example would be scientific computing, in particular life sciences: genomics and bioinformatics. We are looking more and more into gene sequences, of people as well as of animals and plants, and there are increasingly sophisticated algorithms to run on them. That requires a more memory-centric infrastructure as well. We have been delivering solutions to help those customers accelerate time to results and also lower their costs as they move their workloads to the cloud. So those are just some examples of how big memory technology can be used by different industries to accelerate their applications.
Now, Charles, I know that you started in 2017, and at that point you were looking at some esoteric paths to expand memory to meet application requirements. Yeah. Then CXL entered the picture.
Can you describe when you first saw the CXL spec and realized what was going to happen from an industry-standard perspective? What did your mind do when you saw that, and what did it represent to you in terms of an opportunity for MemVerge?
I think that was a real revelation. You see the CXL spec on the market, actually embraced by the industry leaders, and we are going to see CXL hardware hitting the market for production by the end of this year. So we think it's going to be a revolutionary change, and the most significant thing that CXL represents is what I call a liberation of memory from the computer.
Let me explain why I call it a liberation. If you go back 50 years, everything was inside the computer box: memory, storage, networking, compute, of course, and software. So you basically had a computer, or a server, as your single unit of compute. Over the last 50 years, more and more resources have become liberated, or disaggregated, from compute. It started with storage: storage moved out of the computer box and became its own entity in the form of storage arrays, storage servers, and file servers that are dedicated to managing storage, and that can be scaled, managed, and protected independently of compute. Networking became disaggregated as well, and both storage and networking can be software-defined. Even GPUs can be decoupled from CPUs in a fairly disaggregated way.
Memory is the last resource that is still very tightly coupled to the CPU, until CXL. What CXL does is put memory onto a new bus, the CXL bus. That memory can be inside the computer box, or it can be outside the box, connected to the computers through a switch. So now you can have a memory pool that is shared across multiple computers. You can scale that memory pool independently of how you scale the number of cores in those computers; whether you need more memory or more storage, you can scale them separately. You can enable dynamic provisioning of memory to the computers that need it, to increase utilization. And you can even have multiple computers accessing the same memory region to share the memory. None of this was possible until you could decouple, disaggregate, and liberate memory from the computer. And that's what CXL enables.
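To make the dynamic-provisioning idea concrete, here is a minimal sketch, assuming a Linux host where a CXL Type 3 expander is exposed as DAX device dax0.0 and where the daxctl utility from the ndctl project is installed. It illustrates on-demand capacity changes on a single host only; it is not MemVerge's software or a multi-host pool manager, and the device name is a placeholder.

```python
#!/usr/bin/env python3
"""Sketch: bring CXL-attached capacity online as ordinary system RAM on demand.

Assumes a Linux host where a CXL Type 3 expander is exposed as a DAX device
(assumed here to be dax0.0) and where the daxctl utility from the ndctl
project is installed; run as root. This only illustrates single-host dynamic
provisioning, not MemVerge's software or a multi-host pool manager.
"""
import subprocess

DAX_DEVICE = "dax0.0"  # assumption: check `daxctl list` for the real device name


def online_cxl_memory(device: str = DAX_DEVICE) -> None:
    # Rebind the DAX device to the kernel's kmem driver so its capacity
    # appears as an additional (CPU-less) NUMA node of regular system RAM.
    subprocess.run(
        ["daxctl", "reconfigure-device", "--mode=system-ram", device],
        check=True,
    )


def release_cxl_memory(device: str = DAX_DEVICE) -> None:
    # Offline the memory and return the device to device-DAX mode, e.g. so a
    # pool manager could hand the capacity to another host that needs it.
    subprocess.run(["daxctl", "offline-memory", device], check=True)
    subprocess.run(
        ["daxctl", "reconfigure-device", "--mode=devdax", device],
        check=True,
    )


if __name__ == "__main__":
    online_cxl_memory()
```

A fabric-level pool manager sitting above many hosts would orchestrate operations like these, moving capacity to whichever server needs it at the moment.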
Yeah, that's really incredible.
And when you think about the fact that we've been relying on the pizza-box, rack-based server construct for 25 years as the defining unit of what goes into a data center, being able to decouple
those things really opens up a wealth of creativity in terms of how folks can build as well as compose
infrastructure. I'm really excited about it. I know that there are different versions of CXL.
We have platforms from both AMD and Intel on CXL 1.1 today,
but the CXL Consortium has also delivered CXL 2.0 and CXL 3.0. Can you walk us through what the various versions of the specification are, and what that means from the standpoint of deploying technology today with CXL 1.1 and ensuring that that technology is going to work with future versions?
Sure.
The CXL Consortium has been doing a really good job getting about 250 companies together to define, by now, three different versions of the spec. 1.1 was the first version that was published, I think back in 2019. It described the memory expansion case: how do you add more memory beyond what DDR delivers. CXL runs on PCIe Gen 5, and the 1.1 specification allows you to plug in more memory over CXL on PCIe Gen 5 that can expand both the capacity and the bandwidth of the memory inside a server.
This is kind of the first step to enable this new protocol. And as you described, both Intel and AMD have started to support it in their newest CPU platforms. There are a number of memory controllers available now, and we are expecting the leading memory vendors to ship 1.1 memory cards by the end of this year.
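As a quick illustration of the expansion case: on current Linux systems, a CXL memory expander that has been onlined as system RAM typically shows up as a memory-only NUMA node, a node with capacity but no CPUs. The short sketch below assumes only the standard sysfs layout and simply flags such nodes; which node is actually CXL-backed depends on the platform.

```python
#!/usr/bin/env python3
"""Sketch: spot memory-only NUMA nodes, the usual guise of CXL-expanded memory.

Assumes only the standard Linux sysfs layout under /sys/devices/system/node.
On a server with a CXL 1.1 expander onlined as system RAM, the added capacity
normally shows up as a node that has memory but an empty CPU list.
"""
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def numa_nodes():
    node_dirs = sorted(NODE_ROOT.glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    for node_dir in node_dirs:
        node_id = int(node_dir.name[4:])
        cpulist = (node_dir / "cpulist").read_text().strip()
        # First meminfo line looks like: "Node 1 MemTotal:  134217728 kB"
        mem_total_kb = int((node_dir / "meminfo").read_text().splitlines()[0].split()[3])
        yield node_id, cpulist, mem_total_kb


if __name__ == "__main__":
    for node_id, cpulist, mem_total_kb in numa_nodes():
        kind = "CPU + memory" if cpulist else "memory-only (possibly CXL or other far memory)"
        print(f"node{node_id}: {mem_total_kb // 1024} MiB, cpus=[{cpulist or '-'}] -> {kind}")
```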
CXL 2.0 was specified in 2020, and it added an important feature: CXL switching. What that effectively enables is the pooling of memory, where multiple servers can have access to the same pool of memory. This disaggregation will enable, as you described, the full composability of a data center. So 2.0 is really the key spec version that enables pooling and memory composability. In fact, the AMD Genoa platform already supports version 2.0 for Type 3 devices, and Type 3 devices are the memory devices. They actually spent extra time to support 1.1 and 2.0 at the same time, which allows initial pooling deployments to happen with their new shipping CPUs. And we also expect other 2.0 products to start to appear towards the end of this year and into next year as well: memory devices and memory switches, as well as compute from the CPU, GPU, and other accelerator point of view.
CXL 3.0 is the latest specification, and it is built on top of 2.0, just as 2.0 was built on top of 1.1. 3.0 was released in August of last year, 2022. It added a number of important enhancements on top of 2.0, including allowing a hardware implementation of cache coherency for memory sharing. With 2.0, the hardware does not support memory sharing, although that can potentially be enabled through software like ours on 2.0; 3.0 allows the hardware to implement it for better performance. It also allows the cascading of switches. 2.0 only allows a single switch to be connected between the hosts and the memory, but with 3.0 you could have a switch of switches, which increases the scalability of how many servers can have access to the same shared memory. It has a number of other important features as well, such as allowing direct access from any compute processor to any memory without going through the main CPU. So 3.0 makes it a very compelling, complete solution. But we do expect it will probably be two or three years before actual 3.0 hardware products hit the market.
When you look at that landscape and you see so much new infrastructure opportunity in front of
you, what would your advice be to an enterprise? I mean,
I know that all of the cloud service providers are deeply involved on the CXL Consortium board and are very much invested in this, but what would your advice be to an enterprise on how to navigate
their legacy infrastructure with new CXL alternatives and an elegant migration to this new world?
Yeah.
So I think the first product ready for production is going to hit the market towards the end
of this year.
I expect 2024 will really be for early adopters to run pilot deployments of CXL for big use cases in their environments. One of the things we are working on is identifying the low-hanging fruit: the applications and use cases where CXL can deliver. And as enterprises and customers gain confidence and familiarity with this new, amazing technology, we expect larger-scale deployment to start in 2025. By that time, 2.0 should be a very mature protocol supported by all the mainstream CPUs as well as memory devices, and I expect early 3.0 implementations will start to appear as well. So 2025 could be the beginning of larger-scale adoption of the technology.
When you look at a technology like this and the complexity of new composable infrastructure,
I understand that there needs to be some changes in cloud stacks to be able to do that
type of configuration. But what other software changes are needed up the stack for applications to be able to specify how much memory capacity they require?
And how is the software industry engaging?
So hardware alone will not allow customers to adopt CXL; it has to be a combined effort between hardware and software. The first layer of software support is already in place, and that happened within the operating systems. The leading operating systems, Linux, Microsoft Windows, and VMware vSphere/ESXi, have all built in support for CXL devices. So that's already in place. Now the questions are: how do you manage the expansion, pooling, and sharing, the control path that's needed, and how do you make it transparent to existing applications without requiring application changes? This is the level of software that MemVerge is engaged in. We are essentially building a software-defined memory layer that can abstract out the heterogeneity of the memory being made available to the application, including classic DDR memory. By the way, DDR will not be replaced; CXL memory will be an augmentation to local DDR memory, not a replacement. So essentially we are dealing with a more heterogeneous environment: there is high-bandwidth memory, there is DDR memory, and then there is CXL memory. How the application deals with all of these in a transparent way is where our software is here to contribute. And then you also need software to manage the composability of memory, the dynamic, elastic provisioning of memory to the nodes that need it.
There will also need to be software to manage cache coherency; even with 3.0, there is still room for software to build on top of the hardware to enable cache-coherent shared memory access. All of this software needs to be created. And once memory can exist independently as a memory pool, additional data services can be implemented: for example, compression to reduce memory cost and increase effective capacity; data protection services to protect the data in that memory; and security services to guarantee that the right access is enforced, and so on. So there is a myriad of services that will need to be created in this new architecture.
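One way to picture the "transparent to existing applications" goal: on Linux today, an unmodified program can already have its memory spread across DDR and CXL-backed NUMA nodes by an external placement policy, and a software-defined memory layer would make such placement decisions automatically and dynamically. A minimal sketch, assuming node 0 is local DDR, node 1 is CXL memory onlined as system RAM, and that the numactl utility is installed:

```python
#!/usr/bin/env python3
"""Sketch: run an unmodified program with its memory placed across tiers.

Assumes node 0 is local DDR, node 1 is CXL memory onlined as system RAM, and
that the numactl utility is installed. A software-defined memory layer would
replace these static policies with automatic, dynamic placement.
"""
import subprocess
import sys

DDR_NODE = 0  # assumption: local DDR
CXL_NODE = 1  # assumption: CXL expander visible as a memory-only node


def run_interleaved(cmd: list) -> int:
    # Spread the program's pages across both tiers; the program itself needs
    # no code changes, which is the point of keeping tiering transparent.
    return subprocess.run(
        ["numactl", f"--interleave={DDR_NODE},{CXL_NODE}", *cmd]
    ).returncode


def run_ddr_preferred(cmd: list) -> int:
    # Prefer fast local DDR and spill to CXL only once DDR fills up.
    return subprocess.run(
        ["numactl", f"--preferred={DDR_NODE}", *cmd]
    ).returncode


if __name__ == "__main__":
    sys.exit(run_interleaved(sys.argv[1:] or ["sleep", "1"]))
```

Both policies here are deliberately crude; the value of a memory-management layer is to track which data is hot and place it accordingly, without the application knowing the tiers exist.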
Charles, I've heard some folks raise questions about security concerns with disaggregation.
How has security been looked at and what is MemVerge doing in this space?
Yeah, I think security is an integral part of any technology evolution or revolution. A change like this will change some variables and will require security experts to evaluate what the right setup is to guarantee the integrity of access and the integrity of the data. At the same time, I think this new architecture could provide new possibilities for how security can be enforced. In the first area, there could be concerns about what happens when multiple nodes can access the same memory: how do you ensure the access control domain cuts across multiple nodes, that they don't step on each other's toes, and that nothing can be seen by people who are not supposed to see it? This is a layer of security software that will need to be implemented, and it will be part of our mission to allow these exciting features to be delivered without compromising security. The second area is that we are building some interesting data services, like memory snapshots, which could have interesting applications in the security space as well. Essentially, we are able to capture a running application's state at any time. If you do that periodically and there is a security breach, or something goes wrong, you can go back to a snapshot to do some forensics on how things were before. So this basically provides additional tools for security engineers to ensure the security of their environment.
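As an analogy for the periodic-snapshot idea, and not MemVerge's implementation, the open-source CRIU checkpoint/restore tool can dump a running process's memory and state to disk while leaving it running. A sketch that captures such images on a schedule for later forensics, assuming CRIU is installed and the script has the privileges CRIU needs:

```python
#!/usr/bin/env python3
"""Sketch: periodic process snapshots kept for later forensics, using CRIU.

This uses the open-source CRIU checkpoint/restore tool as an analogy for the
snapshot idea described above; it is not MemVerge's implementation. Assumes
CRIU is installed and the script runs with the privileges CRIU needs (root).
"""
import subprocess
import sys
import time
from pathlib import Path

SNAPSHOT_ROOT = Path("/var/snapshots")  # assumption: where images are stored
INTERVAL_SECONDS = 300                  # assumption: snapshot every 5 minutes


def snapshot(pid: int) -> Path:
    # Dump the process tree's memory and state to disk, but leave it running.
    image_dir = SNAPSHOT_ROOT / f"pid{pid}-{int(time.time())}"
    image_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["criu", "dump", "-t", str(pid), "-D", str(image_dir),
         "--leave-running", "--shell-job"],  # --shell-job: target started from a terminal
        check=True,
    )
    return image_dir


if __name__ == "__main__":
    # Each image directory is a point-in-time record that an investigator can
    # inspect, or restore in a sandbox, after a suspected breach.
    target_pid = int(sys.argv[1])
    while True:
        print("captured", snapshot(target_pid))
        time.sleep(INTERVAL_SECONDS)
```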
That's terrific.
One final question for you.
Obviously, MemVerge is delivering solutions in this space.
I'd love for you to comment on what you're delivering and also where folks can find out more, because I'm sure we've piqued their interest in your solutions, how you're engaging, and how they can engage with your team.
Yes. So please visit our website, www.memverge.com. We have more information about CXL hardware and software and how you can get early access to it. And you can also email us at info at memverge.com. Well, Charles, thank you so much for being on the program today.
It was a real pleasure and I have been a fan of yours.
I'm so glad that we finally got to meet.
It's my pleasure to be here and thank you for the invitation.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.