In The Arena by TechArena - CXL-Enabled Memory Innovation with MemVerge
Episode Date: March 28, 2023
TechArena host Allyson Klein chats with MemVerge founder and CEO Charles Fan about his company's disruptive vision for breaking through data center memory limitations and what the CXL standard will bring to infrastructure innovation.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and today I'm delighted to be joined by Charles Fan, founder and CEO of MemVerge. Welcome to the program, Charles.
It's wonderful to have you on the show. Doing great. Thank you for having me here, Allyson.
Why don't we just start with an introduction of MemVerge? I'm very familiar with you, so let's just ground the audience on the company and why you decided to found MemVerge.
Yeah. So MemVerge is a company that was founded in 2017. We started the company because we think the world needs more memory, and as it needs more memory to handle more data, it needs more memory software as well. So we started the company to develop software specifically for memory, to enable data-intensive applications.
Now, you've been delivering disruptive memory technologies
since the inception of MemVerge.
Why is this so critical for customers
and why does the world need more memory right now?
Yeah, because modern applications tend to be data-intensive. They are processing more and more data, faster and faster. A good example would be the AI and machine learning applications that we are hearing more and more about, like ChatGPT and so on. To train an AI model, they have to process billions or trillions of data samples and deal with large models that have billions or even trillions of parameters. All of this requires a very low-latency store for the data. Classically, storage systems were built as the store for data, but they are too slow now. The world needs a more memory-centric architecture where all of this data can be stored and processed in real time.
When you take a look at the solutions
that you've been delivering in market,
and you mentioned ChatGPT, different AI and ML models,
what trends do you see across the types of applications
and the types of industries that are looking at this?
Or is the trend horizontal and it's pretty much everyone?
Yeah, I think it is affecting more and more industries and verticals. Over the last six years, we have had the pleasure of working with a number of customers across different verticals. I'll give some examples. We've been working with customers in the financial sector who have trading platforms that deal with large amounts of data and need to perform real-time analytics at very low latency. A more memory-centric architecture, both for the databases and for the applications, becomes critical for them. So we've been working with a few customers in that space, applying our big memory technology to enable their trading databases and applications.
Another example would be scientific computing, in particular life sciences: genomics and bioinformatics. We are looking more and more into gene sequences, of people as well as of animals and plants, and there are increasingly sophisticated algorithms to run on them. That requires a more memory-centric infrastructure as well. We have been delivering solutions to help those customers accelerate time to results and also lower their costs as they move their workloads to the cloud. So those are just some examples of how big memory technology can be used by different industries to accelerate their applications.
Now, Charles, I know that you started in 2017, and at that point you were looking at some esoteric paths to expand memory to meet application requirements. Yeah. Then CXL entered the picture.
Can you describe when you first saw the CXL spec and realized what was going to happen from an industry-standard perspective? What did your mind do when you saw that, and what did it represent to you in terms of an opportunity for MemVerge?
I think that was a real revelation. You see the CXL spec on the market, actually embraced by the industry leaders, and we are going to see CXL hardware hitting the market for production by the end of this year. So we think it's going to be a revolutionary change, and the most significant thing that CXL represents is what I call a liberation of memory from the computer.
Let me explain why I call it a liberation. If you go back 50 years, everything was inside the computer box: memory, storage, networking, compute, of course, and software. So you basically had a computer, or a server, as your single unit of compute. Over the last 50 years, more and more resources have become liberated, or disaggregated, from compute. It started with storage: storage moved out of the computer box and became its own entity in the form of storage arrays, storage servers, and file servers that are dedicated to managing storage, and that can be scaled, managed, and protected independently of compute. Networking became disaggregated as well, and both storage and networking can be software-defined. Even GPUs can be decoupled from CPUs in a fairly disaggregated way.
Memory is the last resource that is still very tightly coupled to the CPU, until CXL. What CXL does is put memory onto a new bus, the CXL bus. That memory can be inside the computer box, or it can be outside the box, connected to the computers through a switch. So now you can have a memory pool that is shared across multiple computers. You can scale that memory pool independently of how you scale the number of cores in those computers; whether you need more memory or more storage, you can scale them separately. You can enable dynamic provisioning of memory to the computers that need it, to increase utilization. And you can even have multiple computers accessing the same memory region to share the memory. None of this was possible until you could decouple, disaggregate, and liberate memory from the computer. And that's what CXL enables.
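To make the dynamic-provisioning idea concrete, here is a minimal sketch, assuming a Linux host where a CXL Type 3 expander is exposed as DAX device dax0.0 and where the daxctl utility from the ndctl project is installed. It illustrates on-demand capacity changes on a single host only; it is not MemVerge's software or a multi-host pool manager, and the device name is a placeholder.

```python
#!/usr/bin/env python3
"""Sketch: bring CXL-attached capacity online as ordinary system RAM on demand.

Assumes a Linux host where a CXL Type 3 expander is exposed as a DAX device
(assumed here to be dax0.0) and where the daxctl utility from the ndctl
project is installed; run as root. This only illustrates single-host dynamic
provisioning, not MemVerge's software or a multi-host pool manager.
"""
import subprocess

DAX_DEVICE = "dax0.0"  # assumption: check `daxctl list` for the real device name


def online_cxl_memory(device: str = DAX_DEVICE) -> None:
    # Rebind the DAX device to the kernel's kmem driver so its capacity
    # appears as an additional (CPU-less) NUMA node of regular system RAM.
    subprocess.run(
        ["daxctl", "reconfigure-device", "--mode=system-ram", device],
        check=True,
    )


def release_cxl_memory(device: str = DAX_DEVICE) -> None:
    # Offline the memory and return the device to device-DAX mode, e.g. so a
    # pool manager could hand the capacity to another host that needs it.
    subprocess.run(["daxctl", "offline-memory", device], check=True)
    subprocess.run(
        ["daxctl", "reconfigure-device", "--mode=devdax", device],
        check=True,
    )


if __name__ == "__main__":
    online_cxl_memory()
```

A fabric-level pool manager sitting above many hosts would orchestrate operations like these, moving capacity to whichever server needs it at the moment.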
Yeah, that's really incredible.
And when you think about the fact that we've been relying on the pizza-box, rack-based server construct for 25 years as the defining unit of what goes into a data center, being able to decouple
those things really opens up a wealth of creativity in terms of how folks can build as well as compose
infrastructure. I'm really excited about it. I know that there are different versions of CXL.
We have platforms from both AMD and Intel on CXL 1.1 today,
but the CXL Consortium has also delivered CXL 2.0 and CXL 3.0. Can you walk us through what the various versions of the specification are, and what that means from the standpoint of deploying technology today with CXL 1.1 and ensuring that that technology is going to work with future versions?
Sure.
The CXL Consortium has been doing a really good job getting about 250 companies together to define, by now, three different versions of the spec. 1.1 was the first version that was published, I think back in 2019. It described the memory expansion case: how do you add more memory beyond what DDR delivers. CXL runs on PCIe Gen 5, and the 1.1 specification allows you to plug in more memory over CXL on PCIe Gen 5 that can expand both the capacity and the bandwidth of the memory inside a server.
This is kind of the first step to enable this new protocol. And as you described, both Intel and AMD have started to support it in their newest CPU platforms. There are a number of memory controllers available now, and we are expecting the leading memory vendors to ship 1.1 memory cards by the end of this year.
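As a quick illustration of the expansion case: on current Linux systems, a CXL memory expander that has been onlined as system RAM typically shows up as a memory-only NUMA node, a node with capacity but no CPUs. The short sketch below assumes only the standard sysfs layout and simply flags such nodes; which node is actually CXL-backed depends on the platform.

```python
#!/usr/bin/env python3
"""Sketch: spot memory-only NUMA nodes, the usual guise of CXL-expanded memory.

Assumes only the standard Linux sysfs layout under /sys/devices/system/node.
On a server with a CXL 1.1 expander onlined as system RAM, the added capacity
normally shows up as a node that has memory but an empty CPU list.
"""
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def numa_nodes():
    node_dirs = sorted(NODE_ROOT.glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    for node_dir in node_dirs:
        node_id = int(node_dir.name[4:])
        cpulist = (node_dir / "cpulist").read_text().strip()
        # First meminfo line looks like: "Node 1 MemTotal:  134217728 kB"
        mem_total_kb = int((node_dir / "meminfo").read_text().splitlines()[0].split()[3])
        yield node_id, cpulist, mem_total_kb


if __name__ == "__main__":
    for node_id, cpulist, mem_total_kb in numa_nodes():
        kind = "CPU + memory" if cpulist else "memory-only (possibly CXL or other far memory)"
        print(f"node{node_id}: {mem_total_kb // 1024} MiB, cpus=[{cpulist or '-'}] -> {kind}")
```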
CXL 2.0 was specified in 2020, and it added an important feature: CXL switching. What that effectively enables is the pooling of memory, where multiple servers can have access to the same pool of memory. This disaggregation will enable, as you described, the full composability of a data center. So 2.0 is really the key spec version that enables pooling and memory composability. In fact, the AMD Genoa platform already supports version 2.0 for Type 3 devices, and Type 3 devices are the memory devices. They actually spent extra time to support 1.1 and 2.0 at the same time, which allows initial pooling deployments to happen with their new shipping CPUs. And we also expect other 2.0 products to start to appear towards the end of this year and into next year as well: memory devices and memory switches, as well as compute from the CPU, GPU, and other accelerator point of view.
CXL 3.0 is the latest specification, and it is built on top of 2.0, just as 2.0 was built on top of 1.1. 3.0 was released in August of last year, 2022. It added a number of important enhancements on top of 2.0, including allowing a hardware implementation of cache coherency for memory sharing. With 2.0, the hardware does not support memory sharing, although that can potentially be enabled through software like ours on 2.0; 3.0 allows the hardware to implement it for better performance. It also allows the cascading of switches. 2.0 only allows a single switch to be connected between the hosts and the memory, but with 3.0 you could have a switch of switches, which increases the scalability of how many servers can have access to the same shared memory. It has a number of other important features as well, such as allowing direct access from any compute processor to any memory without going through the main CPU. So 3.0 makes it a very compelling, complete solution. But we do expect it will probably be two or three years before actual 3.0 hardware products hit the market.
When you look at that landscape and you see so much new infrastructure opportunity in front of
you, what would your advice be to an enterprise? I mean,
I know that all of the cloud service providers are deeply involved on the CXL Consortium board and are very much invested in this, but what would your advice be to an enterprise on how to navigate
their legacy infrastructure with new CXL alternatives and an elegant migration to this new world?
Yeah.
So I think the first product ready for production is going to hit the market towards the end
of this year.
I expect 2024 will really be for early adopters to run pilot deployments of CXL for big use cases in their environments. One of the things we are working on is identifying the low-hanging fruit: the applications and use cases where CXL can deliver. And as enterprises and customers gain confidence and familiarity with this new, amazing technology, we expect larger-scale deployment to start in 2025. By that time, 2.0 should be a very mature protocol supported by all the mainstream CPUs as well as memory devices, and I expect early 3.0 implementations will start to appear as well. So 2025 could be the beginning of larger-scale adoption of the technology.
When you look at a technology like this and the complexity of new composable infrastructure,
I understand that there needs to be some changes in cloud stacks to be able to do that
type of configuration. But what other software changes are needed up the stack for applications to be able to specify how much memory capacity they require?
And how is the software industry engaging?
So hardware alone will not allow customers to adopt CXL; it has to be a combined effort between hardware and software. The first layer of software support is already in place, and that happened within the operating systems. The leading operating systems, Linux, Microsoft Windows, and VMware vSphere/ESXi, have all built in support for CXL devices. So that's already in place. Now the questions are: how do you manage the expansion, pooling, and sharing, the control path that's needed, and how do you make it transparent to existing applications without requiring application changes? This is the level of software that MemVerge is engaged in. We are essentially building a software-defined memory layer that can abstract out the heterogeneity of the memory being made available to the application, including classic DDR memory. By the way, DDR will not be replaced; CXL memory will be an augmentation to local DDR memory, not a replacement. So essentially we are dealing with a more heterogeneous environment: there is high-bandwidth memory, there is DDR memory, and then there is CXL memory. How the application deals with all of these in a transparent way is where our software is here to contribute. And then you also need software to manage the composability of memory, the dynamic, elastic provisioning of memory to the nodes that need it.
There will also need to be software to manage cache coherency; even with 3.0, there is still room for software to build on top of the hardware to enable cache-coherent shared memory access. All of this software needs to be created. And once memory can exist independently as a memory pool, additional data services can be implemented: for example, compression to reduce memory cost and increase effective capacity; data protection services to protect the data in that memory; and security services to guarantee that the right access is enforced, and so on. So there is a myriad of services that will need to be created in this new architecture.
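One way to picture the "transparent to existing applications" goal: on Linux today, an unmodified program can already have its memory spread across DDR and CXL-backed NUMA nodes by an external placement policy, and a software-defined memory layer would make such placement decisions automatically and dynamically. A minimal sketch, assuming node 0 is local DDR, node 1 is CXL memory onlined as system RAM, and that the numactl utility is installed:

```python
#!/usr/bin/env python3
"""Sketch: run an unmodified program with its memory placed across tiers.

Assumes node 0 is local DDR, node 1 is CXL memory onlined as system RAM, and
that the numactl utility is installed. A software-defined memory layer would
replace these static policies with automatic, dynamic placement.
"""
import subprocess
import sys

DDR_NODE = 0  # assumption: local DDR
CXL_NODE = 1  # assumption: CXL expander visible as a memory-only node


def run_interleaved(cmd: list) -> int:
    # Spread the program's pages across both tiers; the program itself needs
    # no code changes, which is the point of keeping tiering transparent.
    return subprocess.run(
        ["numactl", f"--interleave={DDR_NODE},{CXL_NODE}", *cmd]
    ).returncode


def run_ddr_preferred(cmd: list) -> int:
    # Prefer fast local DDR and spill to CXL only once DDR fills up.
    return subprocess.run(
        ["numactl", f"--preferred={DDR_NODE}", *cmd]
    ).returncode


if __name__ == "__main__":
    sys.exit(run_interleaved(sys.argv[1:] or ["sleep", "1"]))
```

Both policies here are deliberately crude; the value of a memory-management layer is to track which data is hot and place it accordingly, without the application knowing the tiers exist.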
Charles, I've heard some folks raise questions about security concerns with disaggregation.
How has security been looked at and what is MemVerge doing in this space?
Yeah, I think security is an integral part of any technology evolution or revolution. A change like this will change some variables and will require security experts to evaluate what the right setup is to guarantee the integrity of access and the integrity of the data. At the same time, I think this new architecture could provide new possibilities for how security can be enforced. In the first area, there could be concerns about what happens when multiple nodes can access the same memory: how do you ensure the access control domain cuts across multiple nodes, that they don't step on each other's toes, and that nothing can be seen by people who are not supposed to see it? This is a layer of security software that will need to be implemented, and it will be part of our mission to allow these exciting features to be delivered without compromising security. The second area is that we are building some interesting data services, like memory snapshots, which could have interesting applications in the security space as well. Essentially, we are able to capture a running application's state at any time. If you do that periodically and there is a security breach, or something goes wrong, you can go back to a snapshot to do some forensics on how things were before. So this basically provides additional tools for security engineers to ensure the security of their environment.
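As an analogy for the periodic-snapshot idea, and not MemVerge's implementation, the open-source CRIU checkpoint/restore tool can dump a running process's memory and state to disk while leaving it running. A sketch that captures such images on a schedule for later forensics, assuming CRIU is installed and the script has the privileges CRIU needs:

```python
#!/usr/bin/env python3
"""Sketch: periodic process snapshots kept for later forensics, using CRIU.

This uses the open-source CRIU checkpoint/restore tool as an analogy for the
snapshot idea described above; it is not MemVerge's implementation. Assumes
CRIU is installed and the script runs with the privileges CRIU needs (root).
"""
import subprocess
import sys
import time
from pathlib import Path

SNAPSHOT_ROOT = Path("/var/snapshots")  # assumption: where images are stored
INTERVAL_SECONDS = 300                  # assumption: snapshot every 5 minutes


def snapshot(pid: int) -> Path:
    # Dump the process tree's memory and state to disk, but leave it running.
    image_dir = SNAPSHOT_ROOT / f"pid{pid}-{int(time.time())}"
    image_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["criu", "dump", "-t", str(pid), "-D", str(image_dir),
         "--leave-running", "--shell-job"],  # --shell-job: target started from a terminal
        check=True,
    )
    return image_dir


if __name__ == "__main__":
    # Each image directory is a point-in-time record that an investigator can
    # inspect, or restore in a sandbox, after a suspected breach.
    target_pid = int(sys.argv[1])
    while True:
        print("captured", snapshot(target_pid))
        time.sleep(INTERVAL_SECONDS)
```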
That's terrific.
One final question for you.
Obviously, MemVerge is delivering solutions in this space.
I'd love for you to comment on what you're delivering and also where folks can find out more, because I'm sure we've piqued their interest in your solutions, how you're engaging, and how they can engage with your team.
Yes. So please visit our website, www.memverge.com. We have more information about CXL hardware and software and how you can get early access to it. And you can also email us at info at memverge.com. Well, Charles, thank you so much for being on the program today.
It was a real pleasure and I have been a fan of yours.
I'm so glad that we finally got to meet.
It's my pleasure to be here and thank you for the invitation.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.