In The Arena by TechArena - Optimizing AI Scale-Out: Cornelis Networks’ Vision with Lisa Spelman

Episode Date: September 12, 2024

Lisa Spelman, CEO of Cornelis Networks, discusses the future of AI scale-out, Omni-Path architecture, and how their innovative solutions drive performance, scalability, and interoperability in data centers.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Alison Klein. Now, let's step into the arena. Welcome into the arena. My name is Alison Klein, and today I've got a really exciting guest with me. I'm at the AI Hardware Summit in the Bay Area, and Lisa Spelman, CEO of Cornelis Networks, is here. Welcome, Lisa. Thanks for having me, Alison. It's super
Starting point is 00:00:37 exciting to be here together with you and in the arena with you. What a difference a few months makes. You were on the program a few months ago with a completely different company. You've made a tremendous career move and are now the CEO of Cornelis Networks, steering that ship. I have so many questions to ask you about this, but let's just start with an introduction of Cornelis. Cornelis has never been on the program before. Yeah, thank you again for having me. So I'm excited to be the first guest from Cornelis. So Cornelis Networks is a company that specializes in data center fabric and interconnects. And we are really at the intersection of the growth in massive scale-out systems, how to unlock more performance out of the AI systems that are being built all over the world, and high-performance computing.
Starting point is 00:01:24 So really seeing that convergence come together. And we think we have a really unique position, a great, durable architectural advantage, and the opportunity to scale and just solve this next frontier of system optimization that's ripe for innovation. And when I say that next frontier, so much has been done at the compute layer, so many really great investments have been made, and so many people are solving these huge problems of how to handle these massive training models. And there's opportunity yet still at the network layer to actually improve so much of the performance of those GPUs, of those AI acceleration ASICs, of the whole system. And that's where we come in. We're ready to solve that challenge for our customers. You know this market really well. I've watched you engage in this market for a long time.
Starting point is 00:02:18 You know these players intimately. I feel like watching the cloud providers build out AI clusters is like watching a never-ending season of The Amazing Race as they look for AI dominance and their path to AGI. How much do supply chain constraints impede their progress? And does this open the door for consideration of new technology entrants like Cornelis? Yep. So I just want to start by saying, if we're ever on Amazing Race together, I'm the driver. Okay? So I claim that. You can be the navigator. I drive an M2. I'm just saying. I'm a good driver. You have to eat all the bugs. Okay. But now that we've got that cleared up, I do think supply chain constraints have had an impact. But when we look at the arc of what we're doing or the arc of computing and what's happening,
Starting point is 00:03:06 this is all stuff that's going to get solved. And it's not that we're necessarily focused on, again, addressing a specific supply chain constraint. We're addressing a gap in the capability of that really large scale-out. And so what Cornelis has that's unique to us and what we can offer to our customers is this ability to not only have competitive bandwidth, which is very important for artificial intelligence, but also to get to extremely low latency, improve the message rates, and drive that ability to add GPU after GPU while not impacting the performance. So that scalability differentiation makes a huge impact. So you can take your idle GPUs, you can take your unused compute capacity and put it to work because you can feed it with the data.
Starting point is 00:03:58 Now let's take a step back. Cornelis' product portfolio is based on a technology I know really well, OmniPath. How does this technology differ from other high-speed fabric technologies like InfiniBand? And why do you see this as a winning solution for AI fabrics? Yep. And the OmniPath architecture, we do really believe, offers a durable competitive advantage for our customers. And it gets into some of what I was saying around, it's not just throwing more bandwidth at the problem. And as you get to larger scale, you do start to run into some of those,
Starting point is 00:04:31 again, latency, message rate, those scalability issues. And there are really three architectures that are deployed currently in the market. There's the InfiniBand architecture, there's the Ethernet architecture, and there's the OmniPath architecture, which up until now has been deployed primarily in high-performance computing. Yeah, so we bring some of those capabilities that were at first only important in high-performance computing, but actually now are quite important in AI scale-out. And, you know, we've seen this market. Yes, the hyperscalers are big consumers of technology. They're going to set the pace for the frontier models. And I see that continuing into the future, at least for now. But this enterprise AI and kind of public cloud next wave
Starting point is 00:05:33 AI is going to be a huge area of investment for the industry as well. And there are going to be customers that are driving tremendous innovation and, for a variety of reasons, may not be getting all of that training or that inference from a cloud provider. So we see the market as really split across multiple segments, but we have the technology portfolio to address all of them. And again, I know from my previous experience
Starting point is 00:06:01 how much a small improvement in utilization can make a complete difference to your total cost of ownership. I know how much it can save you when you can pull a watt of power out of a system. So you look at these optimizations that we're offering to these massive scale-out systems, and I think it's a pretty compelling story. The other thing we're doing is that our products are going to support a very interoperable and multi-vendor environment. NVIDIA GPUs have been deployed as a standard. AMD is making tremendous progress with their GPU products. Companies like Google investing in their TPU,
Starting point is 00:06:43 Microsoft investing in their own custom silicon. And we're fully prepared to support any and all of it across the board. So that offers a really nice environment for customers that aren't going to just be one single vendor for all of their AI training, inference, and high-performance computing. You know, I think it's something that we both know well that customers do not like vendor lock-in and they don't want a single source. So this opens up an incredible opportunity. Now, one thing that we've been covering on Tech Arena that I need to ask you about is Ultra Ethernet. All of the major cloud providers have thrown their hats in to
Starting point is 00:07:18 support Ultra Ethernet. And it feels like another day of the same story with Ethernet: we are just going to get more out of this technology and we're going to throw more development at it. And we've done this many times in the industry. How does Cornelis embrace Ultra Ethernet? You guys are part of the initiative. And how do you see that working hand in hand with OmniPath? Well, I'm glad you asked, actually, because this is one of the most exciting things about our roadmap. Ethernet is 45 years old.
Starting point is 00:07:45 That's amazing. A technology that has done so much for the world of computing. But it does have, again, some of those what I'll call architectural limitations. And on the whole, we are absolutely supportive of this move towards Ultra Ethernet and are helping shape what that actually looks like and how it is defined. What it offers us is the ability to pair it with our architecture. So if you think about a training cluster or even an inference cluster, you're going to be able to, within that cluster, get the absolute peak performance, price performance, power savings, GPU utilization, again, latency, message rates, all of that at the absolute max of what the OmniPath architecture can deliver.
Starting point is 00:08:29 And then we're going to have this Ultra Ethernet compatibility that allows that cluster and that system to appear to the rest of your data center as like-for-like. And so it'll make interoperability so much easier. So you get the best of the performance and the price performance where you absolutely require it and need it, and then, to the rest of your data center, as you connect to your CRM or your ERP or whatever it is where the data is coming in and out of, you have that opportunity to look like just another system within the data center. So I think that interoperability that we're pursuing with both Ethernet and Ultra Ethernet is going to be a game changer for the way that the OmniPath architecture can fit into a large data center. And one of the things that you spoke about in that answer that I think is so important: all of these organizations, whether they're hyperscalers or enterprises, are still running legacy applications and need heterogeneous solutions to drive that. So this makes a lot of sense and gives a lot of extensibility in terms of the technology.
Starting point is 00:09:34 That leads me to my last question. What can we expect from Cornelis now that you've taken the reins of leadership? That's one that we can unpack over a glass of wine at some point. But what does 2025 look like for the company? Yeah, we're super excited about the future. It's just such an exciting place to be. And it's really cool to be part of this build-out and all the innovation that is being unlocked right now.
Starting point is 00:09:58 So for 2025, it's a big year for us. We're bringing out our next generation of products. We're building a world-class engineering and execution team that's excited to take on the challenges of this Ultra Ethernet definition and its integration into our products. And we're really ramping up our go-to-market and sales efforts
Starting point is 00:10:18 because we're getting out in front of customers and making sure everyone knows who we are and what we have coming next. And I think just as a company, we're feeling so much energy and momentum about this place where we sit right now, where we have the opportunity to really help solve our customers' biggest problems. I mean, I said it earlier, and I can't think of a better way to say it. We really feel like we're sitting on the edge of the next optimization frontier in the AI scale-out system. And it is fully our intention to be the absolute best option for our customers. It's great to see a technology I've followed for a long time and a leader that I believed in for a long time coming together. I'm still back on your
Starting point is 00:11:05 comment that I have to eat all the bugs, but thank you so much for being here with us today. We'd love to have you back on the show. Yeah. Thank you, Alison. It's great to be here. Great to be connected. I appreciate your kind words, and we can negotiate; maybe there's a certain bug I'll eat. Sounds good. Thanks so much. And thanks for this episode of the Tech Arena. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net. All content is copyright by the Tech Arena.
