Semiconductor Insiders - Podcast EP329: How Marvell is Addressing the Power Problem for Advanced Data Centers with Mark Kuemerle

Episode Date: January 30, 2026

Daniel is joined by Mark Kuemerle, Vice President of Technology, Custom Cloud Solutions at Marvell. Mark is responsible for defining leading-edge ASIC offerings and architecting system-level solutions. ...Before joining Marvell, Mark was a Fellow in Integrated Systems Architecture at GLOBALFOUNDRIES and has held multiple engineering…

Transcript
Starting point is 00:00:07 Hello, my name is Daniel Nenni, founder of SemiWiki, the open forum for semiconductor professionals. Welcome to the Semiconductor Insiders podcast series. The guest today is Mark Kuemerle, Vice President of Technology, Custom Cloud Solutions at Marvell. Mark is responsible for defining leading-edge ASIC offerings and architecting system-level solutions. Before joining Marvell, Mark was a Fellow in Integrated Systems Architecture at GlobalFoundries and has held multiple engineering positions at IBM. He has authored numerous articles on die-to-die connectivity and multi-chip systems and holds several patents related to low-power technologies and package integration.
Starting point is 00:00:46 Welcome to the podcast, Mark. Thank you very much. It's a pleasure to be here. I'm really looking forward to the discussion. Great. So, Mark, let's start out with what brought you to Marvell? Oh, actually, what brought me to Marvell was the acquisition of our business in 2019. So as you mentioned, I originally worked at IBM, was really at the beginning of the custom ASIC business, and transitioned into GlobalFoundries when GlobalFoundries acquired the IBM Microelectronics Division.
Starting point is 00:01:20 And at some point, GlobalFoundries spun off our business, as we were really focused on leading-edge technologies in ASICs. And we created a startup called Avera Semi, and we were really, really fortunate to be acquired by Marvell, who, of course, has a huge focus on custom semiconductors. We've been running like crazy ever since. It's been a really great experience being a part of Marvell. Yeah, that's a great story. I spent a lot of my career in the ASIC business as well, so I've followed you guys all the way through from the IBM days.
Starting point is 00:01:56 And Marvell really has a good focus and good technology to bring to the ASIC business. So it's pretty amazing. So let's talk about AI accelerators. So they are increasingly power and connectivity constrained, rather than compute constrained. So from your perspective, how does die-to-die technology become a first-order design parameter
Starting point is 00:02:18 for next generation XPUs? Absolutely. So one of the things that we've noticed that's really been evolving over the last couple of years is that when the big hyperscalers would build data centers in the past, they usually had a budget. And that budget was in dollars. And that determined how big of a data center they could build and really how much total equipment would be integrated in a given year. And what we've seen happening over the last two years is that power is really becoming the new money when it comes to building a data center.
Starting point is 00:02:55 They're not necessarily looking at a capital constraint that's holding them back. The data centers are really looking at how much power they can get delivered to a certain geography so they can build a data center. So what's becoming the budget that determines how many XPUs you can build and deploy is really the power envelope. So power is becoming an increasing focus, and it's really becoming a focus all the way through the architecture of every XPU. We're also seeing that with this move, with not only, you know, this power constraint,
Starting point is 00:03:33 but the ability to continue to increase performance, these devices are getting, you know, much more massive. They're built out of multiple chips put together. And we're seeing more and more connectivity between multiple die in the system. This connectivity is through things like die-to-die interfaces. And the more connectivity we have, the better the performance for that XPU scales. If we don't have enough connectivity,
Starting point is 00:03:59 each piece becomes kind of its own memory domain. If we have enough connectivity, we can get everything to behave as if it's one big device. And so die-to-die bandwidth, as we call the bandwidth between the different devices, is increasing dramatically. And the die-to-die power is actually becoming a fairly significant contributor to the overall power in the device.
Starting point is 00:04:21 So die-to-die power is really skyrocketing as a percentage of overall power as these new architectures are evolving, and it becomes really, really important to optimize the power of those interfaces so that our customers can succeed and deploy more XPUs in those power-constrained data centers.
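To make the power-as-budget framing concrete, here is a minimal back-of-the-envelope sketch in Python. Every number in it (the envelope size, per-XPU power, and the die-to-die share of that power) is an assumption for illustration, not a figure from the episode.

```python
# Hypothetical numbers only: when the facility power envelope is fixed,
# per-XPU power, not capital, caps how many accelerators can be deployed,
# so watts trimmed from die-to-die I/O buy more XPUs.

ENVELOPE_MW = 100.0   # assumed data-center power envelope, megawatts
XPU_POWER_KW = 1.2    # assumed wall power per deployed XPU, kilowatts
D2D_SHARE = 0.15      # assumed fraction of XPU power spent on die-to-die I/O

baseline = int(ENVELOPE_MW * 1000 / XPU_POWER_KW)

# Suppose interface optimizations trim die-to-die power by one third.
optimized_kw = XPU_POWER_KW * (1 - D2D_SHARE / 3)
optimized = int(ENVELOPE_MW * 1000 / optimized_kw)

print(f"baseline envelope supports {baseline} XPUs")
print(f"with leaner die-to-die I/O: {optimized} XPUs (+{optimized - baseline})")
```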
Starting point is 00:04:40 Oh, yeah, great point. So from what I understand, Marvell's die-to-die interface delivers simultaneous two-way data over a single wire. So why is this bidirectionality such an important shift compared to the traditional approaches? Yeah, absolutely. And this really brings us back to the previous question, which is all about power constraints. We found that with die-to-die interfaces, you can, of course, continue to increase the data rate by building more and more capability, more and more equalization, into that interface so that it can achieve higher performance, so that we can get a higher bandwidth density, which is the amount of data we can move across the chip edge.
Starting point is 00:05:21 Simultaneous bidirectional signaling has really become a key factor for us in our own IP development to keep the power consumption low. It enables us to essentially double the performance per wire by having the transmit and receive both driving the same wire simultaneously, but it keeps the data rate similar in each direction. So the amount of equalization, the amount of smarts that we have to build into the IP, doesn't actually increase too much, and it allows us to get significantly more bandwidth with only a slight increment in power consumption. And we've been able to retool the IP so that we can even compensate for the small additional power consumption of simultaneous bidirectional operation by making other design optimizations, which pull power consumption out of the IP.
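A rough sketch of the bandwidth-density argument here, assuming illustrative values for the per-direction data rate and wire count (neither comes from the episode): driving each wire in both directions at once doubles throughput per wire, while the per-direction rate, and hence the equalization burden, stays the same.

```python
# Unidirectional vs. simultaneous bidirectional signaling across a die edge.
# All values are made-up assumptions for illustration.

GBPS_PER_WIRE = 16   # assumed per-direction data rate, Gb/s
WIRES = 1000         # assumed wires available across the die edge

# Unidirectional: each wire carries traffic one way, so half the wires
# serve each direction.
uni_bw_per_dir = (WIRES // 2) * GBPS_PER_WIRE

# Simultaneous bidirectional: every wire carries the full rate both ways
# at once, so each direction gets all the wires.
sbd_bw_per_dir = WIRES * GBPS_PER_WIRE

print(f"unidirectional:             {uni_bw_per_dir / 1000:.1f} Tb/s each way")
print(f"simultaneous bidirectional: {sbd_bw_per_dir / 1000:.1f} Tb/s each way")
# -> 2x bandwidth density from the same wires, with the per-wire data rate
#    (and so the equalization complexity) unchanged in each direction.
```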
Starting point is 00:06:17 Oh, interesting. So listen, I'm a big UCIe fan, and you know, UCIe is gaining industry momentum, yet Marvell is offering an alternative approach. So how should designers think about the trade-offs between open standards and purpose-built die-to-die interfaces? Absolutely. And I'm a big UCIe fan as well. You know, I can really speak to both of those aspects. And really, I would say at Marvell, we're really doing it all.
Starting point is 00:06:44 If you look at having standards that you can use, so, for example, you can use a chiplet from one provider in multiple applications. And at Marvell, we're partnering with Nvidia for NVLink Fusion, where UCIe becomes a great interface choice, because you have to think about it as one chiplet that might be used by multiple different devices, multiple different customers. And so having an interface that's available from lots of IP providers, like UCIe,
Starting point is 00:07:15 is a great choice for that. The other interfaces that we're talking about today, or that we spoke about in the last question, are really a way to optimize the amount of bandwidth you can move, the bandwidth density, and to optimize power consumption beyond what we can achieve with standards-based IPs. The standards just aren't able to evolve quickly enough
Starting point is 00:07:40 to give us the bandwidth density and the power consumption that we need. So when we have systems that are actually moving this incredible amount of data between multiple die, it becomes a great choice, right? If it's fully contained within that system and we can get a PPA benefit from it, it just makes total sense to use it. And we can use UCIe on other interfaces where we might connect to a standard chiplet. So it's not really a one-or-the-other decision for us. We really look at the pros and cons of each interface and each type of device we're integrating
Starting point is 00:08:12 together, and we make the choices based on what makes the most sense for our customers and what optimizes the design. Yeah, absolutely. So as you mentioned, power is a limiting factor in scaling AI infrastructure. So how does adaptive traffic-aware power management in this die-to-die interface change how hyperscalers design for real-world bursty workloads? That's a great question. The real key is really understanding the workload itself. So if you think about a typical XPU device, there can be periods of very high activity. So if we think about when these devices are initializing and bringing in a new model or loading
Starting point is 00:09:02 a cache in for a given set of users, then we're very, very busy when we're doing that, right? There's a lot of data moving around between all the different die, between the die and the memory. We're just really fully loaded and very active. But then there are other periods of time when we're just crunching on computations and there's nothing going on on those I/Os. And so it makes sense to be able to essentially put the IP to sleep when we're in those modes. And by understanding the workloads and the transitions in and out of those workloads, we can do that intelligently and save a really significant amount of the overall power. As I mentioned, the die-to-die IP in some cases is becoming a really big contributor. So by being able to essentially snooze away while that computation is being done and then wake up when periods of new activity happen, we're able to get a significant next-level optimization in the system.
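A minimal sketch of the kind of traffic-aware link power management described here. The state names, thresholds, and utilization signal are all hypothetical (the episode does not specify any of them); the idea is simply that the die-to-die link drops into progressively deeper sleep states during compute-only phases and wakes on new traffic.

```python
# Hypothetical traffic-aware power states for a die-to-die link.
from enum import Enum

class LinkState(Enum):
    ACTIVE = "active"   # full bandwidth, full power
    SNOOZE = "snooze"   # clocks gated, fast wake
    SLEEP = "sleep"     # deepest state, slowest wake

def next_state(state: LinkState, utilization: float, idle_cycles: int) -> LinkState:
    """Pick the link power state from recent traffic.

    utilization: fraction of link bandwidth used in the last window (assumed signal).
    idle_cycles: consecutive windows with near-zero traffic (assumed counter).
    """
    if utilization > 0.05:      # any real traffic -> wake immediately
        return LinkState.ACTIVE
    if idle_cycles > 1000:      # long compute-only phase -> deep sleep
        return LinkState.SLEEP
    if idle_cycles > 10:        # short lull -> light snooze
        return LinkState.SNOOZE
    return state

# Example: a burst of cache loading, then a long compute-only phase.
state = LinkState.ACTIVE
for util, idle in [(0.9, 0), (0.8, 0), (0.0, 20), (0.0, 2000), (0.7, 0)]:
    state = next_state(state, util, idle)
    print(f"util={util:.1f} idle={idle:5d} -> {state.value}")
```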
Starting point is 00:10:18 Interesting. So we talked a little bit about chiplets, but as die counts and lane counts expand rapidly, features like redundant lanes and automatic lane repair become critical. So how do these features impact yield, long-term reliability, and total cost of ownership? Yeah, absolutely. And they really do, right? If you think about the amount of interconnect that we might have between any two chips connecting together with a die-to-die interface, we could have thousands of wires actually connecting them, and each of those wires has to be, you know, connected through some kind of packaging technology. And there absolutely could be scenarios where there could be opens, you know, between multiple devices; a wire just doesn't land on a pad, or who knows what happens. And by building redundancy into the system, you're able to really overcome a lot of the yield detractors that you can see
Starting point is 00:11:12 in the assembly of the overall device. One of the things, you know, we had talked about standards-based IP versus this kind of bespoke IP optimized to the application. One of the things that you can actually make choices on when you develop a specific IP is the amount of redundancy that you include in the system. So if you think about the IP being very flexible, we have the ability to kind of dial in our redundancy based on what we actually need. So when we look at a product, we'll actually do a detailed analysis of how many wires we're going to have
Starting point is 00:11:49 and what the likelihood of a fault in a wire is, and actually dial in the amount of redundancy based on that, which keeps us from wasting wires. And of course, wasting wires wastes bandwidth. It gives us flexibility versus kind of, you know, going with a one-size-fits-all approach where we might have a couple of redundant lanes per 64 or something like that. We really have the ability to dial it in and optimize based on intelligent models.
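The redundancy dial-in described above can be sketched as a simple yield calculation. Assuming independent per-wire faults (a plain binomial model, my simplification) and made-up values for the wire count, per-wire fault probability, and target yield, the analysis picks the smallest number of spare lanes that meets the target:

```python
# Hypothetical redundancy sizing: how many spare lanes does a die-to-die
# interface need to hit a target assembly yield? Faults are modeled as
# independent per-wire events; all numbers are illustrative assumptions.
from math import comb

def assembly_yield(wires: int, p_fault: float, spares: int) -> float:
    """Probability that at most `spares` of `wires` connections fail."""
    return sum(
        comb(wires, k) * p_fault**k * (1 - p_fault) ** (wires - k)
        for k in range(spares + 1)
    )

WIRES = 4000     # assumed die-to-die wires through the package
P_FAULT = 1e-4   # assumed per-wire open/short probability
TARGET = 0.999   # assumed target assembly yield for the interface

for spares in range(6):
    y = assembly_yield(WIRES, P_FAULT, spares)
    flag = "  <- meets target" if y >= TARGET else ""
    print(f"{spares} spare lanes: yield {y:.5f}{flag}")
```

With these assumed numbers, a handful of spares recovers nearly all assembly-fault losses, which is the sense in which per-product analysis avoids wasting wires relative to a fixed one-size-fits-all ratio.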
Starting point is 00:12:18 Right. Ah, interesting. So, Mark, final question. What's next? How do you see die-to-die technology evolving over the next, you know, few years? Sure, sure. So it's already evolving. We're already seeing significant changes in the way die-to-die technology is used. We're building more and more different implementations of it to either reduce the area that we take up on a customer's die by having a very shallow IP or, you know, for example, die-to-die in 3D applications, which is a total rethinking of how those systems tend to work. So, you know, we're seeing all of these changes, all of these new topologies, new package technologies. And one of the things we're really excited about and looking forward to is really how we can start using die-to-die to integrate optics seamlessly, you know, even doing things like disaggregating memory.
Starting point is 00:13:24 Yeah. You know, I've worked with Marvell for many years. I'm here in Silicon Valley, and I've been in this industry for 40 years. And Marvell is just an amazing company. So it's great to meet you, Mark, and thank you for your time. Wonderful to meet with you, too. It's a pleasure to speak with you today. That concludes our podcast.
Starting point is 00:13:42 Thank you all for listening and have a great day.
