Semiconductor Insiders - Podcast EP329: How Marvell is Addressing the Power Problem for Advanced Data Centers with Mark Kuemerle
Episode Date: January 30, 2026
Daniel is joined by Mark Kuemerle, Vice President of Technology, Custom Cloud Solutions at Marvell. Mark is responsible for defining leading-edge ASIC offerings and architecting system-level solutions. ...Before joining Marvell, Mark was a Fellow in Integrated Systems Architecture at GLOBALFOUNDRIES and has held multiple engineering…
Transcript
Hello, my name is Daniel Nenni, founder of SemiWiki, the open forum for semiconductor professionals.
Welcome to the Semiconductor Insiders podcast series.
The guest today is Mark Kuemerle, Vice President of Technology, Custom Cloud Solutions at Marvell.
Mark is responsible for defining leading-edge ASIC offerings and architecting system-level solutions.
Before joining Marvell, Mark was a Fellow in Integrated Systems Architecture at GLOBALFOUNDRIES
and has held multiple engineering positions at IBM.
He has authored numerous articles on die-to-die connectivity and multi-chip systems
and holds several patents related to low-power technologies and package integration.
Welcome to the podcast, Mark.
Thank you very much.
It's a pleasure to be here.
I'm really looking forward to the discussion.
Great.
So, Mark, let's start out with what brought you to Marvell?
Oh, actually, what brought me to Marvell was the acquisition of our business in 2019.
So as you mentioned, I originally worked at IBM, was really at the beginning of the custom ASIC business, and transitioned into GLOBALFOUNDRIES when GLOBALFOUNDRIES acquired the IBM Microelectronics Division.
And at some point, GLOBALFOUNDRIES spun off our business as we were really focused on leading-edge technologies in ASICs.
And we created a startup called Avera Semi, and we were really, really fortunate to be acquired by Marvell,
who, of course, has a huge focus on custom semiconductors.
We've been running like crazy ever since.
It's been a really great experience being a part of Marvell.
Yeah, that's a great story.
I spent a lot of my career in the ASIC business as well, so I've followed you guys all the way
through from the IBM days.
And Marvell really has a good focus and good technology to bring to the ASIC business.
So it's pretty amazing.
So let's talk about AI accelerators.
So they are increasingly power and connectivity constrained,
rather than compute constrained.
So from your perspective,
how does die-to-die technology become a first-order design parameter
for next generation XPUs?
Absolutely.
So one of the things that we've noticed
that's really been evolving over the last couple of years
is that when the big hyperscalers would build data centers in the past, they usually had a budget.
And that budget was in dollars.
And that determined how big of a data center they could build and really how much total equipment would be integrated in a given year.
And what we've seen happening over the last two years is that power is really becoming the new money when it comes to building a data center.
They're not necessarily looking at a capital constraint that's holding them back.
The data centers are really looking at how much power they can get delivered to a certain geography
so they can build a data center.
So really what's becoming the budget that determines how many XPUs you can build and deploy
is really the power envelope.
So power is becoming an increasing focus, and it's really becoming a focus all the way through
the architecture of every XPU.
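To make that "power is the new money" point concrete, here is a minimal back-of-the-envelope sketch; the site power, overhead fraction, and per-XPU power below are purely illustrative assumptions, not figures from the episode:

```python
# Back-of-the-envelope sketch: power, not dollars, caps how many XPUs a
# site can deploy. Every number here is an assumption for illustration.

site_power_mw = 150.0        # assumed power delivered to the site (MW)
overhead_fraction = 0.25     # assumed share lost to cooling, networking, etc.
xpu_power_kw = 1.2           # assumed power per deployed XPU (kW)

usable_kw = site_power_mw * 1000 * (1 - overhead_fraction)
print(f"Deployable XPUs: {int(usable_kw / xpu_power_kw):,}")

# Shaving 10% off per-XPU power buys proportionally more deployed units
# under the same power envelope:
print(f"With 10% lower XPU power: {int(usable_kw / (xpu_power_kw * 0.9)):,}")
```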
We're also seeing that with not only this power constraint, you know,
but the ability to continue to increase performance, these devices are getting
much more massive.
They're built out of multiple chips put together.
And we're seeing more and more connectivity between multiple die in the system.
This connectivity is through things like die-to-die interfaces.
And the more connectivity we have, the better the performance for that XPU scales.
If we don't have enough connectivity,
each piece becomes kind of its own memory domain.
If we have enough connectivity, we can get everything
to behave as if it's one big device.
And so die-to-die bandwidth, as we call it, the bandwidth
between the different devices, is increasing dramatically.
And the die-to-die power is actually becoming
a fairly significant contributor to the overall power
in the device.
So die-to-die power, as a percentage of overall power,
is really skyrocketing as these new architectures are evolving,
and it becomes really, really important
to optimize the power of those interfaces
so that our customers can succeed
and deploy more XPUs
in those power-constrained data centers.
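A rough sketch of why that percentage skyrockets as aggregate die-to-die bandwidth scales; the energy-per-bit and base power figures here are assumptions for illustration, not Marvell data:

```python
# Back-of-the-envelope sketch of die-to-die power as a share of total XPU
# power as aggregate d2d bandwidth scales. All figures are illustrative.

d2d_pj_per_bit = 0.5     # assumed die-to-die energy cost per bit (pJ)
base_power_w = 800.0     # assumed compute + memory power per XPU (W)

for d2d_tbps in (10, 50, 200):   # aggregate die-to-die bandwidth (Tb/s)
    d2d_power_w = d2d_tbps * 1e12 * d2d_pj_per_bit * 1e-12  # pJ/bit * bits/s = W
    share = d2d_power_w / (base_power_w + d2d_power_w)
    print(f"{d2d_tbps:>4} Tb/s -> {d2d_power_w:6.0f} W d2d, "
          f"{share:5.1%} of total XPU power")
```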
Oh, yeah, great point.
So from what I understand,
Marvell's die-to-die interface delivers simultaneous two-way data
over a single wire.
So why is this bi-directionality
such an important shift compared to the traditional approaches?
Yeah, absolutely. And this really brings us back to the previous question, which is all about power constraints.
We found that with die-to-die interfaces, you can, of course, continue to increase the data rate by building more and more capability, more and more equalization, into that interface so that it can achieve these higher performances, so that we can get a higher bandwidth density, which is the amount of data we can move across the chip edge.
Simultaneous bidirectional signaling has really become a key factor for us in our own IP development to keep the power consumption low.
It enables us to get the same kind of performance per wire, so we can essentially double the performance per wire by having the transmit and receive both driving the same wire simultaneously.
But it keeps the data rate similar in each direction.
So the amount of equalization, the amount of smarts that we have to build into the IP, doesn't actually increase too much, and it allows us to actually get significantly more bandwidth with only a slight increment in power consumption. And we've been able to retool the IP so that we can even compensate for the small additional power consumption of simultaneous bidirectional signaling by making other design optimizations which pull power consumption out of the IPs.
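A minimal sketch of that bandwidth-per-wire argument; the lane rate and wire count are assumed illustrative values, not Marvell's actual simultaneous bidirectional (SBD) specifications:

```python
# Minimal sketch: why SBD roughly doubles bandwidth per wire without
# raising the per-direction data rate. Numbers are assumptions.

lane_rate_gbps = 16.0   # assumed per-direction data rate on one wire
wires = 1000            # assumed wires crossing the die edge

# Conventional unidirectional wiring: half the wires transmit, half receive.
uni_bw_each_way_tbps = (wires / 2) * lane_rate_gbps / 1000

# SBD: every wire carries transmit and receive at once, at the same
# per-direction rate, so per-wire equalization stays roughly the same
# while bandwidth per wire effectively doubles.
sbd_bw_each_way_tbps = wires * lane_rate_gbps / 1000

print(f"Unidirectional: {uni_bw_each_way_tbps:.1f} Tb/s each way")
print(f"SBD:            {sbd_bw_each_way_tbps:.1f} Tb/s each way")
```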
Oh, interesting. So listen, I'm a big UCIe fan, and you know, UCIe is gaining
industry momentum, yet Marvell is offering an alternative approach.
So how should designers think about the trade-offs between open standards and purpose-built
die-to-die interfaces?
Absolutely.
And I'm a big UCIe fan as well.
You know, I can really speak to both of those aspects.
And really, I would say at Marvell, we're really doing it all.
If you look at having standards that you can use, so for example, you can use a chiplet
from one provider in multiple applications.
And at Marvell, we're partnering with
Nvidia for NVLink Fusion, where UCIe becomes a great interface choice
because you have to think about it as it's one chiplet
that might be used by multiple different devices,
multiple different customers, and so having an interface
that's available from lots of IP providers, like UCIe,
is a great choice for that.
The other interfaces that we're talking about today,
or that we spoke about in the last question,
are really a way to really optimize
the amount of bandwidth you can move, bandwidth density,
and optimize power consumption,
just beyond what we can achieve with standards-based IPs.
The standards just aren't able to evolve quickly enough
to give us the bandwidth density and the power consumption
that we need.
So when we have systems that are actually moving
this incredible amount of data between multiple die,
it becomes a great choice, right?
If it's fully contained within that system and we can get a PPA benefit from it, it just makes total sense to use it.
And we can use UCIe on other interfaces where we might connect to a standard chiplet.
So it's not really one or the other decision for us.
We really look at the pros and cons of each interface and each type of device that we're integrating
together, and we make the choices based on what makes the most sense for our customers and what
optimizes the design. Yeah, absolutely. So as you mentioned, power is a limiting factor in
scaling AI infrastructure. So how does adaptive, traffic-aware power management in this die-to-die
interface change how hyperscalers design for real-world, bursty, you know, type workloads?
That's a great question. The real key is really understanding the workload
itself. So if you think about a typical XPU device, there can be periods of very high activity.
So if we think about when these devices are initializing and bringing in a new model or loading
a cache in for a given set of users, then we're very, very busy when we're doing that, right?
There's a lot of data moving around between all the different die, between the die and the memory.
We're just really fully loaded and very active. But then there are other periods of time when we're just crunching on computations and there's nothing going on on those I/Os, and so it makes sense to be able to essentially put the IP to sleep when we're in those modes. And by understanding the workloads and the transitions in and out of those workloads, we can do that intelligently and save a really significant amount of the overall power. As I
mentioned, the die-to-die IP in some cases is becoming a really big contributor. So by being able
to essentially snooze away while that computation is being done and then wake up when periods
of new activity happen, we're able to really get a significant next level optimization in the system.
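As an illustration of that snooze-and-wake idea, here is a hypothetical link power-state policy; the states, thresholds, and wake conditions are invented for the sketch and don't describe Marvell's actual hardware:

```python
# Hypothetical sketch of traffic-aware link power management: park the
# die-to-die interface in a low-power state during compute-only phases
# and wake it when traffic resumes. The policy is illustrative only.

from enum import Enum

class LinkState(Enum):
    ACTIVE = "active"   # full bandwidth, full power
    SNOOZE = "snooze"   # clocks gated, near-instant wake
    SLEEP = "sleep"     # deeper power-down for long compute phases

def next_state(state: LinkState, idle_cycles: int, pending_traffic: bool) -> LinkState:
    """Pick the next link state from recent activity (illustrative policy)."""
    if pending_traffic:
        return LinkState.ACTIVE      # new burst (e.g., model/cache load): wake up
    if idle_cycles > 10_000:
        return LinkState.SLEEP       # long lull: deep sleep is worth the wake cost
    if idle_cycles > 100:
        return LinkState.SNOOZE      # short lull: cheap snooze, fast resume
    return state                     # too soon to tell: hold the current state

# Example: a long compute-only stretch, then a burst of die-to-die traffic.
state = LinkState.ACTIVE
state = next_state(state, idle_cycles=20_000, pending_traffic=False)  # -> SLEEP
state = next_state(state, idle_cycles=0, pending_traffic=True)        # -> ACTIVE
```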
Interesting. So we talked a little bit about chiplets, but as die counts and lane counts expand rapidly,
features like redundant lanes and automatic lane repair become critical.
So how do these features impact yield or long-term reliability and total cost of ownership?
Yeah, absolutely.
And they really do, right?
If you think about the amount of interconnect that we might have between any two chips connecting together with the die-to-die interface,
we could have thousands of wires that are actually connecting them, and each of those wires has to be, you know, connected through some kind of packaging technology.
And there absolutely could be scenarios where there could be opens between multiple devices, you know, just a wire doesn't land on a pad, or who knows what happens.
And by building redundancy into the system, you're able to really overcome a lot of the yield detractors that you can see in the assembly of the overall device.
One of the things, you know, we had talked about standards-based IP versus this kind of bespoke IP
optimized to the application. One of the things that you can actually make choices on
when you develop a specific IP is actually the amount of redundancy that you include in the system.
So if you think about the IP being very flexible, we have the ability to kind of dial in our redundancy
based on what we actually need.
So when we look at a product,
we'll actually do a detailed analysis of how many wires we're going to have
and what the likelihood of a fault in a wire is
and actually dial in the amount of redundancy based on that,
which keeps us from wasting wires.
And of course, wasting wires wastes bandwidth.
And it gives us flexibility versus kind of, you know,
going with a one-size-fits-all approach where we might have
a couple of redundant lanes per 64 or something like
that. We really have the ability to dial it in and optimize based on intelligence models.
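One way to picture that "dial in the redundancy" analysis is a simple binomial yield model; the wire count, per-wire fault probability, and yield target below are assumptions for illustration, not values from the episode:

```python
# Illustrative analysis: pick the number of spare lanes needed to hit an
# assembly-yield target under independent per-wire faults, instead of a
# one-size-fits-all "N spares per 64 lanes" rule. Numbers are assumed.

from math import comb

def link_yield(wires: int, fault_prob: float, spares: int) -> float:
    """P(at most `spares` faulty wires), i.e. the link still assembles."""
    return sum(
        comb(wires, k) * fault_prob**k * (1 - fault_prob)**(wires - k)
        for k in range(spares + 1)
    )

wires = 2000          # assumed wires in one die-to-die interface
fault_prob = 1e-4     # assumed chance a single wire fails to connect
target = 0.999        # assumed yield target for this interface

spares = 0
while link_yield(wires, fault_prob, spares) < target:
    spares += 1
print(f"Spare lanes needed for {target:.1%} yield: {spares}")  # -> 3 here
```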
Right. Ah, interesting. So, Mark, final question. What's next? How do you see die-to-die
technology evolving over the next, you know, few years? Sure, sure. So it's already evolving. We're
already seeing significant changes in the way die-to-die technology is used. We're building more
and more different implementations of it to either reduce the area that we take up in a customer's
die by having a very shallow IP or, you know, for example, die-to-die in 3D applications,
which is a total rethinking of how those systems tend to work. So, you know, we're seeing all of these
changes, all of these new topologies, new package technologies.
And one of the things we're really excited about and looking forward to is really how we can start using die-to-die to integrate optics seamlessly, you know, even doing things like disaggregating memory.
Yeah.
You know, I've worked with Marvell for many years.
I'm here in Silicon Valley, and I've been in this industry for 40 years.
And Marvell is just an amazing company.
So it's great to meet you, Mark, and thank you for your time.
Wonderful to meet with you, too.
It's a pleasure to speak with you today.
That concludes our podcast.
Thank you all for listening and have a great day.
