In The Arena by TechArena - UCIe Unleashing Chiplet Innovation with NVIDIA
Episode Date: January 24, 2023
TechArena host Allyson Klein chats with NVIDIA Data Center Product Architect and Universal Chiplet Interconnect Express organization board member Durgesh Srivastava about the new UCIe specification and how it will reshape the foundations of compute architectures.
Transcript
Allyson Klein: Welcome to the TechArena. My name is Allyson Klein, and today I'm delighted to be joined by Durgesh Srivastava, Senior Director and Data Center Architect at NVIDIA. Durgesh, welcome to the program.
Durgesh Srivastava: Thank you, Allyson. So glad to be here.
Allyson Klein: So, Durgesh, you work on a lot of exciting technologies for NVIDIA, but why don't we just start with an introduction of your role and how it relates to our topic today, UCIe.
Durgesh Srivastava: Great. So, I'm a Senior Director in Hardware Engineering at NVIDIA, driving the product architecture for several server products. My focus has been to look at the silicon and systems to address the growing need for compute, and we'll touch on that a little bit more. I also represent NVIDIA in the UCIe consortium as a board member, and that's how you and I connected. Prior to joining NVIDIA a year and a half ago, I was at Intel for a little over 24 years. I worked on various projects, starting with Itanium and Xeon, and I worked on chipsets, Atom SoCs, and client products. I also spent some time designing the autonomous driving solution for data centers. Specifically, in my last couple of years at Intel, I was very active in server memory pooling and disaggregation, which we'll touch on a little bit, to solve the issues related to memory over-provisioning. So overall, to summarize, I come from a systems and silicon background. That's what I am passionate about, and I continue to work on those problems at NVIDIA.
Allyson Klein: That's fantastic. You know, a lot has been written about the slowing of Moore's law and the industry looking for innovative ways to continue to scale performance. We've seen things like multi-core processors and heterogeneous solutions come to the table, but it seems like we need more. What methods are being considered, and how does UCIe enter that picture?
Durgesh Srivastava: That's a very, very good question, Allyson. It's on everyone's mind in the industry, trying to figure out what we do. So as background, as you're saying, compute requirements are growing at an exponential rate due to AI and deep learning usages. We have seen ChatGPT these days, which is amazing in how it has been trained and developed. Just to give you the data, the size of the data and AI market is on track to break the 500 billion mark by 2024. So as you can see, it's pretty big. And the current computing solutions we have are not really catching up at the same rate. The requirements are growing much faster. Computing solutions are growing, but not that fast. As an example, memory in terms of gigabytes per dollar is saturating. It's not getting cheaper. CPU performance improves at the regular cadence, like 10, 15, 20%, and power efficiency is not improving. So overall we have a big gap between requirements and solutions.
So what needs to be done, even before we go into UCIe? The first thing is we have to redefine the way we do silicon, the way we do compute, and the way we do infrastructure and systems for the data center. There'll be a lot of customization instead of general-purpose designs, just to address this, to get more compute or to become more power efficient. And this will help address the slowdown of Moore's law. What I believe is that we must move toward disaggregation and accelerators to address the gap, and accelerators will help with that, particularly in specific usages. We must also build servers, racks, and data center structures a little differently from what we have been doing for multiple decades. And last but not least, we also have to make sure we are designing for sustainability, because power consumption and dissipation keep going higher and higher. How do we keep a balance on the requirements while keeping a greener planet as well? So what we thought is that chiplets are one way of providing a solution to these increasing compute requirements: break the design into smaller pieces and try to address it. We can talk more about it, but the usages, like video transcoding, are already there. Compression usages are there. So if these chiplets exist, people can plug and play, use them, and address the compute requirements as well as try to solve the slowdown of Moore's law.
Allyson Klein: So let's take a step back. It was five years ago that DARPA started its chiplet group and the industry started to get serious about chiplets. Let's just get a definition: what are chiplets, and what does the commercial delivery look like for chiplet-based architectures?
Durgesh Srivastava: Yeah, that's again an important point to clarify and define.
Even within companies, there are different terminologies. Sometimes people use chiplets and dielets, and I've also heard microchips from some people. So yeah, a chiplet architecture really involves small, modular chips, which we call chiplets. They are like smaller blocks. You put them together and create a larger, more complex system. So the goal really is to provide flexibility in the design and manufacturing process. What that means is that one chiplet, that small building block, could be on a different process technology or even come from a different company or organization, and you can plug it together with another one that comes from a totally different environment. This reduces the cost as well, because you can use the same chiplet for different products. It's plug and play, really.
Putting it another way, chiplets are one way of providing more compute with application-specific flexibility, which you are seeing more and more as you generate more products and product combinations and as the usages grow. As I was touching on previously, let me elaborate a little. Examples are video transcoding, compression, encryption, and memory tracking or tracing. All these things are needed. Some applications require them, some don't. So if we come up with chiplet solutions for these kinds of accelerators, we can combine them, use them where they are needed, and not use them where they're not needed. So you can see that, with accelerators, we are trying to provide the solution for the growing compute requirements, and we also don't have silicon sitting there that is not being utilized.
So it helps in all sorts of ways. And the other thing you asked is what I think of the whole ecosystem. The chiplet ecosystem is going to grow, and I'm already seeing a lot of interest and a lot of things happening. The way it will evolve is that there will be chiplet-based accelerators, as I mentioned, but there will also be companies providing the packaging solutions, like 2D, 2.5D, and 3D. And then there will be companies doing the integration for those companies that need these solutions. So chiplets make that multi-vendor implementation possible. I want to see this Lego plug and play. You want something, you go to a portfolio, you look at what's there, and you churn out a chip that can be deployed very quickly and comes out as a solution to the industry very fast.
Allyson Klein: Now, that Lego plug and play is something that made me really excited to see the announcement of the UCIe consortium. You're one of the lead architects on it. It's a new standard interconnect for chiplets. Why is this specification so important for the industry and that vision of Lego plug and play?
Durgesh Srivastava: I'm very excited about UCIe. I'm really glad that we are doing it for the industry and bringing a whole ecosystem together. So what is UCIe? UCIe provides a complete die-to-die interconnect, or, in the sense we're talking about, an interconnect between chiplets. And it is not just focused on specific layers. Generally, these protocols are layered protocols. So it has the physical interconnect, which is the electricals. It goes up to the protocol stack, which is how it talks to the back end of a specific chip. And it has a software model, the whole manageability piece, and compliance testing. So the way we are doing it, it provides this Universal Chiplet Interconnect Express interface as a complete solution. Now, as you can see, independent companies can come and develop these solutions, and they can talk to each other on the package itself. That's what is driving the interest: making it an industry standard so that people can innovate and develop independently. And we have leveraged, as you have seen in the announcement, PCI Express and Compute Express Link as the backbone of it, but it's not just tied to those. The whole ARM ecosystem is also very excited about it, because you can plug in a streaming protocol, which is part of UCIe. So in the end, it's really a multi-vendor ecosystem for a system on a chip, which is like a full motherboard coming onto the chip.
Customization is going to be extremely useful here. The main thing is the open standard, bringing everyone together. We have 100-plus members now, and just bringing together the goodness of various experts, various people, is going to go a long way. The last thing I will add is that we also have an eye on the future. We are not just defining for the problems of today, but for how we will scale in the future. We have a modular design within the chiplet architecture, so if you need higher bandwidth you can have more modules, and so on and so forth. We kept in mind how we will keep scaling as the compute requirements keep going up and up.
Allyson Klein: So you've created this vision of Lego plug and play: choose your best-of-breed chiplets for your solution and utilize UCIe to connect them. But what we've seen from the industry thus far is more use of proprietary interconnects for chiplet designs, and chiplets provided by a single vendor. And there's also been a lot of industry attention on CXL, the industry standard that offers chip-to-chip interconnects on a motherboard. How do we get to the future vision from where we are today? Is there a role for all of these solutions as we move forward? How do they fit between multi-chip and chiplet architectures?
Durgesh Srivastava: Excellent point and excellent question. The basis of UCIe is: how do we standardize, rather than having every company go with its own solution, which I can loosely call proprietary? In some sense it is, because companies have to address their own requirements. So let me deep dive into the first part of what you asked, which is extremely important. All the big silicon providers, like us at NVIDIA, have our own chip-to-chip interconnect. The reason is that, as we discussed, Moore's law is slowing down and we have computing requirements to meet, so we have come up with our own interconnect, which is very tightly coupled and meets our requirements. In the same way, other big silicon vendors have solutions that address their problems. That's the main reason to come up with this industry standard, so that we don't diverge too much in terms of the chiplets, the packaging technology, and the substrate, and so that we can all come together, put our minds together, and have a standard. I do still believe that in certain usages, silicon providers like us will continue to have a tightly coupled, proprietary interconnect just for those specific usages, which are internal and inward looking. But anything that goes outward, where we have accelerators or another chip talking to a third-party provider, will definitely go through UCIe.
And UCIe is the best solution to address that. The second part of the question was about how this whole thing ties in with PCIe and CXL, and then ARM has its own standard, which is a pretty big ecosystem. So the goal is that UCIe provides the interconnect between the chiplets. What that means is that whatever IP or internal protocol you have, whether it's ARM-based, CXL, or PCIe, you're building on top of it. You're complementing it. That's the beauty of UCIe: as an interconnect between chiplets, it brings all of these protocols, which are running in the background on the chip, together so that they can talk to each other and interact with each other, and they do not have to be exactly the same. That's the beauty of the open ecosystem, leveraging the industry and putting it together. And as you see in the industry, it's not just the silicon providers. There are packaging companies. TSMC is a board member. Samsung is there. So putting all of that together as a complete solution helps to build that ecosystem: not having just proprietary solutions, not going with just one specific solution, but leveraging CXL, PCIe, or any of the ecosystem protocols that exist.
Allyson Klein: It's an exciting future, and you can see how it really puts the power in the hands of the customer to dial in exactly what they need. Now, I have to ask the question: NVIDIA obviously has a great silicon portfolio, and you're investing in this space because it's part of your strategy. What are NVIDIA's plans for integration of UCIe into your lineup?
Durgesh Srivastava: Yeah, we are very excited. That's why, as you can see, we are a promoter and board member, and I'm really proud to represent NVIDIA on the board. As we've mentioned so far, chiplet interconnects are more important than ever, and we are seeing that they are needed so that big workloads can scale. We are leaders in the GPU market for training and AI workloads. And we have definitely seen that heterogeneous compute engines can execute at their full potential, and homogeneity is no longer helping, or not helping at scale. So as workloads continue to grow, we need to move past some of today's platform-level limitations and enable innovation and system integration. We are going to fully support UCIe across our product range as we go further, and we are also helping to promote UCIe adoption with our partners. So wherever there's a need, and wherever there is a chiplet concept with industry and external interaction, we are fully supportive of it. We are very excited about it. We also have our NVLink-C2C, which is chip-to-chip and does something similar, for inward-looking or very tightly coupled applications from customers who need a lot of bandwidth and are working with us, so we will have that. But this whole chiplet ecosystem and UCIe are extremely important for us, and we are really driving some of the innovation in that area.
Allyson Klein: I think that you've laid out so much interesting information for the audience that folks are
going to want to continue to engage and ask questions.
Where can folks go to find out more information about NVIDIA's technology
in this space and engage with your team further?
And where can they find out more information about UCIe?
Durgesh Srivastava: Yes. So let me start with UCIe. The best place to find information is our website, the UCIe consortium website. I really encourage people: if you're not members, please become members, contributing members, and join the technical work groups. We have five of those technical work groups, covering everything from protocol and software all the way down to electrical and compatibility. Help bring the ecosystem together and address the slowdown of Moore's law and where the compute requirements are going. Regarding NVIDIA, the best place is our GTC, which happens twice a year. That's where we bring our new product-related information, our mindset, and where we are going with it. And of course, I can be contacted on LinkedIn if there is a specific question, interest, or anything you want to know. But I do encourage everyone to join the UCIe consortium and help us grow this ecosystem, make it successful, and address the compute requirements.
Allyson Klein: Durgesh, I was so excited to do this episode. I named UCIe one of the most important technologies to follow in 2023, along the lines of ChatGPT. So maybe I'm a geek, but maybe I know the importance of continuing to deliver semiconductor performance. What you and the UCIe team are doing is really exciting. I can't wait to see more. And thank you so much for being on the show today.
Durgesh Srivastava: Thank you, Allyson. I appreciate your time, and thank you for inviting me and giving me an opportunity to talk about where we are going and what we want to do.