In The Arena by TechArena - UCIe Unleashing Chiplet Innovation with NVIDIA

Episode Date: January 24, 2023

TechArena host Allyson Klein chats with NVIDIA Data Center Product Architect and Universal Chiplet Interconnect Express organization board member Durgesh Srivastava about the new UCIe specification and how it will reshape the foundations of compute architectures.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena. My name is Allyson Klein, and today I'm delighted to be joined by Durgesh Srivastava, Senior Director and Data Center Architect at NVIDIA. Durgesh, welcome to the program. Thank you, Allyson. So glad to be here. So, Durgesh, you work on a lot of exciting technologies for NVIDIA, but why don't we just start with an introduction of your role and how it relates to our topic today, UCIe? Great. So, I'm a Senior Director in Hardware Engineering at NVIDIA, and I'm driving the product architecture for several server products. My focus has been to look at silicon and systems to address the growing need for compute, and we'll touch on that a little bit more. And I represent NVIDIA in the UCIe Consortium as a board member, and that's how you and I connected. Prior to joining NVIDIA
Starting point is 00:00:56 a year and a half ago, I was at Intel for a little over 24 years. I worked on various projects, starting from Itanium and Xeon, and I worked on chipsets, Atom SoCs, and client products. I also spent some time designing autonomous driving solutions for the data center. Specifically, in the last couple of years at Intel, I was very active in server memory pooling, which we'll touch on a little bit, and disaggregation to solve the issues related to memory over-provisioning. So overall, to summarize, I come from a systems and silicon background. That's what I am passionate about, solving these problems, and I continue to do that at NVIDIA. That's fantastic.
Starting point is 00:01:42 You know, a lot has been written about the slowing of Moore's law and the industry looking for innovative ways to continue to scale performance. We've seen things like multi-core processors and heterogeneous solutions come to the table, but it seems like we need more. What methods are being considered, and how does UCIe enter that picture? That's a very, very good question, Allyson. It's on everyone's mind in the industry: trying to figure out, what do we do? So, as a background, as you're saying,
Starting point is 00:02:19 compute requirements are growing at an exponential rate due to AI and deep learning usages. We have seen ChatGPT these days, which is amazing in how it has been trained and developed. So overall, just to give you the data, the size of the data and AI market is on track to break the $500 billion mark by 2024. So as you can see, it's pretty big.
Starting point is 00:02:40 And the current computing solutions we have are not really catching up at the same rate. The requirements are growing much faster; computing solutions are growing, but not that fast. As an example, memory, in terms of gigabytes per dollar, is saturating. It's not getting cheaper. CPU performance grows at its regular cadence, like 10, 15, 20 percent, and power efficiency is not improving. So overall we have a big gap between requirements and solutions.
Starting point is 00:03:09 So what needs to be done, even before we go into UCIe? The first thing is we have to redefine the way we do silicon, the way we do compute, infrastructure, and systems for the data center. There will be a lot of customization instead of using general purpose, just to address this: to get more compute or to get more power efficient. And this will help address the slowdown of Moore's law. So what I believe is that we must move toward disaggregation and accelerators to address the gap, and accelerators will help with prediction in the specific
Starting point is 00:03:43 usages. Or we must build servers, racks, and data center structures a little bit differently from what we have been doing for multiple decades. And last but not least, we also have to make sure we are designing for sustainability, because power consumption and dissipation keep going higher and higher, and we have to keep a balance: meeting the requirements while keeping a greener planet as well. So what we thought is that chiplets are one way of providing a solution to these increasing compute requirements: break things into smaller pieces and try to address it. We can talk more about that, but you can see video
Starting point is 00:04:26 transcoding usages are already there. Compression usages are there. So if these chiplets exist, people can plug and play, use them, and define and address their compute requirements, as well as try to solve that slowing of Moore's law. So let's take a step back. It was five years ago that DARPA started its chiplet group and the industry started to get serious about chiplets. Let's just get a definition: what are chiplets, and what does the commercial delivery look like
Starting point is 00:04:57 for chiplet-based architectures? Yeah, that's again an important point to clarify and define. Even within companies, there are different terminologies. Sometimes people use chiplets and dielets, and I've also heard microchips from some people. So yeah, chiplet architecture really involves small and modular chips, which we call chiplets. They are like smaller blocks: you put them together and create a larger, more complex system. So the goal really is to provide flexibility in
Starting point is 00:05:32 design and the manufacturing process. What it means is that you can have one chiplet, a small building block, that could be on a different process technology or even coming from a different company or organization. And you can plug it together with another one, which is coming from a totally different environment. This reduces cost as well, because you can use the same chiplet for different products; it's plug and play, really. So, putting it another way, chiplets are one way of providing more compute for specific applications, with flexibility, which you are seeing more and more as you generate more products and product combinations and the usages keep growing. So,
Starting point is 00:06:10 as I was touching on previously, let me elaborate on that a little bit. Examples are video transcoding, compression, encryption, and memory tracking or tracing. All these things are needed, but some applications require them and some don't. So if we come up with chiplet solutions for these kinds of accelerators, we can combine them, use them where they are needed, and not use them where they're not needed. So you can see that with accelerators, we are trying to provide a solution for the growing compute requirements, while not having silicon sitting around unutilized. So it helps in all sorts of ways. And the other thing you asked is what the whole ecosystem looks like. The chiplet ecosystem is going to grow, and I'm already seeing that there's a lot of interest and a lot of things happening.
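To make that pick-only-what-you-need idea concrete, here is a minimal, purely illustrative sketch of composing a package from a catalog of accelerator chiplets. Every name below (the catalog entries, vendors, and function labels) is hypothetical, invented for this example; it does not describe any real product or vendor flow.

```python
# Hypothetical "Lego-style" chiplet composition: a product pulls in only the
# accelerator chiplets it needs, so no silicon sits idle. All names are made up.
CHIPLET_CATALOG = {
    "video_transcode": {"vendor": "VendorA", "process": "5nm"},
    "compression":     {"vendor": "VendorB", "process": "7nm"},
    "encryption":      {"vendor": "VendorC", "process": "7nm"},
    "memory_tracing":  {"vendor": "VendorA", "process": "5nm"},
}


def compose_package(required_functions):
    """Return the chiplets a product needs, failing loudly if one is unavailable."""
    missing = [f for f in required_functions if f not in CHIPLET_CATALOG]
    if missing:
        raise ValueError(f"no chiplet available for: {missing}")
    return {f: CHIPLET_CATALOG[f] for f in required_functions}


# A streaming appliance might want transcode and compression but skip encryption.
print(compose_package(["video_transcode", "compression"]))
```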
Starting point is 00:07:01 So the way it will evolve is that there will be chiplet-based accelerators, as I mentioned, but there will also be companies providing the packaging solutions, like 2D, 2.5D, and 3D. And then there will be companies doing the integration for the companies that need these solutions. So chiplets make that multi-vendor implementation possible. I want to see this Lego plug and play: you want something, you go into a portfolio, you look at things and churn out a chip, which can be deployed very quickly, and the solution comes out very fast in the industry. Now, that Lego plug and play is something that made me really excited to see the announcement
Starting point is 00:07:45 of the UCIe Consortium. You're one of the lead architects on it. It's a new standard interconnect for chiplets. Why is this specification so important for the industry and that vision of Lego plug and play? I'm very excited about UCIe. I'm really glad that we are doing it for the industry and bringing a whole ecosystem together. So what is UCIe? UCIe provides a complete die-to-die interconnect, or, in the terms we're talking about, a chiplet interconnect.
Starting point is 00:08:18 And it is not just focused on specific layers. Generally, these protocols are layered protocols. So it has the physical interconnect, which is the electrical layer; it goes up to the protocol stack, which is how it talks to the back end of a specific chip; and it has a software model, the whole manageability, and compliance testing. So the way we are doing it, it provides this Universal Chiplet Interconnect Express interface as a complete solution. Now, as you can see, independent companies can come and develop these solutions, and they can talk to each other on the package itself. That's what drives the interest and makes it more of an industry standard, so that people can innovate and develop independently. And we have leveraged, as you have seen in the announcement, PCI Express and Compute Express Link as the backbone of it, but it's not just tied to those. The whole Arm ecosystem also
Starting point is 00:09:17 is very excited about it, because you can plug in a streaming protocol, which is part of UCIe. So really, in the end, it is a multi-vendor ecosystem for a system on a chip, which is like a full motherboard coming onto the chip. And for customization, it is going to be extremely useful. The main thing is the open standard, bringing everyone together. So we now have 100-plus members.
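As a rough mental model of the layering just described, here is a small, purely illustrative sketch: a physical layer made up of one or more modules, and a protocol layer that can carry PCIe, CXL, or a raw streaming protocol. The class and field names are invented for this sketch, and the lane count in the example is a placeholder, not a value taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ProtocolMapping(Enum):
    """Protocol-layer options a UCIe-style link can carry in this toy model."""
    PCIE = auto()       # PCI Express
    CXL = auto()        # Compute Express Link
    STREAMING = auto()  # raw streaming mode for other protocols (e.g., Arm-defined)


@dataclass
class DieToDieLink:
    """Toy model of one die-to-die link; names are illustrative, not from the spec."""
    package_type: str          # e.g., "standard (2D)" or "advanced (2.5D)"
    modules: int               # bandwidth scales by ganging more modules together
    lanes_per_module: int      # placeholder value in the example below
    protocol: ProtocolMapping  # what the protocol layer carries

    def total_lanes(self) -> int:
        # More modules means more lanes and more aggregate bandwidth, which is
        # the modular-scaling point made in the conversation.
        return self.modules * self.lanes_per_module


# Example: a hypothetical advanced-package link carrying CXL over four modules.
link = DieToDieLink("advanced (2.5D)", modules=4, lanes_per_module=64,
                    protocol=ProtocolMapping.CXL)
print(link.total_lanes(), link.protocol.name)  # -> 256 CXL
```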
Starting point is 00:09:44 And just bringing together the goodness of various experts, various people, that's going to go a long way. The last thing I will add is that we also have eyes on the future. We are not just solving the problems of today; we also define how we will scale in the future. So we have a modular design within the chiplet architecture, and if you need high bandwidth, you can have more modules, and so on and so forth. We kept in mind how we will keep scaling as the compute requirements keep going up and up. So you've created this vision of Lego plug and play: choose your best-of-breed chiplets for your solution and utilize UCIe to connect them. But what we've seen
Starting point is 00:10:27 from the industry thus far is more use of proprietary interconnects for chiplet designs, and chiplets provided by a single vendor. And there's also been a lot of industry attention on CXL, the industry standard that offers chip-to-chip interconnects on a motherboard. How do we get to the future vision from where we are today? And is there a role for all of these solutions as we move forward? How do they fit between multi-chip and chiplet architectures? Excellent point and excellent question. The very basis of UCIe is this: how do we standardize, rather than each company going with its own interconnect, which I can loosely call proprietary.
Starting point is 00:11:14 In some sense it is proprietary, because companies have to address their own requirements. So that was your first question, and it is extremely important, so let me deep dive into it. All the big silicon providers, like us at NVIDIA, have our own chip-to-chip interconnects. The reason is that, as we discussed, Moore's law is slowing down and compute requirements are growing, so we have come up with our own interconnect, which is very tightly coupled and meets our requirements. And in the same way, other big silicon vendors have their own solutions that provide a way to address the problem. That's the main reason to come up with
Starting point is 00:11:52 this industry standard: so that we don't diverge too much in terms of the chiplets, in terms of the packaging technology, in terms of the substrate, and so that we can all come together, put our minds together, and have a standard. So I do still believe that in certain usages, silicon providers like us will continue having a tightly coupled or proprietary interconnect just for those specific usages, which are internal and inward-looking. But anything going outward, where we can have accelerators or another chip talking to a third-party provider, that will definitely go through UCIe, and UCIe is the best solution to address that. And the second part of your question was about how this whole thing ties in PCIe, CXL, and then Arm, which has its own standard and a pretty big ecosystem. So the goal is that UCIe is providing the interconnect between the chiplets.
Starting point is 00:12:51 What it means is that whatever IP you have, or whatever internal protocol, whether it's Arm-based, CXL, or PCIe, you're building on top of it. So it's trying to complement them. That's the beauty of UCIe: as an interconnect between chiplets, it brings all of these protocols, which are running in the background on the chip, together so that they can talk to each other and interact with each other, and they do not have to be exactly the same. And that's the beauty of the open ecosystem, leveraging the industry and putting it together. And as you can see in the industry, it's not just the silicon providers.
Starting point is 00:13:33 There are packaging companies; TSMC is a board member, and Samsung is there. Putting it all together as a complete solution helps to build that ecosystem: not having just proprietary solutions, not just going with one specific solution, but leveraging CXL, PCIe, or any of the ecosystem protocols which exist. It's an exciting future,
Starting point is 00:13:59 and you can see how it really puts the power in the hands of the customer to dial in exactly what they need. Now, I have to ask the question, NVIDIA obviously has a great silicon portfolio and you're investing in this space because it's part of your strategy. What are NVIDIA's plans for integration of UCIe
Starting point is 00:14:19 into your lineup? Yeah, we are very excited. That's why, as you can see, we are a promoter and a board member, and I'm really proud to represent NVIDIA on the board. As we've discussed so far, chiplet interconnects are more important than ever, and we are seeing that with big workloads and how we can scale. We are leaders in the GPU market for training and AI workloads, and we have definitely seen that heterogeneous compute engines can execute at their full potential, while homogeneity is no longer helping, or not helping at scale. So as workloads continue to grow, we need to move past some of today's platform-level limitations and enable innovation and system integration.
Starting point is 00:15:05 So we are going to fully support UCIe across our product range as we go forward, and we are also helping to promote UCIe adoption with our partners. So wherever there's a need, and wherever there is a chiplet concept involving industry and external interaction, we are fully supportive of it. We are very excited about it. We also have our NVLink C2C, which is chip-to-chip and does something similar, but that is for inward-looking or very tightly coupled applications from customers who need a lot of bandwidth and are working with us, so we will have that. But this whole chiplet ecosystem and UCIe are extremely important for us, and we are really driving some of the innovation in that area.
Starting point is 00:15:51 I think that you've laid out so much interesting information for the audience that folks are going to want to continue to engage and ask questions. Where can folks go to find out more information about NVIDIA's technology in this space and engage with your team further? And where can they find out more information about UCIe? Yes. So let me start with UCIe. The best place to find information about it is our website, the UCIe Consortium website.
Starting point is 00:16:21 I really encourage people, if you're not members, please become members, contributing members, and join the technical work groups. We have five of those technical work groups, covering everything from protocol and software all the way down to electrical and compatibility, to help build the ecosystem and address the slowdown of Moore's law and where the compute requirements are going. Regarding NVIDIA, the best venue is our GTC, which happens twice a year, where we bring our new information, related products, and our mindset of where we are going. And of course, regarding me, I can be contacted on LinkedIn if there is a specific question, interest, or anything you want to know. But I do encourage everyone to join the UCIe Consortium and help us grow this ecosystem, make it successful, and address the compute requirements.
Starting point is 00:17:19 Durgesh, I was so excited to do this episode. I named UCIe one of the most important technologies to follow in 2023, along the lines of ChatGPT. So maybe I'm a geek, but maybe I know the importance of continuing to deliver semiconductor performance. What you and the UCIe team are doing is really exciting. I can't wait to see more. And thank you so much for being on the show today. Thank you, Allyson. I appreciate your time, and thank you for inviting me and
Starting point is 00:17:51 giving me the opportunity to talk about where we are going and what we want to do.
