In The Arena by TechArena - Re-imagining Memory through Advanced Chiplet Innovation with ZeroPoint Technologies’ Nilesh Shah

Episode Date: April 24, 2024

TechArena host Allyson Klein chats with ZeroPoint Technologies' VP of Business Development Nilesh Shah about the AI era's demands for memory innovation, how advanced chiplet architectures will assist semiconductor teams in advancing memory access for balanced system delivery, and how ZeroPoint Technologies plans to play a strategic role in this major market transition.

Transcript
Starting point is 00:00:00 Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators and our host, Allyson Klein. Now, let's step into the arena. Welcome to the Tech Arena. My name is Allyson Klein, and we're recording this week from OCP Lisbon, and it's been a fantastic week already. I'm so excited to have Nilesh Shah on the show, VP of Business Development at ZeroPoint Technologies. Welcome to the show, Nilesh. How's it going? It's going great, thanks. Now, I feel like you and I have been circling the globe together. I saw you at Chiplet Summit, I saw you at GTC, and here we are in Lisbon at OCP.
Starting point is 00:00:53 Why don't you just provide some background on ZeroPoint Technologies, since this is the first time the company is on the show, and your role at the company. Absolutely. I'll give you a brief background on the company. So, ZeroPoint, and yes, we are at OCP in Europe. So, ZeroPoint is actually a startup based out of Sweden. It was founded back in 2016. It came out of Chalmers University in Sweden and was founded by Per Stenstrom
Starting point is 00:01:28 who is pretty well known in the computer architecture circles and they've been researching compression algorithms for several years and that's where the company sprung out of, got some initial funding and since then had a couple of other runs of funding. The company is primarily focused on data compression. What they figured out is, of course, there's an explosion of data, the data centers and devices and whatnot. But what they figured out is 75% of the data is redundant or can be compressed in some way. So that's the focus of the company.
Starting point is 00:02:06 So ZeroPoint is an IP licensing company. We develop hardware IP components primarily focused on compression across the memory hierarchy, starting from cache compression all the way down to flash compression, and then everything in between: CXL, main memory, and so on and so forth. You know, with what you're talking about, there's always been value in compression, but when you consider the amount of data analysis and data movement in the AI era, you can start really seeing why this technology would have a lot of value for a lot of people. And what attracted me to talk to ZeroPoint is just the fact that we are entering the chiplet economy, and a company like ZeroPoint
Starting point is 00:02:51 has such a huge market opportunity within the open chiplet space. You're giving a talk this week in Lisbon on chiplet-based LLC cash and memory expansion. I get asked about this topic all the time from different clients about AI requirements for memory and really pushing memory capabilities. Let's just back up a bit and discuss why is there this inherent problem with system memory for today's workloads? That's a great question.
Starting point is 00:03:24 So, of course, right now, if you look at all the noise around AI, and it's not just noise, it's actual deployments and investments following it. There's several reasons for the way these new types of workloads have evolved. Let's just start with the LLM, the large language model. That's a great and popular workload that a lot of people are familiar with. So if you just look at LLMs and the need there, if you've interacted with applications like ChatGPT, etc., it's all about responsiveness. So when you need this kind of responsiveness, there's actually a lot of work that went in before that to train these humongous models. And then when you're interacting with them, you need to run them in real time.
Starting point is 00:04:12 The components of these models, there's weights, there's parameters, and several other things that go around this model. And where do they need to be stored? So memory. And then with accelerators like GPUs, that's the primary poster child for AI is the GPU. And so GPUs go hand-in-hand with HP and 5-bandwidth memory. And one of the main reasons for that is you need to keep these compute engines busy all the time. So that's where you get your responsiveness and performance. So of course the closer the memory is to the
Starting point is 00:04:51 compute unit and the higher the bandwidth and lower the latency, that's what actually differentiates you. So this is in contrast to let's say a CPU based architecture where you had multiple levels of memory hierarchies. Maybe you have an L1 cache, an L2 cache, an L3, and you went out to DRAM and then you went out to an SSD. So you have these multiple layers. But in these AI type of workloads, what you need is a lot of memory,
Starting point is 00:05:21 very fast, high throughput, and it has to be predictable. So when you look at your normal CPU architectures, a lot of out-of-order processing that occurs, there's a lot of variability in your responsiveness. With some of these accelerators, GPUs, and even if you look at some of the custom chips that the industry is developing, look at architectures like Brock or TenStorm. A lot of emphasis on predictable performance, and that's driving the need for on-chip memories. When we talk about on-chip memories, we're talking SRAM. And the next level down is maybe HBM.
Starting point is 00:06:01 So, of course, HBM is the poster child of chiplet done right. There's quite a lot of work in the JEDEC standards, et cetera, to be done. So these are the workloads driving the need for memory in a chiplet format, so you can assemble them rapidly, with high bandwidth, and then actually with the performance that's required. You just killed my memory-storage hierarchy pyramid with what you just talked about. I'm going to mourn it for a little while. But you know, HBM is a great example of 2.5D and 3D packaging technologies bringing that core capability
Starting point is 00:06:39 on die and in delivering a different way that you can actually couple memory with logic. Take me back into why this sets us up perfectly for chiplet architectures and where do you see this going? Yeah. By the way, so when I think of HPM and when you think of the memory hierarchy, so AI is, you can segment it into this AI training, but that's just one piece of it. And the next piece of it, down from the training work,
Starting point is 00:07:17 what the industry calls foundational model, training models from scratch. So that's one area. The second area is then fine-tuning. So that's where, so maybe in the foundational model training space, there's a handful of companies, the multi-billion dollar companies that are in that space. Then you have the fine-tuning space where you can take some of these foundational models and then fine-tune them for your disease. So there you have a lot more opportunity for the memory hierarchy coming into the play.
Starting point is 00:07:51 And then you've made all these investments in these large clusters of HBM-based memory deployments. Now you need to recoup that at some point. So how do you recoup all this investment? Well, that's when you actually do inference and deliver some meaningful results that you can monetize. And when you're looking at those inference use cases, now the memory hierarchy becomes pretty critical in terms of delivering performance at a reasonable TCO, or total cost of ownership. So no, the memory hierarchy doesn't go away.
Starting point is 00:08:26 It's just the different stages of these workrooms. Now, we are at the dawn of what we could consider an open chiplet era. You know, a couple years ago, we saw the UCIE specification come out that enabled open interconnects for chiplets. How do you see this industry evolving, and why is memory such an important part of the equation in terms of full chiplet designs? Yeah, if you look at the...
Starting point is 00:08:56 So two parts to this question. One is the chiplet industry in general, and why memory for chiplets? So one thing about memory, when we talk about, if you look at how memory is scaling, so let's take a look at SRAM memories specifically. If you looked at some of the data put out by all of the large fab manufacturers, the TSMCs and Intels of the world,
Starting point is 00:09:27 if you look at as they shrink the process down, going down to seven to five to three nanometers, the logic is scaling really well. So it scales as you would expect. You shrink the process, everything shrinks well. But the SRAM is not scaling as you would expect it. So that's where you run into challenges. So if you take a look at your monolithic ASIC design,
Starting point is 00:09:55 where you design everything on a single process node, IO and memory and processing, that worked well until we got to these really small nodes. And now you're running into the challenge where memory is not scaling. So it doesn't make sense to pack everything to the same process node. That's where chiplets come into play, where I may want to have my compute die on a different process node, say two nanometer, but then maybe I want my memory on the process. So I can package them separately. And the chiplet is a great way to do it, to split up the functionality. And maybe I want my IO package separately. So that's where this interplay of chiplets and memories becomes
Starting point is 00:10:39 almost kind of a natural. What are the impediments today that you see to move forward with chiplets? And what do you see from the ecosystem that gives you a lot of optimism? Yeah, if you look at chiplets, you spoke about the chiplet summit, and there's a lot of industry standards that have developed. But let's first start with what's preventing chiplets from just exploding today.
Starting point is 00:11:06 You spoke earlier about interconnect technologies and standards like UCIe. They solved the challenge of connecting two chiplets together, but that's just step one. Now you've connected these two chiplets together, but there's a lot of other things that need to be standardized. For example, let's take HBM, which is a successful example of a very specific instantiation of a chiplet. It's very well defined. There's a JEDEC standard around it, how to configure it, and so forth. But with chiplets, when you define the interconnect, let's say UCIe or, here at OCP, there's Bunch of Wires. There's several other ways to just link two chiplets together.
Starting point is 00:11:53 That's great, but that's step one. Now, the next step is how do these two chiplets talk to each other? Because they may be running completely separate protocols on these chiplets. Maybe like an Axie interconnect on one, and that may be the system interconnect. Now you need to get these chiplets talking together. You need to be able to test them in a common framework. You need tools if you're using different processes.
Starting point is 00:12:19 So all of these are things that need to come together for the chiplet economy to really boom and bustle. Now, what gives me hope is when you look around at the standards that are developing and putting effort and the investment, right? It's one thing to build standards, but then you look at the number of companies investing time, effort, and energy to actually put out these chips. I think that's the most positive sign of the demand. And clearly, the biggest value proposition of chiplets is the ability to improve your yield and lower your total cost. And not just cost in terms of dollars, but also time to market.
Starting point is 00:13:02 If I want to put together a solution, let's say for the automotive market, I might have a low-end car, a mid-range car, and a high-end car. And in all of these, I may have different levels of functionality. So that is actually the poster child for the chiplet economy, where with chiplets, you can actually mix and match
Starting point is 00:13:22 these different capabilities you need and get to market very rapidly. I see the same thing being driven in the data center and the server use cases as well. So I think that's where the industry is making the push to broaden the impact of chiplets and this cost-benefit beyond this certain vertical market. Now, you are the VP of Business Development for Zero Point. How does your company plan to disrupt in this arena? Yeah, I'd love to chat about that. So Zero Point Technologies, like I mentioned, we have algorithms.
Starting point is 00:14:03 So one of the key innovations of ZeroPoint Technologies is a cacheline-granularity compression algorithm that actually works on a single 64-byte cacheline. Now, if you look at some of the industry standard algorithms, there are several wonderful algorithms that work great for storage technologies, Zstandard, and so on and so forth. These work great when you're working with big chunks of data, like a page or a block, or maybe even files. There's a lot of applications where you need to compress an entire file.
Starting point is 00:14:39 But when you're talking about memory, you don't have enough time you don't have seconds or milliseconds no you don't even have microseconds you have nanoseconds if you're gonna do anything you have a few nanoseconds to be done with it and that's the technology that Zero Point has developed is a cash-slang granularity algorithm which operates in a few nanoseconds and compresses and decompresses on the fly. That's incredible. So that's what enables us to integrate ourselves
Starting point is 00:15:10 even at like an L3 or system-level cache on the chip and then even going out to CXL memory, which is still memory. It's, of course, SRAM is much faster. But the same technology can be applied to CXL and other memories. So essentially, our job is to get all of the data. We want to do compression and decompression, but we want to be invisible. And that's where Zero Point is really disrupting the industry by helping deliver the solution. And if you talk about chiplets,
Starting point is 00:15:45 the value proposition of chiplets is cost reduction. And if you are going to cost reduce, but then you are going to build a chiplet, you have added some other components like IRO and other things out there. So you want to amortize that cost more effectively. What better way if you're going to make a chiplet memory, the best way or one of the ways you could
Starting point is 00:16:07 amortize cost is compress that by 2 to 4x. So now almost your development cost has been amortized over this 2 to 4x memory capacity versus just the amount of physical space. So instead of having 4x the physical space,
Starting point is 00:16:24 now you have the same effective capacity but with the same physical space. So instead of having 4x the physical space, now you have the same effective capacity, but with the same physical space. That's incredible, and I'm sure some of our listeners are quite interested in what you just said and would like to learn more. Where would you send them for more information? For more information, of course, if folks happen to be at, Zero Point is pretty much at all of the English-spe industry standard events, the future of memory and storage, and then the CXL events, the CXL DevTones.
Starting point is 00:16:54 We have several other places where we are presenting. We're at pretty much all the standards bodies like OCP and SNEA, JEDEC, CXL Consortium. So these are great places to interact. Of course, there's a website. We have a newsletter that you can sign up for, so you can stay tuned. We have our LinkedIn site.
Starting point is 00:17:16 You can connect onto our LinkedIn page. And then, of course, feel free to get in touch with me. I'm happy to chat with you. I love chatting with people, especially about solving these problems for chiplets and data center and even devices. We've taken all these domains. So yeah, I would love to reach out to me directly as well.
Starting point is 00:17:40 Thank you so much, Nilesh, for taking some time out of your schedule. I know that you're teaching here. You're talking to folks. It was great having you on the tech Melesh, for taking some time out of your schedule. I know that you're teaching here. You're talking to folks. It was great having you on the Tech Arena. Thanks, Alison. It was great being here. Thanks for joining the Tech Arena. Subscribe and engage at our website, thetecharena.net.
Starting point is 00:17:59 All content is copyright by The Tech Arena.
