In The Arena by TechArena - Re-imagining Memory through Advanced Chiplet Innovation with ZeroPoint Technologies’ Nilesh Shah
Episode Date: April 24, 2024. TechArena host Allyson Klein chats with ZeroPoint Technologies’ VP of Business Development Nilesh Shah about the AI era's demands for memory innovation, how advanced chiplet architectures will assist semiconductor teams in advancing memory access for balanced system delivery, and how ZeroPoint Technologies plans to play a strategic role in this major market transition.
Transcript
Welcome to the Tech Arena, featuring authentic discussions between tech's leading innovators
and our host, Allyson Klein.
Now, let's step into the arena.
Welcome to the Tech Arena. My name is Allyson Klein, and we're recording this week
from OCP Lisbon, and it's been a fantastic week already. I'm so excited to have Nilesh Shah on the
show, VP of Business Development at ZeroPoint Technologies. Welcome to the show, Nilesh. How's
it going? It's going great, thanks. Now, I feel like you and I have been circling the globe together.
I saw you at Chiplet Summit, I saw you at GTC, and here we are in Lisbon at OCP.
Why don't you just provide some background on ZeroPoint Technologies,
since this is the first time the company is on the show, and your role at the company.
Absolutely. I'll give you a brief background on the company.
So, ZeroPoint, and yes, we are at OCP in Europe.
So, ZeroPoint is actually a startup based out of Sweden.
It was founded back in 2016.
It came out of Chalmers University in Sweden and was founded by Per Stenström,
who is pretty well known in computer architecture circles. They had been researching compression
algorithms for several years, and that's where the company sprung from. It got some initial
funding and since then has had a couple of other rounds of funding. The company is primarily focused on data compression.
Of course, there's an explosion of data
in data centers and devices and whatnot.
But what they figured out is 75% of the data is redundant
or can be compressed in some way.
So that's the focus of the company.
So ZeroPoint is an IP licensing company.
We develop hardware IP components primarily focused on compression across the memory hierarchy,
starting from cache compression all the way down to flash compression, and then everything
in between: CXL, main memory, and so on and so forth.
You know, with what you're talking about, there's always been value in compression, but when you consider
the amount of data analysis and data movement in the AI era, you can start really seeing why this
technology would have a lot of value for a lot of people. And what attracted me to talk to ZeroPoint is just the fact that we are entering the chiplet economy,
and a company like ZeroPoint
has such a huge market opportunity
within the open chiplet space.
You're giving a talk this week in Lisbon
on chiplet-based LLC cache and memory expansion.
I get asked about this topic all the time from different clients about AI requirements for memory
and really pushing memory capabilities.
Let's just back up a bit and discuss why there is this inherent problem with system memory for today's workloads.
That's a great question.
So, of course, right now, if you look
at all the noise around AI, and it's not just noise, it's actual deployments and investments
following it. There's several reasons for the way these new types of workloads have evolved.
Let's just start with LLMs, large language models. That's a great and popular workload that a lot of people are familiar with.
So if you just look at LLMs, and if you've interacted with applications like ChatGPT,
etc., it's all about responsiveness. So when you need this kind of responsiveness, there's
actually a lot of work that went in before that to train these
humongous models. And then when you're interacting with them, you need to run them in real time.
The components of these models, there's weights, there's parameters, and several other things that
go around this model. And where do they need to be stored? In memory. And then with accelerators like GPUs,
the GPU being the primary poster child for AI,
GPUs go hand-in-hand with HBM, high-bandwidth memory.
And one of the main reasons for that is
you need to keep these compute engines busy all the time.
So that's where you get your responsiveness and
performance. So of course the closer the memory is to the
compute unit and the higher the bandwidth and lower the latency, that's
what actually differentiates you. So this is in contrast to
let's say a CPU based architecture where you had multiple levels
of memory hierarchies.
Maybe you have an L1 cache, an L2 cache, an L3,
and you went out to DRAM and then you went out to an SSD.
So you have these multiple layers.
But in these AI-type workloads, what you need is a lot of memory,
very fast, high throughput, and it has to be predictable. So when you
look at your normal CPU architectures, with all the out-of-order processing that occurs,
there's a lot of variability in your responsiveness. With some of these
accelerators, GPUs, and even some of the custom chips that the
industry is developing, architectures like Groq or Tenstorrent,
there's a lot of emphasis on predictable performance, and that's driving the need for on-chip memories.
When we talk about on-chip memories, we're talking SRAM.
And the next level down is maybe HBM.
So, of course, HBM is the poster child of chiplet done right.
There's quite a lot of work in the JEDEC standards and so on behind it.
So these are the workloads driving the need for memory in a chiplet format,
so you can assemble them rapidly, with high bandwidth,
and actually with the performance that's required.
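To put rough numbers on why model weights alone stress memory capacity, here is a minimal back-of-the-envelope sketch in Python. The model size and precision are hypothetical examples, not figures from the conversation:

```python
# Illustrative only: rough storage needed just for a model's weights.
# The parameter count and precision below are hypothetical examples.
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in GB, assuming 2 bytes/param (FP16/BF16)."""
    return num_params * bytes_per_param / 1e9

# A hypothetical 70-billion-parameter model held in 16-bit precision:
print(f"{weight_footprint_gb(70e9):.0f} GB of weights")  # ~140 GB, before
# activations, KV caches, and other working state are even counted
```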
You just killed my memory and storage hierarchy pyramid
with what you just talked about. I'm going to mourn it for a little while. But you know,
HBM is a great example of 2.5D and 3D packaging technologies bringing that core capability
on die and delivering a different way that you can actually couple memory with logic.
Take me back into why this sets us up perfectly for chiplet architectures
and where do you see this going?
Yeah.
By the way, when I think of HBM and when you think of the memory hierarchy,
so AI, you can segment it. There's AI training,
but that's just one piece of it.
And within the training work,
there's what the industry calls foundational model training,
training models from scratch.
So that's one area.
The second area is then fine-tuning.
So maybe in the foundational model training space, there's a handful of companies, the
multi-billion-dollar companies that are in that space. Then you have the fine-tuning
space, where you can take some of these foundational models and then fine-tune them for your own use case.
So there you have a lot more opportunity for the memory hierarchy coming into play.
And then you've made all these investments
in these large clusters of HBM-based memory deployments.
Now you need to recoup that at some point.
So how do you recoup all this investment?
Well, that's
when you actually do inference and deliver some meaningful results that you can monetize.
And when you're looking at those inference use cases, now the memory hierarchy becomes pretty critical
in terms of delivering performance at a reasonable TCO, or total cost of ownership. So no, the memory hierarchy doesn't go away.
It just serves the different stages of these workloads.
Now, we are at the dawn of what we could consider an open chiplet era.
You know, a couple years ago, we saw the UCIE specification come out
that enabled open interconnects for chiplets.
How do you see this industry evolving,
and why is memory such an important part of the equation
in terms of full chiplet designs?
Yeah, if you look at the...
So two parts to this question.
One is the chiplet industry in general,
and why memory for chiplets?
So one thing about memory, when we talk about,
if you look at how memory is scaling,
so let's take a look at SRAM memories specifically.
If you looked at some of the data put out by all of the large fab manufacturers,
the TSMCs and Intels of the world,
if you look at how, as they shrink the process down,
going down to seven to five to three nanometers,
the logic is scaling really well.
So it scales as you would expect.
You shrink the process, everything shrinks well.
But the SRAM is not scaling as you would expect it.
So that's where you run into challenges.
So if you take a look at your monolithic ASIC design,
where you design everything on a single process node,
IO and memory and processing,
that worked well until we got to these really small nodes. And now you're running into
the challenge where memory is not scaling. So it doesn't make sense to pack everything to the same
process node. That's where chiplets come into play, where I may want to have my compute die on a
different process node, say two nanometers, but then maybe I want my memory on a different process. So I can package
them separately. And the chiplet is a great way to do it, to split up the functionality. And maybe
I want my IO package separately. So that's where this interplay of chiplets and memories becomes
almost kind of natural. What are the impediments today that you see to moving forward with chiplets?
And what do you see from the ecosystem
that gives you a lot of optimism?
Yeah, if you look at chiplets,
you spoke about the chiplet summit,
and there's a lot of industry standards that have developed.
But let's first start with what's preventing chiplets
from just exploding today.
You spoke earlier about interconnect technologies and standards like UCIe.
They solved the challenge of connecting two chiplets together, but that's just step one.
Now you've connected these two chiplets together, but there are a lot of other things that need to be standardized.
For example, let's take HBM, which is a successful example of a very specific instantiation of a chiplet.
It's very well defined.
There's a JEDEC standard around it, how to configure it, et cetera.
But with chiplets, when you define the interconnect, let's say UCIe, or here at OCP, Bunch of Wires.
There's several other ways to just link two chiplets together.
That's great, but that's step one.
Now, the next step is how do these two chiplets talk to each other?
Because they may be running completely separate protocols on these chiplets.
Maybe like an AXI interconnect on one,
and that may be the system interconnect.
Now you need to get these chiplets talking together.
You need to be able to test them in a common framework.
You need tools if you're using different processes.
So all of these are things that need to come together
for the chiplet economy to really boom and bustle.
Now, what gives me hope is when you look around at the standards that are developing and the effort and investment going into them, right?
It's one thing to build standards, but then you look at the number of companies investing time, effort, and energy to actually put out these chips.
I think that's the most positive sign of the demand.
And clearly, the biggest value proposition of chiplets is the ability to improve your
yield and lower your total cost.
And not just cost in terms of dollars, but also time to market.
If I want to put together a solution, let's say for the automotive market,
I might have a low-end car, a mid-range car,
and a high-end car.
And in all of these,
I may have different levels of functionality.
So that is actually the poster child
for the chiplet economy,
where with chiplets, you can actually mix and match
these different capabilities you need
and get to market very rapidly.
I see the same thing being driven in the data center and the server use cases as well.
So I think that's where the industry is making the push to broaden the impact of chiplets and this cost benefit beyond any one vertical market.
Now, you are the VP of Business Development for Zero Point.
How does your company plan to disrupt in this arena?
Yeah, I'd love to chat about that.
So ZeroPoint Technologies, like I mentioned, we have compression algorithms.
One of the key innovations of ZeroPoint Technologies
is a cacheline-granularity compression algorithm
that actually works on a single 64-byte cacheline.
Now, if you look at some of the industry standard algorithms,
and there are several wonderful algorithms that work great
for storage technologies, Zstandard and so on and so forth.
So these work great when you're working with big chunks of data, like a page or a block,
or maybe even files. There's a lot of applications where you need to compress an entire file.
But when you're talking about memory, you don't have that kind of time. You don't have seconds or
milliseconds; no, you don't even have microseconds. You have nanoseconds. If
you're going to do anything, you have a few nanoseconds to be done with it. And
that's the technology that ZeroPoint has developed: a cacheline-granularity
algorithm which operates in a few nanoseconds and compresses and
decompresses on the fly.
That's incredible.
So that's what enables us to integrate ourselves
even at like an L3 or system-level cache on the chip
and then even going out to CXL memory, which is still memory.
Of course, SRAM is much faster.
But the same technology can be applied to CXL and other memories.
So essentially, our job is to get all of the data.
We want to do compression and decompression, but we want to be invisible.
And that's where ZeroPoint is really disrupting the industry by helping deliver the solution.
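To make the cacheline granularity concrete, here is a minimal, purely illustrative Python sketch. It is not ZeroPoint's algorithm (their IP is hardware that operates in a few nanoseconds); it is just a toy zero-run encoding showing what it means to compress and decompress one 64-byte cacheline independently, rather than a page or a file:

```python
# Toy sketch only: NOT ZeroPoint's algorithm, just an illustration of
# compressing a single 64-byte cacheline on its own, here with a simple
# zero-run encoding over 4-byte words.
def compress_cacheline(line: bytes) -> bytes:
    assert len(line) == 64, "operate on one 64-byte cacheline at a time"
    out = bytearray()
    for i in range(0, 64, 4):
        word = line[i:i + 4]
        if word == b"\x00\x00\x00\x00":
            out.append(0x00)          # 1-byte tag for an all-zero word
        else:
            out.append(0x01)          # tag: literal 4-byte word follows
            out.extend(word)
    return bytes(out)

def decompress_cacheline(data: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(data):
        if data[i] == 0x00:
            out.extend(b"\x00" * 4)   # expand an all-zero word
            i += 1
        else:
            out.extend(data[i + 1:i + 5])  # copy the literal word
            i += 5
    return bytes(out)

line = bytes([7, 0, 0, 0] * 4 + [0] * 48)   # a mostly-zero cacheline
packed = compress_cacheline(line)
assert decompress_cacheline(packed) == line
print(f"64 B -> {len(packed)} B")           # prints: 64 B -> 32 B
```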
And if you talk about chiplets,
the value proposition of chiplets is cost reduction.
And if you are going to cost reduce,
but then you are going to build a chiplet,
you have added some other components
like I/O and other things out there.
So you want to amortize that cost more effectively.
What better way: if you're going to make a memory chiplet,
one of the ways you could
amortize that cost is to compress
the data by 2 to 4x.
So now your development cost
has been amortized over this
2 to 4x effective memory capacity
versus just the amount of
physical space. So instead of having to add 4x the physical space, now you have the same effective capacity with the same physical space.
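As a back-of-the-envelope illustration of that amortization argument, here is a short sketch. The capacity and cost figures are hypothetical; only the 2 to 4x ratio comes from the conversation:

```python
# Hypothetical figures to illustrate cost amortization via compression.
physical_gb = 32             # assumed physical memory capacity on the chiplet
cost_per_physical_gb = 10.0  # assumed $ per physical GB, including packaging

for ratio in (2, 4):         # the 2-4x compression range mentioned above
    effective_gb = physical_gb * ratio
    cost_per_effective_gb = (physical_gb * cost_per_physical_gb) / effective_gb
    print(f"{ratio}x: {effective_gb} GB effective capacity, "
          f"${cost_per_effective_gb:.2f} per effective GB")
```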
That's incredible, and I'm sure some of our listeners are quite interested
in what you just said and would like to learn more. Where would you send them for more
information? For more information, of course, if
folks happen to be at industry events, ZeroPoint is pretty much at all of them:
the Future of Memory and Storage,
and then the CXL events, the CXL DevCon.
We have several other places where we are presenting.
We're at pretty much all the standards bodies
like OCP and SNIA, JEDEC, the CXL Consortium.
So these are great places to interact.
Of course, there's a website.
We have a newsletter that you can sign up for,
so you can stay tuned.
We have our LinkedIn page;
you can connect with us there.
And then, of course, feel free to get in touch with me.
I'm happy to chat with you.
I love chatting with people,
especially about solving these problems for chiplets and data center
and even devices.
We've touched on all these domains.
So yeah, feel free to reach out to me directly as well.
Thank you so much, Nilesh, for taking some time out of your schedule.
I know that you're presenting here, you're talking to folks.
It was great having you on the Tech Arena.
Thanks, Alison.
It was great being here.
Thanks for joining the Tech Arena.
Subscribe and engage at our website, thetecharena.net.
All content is copyright by The Tech Arena.