Utilizing Tech - Season 7: AI Data Infrastructure Presented by Solidigm - 4x16: Transparently Tiering Memory in CXL with Hasan Al Maruf
Episode Date: February 20, 2023
Tiered memory will have different performance, so operating systems will need to incorporate techniques to adapt to pages with different characteristics. This episode of Utilizing CXL features Hasan Al Maruf, part of a team that developed transparent page placement for Linux. He began his work enabling transparent page placement for InfiniBand-connected peers before applying the concept to NUMA nodes and now CXL memory.
Hosts: Stephen Foskett: https://www.twitter.com/SFoskett | Craig Rodgers: https://www.twitter.com/CraigRodgersms
Guest Host: Hasan Al Maruf, Researcher at the University of Michigan: https://www.linkedin.com/in/hasanalmaruf/
Follow Gestalt IT and Utilizing Tech: https://www.UtilizingTech.com/ | https://www.GestaltIT.com/ | Twitter: https://www.twitter.com/GestaltIT | LinkedIn: https://www.linkedin.com/company/1789
Tags: #UtilizingCXL #TieringMemory #CXL #Linux @UtilizingTech @GestaltIT @UMich
Transcript
Welcome to Utilizing Tech, the podcast about emerging technology from Gestalt IT.
This season of Utilizing Tech focuses on Compute Express Link, or CXL,
a new technology that promises to revolutionize enterprise computing architecture.
I'm your host, Stephen Foskett, organizer of Tech Field Day and publisher of Gestalt IT.
Joining me today as my co-host is Craig Rogers.
Hey, Stephen. Good to be here. Looking forward to having a chat with Hasan around CXL and memory tiering, you know, that software layer in Linux. What about you?
Yeah, that's really where we're going to go, Craig. Essentially, we've been hearing recently that now that there are some CXL-supporting platforms out there and some devices that can be tried out, we're starting to see some numbers. And the numbers look pretty good. The summary is that a CXL memory expansion card looks a little bit like memory on a remote NUMA node, or maybe even a little bit better than that. But that being said, there's going to be much greater latency once the memory moves onto a fabric and gets into a shared pool and maybe even goes remote, as we've talked about in previous episodes. So we really need to start thinking about how to handle this idea that there are different kinds of memory, different tiers of memory with different performance.
And that's why we received a tip that we should invite Hasan Al Maruf to join us here.
Hasan is a researcher at the University of Michigan focused on transparent page placement, which is really a great and important way to enable the use of tiered memory.
Welcome to the conversation, Hasan.
Hi. Hi, Stephen and Craig. It's nice to meet you.
So tell us a little bit more about your background. How'd you get into this?
Yeah, sure. So I'm a PhD student at the University of Michigan, at the end of my PhD. My research focus is mostly on data center resource management and performance optimization: mostly resource disaggregation, heterogeneity-aware memory management, and recently CXL systems. You could say in a word that my whole PhD is about how to make memory disaggregation practical. I led a comprehensive effort to address the host-level, network-level, and end-to-end aspects of practical memory disaggregation: how we can hide the latency of disaggregated memory, how we make it fault tolerant, and how we can make it more efficient and ubiquitous. And recently, with CXL systems, there can be tiers of memory, and we look at how to handle those. Mostly we focus on making these solutions transparent to the applications, so that existing or newer applications can benefit from memory disaggregation. You could say that whatever challenges come up in making memory disaggregation practical, our lab has focused on almost all of those aspects over the years.
And I was very fortunate to have the opportunity to work with Meta, get access to CXL devices, and do something that could be open sourced and merged into Linux over time. So this TPP you are talking about is transparent page placement for CXL systems. That's the new addition in our work. And we are thinking of many more new angles on CXL-enabled systems and how to approach them from the software side. So yeah, that's me in a very few words.
You mentioned a really interesting turn of phrase there around hiding latency. You know, one of the biggest concerns people will have with large-scale adoption of CXL is not just the cost of adding in more RAM; performance and latency are also key metrics that people are looking at. Can you go into any more detail around how you're able to hide that latency?
Yeah, yeah, yeah. So my very first PhD project was Leap. At that time, when I started in 2017, there were RDMA-enabled InfiniBand networks. That was the fastest network available, which is microsecond scale. So with an InfiniBand network, you can connect two different servers, and one server can access the memory of another in microseconds. Even if you remove all the software overheads from the OS, the network is still around three to four microseconds for a four-kilobyte page. You cannot break the physics, right? In the normal case, CPU-attached memory has hundreds of nanoseconds of latency, and in the RDMA case it's microsecond scale. So you have to somehow hide this latency by doing some prefetching or some intelligent system. Leap enabled prefetching mechanisms in the kernel for remote or disaggregated memory, and that could provide hundreds-of-nanoseconds latency up to the 95th percentile; for the rest of the percentiles we were around a microsecond. With CXL, that microsecond comes down to around 200 or 300 nanoseconds, but it's still 100 to 200 nanoseconds higher than normal local CPU-attached memory latency.
So in TPP, our main approach was that whatever is hot should be in the CPU-attached memory, and whatever is cold or warm, where we can tolerate the extra access latency, can be put into the slower tiers of memory attached over CXL. That was one approach: whatever is hot, always try to move it to the hottest tier of memory. But this is also a reactive approach. What we have in TPP is the state of the art, but it still has some scope for improvement: some prefetching could be applied here, some other mechanisms could be applied to make it even faster. So right now the best approach is: whatever is hot, always try to allocate or bring it into the topmost tier of memory. That can reduce a lot of slow-tier memory accesses, and you can effectively hide some of the extra latency of CXL memory.
Yeah. And we're not talking about caching here. We're talking about basically
having the software intelligently move pages of memory based on the characteristics of that
page, the accesses to that page. So, you know, if it's something that's being used quite a lot, quite often,
you put it in the memory that's closest to the CPU.
And if it's not used very much, then you put it in a different tier.
And it sounds like this would make sense.
Like you said, you've got InfiniBand-connected RDMA peers.
You've got other kinds of memory tiers as well. I mean, we've heard quite
a lot about Optane memory, persistent memory from Intel, which Intel just announced the third
generation along with the Sapphire Rapids platform. We've talked about CXL-connected memory, and even NUMA memory. I could imagine that TPP might be viable even in a conventional, you know, just a NUMA system where you've got some memory that's close and some memory that's a little bit further away.
Yeah, yeah, you are absolutely right. Basically, although TPP is designed for CXL systems, it can be generic to any kind of NUMA system. When you attach a CXL memory device, it appears to the OS as a new NUMA node. So if we can support page placement across different types of NUMA nodes, that will obviously be applicable to any CXL system. And that's what we did in TPP: when you have to allocate a page, you try to allocate the hottest pages in the fastest tiers of memory. But over time you cannot always do that, because your CPU-attached memory has limited capacity. At some point, if everything is hot, you have to move some things to other layers, right? TPP is a solution that ensures whatever matters the most will always remain in the topmost tier, while the warmer or colder portions are efficiently migrated down to the lower tiers so that reclamation does not impact your allocation. Because in today's OS, whenever you need to allocate, it checks whether there's enough memory in the topmost tier, and if there isn't, by default it will go to the next available memory tier. And when you allocate in the next available memory tier, even in a two-socket system, you pay an extra 60 to 80 nanoseconds just for accessing memory on the second, remote socket.
That's why we added a fast demotion path and a very optimized promotion path. In the demotion path, we demote fast enough that there is always headroom in the topmost tier and new allocations can happen there. In the promotion path, when something in the colder tier becomes hot again, it is brought back into the top tier in an optimized way. And not every page that is brought back will actually be utilized there: some pages might be cold and accessed only once an hour, and if you mistake them for hot pages and bring them back into the topmost tier, they will eventually become cold again and you have to demote them back. That's a waste of bandwidth and of CPU cycles. So we need to be more accurate about what we bring from the colder tier to the hot tier. All of these things are covered by TPP: it considers at which point which pages could be promotion candidates and what the benefit would be of moving them from the coldest tier to the topmost tier. So, yeah.
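As an aside, the page-migration machinery that this kind of promotion and demotion builds on is also visible from user space through move_pages(2). Below is a minimal sketch, assuming node 1 stands in for a slower (CXL-like) NUMA node; TPP itself does this transparently inside the kernel rather than from a program like this.

```c
/* Sketch: migrate one page between NUMA nodes with move_pages(2).
 * Build with: gcc demo.c -lnuma
 * Node numbers are placeholders; on a CXL system the expansion memory
 * typically shows up as a CPU-less NUMA node. */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    void *buf;

    /* Allocate one page-aligned page and touch it so it is backed by memory. */
    if (posix_memalign(&buf, page_size, page_size) != 0)
        return 1;
    memset(buf, 0, page_size);

    void *pages[1]  = { buf };
    int   nodes[1]  = { 1 };    /* hypothetical "slow"/CXL node id */
    int   status[1] = { -1 };

    /* Ask the kernel to move this page to node 1 (a demotion, conceptually). */
    if (move_pages(0 /* self */, 1, pages, nodes, status, MPOL_MF_MOVE) != 0)
        perror("move_pages");

    /* Passing nodes = NULL just reports where the page currently lives. */
    move_pages(0, 1, pages, NULL, status, 0);
    printf("page now on node %d\n", status[0]);

    free(buf);
    return 0;
}
```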
And you guys submitted, or I think you submitted a kernel patch to Linux a couple of years ago in 2021.
Is that in the kernel now?
Is that something that's out there?
There are basically two patch sets. One is all the basic parts of TPP. There's some performance monitoring: how many pages of different types, anon pages or page-cache pages, have been promoted or demoted, what the rate of that movement is, and what the failure rate of page migrations, promotions, or demotions is. That's one thing. Another is that we decoupled allocation from reclamation by adding a new watermark: usually there are the min, low, and high watermarks, and we added an extra watermark on top of those. Then we made the demotion path fast; that's another patch in there. And also, for promotion, we modified NUMA balancing (AutoNUMA). So right now NUMA balancing has different modes: one is the vanilla mode and another is for tiered memory systems. In the tiered memory mode, TPP does these migration or promotion steps. So that's one thing: the basic TPP approach, which is generic.
And there is another patch set that is specific to applications. Some applications use anon pages, some applications use page-cache pages. Let's say one application is not sensitive to the page cache, but during boot-up it allocates most of its file pages, and those file pages consume most of the memory in your hot tier. Then, when the anon pages come, which are very latency-sensitive, they eventually have to be allocated in the lower, slower tier, and then you either have to promote them back to the topmost tier or bear the latency of CXL memory access. So there is page-type-aware allocation: if we say that for this application the page cache is not that sensitive, then from the very beginning of the application's runtime all the page cache is allocated in a slow memory tier, and all the anon pages get the highest priority in the topmost tier. That's one patch set.
Another is interleaving. Right now in NUMA, when you do interleaving, every NUMA node is treated as having similar behavior and the same value, so interleaving does a round-robin: if you have three nodes, it allocates on them one by one. But in the CXL world, different NUMA nodes may have different characteristics: some may have high bandwidth, some may have low latency. So based on each node's behavior, we can set the weight of the interleaving. Maybe the topmost tier gets 10 pages, followed by two pages in the slowest tier; that way you can set the ratio of how much of the allocation goes to the different tiers of memory. So that's another patch set.
These two patch sets have been published. Among them, the generic TPP has already been merged in Linux version 5.18. The interleaving and page-type-aware allocation policies are yet to be merged, because those are very specific to particular applications, and the Linux community always considers that whatever is generic should be merged first. If it's not that generic, it's very hard to convince people that it should go in. But we are very hopeful that those will be merged pretty soon, the page-type-aware or application-aware policies. The basic TPP is already in the Linux kernel.
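For context, the interleaving the kernel already supports is an unweighted round-robin across a nodemask, which a process can request with mbind(2); the patch set described above adds per-node weights on top of that idea. A minimal sketch follows, with node numbers as placeholders (node 1 assumed to be the CXL expansion node); the weighted variant is not shown since it had not been merged at the time.

```c
/* Sketch: round-robin interleaving of an allocation across NUMA nodes 0 and 1
 * with mbind(2). Build with: gcc demo.c -lnuma
 * Node ids are placeholders; on a tiered system node 1 might be a CXL node.
 * Weighted interleaving (e.g. 10 pages on the fast node per 2 on the slow one)
 * is what the pending patch set layers on top of this mechanism. */
#define _GNU_SOURCE
#include <numaif.h>
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    size_t len = 64UL << 20;                 /* 64 MiB region */
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    unsigned long nodemask = (1UL << 0) | (1UL << 1);   /* nodes 0 and 1 */

    /* Pages faulted in after this call are spread round-robin over the mask. */
    if (mbind(buf, len, MPOL_INTERLEAVE, &nodemask,
              sizeof(nodemask) * 8 /* maxnode */, 0) != 0)
        perror("mbind");

    memset(buf, 0, len);                     /* fault the pages in */
    munmap(buf, len);
    return 0;
}
```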
It's interesting what you're describing there with tiering. If we throw back to the storage world, you had early storage devices that would have had an SSD caching layer for hard drives. While we're miles apart in terms of latency and throughput, with tiers of memory the functionality is the same: you're keeping hot data on faster, closer storage so it's more readily accessible. But normally those are proprietary products that provide that level of functionality. You're talking about this being integrated into the Linux kernel itself and making those decisions around the tiering. That would have to be observable; there would need to be evidence as to how that was being used and addressed. So how can Linux people observe that, monitor that, and report against it?
Okay, so far in our TPP patch there's some basic monitoring, but it's not comprehensive monitoring. Some of it is: when we are reclaiming,
what types of pages we are reclaiming and what the rate of reclamation is. Then you can say, okay, these are the specific types of pages that remain cold over time, and for that they are going to be moved somewhere else. And when we are promoting: what types of pages have been promoted, at what rate over time, and at what granularity; that can be seen there. And during migration, if something fails: what the reasons for the failure are, where it's failing, which node is failing. So these very basic observations are available today with TPP. But that's not enough. You need to know the application's behavior. And for that, you may not want to modify the whole Linux kernel; you may need some different tools.
So in our project at Meta, we have a tool called Chameleon. It's not open source yet, but we are very hopeful we can open source it within a couple of months. That tool is useful for monitoring. With it, when you run an application, you can see what the application's memory access pattern looks like: how much of its memory, and which chunks, have been hot within a given time period. For example, you might find that within a two-minute window, 20% of this application's memory remains hot. What is the re-access granularity of a page? Maybe within two minutes, 80% of its pages have been re-accessed, which means most of the pages are being touched over time. And what is their temperature? You can build a heat map, you could say.
There are different approaches to doing this. One is using the Linux kernel's page-access-bit tracking. Another is something like PEBS-based sampling supported on Intel CPUs; AMD processors have other interfaces. Using these CPU counters, you can get an idea: for this particular application, this is the memory access behavior, and given that behavior, whether the application fits the traditional setup or can be moved to the CXL world. And if we move it to the CXL world, how much performance loss should we expect? With CXL you have to bear the latency bottleneck, so there will be some performance degradation, but how much? And if we are intelligent about moving the right portions of memory to different tiers, maybe we can avoid that performance bottleneck. At Meta we found that even if we keep only 20% of the whole working set in the topmost tier, shrinking the topmost tier to 20% of the working set with 80% in the CXL tier, we lose only about 5% of the performance. You can learn these things by characterizing the application.
So we've already developed this kind of tool, and it should be possible to open source it. If the paper is published, it's easier to open source, because at Meta there's a lot of review process around whether we should publish it or not. But there's another company, MemVerge; they already have a product, Memory Viewer, that can also be useful for understanding the same kinds of things. So I think basic Linux should have some counters, some monitoring, at the very basic level, but in Linux you don't have the freedom to do everything, so there should be some user-space tools. And with CXL switches, there will be more counters. Using those counters you can see, okay, this specific node has this amount of load or heat, and based on that, you can build other tools for monitoring what the memory access behavior is at different layers, different nodes, or different devices. From there, you can have a more robust and stronger characterization of applications.
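For the basic in-kernel observations mentioned above, recent kernels export promotion, demotion, and migration statistics through /proc/vmstat. Here is a small sketch that dumps them; the exact counter names (pgpromote_*, pgdemote_*, and so on) depend on the kernel version, so treat the list as illustrative rather than definitive.

```c
/* Sketch: print the tiering-related counters a recent kernel exposes in
 * /proc/vmstat. Counter names (pgpromote_*, pgdemote_*, pgmigrate_*,
 * numa_pages_migrated) vary by kernel version; this list is illustrative. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *prefixes[] = { "pgpromote", "pgdemote", "pgmigrate",
                               "numa_pages_migrated" };
    char line[256];
    FILE *f = fopen("/proc/vmstat", "r");
    if (!f) { perror("/proc/vmstat"); return 1; }

    while (fgets(line, sizeof(line), f)) {
        for (size_t i = 0; i < sizeof(prefixes) / sizeof(prefixes[0]); i++) {
            if (strncmp(line, prefixes[i], strlen(prefixes[i])) == 0) {
                fputs(line, stdout);   /* e.g. "pgpromote_success 12345" */
                break;
            }
        }
    }
    fclose(f);
    return 0;
}
```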
So I'm glad that you brought up the Meta results, because I know that's another thing you've done: some benchmarking, some work characterizing the performance of transparent page placement. And it's really pretty positive. Like you said, you don't have to have the entire working set in the hottest tier of memory; you can do it on a page-by-page basis. You showed some serious performance improvements from implementing this. Do you think that eventually... I mean, we've been sort of assuming that tiered memory is going to be a normal part of compute architecture going forward. Do you think that eventually transparent page placement is just going to be something that the kernel does, and it won't even have a name anymore because it's just something that's on and you just use it?
Yeah, why not? If you consider today's NUMA systems,
you are basically already doing some kind of page placement, right? So what's the difference with CXL? It's a more complicated NUMA system, and you have to do some page placement for these kinds of things. In today's NUMA systems everything is homogeneous, and future NUMA systems will be heterogeneous, so there should be support for that heterogeneity, and TPP is the stepping stone for it. There could be many more improvements on top of it. In TPP right now we are demoting to the next possible NUMA node, but with CXL 2.0 or 3.0, when switching is available, multi-layer switching, there could be GFAM (global fabric-attached memory) and there could be Type 2 devices. In that case, which layer or which specific NUMA node is useful for a given application? Having the topmost tier is not always what helps the application. If you consider multiple applications running on the same system, each with different demands on performance, then maybe the next tier down is enough to maintain an application's desired performance. So which tier do you choose, or what fraction of which tier is enough to provide the performance these applications need? That's a good challenge, and we are working on it. There could be many more challenges to be approached; people will work on them and are already working on them. And obviously some of them will be merged into Linux, because the whole industry is moving in that area. So you have to have some support somewhere.
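One knob that exists today for the per-application tier choice described above is a NUMA memory policy that prefers a particular node. A hedged sketch using set_mempolicy(2) follows, assuming node 1 is the CXL tier; this is only a hint, since MPOL_PREFERRED falls back to other nodes when the preferred one is full.

```c
/* Sketch: steer a latency-insensitive process toward the slower tier by
 * preferring allocations from a specific NUMA node. Build with -lnuma.
 * Node 1 standing in for a CXL node is an assumption; MPOL_PREFERRED is a
 * hint, not a hard bind, so the kernel falls back when node 1 is full. */
#define _GNU_SOURCE
#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    unsigned long nodemask = 1UL << 1;       /* prefer node 1 (assumed CXL tier) */

    if (set_mempolicy(MPOL_PREFERRED, &nodemask, sizeof(nodemask) * 8) != 0) {
        perror("set_mempolicy");
        return 1;
    }

    /* New anonymous memory faulted in by this task now lands on node 1 first. */
    size_t len = 16UL << 20;
    char *buf = malloc(len);
    if (buf) memset(buf, 0, len);

    free(buf);
    return 0;
}
```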
The Linux community has always been very good at baking in kernel support for a lot of hardware solutions. Are you seeing other areas of the Linux community contributing anything else around CXL and memory tiering? Because, knowing the technology is going in this direction, it's one of those things that will benefit all the users of Linux and all the contributors.
So, just as I said before, on memory monitoring tools and understanding working-set behavior, there's already a patch set merged in Linux 5.15 from SeongJae Park at Amazon. It's called DAMON, and for a specific workload it shows what fraction of memory is hot or cold, so you can already get some view of that world. I know that at Samsung they are doing some very good work, or they have work in the pipeline, to support the CXL ecosystem; that could be things like auto-healing or resiliency. Earlier everything was within a single machine, and now we are going beyond a single machine's domain, so a machine can fail at any time and your main memory can become unavailable at any time, and you have to provide some resiliency for that. People are working on that area. And memory sharing, consistency, coherency management: these things are being addressed. I think the whole industry is working on different aspects of these problems, and maybe within one or two years there will be lots of patches merged into the kernel. Maybe CXL will be really ready for everyone to use by the end of 2023 or 2024.
Yeah, and I think we also have to give credit to Intel for doing a ton of the basic plumbing for CXL within the Linux kernel. I know that they've submitted, and they just keep submitting, more and more, including recently a bunch of support for Type 3 memory devices in CXL. So I think it's great to see so many people contributing to the development of the Linux kernel, because of course a lot of systems out there, probably an easy majority if not a vast majority, are going to be running Linux at this point. Though, as we've talked with other companies, I mean, we talked to VMware, they're certainly planning on having a lot of CXL support as well. But right now, Linux is really where it's at. Does that match your opinion, Hasan?
Yeah. I think everyone is trying to build the whole ecosystem so that
it's open to all. Everyone can easily use it. And I think almost all the big companies are
focusing on how to build this, make it open source, and make it even more usable for everyone.
So it sounds as though the community is contributing to empower the proprietary solutions they need to build for advanced monitoring and advanced memory tiering, but they're building the mechanisms into the kernel space so that they facilitate those user-space requirements?
In part, yes. There are some mechanisms that are fundamental. Things like placement: without modifying the OS, you cannot do much from user space. Let me share my experience. In the Meta project, initially we tried this with the characterization tool. We found, okay, these are the different heat maps of a specific application, so from that heat map, why not do the memory management from user space? The problem was that some memory, some pages, are very short-lived. They come and go, and by the time you decide, okay, these are the pages we need to migrate, those pages are gone. And from user space you need user-to-kernel context switches, which take time, and you don't have that much freedom. So at that point we decided a user-space solution may not be sufficient. We needed to modify the kernel and apply these specific things from the kernel, so it could be easier, faster, and generic. That's why, even though we had the freedom of doing everything from user space, eventually we had to go into the Linux kernel and modify it.
And I know for some cloud vendors, at the hypervisor level, you could say they don't have the opportunity to understand what the application is doing, so maybe they are completely oblivious to the application's behavior. They have some idea of what is going on, but per-page tracking can cost them very high latency. From their perspective, modifying the hypervisor could be the best option, so there could be some solutions that work at the hypervisor level. And let's say you have some very intelligent mechanism, you want to run some ML: say you run a machine learning algorithm to understand the application's behavior or its other properties. You cannot integrate that ML with the kernel, because running ML inside the kernel would be too heavyweight. For that you need a user-space tool, or some other mechanism that can communicate with the kernel. So based on your use case and your requirements, I think different solutions are needed at different layers. They could communicate with each other, or they could be completely independent solutions. And I feel there's also some room for proprietary solutions, and there's some fundamental stuff that needs to be done inside the kernel, which is what Intel, Samsung, and some other companies are doing.
Have you observed much of an overhead in tracking these page values? You know, if you're tracking heat, there's obviously an overhead to that. Is there a point of inflection?
Whenever you are tracking, there will be an overhead. Let's say we are tracking with the idle bitmap. In that case, you have to set and reset the page-idle flag of every page. And if your working set size is, let's say, a terabyte, then you have to frequently modify the bits of all the pages within that terabyte, and that would be tremendously high overhead. That's one reason why we did not go the idle-bit-tracking way. Then we found, okay, Intel CPUs provide PEBS; the PMU is available there, so we can get CPU counters from it. Let's just sample from the CPU counters and work with that. But even when you're using the CPU counters, you need to burn some CPU cores to collect the numbers, parse them, and then actually do some analysis. That's another reason this PEBS-based, counter-based tooling did not go into the kernel side: if we used these CPU counters from the kernel, we would lose that freedom, there would be fewer counters available to user space, it would interrupt the CPU every now and then, and if the workload is very CPU-heavy, your monitoring system would stall and your application might suffer. That's why we separated this into two tools: one for characterization, which you can run and stop whenever you want, and another that is very basic and always running within the kernel. So two different tools: one is the characterization tool, and the other is the Linux kernel modification for page placement, which is very basic.
So yes, the overhead is fairly high when we run the Chameleon tool: for a very heavy workload it is around a 10% slowdown. And there is other work showing that in the PT-based case you can have 20 to 90% overhead depending on the workload behavior. So whenever you're running these things, you have these overheads. Even in TPP, we use NUMA balancing (AutoNUMA), which also has a kind of faulting mechanism: from the NUMA hint faults, you see which page is being accessed and which CPU is trying to access it, and that has some overhead too. In our case, we always try to place pages such that the cold pages are always in the CXL tier, and if we can manage that, a page becoming hot again out of the cold data is a comparatively low-frequency event. That's how we actually hide the overhead of AutoNUMA. But if there's a workload that is thrashing across all the tiers of memory, then we will also pay the overhead of AutoNUMA sampling all the hot pages in the CXL memory.
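The idle-bit tracking set aside above works through /sys/kernel/mm/page_idle/bitmap: you write bits to mark page frames idle, let the workload run, and read the bits back to see which frames were left untouched. A minimal sketch follows (requires root; the frame range and sleep interval are arbitrary), which also illustrates why sweeping this across a terabyte-scale working set is the overhead being described.

```c
/* Sketch: the page_idle bitmap interface that makes per-page access tracking
 * expensive at scale. Each 8-byte word covers 64 page frames; marking every
 * frame of a terabyte-sized working set idle, waiting, and re-reading the
 * bitmap is the cost discussed above. Requires root and CONFIG_IDLE_PAGE_TRACKING. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/sys/kernel/mm/page_idle/bitmap", O_RDWR);
    if (fd < 0) { perror("page_idle/bitmap"); return 1; }

    uint64_t bits = ~0ULL;            /* 64 frames' worth of idle bits */
    off_t    off  = 0;                /* frames 0..63; a real tool sweeps all of RAM */

    /* Mark the first 64 page frames idle... */
    if (pwrite(fd, &bits, sizeof(bits), off) != (ssize_t)sizeof(bits))
        perror("pwrite");

    sleep(5);                         /* ...let the workload run... */

    /* ...then read back: bits still set were NOT accessed in the interval. */
    if (pread(fd, &bits, sizeof(bits), off) == (ssize_t)sizeof(bits))
        printf("idle mask for frames 0-63: 0x%016llx\n",
               (unsigned long long)bits);

    close(fd);
    return 0;
}
```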
So in the memory world, no single solution is a good fit for everyone. You have to understand your use case and the behavior of the workload, and based on that, come up with different solutions.
Yeah. And that's probably true of any technology.
You have to have the right horse for the right course. I wonder if you can wrap up by just sort
of giving us a peek at what comes next. So you've got the transparent page placement in the Linux kernel.
At least some of the code is in there.
And you're looking at moving forward with more.
What do you think is coming in 23, 24, 25 on the software side for CXL support that you're excited about?
In my opinion, what we did is for CXL 1.0; that's the proof of concept that CXL works. CXL 2.0 will come with switching, one layer of switching, where we can connect multiple devices and go beyond a single machine's boundary. And with CXL 3.0, which I guess will be available in the 2025 or 2026 timeframe, we will have multiple layers of switching. That could be a really gigantic network of memory. In that case we can do sharing, we can have peer-to-peer communication, we will have GFAM, and 4,000 nodes can be connected. So that could be a really crazy time in that period. And in that case, how are we going to manage coherency, how are we going to communicate between different devices, like CPUs and GPUs: how are they going to share memory, how are they going to access each other's memory regions? We need support for that.
I think right now most of what people are doing focuses on Type 3 devices, where it's memory extension, memory expansion. The next step is disaggregation at rack scale, and theoretically this can go beyond a rack; it could connect the whole data center through CXL. I'm not sure whether any practical use case will be there or not, but who knows. And in that case, all the networking problems we are facing right now will be somewhat reinvented from the memory perspective, and we may need to handle in the memory world everything we are handling right now in the network.
So a lot of work to be done, a lot of exciting stuff happening. And the cool thing is, you know, it works.
And I think that that's the thing that we're all most excited about.
So thank you so much for joining us, Hasan.
It's been really, really fascinating to hear this aspect of it
because we've spent most of our time on Utilizing CXL
talking to the hardware companies.
And so it's great to hear a little bit about the Linux kernel support
and the work that you've been doing.
If people want to continue this discussion or learn more about TPP, where can they connect
and where can they find it?
Yeah, you can connect with me on LinkedIn. My LinkedIn handle is Hasan Al Maruf, my name. You can shoot me a message and ask me anything you want to know about this, or have more discussions. And yeah, you can also send me an email at hasanal@umich.edu. I'm happy to chat with everyone.
All right, cool. And as for me and Craig, you'll see us at Tech Field Day
in March, where we're going to be talking to some of the CXL companies. You'll also see us here every week on Utilizing CXL. And of course, you can find us on social media and we'll include our links in
the show notes. Thank you for listening to the Utilizing CXL podcast, part of the Utilizing Tech
podcast series. If you enjoyed this discussion, please subscribe in your favorite application,
and please do give us a rating and review. This podcast is brought to you by gestaltit.com,
your home for IT coverage from across the enterprise.
For show notes and more episodes,
go to utilizingtech.com
or find us on Twitter at Utilizing Tech.
Thanks for listening, and we'll see you next week.