Disseminate: The Computer Science Research Podcast - Haoran Ma | MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime | #18
Episode Date: January 16, 2023
Summary: Far-memory techniques enable applications to use remote memory and are increasingly appealing in modern data centers, supporting applications' large memory footprints and improving machines' resource utilization. In this episode Haoran Ma tells us about the problems with current far-memory techniques: they focus on OS-level optimizations and are agnostic to the managed runtimes and garbage collection (GC) underneath applications written in high-level languages. Owing to object-access patterns that differ from the application's, GC can severely interfere with existing far-memory techniques, breaking remote-memory prefetching algorithms and causing severe local-memory misses. To address this, Haoran and his colleagues developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. Listen to the episode to learn more!
Links: OSDI'22 MemLiner paper | OSDI'22 presentation | Haoran's website
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate, a computer science research podcast. I'm your host, Jack Wardby.
I'm delighted to say I'm joined today by Haoran Ma, who will be talking about his OSDI '22 paper, MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime.
Haoran is a PhD student in the Department of Computer Science at UCLA, and his research has recently focused on systems.
Haoran, welcome to the show.
Hi, thank you. Thank you for inviting me.
It's our pleasure. Let's dive straight in then. So can you start off by explaining to the listeners
what far-memory techniques are and why they're appealing to use in data centers?
Yes, of course. So basically, far memory means there is a level of memory
that sits below the local DRAM
that is widely used today.
I can give some examples.
For example, non-volatile memory could be regarded
as one kind of far memory.
Another example could be using RDMA, remote direct memory access. It's like
there are two servers. The host server is usually equipped with very strong cores and some amount of
memory. It can use local memory as a kind of cache, and it accesses remote memory through a special high-speed network,
such as, as I just said, RDMA over InfiniBand.
But actually, accessing data on remote memory
is still slower than accessing data in local memory.
That's why we call it far memory.
It's slower, so it's far.
Yeah.
And actually, it is most widely used in data
centers because data centers currently have a memory capacity bottleneck problem:
the growth of processor compute is actually faster than the growth of memory capacity,
especially in recent years.
So we need a way to keep increasing the memory capacity,
and that's why we have these far-memory techniques.
And at the same time,
the memory in data centers is often actually underutilized.
Okay.
So it causes a big waste of resources.
But if we have these far-memory techniques,
we can utilize the free memory on one machine
to satisfy the memory needs of another machine.
And in this way, we can
increase resource utilization in data centers.
That's why these kinds of techniques are appealing to data centers.
Awesome.
So how do the far-memory systems that exist at the moment
currently operate?
And what aspects are there to them that are crucial
to achieving good performance in a data center?
Yeah. So basically, currently,
there are two main ways to use
far-memory systems.
One way is that we let users,
or we let programmers,
fully control the system.
The programmers specify
what kind of data should stay
in remote memory, or far memory,
and what kind of data should stay in local memory.
This is one way to use that.
The users can use some API or some interfaces to control that.
And the other way is to make the far-memory system transparent to programmers. Basically, the far-memory system acts
as a kind of swap system:
the local memory is a kind of cache,
and the remote memory
is like the main memory.
And when the local memory is not enough,
the far-memory system
just does the swapping: it swaps out
some cold data and swaps in the
data it needs to access, automatically.
And I think currently the second way is the most popular one because,
you know,
the programmers do not need to
care about the details,
and it can still achieve pretty good performance.
Though, actually, I think if the programmers can control everything,
of course it can achieve the best performance,
but it would be highly complicated.
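To make those two usage modes concrete, here is a minimal sketch in Java. The FarMemory interface and its method names are purely illustrative assumptions, not a real far-memory API:

```java
// Hypothetical sketch of the two usage modes described above.
// Nothing here is a real far-memory API; the names are illustrative only.
public interface FarMemory {
    long allocateRemote(long bytes);              // reserve space on the remote side
    byte[] fetch(long handle, long off, int len); // explicitly pull data into local DRAM
    void evict(long handle, long off, int len);   // explicitly push data back out
}

class ExplicitUser {
    // Mode 1: the programmer fully controls data placement.
    static void run(FarMemory fm) {
        long cold = fm.allocateRemote(1L << 30);  // 1 GiB of rarely used data, kept remote
        byte[] page = fm.fetch(cold, 0, 4096);    // pull one 4 KiB chunk when needed
        // ... compute on `page` locally ...
        fm.evict(cold, 0, page.length);           // give the local memory back afterwards
    }
    // Mode 2 (the transparent approach) needs no application code at all:
    // the kernel swap system pages data in and out of remote memory automatically.
}
```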
That's really interesting.
So on, like, Azure or AWS now,
is there an API for me to go and actually...
you said a user can actually get full control
of the placement of data. Is that
actually exposed to users at the minute? Can I
just log on to AWS
and have that API available to me?
I think
the API
is kind of low level.
Pretty low level.
We can install some drivers
and use those drivers to control everything,
to maybe flush some data to the remote side
and fetch some data to the local side.
So the programmers have to take care of a lot of details to do that.
And that's not very...
There is a lot of work to simplify
that, but the
programmers still need to specify a lot of things.
Okay, cool, cool. So you say the
second one's probably the most common way
of operating at the moment.
I know in your paper you identify
that in that sort of
setup, garbage collection is a
real problem. Can you
elaborate on why it is a problem?
Yes. So, basically, I would like to start from the cloud applications themselves.
I would not say most, but to be accurate, a lot of
cloud applications
are written in managed languages
like Java.
Yeah.
And Java
has a managed runtime
to manage the memory
the application uses.
And inside the runtime, the way to manage the memory is to do garbage collection:
to collect the garbage, reclaim the unused space,
and let the application use it.
And modern GC algorithms
usually use some kind of tracing and evacuation algorithm.
Tracing means it starts from the roots.
So basically, when we are writing applications, we know that we have a stack and a heap, and the roots are the stack variables. We start from the stack, trace from the roots, and mark every object
we can reach. So it is pretty much a graph traversal. The objects we marked are alive,
and the other objects that we did not mark are dead. Because, you know, if we cannot reach an object from the roots,
it means we will never use it;
we have no way to access it in the future.
So this is what GC is like.
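As a rough illustration of the tracing just described, the mark phase is essentially a graph traversal from the roots. This is a simplified sketch, not OpenJDK's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified sketch of a tracing GC mark phase: a graph traversal
// starting from the roots (e.g., stack variables).
class MarkPhase {
    static class Obj {
        boolean marked;
        List<Obj> references = List.of(); // outgoing pointers to other heap objects
    }

    static void mark(List<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (o.marked) continue;   // already visited
            o.marked = true;          // reachable from a root, therefore live
            worklist.addAll(o.references);
        }
        // Anything still unmarked is unreachable from the roots: dead,
        // and its space can be reclaimed.
    }
}
```

Note that the traversal order is driven by the shape of the object graph, not by address order, which is why it has so little locality.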
But this GC thing has a big problem.
Why?
Because, as I just said,
it is like a graph traversal,
so it has very little locality.
And locality is crucial to performance in far-memory systems,
because if the locality is good, then prefetching is
beneficial to that kind of cache-and-swap system.
And we can see that tracing has very bad locality.
And the tracing itself, the GC itself,
will also compete with the application for resources.
If the tracing needs to trace some objects,
it needs to first swap them in, trace them,
and then swap them out.
And the application itself also needs to do that:
when it is being executed,
it needs to swap in the data it needs to use,
and then swap it out.
So these two things will concurrently
compete with each other
for a lot of resources, like
the network bandwidth,
the RDMA bandwidth, and also the local
memory capacity.
And so
the GC would highly interfere with the application itself.
And actually, the GC also interferes with the application in another way.
It is about the prefetching.
As we know, in traditional swap systems, we do some prefetching: if we can identify the access pattern,
we can prefetch some contiguous objects
that will be accessed next,
and that can help improve the performance.
But because we now have the tracing
as well as the application,
and these two kinds of access behaviors are very different,
they will interfere with each other,
and then the prefetcher may not be as useful as before.
If we did not have the tracing,
the prefetcher could actually prefetch some objects
for the application itself.
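To see how interleaving breaks a pattern-based prefetcher, here is a toy sequential prefetcher sketch. It is an illustrative simplification, not the Linux swap prefetcher; the trigger and depth constants are made up:

```java
// Toy prefetcher: after TRIGGER consecutive page accesses, prefetch the
// next DEPTH pages. Illustrative only; constants are arbitrary.
class SequentialPrefetcher {
    static final int TRIGGER = 2, DEPTH = 4;
    private long lastPage = -1;
    private int runLength = 0;

    // Returns the pages to prefetch in response to this access, if any.
    long[] onAccess(long page) {
        runLength = (page == lastPage + 1) ? runLength + 1 : 0;
        lastPage = page;
        if (runLength < TRIGGER) return new long[0]; // no pattern detected
        long[] ahead = new long[DEPTH];
        for (int i = 0; i < DEPTH; i++) ahead[i] = page + 1 + i;
        return ahead;
    }
}
// Application alone: pages 100, 101, 102, ... => the prefetcher fires.
// Interleaved with GC tracing: 100, 57321, 101, 8842, ... => the run
// counter keeps resetting, and the prefetcher never fires.
```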
Okay, cool.
So I guess let's talk about MemLiner then.
So what is MemLiner?
I guess it's going to address this problem.
So let's hear about it.
Yeah.
So MemLiner is a runtime technique
solving the GC interference problem
on disaggregated memory, or far memory.
It mainly lines up the tracing in GC with the application itself to improve cloud applications' performance
on far memory.
That's how I would describe it.
Okay, awesome.
So I guess we've kind of touched on it a little bit when we spoke about the problem that it's designed to solve.
But what were your design goals when you were developing MemLiner?
Yeah, so I would say there are two main design goals for us.
The first one is that we do want great performance.
So the performance is the most important one, the most important goal that we have.
And the second goal is that we still
want the runtime to be highly decoupled
from the kernel.
We do not want to make
the kernel itself and the runtime
itself highly coupled
because, you know,
as programmers,
we want decoupled components to make
things better.
Yeah, so it's easier to reason about, I guess, right?
Yeah. Okay, cool. So, in trying
to achieve these design goals, what were the key ideas that underpinned MemLiner?
Yeah, as I said, to achieve these goals, we have two observations that motivated our design of MemLiner.
And so the first
observation is that we do
find that the application and
GC are not
completely unaligned.
We find that
the application and GC are just
temporarily unaligned.
Why? Because the
live objects actually traced
by the GC
are mostly accessed by the application
at some point during the execution:
because the GC
traces an object, it is alive,
so it will very likely be accessed by the application
in the future.
And the objects accessed by the
application must be live objects.
Yeah, because an object is being accessed, so it is live, and it should also be a
target for the GC.
So basically, these two object sets are not highly decoupled or
unrelated.
They are actually related to each other.
And the second observation
we make is that
changing the object access order
in GC is possible.
For the application, we cannot change
its behavior, because it is
specified by the
programmers,
by the application developers.
But for GC, inside the runtime,
we can control it.
We can trace the objects in different orders
to make the two access streams related to each other.
Okay, nice, nice.
So, in putting these ideas into practice,
what were the challenges that you had to
overcome, and how did you go about doing this?
Yeah, as for the challenges,
I would say, as I said, MemLiner lines up the tracing and the application. So the biggest challenge here
is how to line them up.
Yeah.
And so what we do is classify the objects
into three categories:
local objects, incoming objects, and distant objects.
By local objects,
we mean objects
that are currently being accessed by application threads.
For these kinds of objects, the GC threads should touch them right away.
Incoming objects are objects that are still in remote memory, in far memory, but will soon be accessed by application threads. For these kinds of objects,
the GC threads should also touch them right away,
because they will be used in the near future.
If the GC threads touch them,
they get swapped in,
but that does no harm,
because the application will need to fetch them
into local memory anyway.
And the third category is the distant objects.
These are in remote memory
and will not be accessed by application threads soon.
For those kinds of objects,
the GC threads should actually delay the tracing of them,
delay the access to them.
That's how we line them up.
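A minimal sketch of this three-way classification and the GC policy it implies. The two predicate methods are illustrative stubs, not the real mechanisms; in the actual system they correspond to runtime barriers and kernel page-location estimates, as discussed next:

```java
// Sketch of MemLiner's object categories as described above.
// The two predicates are illustrative stubs, not the real mechanisms.
enum Category { LOCAL, INCOMING, DISTANT }

class ObjectClassifier {
    Category classify(Object obj) {
        if (accessedByAppThreads(obj)) return Category.LOCAL;    // trace right away
        if (soonAccessedByApp(obj))    return Category.INCOMING; // trace right away too
        return Category.DISTANT;                                 // delay tracing
    }

    // The GC policy implied by the category:
    boolean traceNow(Object obj) { return classify(obj) != Category.DISTANT; }

    boolean accessedByAppThreads(Object o) { return false; } // stub: learned via barriers
    boolean soonAccessedByApp(Object o)    { return false; } // stub: a few pointer hops away
}
```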
But inside this process,
there are still some smaller challenges. For example, how do we inform the GC threads of the objects that
are currently being accessed? And how do we identify the objects that will be accessed
by the application threads soon? And the third small challenge is how to estimate
the location of objects,
so that we know
which objects
are distant objects
and which objects
are not.
Yeah.
To make it simple:
for the first challenge,
how to inform the GC threads of
the objects that are
currently being accessed,
we basically used the barrier mechanism
inside the Java runtime.
I would say that's the simplest answer.
For the second challenge,
which objects will be accessed
by application threads soon,
we just regard the objects that are a few steps away
from the objects that are currently being accessed
as the objects that are going to be accessed.
And as for the third thing,
how to estimate the location of objects,
it is more complicated
than the previous two,
because we need to use a kernel mechanism
to estimate whether the object is currently
in local memory or not.
So that part relies more heavily on the kernel side.
Yeah.
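To sketch the barrier idea: in a JVM, barriers are small compiler-inserted snippets executed around object accesses, which can be reused to tell the GC which objects the application is touching. Below is an illustrative emulation in plain Java, not the actual HotSpot barrier code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative emulation of a read barrier that records application
// accesses. Real JVM barriers are JIT-inserted, not library calls.
class AccessBarrier {
    // Objects the application threads have recently touched.
    static final Set<Object> recentlyAccessed = ConcurrentHashMap.newKeySet();

    // Conceptually, the runtime executes something like this on every
    // reference load performed by an application thread.
    static <T> T readBarrier(T ref) {
        if (ref != null) recentlyAccessed.add(ref);
        return ref;
    }
}
// GC threads can consult AccessBarrier.recentlyAccessed to find the
// "local" objects, and treat objects a few pointer hops away as "incoming".
```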
Okay, cool.
So you say you have these three categories there with the different types.
We can maybe touch on this later on when we talk about the evaluation, but it just jumped into my mind: why did
you settle on having three categories? Would you benefit from having more, or does that make it more
complex? Or were these naturally the three obvious categories things would fit into?
Yeah, so basically, the three kinds of objects are categorized by our needs.
Because we need to line up the tracing and the application,
we need to know what kind of objects should be traced at once,
and what kind of objects should not be traced at once,
that is, should be delayed.
So the criterion by which we classify
objects is based on that:
should we
let the GC start to trace them
now, or
should we not trace them
at once and delay them instead?
That's the criterion.
Okay, cool. So I guess my next question is,
how did you actually go about implementing this?
Can you talk us through your implementation,
what it looks like,
and how easy it was to implement?
Maybe there are some war stories there
of how it was challenging, or whatnot.
In terms of the implementation,
I would say it's mostly hacking the Java virtual machine.
Okay.
Yeah, it's kind of
painful, because
I have to say there are not a lot of
materials
about the Java
virtual machine implementation, so it was
painful at first.
But
once I knew more about
the Java virtual machine,
it got smoother.
And we did implement our technique in two different garbage collectors in OpenJDK, in the Java virtual machine.
And I would say it was a great experience.
Okay.
Actually, as for the implementation,
after I implemented our technique in the first collector,
it was easier for me to implement it in another one.
Okay. Yeah, because they do have some similarities to each other.
I see. How big are the changes? Like, how big are the things you needed to implement here? Is it
quite a sizable change?
Yes, it's not just some little changes inside the JVM.
I forget the exact number, actually, but it should be at least several thousands of lines of code.
Okay, so it's non-trivial then.
Yeah.
Okay, cool.
Cool.
So yeah, I guess my next question is, how does MemLiner compare with other contemporary solutions?
What else is out there in the space that tries to solve a similar problem,
and how does MemLiner compare against it?
Yeah.
So for contemporary work,
there are two kinds.
The first kind is to improve the kernel,
to make the kernel swap system better,
to let it swap in and swap out faster.
This can also improve the performance of applications.
But as I just said, MemLiner is a runtime technique to solve the GC interference problem,
so these kinds of work are very different from each other.
Or I would say the work that improves the kernel
can also be utilized underneath MemLiner,
because our work is built on top of them:
we can use their swap systems.
But there is actually another kind of work,
and I would say some of my previous work
is in that category.
It also
solves the GC interference problem,
but instead of
lining things up,
it offloads the GC part
to the remote memory side.
It's another way
to solve this GC interference problem.
But MemLiner's advantage is that it is easy to use, because if we offload the GC to the remote side,
the kernel and the two machines are highly coupled with each other, and we also need some kernel-runtime co-design
to make the swapping highly efficient.
I would say MemLiner's advantage is that
you can easily use it on every swap system.
I see.
I guess this leads naturally into my next question as well,
which is, how did you go about evaluating MemLiner?
What were the key questions you were trying to answer,
and what did your evaluation experiments look like?
Yeah.
So for evaluation, the biggest evaluation part
in our paper is to compare MemLiner with the unmodified JDK
on the same swap system. We compare the performance of using MemLiner and using
the unmodified JDK on
a range of different applications.
I remember we compared them
on 12 applications.
It's a big
application pool.
And we compared the throughput
to see
how much faster MemLiner is than the unmodified JDK.
And besides that, we also did some experiments to see how MemLiner can help improve the prefetching effectiveness.
Because, as I just said, the tracing itself can interfere with the application and destroy the access patterns of the applications
that could otherwise be identified by the kernel.
But if we line them up,
we think the access patterns of the applications
should now be clear to the kernel swap system.
So we did some experiments to check whether the prefetching is now more effective than
before.
Okay, cool. I just want to jump back a sec to these applications. I know this is a big application pool, so I'm guessing it covered a wide range of different
types, but can you maybe give us a flavor of some of the applications that were used in the evaluation?
Yeah, so the applications are more
like the cloud applications
that are currently widely used
in data centers. For example, Spark
itself: we evaluated
three different Spark applications,
and we also evaluated
MemLiner
on Neo4j and Cassandra.
So basically, they are mostly big cloud applications.
From your experiments, what were the key results and what were the highlights?
I think our highlight is that we improved the cloud applications by an average speedup of 1.5x
over the unmodified JDK on the same swap system.
Yeah, that would be the biggest highlight:
we improve the performance
and do not harm anything else.
Yeah.
That's the headline.
That's really cool.
So maybe we can dig into the results a little bit more then. What are some of the other interesting things
that you found from your experiments?
Maybe we can talk about something about the prefetching effectiveness, because I
just talked about it. Yeah, for the prefetching effectiveness, we evaluated MemLiner and the unmodified JDK on two different swap systems,
or maybe I can say two different prefetchers. We can see that for both
prefetching accuracy and prefetching coverage, the MemLiner metrics
are higher than the unmodified JDK's
on both swap systems.
And I think that's
another
highlight result.
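For reference, prefetching accuracy and coverage are commonly defined along the following lines; this is a standard formulation, not necessarily the paper's exact definitions:

```latex
\text{accuracy} = \frac{\#\{\text{prefetched pages later accessed}\}}{\#\{\text{pages prefetched}\}},
\qquad
\text{coverage} = \frac{\#\{\text{accesses served by a prefetch}\}}{\#\{\text{total accesses}\}}
```

Higher accuracy means less wasted bandwidth and local memory; higher coverage means more remote-memory misses are hidden.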
Yeah, it's a really interesting result, for sure.
Are there any situations where... because at the moment it sounds like
MemLiner is pretty much a free lunch,
that we can always use it to get better performance.
Are there any scenarios or any applications where the performance is suboptimal?
I guess what I'm asking here is, what are the limitations of MemLiner?
Yeah, so basically, MemLiner still has some small overhead, but in most cases it's negligible.
But,
as I introduced,
MemLiner lines
up the tracing in GC and the
application itself.
And, you know,
in some cases,
the application may not use that
large an amount of memory.
It might not trigger the concurrent tracing in GC.
And in that case, MemLiner is of no use;
it is not useful.
It can only add some very small overhead,
not big, but still some small overhead.
And it cannot help improve the performance
of the application,
because the application does not trigger
the concurrent tracing in GC.
So in that case,
MemLiner, I would say,
is not useful.
What is the magnitude of the overhead?
You said it's really small,
but can you maybe give us an example
in terms of an application
or some numbers
of how large the overhead is
in this scenario?
So for the memory amount,
basically,
I would take G1 GC as an example.
G1 is the default garbage collector in OpenJDK.
And the mechanism to trigger a concurrent tracing phase in G1
is that the used memory has reached some threshold,
for example, 60% of the whole heap.
But G1 also has a kind of
nursery GC to
continuously reclaim some memory.
It cannot reclaim
all of the dead memory, but it
can still reclaim some.
So if the nursery
GC is already enough
to keep up with the application,
to keep up with the memory
needs of the application itself,
then it never needs to trigger the concurrent tracing: the memory usage
never reaches, say, 50% or 60%, so the tracing is not triggered, and MemLiner is not useful in that case.
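For context, the threshold described here corresponds to G1's initiating heap occupancy, controlled in OpenJDK by -XX:InitiatingHeapOccupancyPercent (45% by default). A simplified sketch of the trigger logic, with made-up workload numbers:

```java
// Sketch of G1's concurrent-marking trigger as described above.
// Simplified illustration, not OpenJDK's actual heuristic.
class ConcurrentMarkTrigger {
    final long heapBytes;
    final double threshold; // e.g., 0.60 for the 60% figure mentioned here

    ConcurrentMarkTrigger(long heapBytes, double threshold) {
        this.heapBytes = heapBytes;
        this.threshold = threshold;
    }

    // Checked after each nursery (young-generation) collection.
    boolean shouldStartConcurrentTracing(long usedBytesAfterYoungGc) {
        // If young collections keep heap usage below the threshold,
        // concurrent tracing never starts, and MemLiner has nothing
        // to line up; only its small barrier overhead remains.
        return usedBytesAfterYoungGc >= threshold * heapBytes;
    }
}
```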
Awesome. So I guess it depends. One of the questions I was going to ask you is, as a software developer,
how do I go about using MemLiner? But I guess it requires me to understand
and profile my application and see whether this would benefit me.
Okay, sure.
So I'm guessing it's publicly available, right?
Yes, it's open source on GitHub, yeah.
Okay, is there any sort of long-term plan
to get it merged into some version of the JDK,
or is it always going to stay
as pretty much a research prototype?
So currently our plan is to keep it
more like a research prototype.
We do want to push it to OpenJDK, but it would be pretty difficult.
It is not easy work, because we need to make it highly robust in every corner case.
Otherwise, just pushing it to OpenJDK would be
not so good.
Okay, cool.
You know, OpenJDK is
widely used around the world.
Yeah, of course. If it wasn't
100% robust, it could result in a lot
of problems, right?
Okay, cool.
Right, so I guess,
having been working on this project,
what do you think has been the most interesting
lesson you've learned?
Maybe something that caught you off guard,
that you weren't really expecting to discover
across this project?
I think the most interesting thing in MemLiner
is how we came up with the idea.
Okay, yeah.
Yeah, because actually, before MemLiner,
we had a work that is called Semeru.
And Semeru is based on the idea
of offloading garbage collection to the remote memory side.
But when we were developing Semeru,
we wondered:
why can't we just
have
a technique
that can be used on a
single machine, without
offloading some computation to
another one?
And
then we came up with this MemLiner
idea,
implemented it, developed
it, and got it
published. I think
that's the most interesting part, because
I have to say
the idea of MemLiner
is the most important thing.
Right, okay.
But as for the implementation, it's just, you know, implementation. It consumed the most amount of time, but it's just that.
Yeah, that's always the case, right? Cool. So I guess, leading on from that, as you
will know, progress in research is very up and down, it's very non-linear:
there are lots of ups and downs along the way,
along the journey.
So from the initial conception of the idea for MemLiner
to the publication at OSDI,
can you tell us more about that journey
and were there things along the way
that you tried that didn't succeed
that maybe the listener could benefit from knowing about?
Yes, yes, actually there are many of these kinds of things, but I would give an example of it.
At first, when we wanted
to inform the GC threads
what kind of objects are currently being accessed,
our initial idea was to capture the call sites inside the program.
If the program calls one function,
there is a call site,
and at the call site, there would be some parameters.
So at first, we wanted to instrument the call sites to capture the parameters
and regard these parameters as a kind of roots to trace from. And this idea did not make it into our final implementation,
into our final paper.
Because, firstly,
the instrumentation itself is too heavy.
It incurs overhead,
significant overhead,
so it is not acceptable.
And also, the call sites themselves
do not give the complete set
of the objects that are currently being accessed.
Because if you have a very long function
and it has no internal
call sites,
then we might not capture any objects during that execution time.
And that is not good in our scenario.
So finally, we gave up that idea
and went with the barrier approach.
So how much time did you spend
trying to
instrument things
before realizing
this isn't going to work,
this is just too much
of a performance overhead,
let's rethink it?
Yeah,
it was about
one to two months,
yeah.
End to end,
how long was the whole process
from the,
I'm thinking this is a great thing,
we should do this,
to finishing the implementation
and everything,
how long did that take?
It was about one year.
Okay, yeah.
The whole process is about one year.
Awesome, cool.
So I guess, what's next for MemLiner?
Where do you take the project next?
So basically, MemLiner, you know,
focuses on the runtime.
And for my next project,
I want to focus on the application itself,
or I would say, focus on the language.
I want to design some language types to make far memory, or remote memory, highly efficient and also easier to use for programmers.
Okay.
So this would be something like, I guess,
some sort of abstraction that allows the developer,
or the programmer, to reason about the fact that this object
is potentially going to be stored in far-away memory,
and having that knowledge in your program,
being able to, I guess, program using those primitives, right?
That could lead to a lot of really interesting things.
Yeah, yeah. So
basically, the program knows that it is using remote memory.
Yeah, that's really interesting. I can see that being really useful for a
lot of applications. I'm very intrigued to see what research you come up with there. That's
really cool. Awesome, so that's a really
interesting idea, but can you tell the listeners about other research
you've worked on across your PhD, some of your other projects? You
mentioned one earlier on, I can't quite remember the name, that was a kind of
precursor to MemLiner. Maybe you can tell us a little bit about that.
So yeah, during my research, before MemLiner, I worked on two other projects,
Semeru and Mako. And these two works are very similar to each other.
The idea is to offload the GC, to offload the garbage collection, to the remote
memory side. But
Semeru focuses on the
throughput,
focuses on the performance.
For the second one, not only the throughput:
we also focus on the
pause time of garbage collection.
You know, in the runtime,
if we do some garbage collection,
sometimes we need to pause the application,
do some work, and then resume the application.
But we do not want the pause to be very long.
Otherwise, when you are using some application,
it just freezes, and that is not acceptable.
So we want some low-pause garbage collectors,
and that imposes some other challenges in our setting.
So my work Mako is to solve that
problem: to still give high throughput, but with a low-pause collector.
Cool, that's awesome. We'll put a link to all that work in the show notes as well,
so the interested listener can go and check that out. Thank you. I'm interested in
trying to understand how you approach generating research ideas, and how do you then
select things to work on? Because obviously, as you said before, MemLiner, for example, took a year to
implement. It's a big commitment of your time to take an idea from the initial
conception to the end. So I'd just like to
know more about your process for, one,
generating ideas, and then, two, selecting
ones to pursue.
So it's kind of a
personal experience, yeah, for me.
As for my projects and ideas, it's more like:
first I worked on some projects, Semeru, and then, along with them, I found that, okay, the
previous work has some limitations, or we can do it better, and then, okay, I come up with an idea and implement it.
But I have to say my criterion for selecting projects or ideas is that
it should be interesting enough,
it should be novel enough. Otherwise, you know, if I devote myself to it for one year, but it's not a really novel thing,
then I would
sometimes be sad
about my
projects.
But
if it is interesting enough, I would be like,
okay, I like this idea,
I do love this idea, I think it's going to work, and I'll be confident.
Okay, so you kind of get this sort of internal
feeling about the project: I believe this is novel, and I guess the idea maybe
calls to you, you could say, and you can feel you're very passionate. That's really
interesting, that's really cool. It's like, I love this. Yeah, you need to love it to then dedicate
a year of your life working on it, right?
So how did you end up researching in this area?
How did you decide this was the sort of stuff you wanted to research?
So I would tell a story, actually, about it.
Basically, when I went to UCLA in my first year, in 2019,
my professor and a postdoc told me, okay, there are three projects to work on, because
I needed to follow someone at first to learn how research is done.
And they gave me some project ideas. I think one was about a machine
learning system, the other one was about video analytics, and the third one was about this
memory disaggregation, or far memory, thing. And actually, at that
time I knew nothing about this
far memory, but I did think it was very
innovative, or
it was a whole
new thing to me, and I wanted to
know about it. I wanted to dig
into the area to learn about it.
So I joined that project, and then
I thought, okay, I love it.
This is an interesting area.
And after that, I continuously worked on it.
Awesome.
Yeah, that's a nice story.
Yeah, it sounds like you found something that you're really interested in, which is awesome.
So what do you think is the biggest challenge in far-memory techniques at the minute?
What's the biggest challenge out there that needs tackling and needs solving?
I have to say, personally, you know, different people have different opinions about the biggest challenge.
For me personally, I would say the application semantics problem is currently the biggest challenge
in my research area.
Because, you know,
MemLiner actually
uses some semantics in the runtime
to improve the performance:
we know that in the runtime we have GC,
and we use that knowledge to improve
the performance of applications on disaggregated memory.
And actually, if we can know more about the application,
if we know what the application will do in the future,
or what the application would like,
or maybe what kind of data will be accessed by the application,
then the kernel could
actively swap in the data that is needed by the application ahead of time.
But making the kernel know that is very difficult,
because only the application developers know
what the application will do in the future,
in the next several seconds or the next several hours.
And if we can have a way to express that kind of application semantics,
to convey the information to the kernel, it would be highly beneficial.
But this part is very challenging, I have to say. That's
what I think is the biggest challenge in my research area.
Fantastic, that sounds amazing.
And I guess this is the last question now: what is the one key thing you want the listeners of the
podcast to take away from your research?
I would say: if you want to improve the performance
of applications on your platform,
you should not only focus on your platform.
You should also take a look at the application itself,
or at the intermediate layers between the application and your platform,
because sometimes that will be highly beneficial.
Amazing, brilliant. Let's end it there. Thanks so much for coming on the show. If you're
interested in knowing more about Haoran's work, we'll put links to all of his papers
in the show notes, and we will see you all next time for some more awesome computer
science research. Thanks again, Haoran. Thank you. Thank you, Jack.