Disseminate: The Computer Science Research Podcast - Haoran Ma | MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime | #18
Episode Date: January 16, 2023
Summary: Far-memory techniques enable applications to use remote memory and are increasingly appealing in modern data centers, supporting applications' large memory footprints and improving machines' resource utilization. In this episode Haoran Ma tells us about the problems with current far-memory techniques: they focus on OS-level optimizations and are agnostic to the managed runtimes and garbage collection (GC) underneath applications written in high-level languages. Owing to object-access patterns that differ from the application's, GC can severely interfere with existing far-memory techniques, breaking remote-memory prefetching algorithms and causing severe local-memory misses. To address this, Haoran and his colleagues developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. Listen to the episode to learn more!
Links: OSDI'22 MemLiner paper | OSDI'22 presentation | Haoran's website
Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hello and welcome to Disseminate, a computer science research podcast. I'm your host, Jack Wardby.
I'm delighted to say I'm joined today by Haoran Ma, who will be talking about his OSDI '22 paper, MemLiner: Lining up Tracing and Application for a Far-Memory-Friendly Runtime.
Haoran is a PhD student in the Department of Computer Science at UCLA, and his research has recently focused on systems.
Haoran, welcome to the show.
Hi, thank you. Thank you for inviting me.
It's our pleasure. Let's dive straight in then. So can you start off by explaining to the listeners
what far-memory techniques are and why they're appealing to use in data centers?
Yes, of course. So basically, far memory means there is a level of memory
that sits below the local DRAM
that is widely used today.
I can give some examples.
For example, non-volatile memory could be regarded
as one kind of far memory.
Another example could be using RDMA, remote direct memory access. It's like
there are two servers. The host server is usually equipped with very strong cores and some amount of
memory. It can use local memory as a kind of cache, and it accesses remote memory through a special high-speed network,
such as, as I just said, RDMA over InfiniBand.
But actually, accessing data on remote memory
is still slower than accessing data in local memory.
That's why we call it far memory.
It's slower, so it's far.
Yeah.
And actually, it is most widely used in data
centers because data centers currently have a memory capacity bottleneck problem:
the growth of processor compute is actually faster than the growth of memory capacity,
especially in recent years.
So we need a way to keep increasing the memory capacity,
and that's why we have these far-memory techniques.
And at the same time,
the memory in data centers is often actually underutilized.
Okay.
So it causes a big waste of resources.
But if we have these far-memory techniques,
we can utilize the free memory on one machine
to satisfy the memory needs of another machine.
And in this way, we can
increase resource utilization in data centers.
That's why these kinds of techniques are appealing to data centers.
Awesome.
So how do the far-memory systems that exist at the moment
currently operate?
And what aspects are there to them that are crucial
to achieving good performance in a data center?
Yeah. So basically, currently,
there are two main ways to use
far-memory systems.
One way is that we let users,
or we let programmers,
fully control the system.
The programmers specify
what kind of data should stay
in remote memory, or far memory,
and what kind of data should stay in local memory.
This is one way to use that.
The users can use some API or some interfaces to control that.
And the other way is to make the far-memory system transparent to programmers. Basically, the far-memory system acts
as a kind of swap system:
the local memory is a kind of cache,
and the remote memory
is like the main memory.
And when the local memory is not enough,
the far-memory system
just does the swapping: it swaps out
some cold data and swaps in the
data it needs to access, automatically.
And I think currently the second way is the most popular one because,
you know,
the programmers do not need to
care about the details,
and it can still achieve pretty good performance.
Though, actually, I think if the programmers can control everything,
of course it can achieve the best performance,
but it would be highly complicated.
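To make those two usage modes concrete, here is a minimal sketch in Java. The FarMemory interface and its method names are purely illustrative assumptions, not a real far-memory API:

```java
// Hypothetical sketch of the two usage modes described above.
// Nothing here is a real far-memory API; the names are illustrative only.
public interface FarMemory {
    long allocateRemote(long bytes);              // reserve space on the remote side
    byte[] fetch(long handle, long off, int len); // explicitly pull data into local DRAM
    void evict(long handle, long off, int len);   // explicitly push data back out
}

class ExplicitUser {
    // Mode 1: the programmer fully controls data placement.
    static void run(FarMemory fm) {
        long cold = fm.allocateRemote(1L << 30);  // 1 GiB of rarely used data, kept remote
        byte[] page = fm.fetch(cold, 0, 4096);    // pull one 4 KiB chunk when needed
        // ... compute on `page` locally ...
        fm.evict(cold, 0, page.length);           // give the local memory back afterwards
    }
    // Mode 2 (the transparent approach) needs no application code at all:
    // the kernel swap system pages data in and out of remote memory automatically.
}
```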
That's really interesting.
So on, like, Azure or AWS now,
is there an API for me to go and actually...
you said a user can actually get full control
of the placement of data. Is that
actually exposed to users at the minute? Can I
just log on to AWS
and have that API available to me?
I think
the API
is kind of low level.
Pretty low level.
We can install some drivers
and use those drivers to control everything,
to maybe flush some data to the remote side
and fetch some data to the local side.
So the programmers have to take care of a lot of details to do that.
And that's not very...
There is a lot of work to simplify
that, but the
programmers still need to specify a lot of things.
Okay, cool, cool. So you say the
second one's probably the most common way
of operating at the moment.
I know in your paper you identify
that in that sort of
setup, garbage collection is a
real problem. Can you
elaborate on why it is a problem?
Yes. So, basically, I would like to start from the cloud applications themselves.
I would not say most, but to be accurate, a lot of
cloud applications
are written in managed languages
like Java.
Yeah.
And Java
has a managed runtime
to manage the memory
the application uses.
And inside the runtime, the way to manage the memory is to do garbage collection:
to collect the garbage, reclaim the unused space,
and let the application use it.
And modern GC algorithms
usually use some kind of tracing and evacuation algorithm.
Tracing means it starts from the roots.
So basically, when we are writing applications, we know that we have a stack and a heap, and the roots are the stack variables. We start from the stack, trace from the roots, and mark every object
we can reach. So it is pretty much a graph traversal. The objects we marked are alive,
and the other objects that we did not mark are dead. Because, you know, if we cannot reach an object from the roots,
it means we will never use it;
we have no way to access it in the future.
So this is what GC is like.
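As a rough illustration of the tracing just described, the mark phase is essentially a graph traversal from the roots. This is a simplified sketch, not OpenJDK's actual implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Simplified sketch of a tracing GC mark phase: a graph traversal
// starting from the roots (e.g., stack variables).
class MarkPhase {
    static class Obj {
        boolean marked;
        List<Obj> references = List.of(); // outgoing pointers to other heap objects
    }

    static void mark(List<Obj> roots) {
        Deque<Obj> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            Obj o = worklist.pop();
            if (o.marked) continue;   // already visited
            o.marked = true;          // reachable from a root, therefore live
            worklist.addAll(o.references);
        }
        // Anything still unmarked is unreachable from the roots: dead,
        // and its space can be reclaimed.
    }
}
```

Note that the traversal order is driven by the shape of the object graph, not by address order, which is why it has so little locality.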
But this GC thing has a big problem.
Why?
Because, as I just said,
it is like a graph traversal,
so it has very little locality.
And locality is crucial to performance in far-memory systems,
because if the locality is good, then prefetching is
beneficial to that kind of cache-and-swap system.
And we can see that tracing has very bad locality.
And the tracing itself, the GC itself,
will also compete with the application for resources.
If the tracing needs to trace some objects,
it needs to first swap them in, trace them,
and then swap them out.
And the application itself also needs to do that:
when it is being executed,
it needs to swap in the data it needs to use,
and then swap it out.
So these two things will concurrently
compete with each other
for a lot of resources, like
the network bandwidth,
the RDMA bandwidth, and also the local
memory capacity.
And so
the GC would highly interfere with the application itself.
And actually, the GC also interferes with the application in another way.
It is about the prefetching.
As we know, in traditional swap systems, we do some prefetching: if we can identify the access pattern,
we can prefetch some contiguous objects
that will be accessed next,
and that can help improve the performance.
But because we now have the tracing
as well as the application,
and these two kinds of access behaviors are very different,
they will interfere with each other,
and then the prefetcher may not be as useful as before.
If we did not have the tracing,
the prefetcher could actually prefetch some objects
for the application itself.
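To see how interleaving breaks a pattern-based prefetcher, here is a toy sequential prefetcher sketch. It is an illustrative simplification, not the Linux swap prefetcher; the trigger and depth constants are made up:

```java
// Toy prefetcher: after TRIGGER consecutive page accesses, prefetch the
// next DEPTH pages. Illustrative only; constants are arbitrary.
class SequentialPrefetcher {
    static final int TRIGGER = 2, DEPTH = 4;
    private long lastPage = -1;
    private int runLength = 0;

    // Returns the pages to prefetch in response to this access, if any.
    long[] onAccess(long page) {
        runLength = (page == lastPage + 1) ? runLength + 1 : 0;
        lastPage = page;
        if (runLength < TRIGGER) return new long[0]; // no pattern detected
        long[] ahead = new long[DEPTH];
        for (int i = 0; i < DEPTH; i++) ahead[i] = page + 1 + i;
        return ahead;
    }
}
// Application alone: pages 100, 101, 102, ... => the prefetcher fires.
// Interleaved with GC tracing: 100, 57321, 101, 8842, ... => the run
// counter keeps resetting, and the prefetcher never fires.
```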
Okay, cool.
So I guess let's talk about MemLiner then.
So what is MemLiner?
I guess it's going to address this problem.
So let's hear about it.
Yeah.
So MemLiner is a runtime technique
solving the GC interference problem
on disaggregated memory, or far memory.
It mainly lines up the tracing in GC with the application itself to improve cloud applications' performance
on far memory.
That's how I would describe it.
Okay, awesome.
So I guess we've kind of touched on it a little bit when we spoke about the problem that it's designed to solve.
But what were your design goals when you were developing MemLiner?
Yeah, so I would say there are two main design goals for us.
The first one is that we do want great performance.
So the performance is the most important one, the most important goal that we have.
And the second goal is that we still
want the runtime to be highly decoupled
from the kernel.
We do not want to make
the kernel itself and the runtime
itself highly coupled
because, you know,
as programmers,
we want decoupled components to make
things better.
Yeah, so it's easier to reason about, I guess, right?
Yeah. Okay, cool. So, in trying
to achieve these design goals, what were the key ideas that underpinned MemLiner?
Yeah, as I said, to achieve these goals, we have two observations that motivated our design of MemLiner.
And so the first
observation is that we do
find that the application and
GC are not
completely unaligned.
We find that
the application and GC are just
temporarily unaligned.
Why? Because the
live objects actually traced
by the GC
are mostly accessed by the application
at some point during the execution:
because the GC
traces an object, it is alive,
so it will very likely be accessed by the application
in the future.
And the objects accessed by the
application must be live objects.
Yeah, because an object is being accessed, so it is live, and it should also be a
target for the GC.
So basically, these two object sets are not highly decoupled or
unrelated.
They are actually related to each other.
And the second observation
we make is that
changing the object access order
in GC is possible.
For the application, we cannot change
its behavior, because it is
specified by the
programmers,
by the application developers.
But for GC, inside the runtime,
we can control it.
We can trace the objects in different orders
to make the two access streams related to each other.
Okay, nice, nice.
So, in putting these ideas into practice,
what were the challenges that you had to
overcome, and how did you go about doing this?
Yeah, as for the challenges,
I would say, as I said, MemLiner lines up the tracing and the application. So the biggest challenge here
is how to line them up.
Yeah.
And so what we do is classify the objects
into three categories:
local objects, incoming objects, and distant objects.
By local objects,
we mean objects
that are currently being accessed by application threads.
For these kinds of objects, the GC threads should touch them right away.
Incoming objects are objects that are still in remote memory, in far memory, but will soon be accessed by application threads. For these kinds of objects,
the GC threads should also touch them right away,
because they will be used in the near future.
If the GC threads touch them,
they get swapped in,
but that does no harm,
because the application will need to fetch them
into local memory anyway.
And the third category is the distant objects.
These are in remote memory
and will not be accessed by application threads soon.
For those kinds of objects,
the GC threads should actually delay the tracing of them,
delay the access to them.
That's how we line them up.
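A minimal sketch of this three-way classification and the GC policy it implies. The two predicate methods are illustrative stubs, not the real mechanisms; in the actual system they correspond to runtime barriers and kernel page-location estimates, as discussed next:

```java
// Sketch of MemLiner's object categories as described above.
// The two predicates are illustrative stubs, not the real mechanisms.
enum Category { LOCAL, INCOMING, DISTANT }

class ObjectClassifier {
    Category classify(Object obj) {
        if (accessedByAppThreads(obj)) return Category.LOCAL;    // trace right away
        if (soonAccessedByApp(obj))    return Category.INCOMING; // trace right away too
        return Category.DISTANT;                                 // delay tracing
    }

    // The GC policy implied by the category:
    boolean traceNow(Object obj) { return classify(obj) != Category.DISTANT; }

    boolean accessedByAppThreads(Object o) { return false; } // stub: learned via barriers
    boolean soonAccessedByApp(Object o)    { return false; } // stub: a few pointer hops away
}
```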
But inside this process,
there are still some smaller challenges. For example, how do we inform the GC threads of the objects that
are currently being accessed? And how do we identify the objects that will be accessed
by the application threads soon? And the third small challenge is how to estimate
the location of objects,
so that we know
which objects
are distant objects
and which objects
are not.
Yeah.
To make it simple:
for the first challenge,
how to inform the GC threads of
the objects that are
currently being accessed,
we basically used the barrier mechanism
inside the Java runtime.
I would say that's the simplest answer.
For the second challenge,
which objects will be accessed
by application threads soon,
we just regard the objects that are a few steps away
from the objects that are currently being accessed
as the objects that are going to be accessed.
And as for the third thing,
how to estimate the location of objects,
it is more complicated
than the previous two,
because we need to use a kernel mechanism
to estimate whether the object is currently
in local memory or not.
So that part relies more heavily on the kernel side.
Yeah.
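To sketch the barrier idea: in a JVM, barriers are small compiler-inserted snippets executed around object accesses, which can be reused to tell the GC which objects the application is touching. Below is an illustrative emulation in plain Java, not the actual HotSpot barrier code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative emulation of a read barrier that records application
// accesses. Real JVM barriers are JIT-inserted, not library calls.
class AccessBarrier {
    // Objects the application threads have recently touched.
    static final Set<Object> recentlyAccessed = ConcurrentHashMap.newKeySet();

    // Conceptually, the runtime executes something like this on every
    // reference load performed by an application thread.
    static <T> T readBarrier(T ref) {
        if (ref != null) recentlyAccessed.add(ref);
        return ref;
    }
}
// GC threads can consult AccessBarrier.recentlyAccessed to find the
// "local" objects, and treat objects a few pointer hops away as "incoming".
```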
Okay, cool.
So you say you have these three categories there with the different types.
We can maybe touch on this later on when we talk about the evaluation, but it just jumped into my mind: why did
you settle on having three categories? Would you benefit from having more, or does that make it more
complex? Or were these naturally the three obvious categories things would fit into?
Yeah, so basically, the three kinds of objects are categorized by our needs.
Because we need to line up the tracing and the application,
we need to know what kind of objects should be traced at once,
and what kind of objects should not be traced at once,
that is, should be delayed.
So the criterion by which we classify
objects is based on that:
should we
let the GC start to trace them
now, or
should we not trace them
at once and delay them instead?
That's the criterion.
Okay, cool. So I guess my next question is,
how did you actually go about implementing this?
Can you talk us through your implementation,
what it looks like,
and how easy it was to implement?
Maybe there are some war stories there
of how it was challenging, or whatnot.
In terms of the implementation,
I would say it's mostly hacking the Java virtual machine.
Okay.
Yeah, it's kind of
painful, because
I have to say there are not a lot of
materials
about the Java
virtual machine implementation, so it was
painful at first.
But
once I knew more about
the Java virtual machine,
it got smoother.
And we did implement our technique in two different garbage collectors in OpenJDK, in the Java virtual machine.
And I would say it was a great experience.
Okay.
Actually, as for the implementation,
after I implemented our technique in the first collector,
it was easier for me to implement it in another one.
Okay. Yeah, because they do have some similarities to each other.
I see. How big are the changes? Like, how big are the things you needed to implement here? Is it
quite a sizable change?
Yes, it's not just some little changes inside the JVM.
I forget the exact number, actually, but it should be at least several thousands of lines of code.
Okay, so it's non-trivial then.
Yeah.
Okay, cool.
Cool.
So yeah, I guess my next question is, how does MemLiner compare with other contemporary solutions?
What else is out there in the space that tries to solve a similar problem,
and how does MemLiner compare against it?
Yeah.
So for contemporary work,
there are two kinds.
The first kind is to improve the kernel,
to make the kernel swap system better,
to let it swap in and swap out faster.
This can also improve the performance of applications.
But as I just said, MemLiner is a runtime technique to solve the GC interference problem,
so these kinds of work are very different from each other.
Or I would say the work that improves the kernel
can also be utilized underneath MemLiner,
because our work is built on top of them:
we can use their swap systems.
But there is actually another kind of work,
and I would say some of my previous work
is in that category.
It also
solves the GC interference problem,
but instead of
lining things up,
it offloads the GC part
to the remote memory side.
It's another way
to solve this GC interference problem.
But MemLiner's advantage is that it is easy to use, because if we offload the GC to the remote side,
the kernel and the two machines are highly coupled with each other, and we also need some kernel-runtime co-design
to make the swapping highly efficient.
I would say MemLiner's advantage is that
you can easily use it on every swap system.
I see.
I guess this leads naturally into my next question as well,
which is, how did you go about evaluating MemLiner?
What were the key questions you were trying to answer,
and what did your evaluation experiments look like?
Yeah.
So for evaluation, the biggest evaluation part
in our paper is to compare MemLiner with the unmodified JDK
on the same swap system. We compare the performance of using MemLiner and using
the unmodified JDK on
a range of different applications.
I remember we compared them
on 12 applications.
It's a big
application pool.
And we compared the throughput
to see
how much faster MemLiner is than the unmodified JDK.
And besides that, we also did some experiments to see how MemLiner can help improve the prefetching effectiveness.
Because, as I just said, the tracing itself can interfere with the application and destroy the access patterns of the applications
that could otherwise be identified by the kernel.
But if we line them up,
we think the access patterns of the applications
should now be clear to the kernel swap system.
So we did some experiments to check whether the prefetching is now more effective than
before.
Okay, cool. I just want to jump back a sec to these applications. I know this is a big application pool, so I'm guessing it covered a wide range of different
types, but can you maybe give us a flavor of some of the applications that were used in the evaluation?
Yeah, so the applications are more
like the cloud applications
that are currently widely used
in data centers. For example, Spark
itself: we evaluated
three different Spark applications,
and we also evaluated
MemLiner
on Neo4j and Cassandra.
So basically, they are mostly big cloud applications.
From your experiments, what were the key results and what were the highlights?
I think our highlight is that we improved the cloud applications by an average speedup of 1.5x
over the unmodified JDK on the same swap system.
Yeah, that would be the biggest highlight:
we improve the performance
and do not harm anything else.
Yeah.
That's the headline.
That's really cool.
So maybe we can dig into the results a little bit more then. What are some of the other interesting things
that you found from your experiments?
Maybe we can talk about something about the prefetching effectiveness, because I
just talked about it. Yeah, for the prefetching effectiveness, we evaluated MemLiner and the unmodified JDK on two different swap systems,
or maybe I can say two different prefetchers. We can see that for both
prefetching accuracy and prefetching coverage, the MemLiner metrics
are higher than the unmodified JDK's
on both swap systems.
And I think that's
another
highlight result.
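For reference, prefetching accuracy and coverage are commonly defined along the following lines; this is a standard formulation, not necessarily the paper's exact definitions:

```latex
\text{accuracy} = \frac{\#\{\text{prefetched pages later accessed}\}}{\#\{\text{pages prefetched}\}},
\qquad
\text{coverage} = \frac{\#\{\text{accesses served by a prefetch}\}}{\#\{\text{total accesses}\}}
```

Higher accuracy means less wasted bandwidth and local memory; higher coverage means more remote-memory misses are hidden.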
Yeah, it's a really interesting result, for sure.
Are there any situations where... because at the moment it sounds like
MemLiner is pretty much a free lunch,
that we can always use it to get better performance.
Are there any scenarios or any applications where the performance is suboptimal?
I guess what I'm asking here is, what are the limitations of MemLiner?
Yeah, so basically, MemLiner still has some small overhead, but in most cases it's negligible.
But,
as I introduced,
MemLiner lines
up the tracing in GC and the
application itself.
And, you know,
in some cases,
the application may not use that
large an amount of memory.
It might not trigger the concurrent tracing in GC.
And in that case, MemLiner is of no use;
it is not useful.
It can only add some very small overhead,
not big, but still some small overhead.
And it cannot help improve the performance
of the application,
because the application does not trigger
the concurrent tracing in GC.
So in that case,
MemLiner, I would say,
is not useful.
What is the magnitude of the overhead?
You said it's really small,
but can you maybe give us an example
in terms of an application
or some numbers
of how large the overhead is
in this scenario?
So for the memory amount,
basically,
I would take G1 GC as an example.
G1 is the default garbage collector in OpenJDK.
And the mechanism to trigger a concurrent tracing phase in G1
is that the used memory has reached some threshold,
for example, 60% of the whole heap.
But G1 also has a kind of
nursery GC to
continuously reclaim some memory.
It cannot reclaim
all of the dead memory, but it
can still reclaim some.
So if the nursery
GC is already enough
to keep up with the application,
to keep up with the memory
needs of the application itself,
then it never needs to trigger the concurrent tracing: the memory usage
never reaches, say, 50% or 60%, so the tracing is not triggered, and MemLiner is not useful in that case.
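For context, the threshold described here corresponds to G1's initiating heap occupancy, controlled in OpenJDK by -XX:InitiatingHeapOccupancyPercent (45% by default). A simplified sketch of the trigger logic, with made-up workload numbers:

```java
// Sketch of G1's concurrent-marking trigger as described above.
// Simplified illustration, not OpenJDK's actual heuristic.
class ConcurrentMarkTrigger {
    final long heapBytes;
    final double threshold; // e.g., 0.60 for the 60% figure mentioned here

    ConcurrentMarkTrigger(long heapBytes, double threshold) {
        this.heapBytes = heapBytes;
        this.threshold = threshold;
    }

    // Checked after each nursery (young-generation) collection.
    boolean shouldStartConcurrentTracing(long usedBytesAfterYoungGc) {
        // If young collections keep heap usage below the threshold,
        // concurrent tracing never starts, and MemLiner has nothing
        // to line up; only its small barrier overhead remains.
        return usedBytesAfterYoungGc >= threshold * heapBytes;
    }
}
```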
Awesome. So I guess it depends. One of the questions I was going to ask you is, as a software developer,
how do I go about using MemLiner? But I guess it requires me to understand
and profile my application and see whether this would benefit me.
Okay, sure.
So I'm guessing it's publicly available, right?
Yes, it's open source on GitHub, yeah.
Okay, is there any sort of long-term plan
to get it merged into some version of the JDK,
or is it always going to stay
as pretty much a research prototype?
So currently our plan is to keep it
more like a research prototype.
We do want to push it to OpenJDK, but it would be pretty difficult.
It is not easy work, because we need to make it highly robust in every corner case.
Otherwise, just pushing it to OpenJDK would be
not so good.
Okay, cool.
You know, OpenJDK is
widely used around the world.
Yeah, of course. If it wasn't
100% robust, it could result in a lot
of problems, right?
Okay, cool.
Right, so I guess,
having been working on this project,
what do you think has been the most interesting
lesson you've learned?
Maybe something that caught you off guard,
that you weren't really expecting to discover
across this project?
I think the most interesting thing in MemLiner
is how we came up with the idea.
Okay, yeah.
Yeah, because actually, before MemLiner,
we had a work that is called Semeru.
And Semeru is based on the idea
of offloading garbage collection to the remote memory side.
But when we were developing Semeru,
we wondered:
why can't we just
have
a technique
that can be used on a
single machine, without
offloading some computation to
another one?
And
then we came up with this MemLiner
idea,
implemented it, developed
it, and got it
published. I think
that's the most interesting part, because
I have to say
the idea of MemLiner
is the most important thing.
Right, okay.
But as for the implementation, it's just, you know, implementation. It consumed the most amount of time, but it's just that.
Yeah, that's always the case, right? Cool. So I guess, leading on from that, as you
will know, progress in research is very up and down, it's very non-linear:
there are lots of ups and downs along the way,
along the journey.
So from the initial conception of the idea for MemLiner
to the publication at OSDI,
can you tell us more about that journey
and were there things along the way
that you tried that didn't succeed
that maybe the listener could benefit from knowing about?
Yes, yes, actually there are many of these kinds of things, but I would give an example of it.
At first, when we wanted
to inform the GC threads
what kind of objects are currently being accessed,
our initial idea was to capture the call sites inside the program.
If the program calls one function,
there is a call site,
and at the call site, there would be some parameters.
So at first, we wanted to instrument the call sites to capture the parameters
and regard these parameters as a kind of roots to trace from. And this idea did not make it into our final implementation,
into our final paper.
Because, firstly,
the instrumentation itself is too heavy.
It incurs overhead,
significant overhead,
so it is not acceptable.
And also, the call sites themselves
do not give the complete set
of the objects that are currently being accessed.
Because if you have a very long function
and it has no internal
call sites,
then we might not capture any objects during that execution time.
And that is not good in our scenario.
So finally, we gave up that idea
and went with the barrier approach.
So how much time did you spend
trying to
instrument things
before realizing
this isn't going to work,
this is just too much
of a performance overhead,
let's rethink it?
Yeah,
it was about
one to two months,
yeah.
End to end,
how long was the whole process
from the,
I'm thinking this is a great thing,
we should do this,
to finishing the implementation
and everything,
how long did that take?
It was about one year.
Okay, yeah.
The whole process is about one year.
Awesome, cool.
So I guess, what's next for MemLiner?
Where do you take the project next?
So basically, MemLiner, you know,
focuses on the runtime.
And for my next project,
I want to focus on the application itself,
or I would say, focus on the language.
I want to design some language types to make far memory, or remote memory, highly efficient and also easier to use for programmers.
Okay.
So this would be something like, I guess,
some sort of abstraction that allows the developer,
or the programmer, to reason about the fact that this object
is potentially going to be stored in far-away memory,
and having that knowledge in your program,
being able to, I guess, program using those primitives, right?
That could lead to a lot of really interesting things.
Yeah, yeah. So
basically, the program knows that it is using remote memory.
Yeah, that's really interesting. I can see that being really useful for a
lot of applications. I'm very intrigued to see what research you come up with there. That's
really cool. Awesome, so that's a really
interesting idea, but can you tell the listeners about other research
you've worked on across your PhD, some of your other projects? You
mentioned one earlier on, I can't quite remember the name, that was a kind of
precursor to MemLiner. Maybe you can tell us a little bit about that.
So yeah, during my research, before MemLiner, I worked on two other projects,
Semeru and Mako. And these two works are very similar to each other.
The idea is to offload the GC, to offload the garbage collection, to the remote
memory side. But
Semeru focuses on the
throughput,
focuses on the performance.
For the second one, not only the throughput:
we also focus on the
pause time of garbage collection.
You know, in the runtime,
if we do some garbage collection,
sometimes we need to pause the application,
do some work, and then resume the application.
But we do not want the pause to be very long.
Otherwise, when you are using some application,
it just freezes, and that is not acceptable.
So we want some low-pause garbage collectors,
and that imposes some other challenges in our setting.
So my work Mako is to solve that
problem: to still give high throughput, but with a low-pause collector.
Cool, that's awesome. We'll put a link to all that work in the show notes as well,
so the interested listener can go and check that out. Thank you. I'm interested in
trying to understand how you approach generating research ideas, and how do you then
select things to work on? Because obviously, as you said before, MemLiner, for example, took a year to
implement. It's a big commitment of your time to take an idea from the initial
conception to the end. So I'd just like to
know more about your process for, one,
generating ideas, and then, two, selecting
ones to pursue.
So it's kind of a
personal experience, yeah, for me.
As for my projects and ideas, it's more like:
first I worked on some projects, Semeru, and then, along with them, I found that, okay, the
previous work has some limitations, or we can do it better, and then, okay, I come up with an idea and implement it.
But I have to say my criterion for selecting projects or ideas is that
it should be interesting enough,
it should be novel enough. Otherwise, you know, if I devote myself to it for one year, but it's not a really novel thing,
then I would
sometimes be sad
about my
projects.
But
if it is interesting enough, I would be like,
okay, I like this idea,
I do love this idea, I think it's going to work, and I'll be confident.
Okay, so you kind of get this sort of internal
feeling about the project: I believe this is novel, and I guess the idea maybe
calls to you, you could say, and you can feel you're very passionate. That's really
interesting, that's really cool. It's like, I love this. Yeah, you need to love it to then dedicate
a year of your life working on it, right?
So how did you end up researching in this area?
How did you decide this was the sort of stuff you wanted to research?
So I would tell a story, actually, about it.
Basically, when I went to UCLA in my first year, in 2019,
my professor and a postdoc told me, okay, there are three projects to work on, because
I needed to follow someone at first to learn how research is done.
And they gave me some project ideas. I think one was about a machine
learning system, the other one was about video analytics, and the third one was about this
memory disaggregation, or far memory, thing. And actually, at that
time I knew nothing about this
far memory, but I did think it was very
innovative, or
it was a whole
new thing to me, and I wanted to
know about it. I wanted to dig
into the area to learn about it.
So I joined that project, and then
I thought, okay, I love it.
This is an interesting area.
And after that, I continuously worked on it.
Awesome.
Yeah, that's a nice story.
Yeah, it sounds like you found something that you're really interested in, which is awesome.
So what do you think is the biggest challenge in far-memory techniques at the minute?
What's the biggest challenge out there that needs tackling and needs solving?
I have to say, personally, you know, different people have different opinions about the biggest challenge.
For me personally, I would say the application semantics problem is currently the biggest challenge
in my research area.
Because, you know,
MemLiner actually
uses some semantics in the runtime
to improve the performance:
we know that in the runtime we have GC,
and we use that knowledge to improve
the performance of applications on disaggregated memory.
And actually, if we can know more about the application,
if we know what the application will do in the future,
or what the application would like,
or maybe what kind of data will be accessed by the application,
then the kernel could
actively swap in the data that is needed by the application ahead of time.
But making the kernel know that is very difficult,
because only the application developers know
what the application will do in the future,
in the next several seconds or the next several hours.
And if we can have a way to express that kind of application semantics,
to convey the information to the kernel, it would be highly beneficial.
But this part is very challenging, I have to say. That's
what I think is the biggest challenge in my research area.
Fantastic, that sounds amazing.
And I guess this is the last question now: what is the one key thing you want the listeners of the
podcast to take away from your research?
I would say: if you want to improve the performance
of applications on your platform,
you should not only focus on your platform.
You should also take a look at the application itself,
or at the intermediate layers between the application and your platform,
because sometimes that will be highly beneficial.
Amazing, brilliant. Let's end it there. Thanks so much for coming on the show. If you're
interested in knowing more about Haoran's work, we'll put links to all of his papers
in the show notes, and we will see you all next time for some more awesome computer
science research. Thanks again, Haoran. Thank you. Thank you, Jack.