Disseminate: The Computer Science Research Podcast - Yazhuo Zhang | SIEVE is Simpler than LRU | #52
Episode Date: May 13, 2024

In this episode, we explore the world of caching with Yazhuo Zhang, who introduces the game-changing SIEVE algorithm. Traditional eviction algorithms have long struggled with a trade-off between efficiency, throughput, and simplicity. However, SIEVE disrupts this balance by offering a simpler alternative to LRU while outperforming state-of-the-art algorithms in both efficiency and scalability for web cache workloads. Implemented in five production cache libraries with minimal code changes, SIEVE's superiority shines through in a comprehensive evaluation across 1559 cache traces. With up to a remarkable 63.2% lower miss ratio than ARC and surpassing nine other algorithms in over 45% of cases, SIEVE's simplicity doesn't compromise on scalability, doubling throughput compared to optimized LRU implementations. Join us as Yazhuo reveals how SIEVE is set to redefine caching efficiency, promising faster and more streamlined data serving in production systems.

Links:
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches (NSDI'24)
FIFO Queues are All You Need for Cache Eviction (SOSP'23)
Yazhuo's homepage
Yazhuo's LinkedIn
Yazhuo's Twitter/X
Cachemon/SIEVE's website
S3FIFO website

Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hey everyone, Jack here from Disseminate the Computer Science Research Podcast.
Welcome to another episode of our Cutting Edge series.
Today we are going to be talking about caching.
And on that journey with us today, we have Yazhuo Zhang,
who will be talking about her paper,
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches.
This was a paper that was recently published at NSDI, and it actually won the community award. So congratulations for that, Yazhuo.
She recently finished her PhD at Emory University in the SimBioSys Lab,
and she's soon going to be joining ETH Zurich for a postdoc. So welcome to the show, Yazhuo.
Thank you, Jack. Thanks for having me.
Very happy to be part of the show.
You're more than welcome. I'm really excited to learn something about caching today, so it's going to be a really interesting chat, I'm sure.
Cool. So it's customary on the show for the guests to tell us a little bit more about themselves
before we go on to the cool technical stuff. So yeah, tell us more about yourself and how you became interested in systems research?
Yeah, sure. As you already introduced me, I just defended my dissertation and finished my PhD journey this May, and I'm very excited about joining ETH in the future.
As for how I became interested in systems research, I personally always thought systems research is the backbone of computer science. Of course, there are a lot of fascinating areas in computer science, but to me, systems research is a foundation. Just imagine: every click on a website is supported by a huge number of networked machines, and systems researchers work at every level of the whole infrastructure to support our daily digital life.
This whole thing just feels so cool to me.
My research started with understanding cache performance, and most of my initial work was very theory-heavy.
Then I realized I'm not that good at theoretical study.
And the turning point was my internship at Twitter, where I got exposed to the real production
systems. And the performance problems that exist in real systems really fascinated me.
So starting from there, my work has been focusing on understanding the complexity of real systems and then improving their performance, especially in microservices and caching systems.
Yeah. So do you think that your grounding, your theoretical grounding, really helped you when you then moved to the more implementation slash practical side of things?
Yeah, I think it helps. Even though I'm not a fan of it, I have to admit it's the foundation. For example, for the cache work, there is a lot of theoretical study I had done before, and it really helped me later. But I'm really not a fan of that; I cannot do very solid proofs or things like that, so I just gave up on that path.
Yeah. I also agree with what you were saying about systems being fascinating, because they're at the bedrock of, maybe I'm getting a bit carried away here, but modern society, in a way, right? Because everyone's plugged into the digital world, whether it be TVs, smartphones, even washing machines; everything's plugged into this digital ecosystem, and we're working at sort of ground zero for it, in a way. And a lot of people often ask me, oh, what do you do? And I tell them, and they have a blank look on their faces, like, yeah, that sounds boring as hell. But I'm like, yeah, but every time you go on your phone, you're probably using a database or something like that. And I'm like, without people like us, you couldn't do that, right?
Exactly.
Awesome. Cool.
So let's talk about caching then.
So let's set some context for the chat today.
And can you tell the listener essentially what caching is and what a web cache is?
And I guess why we need them, right?
Why are they important?
Yeah, let's set the context for this talk. Context is king, and I think it's safe to say that caches are everywhere. Caching, in a simple word, is just an optimization technique that makes things faster. In general, a cache refers to an expensive medium, like DRAM, with limited space, used to temporarily store popular objects. In terms of web caching, we just restrict the scope of caching to web contents, like HTML pages, images, and other types of web content. And as I said, web caches are everywhere.
Using a web cache brings a lot of benefits. It can help reduce latency by caching frequently accessed data: rather than fetching the requested content from the database every time, a web cache can reduce the time it takes to fetch the requested resources.
So this leads to faster page loads.
And also, cached content does not have to be fetched repeatedly from the origin server, which reduces the amount of data transferred over the internet.
So this is beneficial in reducing bandwidth costs.
Overall, web caches help provide faster, more efficient, and more reliable web experiences. Yeah, I give the credit to caches for the efficient and reliable web experiences we have.
Awesome. That's a great explanation there. So cool. With that context set, then, let's talk about your paper and your research on caching and web caching in SIEVE. So let's start off with, I guess, the elevator pitch for the work and the high-level goal for the paper.
Yeah, sure. I just mentioned that caching is important for the web experience, but how should we design a cache? The essence of a cache is the eviction algorithm. It decides which object to evict when the cache is full.
You have to evict something because you want to store new objects.
Cache eviction algorithms have been studied for over six decades, starting with some very simple eviction algorithms, such as LRU, FIFO, and CLOCK, which were proposed in the 1960s.
And this area keeps evolving. Especially in the past two decades, a lot of new algorithms have been proposed. But among all those algorithms, some focus on improving efficiency, some focus on improving throughput, and we found that none of them are able to balance both efficiency and throughput.
It's also worth mentioning that cache eviction algorithms are getting more and more complex.
For example, many newly proposed eviction algorithms use machine learning techniques to improve cache efficiency.
But this type of work usually brings a lot of overhead, which makes it hard to actually deploy in production, because you need a lot of time to train on the data to make the cache efficient.
But the cache itself should be a very simple component within the whole infrastructure.
Also, complex algorithms are usually difficult to debug and maintain; they would be a headache for the engineers.
So, facing all these challenges and problems in cache research, the goal of our paper is to design an eviction algorithm that is simple, efficient, and scalable.
So this is the goal of our paper.
Awesome stuff. I know there's a quote in your paper, something like simplicity is beautiful, or there's beauty in simplicity, I'd have to check the exact quote, and I have to agree. Maybe people have often been too quick to use these complex techniques, machine learning techniques, where obviously it's all the rage to use ML and whatnot, but yeah, I like the principles of this approach, for sure. There were also a few acronyms you mentioned in there, FIFO and LRU. Could you maybe give us a quick rundown of what those actually mean, for the people not necessarily familiar with the acronyms?
Yeah, yeah, yeah, I should mention that. I always worry I'll mispronounce every concept for people who are not in this area.
So, FIFO is just first in, first out.
When you have a queue, you insert an object at the head of the queue, and every time you need to evict, you evict the object at the tail of the queue.
So, that's first in, first out.
Very simple.
And LRU is the least recently used eviction algorithm.
Every time you request an object, if it's already in the cache, you promote that object to the head of the queue.
Every time there's a hit, you need to do that promotion.
And when you do an eviction, you directly evict the object at the tail of the queue.
So those are the two very simple eviction algorithms.
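For readers following along, here is a minimal sketch of the two policies just described, side by side, in Python. This is an illustration added for this write-up, not code from the paper or its libraries:

```python
from collections import OrderedDict

class FIFOCache:
    """First in, first out: evict whatever was inserted earliest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # front = tail (oldest), back = head (newest)

    def get(self, key):
        return self.store.get(key)  # a hit changes nothing in FIFO

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict at the tail
        self.store[key] = value             # insert at the head

class LRUCache:
    """Least recently used: every hit promotes the object to the head."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # eager promotion on every hit
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)     # refresh recency
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[key] = value
```

That `move_to_end` on every hit is exactly the per-hit bookkeeping (and, in concurrent implementations, locking) that SIEVE avoids, as the discussion below explains.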
Okay, awesome.
So let's talk about SIEVE then.
And what is SIEVE?
Where does it fit into this sort of family of eviction algorithms or approaches for caching?
And yeah, how does it work?
Yeah, actually, I'm very glad you just let me explain what FIFO is, because SIEVE just makes a very simple tweak to the basic FIFO queue. SIEVE is a new eviction algorithm.
Compared to most of the current eviction algorithms, it is very simple. To explain how it works, let's start with the data structure.
Compared to the basic FIFO queue, SIEVE has two main differences.
First, SIEVE maintains one piece of metadata for each object to indicate whether or not this object has been accessed before.
We call it the visited bit.
And second, SIEVE has one extra pointer that moves between the tail and the head of the queue, and we call it the SIEVE hand. So those are just the two extra things here. Now I can explain how SIEVE works. In the case of a cache hit, that is, if we request an object that is already in the cache, SIEVE sets the object's visited bit to 1.
That's all.
And in the case of a cache miss,
which means the requested object is not in the cache,
there are two steps:
we need to do eviction and insertion.
For the eviction, we need to decide which object should be evicted.
We start by checking the object where the SIEVE hand points.
If the object's visited bit is 1,
we reset the visited bit to 0
and move the SIEVE hand one step closer to the head of the queue.
The SIEVE hand keeps moving until it finds an object
whose visited bit is 0.
Then we can evict that object.
If the SIEVE hand arrives at the head, it goes back to the tail and starts scanning again.
So the SIEVE hand is basically scanning the queue to find the object to evict. And after we evict, for the insertion,
the new object is inserted at the head of the queue.
So that's it.
This is how SIEVE works.
And I think it's very simple, right?
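Here is a minimal Python sketch of the SIEVE logic as just described. Again, this is an illustration for the transcript, assuming uniform-size objects; the authors' reference implementations (linked in the show notes) use a linked list rather than the plain array used here:

```python
class SieveCache:
    """SIEVE: a FIFO queue plus one visited bit per object and one moving hand."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []       # index 0 = tail (oldest), last index = head (newest)
        self.visited = {}
        self.values = {}
        self.hand = 0         # index of the object the SIEVE hand points at

    def get(self, key):
        if key in self.values:
            self.visited[key] = 1   # cache hit: just set the visited bit, nothing moves
            return self.values[key]
        return None                  # cache miss: caller fetches and calls put()

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.visited[key] = 1
            return
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)       # insert the new object at the head
        self.visited[key] = 0
        self.values[key] = value

    def _evict(self):
        while True:
            if self.hand >= len(self.queue):
                self.hand = 0                  # hand passed the head: wrap to the tail
            key = self.queue[self.hand]
            if self.visited[key]:
                self.visited[key] = 0          # reset the bit, keep the object in place
                self.hand += 1                 # move one step toward the head
            else:
                self.queue.pop(self.hand)      # evict; hand now points at the next object
                del self.visited[key]
                del self.values[key]
                return
```

Note that the hit path in `get` touches only one bit and never moves the object; a production version would use a doubly linked list so eviction avoids the O(n) cost of `list.pop`.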
Yeah, yeah.
So I'm going to repeat it back to you now,
see if I understood it.
So we have these two additional components
over the vanilla FIFO queue.
We have this metadata, where every time an object gets accessed, we mark it: yeah, it's been visited. Does that keep counting up if I access it multiple times: one, two, three, four, five, six?
No, here it's only one bit; we just set it to one. So it's a binary thing: it's set on access.
Right. And you've got this SIEVE hand that's going across the queue, and it's saying: if it sees a one, flip it back to zero and continue, and it keeps repeating that until it sees a zero. And when it sees a zero, that's when we evict that object, and then we can insert the next object.
Yeah, yeah, yeah. Exactly.
Got it. My next question is, how did you arrive at this scheme? What was the eureka moment where you thought, huh, this is going to be better than the normal approach?
Yeah, that's a very good question. Actually, I have to say, we just found this algorithm accidentally.
Actually, I won't conclude that way, because we were revisiting a lot of eviction algorithms.
And actually, I want to explain this in a bit of a different way.
Rather than directly comparing SIEVE to other algorithms and how confident we are about the results, I think it's important to talk a little bit about the design principles here.
And actually, it was a long journey, how we arrived at the design of SIEVE.
Our whole team has been studying what makes a cache eviction algorithm efficient. And we have found lazy promotion and quick demotion are two important properties of efficient cache eviction algorithms. I just want to mention that these studies were published at HotOS'23 and SOSP'23, and you're welcome to read them if you're interested in the details.
Here I'll just briefly introduce the concepts,
because they are the foundation of the SIEVE design.
Lazy promotion refers to the strategy of promoting cached objects
only at eviction time.
When you think about SIEVE,
you only act at eviction time;
we do not do active promotion.
For example, in LRU, you
promote every time you have a cache hit.
But in SIEVE, we don't do that.
We only decide whether or not to keep a cached object when we need to evict.
In the queue, objects are inserted at the head of the queue and evicted at the tail of the queue,
and we only decide whether or not to keep an object when we're about to evict it.
So this is what we call lazy promotion.
And the benefit of lazy promotion is that it can retain popular objects with minimal effort,
because you don't need to do work on every hit;
you save a lot of computation here.
So it can improve throughput due to less computation.
And it can also improve efficiency, because we have more information about each object:
it stays long enough in the queue that, when we do the eviction,
we have more information about
whether or not it is popular,
which is what the visited bit shows.
So that is lazy promotion.
And the other thing is,
an efficient eviction algorithm
not only needs to keep popular objects in the cache,
it also needs to evict unpopular objects first.
We call that quick demotion.
It is a fact that most objects
should be quickly removed after they are inserted.
If you think about current web cache workloads,
there are a lot of one-hit wonders,
which means objects that are only accessed once
after they are inserted into the cache.
So let's just get rid of them as soon as possible.
This is what quick demotion means.
So we found lazy promotion and quick demotion are two important properties of efficient cache eviction algorithms. But most algorithms don't have both properties. If you think about it, FIFO doesn't do any promotion at all, and there's no quick demotion. For LRU, similarly, there's no quick demotion, and it also does eager promotion. And there are some other algorithms that have either lazy promotion or quick demotion, but none of them achieve both. SIEVE is the simplest design we have found that achieves both lazy promotion and quick demotion with this very simple data structure, so it can achieve very competitive efficiency and throughput based on these two properties. So I think that may be a better answer to why SIEVE works.
Yeah, so I had a few things bouncing around in my mind while you were giving that answer. The first thing I want to say is, we'll put links to those references you mentioned in the show notes, so the listener can go and check those out as well; they'll have a quick, handy link to go and find them. But my next question was going to be: these two principles make intuitive sense, right, and you've obviously demonstrated with SIEVE that they do improve efficiency and throughput. You said that LRU has, kind of, one of these properties but not the other. Is it easy to extend some of these other existing approaches, to adapt them in a way to have both of these properties?
It's not that easy to directly change their mechanisms, because LRU, by default, promotes everything to the head of the queue: every time you have a hit on an object, it means it's popular,
so you just put it at the head of the queue and make it stay longer in the queue, right?
And this is contrary to lazy promotion already, right? And as for quick demotion, based on what LRU maintains, you cannot decide how to quickly evict unpopular objects.
It just evicts things at the tail, because, compared to the objects close to the head,
those are less popular.
It just evicts them. But there is
no quick demotion at all: every object has to traverse the whole queue to get evicted.
Right. Okay. Yeah.
Yeah. So it's really hard to directly modify certain eviction algorithms to make them achieve
both lazy promotion and quick demotion. And we found there's a similar
algorithm called CLOCK,
also known as FIFO-Reinsertion. They are two
different implementations of the
same eviction algorithm,
but CLOCK is
implemented on a ring buffer, and FIFO-Reinsertion
is just a single queue.
And this algorithm
has the most similarity
to ours, and it has lazy promotion, because every time it needs to do an eviction, it checks the tail of the queue.
If the object's bit is set, rather than evicting it,
it promotes it to the head of the queue.
That's all.
And we made our change by adding one SIEVE hand moving in the queue to do the quick demotion.
I think if we want to do a comparison, we would really compare to CLOCK.
That is the closest; we share the most similarity. But these algorithms don't have quick demotion.
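To make the contrast concrete, here is the same style of illustrative Python sketch for FIFO-Reinsertion, the queue-based formulation of CLOCK described above (again, a sketch for this write-up, not the paper's code):

```python
from collections import deque

class FIFOReinsertionCache:
    """CLOCK / FIFO-Reinsertion: FIFO plus a second chance for visited objects."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # left = tail (oldest), right = head (newest)
        self.visited = {}
        self.values = {}

    def get(self, key):
        if key in self.values:
            self.visited[key] = 1   # lazy promotion: just mark the bit on a hit
            return self.values[key]
        return None

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.visited[key] = 1
            return
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)
        self.visited[key] = 0
        self.values[key] = value

    def _evict(self):
        while True:
            key = self.queue.popleft()      # examine the object at the tail
            if self.visited[key]:
                self.visited[key] = 0
                self.queue.append(key)      # second chance: reinsert at the head
            else:
                del self.visited[key]
                del self.values[key]
                return                      # evicted an unvisited object
```

The difference from SIEVE: a surviving object here is moved back to the head and must traverse the whole queue again before it can be examined, whereas SIEVE's hand leaves survivors in place and keeps its position across evictions, which is where the quick demotion comes from.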
Okay, cool.
Yeah, maybe we can get into the specific question I'm going to ask in a bit more depth later on, when we talk about limitations and scenarios where performance might be suboptimal. But I want to ask, and probably all caches suffer from this, right, but say all the items in the SIEVE cache are really, really hot,
and the SIEVE hand is in this situation where, every time it sweeps across all the objects
and resets them all back to zero,
by the time it starts to do it again,
everything's already at one again.
I mean, that feels like maybe something you can't avoid,
but is that a potential problem?
Or is it, like, just in that scenario, get a bigger cache?
Or, like, what's the way around it?
It feels like
probably an extreme case, but yeah, how would that be handled?
I think you already got one solution: you can have a bigger cache. That's the simple solution in production, right?
And it wouldn't hurt to just add a bit of capacity there. But the problem you are asking about here is very meaningful.
And we were thinking it can basically be classified
under one direction we will be working on,
which is adversarial workloads for the SIEVE algorithm.
And the scenario you just mentioned
could be one possibility.
We have to look at it, because currently we test our algorithm on production traces, and we also test on artificial Zipf-distributed datasets that we created.
On both of them, we are the best among all the state-of-the-art algorithms.
But there exist some extreme points where we are worse than others.
Most of the time we are good, but there are some cases where we're not that good.
Sometimes we are worse.
So we are working on these corner cases, trying to classify them, to give a list of which
traces specifically are not a good fit for SIEVE.
For example, the scenario you're mentioning, we actually haven't found
that in the workloads.
Every time a bit is reset to zero,
the next time you can evict something.
It will not immediately set
everything back to one.
But there are some cases
where the SIEVE hand
is at the head of the queue,
and it just gets stuck there,
and everything inserted into the cache
is just thrown away. It just gets stuck at the head of the queue.
That's one worst case we found
when we did the analysis.
For web workloads,
we don't see those extreme cases a lot.
So that's why we're good in most of the cases.
Yeah, well, let's talk about that some more then.
So let's talk about evaluation.
And tell us two things here.
The first of all is, like,
tell us about the implementation,
how you went about implementing SIEVE,
how easy that was to do,
and what systems you implemented it in.
And then what your approach was to evaluating it,
and what you compared it against.
Yeah, great question.
So for SIEVE, initially we just implemented it in our simulator; we have a repository with a lot
of cache eviction algorithms, including a set of complicated algorithms.
We also implemented SIEVE there and did a large-scale analysis: there are five public datasets we collected, plus
two proprietary CDN datasets, and we evaluated all of them against, I think, over 20 different state-of-the-art algorithms. But we show half of them in the paper because of space limits.
And this is the simulation part,
but we also implemented SIEVE in popular cache libraries. We searched for the most popular cache libraries
in five different languages:
C++, Go, JavaScript, Python, and Rust.
And we found those libraries,
and we replaced their LRU with SIEVE.
And then we used these libraries to test the miss ratio,
to prove that SIEVE actually works.
And in particular, we used CacheLib,
which is written in C++ and developed by Meta.
This is a general-purpose caching engine,
so we used it, replaced the LRU,
and tested the throughput on this platform.
In terms of the throughput results,
we actually achieved much better throughput:
up to six times improvement
compared to their pure LRU implementation.
And it's also worth mentioning
that Meta has a lot of engineers
who have spent a lot of energy
optimizing their pure LRU
because of the locking issue.
And they got some improvement,
but if they changed to SIEVE,
they could easily achieve better throughput.
That's awesome. Cool. Yeah, on that point really quickly,
about you having implemented it in a lot of open-source libraries in various
different languages: is that now available for people to use? Can I go and download the
latest caching library for Rust, or whatever it may be, and use that in my projects?
Yeah, of course.
We have a repo called Cachemon
that holds all the repos we did for this project.
Besides that, thanks to the open-source community,
people are just so good there.
They know these algorithms, and they implemented them.
So right now,
I haven't counted recently,
but before I presented
this paper at NSDI,
I knew there were over 20 different
implementations on GitHub,
in, I think, over 10 different
programming languages.
So if people want to try out the SIEVE algorithm,
it's easy for them to find a library and plug it into their system.
We have a website called sievecache.com,
and we list most of the libraries we found there.
But if people have more findings,
please just submit a PR and we can merge it into the website.
Awesome stuff.
Yeah, we'll put a link to that
in the show notes as well.
So if there's a missing language out there,
you've been told,
go and implement it in your favorite language
and submit a PR, right?
So yeah, let's get full coverage.
Who's going to write in COBOL?
Cool.
So I have a question
that I normally ask around impact: how can a software
developer or engineer leverage the findings of the work that you've done,
and what impact do you think SIEVE can have, short term and long term? But I think we've
just kind of spoken about that, right? That's some serious short-term impact. So I guess
the next question is: what impact do
you think it can have long term? Do you see it being used widely in production
applications? Maybe it is already, I'm not sure. Do you have much information on that?
In production, we actually don't know. We know Meta is very interested in this work, because we already
implemented it in their CacheLib. For other companies, I'm not sure,
because this is so simple.
And I think if people just want to try it,
they won't let us know.
They don't need our help.
They just implement it.
So I actually think it would be better
if we heard from people who are using this algorithm.
And I know one of the SIEVE users shared some news on Twitter.
They reported that, after switching their cache algorithm to SIEVE,
they observed a 30 times cache usage reduction and 16% CPU savings.
So this is good news to us.
And that's just one user report.
We'd love for people to share this stuff,
and we are expecting more of these reports, whether good or not.
We just want to see how SIEVE is behaving in the real world.
It'd be good to know that.
And also, while we were discussing SIEVE
with some researchers and engineers,
we learned there are some limitations.
For example, because of the SIEVE hand,
we cannot implement it on a ring buffer: the hand evicts from the middle of the queue while new objects are inserted at the head, which are different positions.
So for companies whose cache is implemented on a ring buffer, SIEVE might not be a good choice.
So this is just one limitation we are aware of.
But in real systems, there might be a lot of other issues.
So we would love to hear about any problems when people are trying to adopt SIEVE, so we can see
what we can do, what we can help with, and whether we can make this algorithm even better.
Great.
So yeah, let's talk about future directions. You mentioned earlier on in the podcast about
exploring adversarial workloads. What else is on the agenda for taking SIEVE forward?
Yeah, yeah.
Basically, as I mentioned,
one direction is that we're interested in classifying the adversarial workloads for SIEVE.
And the other direction is a proof
of why SIEVE is effective.
It would be good to understand the theory
behind this very effective algorithm.
So we are thinking about, I'm not sure if you're familiar with it, Belady's algorithm.
It is an offline cache eviction algorithm.
Basically, you know the future; you know what's going to be accessed in the future.
So you know the best miss ratio you can get for a cache.
So that's an offline algorithm.
But we were thinking: no one is doing offline,
right? In real systems, we are all doing online. So is it possible to find the upper bound
for online algorithms? That's what we're thinking: is SIEVE achieving the upper bound
for online algorithms? So this is one direction we are thinking about. And the third direction is that we'd love to collaborate
with people in industry,
to understand all the limitations in the real world,
to learn more from those experiences,
and to help us improve this algorithm further.
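For readers who have not met Belady's algorithm, here is a tiny illustrative Python sketch of the classic offline MIN policy mentioned above, assuming uniform object sizes (not code from the paper): given the full future request sequence, it evicts the cached object whose next access is farthest away.

```python
def belady_min_miss_ratio(requests, capacity):
    """Offline optimal (Belady's MIN) for equal-size objects:
    on a miss with a full cache, evict the object reused farthest ahead."""
    cache, misses = set(), 0
    for i, key in enumerate(requests):
        if key in cache:
            continue                      # hit: nothing to do offline
        misses += 1
        if len(cache) >= capacity:
            def next_use(k):
                # Index of the next request for k; infinity if never used again.
                for j in range(i + 1, len(requests)):
                    if requests[j] == k:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return misses / len(requests)

# Example: the lowest miss ratio any algorithm could achieve on this trace.
print(belady_min_miss_ratio(["a", "b", "c", "a", "b", "d", "a"], capacity=2))
```

The quadratic rescan is fine for a toy trace; trace simulators typically precompute each request's next-access time so the whole pass runs in near-linear time.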
Awesome.
So you're going back to your theoretical roots there;
you've gone full circle.
You've come all the way around the ring buffer.
Yeah, yeah, yeah. When I said that, I was thinking about the same thing.
Awesome stuff. Cool. Yeah, so my next question for you, and maybe I can preempt this a little bit, given how this work was found, how you came across discovering it, but I want to ask you: what is the most surprising thing you've learned while working on SIEVE and on caching?
A very good question.
I also have a very simple answer to this:
it's starting with a simple approach.
I was surprised to see that SIEVE is so good.
We actually verified very carefully
to make sure the results are reliable, right?
Because it's just so simple.
How could it compete with a lot of fancy algorithms,
with lots of queues and complicated designs?
How could this happen?
So yeah, that's why we very carefully checked the results.
But all jokes aside, I think it's about starting with a simple approach: finding the essence of your research problem, asking what am I doing and what is all this analysis for,
trying to extract the essence of that, and starting with a very simple
approach. It might surprise you.
And I think sometimes it's very important to understand the
logic behind it,
rather than putting all the data together,
putting it into a black box,
and not knowing what's happening there.
With SIEVE, even though we found it accidentally,
we have some theory, some principles,
that support this algorithm.
That's something I think is important
for us systems researchers.
Yeah, yeah, I completely agree with you;
that's a great answer to that question. Cool. My next question for you, Yazhuo, is
about being creative, the creativity in the creative process,
and how you personally go about generating ideas, and then, once you've
generated them, selecting which ones to work on, right? So do you have a systematic approach for that?
Yeah, that's a very good question.
I'm actually not sure I can give constructive suggestions here.
I believe we all have different approaches,
and what I'll say here is more of a retrospective,
a summary of my past experience.
I know in the past I made a lot of mistakes that I wish I hadn't,
so people hearing this talk can try to avoid them. You mentioned the creative process;
I actually want to start with project selection, and then I can talk about idea
generation. To me, that feels like the right order. First of all, I would choose the problems that really make me excited.
Sometimes I feel it's really just instinct.
You know, you will know it immediately if you are truly interested in something.
At the beginning of our chat, I mentioned I used to work on theoretical analysis of cache performance.
To be a little bit more specific, it was the mathematical study of cache performance:
I did some miss ratio curve calculation.
I spent tons of time working on formulas and thinking about probability theory.
And now, when I look back at that time,
I feel like I was forcing myself to do it.
I wasn't fulfilled.
But I was very lucky to intern at Twitter,
at a crossroads during my PhD,
when Yao, my manager,
introduced multiple potential projects
I could do over that summer.
I forget the rest of the projects she proposed,
but I remember two.
One was about caching:
she mentioned that I could do
some data structure design in Redis.
That was one project.
And the other one was a service migration problem,
if Twitter were to build another data center.
Can you guess which one I chose?
I think I've got it, maybe, yeah.
Yeah, yeah.
Actually, when I heard those two options (the other options I forget),
I knew immediately that I wanted to do the service migration,
even though I knew nothing about microservices at that time,
not to mention that I would need to use distributed tracing data.
I had no clue what that was.
But that problem fascinated me.
On the contrary, the first problem would probably have been much easier for me,
and it would have been a comfort zone.
But I could feel which one was more intriguing to me.
So I love working on more practical problems, the specific problems you can explore. And I just learned a lot of stuff at that time.
I can remember all the things I was trying to learn; it's still very fresh. Of course, it's just a personal preference, but I just felt excited. It wasn't something I didn't want to do, so I could spend a lot of time working on it without feeling tired or exhausted.
Also, I understand that PhD students have limited time to explore their interests, so save your time: I found my research interests basically at the end of my second year,
which means I was restarting a whole new research direction in the middle of my PhD.
That was very stressful indeed.
So, for the first two years, one suggestion is to expose yourself to many areas.
That will be beneficial for finding your own true interests.
So I think it's very important to keep the passion and really love what you do. That was about project selection: just finding out what I really like. Next, I'll talk about idea generation, which you mentioned.
Now that I think back, I feel like reading papers, of course, and reading technical blogs and the latest tech news, knowing what has been done and what is happening, is important. And also, talk to people. When you absorb a lot of information, you're kind of processing it in the background.
Someday you will have the moment where you feel there are a lot of things you could try.
I know I had that moment, but much later on.
For example, in the intern project I did at Twitter, I developed a tool that can estimate the end-to-end latency if some services are delayed.
The tool mainly captures the causal dependencies between services.
Even though it was designed to estimate latency,
I felt like it had more potential, but I didn't know what that was.
Then, last year, Amazon had a blog post talking about how they found that a monolithic design actually had much better performance than a microservices design. And I knew immediately that this tool could help me explore that idea: I could extend it to evaluate whether we really need a microservices architecture.
So this is something we did before, kept in mind, and understood well. And when you expose yourself to new problems and keep track of the latest news, especially from industry, new ideas pop up. So I think those could be very interesting experiments, and it's not boring at all, right?
at all right yeah yeah no i love that i love the answer to that question i like what you're saying
about having that breadth not depth exposure at the start get exposed to kind of a lot of different
ideas so you can kind of find the one that really resonates with you which then makes
working on it easier right like if you if you love what you do you never work a day in your
life right as the same so it goes and also sort of that sort of like a three-pronged approach
or maybe there was a fourth as well like i've been been reading stuff, reading tech news, keeping up to date,
but more importantly, talking to people
and getting exposed and kind of venerating that.
And setting a background thread off on something
and you never know when at some point in the future,
you're going to put two and two together and go,
huh, yeah, that could work.
So yeah, no, I think that's a fantastic answer
to that question, Yucho.
Awesome.
So, Yazhuo, could you tell us a little bit more about the paper?
We spoke about it at the top of the show, a little bit about how this kind of idea came around,
but can you tell us more about the origin story behind the paper?
Yeah, sure. I really love talking about this. I can share the backstory of SIEVE.
And here I would say SIEVE is a culmination of years of dedicated effort.
This was teamwork, and I want to give a lot of credit to my collaborators.
They are all cache experts who have been working on caching for over 10 years.
Before SIEVE was invented, Juncheng, the co-first author,
who isn't here, had been maintaining a cache
simulator since 2016. It's now called libCacheSim, implemented in C++, and it has
about 20 different eviction algorithm implementations. Juncheng and I both interned at Twitter
before. Our manager, Yao, is also an experienced engineer.
She has a very deep understanding
of how a cache should be designed
to meet production requirements.
And also my advisor, Ymir,
and Juncheng's advisor, Rashmi,
are experts in theoretical study.
So our whole team was on this path of understanding
what makes a cache eviction algorithm efficient.
And SIEVE was discovered
while we were revisiting old algorithms.
In the past, we had collected a lot of cache traces,
so it was just natural for us to analyze and evaluate things.
Now we present this very simple design,
but we actually tried multiple different versions.
For example, we tried different hand movement strategies:
to the head, to the middle of the queue, and so on.
And with a lot of effort spent on understanding the SIEVE queue,
we finally present SIEVE as what it is now.
So it was a long journey. Even though it's very simple,
it actually carries a lot of people's effort from the past decades, even though I wasn't even there
back then.
Yeah, it's crazy. You just see the paper appearing in the proceedings,
and then you don't always see the years of research and the culmination of effort that's
gone into it, and the collaboration behind it. But yeah, that's awesome. I mean, teamwork makes the dream work, right?
It's a bit of a cliché saying, but yeah, it's awesome to see that it was the product of
collaboration and years of hard work. It's definitely been worth it in the end, you know; it's
a really awesome contribution to the space. Cool. Well, we've come to the time for the last word now.
So, what's the one takeaway you want the listener to get from this podcast episode today?
Yeah, one takeaway about this work.
I would say SIEVE is the simplest eviction algorithm that achieves both lazy promotion and quick demotion.
So you can use it as a plug-in cache
for all your web applications.
It's simple, efficient, and scalable.
Yeah, you're welcome to try it.
Yeah, you should go and try it, listener.
You've been told.
I will put links to all that in the show notes.
Thank you so much for speaking to me today, Yazhuo.
It's been a fascinating chat; I've learned a lot,
and I'm sure the listener will have as well.
Best of luck with your postdoc at ETH as well. I'm sure you're going to keep
producing fantastic work, and I look forward to keeping my eye on it and seeing how things
progress for you over the course of your career. So, awesome stuff. Thank you very much. Whereabouts can we find
you on social media, on X, or Twitter, or whatever it's called these days, and LinkedIn, where the
listener can connect with you if they want?
Thank you, Jack.
And thanks for the invitation here.
I'm very happy to talk with you
and share my story and share the work.
And people can always find me on X and LinkedIn
and they can just email me.
They will find my email address on my personal website.
So I'm just easy to approach.
Fantastic. Thank you so much, Yazhuo.
And yeah, I guess we'll see you all next time
for some more awesome computer science research. Bye.