Disseminate: The Computer Science Research Podcast - Yazhuo Zhang | SIEVE is Simpler than LRU | #52
Episode Date: May 13, 2024

In this episode, we explore the world of caching with Yazhuo Zhang, who introduces the game-changing SIEVE algorithm. Traditional eviction algorithms have long struggled with a trade-off between efficiency, throughput, and simplicity. However, SIEVE disrupts this balance by offering a simpler alternative to LRU while outperforming state-of-the-art algorithms in both efficiency and scalability for web cache workloads. Implemented in five production cache libraries with minimal code changes, SIEVE's superiority shines through in a comprehensive evaluation across 1559 cache traces. With up to a remarkable 63.2% lower miss ratio than ARC and surpassing nine other algorithms in over 45% of cases, SIEVE's simplicity doesn't compromise on scalability, doubling throughput compared to optimized LRU implementations. Join us as Yazhuo reveals how SIEVE is set to redefine caching efficiency, promising faster and more streamlined data serving in production systems.

Links:
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches (NSDI'24)
FIFO Queues are All You Need for Cache Eviction (SOSP'23)
Yazhuo's homepage
Yazhuo's LinkedIn
Yazhuo's Twitter/X
Cachemon/SIEVE's website
S3FIFO website

Hosted on Acast. See acast.com/privacy for more information.
Transcript
Hey everyone, Jack here from Disseminate the Computer Science Research Podcast.
Welcome to another episode of our Cutting Edge series.
Today we are going to be talking about caching.
And on that journey with us today, we have Yazhuo Zhang,
who will be talking about her paper,
SIEVE is Simpler than LRU: an Efficient Turn-Key Eviction Algorithm for Web Caches.
This was a paper that was recently published at NSDI, and it actually won the community award. So congratulations for that, Yazhuo.
She recently finished her PhD at Emory University in the SimBioSys Lab,
and she's soon going to be joining ETH Zurich for a postdoc. So welcome to the show, Yazhuo.
Thank you, Jack. Thanks for having me.
Very happy to be part of the show.
You're more than welcome. I'm really excited to learn something about caching today, so it's going to be a really interesting chat, I'm sure.
Cool. So it's customary on the show for the guests to tell us a little bit more about themselves
before we go on to the cool technical stuff. So yeah, tell us more about yourself and how you became interested in systems research?
Yeah, sure. As you already introduced me, I just defended my dissertation and finished my PhD journey this May, and I'm very excited about joining ETH in the future.
As for how I became interested in systems research, I personally always thought systems research is the backbone of computer science. Of course, there are a lot of fascinating areas in computer science, but to me, systems research is a foundation. Just imagine: every click on a website is supported by a huge number of networked machines, and systems researchers work at every level of the whole infrastructure to support our daily digital life.
This whole thing just feels so cool to me.
My research started with understanding cache performance, and most of my initial work was very theory-heavy.
Then I realized I'm not that good at theoretical study.
And the turning point was my internship at Twitter, where I got exposed to the real production
systems. And the performance problems that exist in real systems really fascinated me.
So starting from there, my work has been focusing on understanding the complexity of real systems and then improving their performance, especially in microservices and caching systems.
Yeah. So do you think that your grounding, your theoretical grounding, really helped you when you then moved to the more implementation slash practical side of things?
Yeah, I think it helps. Even though I'm not a fan of it, I have to admit it's the foundation. For example, for the cache work, there is a lot of theoretical study I had done before, and it really helped me later. But I'm really not a fan of that; I cannot do very solid proofs or things like that, so I just gave up on that path.
Yeah. I also agree with what you were saying about systems being fascinating, because they're at the bedrock of, maybe I'm getting a bit carried away here, but modern society, in a way, right? Because everyone's plugged into the digital world, whether it be TVs, smartphones, even washing machines; everything's plugged into this digital ecosystem, and we're working at sort of ground zero for it, in a way. And a lot of people often ask me, oh, what do you do? And I tell them, and they have a blank look on their faces, like, yeah, that sounds boring as hell. But I'm like, yeah, but every time you go on your phone, you're probably using a database or something like that. And I'm like, without people like us, you couldn't do that, right?
Exactly.
Awesome. Cool.
So let's talk about caching then.
So let's set some context for the chat today.
And can you tell the listener essentially what caching is and what a web cache is?
And I guess why we need them, right?
Why are they important?
Yeah, let's set the context for this talk. Context is king, and I think it's safe to say that caches are everywhere. Caching, in a simple word, is just an optimization technique that makes things faster. In general, a cache refers to an expensive medium, like DRAM, with limited space, used to temporarily store popular objects. In terms of web caching, we just restrict the scope of caching to web contents, like HTML pages, images, and other types of web content. And as I said, web caches are everywhere.
Using a web cache brings a lot of benefits. It can help reduce latency by caching frequently accessed data: rather than fetching the requested content from the database every time, a web cache can reduce the time it takes to fetch the requested resources.
So this leads to faster page loads.
And also, cached content does not have to be fetched repeatedly from the origin server, which reduces the amount of data transferred over the internet.
So this is beneficial in reducing bandwidth costs.
Overall, web caches help provide faster, more efficient, and more reliable web experiences. Yeah, I give the credit to caches for the efficient and reliable web experiences we have.
Awesome. That's a great explanation there. So cool. With that context set, then, let's talk about your paper and your research on caching and web caching in SIEVE. So let's start off with, I guess, the elevator pitch for the work and the high-level goal for the paper.
Yeah, sure. I just mentioned that caching is important for the web experience, but how should we design a cache? The essence of a cache is the eviction algorithm. It decides which object to evict when the cache is full.
You have to evict something because you want to store new objects.
Cache eviction algorithms have been studied for over six decades, starting with some very simple eviction algorithms, such as LRU, FIFO, and CLOCK, which were proposed in the 1960s.
And this area keeps evolving. Especially in the past two decades, a lot of new algorithms have been proposed. But among all those algorithms, some focus on improving efficiency, some focus on improving throughput, and we found that none of them are able to balance both efficiency and throughput.
It's also worth mentioning that cache eviction algorithms are getting more and more complex.
For example, many newly proposed eviction algorithms use machine learning techniques to improve cache efficiency.
But this type of work usually brings a lot of overhead, which makes it hard to actually deploy in production, because you need a lot of time to train on the data to make the cache efficient.
But the cache itself should be a very simple component within the whole infrastructure.
Also, complex algorithms are usually difficult to debug and maintain; they would be a headache for the engineers.
So, facing all these challenges and problems in cache research, the goal of our paper is to design an eviction algorithm that is simple, efficient, and scalable.
So this is the goal of our paper.
Awesome stuff. I know there's a quote in your paper, something like simplicity is beautiful, or there's beauty in simplicity, I'd have to check the exact quote, and I have to agree. Maybe people have often been too quick to use these complex techniques, machine learning techniques, where obviously it's all the rage to use ML and whatnot, but yeah, I like the principles of this approach, for sure. There were also a few acronyms you mentioned in there, FIFO and LRU. Could you maybe give us a quick rundown of what those actually mean, for the people not necessarily familiar with the acronyms?
Yeah, yeah, yeah, I should mention that. I always worry I'll mispronounce every concept for people who are not in this area.
So, FIFO is just first in, first out.
When you have a queue, you insert an object at the head of the queue, and every time you need to evict, you evict the object at the tail of the queue.
So, that's first in, first out.
Very simple.
And LRU is the least recently used eviction algorithm.
Every time you request an object, if it's already in the cache, you promote that object to the head of the queue.
Every time there's a hit, you need to do that promotion.
And when you do an eviction, you directly evict the object at the tail of the queue.
So those are the two very simple eviction algorithms.
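For readers following along, here is a minimal sketch of the two policies just described, side by side, in Python. This is an illustration added for this write-up, not code from the paper or its libraries:

```python
from collections import OrderedDict

class FIFOCache:
    """First in, first out: evict whatever was inserted earliest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # front = tail (oldest), back = head (newest)

    def get(self, key):
        return self.store.get(key)  # a hit changes nothing in FIFO

    def put(self, key, value):
        if key not in self.store and len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict at the tail
        self.store[key] = value             # insert at the head

class LRUCache:
    """Least recently used: every hit promotes the object to the head."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, key):
        if key not in self.store:
            return None
        self.store.move_to_end(key)  # eager promotion on every hit
        return self.store[key]

    def put(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)     # refresh recency
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        self.store[key] = value
```

That `move_to_end` on every hit is exactly the per-hit bookkeeping (and, in concurrent implementations, locking) that SIEVE avoids, as the discussion below explains.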
Okay, awesome.
So let's talk about SIEVE then.
And what is SIEVE?
Where does it fit into this sort of family of eviction algorithms or approaches for caching?
And yeah, how does it work?
Yeah, actually, I'm very glad you just let me explain what FIFO is, because SIEVE just makes a very simple tweak to the basic FIFO queue. SIEVE is a new eviction algorithm.
Compared to most of the current eviction algorithms, it is very simple. To explain how it works, let's start with the data structure.
Compared to the basic FIFO queue, SIEVE has two main differences.
First, SIEVE maintains one piece of metadata for each object to indicate whether or not this object has been accessed before.
We call it the visited bit.
And second, SIEVE has one extra pointer that moves between the tail and the head of the queue, and we call it the SIEVE hand. So those are just the two extra things here. Now I can explain how SIEVE works. In the case of a cache hit, that is, if we request an object that is already in the cache, SIEVE sets the object's visited bit to 1.
That's all.
And in the case of a cache miss,
which means the requested object is not in the cache,
there are two steps:
we need to do eviction and insertion.
For the eviction, we need to decide which object should be evicted.
We start by checking the object where the SIEVE hand points.
If the object's visited bit is 1,
we reset the visited bit to 0
and move the SIEVE hand one step closer to the head of the queue.
The SIEVE hand keeps moving until it finds an object
whose visited bit is 0.
Then we can evict that object.
If the SIEVE hand arrives at the head, it goes back to the tail and starts scanning again.
So the SIEVE hand is basically scanning the queue to find the object to evict. And after we evict, for the insertion,
the new object is inserted at the head of the queue.
So that's it.
This is how SIEVE works.
And I think it's very simple, right?
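Here is a minimal Python sketch of the SIEVE logic as just described. Again, this is an illustration for the transcript, assuming uniform-size objects; the authors' reference implementations (linked in the show notes) use a linked list rather than the plain array used here:

```python
class SieveCache:
    """SIEVE: a FIFO queue plus one visited bit per object and one moving hand."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []       # index 0 = tail (oldest), last index = head (newest)
        self.visited = {}
        self.values = {}
        self.hand = 0         # index of the object the SIEVE hand points at

    def get(self, key):
        if key in self.values:
            self.visited[key] = 1   # cache hit: just set the visited bit, nothing moves
            return self.values[key]
        return None                  # cache miss: caller fetches and calls put()

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.visited[key] = 1
            return
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)       # insert the new object at the head
        self.visited[key] = 0
        self.values[key] = value

    def _evict(self):
        while True:
            if self.hand >= len(self.queue):
                self.hand = 0                  # hand passed the head: wrap to the tail
            key = self.queue[self.hand]
            if self.visited[key]:
                self.visited[key] = 0          # reset the bit, keep the object in place
                self.hand += 1                 # move one step toward the head
            else:
                self.queue.pop(self.hand)      # evict; hand now points at the next object
                del self.visited[key]
                del self.values[key]
                return
```

Note that the hit path in `get` touches only one bit and never moves the object; a production version would use a doubly linked list so eviction avoids the O(n) cost of `list.pop`.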
Yeah, yeah.
So I'm going to repeat it back to you now,
see if I understood it.
So we have these two additional components
over the vanilla FIFO queue.
We have this metadata, where every time an object gets accessed, we mark it: yeah, it's been visited. Does that keep counting up if I access it multiple times: one, two, three, four, five, six?
No, here it's only one bit; we just set it to one. So it's a binary thing: it's set on access.
Right. And you've got this SIEVE hand that's going across the queue, and it's saying: if it sees a one, flip it back to zero and continue, and it keeps repeating that until it sees a zero. And when it sees a zero, that's when we evict that object, and then we can insert the next object.
Yeah, yeah, yeah. Exactly.
Got it. My next question is, how did you arrive at this scheme? What was the eureka moment where you thought, huh, this is going to be better than the normal approach?
Yeah, that's a very good question. Actually, I have to say, we just found this algorithm accidentally.
Actually, I won't conclude that way, because we were revisiting a lot of eviction algorithms.
And actually, I want to explain this in a bit of a different way.
Rather than directly comparing SIEVE to other algorithms and how confident we are about the results, I think it's important to talk a little bit about the design principles here.
And actually, it was a long journey, how we arrived at the design of SIEVE.
Our whole team has been studying what makes a cache eviction algorithm efficient. And we have found lazy promotion and quick demotion are two important properties of efficient cache eviction algorithms. I just want to mention that these studies were published at HotOS'23 and SOSP'23, and you're welcome to read them if you're interested in the details.
Here I'll just briefly introduce the concepts,
because they are the foundation of the SIEVE design.
Lazy promotion refers to the strategy of promoting cached objects
only at eviction time.
When you think about SIEVE,
you only act at eviction time;
we do not do active promotion.
For example, in LRU, you
promote every time you have a cache hit.
But in SIEVE, we don't do that.
We only decide whether or not to keep a cached object when we need to evict.
In the queue, objects are inserted at the head of the queue and evicted at the tail of the queue,
and we only decide whether or not to keep an object when we're about to evict it.
So this is what we call lazy promotion.
And the benefit of lazy promotion is that it can retain popular objects with minimal effort,
because you don't need to do work on every hit;
you save a lot of computation here.
So it can improve throughput due to less computation.
And it can also improve efficiency, because we have more information about each object:
it stays long enough in the queue that, when we do the eviction,
we have more information about
whether or not it is popular,
which is what the visited bit shows.
So that is lazy promotion.
And the other thing is,
an efficient eviction algorithm
not only needs to keep popular objects in the cache,
it also needs to evict unpopular objects first.
We call that quick demotion.
It is a fact that most objects
should be quickly removed after they are inserted.
If you think about current web cache workloads,
there are a lot of one-hit wonders,
which means objects that are only accessed once
after they are inserted into the cache.
So let's just get rid of them as soon as possible.
This is what quick demotion means.
So we found lazy promotion and quick demotion are two important properties of efficient cache eviction algorithms. But most algorithms don't have both properties. If you think about it, FIFO doesn't do any promotion at all, and there's no quick demotion. For LRU, similarly, there's no quick demotion, and it also does eager promotion. And there are some other algorithms that have either lazy promotion or quick demotion, but none of them achieve both. SIEVE is the simplest design we have found that achieves both lazy promotion and quick demotion with this very simple data structure, so it can achieve very competitive efficiency and throughput based on these two properties. So I think that may be a better answer to why SIEVE works.
Yeah, so I had a few things bouncing around in my mind while you were giving that answer. The first thing I want to say is, we'll put links to those references you mentioned in the show notes, so the listener can go and check those out as well; they'll have a quick, handy link to go and find them. But my next question was going to be: these two principles make intuitive sense, right, and you've obviously demonstrated with SIEVE that they do improve efficiency and throughput. You said that LRU has, kind of, one of these properties but not the other. Is it easy to extend some of these other existing approaches, to adapt them in a way to have both of these properties?
It's not that easy to directly change their mechanisms, because LRU, by default, promotes everything to the head of the queue: every time you have a hit on an object, it means it's popular,
so you just put it at the head of the queue and make it stay longer in the queue, right?
And this is contrary to lazy promotion already, right? And as for quick demotion, based on what LRU maintains, you cannot decide how to quickly evict unpopular objects.
It just evicts things at the tail, because, compared to the objects close to the head,
those are less popular.
It just evicts them. But there is
no quick demotion at all: every object has to traverse the whole queue to get evicted.
Right. Okay. Yeah.
Yeah. So it's really hard to directly modify certain eviction algorithms to make them achieve
both lazy promotion and quick demotion. And we found there's a similar
algorithm called CLOCK,
also known as FIFO-Reinsertion. They are two
different implementations of the
same eviction algorithm,
but CLOCK is
implemented on a ring buffer, and FIFO-Reinsertion
is just a single queue.
And this algorithm
has the most similarity
to ours, and it has lazy promotion, because every time it needs to do an eviction, it checks the tail of the queue.
If the object's bit is set, rather than evicting it,
it promotes it to the head of the queue.
That's all.
And we made our change by adding one SIEVE hand moving in the queue to do the quick demotion.
I think if we want to do a comparison, we would really compare to CLOCK.
That is the closest; we share the most similarity. But these algorithms don't have quick demotion.
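To make the contrast concrete, here is the same style of illustrative Python sketch for FIFO-Reinsertion, the queue-based formulation of CLOCK described above (again, a sketch for this write-up, not the paper's code):

```python
from collections import deque

class FIFOReinsertionCache:
    """CLOCK / FIFO-Reinsertion: FIFO plus a second chance for visited objects."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()   # left = tail (oldest), right = head (newest)
        self.visited = {}
        self.values = {}

    def get(self, key):
        if key in self.values:
            self.visited[key] = 1   # lazy promotion: just mark the bit on a hit
            return self.values[key]
        return None

    def put(self, key, value):
        if key in self.values:
            self.values[key] = value
            self.visited[key] = 1
            return
        if len(self.queue) >= self.capacity:
            self._evict()
        self.queue.append(key)
        self.visited[key] = 0
        self.values[key] = value

    def _evict(self):
        while True:
            key = self.queue.popleft()      # examine the object at the tail
            if self.visited[key]:
                self.visited[key] = 0
                self.queue.append(key)      # second chance: reinsert at the head
            else:
                del self.visited[key]
                del self.values[key]
                return                      # evicted an unvisited object
```

The difference from SIEVE: a surviving object here is moved back to the head and must traverse the whole queue again before it can be examined, whereas SIEVE's hand leaves survivors in place and keeps its position across evictions, which is where the quick demotion comes from.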
Okay, cool.
Yeah, maybe we can get into the specific question I'm going to ask in a bit more depth later on, when we talk about limitations and scenarios where performance might be suboptimal. But I want to ask, and probably all caches suffer from this, right, but say all the items in the SIEVE cache are really, really hot,
and the SIEVE hand is in this situation where, every time it sweeps across all the objects
and resets them all back to zero,
by the time it starts to do it again,
everything's already at one again.
I mean, that feels like maybe something you can't avoid,
but is that a potential problem?
Or is it, like, just in that scenario, get a bigger cache?
Or, like, what's the way around it?
It feels like
probably an extreme case, but yeah, how would that be handled?
I think you already got one solution: you can have a bigger cache. That's the simple solution in production, right?
And it wouldn't hurt to just add a bit of capacity there. But the problem you are asking about here is very meaningful.
And we were thinking it can basically be classified
under one direction we will be working on,
which is adversarial workloads for the SIEVE algorithm.
And the scenario you just mentioned
could be one possibility.
We have to look at it, because currently we test our algorithm on production traces, and we also test on artificial Zipf-distributed datasets that we created.
On both of them, we are the best among all the state-of-the-art algorithms.
But there exist some extreme points where we are worse than others.
Most of the time we are good, but there are some cases where we're not that good.
Sometimes we are worse.
So we are working on these corner cases, trying to classify them, to give a list of which
traces specifically are not a good fit for SIEVE.
For example, the scenario you're mentioning, we actually haven't found
that in the workloads.
Every time a bit is reset to zero,
the next time you can evict something.
It will not immediately set
everything back to one.
But there are some cases
where the SIEVE hand
is at the head of the queue,
and it just gets stuck there,
and everything inserted into the cache
is just thrown away. It just gets stuck at the head of the queue.
That's one worst case we found
when we did the analysis.
For web workloads,
we don't see those extreme cases a lot.
So that's why we're good in most of the cases.
Yeah, well, let's talk about that some more then.
So let's talk about evaluation.
And tell us two things here.
The first of all is, like,
tell us about the implementation,
how you went about implementing SIEVE,
how easy that was to do,
and what systems you implemented it in.
And then what your approach was to evaluating it,
and what you compared it against.
Yeah, great question.
So for SIEVE, initially we just implemented it in our simulator; we have a repository with a lot
of cache eviction algorithms, including a set of complicated algorithms.
We also implemented SIEVE there and did a large-scale analysis: there are five public datasets we collected, plus
two proprietary CDN datasets, and we evaluated all of them against, I think, over 20 different state-of-the-art algorithms. But we show half of them in the paper because of space limits.
And this is the simulation part,
but we also implemented SIEVE in popular cache libraries. We searched for the most popular cache libraries
in five different languages:
C++, Go, JavaScript, Python, and Rust.
And we found those libraries,
and we replaced their LRU with SIEVE.
And then we used these libraries to test the miss ratio,
to prove that SIEVE actually works.
And in particular, we used CacheLib,
which is written in C++ and developed by Meta.
This is a general-purpose caching engine,
so we used it, replaced the LRU,
and tested the throughput on this platform.
In terms of the throughput results,
we actually achieved much better throughput:
up to six times improvement
compared to their pure LRU implementation.
And it's also worth mentioning
that Meta has a lot of engineers
who have spent a lot of energy
optimizing their pure LRU
because of the locking issue.
And they got some improvement,
but if they changed to SIEVE,
they could easily achieve better throughput.
That's awesome. Cool. Yeah, on that point really quickly,
about you having implemented it in a lot of open-source libraries in various
different languages: is that now available for people to use? Can I go and download the
latest caching library for Rust, or whatever it may be, and use that in my projects?
Yeah, of course.
We have a repo called Cachemon
that holds all the repos we did for this project.
Besides that, thanks to the open-source community,
people are just so good there.
They know these algorithms, and they implemented them.
So right now,
I haven't counted recently,
but before I presented
this paper at NSDI,
I knew there were over 20 different
implementations on GitHub,
in, I think, over 10 different
programming languages.
So if people want to try out the SIEVE algorithm,
it's easy for them to find a library and plug it into their system.
We have a website called sievecache.com,
and we list most of the libraries we found there.
But if people have more findings,
please just submit a PR and we can merge it into the website.
Awesome stuff.
Yeah, we'll put a link to that
in the show notes as well.
So if there's a missing language out there,
you've been told,
go and implement it in your favorite language
and submit a PR, right?
So yeah, let's get full coverage.
Who's going to write in COBOL?
Cool.
So I have a question
that I normally ask around impact: how can a software
developer or engineer leverage the findings of the work that you've done,
and what impact do you think SIEVE can have, short term and long term? But I think we've
just kind of spoken about that, right? That's some serious short-term impact. So I guess
the next question is: what impact do
you think it can have long term? Do you see it being used widely in production
applications? Maybe it is already, I'm not sure. Do you have much information on that?
In production, we actually don't know. We know Meta is very interested in this work, because we already
implemented it in their CacheLib. For other companies, I'm not sure,
because this is so simple.
And I think if people just want to try it,
they won't let us know.
They don't need our help.
They just implement it.
So I actually think it would be better
if we heard from people who are using this algorithm.
And I know one of the SIEVE users shared some news on Twitter.
They reported that, after switching their cache algorithm to SIEVE,
they observed a 30 times cache usage reduction and 16% CPU savings.
So this is good news to us.
And that's just one user report.
We'd love for people to share this stuff,
and we are expecting more of these reports, whether good or not.
We just want to see how SIEVE is behaving in the real world.
It'd be good to know that.
And also, while we were discussing SIEVE
with some researchers and engineers,
we learned there are some limitations.
For example, because of the SIEVE hand,
we cannot implement it on a ring buffer: the hand evicts from the middle of the queue while new objects are inserted at the head, which are different positions.
So for companies whose cache is implemented on a ring buffer, SIEVE might not be a good choice.
So this is just one limitation we are aware of.
But in real systems, there might be a lot of other issues.
So we would love to hear about any problems when people are trying to adopt SIEVE, so we can see
what we can do, what we can help with, and whether we can make this algorithm even better.
Great.
So yeah, let's talk about future directions. You mentioned earlier on in the podcast about
exploring adversarial workloads. What else is on the agenda for taking SIEVE forward?
Yeah, yeah.
Basically, as I mentioned,
one direction is that we're interested in classifying the adversarial workloads for SIEVE.
And the other direction is a proof
of why SIEVE is effective.
It would be good to understand the theory
behind this very effective algorithm.
So we are thinking about, I'm not sure if you're familiar with it, Belady's algorithm.
It is an offline cache eviction algorithm.
Basically, you know the future; you know what's going to be accessed in the future.
So you know the best miss ratio you can get for a cache.
So that's an offline algorithm.
But we were thinking: no one is doing offline,
right? In real systems, we are all doing online. So is it possible to find the upper bound
for online algorithms? That's what we're thinking: is SIEVE achieving the upper bound
for online algorithms? So this is one direction we are thinking about. And the third direction is that we'd love to collaborate
with people in industry,
to understand all the limitations in the real world,
to learn more from those experiences,
and to help us improve this algorithm further.
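For readers who have not met Belady's algorithm, here is a tiny illustrative Python sketch of the classic offline MIN policy mentioned above, assuming uniform object sizes (not code from the paper): given the full future request sequence, it evicts the cached object whose next access is farthest away.

```python
def belady_min_miss_ratio(requests, capacity):
    """Offline optimal (Belady's MIN) for equal-size objects:
    on a miss with a full cache, evict the object reused farthest ahead."""
    cache, misses = set(), 0
    for i, key in enumerate(requests):
        if key in cache:
            continue                      # hit: nothing to do offline
        misses += 1
        if len(cache) >= capacity:
            def next_use(k):
                # Index of the next request for k; infinity if never used again.
                for j in range(i + 1, len(requests)):
                    if requests[j] == k:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(key)
    return misses / len(requests)

# Example: the lowest miss ratio any algorithm could achieve on this trace.
print(belady_min_miss_ratio(["a", "b", "c", "a", "b", "d", "a"], capacity=2))
```

The quadratic rescan is fine for a toy trace; trace simulators typically precompute each request's next-access time so the whole pass runs in near-linear time.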
Awesome.
So you're going back to your theoretical roots there;
you've gone full circle.
You've come all the way around the ring buffer.
Yeah, yeah, yeah. When I said that, I was thinking about the same thing.
Awesome stuff. Cool. Yeah, so my next question for you, and maybe I can preempt this a little bit, given how this work was found, how you came across discovering it, but I want to ask you: what is the most surprising thing you've learned while working on SIEVE and on caching?
A very good question.
I also have a very simple answer to this:
it's starting with a simple approach.
I was surprised to see that SIEVE is so good.
We actually verified very carefully
to make sure the results are reliable, right?
Because it's just so simple.
How could it compete with a lot of fancy algorithms,
with lots of queues and complicated designs?
How could this happen?
So yeah, that's why we very carefully checked the results.
But all jokes aside, I think it's about starting with a simple approach: finding the essence of your research problem, asking what am I doing and what is all this analysis for,
trying to extract the essence of that, and starting with a very simple
approach. It might surprise you.
And I think sometimes it's very important to understand the
logic behind it,
rather than putting all the data together,
putting it into a black box,
and not knowing what's happening there.
With SIEVE, even though we found it accidentally,
we have some theory, some principles,
that support this algorithm.
That's something I think is important
for us systems researchers.
Yeah, yeah, I completely agree with you;
that's a great answer to that question. Cool. My next question for you, Yazhuo, is
about being creative, the creativity in the creative process,
and how you personally go about generating ideas, and then, once you've
generated them, selecting which ones to work on, right? So do you have a systematic approach for that?
Yeah, that's a very good question.
I'm actually not sure I can give constructive suggestions here.
I believe we all have different approaches,
and what I'll say here is more of a retrospective,
a summary of my past experience.
I know in the past I made a lot of mistakes that I wish I hadn't,
so people hearing this talk can try to avoid them. You mentioned the creative process;
I actually want to start with project selection, and then I can talk about idea
generation. To me, that feels like the right order. First of all, I would choose the problems that really make me excited.
Sometimes I feel it's really just instinct.
You know, you will know it immediately if you are truly interested in something.
At the beginning of our chat, I mentioned I used to work on theoretical analysis of cache performance.
To be a little bit more specific, it was the mathematical study of cache performance:
I did some miss ratio curve calculation.
I spent tons of time working on formulas and thinking about probability theory.
And now, when I look back at that time,
I feel like I was forcing myself to do it.
I wasn't fulfilled.
But I was very lucky to intern at Twitter,
at a crossroads during my PhD,
when Yao, my manager,
introduced multiple potential projects
I could do over that summer.
I forget the rest of the projects she proposed,
but I remember two.
One was about caching:
she mentioned that I could do
some data structure design in Redis.
That was one project.
And the other one was a service migration problem,
if Twitter were to build another data center.
Can you guess which one I chose?
I think I've got it, maybe, yeah.
Yeah, yeah.
Actually, when I heard those two options (the other options I forget),
I knew immediately that I wanted to do the service migration,
even though I knew nothing about microservices at that time,
not to mention that I would need to use distributed tracing data.
I had no clue what that was.
But that problem fascinated me.
On the contrary, the first problem would probably have been much easier for me,
and it would have been a comfort zone.
But I could feel which one was more intriguing to me.
So I love working on more practical problems, the specific problems you can explore. And I just learned a lot of stuff at that time.
I can remember all the things I was trying to learn; it's still very fresh. Of course, it's just a personal preference, but I just felt excited. It wasn't something I didn't want to do, so I could spend a lot of time working on it without feeling tired or exhausted.
Also, I understand that PhD students have limited time to explore their interests, so save your time: I found my research interests basically at the end of my second year,
which means I was restarting a whole new research direction in the middle of my PhD.
That was very stressful indeed.
So, for the first two years, one suggestion is to expose yourself to many areas.
That will be beneficial for finding your own true interests.
So I think it's very important to keep the passion and really love what you do. That was about project selection: just finding out what I really like. Next, I'll talk about idea generation, which you mentioned.
Now that I think back, I feel like reading papers, of course, and reading technical blogs and the latest tech news, knowing what has been done and what is happening, is important. And also, talk to people. When you absorb a lot of information, you're kind of processing it in the background.
Someday you will have the moment where you feel there are a lot of things you could try.
I know I had that moment, but much later on.
For example, in the intern project I did at Twitter, I developed a tool that can estimate the end-to-end latency if some services are delayed.
The tool mainly captures the causal dependencies between services.
Even though it was designed to estimate latency,
I felt like it had more potential, but I didn't know what that was.
Then, last year, Amazon had a blog post talking about how they found that a monolithic design actually had much better performance than a microservices design. And I knew immediately that this tool could help me explore that idea: I could extend it to evaluate whether we really need a microservices architecture.
So this is something we did before, kept in mind, and understood well. And when you expose yourself to new problems and keep track of the latest news, especially from industry, new ideas pop up. So I think those could be very interesting experiments, and it's not boring at all, right?
at all right yeah yeah no i love that i love the answer to that question i like what you're saying
about having that breadth not depth exposure at the start get exposed to kind of a lot of different
ideas so you can kind of find the one that really resonates with you which then makes
working on it easier right like if you if you love what you do you never work a day in your
life right as the same so it goes and also sort of that sort of like a three-pronged approach
or maybe there was a fourth as well like i've been been reading stuff, reading tech news, keeping up to date,
but more importantly, talking to people
and getting exposed and kind of venerating that.
And setting a background thread off on something
and you never know when at some point in the future,
you're going to put two and two together and go,
huh, yeah, that could work.
So yeah, no, I think that's a fantastic answer
to that question, Yucho.
Awesome.
So, Yazhuo, could you tell us a little bit more about the paper?
We spoke about it at the top of the show, a little bit about how this kind of idea came around,
but can you tell us more about the origin story behind the paper?
Yeah, sure. I really love talking about this. I can share the backstory of SIEVE.
And here I would say SIEVE is a culmination of years of dedicated effort.
This was teamwork, and I want to give a lot of credit to my collaborators.
They are all cache experts who have been working on caching for over 10 years.
Before SIEVE was invented, Juncheng, the co-first author,
who isn't here, had been maintaining a cache
simulator since 2016. It's now called libCacheSim, implemented in C++, and it has
about 20 different eviction algorithm implementations. Juncheng and I both interned at Twitter
before. Our manager, Yao, is also an experienced engineer.
She has a very deep understanding
of how a cache should be designed
to meet production requirements.
And also my advisor, Ymir,
and Juncheng's advisor, Rashmi,
are experts in theoretical study.
So our whole team was on this path of understanding
what makes a cache eviction algorithm efficient.
And SIEVE was discovered
while we were revisiting old algorithms.
In the past, we had collected a lot of cache traces,
so it was just natural for us to analyze and evaluate things.
Now we present this very simple design,
but we actually tried multiple different versions.
For example, we tried different hand movement strategies:
to the head, to the middle of the queue, and so on.
And with a lot of effort spent on understanding the SIEVE queue,
we finally present SIEVE as what it is now.
So it was a long journey. Even though it's very simple,
it actually carries a lot of people's effort from the past decades, even though I wasn't even there
back then.
Yeah, it's crazy. You just see the paper appearing in the proceedings,
and then you don't always see the years of research and the culmination of effort that's
gone into it, and the collaboration behind it. But yeah, that's awesome. I mean, teamwork makes the dream work, right?
It's a bit of a cliché saying, but yeah, it's awesome to see that it was the product of
collaboration and years of hard work. It's definitely been worth it in the end, you know; it's
a really awesome contribution to the space. Cool. Well, we've come to the time for the last word now.
So, what's the one takeaway you want the listener to get from this podcast episode today?
Yeah, one takeaway about this work.
I would say SIEVE is the simplest eviction algorithm that achieves both lazy promotion and quick demotion.
So you can use it as a plug-in cache
for all your web applications.
It's simple, efficient, and scalable.
Yeah, you're welcome to try it.
Yeah, you should go and try it, listener.
You've been told.
I will put links to all that in the show notes.
Thank you so much for speaking to me today, Yazhuo.
It's been a fascinating chat; I've learned a lot,
and I'm sure the listener will have as well.
Best of luck with your postdoc at ETH as well. I'm sure you're going to keep
producing fantastic work, and I look forward to keeping my eye on it and seeing how things
progress for you over the course of your career. So, awesome stuff. Thank you very much. Whereabouts can we find
you on social media, on X, or Twitter, or whatever it's called these days, and LinkedIn, where the
listener can connect with you if they want?
Thank you, Jack.
And thanks for the invitation here.
I'm very happy to talk with you
and share my story and share the work.
And people can always find me on X and LinkedIn
and they can just email me.
They will find my email address on my personal website.
So I'm just easy to approach.
Fantastic. Thank you so much, Yazhuo.
And yeah, I guess we'll see you all next time
for some more awesome computer science research. Bye.