Disseminate: The Computer Science Research Podcast - Eleni Zapridou | Oligolithic Cross-task Optimizations across Isolated Workloads | #51
Episode Date: April 29, 2024

In this episode, we talk to Eleni Zapridou and delve into the challenges of data processing within enterprises, where multiple applications operate concurrently on shared resources. Traditional resource boundaries between applications often lead to increased costs and resource consumption. However, as Eleni explains, the principle of functional isolation offers a solution by combining cross-task optimizations with performance isolation. We explore GroupShare, an innovative strategy that reduces CPU consumption and query latency, transforming data processing efficiency. Join us as we discuss the implications of functional isolation with Eleni and its potential to revolutionize enterprise data processing.

Links:
CIDR'24 Paper
Eleni's Twitter
Eleni's LinkedIn
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. As usual, Jack here.
Today, we are going to be talking to Eleni Zapridou, who is a PhD student at EPFL.
And we're going to be talking about, let's see if I can pronounce this, this title is going to test my pronunciation of a few words.
So, oligolithic cross-task optimizations across isolated workloads. So, let's get started. So welcome, Eleni.
Hello, thanks for having me. And very good pronunciation, by the way.
Thank you. Cool. So yeah, I'm really excited to chat to you today. So the kind of the usual thing
we do when we start off is I kind of get you to tell me more about your story and how you kind of
became interested in databases, database management research,
and this really cool field that we all love.
So, sure. Towards the end of my bachelor's, actually, I came to EPFL to do an internship with Professor Ailamaki.
And at the time, I was actually working on data cleaning.
And this is really how I got exposed to research in general and database research more specifically.
And I quite liked it. So I decided
to come back for a PhD and work in the same lab. And a fun fact here is that in between actually
my internship and starting my PhD, I worked on formal verification for the use case of autonomous
driving. So, completely different topic. But in the end, I was more excited about working on systems, so here I am.
Fantastic, yeah. A nice little detour into formal verification and autonomous vehicles, that's really cool. So I'm guessing, what sort of things were you trying to verify there, other than sort of "will this car not crash into other cars", or is that the general gist of it?
It was kind of like that. So
basically the goal was you had some simulation with autonomous vehicles and then you had to monitor them and basically have a way to actually tell whether some properties hold while the cars are driving around.
OK, cool. So I guess, will we ever solve that problem? Because it feels like a pretty impossible problem to solve, right?
Yes, yes, exactly. It's hard to define what sort of properties you would want the cars to achieve. So, you know, it was fine as well, but I think a bit too theoretical for my taste.
Yeah, anyway, cool. So let's get talking about oligolithic cross-task optimizations across isolated workloads. And so let's start off by
setting some context for this chat today then. So yeah, give us some background on the paper.
And I guess sort of the main question here is kind of how organizations today use data sharing
platforms and what are the sort of problems with the approach they use today? Yes. So first,
I would like to start by
explaining what we actually mean when we say data sharing platforms in the paper. So a data sharing
platform is basically common infrastructure that gives to multiple systems, or you could even think
like multiple jobs of the same system, access to the same shared data. And so companies use these platforms in order to allow their teams to deploy many, many
applications that operate concurrently on the same data and they use the same infrastructure.
So the ultimate goal here is basically for companies to be able to maximize the value
they extract from data by being able to serve multiple business functions.
And you can think that every business function could itself be composed of many subtasks.
So this could be from a company like, let's say, Uber that runs applications like fraud
detection and monitoring to a much smaller company like an e-commerce website,
let's say, that has a dashboard and maybe it also has some recommendation system.
So now where does resource isolation come into play in this setup actually?
So all these different applications use the same data and infrastructure, but they can
actually have very different performance requirements. So the problem then becomes that the performance of one task can affect the rest of the workload.
And this problem is known as performance interference.
And it is well known, not only in database research, but in general for concurrent programming since the 1960s. And the solution that has been put into place for
databases is resource isolation. And so this solution by now is very well rooted into database
systems, I would say. So with resource isolation, essentially the system assigns a specific subset
of resources to every task based on the requirement that this task has.
And then the goal is to isolate basically tasks by enforcing these resource boundaries
between them so that one task cannot affect the performance of the other.
So you can think of resource negotiators and techniques like fair scheduling and stride scheduling as ways to enforce resource isolation.
So now, since with resource isolation, we're provisioning separate resources for every task,
as we increase the number of queries in our workload, we must also increase the number of resources that are available in the system. And in the past, this was quite okay, since with a
small cluster, we could execute our workloads.
But today, because we need to run more and more queries in the orders of, let's say, hundreds or even thousands, this becomes more and more expensive.
Because we literally need to grow our infrastructure proportionally to concurrency. And in our community, some techniques have been developed that we call cross-task optimization techniques, like, for example, data and work sharing, that have the goal of maximizing resource efficiency.
Basically, you find some optimization opportunities across tasks, and then you use those to run the same amount of tasks with fewer resources. But the problem here is that these cross-task optimization techniques can actually penalize some individual queries. So then this makes them inapplicable in
cases where it is critical to meet the performance needs of individual queries. Basically, to sum up,
resource isolation is very expensive because of the high concurrency of today's workloads.
And even further, it does not allow us to use cross-task optimizations that would actually make processing more resource efficient.
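To make work sharing concrete, here is a minimal Python sketch, with hypothetical toy data, of two analytical queries that need the same join. With work sharing, the join runs once and both queries consume its output; without it, each query would pay for the join separately.

```python
# A hedged illustration of work sharing, not GroupShare's implementation.
# Both queries need the same join of `orders` and `customers`.

def join(orders, customers):
    # Hash join on customer id: build on customers, probe with orders.
    by_id = {c["id"]: c for c in customers}
    return [{**o, **by_id[o["cust_id"]]} for o in orders if o["cust_id"] in by_id]

def query_a(joined):
    # Total revenue per region: one consumer of the shared join.
    totals = {}
    for row in joined:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals

def query_b(joined):
    # Count of large orders: a second consumer of the same join.
    return sum(1 for row in joined if row["amount"] > 100)

orders = [{"cust_id": 1, "amount": 120}, {"cust_id": 2, "amount": 80}]
customers = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]

shared = join(orders, customers)  # computed once, not once per query
print(query_a(shared), query_b(shared))
```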
Awesome stuff.
Yes, you've kind of got this problem of the noisy neighbor, right?
Sort of interrupting my work. And it's funny, we see this, I guess, in cloud and multi-tenant environments with sort of various different things interfering.
Like one VM OOMs and that causes, sorry, one process OOMs and that causes something else to OOM, and they're totally separate applications.
And yes, you said this was kind of becoming more and more of a problem with scale as well.
And, like, with the number of concurrent queries or workloads that are running, kind of to give us sort of a, I don't know, a yardstick to measure by, what sort of scale are we talking here? Like, what sort of rough number is it when this becomes a really big problem?
Yeah, so actually, I think we're talking in the scale of hundreds to thousands of queries we're trying to run on the database system.
Cool, yeah, that's pretty big. And this sort of thing, adding more resources so you can keep doing the resource isolation approach, might work well if you're Google, right? But if you're a smaller company, then it's not going to be feasible to keep doing that. We need another approach, right? So that kind of is a nice segue into your work. So yeah, tell us about what the goal of your paper was, and then there's this really key concept in there called functional isolation. So yeah, tell us about that.
Right, yes. So as I said, we have these two options, right? In one case, we can isolate the resources for each query, but as I said before, this becomes more and more expensive.
And in the other case, we can cross-optimize queries and maximize this way resource efficiency.
But this way, we are risking penalizing some queries.
Basically, with cross-task optimization, users really have no guarantees about the performance of their individual queries.
And this is why in many cases,
users would decide not to actually employ cross-optimization. So what we argue in the paper
is that we don't really have to choose between resource efficiency and performance isolation,
but rather we can actually do both. So users do not care per se about resource isolation, but rather resource isolation is just a way to ensure that all the queries will get the performance they need.
And so, what we call functional isolation is basically achieving the same performance metric as you would achieve if the queries were really isolated. But you actually might choose to use some of this cross-task optimization
so that you are more resource efficient.
Basically, an execution schedule would be functionally isolated
when it provides the same or better performance
when we compare it to isolated execution.
So this means that what systems should be doing
instead of just using separate resources for every task,
which does not scale and becomes very expensive,
is to leverage cross-task optimizations
only when these do not interfere with performance isolation.
So essentially the goal here is to selectively choose which cross-task optimizations you can employ so that no query is penalized.
Awesome. That sounds great. Best of both worlds, right? We're getting the performance and we're more resource efficient.
So let's talk about how you went about achieving this and determining which types of cross-task optimizations could be performed.
And you did this in something called GroupShare, right?
So yeah, tell us more about GroupShare, what it is and how it works.
Exactly, yes.
So with GroupShare, we wanted to test this vision
for the specific case of work sharing across analytical queries.
And so the goal for GroupShare specifically
is to achieve performance isolation
and at the same time exploit opportunistically
some of the work sharing opportunities in the workload so that we reduce the total processing
time. So the key idea is that GroupShare splits queries into sharing groups. So a sharing group is a set of queries that pool their resources together and share some common computation. And a critical condition for a group of queries to be a sharing group is that all the participating queries should achieve the same or lower latency when we compare it to isolated execution.
So we basically have this twofold objective.
On the one hand, we want at least the same latency as isolated execution.
And on the other hand, we want to minimize total processing time.
So putting it more simply, you can imagine sort of that all the queries are being egocentric.
So in order for them to participate in a sharing group and share execution with other queries, they need to benefit from it.
Otherwise, they want to be executed on their own.
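As a rough illustration of that condition (not GroupShare's actual code), a grouping qualifies only if every query's latency inside the group is at most its isolated latency. The latency numbers below are made up:

```python
# A minimal sketch of the sharing-group condition: no query may run
# slower in the group than it would in isolation. Numbers are hypothetical.

def is_sharing_group(group_latency, isolated_latency):
    """Both arguments map query id -> latency in seconds."""
    return all(group_latency[q] <= isolated_latency[q] for q in isolated_latency)

isolated = {"q1": 2.0, "q2": 5.0, "q3": 3.0}
grouped = {"q1": 1.6, "q2": 4.1, "q3": 2.8}  # shared work benefits every query
print(is_sharing_group(grouped, isolated))   # True: no query is penalized
```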
So we had two main challenges here when we're trying to design an algorithm to solve this problem. So the first challenge was a scalability challenge. Basically, the space of all the
possible groupings is very big. So exhaustively just searching for the best grouping would be very, very expensive.
And so we solve this scalability challenge by designing GroupShare to be opportunistic.
Basically, we avoid searching for the optimal grouping. And instead, the algorithm focuses on finding just one grouping, one partition that consists of sharing groups, that doesn't penalize any query.
So this was the first challenge. And then another challenge we had to solve is related to accuracy.
So how do we actually evaluate whether a query would be penalized if it is part of a specific
group a priori? So this would be quite hard because it would necessitate relying on cost estimation and
intermediate cardinalities and we know from database research that these are known to be
inaccurate. So what we chose to do here is instead use runtime information. So basically, instead of trying to estimate a priori what the performance of a query in a particular group would be, we just measure at runtime the progress rate of the individual queries and also of the shared plans, and use these measurements in order to make decisions about how to group our queries.
So these were the two main challenges and I'll briefly say how the algorithm works.
So overall, group share follows an iterative process.
So initially, the algorithm will start by forming a very big group
that contains all of the queries.
And at any point in time, the algorithm will keep the groups sorted
based on their processing rate.
So your first group will be the slowest one
and the last group will be the one
making the fastest progress.
And then at every step of execution,
the algorithm will identify the query
that makes the slowest progress
and its corresponding group.
And it will check whether this query is being penalized.
What does this mean?
It means basically we want to check whether the query would run faster if it was being executed in isolation. And if this is the case, we move this query to the next group.
Basically, at the high level, the idea here is, if we find a query that is progressing too slowly in its current group, we try the next group, which is progressing faster, and see whether this would work instead. So from time to time, we basically move queries around until we converge to some grouping that doesn't penalize any of the participating queries.
And then from then onwards, we just continue with this grouping.
Basically, I would say here that you can think of GroupShare as a planner that decides what the groups are. And then you just use an existing work sharing algorithm to figure out the execution schedule inside every group. And then you need a scheduler that will ensure that the CPU time is split fairly across these groups.
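Putting the iterative process together, here is a hedged Python sketch of the regrouping loop as described. It is a simplification: the real GroupShare interleaves regrouping with actual execution inside an engine, and the progress-rate callback here is a hypothetical stand-in for runtime measurements.

```python
# A sketch of GroupShare-style regrouping, assuming stable measurements and
# that a query alone in its own group matches its isolated progress rate.

def regroup(queries, isolated_rate, group_rate):
    """isolated_rate[q]: measured rate of q running alone.
    group_rate(group, q): measured rate of q inside `group` (a callback)."""
    groups = [list(queries)]  # start with one big group of all queries
    changed = True
    while changed:
        changed = False
        # Keep groups ordered from slowest to fastest progress.
        groups.sort(key=lambda g: min(group_rate(g, q) for q in g))
        for i, g in enumerate(groups):
            q = min(g, key=lambda x: group_rate(g, x))  # slowest query
            if len(g) > 1 and group_rate(g, q) < isolated_rate[q]:
                g.remove(q)  # q is penalized: try the next, faster group
                if i + 1 < len(groups):
                    groups[i + 1].append(q)
                else:
                    groups.append([q])  # or let it run on its own
                changed = True
                break
    return groups

# Contrived toy model: q1 and q2 share a join, so together they speed up
# 1.5x; any other co-resident query just splits the CPU with them.
iso = {"q1": 1.0, "q2": 1.0, "q3": 2.0}
def rate(group, q):
    if q in ("q1", "q2") and "q1" in group and "q2" in group:
        return iso[q] * 1.5 / (len(group) - 1)  # the pooled pair acts as one
    return iso[q] / len(group)

print(regroup(["q1", "q2", "q3"], iso, rate))  # [['q1', 'q2'], ['q3']]
```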
Awesome, because that was going to be one of my questions, was like once you've kind of got these groups
and they're all sort of fixed by their processing time, essentially,
what's the next step, in terms of then being like, okay, how do I actually go and take these buckets of jobs I've got and then actually execute them on the CPU?
But you can use basically any work sharing algorithm
you want after that point.
Is that kind of the way it would work?
It's like plug and play, essentially.
Exactly.
So GroupShare would be determining basically the limits inside which the work sharing algorithm can operate, the groups. And then the work sharing algorithm decides for each group how to execute it. And then you pass this to a scheduler. You can use some scheduler that achieves fair scheduling, let's say stride scheduling, so that you split the CPU across all these different groups.
Awesome. Another question as
well, when you were explaining it there, was about when you determine whether a query is kind of the slowest in the group or the fastest in the group, and you want to move it to a different group essentially, and you're trying to work out whether it's been penalized. How does that work?
Yes.
So for this, we actually use the runtime statistics that I mentioned before. So what happens is that, from time to time, the system will actually run the queries on a sample of the data in isolation,
just in order to measure, for each query, how fast it would progress if it was running on its own.
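As a sketch of that calibration step, with the hypothetical run_on_sample standing in for the engine's sampled execution, the idea is just to time the query alone on a small sample of the data and derive a progress rate:

```python
import time

def run_on_sample(query, sample):
    # Hypothetical stand-in for the engine: evaluate the query's predicate
    # on every row; the tuples processed equal the sample size.
    for row in sample:
        query(row)
    return len(sample)

def isolated_rate_estimate(query, data, sample_fraction=0.01):
    # Run the query alone on a small sample and measure its progress rate.
    sample = data[: max(1, int(len(data) * sample_fraction))]
    start = time.perf_counter()
    processed = run_on_sample(query, sample)
    elapsed = time.perf_counter() - start
    return processed / elapsed  # e.g. tuples per second

data = list(range(100_000))
print(isolated_rate_estimate(lambda row: row % 2 == 0, data))
```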
Gotcha, right.
Okay, so we're periodically having this sample of,
okay, this is actually my absolute runtime.
This is how long it actually takes
if I'm the only person alone in the system
and I've got all the resources I want.
So we kind of have this ground truth, I guess,
to sort of determine.
And then you can compare it against,
okay, nice, nice, awesome.
So there's a few things.
Maybe we'll talk about this in the evaluation.
We'll talk about the evaluation and the results.
But I just wanted to ask upfront,
how does this scheme deal with sort of the dynamism
in the workload and changing workloads over time?
Does that disrupt the convergence and make it slower?
Right, yes.
So this is a very good question.
And actually, for this paper, just to simplify the problem as a first step,
we assume that we have a set of queries and these are fixed.
And we receive all the queries in a batch.
And also, we assume that the data distribution across the entire table we're working on is the same. But the problem would become a lot harder, and also more interesting, if you relaxed these assumptions.
Yeah, for sure. Because I was thinking that, with the example you gave at the top of the show, if you have, I don't know, a dashboard or some sort of limited set of queries, then that assumption probably holds quite true. If you're Uber or whatever,
your application isn't going to develop and change necessarily that fast
for this to be a problem.
But if you are running a generic platform that anyone can submit jobs to,
then obviously you're kind of at the whim of the user
and there can be a lot more heterogeneity in the workload
and it can change a lot faster, I guess.
But okay, yeah, you've got to make some assumptions, right?
To make some progress
and then you can always relax them later on, right?
Yes, I mean, this would still work
if you have ad hoc queries,
so if you don't know the query type,
but you would need to receive all the queries in a batch so that you find basically the work sharing opportunities within this batch. So you cannot assume that you have some sharing groups and then suddenly
a new query appears and then you place it in one of these groups. So you work with this batch of
queries until you finish executing them. Awesome stuff. Okay, cool. So let's talk about the
evaluation then. So how did you go about evaluating this scheme? You mentioned it a second ago about there's a few assumptions you made.
But yeah, tell us more about the evaluation and how that experimental setup looked like.
Yes.
So we compare GroupShare with resource isolation and also with full sharing.
So basically, since here we're talking about CPU resources, resource isolation in this case is achieved via fair scheduling. So our first baseline is executing each query on its own, and we basically split the CPU fairly across all the queries.
We use stride scheduling concretely.
The second baseline is full sharing. Basically, this means that all the queries are being executed with a single shared schedule
and all the common computation among the queries is being shared.
So the objective here is to minimize the overall execution time while achieving the same or lower latency when we compare to fair scheduling.
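Since stride scheduling does the fair splitting, both in the baseline and across GroupShare's groups, here is a minimal textbook-style sketch of it (an illustration under assumptions, not the paper's implementation): each group gets tickets, a stride inversely proportional to those tickets, and the group with the lowest pass value runs the next quantum.

```python
import heapq

def stride_schedule(groups, tickets, quanta):
    # Classic stride scheduling: pass advances by stride = BIG / tickets,
    # and the group with the minimum pass value runs each quantum.
    BIG = 10_000
    heap = [(0, name, BIG // tickets[name]) for name in groups]
    heapq.heapify(heap)
    order = []
    for _ in range(quanta):
        pass_val, name, stride = heapq.heappop(heap)  # lowest pass runs next
        order.append(name)
        heapq.heappush(heap, (pass_val + stride, name, stride))
    return order

# Equal tickets -> the CPU is split evenly across the three groups.
print(stride_schedule(["g1", "g2", "g3"], {"g1": 100, "g2": 100, "g3": 100}, 9))
```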
Awesome.
So what was the data, sorry, the benchmark, and some of the queries? Do you have any sort of, just like kind of a rough idea of what these queries actually were doing?
Yes. So for the data, we actually had synthetic data, so that we can play around and kind of change the parameters and see how GroupShare performs.
And then for the queries, you can think of a query type that has the shared part for us
as a join, and then we have some extra operator after the join that might not be shared.
Okay, cool. So yeah, tell us about the results and how much better was it than the baselines?
Right, so how much better? Overall, we saw up to about 80% less CPU time without penalizing the performance of participating queries with GroupShare. But of course, how much CPU time we can save would depend on the amount of common work between the queries.
And here, an interesting aspect is that the more queries in your workload, the higher the chances are that GroupShare will actually be able to detect and employ some of the sharing opportunities. And then some other observations we had from the results are that
we saw that full sharing penalizes short-running queries while it benefits long-running ones.
So it is kind of unpredictable whether your individual query will get better performance or worse. And it more or less depends on what other queries are also running.
And then another observation was that GroupShare operates within smaller groups.
So it forms these smaller groups inside which it can apply work sharing
without penalizing the queries.
And interestingly, GroupShare doesn't only benefit the shared queries, but also the non-shared ones. And for the shared queries, you can imagine that they benefit because basically they participate in a sharing group, so they pool their resources together, and also they eliminate some duplicate work. So you have more resources and you do less work. So great, this is where the benefit comes from.
Now for the non-shared queries,
why they benefit is actually because
the shared queries run faster.
So then the scheduler can actually allocate
more CPU time to these non-shared queries.
So this is an indirect benefit for the non-shared queries
that they can also finish faster after all.
Nice, yeah, they've got more, there's more capacity there for them to kind of do their work, right?
So there's more stuff around. So they go, oh, great, I'm going to use all this.
And yeah, faster I go. That's awesome. Yeah.
And that's a nice, interesting observation about the more variety, I guess, there is in the workload,
the more opportunity there is to sort of employ these these optimizations and
stuff. The thing that I was interested in is, I mean, this is obviously very sensitive to the workload, but how many groups, once it's sort of converged, how many different groups are we kind of talking about here? Is it like 10 groups, five groups, 100 groups? Obviously, I know it's probably a function of the number of tasks that have been run, but yeah.
Yeah, right. This is really hard to tell, it's very workload dependent. In our case, we didn't have many, many groups, just because of the queries we tried. But basically, in the best case scenario, you would have only a single group, if really all of your queries are doing some common join, let's say, and then they can all benefit from being in this shared group, and perhaps resource pooling benefits all the queries, so they can all be in the same group. And then in the worst case, really, there is no work sharing opportunity you can use. So you can literally have as many groups as the number of queries, and then you're basically doing what fair scheduling would be doing.
Yeah. And how long was this period of convergence
to the number of groups that were optimal?
So for this, actually, we don't have an experiment in the paper, but more or less I saw that queries run for a few seconds and GroupShare would always converge before that. So you would have enough time to converge and see the benefit. But I don't have an exact number on how many milliseconds convergence takes.
I mean, it's in the magnitude of milliseconds, right?
Which is negligible when you're talking about, I guess,
these types of analytical workloads, right?
Or whatever's working on top of them. Yeah, milliseconds is fine, we can live with milliseconds. It's not hours or days, right? It's pretty quick.
So yeah, cool. So you've kind of proposed this initial work, and I guess I want to know kind of where you go next with this.
And we spoke about some of the assumptions you've made.
Is it to relax those assumptions and explore that space?
Yeah, what's next?
So yes, one thing would be to relax the assumptions.
And really, this could mean also play with different, like experiment with different
optimizations and different resources
and performance metrics.
For example, now in this work, we're focused only on work sharing and CPU resources and
then latency.
But you could imagine other cases, like, for example, let's say you have a use case where
you have a couple of applications that are running and they're using some indexes and
some other data structures. So with functional isolation, what you could do is share these caches and these data
structures so that you save memory, while at the same time, you can guarantee that all the
applications meet their latency deadline. So one thing would be to basically really work with different types
of optimizations and resources.
And also one could take this problem into different types of workloads as well.
For example, now we're focused only on analytical queries,
but the idea is more general.
So it could be applied on streaming queries as well, perhaps even ML workloads.
And the problem would become,
of course, even more challenging if you think about mixed workloads.
Yeah, so that was going to be my sort of next, we've preempted my next question now. I was going to put my reviewer number two hat on and be like, what are the limitations with the work? And this
is kind of all analytical queries. What about if we want to start doing different things? We want
to start writing stuff as well? What happens there? But yeah, I guess you've kind of answered that there, that's going to be something you look to tackle in future work. But I guess a similar sort of question based off that, though, is what do you think are the current limitations with this approach, and maybe with this approach in general?
Right. So one thing is, as I said, that we only consider CPU resources, and only work sharing as a type
of optimization.
And also, GroupShare requires the underlying system to be able to adaptively change the query plans. And this is something that there is a lot of research on. So there are ways to do it, to adaptively change the query plans. But on the other hand, it's not something that every single engine supports right now.
Okay.
But that feels more like an engineering challenge to overcome rather than like, that's doable for sure.
Yeah.
Awesome stuff.
Cool.
So, yeah, my next question is about the impact you think this work can have, or maybe already has had. So yeah, I guess, what impact do you think it can have longer term, and has there been any impact in the short term?
Right. So really, how I think about it is that, without the idea of functional isolation, although cross-task optimizations sound tempting to use,
developers would actually in many cases not use them because of practicality reasons and also because of the unpredictability they bring.
Basically, you don't know what the effects will be on individual tasks.
So our goal with this work was really to give a framework to developers that allows them
to employ different cross-task optimizations in order to use resources more efficiently. Basically
to be able to use all of this research that has been happening for cross-task optimizations
without sacrificing performance isolation.
Nice, yeah, that's awesome. I can definitely see it having a big impact on making it easy for people to access the latest sort of techniques and things. It can definitely be a massive win, so I look forward to seeing how it develops over the coming years, for sure. So yeah, before we started recording, we kind of spoke about this being a slight detour from your normal research, I guess. So has there been any surprises kind of dipping your toes into a slightly different area of data management? And yeah, what's the most interesting thing you've learned whilst working on this project?
Right. So I think one interesting thing was that this is a problem where the search space is really huge.
And so it's really a good example where you should not try to find the optimal solution,
but rather just try to find a good enough solution, but do it as fast as you can.
So converge to the groups as fast as you can. And this is something that GroupShare achieves with this elegant iterative process that it uses to deterministically, basically, converge to the sharing groups.
So this was one interesting lesson I would say.
And then another interesting thing is the importance of using runtime information rather
than just trying to predict things a priori and like have these estimates that can be
quite inaccurate.
Yeah, definitely. I can see, with the runtime sort of stuff, if you can collect that information efficiently, how you can use that to optimize various things. I guess, is there a possible space as well for combining some of the upfront information with the runtime stuff
as well, and kind of having almost the best of both worlds there as well?
For sure, yes. So definitely, I think there is room to think of some sort of heuristic that would do some coarse grouping at the beginning, and then GroupShare, with the runtime information, would only do some small tweaks to make sure that the final grouping actually consists of what we call sharing groups. So this would be definitely something interesting.
It could speed up the convergence.
Yeah, nice.
Cool.
Yeah, so I teased a second ago there that some of your other research is in a different area.
So now would be a good time for you to tell our listeners all about the work you've done on stream processing.
So yeah, the floor is yours, Eleni.
Tell us all about it.
Yes.
So as we said, my main line of research is on stream processing.
And the overarching goal of my PhD is to enable scalable and efficient stream processing.
So previously, I was working on how to partition streaming data to the concurrent workers of a streaming engine
in order basically to eliminate execution stragglers and essentially just maximize performance.
And at the moment, we're actually working on applying ideas from the CIDR paper
we have been talking about today to data streams.
So the problem there becomes a lot more challenging because of the volatility
of streams, but also due to the fact that streaming queries produce results in a windowed fashion.
So you need to meet the requirements on every single window.
So hopefully you will soon be able to read more about this problem
and we will share more stuff in a paper.
Fantastic.
Yeah, we look forward for that to come out.
You can come back on the podcast as well and tell us all about that
once that's been published.
So, yeah.
That would be the best, yes.
Awesome stuff.
So, yeah, kind of going back to the CIDR paper we've been talking about, how did this sort of come about then? What was the origin story here? Like you say, it was different to your normal line of work. So how did this idea sort of synthesize? I know it's something you did with one of your friends, and you kind of thought, hey, yeah, let's do this. So yeah, what was that kind of like?
Right. So Panos, who is actually a co-author
in the paper, was close to graduation at the time, and his PhD is on work sharing. So with his work, he really made work sharing a lot more practical.
And so we were discussing about it.
And we said that, although not only work sharing but cross-task optimizations in general can make processing more resource efficient, people might still not be using them, just because of the risk of penalizing some individual queries.
And this is how we basically started discussing about this problem
and trying to come up with a solution about it.
Awesome, yeah.
It's always nice when you can kind of work with one of your colleagues as well
and something kind of away from your usual sort of space, right?
And kind of, yeah, that's nice.
You see these ideas synthesized and the serendipity of it almost.
Yeah, definitely.
You also learn a lot of stuff when you work on new things and with new collaborators, you definitely learn a lot.
Well, yeah, for sure. I mean, you never know what techniques in your area can be applied to somebody else's area, and vice versa, right? You can always go, oh yeah, we could use that, this looks very similar to problem X, why don't I use that? And that's actually a nice little lead into my favorite question, about the creative process and how you go about generating ideas and then choosing which ones to pursue. Because, I mean, I have loads of ideas which are probably just stupid and terrible and I need to discard them, but sometimes it's hard to let go of an idea you've had. So yeah, tell us about your
creative process.
Right, I think this is the question that every PhD student has at the beginning of their PhD, like, how do you come up with ideas, how do you choose? So I think I'm actually quite lucky to be part of a very diverse lab, where people work on many different topics and are also very open to discussion.
So I think this is a great environment to generate ideas, but also to test your ideas.
Because as we said before, like you present an idea and then someone from a completely different perspective will criticize it.
And this is very valuable feedback. So this is one way to come up with ideas.
Another way is more like, you know,
one project sort of leads to the next.
For example, here with this paper, we kind of limited it a bit in scope so that we can actually solve the problem, but then you can relax an assumption and see how you could provide a more general solution, or apply the same idea to a different type of workload.
Yeah. So kind of what I'm taking from that is that the environment's
really important. Being surrounded by super smart, clever people in a collaborative environment where there's always open discussion going on is definitely a big plus for innovation, I guess, and for creating ideas. But then also, I like what you're saying about, when you approach a problem, imposing some constraints initially, and solving it for n equals two before you then try to find the general solution, right? Kind of relaxing assumptions iteratively and then exploring the space. Otherwise you'll go crazy trying to find the perfect solution straight away, right?
Yeah, yeah, definitely, definitely.
Plus, another thing is when you work on experimental research,
you work on one topic in one particular project
and you're trying to solve some particular aspect of the system,
but then you observe different bottlenecks as well.
So this is another way to get an idea about the next project.
Yeah, that's it, the projects keep coming along. And then there's kind of a sub-question to what I asked a second ago, about, okay, these projects keep coming up, and this is a cool thing to pursue, and you get some important feedback from your colleagues, like maybe this is not a good idea for X, Y, and Z. But when you see a bottleneck, how do you know not to pursue that specific thing straight away, and be like, okay, maybe I should do this? And how do you avoid context switching constantly?
That's a hard question. I mean, I think, personally for me, when I am working on a particular project and then I find different sorts of bottlenecks, I just, I guess, focus. But sometimes the opposite happens. For example, when I started my PhD,
I was working on data integration for streaming data.
And then there was this problem with load balancing
and really the approach I was designing
was not performing well because of just load balancing.
So this is how I ended up working on partitioning
and I never finished the project on data integration.
So sometimes the opposite happens.
Yeah, yeah, I know. Personally, I'm a sucker for, like, oh, a new shiny thing, and then I get distracted and I kind of forget about the thing I originally did, and I've gone down some massive rabbit hole. But anyway, yeah, sometimes it works out, right? But yeah, great stuff. So yeah, it's time for the last word now. So what's the one thing you want the listener to take away from this podcast today?
So I would say that with this project, we are targeting the problem of having workloads with very high query concurrency. And why is this a problem? Because processing all of these queries really puts a lot of pressure on databases.
And if we continue to use resource isolation, this becomes very expensive.
So our goal with this work was to show that there is actually a practical and robust solution that allows us to use cross-task optimizations while not penalizing individual queries.
And we also highlight the importance of incorporating in a solution both the macro and micro view of performance,
because I think this is really important for someone to actually use your solution.
Basically, you shouldn't care about just the performance achieved for the entire workload,
but also look at the performance of individual queries.
Well, thank you very much for speaking to us today, Eleni.
It's been a fantastic chat.
If the listener is interested, we'll link all of the relevant materials that we spoke
about today in the show notes.
And yeah, we'll see you all next time for some more awesome computer science research.