Disseminate: The Computer Science Research Podcast - Eleni Zapridou | Oligolithic Cross-task Optimizations across Isolated Workloads | #51
Episode Date: April 29, 2024

In this episode, we talk to Eleni Zapridou and delve into the challenges of data processing within enterprises, where multiple applications operate concurrently on shared resources. Traditional resource boundaries between applications often lead to increased costs and resource consumption. However, as Eleni explains, the principle of functional isolation offers a solution by combining cross-task optimizations with performance isolation. We explore GroupShare, an innovative strategy that reduces CPU consumption and query latency, transforming data processing efficiency. Join us as we discuss the implications of functional isolation with Eleni and its potential to revolutionize enterprise data processing.

Links:
CIDR'24 Paper
Eleni's Twitter
Eleni's LinkedIn
Transcript
Hello and welcome to Disseminate the Computer Science Research Podcast. As usual, Jack here.
Today, we are going to be talking to Eleni Zapridou, who is a PhD student at EPFL.
And we're going to be talking about, let's see if I can pronounce this, this title is going to test my pronunciation of a few words.
So, oligolithic cross-task optimizations across isolated workloads. So, let's get started. So welcome, Eleni.
Hello, thanks for having me. And very good pronunciation, by the way.
Thank you. Cool. So yeah, I'm really excited to chat to you today. So the kind of the usual thing
we do when we start off is I kind of get you to tell me more about your story and how you kind of
became interested in databases, database management research,
and this really cool field that we all love.
So, sure. Towards the end of my bachelor's, actually, I came to EPFL to do an internship with Professor Ailamaki.
And at the time, I was actually working on data cleaning.
And this is really how I got exposed to research in general and database research more specifically.
And I quite liked it. So I decided
to come back for a PhD and work in the same lab. And a fun fact here is that in between actually
my internship and starting my PhD, I worked on formal verification for the use case of autonomous
driving. So, completely different topic. But in the end, I was more excited about working on systems, so here I am.
Fantastic, yeah. A nice little detour into formal verification and autonomous vehicles, that's really cool. So I'm guessing, what sort of things were you trying to verify there, other than sort of "will this car not crash into other cars", or is that the general gist of it?
It was kind of like that. So
basically the goal was you had some simulation with autonomous vehicles and then you had to monitor them and basically have a way to actually tell whether some properties hold while the cars are driving around.
OK, cool. So I guess, will we ever solve that problem? Because it feels like a pretty impossible problem to solve, right?
Yes, yes, exactly. It's hard to define what sort of properties you would want the cars to achieve. So, you know, it was fine as well, but I think a bit too theoretical for my taste.
Yeah, anyway, cool. So let's get talking about oligolithic cross-task optimizations across isolated workloads. And so let's start off by
setting some context for this chat today then. So yeah, give us some background on the paper.
And I guess sort of the main question here is kind of how organizations today use data sharing
platforms and what are the sort of problems with the approach they use today? Yes. So first,
I would like to start by
explaining what we actually mean when we say data sharing platforms in the paper. So a data sharing
platform is basically common infrastructure that gives to multiple systems, or you could even think
like multiple jobs of the same system, access to the same shared data. And so companies use these platforms in order to allow their teams to deploy many, many
applications that operate concurrently on the same data and they use the same infrastructure.
So the ultimate goal here is basically for companies to be able to maximize the value
they extract from data by being able to serve multiple business functions.
And you can think that every business function could itself be composed of many subtasks.
So this could be from a company like, let's say, Uber that runs applications like fraud
detection and monitoring to a much smaller company like an e-commerce website,
let's say, that has a dashboard and maybe it also has some recommendation system.
So now where does resource isolation come into play in this setup actually?
So all these different applications use the same data and infrastructure, but they can
actually have very different performance requirements. So the problem then becomes that the performance of one task can affect the rest of the workload.
And this problem is known as performance interference.
And it is well known, not only in database research, but in general for concurrent programming since the 1960s. And the solution that has been put into place for
databases is resource isolation. And so this solution by now is very well rooted into database
systems, I would say. So with resource isolation, essentially the system assigns a specific subset
of resources to every task based on the requirement that this task has.
And then the goal is to isolate basically tasks by enforcing these resource boundaries
between them so that one task cannot affect the performance of the other.
So you can think of resource negotiators and techniques like fair scheduling and stride scheduling as ways to enforce resource isolation.
So now, since with resource isolation, we're provisioning separate resources for every task,
as we increase the number of queries in our workload, we must also increase the number of resources that are available in the system. And in the past, this was quite okay, since with a
small cluster, we could execute our workloads.
But today, because we need to run more and more queries in the orders of, let's say, hundreds or even thousands, this becomes more and more expensive.
Because we literally need to grow our infrastructure proportionally to concurrency. And in our community, some techniques have been developed that we call cross-task optimization techniques, like, for example, data and work sharing, that have the goal of maximizing resource efficiency.
Basically, you find some optimization opportunities across tasks, and then you use those to run the same amount of tasks with fewer resources. But the problem here is that these cross-task optimization techniques can actually penalize some individual queries. So then this makes them inapplicable in
cases where it is critical to meet the performance needs of individual queries. Basically, to sum up,
resource isolation is very expensive because of the high concurrency of today's workloads.
And even further, it does not allow us to use cross-task optimizations that would actually make processing more resource efficient.
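To make work sharing concrete, here is a minimal Python sketch, with hypothetical toy data, of two analytical queries that need the same join. With work sharing, the join runs once and both queries consume its output; without it, each query would pay for the join separately.

```python
# A hedged illustration of work sharing, not GroupShare's implementation.
# Both queries need the same join of `orders` and `customers`.

def join(orders, customers):
    # Hash join on customer id: build on customers, probe with orders.
    by_id = {c["id"]: c for c in customers}
    return [{**o, **by_id[o["cust_id"]]} for o in orders if o["cust_id"] in by_id]

def query_a(joined):
    # Total revenue per region: one consumer of the shared join.
    totals = {}
    for row in joined:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals

def query_b(joined):
    # Count of large orders: a second consumer of the same join.
    return sum(1 for row in joined if row["amount"] > 100)

orders = [{"cust_id": 1, "amount": 120}, {"cust_id": 2, "amount": 80}]
customers = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]

shared = join(orders, customers)  # computed once, not once per query
print(query_a(shared), query_b(shared))
```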
Awesome stuff.
Yes, you've kind of got this problem of the noisy neighbor, right?
Sort of interrupting my work. And it's funny, we see this, I guess, in cloud and multi-tenant environments with sort of various different things interfering.
Like one VM OOMs and that causes, sorry, one process OOMs and that causes something else to OOM, and they're totally separate applications.
And yes, you said this was kind of becoming more and more of a problem with scale as well.
And, like, with the number of concurrent queries or workloads that are running, kind of to give us sort of a, I don't know, a yardstick to measure by, what sort of scale are we talking here? Like, what sort of rough number is it when this becomes a really big problem?
Yeah, so actually, I think we're talking in the scale of hundreds to thousands of queries we're trying to run on the database system.
Cool, yeah, that's pretty big. And this sort of thing, adding more resources so you can keep doing the resource isolation approach, might work well if you're Google, right? But if you're a smaller company, then it's not going to be feasible to keep doing that. We need another approach, right? So that kind of is a nice segue into your work. So yeah, tell us about what the goal of your paper was, and then there's this really key concept in there called functional isolation. So yeah, tell us about that.
Right, yes. So as I said, we have these two options, right? In one case, we can isolate the resources for each query, but as I said before, this becomes more and more expensive.
And in the other case, we can cross-optimize queries and maximize this way resource efficiency.
But this way, we are risking penalizing some queries.
Basically, with cross-task optimization, users really have no guarantees about the performance of their individual queries.
And this is why in many cases,
users would decide not to actually employ cross-optimization. So what we argue in the paper
is that we don't really have to choose between resource efficiency and performance isolation,
but rather we can actually do both. So users do not care per se about resource isolation, but rather resource isolation is just a way to ensure that all the queries will get the performance they need.
And so, what we call functional isolation is basically achieving the same performance metric as you would achieve if the queries were really isolated. But you actually might choose to use some of this cross-task optimization
so that you are more resource efficient.
Basically, an execution schedule would be functionally isolated
when it provides the same or better performance
when we compare it to isolated execution.
So this means that what systems should be doing
instead of just using separate resources for every task,
which does not scale and becomes very expensive,
is to leverage cross-task optimizations
only when these do not interfere with performance isolation.
So essentially the goal here is to selectively choose which cross-task optimizations you can employ so that no query is penalized.
Awesome. That sounds great. Best of both worlds, right? We're getting the performance and we're more resource efficient.
So let's talk about how you went about achieving this and determining which types of cross-task optimizations could be performed.
And you did this in something called GroupShare, right?
So yeah, tell us more about GroupShare, what it is and how it works.
Exactly, yes.
So with GroupShare, we wanted to test this vision
for the specific case of work sharing across analytical queries.
And so the goal for GroupShare specifically
is to achieve performance isolation
and at the same time exploit opportunistically
some of the work sharing opportunities in the workload so that we reduce the total processing
time. So the key idea is that GroupShare splits queries into sharing groups. So a sharing group is a set of queries that pool their resources together and share some common computation. And a critical condition for a group of queries to be a sharing group is that all the participating queries should achieve the same or lower latency when we compare it to isolated execution.
So we basically have this twofold objective.
On the one hand, we want at least the same latency as isolated execution.
And on the other hand, we want to minimize total processing time.
So putting it more simply, you can imagine sort of that all the queries are being egocentric.
So in order for them to participate in a sharing group and share execution with other queries, they need to benefit from it.
Otherwise, they want to be executed on their own.
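As a rough illustration of that condition (not GroupShare's actual code), a grouping qualifies only if every query's latency inside the group is at most its isolated latency. The latency numbers below are made up:

```python
# A minimal sketch of the sharing-group condition: no query may run
# slower in the group than it would in isolation. Numbers are hypothetical.

def is_sharing_group(group_latency, isolated_latency):
    """Both arguments map query id -> latency in seconds."""
    return all(group_latency[q] <= isolated_latency[q] for q in isolated_latency)

isolated = {"q1": 2.0, "q2": 5.0, "q3": 3.0}
grouped = {"q1": 1.6, "q2": 4.1, "q3": 2.8}  # shared work benefits every query
print(is_sharing_group(grouped, isolated))   # True: no query is penalized
```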
So we had two main challenges here when we're trying to design an algorithm to solve this problem. So the first challenge was a scalability challenge. Basically, the space of all the
possible groupings is very big. So exhaustively just searching for the best grouping would be very, very expensive.
And so we solve this scalability challenge by designing GroupShare to be opportunistic.
Basically, we avoid searching for the optimal grouping. And instead, the algorithm focuses on finding just one grouping, one partition that consists of sharing groups, that doesn't penalize any query.
So this was the first challenge. And then another challenge we had to solve is related to accuracy.
So how do we actually evaluate whether a query would be penalized if it is part of a specific
group a priori? So this would be quite hard because it would necessitate relying on cost estimation and
intermediate cardinalities and we know from database research that these are known to be
inaccurate. So what we chose to do here is instead use runtime information. So basically, instead of trying to estimate a priori what the performance of a query in a particular group would be, we just measure at runtime the progress rate of the individual queries and also of the shared plans, and use these measurements in order to make decisions about how to group our queries.
So these were the two main challenges and I'll briefly say how the algorithm works.
So overall, group share follows an iterative process.
So initially, the algorithm will start by forming a very big group
that contains all of the queries.
And at any point in time, the algorithm will keep the groups sorted
based on their processing rate.
So your first group will be the slowest one
and the last group will be the one
making the fastest progress.
And then at every step of execution,
the algorithm will identify the query
that makes the slowest progress
and its corresponding group.
And it will check whether this query is being penalized.
What does this mean?
It means basically we want to check whether the query would run faster if it was being executed in isolation. And if this is the case, we move this query to the next group.
Basically, at the high level, the idea here is, if we find a query that is progressing too slowly in its current group, we try the next group, which is progressing faster, and see whether this would work instead. So from time to time, we basically move queries around until we converge to some grouping that doesn't penalize any of the participating queries.
And then from then onwards, we just continue with this grouping.
Basically, I would say here that you can think of GroupShare as a planner that decides what the groups are. And then you just use an existing work sharing algorithm to figure out the execution schedule inside every group. And then you need a scheduler that will ensure that the CPU time is split fairly across these groups.
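Putting the iterative process together, here is a hedged Python sketch of the regrouping loop as described. It is a simplification: the real GroupShare interleaves regrouping with actual execution inside an engine, and the progress-rate callback here is a hypothetical stand-in for runtime measurements.

```python
# A sketch of GroupShare-style regrouping, assuming stable measurements and
# that a query alone in its own group matches its isolated progress rate.

def regroup(queries, isolated_rate, group_rate):
    """isolated_rate[q]: measured rate of q running alone.
    group_rate(group, q): measured rate of q inside `group` (a callback)."""
    groups = [list(queries)]  # start with one big group of all queries
    changed = True
    while changed:
        changed = False
        # Keep groups ordered from slowest to fastest progress.
        groups.sort(key=lambda g: min(group_rate(g, q) for q in g))
        for i, g in enumerate(groups):
            q = min(g, key=lambda x: group_rate(g, x))  # slowest query
            if len(g) > 1 and group_rate(g, q) < isolated_rate[q]:
                g.remove(q)  # q is penalized: try the next, faster group
                if i + 1 < len(groups):
                    groups[i + 1].append(q)
                else:
                    groups.append([q])  # or let it run on its own
                changed = True
                break
    return groups

# Contrived toy model: q1 and q2 share a join, so together they speed up
# 1.5x; any other co-resident query just splits the CPU with them.
iso = {"q1": 1.0, "q2": 1.0, "q3": 2.0}
def rate(group, q):
    if q in ("q1", "q2") and "q1" in group and "q2" in group:
        return iso[q] * 1.5 / (len(group) - 1)  # the pooled pair acts as one
    return iso[q] / len(group)

print(regroup(["q1", "q2", "q3"], iso, rate))  # [['q1', 'q2'], ['q3']]
```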
Awesome, because that was going to be one of my questions, was like once you've kind of got these groups
and they're all sort of fixed by their processing time, essentially,
what's the next step, in terms of then being like, okay, how do I actually go and take these buckets of jobs I've got and then actually execute them on the CPU?
But you can use basically any work sharing algorithm
you want after that point.
Is that kind of the way it would work?
It's like plug and play, essentially.
Exactly.
So GroupShare would be determining basically the limits inside which the work sharing algorithm can operate, the groups. And then the work sharing algorithm decides for each group how to execute it. And then you pass this to a scheduler. You can use some scheduler that achieves fair scheduling, let's say stride scheduling, so that you split the CPU across all these different groups.
Awesome. Another question as
well, when you were explaining it there, was about when you determine whether a query is kind of the slowest in the group or the fastest in the group, and you want to move it to a different group essentially, and you're trying to work out whether it's been penalized. How does that work?
Yes.
So for this, we actually use the runtime statistics that I mentioned before. So what happens is that, from time to time, the system will actually run the queries on a sample of the data in isolation,
just in order to measure, for each query, how fast it would progress if it was running on its own.
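As a sketch of that calibration step, with the hypothetical run_on_sample standing in for the engine's sampled execution, the idea is just to time the query alone on a small sample of the data and derive a progress rate:

```python
import time

def run_on_sample(query, sample):
    # Hypothetical stand-in for the engine: evaluate the query's predicate
    # on every row; the tuples processed equal the sample size.
    for row in sample:
        query(row)
    return len(sample)

def isolated_rate_estimate(query, data, sample_fraction=0.01):
    # Run the query alone on a small sample and measure its progress rate.
    sample = data[: max(1, int(len(data) * sample_fraction))]
    start = time.perf_counter()
    processed = run_on_sample(query, sample)
    elapsed = time.perf_counter() - start
    return processed / elapsed  # e.g. tuples per second

data = list(range(100_000))
print(isolated_rate_estimate(lambda row: row % 2 == 0, data))
```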
Gotcha, right.
Okay, so we're periodically having this sample of,
okay, this is actually my absolute runtime.
This is how long it actually takes
if I'm the only person alone in the system
and I've got all the resources I want.
So we kind of have this ground truth, I guess,
to sort of determine.
And then you can compare it against,
okay, nice, nice, awesome.
So there's a few things.
Maybe we'll talk about this in the evaluation.
We'll talk about the evaluation and the results.
But I just wanted to ask upfront,
how does this scheme deal with sort of the dynamism
in the workload and changing workloads over time?
Does that disrupt the convergence and make it slower?
Right, yes.
So this is a very good question.
And actually, for this paper, just to simplify the problem as a first step,
we assume that we have a set of queries and these are fixed.
And we receive all the queries in a batch.
And also, we assume that the data distribution across the entire table we're working on is the same. But the problem would become a lot harder, and also more interesting, if you relaxed these assumptions.
Yeah, for sure. Because I was thinking that, with the example you gave at the top of the show, if you have, I don't know, a dashboard or some sort of limited set of queries, then that assumption probably holds quite true. If you're Uber or whatever,
your application isn't going to develop and change necessarily that fast
for this to be a problem.
But if you are running a generic platform that anyone can submit jobs to,
then obviously you're kind of at the whim of the user
and there can be a lot more heterogeneity in the workload
and it can change a lot faster, I guess.
But okay, yeah, you've got to make some assumptions, right?
To make some progress
and then you can always relax them later on, right?
Yes, I mean, this would still work
if you have ad hoc queries,
so if you don't know the query type,
but you would need to receive all the queries in a batch so that you find basically the work sharing opportunities within this batch. So you cannot assume that you have some sharing groups and then suddenly
a new query appears and then you place it in one of these groups. So you work with this batch of
queries until you finish executing them. Awesome stuff. Okay, cool. So let's talk about the
evaluation then. So how did you go about evaluating this scheme? You mentioned it a second ago about there's a few assumptions you made.
But yeah, tell us more about the evaluation and how that experimental setup looked like.
Yes.
So we compare GroupShare with resource isolation and also with full sharing.
So basically, since here we're talking about CPU resources, resource isolation in this case is achieved via fair scheduling. So our first baseline is executing each query on its own, and we basically split the CPU fairly across all the queries.
We use stride scheduling concretely.
The second baseline is full sharing. Basically, this means that all the queries are being executed with a single shared schedule
and all the common computation among the queries is being shared.
So the objective here is to minimize the overall execution time while achieving the same or lower latency when we compare to fair scheduling.
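Since stride scheduling does the fair splitting, both in the baseline and across GroupShare's groups, here is a minimal textbook-style sketch of it (an illustration under assumptions, not the paper's implementation): each group gets tickets, a stride inversely proportional to those tickets, and the group with the lowest pass value runs the next quantum.

```python
import heapq

def stride_schedule(groups, tickets, quanta):
    # Classic stride scheduling: pass advances by stride = BIG / tickets,
    # and the group with the minimum pass value runs each quantum.
    BIG = 10_000
    heap = [(0, name, BIG // tickets[name]) for name in groups]
    heapq.heapify(heap)
    order = []
    for _ in range(quanta):
        pass_val, name, stride = heapq.heappop(heap)  # lowest pass runs next
        order.append(name)
        heapq.heappush(heap, (pass_val + stride, name, stride))
    return order

# Equal tickets -> the CPU is split evenly across the three groups.
print(stride_schedule(["g1", "g2", "g3"], {"g1": 100, "g2": 100, "g3": 100}, 9))
```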
Awesome.
So what was the data, sorry, the benchmark, and some of the queries? Do you have any sort of, just like kind of a rough idea of what these queries actually were doing?
Yes. So for the data, we actually had synthetic data, so that we can play around and kind of change the parameters and see how GroupShare performs.
And then for the queries, you can think of a query type that has the shared part for us
as a join, and then we have some extra operator after the join that might not be shared.
Okay, cool. So yeah, tell us about the results and how much better was it than the baselines?
Right, so how much better? Overall, we saw up to about 80% less CPU time without penalizing the performance of participating queries with GroupShare. But of course, how much CPU time we can save would depend on the amount of common work between the queries.
And here, an interesting aspect is that the more queries in your workload, the higher the chances are that GroupShare will actually be able to detect and employ some of the sharing opportunities. And then some other observations we had from the results are that
we saw that full sharing penalizes short-running queries while it benefits long-running ones.
So it is kind of unpredictable whether your individual query will get better performance or worse. And it more or less depends on what other queries are also running.
And then another observation was that GroupShare operates within smaller groups.
So it forms these smaller groups inside which it can apply work sharing
without penalizing the queries.
And interestingly, GroupShare doesn't only benefit the shared queries, but also the non-shared ones. And for the shared queries, you can imagine that they benefit because basically they participate in a sharing group, so they pool their resources together, and also they eliminate some duplicate work. So you have more resources and you do less work. So great, this is where the benefit comes from.
Now for the non-shared queries,
why they benefit is actually because
the shared queries run faster.
So then the scheduler can actually allocate
more CPU time to these non-shared queries.
So this is an indirect benefit for the non-shared queries
that they can also finish faster after all.
Nice, yeah, they've got more, there's more capacity there for them to kind of do their work, right?
So there's more stuff around. So they go, oh, great, I'm going to use all this.
And yeah, faster I go. That's awesome. Yeah.
And that's a nice, interesting observation about the more variety, I guess, there is in the workload,
the more opportunity there is to sort of employ these these optimizations and
stuff. The thing that I was interested in is, I mean, this is obviously very sensitive to the workload, but how many groups, once it's sort of converged, how many different groups are we kind of talking about here? Is it like 10 groups, five groups, 100 groups? Obviously, I know it's probably a function of the number of tasks that have been run, but yeah.
Yeah, right. This is really hard to tell, it's very workload dependent. In our case, we didn't have many, many groups, just because of the queries we tried. But basically, in the best case scenario, you would have only a single group, if really all of your queries are doing some common join, let's say, and then they can all benefit from being in this shared group, and perhaps resource pooling benefits all the queries, so they can all be in the same group. And then in the worst case, really, there is no work sharing opportunity you can use. So you can literally have as many groups as the number of queries, and then you're basically doing what fair scheduling would be doing.
Yeah. And how long was this period of convergence
to the number of groups that were optimal?
So for this, actually, we don't have an experiment in the paper, but more or less I saw that queries run for a few seconds and GroupShare would always converge before that. So you would have enough time to converge and see the benefit. But I don't have an exact number on how many milliseconds convergence takes.
I mean, it's in the magnitude of milliseconds, right?
Which is negligible when you're talking about, I guess,
these types of analytical workloads, right?
Or whatever's working on top of them. Yeah, milliseconds is fine, we can live with milliseconds. It's not hours or days, right? It's pretty quick.
So yeah, cool. So you've kind of proposed this initial work, and I guess I want to know kind of where you go next with this.
And we spoke about some of the assumptions you've made.
Is it to relax those assumptions and explore that space?
Yeah, what's next?
So yes, one thing would be to relax the assumptions.
And really, this could mean also play with different, like experiment with different
optimizations and different resources
and performance metrics.
For example, now in this work, we're focused only on work sharing and CPU resources and
then latency.
But you could imagine other cases, like, for example, let's say you have a use case where
you have a couple of applications that are running and they're using some indexes and
some other data structures. So with functional isolation, what you could do is share these caches and these data
structures so that you save memory, while at the same time, you can guarantee that all the
applications meet their latency deadline. So one thing would be to basically really work with different types
of optimizations and resources.
And also one could take this problem into different types of workloads as well.
For example, now we're focused only on analytical queries,
but the idea is more general.
So it could be applied on streaming queries as well, perhaps even ML workloads.
And the problem would become,
of course, even more challenging if you think about mixed workloads.
Yeah, so that was going to be my sort of next, we've preempted my next question now. I was going to put my reviewer number two hat on and be like, what are the limitations with the work? And this
is kind of all analytical queries. What about if we want to start doing different things? We want
to start writing stuff as well? What happens there? But yeah, I guess you've kind of answered that there, that's going to be something you look to tackle in future work. But I guess a similar sort of question based off that, though, is what do you think are the current limitations with this approach, and maybe with this approach in general?
Right. So one thing is, as I said, that we only consider CPU resources, and only work sharing as a type
of optimization.
And also, GroupShare requires the underlying system to be able to adaptively change the query plans. And this is something that there is a lot of research on. So there are ways to do it, to adaptively change the query plans. But on the other hand, it's not something that every single engine supports right now.
Okay.
But that feels more like an engineering challenge to overcome rather than like, that's doable for sure.
Yeah.
Awesome stuff.
Cool.
So, yeah, my next question is about the impact you think this work can have, or maybe already has had. So yeah, I guess, what impact do you think it can have longer term, and has there been any impact in the short term?
Right. So really, how I think about it is that, without the idea of functional isolation, although cross-task optimizations sound tempting to use,
developers would actually in many cases not use them because of practicality reasons and also because of the unpredictability they bring.
Basically, you don't know what the effects will be on individual tasks.
So our goal with this work was really to give a framework to developers that allows them
to employ different cross-task optimizations in order to use resources more efficiently. Basically
to be able to use all of this research that has been happening for cross-task optimizations
without sacrificing performance isolation.
Nice, yeah, that's awesome. I can definitely see it having a big impact on making it easy for people to access the latest sort of techniques and things. It can definitely be a massive win, so I look forward to seeing how it develops over the coming years, for sure. So yeah, before we started recording, we kind of spoke about this being a slight detour from your normal research, I guess. So has there been any surprises kind of dipping your toes into a slightly different area of data management? And yeah, what's the most interesting thing you've learned whilst working on this project?
Right. So I think one interesting thing was that this is a problem where the search space is really huge.
And so it's really a good example where you should not try to find the optimal solution,
but rather just try to find a good enough solution, but do it as fast as you can.
So converge to the groups as fast as you can. And this is something that GroupShare achieves with this elegant iterative process that it uses to deterministically, basically, converge to the sharing groups.
So this was one interesting lesson I would say.
And then another interesting thing is the importance of using runtime information rather
than just trying to predict things a priori and like have these estimates that can be
quite inaccurate.
Yeah, definitely. I can see, with the runtime sort of stuff, if you can collect that information efficiently, how you can use that to optimize various things. I guess, is there a possible space as well for combining some of the upfront information with the runtime stuff
as well, and kind of having almost the best of both worlds there as well?
For sure, yes. So definitely, I think there is room to think of some sort of heuristic that would do some coarse grouping at the beginning, and then GroupShare, with the runtime information, would only do some small tweaks to make sure that the final grouping actually consists of what we call sharing groups. So this would be definitely something interesting.
It could speed up the convergence.
Yeah, nice.
Cool.
Yeah, so I teased a second ago there that some of your other research is in a different area.
So now would be a good time for you to tell our listeners all about the work you've done on stream processing.
So yeah, the floor is yours, Eleni.
Tell us all about it.
Yes.
So as we said, my main line of research is on stream processing.
And the overarching goal of my PhD is to enable scalable and efficient stream processing.
So previously, I was working on how to partition streaming data to the concurrent workers of a streaming engine
in order basically to eliminate execution stragglers and essentially just maximize performance.
And at the moment, we're actually working on applying ideas from the CIDR paper
we have been talking about today to data streams.
So the problem there becomes a lot more challenging because of the volatility
of streams, but also due to the fact that streaming queries produce results in a windowed fashion.
So you need to meet the requirements on every single window.
So hopefully you will soon be able to read more about this problem
and we will share more stuff in a paper.
Fantastic.
Yeah, we look forward for that to come out.
You can come back on the podcast as well and tell us all about that
once that's been published.
So, yeah.
That would be the best, yes.
Awesome stuff.
So, yeah, kind of going back to the CIDR paper we've been talking about, how did this sort of come about then? What was the origin story here? Like you say, it was different to your normal line of work. So how did this idea sort of synthesize? I know it's something you did with one of your friends, and you kind of thought, hey, yeah, let's do this. So yeah, what was that kind of like?
Right. So Panos, who is actually a co-author
in the paper, was close to graduation at the time, and his PhD is on work sharing. So with his work, he really made work sharing a lot more practical.
And so we were discussing about it.
And we said that, although not only work sharing but cross-task optimizations in general can make processing more resource efficient, people might still not be using them, just because of the risk of penalizing some individual queries.
And this is how we basically started discussing about this problem
and trying to come up with a solution about it.
Awesome, yeah.
It's always nice when you can kind of work with one of your colleagues as well
and something kind of away from your usual sort of space, right?
And kind of, yeah, that's nice.
You see these ideas synthesized and the serendipity of it almost.
Yeah, definitely.
You also learn a lot of stuff when you work on new things and with new collaborators, you definitely learn a lot.
Well, yeah, for sure. I mean, you never know what techniques in your area can be applied to somebody else's area, and vice versa, right? You can always go, oh yeah, we could use that, this looks very similar to problem X, why don't I use that? And that's actually a nice little lead into my favorite question, about the creative process and how you go about generating ideas and then choosing which ones to pursue. Because, I mean, I have loads of ideas which are probably just stupid and terrible and I need to discard them, but sometimes it's hard to let go of an idea you've had. So yeah, tell us about your
creative process.
Right, I think this is the question that every PhD student has at the beginning of their PhD, like, how do you come up with ideas, how do you choose? So I think I'm actually quite lucky to be part of a very diverse lab, where people work on many different topics and are also very open to discussion.
So I think this is a great environment to generate ideas, but also to test your ideas.
Because as we said before, like you present an idea and then someone from a completely different perspective will criticize it.
And this is very valuable feedback. So this is one way to come up with ideas.
Another way is more like, you know,
one project sort of leads to the next.
For example, here with this paper, we kind of limited it a bit in scope so that we can actually solve the problem, but then you can relax an assumption and see how you could provide a more general solution, or apply the same idea to a different type of workload.
Yeah. So kind of what I'm taking from that is that the environment's
really important. Being surrounded by super smart, clever people in a collaborative environment where there's always open discussion going on is definitely a big plus for innovation, I guess, and for creating ideas. But then also, I like what you're saying about, when you approach a problem, imposing some constraints initially, and solving it for n equals two before you then try to find the general solution, right? Kind of relaxing assumptions iteratively and then exploring the space. Otherwise you'll go crazy trying to find the perfect solution straight away, right?
Yeah, yeah, definitely, definitely.
Plus, another thing is when you work on experimental research,
you work on one topic in one particular project
and you're trying to solve some particular aspect of the system,
but then you observe different bottlenecks as well.
So this is another way to get an idea about the next project.
Yeah, that's it, the projects keep coming along. And then there's kind of a sub-question to what I asked a second ago, about, okay, these projects keep coming up, and this is a cool thing to pursue, and you get some important feedback from your colleagues, like maybe this is not a good idea for X, Y, and Z. But when you see a bottleneck, how do you know not to pursue that specific thing straight away, and be like, okay, maybe I should do this? And how do you avoid context switching constantly?
That's a hard question. I mean, I think, personally for me, when I am working on a particular project and then I find different sorts of bottlenecks, I just, I guess, focus. But sometimes the opposite happens. For example, when I started my PhD,
I was working on data integration for streaming data.
And then there was this problem with load balancing
and really the approach I was designing
was not performing well because of just load balancing.
So this is how I ended up working on partitioning
and I never finished the project on data integration.
So sometimes the opposite happens.
Yeah, yeah, I know. Personally, I'm a sucker for, like, oh, a new shiny thing, and then I get distracted and I kind of forget about the thing I originally did, and I've gone down some massive rabbit hole. But anyway, yeah, sometimes it works out, right? But yeah, great stuff. So yeah, it's time for the last word now. So what's the one thing you want the listener to take away from this podcast today?
So I would say that with this project, we are targeting the problem of having workloads with very high query concurrency. And why is this a problem? Because processing all of these queries really puts a lot of pressure on databases.
And if we continue to use resource isolation, this becomes very expensive.
So our goal with this work was to show that there is actually a practical and robust solution that allows us to use cross-task optimizations while not penalizing individual queries.
And we also highlight the importance of incorporating in a solution both the macro and micro view of performance,
because I think this is really important for someone to actually use your solution.
Basically, you shouldn't care about just the performance achieved for the entire workload,
but also look at the performance of individual queries.
Well, thank you very much for speaking to us today, Eleni.
It's been a fantastic chat.
If the listener is interested, we'll link all of the relevant materials that we spoke
about today in the show notes.
And yeah, we'll see you all next time for some more awesome computer science research.