Disseminate: The Computer Science Research Podcast - Tammy Sukprasert | Move Your Workloads To Sweden! | #53
Episode Date: May 27, 2024
In this episode, we dip our toes into the world of sustainable computing and interview Tammy Sukprasert about her research on reducing carbon emissions in cloud computing through workload scheduling. Tammy explores the concept of shifting cloud workloads across different times and locations to coincide with low-carbon energy availability. Unlike previous studies that focused on specific regions or workloads, her comprehensive analysis uses carbon intensity data from 123 regions to assess both batch and interactive workloads. She considers various factors such as job duration, deadlines, and service level objectives (SLOs). Tammy's findings reveal that while spatiotemporal workload shifting can reduce carbon emissions, the practical upper bounds of these reductions are limited and far from ideal. Simple scheduling policies often achieve most of the potential reductions, with more complex techniques offering minimal additional benefits.
Additionally, Tammy's research highlights that as the energy grid becomes greener, the benefits of carbon-aware scheduling over carbon-agnostic approaches decrease. This discussion offers crucial insights for the future of cloud computing and sustainable technology. Whether you're a tech enthusiast, environmental advocate, or cloud industry professional, Tammy's work provides valuable perspectives on the intersection of technology and sustainability. Join us to learn more about how innovative scheduling strategies can contribute to a greener cloud computing landscape.
Links:
Tammy's LinkedIn
On the Limitations of Carbon-Aware Temporal and Spatial Workload Shifting in the Cloud EuroSys'24 Paper
Carbon Savings Upper Bound Analysis
Transcript
Hello everyone, Jack here from Disseminate and welcome to another of our cutting edge episodes.
The topic of discussion today will be sustainable computing.
Specifically, we'll be talking to Tammy Sukprasert, who will be telling us about her recent work on the limitations of carbon-aware temporal and spatial workload shifting in the cloud, which was recently published at EuroSys 2024. Tammy is a PhD student
in the Sustainable Computing Lab at the University of Massachusetts Amherst, where her research
interests are system software. Welcome to the show, Tammy. Thanks for having me. The pleasure
is all ours. Let's get started then. So, as is customary on the podcast: obviously, I gave you
the very high-level overview there, the highlight reel of who you are, but tell us more about yourself in your own words
and, yeah, what got you interested in this area of research in the first place? Hi everyone, I'm Tammy.
I'm a second-year PhD student at the University of Massachusetts Amherst.
During my undergrad, part of it was during the pandemic.
And it was challenging to just wake up and join a Zoom call because it was really boring.
But there was one lecture that it was not too bad to join in the morning.
And as you may guess, it was a systems lecture. Yeah, it was
interesting to learn how system software makes computer hardware solve complex problems and serve
different purposes. So when I was about to do a PhD, I was like, yeah, maybe this is something I could do.
So yeah, and here I am talking to Jack. Awesome stuff. I mean, it must have been really hard doing your undergrad
during the pandemic.
I mean, I found it hard enough to go to lectures anyway, when I was doing my undergraduate.
And so even if it was online and I could watch it, I'd always convince myself, I'll watch
it later on and watch it later on.
Yeah.
I don't know.
That never really happened in reality.
So did you always know, Tammy, from kind of a young age that the PhD was the
end goal, or was it something that you discovered as part of your studies and thought, maybe a PhD
would be something that I'd want to pursue? Yeah, so actually during the pandemic, that was
when I decided maybe a PhD is something I could do. I was fortunate enough to be offered a position
in a lab as an undergrad. And
it was a systems lab, actually. So it was like, oh yeah. I mean, I went in not knowing anything,
but came out actually deciding to do grad school. So a pretty interesting opportunity.
Well, let's talk about your research then. So let's, before we do that,
let's set some background for the chat today then let's set some context so we're talking about sustainable computing and this sort of idea of being carbon aware and being kind of tech companies
being aware of their carbon footprint and so maybe you can kind of start us off by telling us why
there is a focus on this why do we need to reduce our carbon footprint and then we can maybe dig
into a little bit more kind of the rationale behind being able to shift workloads around.
So, yeah, set us some background, Tammy.
All right. Ready to take the ride.
OK, so to understand why we have this focus on computing's carbon footprint,
we need to first understand that there is an increase in computing demand. And in fact, it's increasing
like at a rapidly accelerating rate.
And part of the reason why
is the rise of artificial intelligence.
Like, you know, everyone knows ChatGPT, right?
So research has consistently shown that,
and I can give you one of the names, Epoch AI,
that the training compute has grown by a factor
of, in fact, like 10 billion since 2010.
That's a lot.
That's a big number.
A lot of zeros.
Yeah.
Yeah, a lot of zeros.
So the amount of computing is growing, right?
And what that means is that the electricity consumption of data centers is also increasing at a rapidly accelerating rate as well.
So currently, just to set the context here, data centers consume about 3% of world electricity.
That's a lot. And it is projected to increase in the future because you need more data storage
and processing in data centers. So it makes sense, right? So to break it down,
in the US, it's projected to increase from 200 terawatt hours in 2022 to 260 terawatt hours in
2026. So that's like a 30% increase in data center energy consumption. And in Europe, in 2022, the
data center energy consumption is about 100 terawatt hours. But in 2026, this is projected to be about 150 terawatt hours for data center consumption.
So that's like 50%.
So that's a crazy increase in energy consumption.
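The growth percentages quoted here can be checked with a line of arithmetic. The sketch below is only an illustration, assuming that the 2022 and 2026 figures for both the US and Europe are in terawatt hours; the function name `percent_increase` is made up for this example.

```python
def percent_increase(start: float, end: float) -> float:
    """Relative growth from start to end, expressed in percent."""
    return (end - start) / start * 100.0

# Projected data center consumption in TWh, 2022 -> 2026, as quoted in the episode.
us = percent_increase(200, 260)
europe = percent_increase(100, 150)

print(f"US: {us:.0f}%  Europe: {europe:.0f}%")  # US: 30%  Europe: 50%
```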
So although there is increasing energy efficiency in hardware, the demand for energy
consumption still outpaces the energy
efficiency. So moving forward, the cloud's exponential growth will translate more directly into a rise
in energy demand, right? So this makes data centers one of the primary contributors to
global carbon emissions. So now energy efficiency is reaching a bottleneck, so instead of
focusing on just reducing energy consumption, you now also need to look at the carbon footprint of
the computing side, because at the end of the day, what really matters here is the issue of climate
change. Yeah, so let's shift it on now. So we know this is a problem. We want to reduce our carbon footprint; I mean, 3% of the total energy consumption of the globe is a big number. And of course, we want to mitigate this problem because we don't want to ruin the planet, right? So we need to be aware of climate change, we need to tackle this problem, and tech is aware of it. So where do the temporal and
spatial workload shifting aspects come into this, then? Can you tell us more about that?
Yeah, so it is interesting how computing has many dimensions of flexibility. It allows people to
decide when, how quickly, and where to compute. So in reducing computing's carbon footprint, we want to shift the workload
to the time period or location with low carbon intensity, right? So simple. And that's the nice
part, because most computing workloads have that temporal and spatial flexibility. For example,
batch machine learning training jobs have substantial temporal flexibility that
enables workloads to be suspended during the high-carbon period, and you can resume the workload
during the low-carbon period. And another example is interactive inference requests
for object detection, which may have spatial flexibility that enables the requests to
be migrated and served at a location with low
carbon intensity. Nice timing. So when you say carbon intensity, what do you mean by that exactly?
Yeah, so different energy sources like coal, solar, and wind have different emission factors. So let's say
50% of your electricity is from coal and another 50% is from solar, right?
So the carbon intensity is just a weighted average of the emission factors of those two sources.
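The weighted average described here can be written out in a few lines. This is a minimal sketch rather than how any particular data provider computes it; the emission factors below are round illustrative values assumed for the example, not figures from the episode, and `grid_carbon_intensity` is a hypothetical helper name.

```python
def grid_carbon_intensity(mix: dict[str, float], factors: dict[str, float]) -> float:
    """Average emission factor (gCO2/kWh), weighted by each source's share of the mix."""
    total_share = sum(mix.values())
    return sum(share * factors[source] for source, share in mix.items()) / total_share

# Assumed, illustrative emission factors in gCO2/kWh.
FACTORS = {"coal": 800.0, "solar": 40.0}

# The 50% coal / 50% solar example from the conversation.
mix = {"coal": 0.5, "solar": 0.5}
intensity = grid_carbon_intensity(mix, FACTORS)
print(intensity)  # 420.0
```

With these assumed factors, a grid that is half coal and half solar sits halfway between the two sources, so adding more solar to the mix pulls the average down.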
Okay, cool. Nice.
So given this domain, we know the demand's high, we want to solve that problem.
We've got this really nice property of computing, that workloads are flexible,
that we can shift them with respect to time and with respect to space,
depending on the type of workload we want to run. Where does your paper fit into this? What is the elevator
pitch for your research, I guess, Tammy? Sure. So given all the discussion about how workloads can
be shifted across time and location, it is important to understand the potential for
carbon reductions from temporal and spatial workload shifting, also sometimes
referred to as spatiotemporal workload shifting. So the goal of this paper is to quantify the
upper limits of carbon reductions from spatiotemporal workload shifting for different workload
characteristics. And the diversity in workload characteristics matters because we want to help the research
community understand where we are today and where we should focus on to achieve substantial impact
on carbon reduction. Nice. Yeah. I love the word spatiotemporal. It sounds cool.
That could be a tongue twister sometimes. A little bit. But if you nail it, it sounds cool,
right? You go, oh yeah, I'm doing some spatiotemporal analysis, and people look at you like, wow, okay. Yeah, cool, right. Anyway,
let's talk about this study you performed then. Can you give us the overview of the study, I
guess, and just tell us more about the analysis you performed? Yeah, sure. So to understand the upper limits of spatiotemporal workload shifting, we use a carbon trace data set that includes 123 regions worldwide from the Electricity Maps web API.
Each region's data has hourly carbon intensity over three years, 2020 to 2022. So that's the data set that we use for the analysis.
And the analysis is grouped into four main categories. So global carbon analysis,
spatial migration, temporal shifting, and lastly, a what-if scenario. In the global carbon analysis,
we look into magnitude and variation of the carbon intensity across 123 regions.
In spatial migration, we quantify how much carbon reduction is possible from spatially migrating the workloads and how the capacity and latency constraints impact the carbon reductions.
On top of that, we also look into the workload migration policies.
For temporal shifting, we quantify how much carbon reduction is possible from temporally shifting delay-tolerant batch workloads and how much the carbon reduction
varies with different workload characteristics, like the job length and how much delay we can
put in for different scenarios. And in the final part of
the analysis, we look into some what-if scenarios, and in this case the main one is what happens
when a region becomes greener. Okay, cool. Yeah, I mean, that's hopefully the goal for the
whole world, right? We're going towards net zero, so hopefully all of this is going to
become greener, but we'll see how that actually plays out in practice. But awesome, cool. So there are these nice four
breakdowns here: the global level, space, time, and these what-if
scenarios. So what techniques did you actually use to help construct this upper bound on
the carbon reductions we can achieve? What are the techniques you used to analyze
this data set? Yeah, so to
construct an upper bound on the carbon reductions, we modeled the compute in a very ideal setting.
So we only include key aspects like job length, network latency, and data center capacity,
so the fundamentals that may affect carbon emissions, right? But yeah, despite not
being a complete analysis, I mean, of course, in a real scenario there are so many
constraints you can put in, right? So we just focus on the fundamentals. And that's
the ideal case, so we can quantify the upper bound. Temporal and spatial workload scheduling require two different constructions of the upper-bound scenarios.
So in spatial migration, the scenario where a workload can achieve the upper bound is when the workload itself can be migrated to anywhere in the world.
And all the regions have data centers
with infinite capacity, right? And on top of that, there's no data transfer overhead when migrating
the workload. So basically, you just place the workload anywhere you want. And for
temporal scheduling, the scenario in which the workload can achieve the upper bound is when the workload has perfect knowledge of the future carbon intensity, in this
case for the whole year. So with this level of flexibility, a specific job can choose the best
time slot to run when doing the temporal shifting to achieve the lowest carbon emissions in the region. Nice. Yeah, I guess it's such a big
space initially, with all the constraints, that you've kind of got to pick the important ones to
make sense of it all, because otherwise you would never get anything
down on paper, right, if there were thousands of different dimensions to it. So
yeah, that's definitely the way to approach this problem. Cool, so let's
talk about the results then, what the findings were. So what's the message? Can
spatiotemporal workload shifting actually help us reduce our carbon footprint and
save the environment? Yeah, so the results. Let's start with the global carbon analysis first
to set the context here. So the result shows that the average carbon intensity across all 123 regions is 368 grams of carbon per kilowatt hour.
So it doesn't make any sense yet. So bear with me.
And then what happens is this. This is the carbon emission of a unit job. And to reduce the carbon emissions
through spatial migration, we want to migrate the workload to the lowest-carbon region. And in this case,
in our data set, it is Sweden. When migrating all the workload to Sweden, we reduce the carbon
emissions of the whole world by 96%, because every data center has infinite capacity in this case. However,
in the paper, we also have a constrained scenario where every data center has only 50% unused
capacity. And in this scenario, we can only reduce the carbon emissions of the whole world
by about 52%. For temporal shifting, the takeaway is that temporal shifting offers more benefits to short jobs in high-variance regions, and the high-variance regions include regions like those in Oceania.
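The two spatial scenarios mentioned here (infinite capacity versus limited spare capacity) can be illustrated with a greedy placement that fills the lowest-carbon region first. This is a rough sketch, not the paper's actual model: the three regions, their intensities, and the reading of "50% unused capacity" as 50% headroom on top of each region's current load are all assumptions made for the example, and the resulting percentages are not the paper's 96% and 52%.

```python
import math

def greedy_migration_emissions(intensity, total_load, capacity):
    """Place total_load (kWh) in the lowest-carbon regions first.

    intensity: region -> gCO2/kWh; capacity: region -> max kWh the region can host.
    Returns total emissions in grams of CO2.
    """
    remaining = total_load
    emissions = 0.0
    for region in sorted(intensity, key=intensity.get):
        placed = min(remaining, capacity[region])
        emissions += placed * intensity[region]
        remaining -= placed
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("not enough capacity for the whole load")
    return emissions

# Synthetic numbers, not taken from the paper's data set.
intensity = {"sweden": 20.0, "region_b": 400.0, "region_c": 700.0}
load = {"sweden": 100.0, "region_b": 100.0, "region_c": 100.0}
baseline = sum(load[r] * intensity[r] for r in load)  # nobody migrates

total = sum(load.values())
unlimited = {r: math.inf for r in intensity}   # idealised upper-bound scenario
headroom = {r: 1.5 * load[r] for r in load}    # assumed: 50% spare on top of current load

ideal = greedy_migration_emissions(intensity, total, unlimited)
constrained = greedy_migration_emissions(intensity, total, headroom)
print(f"ideal reduction: {1 - ideal / baseline:.1%}")
print(f"constrained reduction: {1 - constrained / baseline:.1%}")
```

Even in this toy setup, capping capacity forces part of the load into the second-lowest region, and the achievable reduction drops sharply compared with the unlimited case.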
However, high-carbon regions that could harness the benefit of temporal shifting
have low variance, so they can't do much temporal shifting.
Also, as a region becomes greener, the average carbon intensity of the region decreases,
causing the carbon agnostic scheduling to also yield lower emissions.
So what this means is that as a region becomes greener,
the benefits of carbon-aware scheduling diminish. Yeah.
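The temporal upper bound and the "greener grid" effect can be sketched together. The example below assumes perfect knowledge of a synthetic hourly trace, a job that draws 1 kWh per hour, and a greener grid modelled simply as the same trace scaled by one half; these are illustrative assumptions, and the function names are hypothetical.

```python
def window_emissions(trace, start, job_len):
    """Emissions (gCO2) of a job drawing 1 kWh per hour for job_len hours from start."""
    return sum(trace[start:start + job_len])

def best_start(trace, job_len, slack):
    """Lowest-carbon start hour in [0, slack], given the full future trace."""
    if slack + job_len > len(trace):
        raise ValueError("trace too short for this job and slack")
    return min(range(slack + 1), key=lambda s: window_emissions(trace, s, job_len))

# Synthetic hourly carbon intensity in gCO2/kWh.
trace = [600, 600, 500, 300, 200, 200, 300, 500, 600, 600]
job_len, slack = 2, 6

# Carbon-agnostic scheduling runs immediately (hour 0); carbon-aware waits for the best window.
start = best_start(trace, job_len, slack)
saved = window_emissions(trace, 0, job_len) - window_emissions(trace, start, job_len)

# Assumed model of a greener grid: every hour is half as carbon intensive.
greener = [0.5 * x for x in trace]
g_start = best_start(greener, job_len, slack)
saved_greener = (window_emissions(greener, 0, job_len)
                 - window_emissions(greener, g_start, job_len))

print(start, saved, saved_greener)  # 4 800 400.0
```

Under this uniform-scaling assumption the carbon-agnostic baseline already emits less on the greener grid, so the absolute amount that carbon-aware scheduling can save shrinks with it.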
So yeah, it's interesting that on the first dimension, spatial,
I guess if everywhere has got infinite capacity,
it makes sense to move to the place that has the lowest carbon intensity,
which happens to
be Sweden. So we're all moving to Sweden, right? We're all going to run our jobs in Sweden's data
centers now, right? But I guess you said that when you actually
introduce that constraint of not having infinite capacity, the best you can do there is like 52
percent. How much does that map to all the world's compute versus how much
capacity there actually is in Sweden? I don't know how many data centers there are in Sweden.
Is that even practical? Like, okay, we're moving
everything from EU West into Sweden now. Is that even possible? No, the short answer is no. So data centers need idle capacity for
stability purposes, so in general they need some level of unused capacity.
And in that sense, even if we say, yeah, everyone migrates to Sweden and we're done, in reality you can't do
that. You have to fill up the capacity of the lowest-carbon data center and then move on to the
second lowest. In reality you can't do that. Yeah, yeah, cool. And on the temporal stuff, I
guess, what's the phrase for it, the places that could benefit from it, I
think in
your paper you mentioned, I think it was Mumbai maybe, as one of these places that would
benefit from being able to shift workloads around. But then the problem is the energy that's produced
there is so carbon intensive, and there's such low variance in it, that all the
energy is actually produced by burning fossil fuels, right? So there's not actually any benefit
whether you run it at midnight or at lunchtime; the energy is still coming from the nearest power station, right? So
you're kind of a bit screwed on that front as well. And it's interesting that Oceania has the
highest variance; that was an interesting observation. And on this
magic number, 368 grams of carbon per unit job:
How did you arrive at that number?
Yeah, so it's the average carbon intensity for all 123 regions in, I think, the year 2022.
Okay, right.
So the most recent one in our data,
we just summed them up and averaged it out.
Yeah.
Yeah.
Cool.
I'm getting the vibe here that spatiotemporal shifting
isn't the silver bullet that we need it to be, right?
So yeah, what is the take on it then?
Is spatiotemporal workload shifting going to save us,
or is it not what we need it to be, really?
So we still should do it. It's not like, oh, after listening to the podcast, everyone just drops what they're doing.
Like, please don't do that. It's still going to work in theory, but yeah, of course, like
what you said, it's not a silver bullet. You still need to harness some other aspects of computer systems to reduce carbon emissions, not just one way.
And I think in general there's no one mega solution, but small groups of
solutions working together. So yeah. Yeah, I guess as well, I mean, you touched on it
earlier on, where you were talking about some of the practical constraints of that.
It's all good and well if we do move to Sweden, right?
All our compute there.
But if I'm in Australia
and I'm running an application that's latency sensitive,
it's not good for my application, right?
And I guess there's other constraints like GDPR
and sort of these sorts of constraints as well,
which adds to this complexity of optimising
for all these various constraints.
And it kind of, I guess, limits this approach
even more, I guess.
Yeah, exactly.
Cool.
Is there any way we can,
from the way we actually construct our jobs,
is there anything we can do to make the jobs
more amenable?
That's the word, I think,
amenable to this approach. Like, if we can make our jobs more flexible with respect to
time and have fewer big jobs and more small jobs, could that help as well, if we
change the way people engage with cloud computing environments? Yeah, I think we should. So
if you want to harness temporal shifting, you should
break your job into smaller chunks. That's the basis, right? Basically, you want to fit
your job into the low-carbon period as much as you can. But at the same time, people also look
into dynamic voltage and frequency scaling, or DVFS. So there are multiple outlets for this,
not just shifting the workload like in this paper.
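The point about breaking a job into smaller chunks can be illustrated by comparing a job that must run in one contiguous block with one that can be suspended and resumed hour by hour. This is a simplified sketch with a synthetic trace; it assumes 1 kWh per hour of work and zero overhead for suspending and resuming, which real systems would not have.

```python
def contiguous_emissions(trace, job_len):
    """Best case for a job that must run job_len hours back to back."""
    return min(sum(trace[s:s + job_len]) for s in range(len(trace) - job_len + 1))

def chunked_emissions(trace, job_len):
    """Best case for a job split into one-hour chunks that can run in any hours."""
    return sum(sorted(trace)[:job_len])

# Synthetic hourly carbon intensity (gCO2/kWh) within the job's deadline window.
trace = [300, 100, 400, 120, 500, 90, 450, 110]
job_len = 4

whole = contiguous_emissions(trace, job_len)
chunks = chunked_emissions(trace, job_len)
print(whole, chunks)  # 920 420
```

When low-carbon hours are short and scattered, as in this made-up trace, the chunked job can land on all of them while the contiguous job has to sit through the high-carbon hours in between.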
Yeah.
Is this something, Tammy,
that the cloud providers are actually actively doing
at the moment? In terms of,
obviously these tech companies are aware
of trying to reduce their carbon footprint,
but how much work are AWS, for example,
actually doing on this front to shift computation around?
Because kind of, I guess, at the moment,
my interactions with cloud environments, when I want to run a job,
I go on there and say, hey, give me a VM in, I don't know,
US South or whatever, and then it'll run there, right?
But I'm very much sort of constraining it.
So how much do they do stuff behind the scenes
that I'm not aware of, I guess is what I'm asking?
So cloud providers, it's a little tricky for them, right?
Because they have like deadline sensitive workloads
that need to, you know, serve.
So maybe that might not be there.
That might not be something
they can reduce carbon emissions at the moment.
But I'm pretty sure they have like some non-urgent jobs that they can schedule during low carbon period.
And that is like similar idea for like, you can schedule your workload during the time where the electricity bill is cheaper.
So yeah, they probably look into something like that. Yeah, for sure. I mean,
it kind of makes sense, right? When we've been having this chat, I've been thinking
that my mom always puts the washing machine on before like 5 a.m. or whatever, before half five,
because it's cheaper, right? So I guess cloud providers will do the same thing,
right, because there's a business incentive there. If the energy is cheaper and I can run
these jobs then, then yeah, makes total sense, right? And I was also thinking as well, I don't know if there's
anything in this space at the moment for this, or if cloud providers do this, but
I'm kind of thinking, when I go and book a train now or a flight or whatever, it always says
on my ticket, on the booking, that this is the amount of carbon that this flight is going to
cost, right, that I'm going to put into the environment.
And it's just, I guess, making the consumer aware that they're actually having an impact on the environment.
Is there anything, like, is there a way that cloud providers
could express that information to me as a consumer of buying compute
to be aware of the fact that I am using kind of,
I am emitting carbon into the environment
and sort of then
providing an API for me to be more flexible with my workload to reduce my carbon emissions.
Yeah, so I mean, cloud providers can definitely do that because they sometimes have their own
hourly matching. Let's say, for example, Google: they have hourly matching
data, definitely. So yeah, my short answer is yes, they definitely have the resources or data to do that,
but it's hard to measure carbon in general, right? Like, which source of electricity is actually
serving your VM? We never know. So yeah, I guess it's hard to be that granular, right,
on something like that, because it is pulling energy from a host of places, right, I
guess. But yeah, that's interesting. Cool. So my next question is where do we go next,
or where do you go next, Tammy, from this work? What's next on your research agenda
for spatiotemporal workload shifting?
Yeah, so this paper in general talks about cloud data centers' workload scheduling to harness low-carbon-intensity periods or locations from electricity grids, right?
So it's the cloud attending to the grid.
However, in the future, as electricity grids integrate more solar and wind to lower their emissions, they will need more flexible solutions to balance the electricity demand from data centers against the variations in their resource availability. Cloud platforms might have to be more effective in
supporting the grid's variable operations so that the grid can incorporate more renewable energy. So
maybe our future work will quantify the potential of cloud platforms in supporting
the grid's increasing renewable penetration. Yeah, for sure, that seems a really interesting research direction
for sure, Tammy. Cool. Yeah, I mean, obviously when we do research we always want to have impact,
right? That's, I guess, the goal: to affect the real world, be that industry or just people's
day-to-day lives. So I guess if I put this question to you, what impact do you think your work
on spatiotemporal workload shifting can have, and how
do you think, as a software engineer, as a developer, I can leverage the findings of
your work in my day-to-day life? Yeah, so we briefly touched upon this. So let's say for spatial shifting,
right, if we want to go down that path, we need to be aware that there will be an under-provisioning of resources in high carbon intensity regions and over-provisioning in low carbon intensity regions, like the Sweden example.
If everyone moved to Sweden, then the rest of the world is not doing anything. For the low-carbon regions, they should focus on utilizing idle capacity, but also balancing the
stability aspect that we briefly touched upon. And for high-carbon-intensity regions,
they need to look at the aspect of energy efficiency, so like what I just said, dynamic voltage
and frequency scaling, DVFS.
Yeah, also spatial migration doesn't have to be just between data centers.
It can happen through the edge.
So that could be more advantageous than just relying on the cloud, because maybe you can exploit local renewables. And for temporal shifting, we also need to think about how we can break down big tasks
into small jobs so that we can exploit the low-carbon periods that don't last
throughout the day. Yeah, yeah. So I guess this must have been a really fun
project to work on, Tammy. I mean, what was the most surprising thing when you were
doing this study? What was
the most interesting lesson you learned while working on this topic? Yeah, I mean, actually
the thing that I find most interesting about this work is that,
when we started, carbon intensity traces go up and down, like a sine graph, right, somewhat. So we thought, yeah, maybe, since we have a
lot of regions, we have 123 regions, at some point the carbon intensity traces will
cross each other. So we're like, okay, for spatial migration, maybe you really need a
complex solution for this, to harness every low point, or to find what the optimal migration policies are,
right? But it turns out that in an ideal world, an ideal case where every region has infinite capacity and every region can migrate to any other, like no GDPR, you just need to migrate
once and you get most of the savings. So that was like, wow, is it that easy? Like, what?
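The "migrate once" observation can be made concrete by comparing two idealised policies: move to the region with the lowest total intensity and stay there, versus chasing the lowest-carbon region every hour. The traces below are synthetic and deliberately constructed so that the green regions cross each other, so the output only illustrates the mechanics and is not evidence for the paper's result; both policies assume infinite capacity and zero migration overhead.

```python
def stay_home(traces, home):
    """Emissions of never migrating (1 kWh per hour assumed)."""
    return sum(traces[home])

def migrate_once(traces):
    """Move once to the region with the lowest total intensity, then stay."""
    return min(sum(trace) for trace in traces.values())

def follow_the_minimum(traces):
    """Every hour, run wherever intensity is currently lowest."""
    return sum(min(hour) for hour in zip(*traces.values()))

# Synthetic hourly traces in gCO2/kWh; the two green regions cross each other.
traces = {
    "home":    [500, 520, 480, 510],
    "green_a": [30, 40, 35, 45],
    "green_b": [50, 35, 60, 40],
}

base = stay_home(traces, "home")
once = migrate_once(traces)
hourly = follow_the_minimum(traces)
share = (base - once) / (base - hourly)
print(once, hourly, f"{share:.1%}")
```

In this toy case the gap between a clean region and a dirty one dwarfs the hour-to-hour differences among clean regions, which is why the single move captures nearly all of what the hourly policy saves.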
Yeah, yeah, you'd expect this big complex solution, and then it ends up being,
yeah, just move to Sweden on a Tuesday at lunchtime and then stay there because that's the best place, right? Yes. So yeah, that was one of the things
that I usually tell people: yeah, in an ideal world, you migrate to Sweden and you're done.
So that's interesting. That's awesome. Where did this project and this topic and
this like research idea come from originally then, Tammy?
What was the backstory of the paper, really?
How did it come about?
Yeah, so for a while, there have been papers that focus on exploiting
either spatial or temporal scheduling to reduce workload carbon emissions, right?
Like, there's Let's Wait Awhile, there's Emma Strubell's paper that migrates AI
workloads. And we noticed that the papers evaluated the work in a limited setting. So they usually
pick a few regions or use a narrow set of workloads. So we just wondered what the potential
of carbon-aware scheduling is at a large scale, like for the whole
world and over a long period of time, right, like three years from 2020 to 2022. So that is how this
paper came about: we just wanted to see the big picture of the whole world instead of
a few regions or just one particular workload.
Yeah, nice, taking this sort of holistic view of it. That's really awesome. As a nice segue from
what we've been talking about there, talking about ideas and where
things originate from: how do you personally approach generating ideas?
And then once you've, I mean, I have thousands of terrible ideas
every day, right, how do you then choose which ones are actually worth pursuing and dedicating a
significant portion of your time to working on? Yeah, so reading a lot of papers, as any PhD student is expected to do.
So yeah, when you read,
so reading other interesting papers help a lot.
So, and ask the question along the way,
like why did they do this particular experiment,
this particular evaluation or this particular setup?
Why not the other way around?
And yeah, like how this paper came about
it's the same thing: instead of picking four regions, why not 10? Why not 100? So just read a lot,
ask a lot. So I think that's really, really satisfying. Challenge assumptions, I like that. When
you come across something, like, why did they do it this way, maybe if we did it differently, what
would happen then? So yeah, I think that's definitely a good way to approach the creative
process. And cool, Tammy, well, we've arrived at the time for the last word now. And so yeah,
what's the one takeaway you want the listeners to get from this podcast episode today? We're ending?
I don't want to go.
So at the current state of the world, carbon-aware spatiotemporal workload shifting is likely not a panacea for significantly reducing carbon emissions. But with the goal of reducing carbon emissions in different layers of computer systems, I think we can come up with new ideas to reduce carbon emissions
and with that complement this temporal and spatial shifting. Yeah, so that's my take. That's great. I
mean there's a call to action for everyone in industry and everyone in research we need to
work together to solve this problem. So yeah, as I say, that's the mission of the podcast as well. So we're bridging that gap. So yeah, everyone's been told now. Excellent stuff, Tammy. Thank you very much for talking to us today. It's been a fascinating chat and I'm sure the listener will have really enjoyed it. We'll put links to all of the relevant materials in the show notes as well. And where can we find you on social media? Are you on LinkedIn, Twitter?
Where can we find you? Yeah, so LinkedIn would work. So Thanathorn Sukprasert, my first and last
name, or T Sukprasert. Cool. You'll find me there. Yeah, awesome. Well, we'll put that as well in the
show notes so everyone can connect with you and reach out if they wish.
And yeah, we'll see you all next time
for some more awesome computer science research. Bye.