PurePerformance - Optimizing Cloud Native Power Consumption using Kepler with Marcelo Amaral
Episode Date: January 29, 2024

Marcelo Amaral is a Researcher for Cloud System Optimization and Sustainability. With his background in performance engineering, where he optimized microservice workloads in containerized environments, making the leap towards analyzing and optimizing energy consumption was easy. Tune in to this episode and learn about Kepler, the CNCF project Marcelo is working on, which provides metrics for workload energy consumption based on power models trained by the community. Marcelo goes into detail about how Kepler works and also provides practical advice for any developer to keep energy consumption in mind when making architectural and coding decisions.

To learn more about Kepler and today's episode, check out:
Marcelo on LinkedIn: https://www.linkedin.com/in/mcamaral/
CNCF blog post on Kepler: https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/
Kepler GitHub repo: https://github.com/sustainable-computing-io/kepler
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always I have with me my wonderful co-host.
Hi Andy.
Hey, what's up Brian?
Nothing, I had a weird dream.
I gotta tell you about my dream.
Yeah, yeah, this was a weird one. I had this dream that you and I started a podcast together, like, long time ago, 2015, I think, and that today we were recording our 200th episode or something. It was a very bizarre dream. And I don't know if it's real, but it's really cool if it is.
200 episodes?
Yeah.
Are you serious?
That doesn't include all the Perform stuff. Yeah. Yeah. Wow.
200 episodes and I mean we couldn't
have found a better guest
and a better topic
well before we go to our guests I want to give a special
shout out to any of our listeners out there
especially for any who have been with us since the
beginning because you know without
people listening to it we wouldn't be allowed to
keep doing this
so big, big thank you to everyone who's been here
but the people
who make it all possible as you're
getting to are our guests
nobody would want to hear you and I talk to each other
200 times
especially when we look at the download statistics
those that we do solo, they are typically
not as well rated
Anyway, hey, we don't want to keep Marcelo waiting.
Marcelo, thank you so much for being on the show.
I'm just looking at your LinkedIn profile,
which we also link to in the description.
It says, Research on Cloud System Optimization and Sustainability,
which is a very hot topic.
Before we dive into the topic, Marcelo,
can you just introduce yourself a little bit to our listeners,
like who you are, what you do, and what actually brought you to the place where you are right now?
Yeah, sure. I'll try to do it quickly, you know.
First of all, thank you for the invitation.
So I'm very glad to be part of the podcast; it's a nice experience to be here.
And so I'm Marcelo Amaral.
I'm actually from Brazil and I did my undergrad in Brazil.
And I went to Barcelona, Spain for my PhD,
where I was working at Barcelona Supercomputer Center for five years with
collaborations with IBM, IBM Research.
And after I finished my PhD, I went to IBM Tokyo for a postdoc and then I became a regular
researcher, an employee in IBM.
My background is performance, like you guys. My PhD is related to performance analysis and optimization on the cloud, HPC and cloud. Also, I was doing
things related to that. And when I joined IBM, I was working on performance analysis of workloads on the cloud,
especially microservice to try to understand the bottlenecks, you know,
where the performance problems are coming from, all the hard problems in it.
That's the classical problems when we are analyzing applications.
And after that, IBM acquired Red Hat, and we had some collaborations between IBM Research and Red Hat.
And I joined the project called KubeVirt, which is a project to create virtual machines in the Kubernetes ecosystem.
And I was responsible for performance analysis and optimization on KubeVirt, especially for scalability analysis. I created the CI/CD pipeline for performance analysis in KubeVirt.
And then after that, I joined the Kepler project.
Kepler is a project to measure the energy consumption of applications on the cloud.
And that is a project that started with Red Hat.
And IBM joined efforts to improve the project.
Yeah, that's the quick background.
Thank you.
That's awesome.
Yeah, and if people want to connect with you, we'll link your profile, because obviously you have a really cool background tying into performance. Our podcast was initially launched
to really talk about performance engineering topics. That's why
pure performance. Before we dive into the sustainability topic and
more into Kepler, I got a question for you. So you said you've optimized
and analyzed performance problems in microservice environments on
Kubernetes or in containerized
applications. What is the number one thing that stands out as typically the reason for systems not performing, not scaling? Is there a number one thing, or maybe what are the typical things you found?
Well, it's many things, of course, but I would say two things are the most important ones.
First, network, you know, latency. It's very important for a microservice; the messages are very small. Throughput, of course, depends on which microservice you are analyzing, but latency is typically more important for microservices. They are small services sending small messages around. And if there is some jitter or the network is not stable enough, we can see a lot of problems with that.
And storage also matters. If you have a database, things like that, storage impacts a lot. And if it's a distributed storage system, then it's network plus storage, isn't it? So it's also hard to isolate, when you have distributed storage, whether the problem is really the storage or the network.
But I would say these are the most common things that impact performance. CPU is important, but I think it's not the number one thing. There are a lot of things that impact the performance of a microservice, but I would say start with those and then look at the other parts.
Yeah. And obviously it makes sense, right?
If you think about that traditional architecture,
like monolithic, you put data in,
a lot of activities happening,
and then you get data out
and you send this over the network.
Now with microservices,
you are breaking up this problem
into small individual pieces
and you always have to, you know,
send the individual pieces of the result
to the next service. And things have to be
stored somewhere in caches or wherever it is.
So it makes a lot of sense that there's more constraint and more pressure on these systems
that connect the microservices.
Now do you feel that most of these problems could have been avoided by better design of the microservice
and also the type of data that was sent back and forth and stored?
Or was it more that the underlying setup of, let's say, your network itself?
And I don't know, maybe in a Kubernetes environment, for me, service meshes always come to mind, or any proxies in the middle. Was it more that the underlying system was not properly configured and sized, or was it more an application architecture issue of not implementing the microservice well enough?
Yeah, again, so everything that we were saying, both can impact the performance. So I would say that the design decision is important, of course.
If an application is created with a monolith in mind, meaning it's not fully parallelized, it has dependencies and synchronization problems, it will be hard to scale, isn't it? But if it can be more independent and the requests can go more in parallel, you don't have a single queue. We typically see queues in microservices, and it's something that impacts scalability and performance. Or maybe multiple queues. So it should have parallelism in mind.
And then performance will be better for that.
Especially if a lot of parallelism happens,
latency between services can be minimized.
It's like you don't notice the latency of one request so much, because the other requests are going in parallel. So things like that. So it's getting better.
The design decision
also impacts that.
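To make the parallelism point concrete, here is a minimal sketch (an illustration, not code from the episode; the downstream calls are simulated with sleeps):

```python
# Illustration of the parallelism point: the same three downstream calls,
# made sequentially vs. fanned out in parallel. The "service calls" are
# simulated with asyncio.sleep; real ones would be HTTP or gRPC requests.
import asyncio
import time

async def call_service(name: str, latency_s: float) -> str:
    await asyncio.sleep(latency_s)  # stand-in for a network round trip
    return f"{name} done"

async def sequential() -> None:
    # Total time ~ sum of latencies: each request waits for the previous one.
    for name in ("inventory", "pricing", "shipping"):
        await call_service(name, 0.1)

async def parallel() -> None:
    # Total time ~ max of latencies: one request's latency hides behind
    # the others, which is the effect described above.
    await asyncio.gather(
        *(call_service(n, 0.1) for n in ("inventory", "pricing", "shipping"))
    )

for fn in (sequential, parallel):
    start = time.perf_counter()
    asyncio.run(fn())
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
```

Running it shows the sequential version taking roughly the sum of the latencies and the parallel version roughly the maximum, which is why queues and serialization hurt microservice scalability.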
That's interesting, Andy.
I think we've been talking about this
or at least internally.
I don't know if we've talked about it much in the podcast.
But I think one of the, especially when you're going from monolith to microservice,
the idea of properly modeling that setup, observing that setup, and then tweaking, right?
Because it's very tempting to just take, here are my different functions,
let me break them out into microservices, and hey, I'm in microservices now.
What we've seen, or at least what I've been aware of, is sometimes people go ahead and create, I don't know if this is a real term or not, but there's this idea of a nano service.
Something that's way too small that you shouldn't have even broken out.
All you did was add network latency, which feeds back into what you're saying.
But I hadn't even thought, not that I'm doing computer programming, but I love this idea of making sure you can run things in parallel, and also
queues being a problem. I don't think I've heard about that,
at least not in my world. But it goes back to that idea
of microservices isn't just picking what you want to run in a
microservice. It's a real design consideration, and you have to spend some time experimenting and
see how it runs and then tweak and fine-tune it so that you get that performance out of it.
And it's almost like some of the common problem patterns we've seen in traditional performance
where these ideas have been around for quite a long time.
We've been talking about proper microservices for a while, but it sounds from your experience that this is still a very, very common issue. Yeah. Cool. Hey, Marcelo, thank you so much. It's always interesting when we have somebody with a performance background on the call to just, you know, discuss a little bit about the stuff that Brian and I know about.
But this podcast was actually triggered by Henrik.
Henrik Rex is one of our colleagues,
and I think you've worked with him as he was looking into Kepler
in his Is It Observable channel.
And then you also pointed me to a blog post that, folks,
if you're listening to this and you want to read that blog post,
the link will be in the description.
It's called Exploring Kepler's Potentials,
Unveiling Cloud Application Power Consumption.
And it was a guest post by you and some other colleagues
on the CNCF, the Cloud Native Computing Foundation, blog, really talking about how Kepler gives you the insights into the actual consumption of power in your applications, and kind of tying this back to these design decisions and implementation decisions.
Every decision we make in the end needs to be powered by the system they run on.
And so power consumption always should be on top of our mind.
Marcelo, can you tell us a little bit about the Kepler project and what type of data it produces
and what type of use cases it enables and also how you see people using Kepler?
Yeah, of course. First of all, I will start with some introduction, you know, the motivation for the project. For energy consumption, I like to think there are two ways to see the importance of measuring it. The first one is money. Cloud providers are paying a lot of money for energy right now, and the energy consumption costs are not completely exposed to the user. The user just rents the resource and pays without knowing how much energy it costs to maintain those servers.
Some servers consume more energy than others. There are some differences there. And especially if we go for the AI workloads now that are using GPUs,
GPUs consume a lot of energy. So capacity planning must analyze the energy consumption of the servers in the cloud infrastructure.
Whether we go for public cloud or private cloud, these are all things that need to be analyzed. The second aspect, I would say, is the social responsibility: the CO2 emissions, the global warming, all of these things that have been with us for years. Data centers are consuming a lot of energy, again especially for AI workloads, and it's important to pay attention to that.
If we think of, for example, ChatGPT, it's being trained on a lot of computers. And it's not only one training run; it's retrained all the time. So it's consuming a lot of resources, a lot of energy, and it can become a sustainability problem.
So the first thing is to bring awareness to people.
People need to know how much energy is really being used, isn't it? So what's the energy consumption of a data center?
There are some analyses of that. I don't remember exactly the projections, but there are some comparisons, for example for the U.S., where data centers consume something like a thousand households' worth of energy. So it's something big that we really need to pay attention to with data centers. And that's where the Kepler project comes from.
The first aspect is to enable observability: to expose the energy consumption, to map the energy consumption of hosts to applications. This is not an easy problem, and there are some ways to solve it.
So then I'm going to describe a little bit about that.
Right now, there are no hardware counters on the CPU or on the machine to account for the energy consumption of an application, for example for the instructions it's running. Storage operations, CPU operations, memory operations: there is no hardware counter that calculates the energy consumption, accumulating it on the hardware. So we need to do it in a software-based form. The way we do it is simple, but it comes with a lot of challenges.
Okay, so think about it: there are two aspects of energy consumption. First, what we describe as the idle power, a constant power consumption. Even when nothing is running on the machine, there is something being energized in the node, and it's consuming power.
And there is the dynamic power, which is the power that is associated with the load.
For the static power, there is the GHG Protocol, which defines what a fair association of power consumption to applications is. It defines that the static power consumption should be divided based on the size of the application. It's like if you are in a condo: the bigger house pays more. So if your application or your virtual machine allocates more resources, more static power will be associated with that virtual machine. And the dynamic power is related to the resource utilization. So the analysis is: if one application is using 10% of the CPU, then 10% of the dynamic energy consumption is associated with that application. It's very simple like that.
Of course, defining CPU utilization can be complex. There are instructions, caches, a lot of components inside. But as a general view, we get the resource utilization and do this one-to-one mapping of resource utilization to the dynamic power.
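A minimal sketch of that attribution arithmetic, assuming a single CPU resource and made-up wattages (Kepler's real models consider many more components):

```python
# Minimal sketch of the attribution idea above. All numbers are invented;
# Kepler's real models track more resources than CPU share alone.

node_dynamic_watts = 150.0  # power attributable to load on the node
node_idle_watts = 80.0      # constant power drawn even when nothing runs

# Dynamic power: proportional to resource utilization.
# An app using 10% of the CPU gets 10% of the dynamic power.
app_cpu_share = 0.10
app_dynamic_watts = app_cpu_share * node_dynamic_watts

# Idle (static) power: per the GHG Protocol guidance, split by tenant size,
# i.e. the bigger house in the condo pays more.
app_cores, node_cores = 4, 32
app_idle_watts = (app_cores / node_cores) * node_idle_watts

print(f"dynamic share: {app_dynamic_watts:.1f} W")   # 15.0 W
print(f"idle share: {app_idle_watts:.1f} W")         # 10.0 W
print(f"total attributed: {app_dynamic_watts + app_idle_watts:.1f} W")
```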
Given that, we have two scenarios on the cloud: where we have direct access to bare metal, and virtual machines. On bare metal we have more flexibility, more access to things, while virtualization actually hides things from the users; a virtual machine doesn't expose things from the bare metal. So we need to discuss these two different ways of how we tackle the problem.
So bare metal is the easiest one.
So typically bare metal has sensors that measure the energy consumption of the node and the resource.
For example, for Intel x86 machines, Intel created an interface called RAPL that does this estimation. It's software-based, but based on hardware counters, currents, and voltages it estimates the energy consumption of the CPU, and with very good accuracy. There are a lot of works that have done analyses with external meters, comparing what RAPL is exporting with what the external meter says the CPU is actually consuming, and it's typically a good match. Of course, there is always some precision loss when we are doing estimation, but the accuracy is good.
AMD also has things like that. And NVIDIA GPUs, at least, also expose the energy consumption of the GPU.
For ARM machines, I don't know. We have an idea to extend it for ARM, but it's an ongoing project, so we don't know exactly how to do it right now. At least I don't know; maybe other people from the community know, but I'm not aware of how to get the energy consumption of an ARM system right now.
But it should have some API.
Just need to do some investigation for that.
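On Linux, the RAPL readings Marcelo mentions are exposed through the powercap sysfs interface. A minimal sampling sketch, assuming an Intel machine where `/sys/class/powercap/intel-rapl:0` exists (it usually requires root, and paths vary by platform):

```python
# Sketch: sample the Linux powercap/RAPL interface to estimate average
# CPU package power. Requires a RAPL-capable CPU and usually root;
# the sysfs path below is an assumption and varies by platform.
import time

DOMAIN = "/sys/class/powercap/intel-rapl:0"  # CPU package 0

def read_uj(path: str) -> int:
    with open(path) as f:
        return int(f.read().strip())

# The energy counter wraps around at max_energy_range_uj.
max_range = read_uj(f"{DOMAIN}/max_energy_range_uj")

e0 = read_uj(f"{DOMAIN}/energy_uj")
time.sleep(1.0)
e1 = read_uj(f"{DOMAIN}/energy_uj")

delta_uj = (e1 - e0) % max_range  # tolerate one counter wraparound
print(f"average package power over 1s: {delta_uj / 1e6:.2f} W")
```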
So given that, there are some APIs that expose the energy consumption of the node, the total node energy consumption, or of a specific resource. So we can break down the energy consumption: CPU, DRAM, storage. Depending on the availability of information, we can associate this energy consumption to applications based on the resource, again the dynamic power based on the resource utilization. On bare metal we have access to that. On VMs, on the other hand, we don't have access to that, and there are many reasons why that is not exposed to VMs. First of all, it's security. The direct information from the host is typically not exposed to VMs in the public cloud, because it contains information about other VMs.
that is not exposed to VMs, because it contains information of other VMs in it.
So the total information from the node doesn't go there.
It could be, so we have like this also envisioned this idea in Kepler.
So maybe in the future of cloud providers, they will expose these things, but not right now.
It's to measure the energy consumption of VMs and expose that.
It can be in different ways.
Just with hypervisor hypercalls inside the VM, we can access that.
A file that is mounted to the VM or an external API.
The user has its own token and can access information about this on VM.
And then, but since we don't have that right now,
it's something that we are just proposing and maybe in the future
cloud providers can have that. It will be much better.
So instead, for VMs, we use power models. What's a power model? It's just like a simple regression. On a bare metal node, we collect the energy consumption of the node and the resource utilization of applications, running a lot of different workloads with different configurations, changing the CPU frequency, collecting a lot of data. Then we just do a regression, and the power model is that regression. It can be linear or nonlinear, it depends. We run multiple algorithms, check which one has the better accuracy for estimation, and that will be the power model that is used. That's what Kepler is doing.
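A toy version of that training idea, with synthetic data standing in for the real measurements Kepler collects on bare metal (the feature set and coefficients here are invented for illustration):

```python
# Toy version of training a power model: regress node power on resource
# utilization. The data here is synthetic; Kepler's pipeline collects real
# measurements on bare metal under many workloads and picks the
# best-scoring of several algorithms.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

cpu_util = rng.uniform(0.0, 1.0, n)   # fraction of CPU in use
freq_ghz = rng.uniform(1.2, 3.5, n)   # CPU frequency setting
X = np.column_stack([cpu_util, freq_ghz])

# Synthetic "measured" power: idle baseline + load-dependent part + noise.
watts = 80 + 120 * cpu_util * (freq_ghz / 3.5) + rng.normal(0, 3, n)

model = LinearRegression().fit(X, watts)
print("R^2 on training data:", round(model.score(X, watts), 3))

# The trained model then estimates power where no sensor exists (e.g. in a VM):
print("estimate at 50% util, 2.5 GHz:", model.predict([[0.5, 2.5]])[0])
```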
And those power models are publicly available. So, Kepler, I think I didn't introduce it, and I think it's a good time to do that. Kepler is an open source project, totally open; there is no commercial version of Kepler right now, and a lot of people from different companies in the community are contributing to it. And Kepler is the first project related to measuring the energy consumption of applications that became a CNCF project. So it's an official Kubernetes-related project under the CNCF.
So I think that's one of the main differentiators of Kepler right now.
It's fully implemented to be part of Kubernetes,
but it can also run standalone outside Kubernetes, for example for IoT use cases where the device cannot run Kubernetes, you know, a kubelet in the node, because it's not powerful enough. You can run Kepler standalone and it exposes metrics. So, yeah, back to the virtual machines.
So we have power models, and power models have limitations, of course. First of all, accuracy: it can have some penalties because it's a regression. It's impossible to run all the scenarios with all the different kinds of applications. We try to stress a lot of different scenarios and a lot of different applications, but it will never be perfect. Still, we have very high accuracy for the models that we train. But there are some other, more important limitations for power models. One that I think is important to mention is,
again, I was telling you that we have the dynamic power and the static power, the idle power. The idle power must be divided by the number of VMs that are running on the node in the cloud. But on public cloud, we have no idea how many VMs are running on the node. Although we can know the CPU architecture of the bare metal node that the VM is running on, we don't know how many VMs are running on it. So right now, what we do is we don't expose the idle power; we just focus on the dynamic power for the public cloud. When we are on bare metal, we have all of the information. And then I would say this is one of the challenges.
The second challenge, I think this is also very important,
is power models are architecture dependent.
So if we create a power model that is related to some specific CPU model,
it will be different than a different model
because it's a different CPU model, it has the different than a different model because it's different.
CPU model consumes, has the power consumption curve differently.
So it's not only the baseline, you know, idle power, but how the energy consumption
increase with the load change.
Number of CPUs also impact, hyper threads also impact.
And all of this information is important.
So we need to have power models specific for CPU models.
And then for public cloud, especially, for example, Amazon Cloud,
there are a lot of different machines.
So there are some efforts now in the community to create power models. Actually, in Kepler we have in mind that we ask the community to contribute, so different companies can come and help to create power models for different nodes. Everything is open, and we are always improving the way we train the power models to make it easier for people to contribute. And if someone has a different enough CPU version, and not only for servers, it can be for end users, like laptops: people sometimes run things like that and just want to know what an application is consuming on their laptop, and Kepler can also run there. But it's a different CPU model, so you need to train a power model for it. And this is something the community can help with, you know.
And that's actually maybe a good reminder, folks.
First of all, we will link to the blog post.
We will link to the Kepler project.
And there's actually a really nice overview
of Kepler, the architecture,
where you also see the online learning model server.
I guess that's where you have your energy models in
and where people can actually then,
as you said, contribute the energy consumption
both in idle and also in the dynamic stage.
Because this is, I think, obviously the great benefit
of having a big community that works together
because there's so many different hardware settings out there.
And if you already have a model that knows what the energy consumption is for idle versus dynamic, then you can do the actual calculation.
That's really good.
Yeah, definitely.
So yeah, if people can join our efforts, it would be very welcome, especially for training the power models.
Yeah.
So Marcelo,
I know you obviously know Kepler in and out,
and that's why it's fascinating to listen to you also,
listening to what are actually the challenges,
what problems do you solve, how do you solve it,
the difference between on-premise,
like bare metal, and in the cloud.
From an end-user perspective,
because you mentioned earlier that capacity planning,
everybody that does capacity planning
must look into energy consumption,
so Kepler is a great way to get this level of insights.
Can you quickly tell us what you see out there
as people get started with Kepler?
What are the first things that they do
to fully leverage the data that comes out?
What are the biggest wins and the fastest wins
that people can achieve with Kepler?
Yeah, as I was mentioning to you,
first we have observability, so people can be aware
and understand what's the energy consumption.
And there are some techniques, especially in the implementation of code, to try to
minimize the energy consumption of applications.
It's not fully clear to everyone, but there are some techniques. For example, if an application is waking up the CPU too many times, it's called a power virus, because it makes too-intense requests to the CPU, and it changes the power mode of the CPU from the energy-saving state to full performance, and it starts to consume more energy.
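A small illustration of the pattern he describes, contrasting a busy-polling wait with a blocking wait (illustrative only, not from the episode):

```python
# Illustrative contrast: busy-polling keeps waking the CPU (the "power
# virus" pattern), while a blocking wait lets the core drop into its
# power-saving states until real work arrives.
import threading

work_ready = threading.Event()

def power_hungry_wait() -> None:
    while not work_ready.is_set():  # spins at ~100% CPU while waiting
        pass

def power_friendly_wait() -> None:
    work_ready.wait()  # blocks in the kernel; the core can idle

# Simulate work arriving after two seconds, then wait for it politely.
threading.Timer(2.0, work_ready.set).start()
power_friendly_wait()
print("work arrived without burning CPU cycles")
```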
So I think the first thing is that developers can get the energy consumption of their application, try different things, and try to understand it. It's still an open research area to understand how to minimize the energy consumption of applications. So, observability, and the next thing is optimization. Optimization can happen in different ways, as I mentioned: changing the code, or resource allocation. There are different perspectives.
For example, if we run fewer applications on a node, it's less energy efficient, because the power consumption is not linear with load. With the energy consumption, if you put more load on one node, it consumes less energy than spreading less load across different nodes, something like that. So then you can do consolidation.
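A back-of-the-envelope illustration of that consolidation argument, with invented wattages:

```python
# Back-of-the-envelope consolidation math. The wattages are invented, but
# the shape is the point: idle power dominates at low utilization, so the
# same work on fewer nodes wastes less energy.
IDLE_W = 100.0          # constant power per powered-on node
DYNAMIC_FULL_W = 150.0  # extra power at 100% load (assumed linear here)

def node_power(load: float) -> float:
    return IDLE_W + DYNAMIC_FULL_W * load

# The same total work placed two ways:
spread = 2 * node_power(0.5)    # two half-loaded nodes -> 350 W
packed = node_power(1.0)        # one fully loaded node -> 250 W
print(f"spread across two nodes: {spread:.0f} W")
print(f"consolidated on one node: {packed:.0f} W")
```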
This is only the energy efficiency perspective. But if you go to the CO2, it also has the impact of what the energy source is. Is it solar panels, is it, like...
Wind?
Wind, exactly. Or thermal. It changes the CO2 emission.
And also it's not only regional-based, it's also time-based. Also, it's seasonal. So if it's winter or summer, depending on the country, the CO2 emission for the data center also changes. So based on that information, it's possible to do optimization, to allocate resources.
There are some sibling projects of Kepler, inside the, you know, sustainability umbrella, that use Kepler's information to actually optimize and allocate resources on Kubernetes. They try to understand the energy consumption and CO2 emissions for different regions and times, and schedule the pods to different nodes based on that, to try to optimize things.
So again, I think the first thing that the user should do is just install Kepler.
It exposes metrics to Prometheus. So deploy Kepler. Then we have the Grafana dashboard, so the user can just install it, go there, and check the energy consumption. See how the application is scaling, whether the energy consumption is also scaling with it, and what the application is using. And play around: try to understand it and do some optimizations to see how the application can be more energy efficient.
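As a sketch of that first step, here is one way to pull per-namespace power out of Prometheus once Kepler is deployed. The Prometheus URL is an assumption for your environment, and the metric and label names should be checked against your Kepler version's /metrics output:

```python
# Sketch: query per-namespace power from Prometheus after deploying Kepler.
# The Prometheus URL is an assumption for your environment, and metric and
# label names can differ across Kepler versions -- check your /metrics.
import requests

PROM_URL = "http://localhost:9090"  # e.g. a port-forwarded Prometheus

# rate() over a joules counter yields joules per second, i.e. watts.
QUERY = "sum by (container_namespace) (rate(kepler_container_joules_total[5m]))"

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": QUERY})
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    namespace = series["metric"].get("container_namespace", "<none>")
    watts = float(series["value"][1])
    print(f"{namespace}: {watts:.2f} W")
```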
On the other hand, it depends on access; sometimes the user doesn't have access to the Kubernetes cluster, it's just the user perspective. But maybe they can ask the system administrator: okay, we want to make our cluster more energy efficient, so we want to have better scheduling decisions in Kubernetes to allocate resources in a more sustainable way. And then we can go for different projects, for example CLEVER, which tries to optimize the resource allocation.
Yeah, cool.
And I got to remind everybody,
like you've talked a lot
about the architecture
and what type of data it produces.
The blog post that you wrote, Marcelo,
a co-author with a couple of your colleagues,
is really doing a great job
also with visualizations
on the architecture,
how you get the data
on the different types of hardware.
So really, you know, check it out.
I know this is a podcast and it's audio,
but sometimes visually it just helps.
And this is why check out the blog post,
also the Git repository.
Now, you said Kepler is obviously producing the Prometheus metrics that give you the energy consumption.
And I'm looking at one of the dashboards.
There was a Grafana dashboard on the page that is linked.
I think it shows the consumption per namespace, the consumption per pod.
Obviously, you can do all of your analytics.
You can probably then also compare different versions of Kubernetes, different
versions of your software, different versions of the stack.
You can compare different types of hardware, different types of sizes.
So that's the interesting new field where performance engineers, site reliability engineers,
I don't know, energy engineers, maybe we need to have a new term
that energy optimization
engineers that need to do
all this because I don't think we can
ask every developer
to really, you know,
by default just get all this data
and analyze it. There needs to be some
entity in an organization
that really knows what to do, how
to get this data, what to do with it,
and then mentor and
help engineers to actually optimize.
They would need a cool name.
I'm sorry.
They would need a cool name, not just like energy engineers
that have to come up with something a lot
cooler. Sorry about that,
Marcelo. What were you going to say there, Marcelo?
I think
you mentioned something very interesting.
So, you know, the users run that and start to understand what the metrics mean.
So I would say there are a couple of things that we were actually talking since the beginning that we were saying about microservice.
What's the design decision?
Does the design impact the performance, things like that. Here it's the same: the design decisions impact the energy consumption, and so does the programming language. There are some studies that say, for example, that Python, because it's an interpreted language, consumes a lot of energy, much more than C, for example. Performance is also much better with C or C++.
So then it depends on the perspective,
the decisions, how we implement things.
If we want to improve the performance,
but also improve the energy consumption,
the programming language is also something important.
Of course, sometimes it's not possible to change all the applications, but in the microservice
world that's the interesting part.
So you can change one service, maybe.
Start small.
Change one thing.
Okay, let's switch for a different programming language, this one, and see how it impacts
the performance and the energy. And people can start to play around and try to understand better how things behave.
And I also want to give a shout out here, Brian, to our friends from Akamas.
Marcelo, for you, Akamas is a company from Italy.
And what they have done, they have a system where they can, it's like goal-based optimizations. And basically,
you say you want to optimize your JVMs or your Kubernetes configurations on, let's say,
memory consumption or CPU, and then their system is using observability data, and then constantly
making changes to all of the hundreds and thousands of settings we have in the JVMs and in
the CLRs and in Kubernetes to basically find an optimal setting
for the application under certain workload. Because what they've found
out, if you go to the Java world, where Brian and I have
done a lot of work, the way you select your garbage collector has a big impact, and the heap sizes. And so
optimizing the settings is something
that can also win you a lot of improvements from a CPU memory perspective.
And it's a very good point. It's classical to analyze the performance of all of those, you know, fine-tuning things. But how is the energy consumption?
So maybe it's linear related, but maybe it's not. So that's the interesting part.
Like energy consumption is starting to be some hot topic and we are trying to not
only analyze the performance, but also check how the energy consumption is related to tuning applications as well.
Yeah, that brings me to the thought I've had for...
A lot of thoughts have come into my mind as this discussion has unfolded.
So first of all, thank you for getting my brain working this morning and really sparking
a lot of imagination.
But you mentioned, we had someone on the podcast a little while ago who was talking about data
center power, right?
And as you mentioned, power source considerations, time of day considerations, and shifting your
load to different centers that you have.
Now you're bringing into the idea of the code and the application power consumption, observability of that power consumption, which then leads me directly into the similar idea of taking those numbers, the observability data,
and automatically making changes based on the power consumption. But going back to our, um, Andy's and my and your original career path, performance has to be a consideration as well, right?
What might be good for power consumption might destroy your performance and
vice versa, right? So as you were saying,
and that was actually one of the questions, but you, you, you addressed it is,
you know, is power consumption linear or logarithmic, right?
So at certain points it's more efficient and then it loses efficiency.
So just like we see, as performance response times and all starts going down,
we can have tools that automate spinning up new instances of microservices
to take care of that performance.
But same thing, if we take that, then take the energy data
and start managing based on that,
now we suddenly can automate and optimize for both performance and energy, which I think would be a real fantastic win.
Obviously, some of the challenges you'd have on that are some of these incomplete models that you have with the cloud providers and virtualization that you don't completely get to.
But I think models are where we have to start, right?
Yeah.
One other thought I wanted to get in here before we run out of time was it would be
really interesting to see, way back I had a five
minute dabble or five minute exploration into capacity planning.
This was back in the bare metal days, and they had all these models and data algorithms based on CPU, brand, memory, and all this, that you can plug in to see what your capacity might be based on your current workload. It'd be interesting to get those models into a
developer's IDE, right? So that when they develop just in their IDE, it's running against that
model. And as you're saying, the coding decisions, depending on what CPU or whatever model you're
running on, it could then say, hey, and I just said say hey, and that's a new
speech pattern that people do these days, and I just did it, so sorry everybody listening.
The model can suggest
rewriting the code or tell you that
based on the model you picked on, this is an inefficient way to write your code. Maybe
with AI in the future it can suggest a fix for it, but at least
even at the developer level, before you even get to the hardware, it'll expose
these power, what would we call them,
energy regressions? That might be a new term to their...
Maybe we need to start patenting
these new terms.
Yeah, I think there are a lot of things to explore.
As you mentioned, AI has become a very hot topic, with generative AI writing code and solving problems. Of course, it's not the full code, it gives you like a skeleton to write things, but it's not based on energy-efficient code. So maybe in the future those kinds of things should improve, and have those suggested skeletons also be energy efficient.
Awesome. Hey, Marcelo, I know this topic should have been discussed for much longer than just the last couple of months or years,
since we, as you mentioned earlier, as a world,
need to fight a lot of the impact we cause
because of too much energy consumption.
But I'm sure the topic will be discussed
much longer.
So hopefully, keep doing your work.
Keep doing the great stuff on Kepler
and on educating.
Great that you contribute to the CNCF.
And let's stay in touch
and let's make sure we have you back
on the show with updates
in a couple of months from now to see what's happening.
Yeah, thank you very much.
And if anyone has some questions, you can contact me and I will be glad to answer things.
Yeah, we'll definitely make sure to link your LinkedIn profile
and whatever else you typically use for social media.
Maybe one question that
typically Brian asks, but I'll ask it now. Are you coming to any conferences? I don't
know, KubeCon is coming up in Paris next, like in March. Is there a chance for people
to meet you?
I've been to the last KubeCon, you know. I presented Kepler there; it was something interesting. I submitted a talk for the next KubeCon, but let's see, yeah, how it goes.
Well, if anyone from KubeCon is listening, put him on. You know, it's interesting that you mentioned that it's
in Paris, because I was hearing some, I didn't dive deep into it, but I was hearing some of the reports from the Paris Climate Summit that just happened.
And a lot of what they were seeing is although there was a commitment to draw down on energy consumption, and I don't know if it was just energy or oil, whatever they were looking at, but there's been an increase in a lot of these areas. And I think that goes and ties hand in hand to the more we introduce things like chat GPT,
the more everything in our life becomes electronified,
relying on computers for everything,
that's going to keep driving up that demand.
So at this point, this is a critical point
in Kepler and this energy modeling,
because if we just continue going forward
without considering that, it's just going to get worse and worse and worse.
So I think the timing is, it's, yeah, we should have been working on it long ago, obviously, but I think there's now starting to be that public awareness of it.
Right. So it's a great time to really be pushing these things.
Thank you for what you're doing there.
Thank you. Just to mention the last thing,
I think
sustainability is becoming a hot topic
as well in Europe.
Especially maybe at the KubeCon in Paris,
we can have some discussion about that.
The government,
the European government, is asking for all
the companies that are using AI workloads
to report the energy consumption.
So, then it's
starting to become something that the government
is pushing and
it will attract much more
attention.
Yeah.
In the future.
And as I was joking on our last
sustainability podcast, if we ever see
the United States getting on board, then we know
we'll be the last ones, right?
But yeah, no, I think it's important.
And to your point,
there has to be that government regulation in there
because we know we can't necessarily just trust businesses to do it, right?
They're always going to do what's best
for the bottom line, right?
And if sustainability is good for the bottom line, they'll do it.
But there are incentivizations that need to be in there.
So awesome.
To you and everyone who's working on this, everyone who's working in that sustainability field: again, thank you.
Thank you for our children's future, really.
Because although it sounds a little cheesy, it's definitely a very important topic.
alright and thank you for our listeners
this was I think
an interesting topic for our 200th episode
because we consume power recording these
we consume power putting them up
But it's been a great run so far, and there's more to come, everybody.
Hopefully, we'll see you.
Hey, maybe we'll have you back on for episode 400. No, that's going to be five years from now.
It'll be too late.
It should be sooner.
Yeah, absolutely sooner.
Love to get updates from you.
Great, thank you.
All right.
Thank you, everybody.
Thank you.