PurePerformance - How to optimize performance and cost of k8s workloads with Stefano Doni
Episode Date: October 10, 2022

Over the years we learned how to optimize the performance of our JVMs, our CLRs or our database instances by tweaking settings around heap sizes, garbage collection behavior or connection and thread pools. As we move our workloads to k8s we need to adapt our optimization efforts, as there are new knobs to turn. We need to factor in how resource requests and limits on pods impact the application runtimes that run on your clusters. Out-of-memory problems all of a sudden no longer depend on the Java heap size alone!

To learn more about k8s optimization best practices we have invited Stefano Doni, CTO of Akamas. Stefano walks us through key learnings as the team at Akamas has helped organizations optimize the performance, resiliency and cost of their k8s workloads. You will learn about proper memory settings, CPU throttling and how to start saving costs as you move more workloads to k8s.

To learn more about Akamas go here: https://www.akamas.io/
If you happen to be at KubeCon 2022 in Detroit, make sure to visit their booth.

Show Links:
Stefano on LinkedIn: https://www.linkedin.com/in/stefanodoni/
A Guide to Autonomous Performance Optimization with Dynatrace and Akamas: https://www.youtube.com/watch?v=i7MuEjeOvX0
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and today I have my very, very special guest host.
The one and only, can you guess?
Andy Grabner. Andy Grabner.
Thank you for being a guest host this week.
A guest host? What does that mean even?
Well, Andreas Grabner couldn't come today, so Andy Grabner.
Ah, that's the way it works.
I don't know. It's the only thing I could think of today.
It's not funny, but it's okay. Next time you get another shot.
I get a C for effort.
Maybe you should Google for funny openings in podcasts
I was going to try to go with a Muppet Show type of intro, but that would be too loud and interrupt my wife on her business call, and I don't think a lot of our listeners would get the whole Muppet Show reference, because that's from the early 80s, late 70s. But yeah, what are you gonna do?
You know what we're gonna do? We're going to jump right into the topic, because we have limited time as always.
And we have a lot of things to cover today.
We have a repeat guest on the show.
I'm not sure how many times he's been on.
I think definitely one full podcast.
I know he was on for one of the ones we did for Perform, I think.
I don't know if there was a second full podcast or not.
So this is at least his third time, maybe fourth. And it's not because we like Italy and Italians just in general, for the food and everything else, it's really because the culture is so much...
Everything, yeah, exactly.
But now, without further ado, Stefano, welcome to the show.
Hey, thank you. Thank you for having me here.
Hey, Stefano, for those that kind of may have escaped the episodes prior where you spoke with us,
can you quickly introduce yourself?
Yeah, sure.
I'm Stefano Doni.
I'm co-founder at Akamas.
Basically, what we build at Akamas is an optimization platform powered by AI
to help teams optimize their application performance, cost, and resiliency.
I started out doing performance engineering work in the early 2000s, so optimizing system performance and efficiency has always been my passion.
That's why we invite you. That's why we like you, because that's the same for Brian and myself, and I'm sure many of our listeners, right? We are always interested in making systems more efficient. I think especially in the times we live in right now, efficiency, saving not only CPU and memory but especially energy, is in the end very important for us. So thank you, Stefano, that with your work at Akamas you are helping the community to do so.
Well, actually, let's jump right into topics.
I think the last time we had you, we talked about this.
We also did a webinar together.
I think we talked about Java performance optimization.
I think this was one of the first things you talked about.
You also created some great papers, blog posts.
We also talked about database performance optimizations. Today's topic, however, is focusing on Kubernetes. And you have been
spending quite some time on analyzing Kubernetes clusters and nodes and pods and containers and
everything that runs on it. And let's go step by step and item by item through the things that
you have found, lessons learned, so that our listeners can learn
from those things and not make the same mistakes. All right, so let's dive into it.
So actually, the topic of today is how to extract the most performance or cost efficiency out of Kubernetes applications. And while doing so, it's important, of course, to preserve application resiliency.
So how do I decrease cost of my Kubernetes application
while not actually going into out of memory
or CPU throttling issues?
So those are among the most common questions that we literally get from many, many customers.
And we are working very heavily, of course, on Kubernetes
as it's becoming one of the most common
layers and the next cloud native platform
to run modern microservices. So I guess that's the topic of this session.
So I guess I want to start with
I would say resiliency.
One of the biggest questions is this: Kubernetes is going to rise in terms of adoption, and with that typically comes an increase in the cost footprint of the environment. So one of the key questions is, how can I reduce the cost? But in doing that, the usual problem that people run into is that, of course, you run the risk of, in a way, shrinking the container size too much or wrongly configuring your application, and ending up with the dreaded out-of-memory errors.
So that's especially, in a way, hard to manage for Java workloads.
So that's one of the most common issues that we see.
So basically, the problem that we are talking about is that, of course, people working on Kubernetes know about resource management. The way Kubernetes manages the memory in particular is pretty interesting, meaning that, of course, we all know that Kubernetes will kill your container as soon as your container memory usage hits the limit. That's just how Kubernetes works. So the current approach that we see from teams
that want to reduce costs is, of course, shrinking the size of containers
because at the end of the day, what you pay in terms of Kubernetes infrastructure
is directly tied to basically the size of your pods,
which comes down to the CPU and memory requests and limits that you configure. As a developer or DevOps engineer or SRE, you configure these within your YAML files, and that actually dictates how Kubernetes will allocate the resources.
So especially as regards resiliency, we typically see most people struggling with out-of-memory kills. So that is due to the fact that Kubernetes will kill your container
if you don't properly size your Kubernetes memory limits.
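For readers following along, here is a minimal sketch of the requests and limits Stefano is describing, as they would appear in a pod spec; the workload name, image, and values are hypothetical.

```yaml
# Hypothetical pod spec fragment; names and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: orders-service                     # hypothetical workload
spec:
  containers:
    - name: app
      image: example/orders-service:1.0    # placeholder image
      resources:
        requests:
          cpu: "1"       # what the scheduler reserves; this drives cluster sizing and cost
          memory: 2Gi
        limits:
          cpu: "2"       # exceeding this leads to CPU throttling
          memory: 4Gi    # exceeding this gets the container OOM-killed
```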
So, as we have actually talked about a little bit, also at the beginning of this year at several conferences,
the thing to realize is that most people today adopt an approach
that relies on pretty much
observability or kind of capacity planning approaches where they actually look at basically
memory usage and then compare memory usage with your memory limits.
So sounds like a kind of sensible approach, meaning that you need to kind of put your
limits above your memory usage and you'll be fine.
So that's kind of the current approach.
And again, it might be sensible, it might look as if it would work.
But at the end of the day, people with those kind of approaches
still are suffering with lots of out-of-memory errors.
And then let me ask you a question on this.
Because first of all, for people that might not be familiar with Java
and the details: in the quote-unquote old days, right, you ran a Java container, you gave it a certain amount of memory, and then when the Java application tried to allocate more memory than was available in the Java heap, it threw an out-of-memory exception, if obviously the garbage collector couldn't, you know, clean up things. Now we're kind of wrapping around the JVM
another concept where in Kubernetes,
we basically say this is how much memory we give you overall.
First of all, these two need to be aligned, I guess, right?
Because it doesn't make sense to make the outer wrap smaller
than what's allowed internally.
But also the other way around: if the wrap around the JVM is much bigger, then the JVM internally will never use it anyway.
So I think that that needs to be
properly aligned with each other, correct?
Yeah, that's the point, Andy.
You summarized it pretty correctly.
So the thing is that currently the JVM,
basically it's a pretty big engine.
It's highly multi-threaded, a kind of marvelous engineering effort that has been done over the years by Oracle, et cetera.
The thing is that we are putting this kind of big engine,
which is highly concurrent, et cetera,
into a small container.
So the challenge that we have here,
especially for Java workloads,
by the way, it's not just about Java.
We are also going to see Golang going in pretty much the same direction, which is interesting; there's news about that at the current time.
But focusing on the JVM, the problem is that when actually you move the JVM within the container,
you don't really have a way to say to the JVM, OK, that's my memory limit.
Please just stay below that limit.
So what the kind of control,
the kind of parameters that you have on the JVM
typically deals with how much heap memory that you can set.
So the pretty famous max heap settings,
pretty much every Java developer knows this setting,
which dictates how much heap memory
the Java virtual machine will allocate.
But the problem is that the JVM will allocate memory also outside the Java heap.
So let's say that you have a 4 gig memory limit in your container.
So the big question that most people are struggling with is
how big will be my memory heap within that 4 gig limit?
So we see people that are putting a 4 gig heap within the 4 gig memory limit.
So kind of pretty obviously wrong, I would say.
But the thing is that what is the right size?
So shall I put, I don't know, 3 gig of memory heap to stay within 4 gig of memory? Or shall I go with half, 2 gig, 50% of my memory?
The thing is that the memory used by the JVM comprises not only the memory heap, but also the so-called off-heap.
And what people are not able to realize
is that actually the amount of off heap,
the whole memory usage of the process
can be actually pretty much higher
with respect to the heap.
So it's not just, I don't know, 10% or 20%.
We regularly see JVMs that are even twice the size of the memory heap allocated or even bigger.
So that actually is the root cause that triggers out of memory.
So people are having a hard time to try to identify the right amount of heap.
And that's actually because it's kind of hard.
So the JVM has literally no knob that in a way dictates the total amount of memory usage, and that's becoming the actual problem: how do I fit my JVM workloads to actually play nice with a container and not run into those out-of-memory kills, which are sudden events where Kubernetes will kill your container, maybe in the middle of the night, without any prior signal about that.
So and that's also, I think, as far as I know, you call it off heap memory.
I think it's the native memory that the JVM itself needs, but especially depending on what
your app is loading, because your app is loading libraries that are also allocating native memory.
And then I assume, and Stefano correct me if I'm wrong, but there's just no rule of thumb
where you say the JVM in that version always needs 20% or 30% because it really depends on
the app and on the workload.
Yeah, right. And it also, for example, depends highly on the settings of the JVM, of course, especially on the garbage collectors. So garbage collectors also need to allocate that extra, off-heap native memory; the JIT compiler needs to allocate memory; and then there are the classes, the amount of classes, so the metadata that your application actually loads, as you mentioned. So the way you configure it is pretty key.
So if you don't look at this piece, what is counterintuitive is that you might see, from a memory usage perspective, that your container is only using, say, 50 percent of its memory. So let's say that out of the four gig of your container, you will see that just two gig of memory is being used. So looking at this picture, kind of the first reaction would be: I can drop this. I can safely put the limit to, I don't know, three gig, and that would be fine. The problem is that the JVM that runs within the container might suddenly allocate much more memory.
And that's a root cause of the many out of memory errors that we see.
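To make the 4 gig example concrete, here is a hedged sketch (values are illustrative, not a recommendation) of the common pattern of capping the heap well below the container limit, either with an explicit -Xmx or with the JVM's container-aware -XX:MaxRAMPercentage, so that heap plus off-heap still fits under the limit Kubernetes enforces.

```yaml
# Hypothetical deployment fragment: cap the JVM heap below the container limit to leave
# headroom for off-heap memory (metaspace, GC and JIT data structures, thread stacks, ...).
containers:
  - name: app
    image: example/java-service:1.0      # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        # either an explicit cap, e.g. "-Xmx2g", or a fraction of the container limit:
        value: "-XX:MaxRAMPercentage=50.0"
    resources:
      limits:
        memory: 4Gi   # heap + off-heap must fit in here, or Kubernetes OOM-kills the container
```

How much headroom is enough is exactly the workload- and GC-dependent question discussed above.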
So that means basically every performance engineer, every capacity engineer
will still have a job, even more importantly in the Kubernetes world,
because we need to do more testing and more optimization, configuration optimization.
I know you guys at Akamas have obviously automated a lot of this already, but it's just really interesting to see, right? We're adding a layer on top of a layer; in this case we're putting something around an already very complex runtime, and therefore have new side effects that are hard to comprehend and actually hard to understand.
Yes, exactly.
Brian, you go ahead.
Yeah, I was going to ask, so we talk about the native, the heap.
Is there a single memory metric that people can look at to see what's my total memory consumed by this?
Like if someone was going to visually go through and say,
all right, I want to find this metric and allocate based on that.
Is that something easier or is it always a calculation
based on several different metrics?
Yeah, it's actually an interesting question, Brian.
Actually, there's no single metric.
So what you need, you can look at is utilization within the heap,
which is something that most observability tools provide.
You can, of course, on the other side,
look at the total consumption within the container.
So of course, which comprise
the total JVM process memory usage, for example.
So you will get very different pictures, but you will be able, in a way, to start to correlate those; so, for example, understanding what would be the kind of extra-heap, extra memory that is required. So it's kind of tricky, unfortunately. It's not that the JVM provides good support for that. There has been lots of effort also to try to estimate that kind of extra-heap amount based on different metrics, but it's not easy.
So I guess the current approach will be to revise your configuration,
be conscious of, for example, the maximum heap
that you are putting into your containers.
And it's kind of a trial and error, unfortunately.
So you need to be able to test it out
and see what works for your application.
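One way to get visibility into that off-heap breakdown (an addition of ours, not something mentioned in the episode) is the JVM's native memory tracking, which can be switched on via the same options mechanism and then inspected inside the container:

```yaml
# Hedged sketch: enable JVM native memory tracking (adds a small overhead; the flags are
# standard HotSpot options, everything else here is illustrative).
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=50.0 -XX:NativeMemoryTracking=summary"
# Then, inside the running container, something like
#   jcmd <pid> VM.native_memory summary
# breaks total JVM memory down into heap, metaspace, threads, GC, code cache, and so on,
# which you can correlate with the container memory usage reported by your observability tool.
```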
And Stefano, I asked you in preparation for this call to send us a couple of bullet points.
And I'm just looking at something that you wrote,
which is another lesson learned.
It says, out of memory kills due to sudden memory allocation peaks.
I think the way I read it when I first read it,
that an application, right, if you put an application into a container,
then you have certain requests that are memory, let's say,
easy on the memory and other ones are very intensive.
So for instance, let's say an application is just doing a read from a database and put something out,
or it is calculating a complex report where it needs to pull in a lot of data into memory.
So I'm actually wondering if this is a great chance for application developers and especially architects to say,
hey, we have this big Java application and we need to figure out what type of workloads cost how much and then cost, I mean, how much CPU and
memory so that I can redirect certain requests to maybe a different container that runs with
a different memory and CPU limit than containers that can handle requests that are easier. I mean,
it's basically breaking up the monolith, but maybe in this case, not necessarily breaking
the monolith really from a code-based perspective, but maybe deploying the same container multiple times, but with
different memory limits so that more costly requests go to a container that can actually
handle the costly request.
It's like if you go to a supermarket, right?
You have the self-checkout lanes where it's super easy and you just do it if you have up to six items.
But if you have more, then you go to another lane.
And I think this is the same thing we need to consider
when we architect software and do traffic routing.
Yeah, I guess that may be a sensible traffic management policy.
So it's always good if you can kind of reduce the variability in the amount of work that you need to do. That makes doing capacity planning,
forecasting, optimization easier because the workload
becomes much more homogeneous and it's easier to deal
with this kind of situation.
Yeah, cool.
So memory, right? Memory is a big thing.
So what I take away from this is a runtime wrapped by another runtime, and then we just need to understand the proper settings. I also like that there's not just the Java heap, there's also the native memory; the total memory needs to be taken into consideration, and there are a lot of settings that can be and have to be adjusted. And I'm pretty sure you guys have already put some cool stuff into your product, into Akamas, to automate some of that, right?
Yeah, that's literally how we discovered those kinds of insights, while doing those kinds of optimizations for customers. For example, we discovered that certain garbage collectors are much more heavy
on the native memory, like for example G1. It needs much more memory with respect to the serial or
parallel. So that also becomes an optimization knob that you can turn. So how do I then configure the JVMs to make sure that I can lower the limit in a safe way, because I can shrink the heap size, but also control the total amount of memory?
And literally the JVM has plenty of options that also in a way dictates how the JVM allocates
off-heap memory that makes for additional cost reduction opportunities.
So besides memory, what's next? What other things have you found out in your work?
Yeah, the other big issue is about CPU throttling.
So I guess this one is, and again, related to that is, of course, CPU limits.
I guess it's one of the most debated topics around Kubernetes sizing and performance.
By the way, there's also the question of: shall I put CPU limits, or shall I just go with CPU requests? I won't focus on that, because the most interesting thing is that people are using CPU limits. So people are realizing the benefits of CPU limits
in terms of performance isolation, basically.
So not having, I don't know, a runaway workload
impacting my performance sensitive
and business critical workloads.
So what we find is that people are leveraging CPU limits
in their real life.
And I think that's considering pros and cons,
that's still the best choice for Kubernetes today.
But then the next question literally becomes, how do I deal
with CPU throttling?
CPU throttling happens, of course, again due to those Kubernetes resource management mechanisms; we have been talking about them for quite some time, also in some blog posts.
So it's kind of
counterintuitive and hard to understand
how actually Kubernetes, in a way, manages resources, CPU resources especially.
So while for the memory we were talking about, it's kind of easy: basically, as soon as you hit the limit, you get killed. It's a little bit more complex when you get to the CPU. When you say, okay, I want to have, I don't know, two CPUs available in Kubernetes, what is happening is that Kubernetes is allowing your container to actually use all the CPUs on your host. So not just two CPUs. What Kubernetes is actually considering is the equivalent CPU time of two CPUs.
So what that implies is that, for example, if you have a highly multi-threaded workload, like pretty much any microservice today, whether it's a JVM or even Go routines or even Python, that's kind of the norm, you will consume the equivalent of your two-CPU quota in a short amount of time. So for the rest of the period, once you have, in a way, consumed your CPU quota, Kubernetes is going to pull you off the CPU. So it's going to throttle you. So your application will get stalled. It's like a garbage collection pause, for those familiar with the JVM,
but it is happening at the Kubernetes layer.
And it is impacting pretty much any application.
So not just about Java application, but again,
it's kind of the mechanism that Kubernetes is using.
So that's kind of the problem.
And with that problem, you mentioned the artifacts that you
would see would be a slowdown in the performance. And I guess
within the container itself, you would see that its CPU is being
maxed out at the same time. So it would look like I'm maxing
out my CPU, but it's really because of that throttle. I
mean, I guess that's the same as if it was a VM or anything else, right?
It's the maximum allocated CPU,
and it's going to manifest itself as a standard CPU slowdown
upon that situation.
Yeah, what is counterintuitive with Kubernetes
and kind of different with respect to the VMware world
or operating system world is that this throttling can happen
at a very low CPU usage.
So it's kind of different with respect to the memory. So it's not that when your CPU hits 100%,
you will get throttled. But in our experience, also working with customers, we see that CPU throttling can arise as early as 30% of CPU usage with respect to your limits. So what is counterintuitive is that people see, okay, I'm at 30% CPU usage, I have plenty of spare capacity, but in reality, Kubernetes is already throttling your workloads. So that's kind of a counterintuitive difference with respect to the usual, in a way, thresholds that we have always used when sizing, for example, VMs.
And that's why those kind of usual best practices don't work anymore with containers.
And that's because it's based on the CPU time, right?
Not because it's based on the apparent estimate of what two cores or two CPUs would actually get.
Okay, interesting.
Exactly.
Yeah.
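To spell out the mechanics Stefano describes (a rough sketch with illustrative numbers): a CPU limit is enforced as a CFS quota of CPU time per scheduling period, 100 ms by default, so a multi-threaded service can burn its whole quota early in each period and then sit throttled for the rest of it, even while average utilization looks low.

```yaml
# Hypothetical example: a CPU limit of "2" becomes a CFS quota of 200ms of CPU time per 100ms
# period. Eight busy threads can consume those 200ms in roughly 25ms of wall-clock time and are
# then throttled for the remaining ~75ms of every period, while average CPU usage still reports
# well below the limit.
resources:
  requests:
    cpu: "2"
  limits:
    cpu: "2"
# The throttling itself shows up in cgroup/cAdvisor counters such as
# container_cpu_cfs_throttled_periods_total, not in plain CPU utilization charts.
```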
So memory and CPU are two well-known entities in performance engineering.
But they just are bringing...
So this goes back to the basics, right?
Yeah, exactly.
Looking through the notes,
and now I'm jumping ahead,
but you've mentioned this quite a bit already.
You always talk about costs, and this was also something that you put
into the notes for us to kind of talk about.
It's really about reducing the cost of Kubernetes applications, because in the end, whatever we do, we try to run our software cost-efficiently. And with Kubernetes, with its seemingly endless scalability options, if you add enough nodes, it's obviously very hard to always keep an eye on the costs. So what are the lessons learned there? How can you
really manage and reduce costs on Kubernetes? Yeah, that's right. So actually, Kubernetes is great,
meaning that it allows you to have self-healing applications,
highly scalable, automatic management, et cetera.
So that's great.
But at the end of the day, even with auto-scaling,
if your container or pod template is not properly sized and configured,
you're just going to duplicate or multiply your inefficiency.
So the thing that is actually key to realize, while working with lots of customers to optimize the cost of Kubernetes, is that typically people start just by looking at the infrastructure layer.
In the microservice world, we mean by that looking at CPU and memory
sizing of your container. So let's say CPU requests and CPU limits. So the thing is that,
of course, that gives you a kind of initial benefit, meaning that, of course, if you are
running with 10x the amount of CPUs, you are highly over-provisioned. That's, of course,
the first thing to do: right-sizing in that regard. But to get to the next level, what is important to realize is that, at the end of the day, if you want to reduce the footprint of your containers, of your memory requests and limits, what is literally driving the resource consumption is what runs within the container, right? And that is the application. Of course, that depends a lot on the application code.
That's pretty clear.
But there's another big important area which drives the resource consumption within the application, which is the application runtime.
So again, that's the role of the JVM or even the Golang runtime.
So we have worked quite a bit also on the Golang runtime.
So what is important to realize, for example,
again, coming back to the memory.
So how much memory my pod is really consuming
is basically dictated by how the JVM is configured
or how the Golang garbage collection is configured.
So it depends on how the application works, but the lion's share is dictated by how the JVM is configured, because the JVM, at the end of the day, will in many cases use pretty much all the memory that it has been configured to use in its settings, pretty much irrespective of the actual demand that the code requires. So that's kind of the important takeaway, I guess.
And I think this, I mean, this kind of reminds me of the discussions we had over the years on frameworks like Hibernate, right? Like Hibernate, all the runtimes are general-purpose, generic runtimes that can really run any type of workload. Therefore, by default, they can never be optimized for your specific workload. But they give you a lot of screws or switches or whatever configuration options to optimize them.
But then it means you need to A, understand your current workload,
and I guess that's the next big thing.
While you may know your workload today,
it may not be the same workload tomorrow.
And I think this is where also
the topic of continuous optimization comes in
because you need to continuously
re-evaluate your current workload
and then adjust the runtime settings
to optimally run for exactly
your workload. Well, I wish there was someone who could do that for us. You know, it'd just be so amazing if that could be automated to some degree.
Never. It would never work. Do you think something like this would ever exist?
I don't know, it's like science fiction you're talking about here. Maybe we should found a company and call it Akamas.
It's interesting, right?
We're coming back to the same things that we obviously discussed in previous
sessions on these
many different knobs that you can
turn in these runtimes.
It's the first time, though, I hear you
talk about Golang,
at least in that aspect. So why Golang all of a sudden? Why is Golang all of a sudden on your radar? Yeah, that's a great question, because actually, the thing that we are more and more
realizing is that, of course, the application runtimes are playing a big role in the whole
picture, meaning that cost reduction,
performance improvements, and also reliability, as we mentioned.
So those are kind of a three-legged stool that people must think about, that needs to be reconciled. So it's kind of a trade-off.
And within this trade-off, what we find is that besides, of course, Kubernetes settings,
what needs to be aligned is what runs within the container.
So we talk a lot about the GVM because it actually is, of course,
one of the most common language people run microservices on Kubernetes today.
But it's interesting to see that, of course, many workers are running on Golang.
So actually, if you look at how Golang actually manages the resources,
especially the memory, I was a little bit kind of surprised to see that actually it's not actually
by default playing well with Kubernetes in a way. So it's kind of a big statement, but I was kind of
in a way surprised due to the fact that both Kubernetes and Golang comes from Google.
So I thought that they would kind of work magically out of the box.
But I didn't find that.
Instead, how Golang works in terms of managing the memory is that Golang simply has a pretty simple garbage collection algorithm that basically sees
what is the amount of live memory
that your application is actually using.
So we can say, okay, my application requires
100 meg of live objects or real objects
that my application is using.
And then actually, you have one single tunable within the Golang runtime.
So that has been one key decision of the Google Golang runtime team, which actually decided to avoid the whole JVM configuration issue. They just decided on one knob that basically dictates when the garbage collection will be triggered: so how much garbage memory, in a way, will need to be accumulated before the next garbage collection is triggered.
So by default, this variable, which is called GOGC, is equal to 100, which means that if your application requires 100 meg, basically the Golang GC will trigger when memory usage reaches 200 meg. So you can imagine the sawtooth pattern here.
So that's how the Golang runtime
memory manager has always worked over the years.
So the first thing to realize is that Golang is not actually
looking at all at your
memory usage, memory limits, in a way. So contrary to the JVM, which we know tries to self-adapt to the container size, both in terms of CPU and memory limits, and where there has been huge work in terms of adapting the so-called JVM ergonomics to play well within containers, Golang is not doing any of that kind of automatic tuning within containers. So basically, what you need to do is ensure that your memory usage will actually fit within your container memory limit.
So that's kind of the first point that people need to understand.
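To illustrate the behaviour Stefano describes with made-up numbers: the default GOGC=100 lets the heap grow to roughly twice the live set before the next collection, with no awareness of the container's memory limit (before Go 1.19).

```yaml
# Hypothetical Go service: GOGC=100 (the default) means "start the next GC once the heap has
# grown 100% beyond the live set". With ~100 MiB of live objects, heap usage peaks around
# ~200 MiB -- regardless of the container memory limit below.
containers:
  - name: app
    image: example/go-service:1.0   # placeholder image
    env:
      - name: GOGC
        value: "100"      # raise it to trade more memory for less GC CPU, lower it for the opposite
    resources:
      limits:
        memory: 256Mi     # pre-1.19, the Go runtime does not look at this at all
```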
I remember, isn't that similar?
Maybe I remember this incorrectly, but in the early days of the .NET runtime, there were also just two modes, right?
There was the server mode and the workstation mode or whatever they called it.
And other than that,
it didn't really have a whole lot to configure at all.
And I'm not sure how much it has changed now,
but it feels like the Golang is very opinionated
on what it's doing.
And you can only do a little bit from the outside.
Yeah.
But I think, even with this model, you have this tunable, which kind of plays an interesting role already, because we actually did experiments, and by raising the GOGC variable,
you will be able to allocate more memory.
And at the same time, for example,
you will reduce the CPU usage of the garbage collector.
So even with this single tunable,
basically you are already able to kind of decide
your trade-off, perhaps 100 percent is doing too many GCs,
so you can even put 1,000 and you will have 10x the memory allocation,
but at the same time you will reduce the pauses and the garbage collection work.
That's how Golang has worked up to now.
But what is interesting is that with the new release,
Golang 1.19, which I guess came out a couple of months ago,
they actually introduced a soft memory limit.
So basically, they had realized that,
and that was a request coming from many users,
that people were having issues with out-of-memory
due to the fact that actually the Golang runtime
allocates memory kind of irrespective of your memory limits.
So it would be pretty easy to hit your limit
and again trigger an out-of-memory error
by Kubernetes.
So with the new release, the Golang runtime
is moving more towards how the JVM has always worked.
So basically, you will have a kind of max heap,
so a max heap size, which is the amount of memory
that the Golang runtime will try to use without going over. And that's kind of interesting, because that brings the Golang runtime, in a way, much closer to the JVM, with respect to what we were talking about before.
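For completeness, the Go 1.19 soft memory limit Stefano mentions is exposed as the GOMEMLIMIT environment variable (or runtime/debug.SetMemoryLimit); here is a hedged sketch of pairing it with a container limit, with illustrative values and headroom:

```yaml
# Hypothetical sketch: the Go 1.19 soft memory limit makes the runtime collect more aggressively
# as total memory approaches the target, much like a JVM max heap, leaving headroom below the
# hard limit at which Kubernetes would OOM-kill the container.
env:
  - name: GOMEMLIMIT
    value: "400MiB"   # soft limit for the Go runtime (heap plus runtime overhead)
  - name: GOGC
    value: "100"      # can be combined with, or relaxed in favour of, the memory limit
resources:
  limits:
    memory: 512Mi     # hard limit enforced by Kubernetes
```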
You know, as we talk
about memory, something just came to my memory.
And I remember
exactly, I remembered, yeah.
I exactly remember now when we
had our last podcast recording.
It was when
Austria played Italy
in the Euro Cup, and you were with
the Italian flag, and I was with the Austrian flag. I remember now. There you go.
Anyway, strange memories coming to my mind. Maybe I need some garbage collection to clear this up.
Stefano, I know you've done a lot of work from the beginning when you started Akamas on JVM and then the database, now Kubernetes.
What is next, kind of as a final thought?
What is next?
Do you already have some other runtimes in mind?
Do you have some other, I don't know, what's the next items?
Yeah, we just announced a kind of next-generation evolution of the platform that we are very proud of, which is what we call the ability to optimize applications directly in production. So for people that know, or don't know, Akamas: our focus up to now was actually helping
mostly performance engineers or software developers, SREs, kind of optimize their
application configuration, pretty much the thing that we are talking about today,
in a staging or pre-prod environment, because it makes a lot of sense to explore all the configurations and see what works best with a kind of load testing approach. That is still interesting for many use cases,
but actually many customers are asking us to actually do the next step
of bringing this kind of approach,
which can bring these kind of values and benefits to production.
So what we just announced is Akamas 3.0,
which enhances our platform. We still retain, of course, the capability to optimize applications in pre-prod environments, leveraging load tests with JMeter, LoadRunner, et cetera. But now, what is very interesting is that, especially for Kubernetes environments, we are able to do this kind of optimization work automatically, leveraging AI, directly in production.
And that means you're then changing your deployment.
Yeah, I mean, you're changing the deployment configuration
in Kubernetes.
Are you doing this as an operator in Kubernetes,
or how does this work?
Well, actually, at the moment, it's not a Kubernetes operator. So basically we interact, we have basically two ways. So one way, the first way would be to interact with the Kubernetes APIs.
So again, like you said, putting the parameters into the deployment
YAML files: CPU and memory requests and limits, or the JVM or Golang configurations. The other kind of option, which is what people are actually mostly interested in, is a kind of more GitOps approach. So the recommendation from Akamas won't actually touch the clusters at all,
but it would be simply an update into a Git repo where
typically people are already
storing their application configurations.
Then you have an approval process, a pull request for example, where people can see the changes, and then they can apply the changes live in production, triggering pipelines or
leveraging the kind of automation
that DevOps has already invested in, basically.
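As a hypothetical illustration of that GitOps flow (not Akamas's actual output format), the recommendation arrives as a change to the manifests already stored in Git, which a human reviews in a pull request before the existing pipeline rolls it out:

```yaml
# Hypothetical change proposed via pull request against the app's Git-stored manifest;
# reviewers approve it, and the existing CD pipeline applies it to the cluster.
resources:
  requests:
    cpu: "1"          # was: "2"
    memory: 1536Mi    # was: 4Gi
  limits:
    cpu: "2"
    memory: 2Gi       # was: 4Gi
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=60.0 -XX:+UseParallelGC"   # illustrative tuning only
```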
Yeah, that makes a lot of sense.
It's also, you're right, it's the GitOps way to do it, and the human aspect is also still in there: they have to approve it and see what your suggestion is.
Cool.
Yeah, exactly.
Good.
I would say, well, the last thing that I read on your notes, which is very exciting because it means I will also get to see you in person in a couple of weeks because you are going
to be at KubeCon in Detroit.
Yes, exactly.
So we are very, very happy about that.
So we will have a booth there, so we are very, very happy to be on the floor again after a couple of years, meeting lots of people working on Kubernetes, hearing their stories, their problems, and seeing if we can help them.
Yeah, you will need to make sure to also meet up with Henrik then. He's going to be there, right? I call him Mr. Is It Observable. And we have a booth from the Keptn side, we also have a booth. I think OpenFeature, another open source project, and the App Delivery SIG are also there with a booth where we are present. And then, you know, as Dynatrace, we also have a presence there.
Yeah. Cool.
Great. Great.
Can't wait to finally meet you in person again.
Yeah. Four more weeks or five more weeks.
Yeah, exactly.
Good. Any final words, Brian, Stefano?
Anything else? Any final thoughts?
No. The only thing that was going through my mind with the live optimization
was just the thinking about all the different ways that could be connected.
I mean, it makes total sense in a Kubernetes environment
because you can have multiple instances of containers running
and observe and see what's working.
And started thinking about working that into feature flags
and all just the possibilities of how that can be used and leveraged just becomes so much more
ornate in a good way. Reminds me of something I was just reading this morning where
somebody from another company was trying to say they don't see robots taking over the IT industry.
And it's like, well, you kind of have to,
because so much of this is not going to be manageable on a human level.
So there's all, you know,
automation is going to be the key to making this work.
You automate what you learned and then go on to the next and then automate
that. And then the next, and you know, and it's that cycle.
And as always Stefano, whenever you're on,
it always excites me to hear what you all are doing because it just,
it just sounds so cutting edge.
And it's also at a layer that most people are not paying attention to, especially in the space we're in.
Everyone's looking at the code performance and even holistically at the infrastructure and container performance,
but they're not looking at those different settings and that tuning of those settings, which most people I think take for granted. And it's just great that you guys are really shining a very focused spotlight on those
areas to remind people like this is where you can make a lot of gains, right?
But you fix your foundation.
So it's always a pleasure to have you on.
Thank you.
Thank you.
Thank you, Brian, for the great words.
Actually, we are also excited about this opportunity because actually it's literally a very, very
big topic and actually it's very hard.
So it's not that people are not skilled or they don't have time.
It's literally that there's really too much complexity to be dealt with, considering even just one single microservice. So basically, what we were just talking about is the complexity of optimizing even just a single microservice; but then consider that you would have hundreds, if not thousands, of them, and they are constantly changing, the workload is changing. So it's actually a pretty huge problem.
Awesome.
Well, Stefano, we'll see each other anyway,
but also keep us posted,
and I'm sure we'll have you back in the upcoming months with more lessons learned
as you're optimizing these environments.
All right. Thanks a lot, guys.
All right. Look forward to the next time
we can have you on for some of the latest and greatest.
So enjoy Detroit KubeCon. I wish I could be there, but I won't. So, in our hearts... Get me some swag, Andy, get me an Akamas keychain or whatever you're giving out.
Okay.
All right.
Thank you so much for everyone listening.
And thank you, Stefano, for being on.
And as always, thank you, Andy, for being my partner in this and making this possible.
Anyhow, thanks, everybody.
See you next time.
Bye-bye.
Bye-bye.