PurePerformance - How to optimize performance and cost of k8s workloads with Stefano Doni
Episode Date: October 10, 2022

Over the years we learned how to optimize the performance of our JVMs, our CLRs or our database instances by tweaking settings around heap sizes, garbage collection behavior or connection and thread pools. As we move our workloads to k8s we need to adapt our optimization efforts, as there are new knobs to turn. We need to factor in how resource requests and limits on pods impact the application runtimes that run on your clusters. Out-of-memory problems all of a sudden no longer depend on the Java heap size alone!

To learn more about k8s optimization best practices we have invited Stefano Doni, CTO of Akamas. Stefano walks us through key learnings as the team at Akamas has helped organizations optimize the performance, resiliency and cost of their k8s workloads. You will learn about proper memory settings, CPU throttling and how to start saving costs as you move more workloads to k8s.

To learn more about Akamas go here: https://www.akamas.io/
If you happen to be at KubeCon 2022 in Detroit, make sure to visit their booth.

Show Links:
Stefano on LinkedIn: https://www.linkedin.com/in/stefanodoni/
A Guide to Autonomous Performance Optimization with Dynatrace and Akamas: https://www.youtube.com/watch?v=i7MuEjeOvX0
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and today I have my very, very special guest host.
The one and only, can you guess?
Andy Grabner. Andy Grabner.
Thank you for being a guest host this week.
A guest host? What does that mean even?
Well, Andreas Grabner couldn't come today, so Andy Grabner.
Ah, that's the way it works.
I don't know. It's the only thing I could think of today.
It's not funny, but it's okay. Next time you get another shot.
I get a C for effort.
Maybe you should Google for funny openings in podcasts
I was going to try to go with a Muppet Show type of intro, but that would be too loud and interrupt my wife on her business call, and I don't think a lot of our listeners would get the whole Muppet Show reference, because that's from the early 80s, late 70s. But yeah, what are you gonna do?
You know what we're gonna do? We're going to jump right into the topic, because we have limited time as always.
And we have a lot of things to cover today.
We have a repeat guest on the show.
I'm not sure how many times he's been on.
I think definitely one full podcast.
I know he was on for one of the ones we did for Perform, I think.
I don't know if there was a second full podcast or not.
So this is at least his third time, maybe fourth. And it's not because we like Italy and Italians just in general, for the food and everything else, it's really because the culture is so much...
Everything, yeah, exactly.
But now, without further ado, Stefano, welcome to the show.
Hey, thank you. Thank you for having me here.
Hey, Stefano, for those that kind of may have escaped the episodes prior where you spoke with us,
can you quickly introduce yourself?
Yeah, sure.
I'm Stefano Doni.
I'm co-founder at Akamas.
Basically, what we build at Akamas is an optimization platform powered by AI
to help teams optimize their application performance, cost, and resiliency.
I started out doing performance engineering work in the early 2000s, so optimizing system performance and efficiency has always been my passion.
That's why we invite you. That's why we like you, because that's the same for Brian and myself, and I'm sure many of our listeners, right? We are always interested in making systems more efficient. I think especially in the times we live in right now, efficiency, saving not only CPU and memory but especially energy, is in the end very important for us. So thank you, Stefano, that with your work at Akamas you are helping the community to do so.
Well, actually, let's jump right into topics.
I think the last time we had you, we talked about this.
We also did a webinar together.
I think we talked about Java performance optimization.
I think this was one of the first things you talked about.
You also created some great papers, blog posts.
We also talked about database performance optimizations. Today's topic, however, is focusing on Kubernetes. And you have been
spending quite some time on analyzing Kubernetes clusters and nodes and pods and containers and
everything that runs on it. And let's go step by step and item by item through the things that
you have found, lessons learned, so that our listeners can learn
from those things and not make the same mistakes. All right, so let's dive into it.
So actually, the topic of today is how to extract the most performance or cost efficiency out of Kubernetes applications. And while doing so, it's important, of course, to preserve application resiliency.
So how do I decrease cost of my Kubernetes application
while not actually going into out of memory
or CPU throttling issues?
So those are among the most common questions that we literally get from many, many customers.
And we are working very heavily, of course, on Kubernetes
as it's becoming one of the most common
layers and the next cloud native platform
to run modern microservices. So I guess that's the topic of this session.
So I guess I want to start with
I would say resiliency.
One of the biggest questions is this: Kubernetes is going to rise in terms of adoption, and with that typically comes an increase in the cost footprint of the environment. So one of the key questions is, how can I reduce the cost? But in doing that, the usual problem that people run into is that, of course, you run the risk of, in a way, shrinking the container size too much or wrongly configuring your application, and ending up with the dreaded out-of-memory errors.
So that's especially, in a way, hard to manage for Java workloads.
So that's one of the most common issues that we see.
So basically, the problem that we are talking about is that, of course, people working on Kubernetes know about resource management. The way Kubernetes manages the memory in particular is pretty interesting, meaning that, of course, we all know that Kubernetes will kill your container as soon as your container memory usage hits the limit. That's just how Kubernetes works. So the current approach that we see from teams
that want to reduce costs is, of course, shrinking the size of containers
because at the end of the day, what you pay in terms of Kubernetes infrastructure
is directly tied to basically the size of your pods,
which comes down to the CPU and memory requests and limits that you configure. As a developer or DevOps engineer or SRE, you configure these within your YAML files, and that actually dictates how Kubernetes will allocate the resources.
So especially as regards resiliency, we typically see most people struggling with out-of-memory kills. So that is due to the fact that Kubernetes will kill your container
if you don't properly size your Kubernetes memory limits.
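For readers following along, here is a minimal sketch of the requests and limits Stefano is describing, as they would appear in a pod spec; the workload name, image, and values are hypothetical.

```yaml
# Hypothetical pod spec fragment; names and values are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: orders-service                     # hypothetical workload
spec:
  containers:
    - name: app
      image: example/orders-service:1.0    # placeholder image
      resources:
        requests:
          cpu: "1"       # what the scheduler reserves; this drives cluster sizing and cost
          memory: 2Gi
        limits:
          cpu: "2"       # exceeding this leads to CPU throttling
          memory: 4Gi    # exceeding this gets the container OOM-killed
```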
So, as we have actually talked about a little bit, also at the beginning of this year at several conferences,
the thing to realize is that most people today adopt an approach
that relies on pretty much
observability or kind of capacity planning approaches where they actually look at basically
memory usage and then compare memory usage with your memory limits.
So sounds like a kind of sensible approach, meaning that you need to kind of put your
limits above your memory usage and you'll be fine.
So that's kind of the current approach.
And again, it might be sensible, it might look as if it would work.
But at the end of the day, people with those kind of approaches
still are suffering with lots of out-of-memory errors.
And then let me ask you a question on this.
Because first of all, for people that might not be familiar with Java
and the details: in the quote-unquote old days, right, you ran a Java container, you gave it a certain amount of memory, and then when the Java application tried to allocate more memory than was available in the Java heap, it threw an out-of-memory exception, if obviously the garbage collector couldn't, you know, clean up things. Now we're kind of wrapping around the JVM
another concept where in Kubernetes,
we basically say this is how much memory we give you overall.
First of all, these two need to be aligned, I guess, right?
Because it doesn't make sense to make the outer wrap smaller
than what's allowed internally.
But also the other way around: if the wrap around the JVM is much bigger, then the JVM internally will never use it anyway.
So I think that that needs to be
properly aligned with each other, correct?
Yeah, that's the point, Andy.
You summarized it pretty correctly.
So the thing is that currently the JVM,
basically it's a pretty big engine.
It's highly multi-threaded, a kind of marvelous engineering effort that has been done over the years by Oracle, et cetera.
The thing is that we are putting this kind of big engine,
which is highly concurrent, et cetera,
into a small container.
So the challenge that we have here,
especially for Java workloads,
by the way, it's not just about Java.
We are also going to see Golang going in pretty much the same direction, which is interesting; there's news about that at the current time.
But focusing on the JVM, the problem is that when actually you move the JVM within the container,
you don't really have a way to say to the JVM, OK, that's my memory limit.
Please just stay below that limit.
So what the kind of control,
the kind of parameters that you have on the JVM
typically deals with how much heap memory that you can set.
So the pretty famous max heap settings,
pretty much every Java developer knows this setting,
which dictates how much heap memory
the Java virtual machine will allocate.
But the problem is that the JVM will allocate memory also outside the Java heap.
So let's say that you have a 4 gig memory limit in your container.
So the big question that most people are struggling with is
how big will be my memory heap within that 4 gig limit?
So we see people that are putting a 4 gig heap within the 4 gig memory limit.
So kind of pretty obviously wrong, I would say.
But the thing is that what is the right size?
So shall I put, I don't know, 3 gig of memory heap to stay within 4 gig of memory? Or shall I go with half, 2 gig, 50% of my memory?
The thing is that the memory used by the JVM comprises not only the memory heap, but also the so-called off-heap.
And what people are not able to realize
is that actually the amount of off heap,
the whole memory usage of the process
can be actually pretty much higher
with respect to the heap.
So it's not just, I don't know, 10% or 20%.
We regularly see JVMs that are even twice the size of the memory heap allocated or even bigger.
So that actually is the root cause that triggers out of memory.
So people are having a hard time to try to identify the right amount of heap.
And that's actually because it's kind of hard.
So the JVM has literally no knob that in a way dictates the total amount of memory usage, and that's becoming the actual problem: how do I fit my JVM workloads to actually play nice with a container and not run into those out-of-memory kills, which are sudden events where Kubernetes will kill your container, maybe in the middle of the night, without any prior signal about that.
So and that's also, I think, as far as I know, you call it off heap memory.
I think it's the native memory that the JVM itself needs, but especially depending on what
your app is loading, because your app is loading libraries that are also allocating native memory.
And then I assume, and Stefano correct me if I'm wrong, but there's just no rule of thumb
where you say the JVM in that version always needs 20% or 30% because it really depends on
the app and on the workload.
Yeah, right. And it also, for example, depends highly on the settings of the JVM, of course, especially on the garbage collectors. So garbage collectors also need to allocate that extra, off-heap native memory; the JIT compiler needs to allocate memory; and then there are the classes, the amount of classes, so the metadata that your application actually loads, as you mentioned. So the way you configure it is pretty key.
So if you don't look at this piece, what is counterintuitive is that you might see, from a memory usage perspective, that your container is only using, say, 50 percent of its memory. So let's say that out of the four gig of your container, you will see that just two gig of memory is being used. So looking at this picture, kind of the first reaction would be: I can drop this. I can safely put the limit to, I don't know, three gig, and that would be fine. The problem is that the JVM that runs within the container might suddenly allocate much more memory.
And that's a root cause of the many out of memory errors that we see.
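To make the 4 gig example concrete, here is a hedged sketch (values are illustrative, not a recommendation) of the common pattern of capping the heap well below the container limit, either with an explicit -Xmx or with the JVM's container-aware -XX:MaxRAMPercentage, so that heap plus off-heap still fits under the limit Kubernetes enforces.

```yaml
# Hypothetical deployment fragment: cap the JVM heap below the container limit to leave
# headroom for off-heap memory (metaspace, GC and JIT data structures, thread stacks, ...).
containers:
  - name: app
    image: example/java-service:1.0      # placeholder image
    env:
      - name: JAVA_TOOL_OPTIONS
        # either an explicit cap, e.g. "-Xmx2g", or a fraction of the container limit:
        value: "-XX:MaxRAMPercentage=50.0"
    resources:
      limits:
        memory: 4Gi   # heap + off-heap must fit in here, or Kubernetes OOM-kills the container
```

How much headroom is enough is exactly the workload- and GC-dependent question discussed above.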
So that means basically every performance engineer, every capacity engineer
will still have a job, even more importantly in the Kubernetes world,
because we need to do more testing and more optimization, configuration optimization.
I know you guys at Akamas have obviously automated a lot of this already, but it's just really interesting to see, right? We're adding a layer on top of a layer; in this case we're putting something around an already very complex runtime, and therefore have new side effects that are hard to comprehend and actually hard to understand.
Yes, exactly.
Brian, you go ahead.
Yeah, I was going to ask, so we talk about the native, the heap.
Is there a single memory metric that people can look at to see what's my total memory consumed by this?
Like if someone was going to visually go through and say,
all right, I want to find this metric and allocate based on that.
Is that something easier or is it always a calculation
based on several different metrics?
Yeah, it's actually an interesting question, Brian.
Actually, there's no single metric.
So what you need, you can look at is utilization within the heap,
which is something that most observability tools provide.
You can, of course, on the other side,
look at the total consumption within the container.
So of course, which comprise
the total JVM process memory usage, for example.
So you will get very different pictures, but you will be able, in a way, to start to correlate those; so, for example, understanding what would be the kind of extra-heap, extra memory that is required. So it's kind of tricky, unfortunately. It's not that the JVM provides good support for that. There has been lots of effort also to try to estimate that kind of extra-heap amount based on different metrics, but it's not easy.
So I guess the current approach will be to revise your configuration,
be conscious of, for example, the maximum heap
that you are putting into your containers.
And it's kind of a trial and error, unfortunately.
So you need to be able to test it out
and see what works for your application.
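One way to get visibility into that off-heap breakdown (an addition of ours, not something mentioned in the episode) is the JVM's native memory tracking, which can be switched on via the same options mechanism and then inspected inside the container:

```yaml
# Hedged sketch: enable JVM native memory tracking (adds a small overhead; the flags are
# standard HotSpot options, everything else here is illustrative).
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=50.0 -XX:NativeMemoryTracking=summary"
# Then, inside the running container, something like
#   jcmd <pid> VM.native_memory summary
# breaks total JVM memory down into heap, metaspace, threads, GC, code cache, and so on,
# which you can correlate with the container memory usage reported by your observability tool.
```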
And Stefano, I asked you in preparation for this call to send us a couple of bullet points.
And I'm just looking at something that you wrote,
which is another lesson learned.
It says, out of memory kills due to sudden memory allocation peaks.
I think the way I read it when I first read it,
that an application, right, if you put an application into a container,
then you have certain requests that are memory, let's say,
easy on the memory and other ones are very intensive.
So for instance, let's say an application is just doing a read from a database and put something out,
or it is calculating a complex report where it needs to pull in a lot of data into memory.
So I'm actually wondering if this is a great chance for application developers and especially architects to say,
hey, we have this big Java application and we need to figure out what type of workloads cost how much and then cost, I mean, how much CPU and
memory so that I can redirect certain requests to maybe a different container that runs with
a different memory and CPU limit than containers that can handle requests that are easier. I mean,
it's basically breaking up the monolith, but maybe in this case, not necessarily breaking
the monolith really from a code-based perspective, but maybe deploying the same container multiple times, but with
different memory limits so that more costly requests go to a container that can actually
handle the costly request.
It's like if you go to a supermarket, right?
You have the self-checkout lanes where it's super easy and you just do it if you have up to six items.
But if you have more, then you go to another lane.
And I think this is the same thing we need to consider
when we architect software and do traffic routing.
Yeah, I guess that may be a sensible traffic management policy.
So it's always good if you can kind of reduce the variability in the amount of work that you need to do. That makes doing capacity planning,
forecasting, optimization easier because the workload
becomes much more homogeneous and it's easier to deal
with this kind of situation.
Yeah, cool.
So memory, right? Memory is a big thing.
So what I take away from this is a runtime wrapped by another runtime, and then we just need to understand the proper settings. I also like that there's not just the Java heap, there's also the native memory; the total memory needs to be taken into consideration, and there are a lot of settings that can be and have to be adjusted. And I'm pretty sure you guys have already put some cool stuff into your product, into Akamas, to automate some of that, right?
Yeah, that's literally how we discovered those kinds of insights, while doing those kinds of optimizations for customers. For example, we discovered that certain garbage collectors are much more heavy
on the native memory, like for example G1. It needs much more memory with respect to the serial or
parallel. So that also becomes an optimization knob that you can turn. So how do I then configure the JVMs to make sure that I can lower the limit in a safe way, because I can shrink the heap size, but also control the total amount of memory?
And literally the JVM has plenty of options that also in a way dictates how the JVM allocates
off-heap memory that makes for additional cost reduction opportunities.
So besides memory, what's next? What other things have you found out in your work?
Yeah, the other big issue is about CPU throttling.
So I guess this one is, and again, related to that is, of course, CPU limits.
I guess it's one of the most debated topics around Kubernetes sizing and performance.
By the way, there's also the question of: shall I put CPU limits, or shall I just go with CPU requests? I won't focus on that, because the most interesting thing is that people are using CPU limits. So people are realizing the benefits of CPU limits
in terms of performance isolation, basically.
So not having, I don't know, a runaway workload
impacting my performance sensitive
and business critical workloads.
So what we find is that people are leveraging CPU limits
in their real life.
And I think that's considering pros and cons,
that's still the best choice for Kubernetes today.
But then the next question literally becomes, how do I deal
with CPU throttling?
CPU throttling happens, of course, again due to those Kubernetes resource management mechanisms; we have been talking about them for quite some time, also in some blog posts.
So it's kind of
counterintuitive and hard to understand
how actually Kubernetes, in a way, manages resources, CPU resources especially.
So while for the memory we were talking about, it's kind of easy: basically, as soon as you hit the limit, you get killed. It's a little bit more complex when you get to the CPU. When you say, okay, I want to have, I don't know, two CPUs available in Kubernetes, what is happening is that Kubernetes is allowing your container to actually use all the CPUs on your host. So not just two CPUs. What Kubernetes is actually considering is the equivalent CPU time of two CPUs.
So what that implies is that, for example, if you have a highly multi-threaded workload, like pretty much any microservice today, whether it's a JVM or even Go routines or even Python, that's kind of the norm, you will consume the equivalent of your two-CPU quota in a short amount of time. So for the rest of the period, once you have, in a way, consumed your CPU quota, Kubernetes is going to pull you off the CPU. So it's going to throttle you. So your application will get stalled. It's like a garbage collection pause, for those familiar with the JVM,
but it is happening at the Kubernetes layer.
And it is impacting pretty much any application.
So not just about Java application, but again,
it's kind of the mechanism that Kubernetes is using.
So that's kind of the problem.
And with that problem, you mentioned the artifacts that you
would see would be a slowdown in the performance. And I guess
within the container itself, you would see that its CPU is being
maxed out at the same time. So it would look like I'm maxing
out my CPU, but it's really because of that throttle. I
mean, I guess that's the same as if it was a VM or anything else, right?
It's the maximum allocated CPU,
and it's going to manifest itself as a standard CPU slowdown
upon that situation.
Yeah, what is counterintuitive with Kubernetes
and kind of different with respect to the VMware world
or operating system world is that this throttling can happen
at a very low CPU usage.
So it's kind of different with respect to the memory. So it's not that when your CPU hits 100%,
you will get throttled. But in our experience, also working with customers, we see that CPU throttling can arise as early as 30% of CPU usage with respect to your limits. So what is counterintuitive is that people see, okay, I'm at 30% CPU usage, I have plenty of spare capacity, but in reality, Kubernetes is already throttling your workloads. So that's kind of a counterintuitive difference with respect to the usual, in a way, thresholds that we have always used when sizing, for example, VMs.
And that's why those kind of usual best practices don't work anymore with containers.
And that's because it's based on the CPU time, right?
Not because it's based on the apparent estimate of what two cores or two CPUs would actually get.
Okay, interesting.
Exactly.
Yeah.
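To spell out the mechanics Stefano describes (a rough sketch with illustrative numbers): a CPU limit is enforced as a CFS quota of CPU time per scheduling period, 100 ms by default, so a multi-threaded service can burn its whole quota early in each period and then sit throttled for the rest of it, even while average utilization looks low.

```yaml
# Hypothetical example: a CPU limit of "2" becomes a CFS quota of 200ms of CPU time per 100ms
# period. Eight busy threads can consume those 200ms in roughly 25ms of wall-clock time and are
# then throttled for the remaining ~75ms of every period, while average CPU usage still reports
# well below the limit.
resources:
  requests:
    cpu: "2"
  limits:
    cpu: "2"
# The throttling itself shows up in cgroup/cAdvisor counters such as
# container_cpu_cfs_throttled_periods_total, not in plain CPU utilization charts.
```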
So memory and CPU are two well-known entities in performance engineering.
But they just are bringing...
So this goes back to the basics, right?
Yeah, exactly.
Looking through the notes,
and now I'm jumping ahead,
but you've mentioned this quite a bit already.
You always talk about costs, and this was also something that you put
into the notes for us to kind of talk about.
It's really about reducing the cost of Kubernetes applications, because in the end, whatever we do, we try to run our software cost-efficiently. And with Kubernetes, with its seemingly endless scalability options, if you add enough nodes, it's obviously very hard to always keep an eye on the costs. So what are the lessons learned there? How can you
really manage and reduce costs on Kubernetes? Yeah, that's right. So actually, Kubernetes is great,
meaning that it allows you to have self-healing applications,
highly scalable, automatic management, et cetera.
So that's great.
But at the end of the day, even with auto-scaling,
if your container or pod template is not properly sized and configured,
you're just going to duplicate or multiply your inefficiency.
So the thing that is actually key to realize, while working with lots of customers to optimize the cost of Kubernetes, is that typically people start just by looking at the infrastructure layer.
In the microservice world, we mean by that looking at CPU and memory
sizing of your container. So let's say CPU requests and CPU limits. So the thing is that,
of course, that gives you a kind of initial benefit, meaning that, of course, if you are
running with 10x the amount of CPUs, you are highly over-provisioned. That's, of course,
the first thing to do: right-sizing in that regard. But to get to the next level, what is important to realize is that, at the end of the day, if you want to reduce the footprint of your containers, of your memory requests and limits, what is literally driving the resource consumption is what runs within the container, right? And that is the application. Of course, that depends a lot on the application code.
That's pretty clear.
But there's another big important area which drives the resource consumption within the application, which is the application runtime.
So again, that's the role of the JVM or even the Golang runtime.
So we have worked quite a bit also on the Golang runtime.
So what is important to realize, for example,
again, coming back to the memory.
So how much memory my pod is really consuming
is basically dictated by how the JVM is configured
or how the Golang garbage collection is configured.
So it depends on how the application works, but the lion's share is dictated by how the JVM is configured, because the JVM, at the end of the day, will in many cases use pretty much all the memory that it has been configured to use in its settings, pretty much irrespective of the actual demand that the code requires. So that's kind of the important takeaway, I guess.
And I think this, I mean, this kind of reminds me of the discussions we had over the years on frameworks like Hibernate, right? Like Hibernate, all the runtimes are general-purpose, generic runtimes that can really run any type of workload. Therefore, by default, they can never be optimized for your specific workload. But they give you a lot of screws or switches or whatever configuration options to optimize them.
But then it means you need to A, understand your current workload,
and I guess that's the next big thing.
While you may know your workload today,
it may not be the same workload tomorrow.
And I think this is where also
the topic of continuous optimization comes in
because you need to continuously
re-evaluate your current workload
and then adjust the runtime settings
to optimally run for exactly
your workload. Well, I wish there was someone who could do that for us. You know, it'd just be so amazing if that could be automated to some degree.
Never. It would never work. Do you think something like this would ever exist?
I don't know, it's like science fiction you're talking about here. Maybe we should found a company and call it Akamas.
It's interesting, right?
We're coming back to the same things that we obviously discussed in previous
sessions on these
many different knobs that you can
turn in these runtimes.
It's the first time, though, I hear you
talk about Golang,
at least in that aspect. So why Golang all of a sudden? Why is Golang all of a sudden on your radar? Yeah, that's a great question, because actually, the thing that we are more and more
realizing is that, of course, the application runtimes are playing a big role in the whole
picture, meaning that cost reduction,
performance improvements, and also reliability, as we mentioned.
So those are kind of a three-legged stool that people must think about, that needs to be reconciled. So it's kind of a trade-off.
And within this trade-off, what we find is that besides, of course, Kubernetes settings,
what needs to be aligned is what runs within the container.
So we talk a lot about the GVM because it actually is, of course,
one of the most common language people run microservices on Kubernetes today.
But it's interesting to see that, of course, many workers are running on Golang.
So actually, if you look at how Golang actually manages the resources,
especially the memory, I was a little bit kind of surprised to see that actually it's not actually
by default playing well with Kubernetes in a way. So it's kind of a big statement, but I was kind of
in a way surprised due to the fact that both Kubernetes and Golang comes from Google.
So I thought that they would kind of work magically out of the box.
But I didn't find that.
Instead, how Golang works in terms of managing the memory is that Golang simply has a pretty simple garbage collection algorithm that basically sees
what is the amount of live memory
that your application is actually using.
So we can say, okay, my application requires
100 meg of live objects or real objects
that my application is using.
And then actually, you have one single tunable within the Golang runtime.
So that has been one key decision of the Google Golang runtime team, which actually decided to avoid the whole JVM configuration issue. They just decided on one knob that basically dictates when the garbage collection will be triggered: so how much garbage memory, in a way, will need to be accumulated before the next garbage collection is triggered.
So by default, this variable, which is called GOGC, is equal to 100, which means that if your application requires 100 meg, basically the Golang GC will trigger when memory usage reaches 200 meg. So you can imagine the sawtooth pattern here.
So that's how the Golang runtime
memory manager has always worked over the years.
So the first thing to realize is that Golang is not actually
looking at all at your
memory usage, memory limits, in a way. So contrary to the JVM, which we know tries to self-adapt to the container size, both in terms of CPU and memory limits, and where there has been huge work in terms of adapting the so-called JVM ergonomics to play well within containers, Golang is not doing any of that kind of automatic tuning within containers. So basically, what you need to do is ensure that your memory usage will actually fit within your container memory limit.
So that's kind of the first point that people need to understand.
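To illustrate the behaviour Stefano describes with made-up numbers: the default GOGC=100 lets the heap grow to roughly twice the live set before the next collection, with no awareness of the container's memory limit (before Go 1.19).

```yaml
# Hypothetical Go service: GOGC=100 (the default) means "start the next GC once the heap has
# grown 100% beyond the live set". With ~100 MiB of live objects, heap usage peaks around
# ~200 MiB -- regardless of the container memory limit below.
containers:
  - name: app
    image: example/go-service:1.0   # placeholder image
    env:
      - name: GOGC
        value: "100"      # raise it to trade more memory for less GC CPU, lower it for the opposite
    resources:
      limits:
        memory: 256Mi     # pre-1.19, the Go runtime does not look at this at all
```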
I remember, isn't that similar?
Maybe I remember this incorrectly, but in the early days of the .NET runtime, there were also just two modes, right?
There was the server mode and the workstation mode or whatever they called it.
And other than that,
it didn't really have a whole lot to configure at all.
And I'm not sure how much it has changed now,
but it feels like the Golang is very opinionated
on what it's doing.
And you can only do a little bit from the outside.
Yeah.
But I think, even with this model, you have this tunable, which kind of plays an interesting role already, because we actually did experiments, and by raising the GOGC variable,
you will be able to allocate more memory.
And at the same time, for example,
you will reduce the CPU usage of the garbage collector.
So even with this single tunable,
basically you are already able to kind of decide
your trade-off, perhaps 100 percent is doing too many GCs,
so you can even put 1,000 and you will have 10x the memory allocation,
but at the same time you will reduce the pauses and the garbage collection work.
That's how Golang has worked up to now.
But what is interesting is that with the new release,
Golang 1.19, which I guess came out a couple of months ago,
they actually introduced a soft memory limit.
So basically, they had realized that,
and that was a request coming from many users,
that people were having issues with out-of-memory
due to the fact that actually the Golang runtime
allocates memory kind of irrespective of your memory limits.
So it would be pretty easy to hit your limit
and again trigger an out-of-memory error
by Kubernetes.
So with the new release, the Golang runtime
is moving more towards how the JVM has always worked.
So basically, you will have a kind of max heap,
so a max heap size, which is the amount of memory
that the Golang runtime will try to use without going over. And that's kind of interesting, because that brings the Golang runtime, in a way, much closer to the JVM, with respect to what we were talking about before.
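For completeness, the Go 1.19 soft memory limit Stefano mentions is exposed as the GOMEMLIMIT environment variable (or runtime/debug.SetMemoryLimit); here is a hedged sketch of pairing it with a container limit, with illustrative values and headroom:

```yaml
# Hypothetical sketch: the Go 1.19 soft memory limit makes the runtime collect more aggressively
# as total memory approaches the target, much like a JVM max heap, leaving headroom below the
# hard limit at which Kubernetes would OOM-kill the container.
env:
  - name: GOMEMLIMIT
    value: "400MiB"   # soft limit for the Go runtime (heap plus runtime overhead)
  - name: GOGC
    value: "100"      # can be combined with, or relaxed in favour of, the memory limit
resources:
  limits:
    memory: 512Mi     # hard limit enforced by Kubernetes
```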
You know, as we talk
about memory, something just came to my memory.
And I remember
exactly, I remembered, yeah.
I exactly remember now when we
had our last podcast recording.
It was when
Austria played Italy
in the Euro Cup, and you were with
the Italian flag, and I was with the Austrian flag. I remember now. There you go.
Anyway, strange memories coming to my mind. Maybe I need some garbage collection to clear this up.
Stefano, I know you've done a lot of work from the beginning when you started Akamas on JVM and then the database, now Kubernetes.
What is next, kind of as a final thought?
What is next?
Do you already have some other runtimes in mind?
Do you have some other, I don't know, what's the next items?
Yeah, we just announced a kind of next-generation evolution of the platform that we are very proud of, which is what we call the ability to optimize applications directly in production. So for people that know, or don't know, Akamas: our focus up to now was actually helping
mostly performance engineers or software developers, SREs, kind of optimize their
application configuration, pretty much the thing that we are talking about today,
in a staging or pre-prod environment, because it makes a lot of sense to explore all the configurations and see what works best with a kind of load testing approach. That is still interesting for many use cases,
but actually many customers are asking us to actually do the next step
of bringing this kind of approach,
which can bring these kind of values and benefits to production.
So what we just announced is Akamas 3.0,
which enhances our platform. We still retain, of course, the capability to optimize applications in pre-prod environments, leveraging load tests with JMeter, LoadRunner, et cetera. But now, what is very interesting is that, especially for Kubernetes environments, we are able to do this kind of optimization work automatically, leveraging AI, directly in production.
And that means you're then changing your deployment.
Yeah, I mean, you're changing the deployment configuration
in Kubernetes.
Are you doing this as an operator in Kubernetes,
or how does this work?
Well, actually, at the moment, it's not a Kubernetes operator. So basically we interact, we have basically two ways. So one way, the first way would be to interact with the Kubernetes APIs.
So again, like you said, putting the parameters into the deployment
YAML files: CPU and memory requests and limits, or the JVM or Golang configurations. The other kind of option, which is what people are actually mostly interested in, is a kind of more GitOps approach. So the recommendation from Akamas won't actually touch the clusters at all,
but it would be simply an update into a Git repo where
typically people are already
storing their application configurations.
Then you have an approval process, a pull request for example, where people can see the changes, and then they can apply the changes live in production, triggering pipelines or
leveraging the kind of automation
that DevOps has already invested in, basically.
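As a hypothetical illustration of that GitOps flow (not Akamas's actual output format), the recommendation arrives as a change to the manifests already stored in Git, which a human reviews in a pull request before the existing pipeline rolls it out:

```yaml
# Hypothetical change proposed via pull request against the app's Git-stored manifest;
# reviewers approve it, and the existing CD pipeline applies it to the cluster.
resources:
  requests:
    cpu: "1"          # was: "2"
    memory: 1536Mi    # was: 4Gi
  limits:
    cpu: "2"
    memory: 2Gi       # was: 4Gi
env:
  - name: JAVA_TOOL_OPTIONS
    value: "-XX:MaxRAMPercentage=60.0 -XX:+UseParallelGC"   # illustrative tuning only
```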
Yeah, that makes a lot of sense.
It's also, you're right, it's the GitOps way to do it, and the human aspect is also still in there: they have to approve it and see what your suggestion is.
Cool.
Yeah, exactly.
Good.
I would say, well, the last thing that I read on your notes, which is very exciting because it means I will also get to see you in person in a couple of weeks because you are going
to be at KubeCon in Detroit.
Yes, exactly.
So we are very, very happy about that.
So we will have a booth there, so we are very, very happy to be on the floor again after a couple of years, meeting lots of people working on Kubernetes, hearing their stories, their problems, and seeing if we can help them.
Yeah, you will need to make sure to also meet up with Henrik then. He's going to be there, right? I call him Mr. Is It Observable. And we have a booth from the Keptn side, we also have a booth. I think OpenFeature, another open source project, and the App Delivery SIG are also there with a booth where we are present. And then, you know, as Dynatrace, we also have a presence there.
Yeah. Cool.
Great. Great.
Can't wait to finally meet you in person again.
Yeah. Four more weeks or five more weeks.
Yeah, exactly.
Good. Any final words, Brian, Stefano?
Anything else? Any final thoughts?
No. The only thing that was going through my mind with the live optimization
was just the thinking about all the different ways that could be connected.
I mean, it makes total sense in a Kubernetes environment
because you can have multiple instances of containers running
and observe and see what's working.
And started thinking about working that into feature flags
and all just the possibilities of how that can be used and leveraged just becomes so much more
ornate in a good way. Reminds me of something I was just reading this morning where
somebody from another company was trying to say they don't see robots taking over the IT industry.
And it's like, well, you kind of have to,
because so much of this is not going to be manageable on a human level.
So there's all, you know,
automation is going to be the key to making this work.
You automate what you learned and then go on to the next and then automate
that. And then the next, and you know, and it's that cycle.
And as always Stefano, whenever you're on,
it always excites me to hear what you all are doing because it just,
it just sounds so cutting edge.
And it's also at a layer that most people are not paying attention to, especially in the space we're in.
Everyone's looking at the code performance and even holistically at the infrastructure and container performance,
but they're not looking at those different settings and that tuning of those settings, which most people I think take for granted. And it's just great that you guys are really shining a very focused spotlight on those
areas to remind people like this is where you can make a lot of gains, right?
But you fix your foundation.
So it's always a pleasure to have you on.
Thank you.
Thank you.
Thank you, Brian, for the great words.
Actually, we are also excited about this opportunity because actually it's literally a very, very
big topic and actually it's very hard.
So it's not that people are not skilled or they don't have time.
It's literally that there's really too much complexity to be dealt with, considering even just one single microservice. So basically, what we were just talking about is the complexity of optimizing even just a single microservice; but then consider that you would have hundreds, if not thousands, of them, and they are constantly changing, the workload is changing. So it's actually a pretty huge problem.
Awesome.
Well, Stefano, we'll see each other anyway,
but also keep us posted,
and I'm sure we'll have you back in the upcoming months with more lessons learned
as you're optimizing these environments.
All right. Thanks a lot, guys.
All right. Look forward to the next time
we can have you on for some of the latest and greatest.
So enjoy Detroit KubeCon. I wish I could be there, but I won't. So, in our hearts... Get me some swag, Andy, get me an Akamas keychain or whatever you're giving out.
Okay.
All right.
Thank you so much for everyone listening.
And thank you, Stefano, for being on.
And as always, thank you, Andy, for being my partner in this and making this possible.
Anyhow, thanks, everybody.
See you next time.
Bye-bye.
Bye-bye.