PurePerformance - Perform 2020: AI Assisted instance tuning with Akamas

Episode Date: February 5, 2020

...

Transcript
Starting point is 00:00:00 Hold on, I'll give the intro. Coming to you from Dynatrace Perform in Las Vegas, it's Pure Performance. Hola amigos, here is your friend Leandro Melendez, aka Señor Performo, broadcasting from Dynatrace Perform 2020 in Las Vegas. Is that good? Yes, and also PerfBytes and Pure Performance.
Starting point is 00:00:35 And together, multitasking, broadcasting with PerfBytes. I'm really excited about the intro. Yes, I like the intro. So we'd like to say ciao, ciao to our guests today. We have... who was it again? Sorry. Oh, Andrea and Luca from Akamas. Yeah.
Starting point is 00:00:55 And we've had, who was the one Andy and I had on the show in the past? I think Stefano Doni. Yes, Stefano. Stefano. Yes. And that was a really cool episode. I really loved hearing what you all do. So listeners who heard that episode already know what we're talking about,
Starting point is 00:01:09 but why don't you all first explain what it is that Akamas does, and then we'll explore where you're at now. Yeah, so if you already followed Stefano's talk, basically the idea is that Akamas helps close the loop of continuous optimization. Our goal is to help companies make sure that they're running their applications on any IT stack
Starting point is 00:01:34 configured in the best way possible. We work in a space where many companies run their technology stack using default values and keep accumulating technical debt, because default values are what the market suggests. Right, right.
Starting point is 00:01:50 Until something crashes. Okay? What we do is help companies optimize that and make sure that whatever values they are using are the best for their specific workload and their specific technology stack. Right. And I believe we spoke with Stefano about JVM tuning, and there were, what, maybe 300 or 700 different settings,
Starting point is 00:02:11 somewhere in there. The last count was 770, close to 800. Yeah, different settings you can make on a JVM. Most people know about their GC, right? Very few of them. But the idea here was that it's going to run, start tweaking those settings based on how you're running, and see what performs best
Starting point is 00:02:26 and use AI to optimize that. Because most people, you know... I had no idea that there were that many options. But this extends a lot further than JVMs, right? Yeah, exactly. So the main concept is that even with the 800 parameters we just mentioned for the JVM, most people don't even know that there are 800.
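To see how many flags a given HotSpot JVM exposes, you can run `java -XX:+PrintFlagsFinal -version`, which prints each flag's type, name, final value, and origin. The Python sketch below parses that output. The regular expression is an assumption based on the usual JDK 11-style layout (older JDKs print `:=` for values that were changed), so it may need adjusting for other JVM versions; the exact count also varies by vendor and version.

```python
import re
import subprocess
from typing import NamedTuple

# Assumed line layout (JDK 11-style), e.g.
#      bool UseG1GC   = true   {product} {ergonomic}
# Older JDKs print ":=" instead of "=" for values that were changed.
FLAG_LINE = re.compile(
    r"^\s*(?P<type>\w+)\s+(?P<name>\w+)\s+:?=\s*(?P<value>.*?)\s*\{(?P<kind>[^}]*)\}"
)


class JvmFlag(NamedTuple):
    type: str
    name: str
    value: str
    kind: str  # first brace group, e.g. "product" or "diagnostic"


def parse_print_flags_final(output: str) -> list[JvmFlag]:
    """Parse the text printed by `java -XX:+PrintFlagsFinal -version`."""
    flags = []
    for line in output.splitlines():
        match = FLAG_LINE.match(line)
        if match:  # header lines such as "[Global flags]" do not match
            flags.append(JvmFlag(**match.groupdict()))
    return flags


def read_flags_from_jvm(java: str = "java") -> list[JvmFlag]:
    """Run a local JVM (must be on PATH) and parse its final flag values."""
    result = subprocess.run(
        [java, "-XX:+PrintFlagsFinal", "-version"],
        capture_output=True, text=True, check=True,
    )
    return parse_print_flags_final(result.stdout)
```

With a JDK installed, `len(read_flags_from_jvm())` gives the flag count for that particular JVM, which is one way to check figures like the 770 quoted above.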
Starting point is 00:02:47 And even experts may not know what every single flag does. So the problem is not only with a single component like a JVM. If you then work across layers, like the JVM and the operating system at the same time, you have to interlace different technologies. At that point, the question is who you are going to talk to: your Oracle expert or your operating system expert? And generally, that leads to a lot of interesting conversations. So what Akamas does is go beyond this type of distinction
Starting point is 00:03:18 and optimize any parts of the stack at the same time. So we can optimize the JVM and the operating system at the same time. And if I may, how does this optimization work? Like, you have this list of 800 settings that you can go through and optimize. What do you do? Just, like, tweak it up a little,
Starting point is 00:03:37 tweak it down a little, and start to... How does it... So it's actually much smarter. What we do is use a loop of configuration, measurement, test. Sorry: configuration, test, measurement, and reconfiguration. We use, for example, a technology like Neotys for running performance tests on the application. We use a technology like Dynatrace to give us an understanding of how the application reacts and performs. And then, based on this information
Starting point is 00:04:05 and on the goal you decide on, like where do I want to go, do I want to increase the throughput, minimize the CPU utilization, minimize the memory footprint, the AI engine behind the scenes decides what the best next set of parameters is. And we explore the entire space
Starting point is 00:04:22 of possible configurations. Right. And I think it's really interesting, too, because when you talk about the 800 parameters, or about tuning across both the JVM settings and the OS settings, these are probably things that people were never doing to begin with. Exactly.
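The configure, test, measure, reconfigure loop described above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the search space, the `apply_and_measure` function, and its synthetic throughput formula are hypothetical stand-ins (in a real setup the load would come from a tool such as Neotys and the measurements from Dynatrace), and plain random search takes the place of Akamas's AI engine, whose internals are not described in this conversation.

```python
import random
from typing import Callable, Optional

Config = dict[str, object]

# Hypothetical search space spanning two layers (JVM and OS).
SEARCH_SPACE: dict[str, list[object]] = {
    "jvm.heap_mb": [512, 1024, 2048, 4096],
    "jvm.gc": ["G1", "Parallel", "Serial"],
    "os.vm_swappiness": [1, 10, 60],
}


def apply_and_measure(config: Config) -> float:
    """Hypothetical stand-in for: apply config, run a load test, read the goal metric.

    Returns a synthetic throughput (higher is better) so the sketch runs anywhere.
    """
    gc_bonus = {"G1": 120.0, "Parallel": 100.0, "Serial": 40.0}[config["jvm.gc"]]
    heap_effect = min(config["jvm.heap_mb"], 2048) / 10  # no gain above 2 GB
    swap_penalty = abs(config["os.vm_swappiness"] - 10)
    return gc_bonus + heap_effect - swap_penalty


def optimize(measure: Callable[[Config], float], iterations: int, seed: int = 0):
    """Configure -> test -> measure -> reconfigure, keeping the best result.

    Random search is only a placeholder for a smarter optimizer.
    """
    rng = random.Random(seed)
    history: list[tuple[Config, float]] = []
    best: Optional[tuple[Config, float]] = None
    for _ in range(iterations):
        candidate = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        score = measure(candidate)  # one "test + measurement" round
        history.append((candidate, score))
        if best is None or score > best[1]:
            best = (candidate, score)
    return best, history


best, history = optimize(apply_and_measure, iterations=50, seed=42)
print(f"best config {best[0]} -> synthetic throughput {best[1]:.1f}")
```

A real optimizer would use the history of past results to choose the next candidate instead of sampling blindly; that is what keeps the number of load tests manageable when the space has hundreds of parameters.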
Starting point is 00:04:37 It was: we think we've maxed out its performance, or this is what it is, so we'll just start a new instance. Whereas now there's the option that we might squeeze another 30% of performance out of it by using a tool like Akamas to improve the settings, and suddenly you get that performance boost. They really just take the JVM out of the box and run it as is. Yeah, or maybe they'll say, I'll try a different GC strategy, right? A couple of little things they'll try, and then they throw up their hands.
Starting point is 00:05:05 Like, that's it. We need more machines. Yeah. So this opens a whole new world of, yeah, really, really awesome stuff. And actually, it's completely aligned with the new ideas of DevOps and AIOps. Think about solutions like Keptn. You get to the point of deploying the new application, you run the checks on the quality gates,
Starting point is 00:05:30 and then you can even run an optimization study so that you know you're running your very latest code release with the best configuration possible. Getting the very, very best bang for the buck out of the settings. So think about doing what you just described,
Starting point is 00:05:45 changing the garbage collection, for every code release you ship, because when you change the code, you change the workload and what the application does. Nobody does that. And also, those settings should not be static either
Starting point is 00:05:58 because your utilization patterns, the way traffic comes into your application, might be changing. You need to always be... Yeah, actually, customers ask us, how many studies should I run? And the answer is, well, it depends on how many configurations you want to have. Maybe overnight there are fewer people using your service. You can decide to use a more conservative configuration to reduce, for example, the cost of running your AWS instances.
Starting point is 00:06:21 And during peak, you change the configuration because you know that you want the most performance possible. It would be a little bit like your thermostat settings. Exactly. If you're not home, let it idle and not work so much, and if you're home and it's summer, you want it cooling things down. Exactly. I'm from Boston, so I have the opposite problem
Starting point is 00:06:39 when I have to turn on the heating. But it's important that you have something monitoring. Again, with the thermostat example, you have something like motion sensors saying, hey, there are people at home, or you're coming home in a couple of hours, so I need to heat it up a little. Same with the system: you know that you have, say, a Black Friday event, so you can create a dedicated configuration exactly for that specific case. That's really cool. So what's been going on since we last spoke with Stefano? I'll let Luca reply, because he has the news. Certainly, news for sure. We extended the scope of technologies that we are able to address with Akamas. At the beginning we started with
Starting point is 00:07:21 Linux as the OS and Java, which are, let's say, the most common in our customer base. But nowadays, we also work on databases, application servers, Spark, big data platforms, Elasticsearch, several technologies, and now also AWS, which is really interesting. Because with AWS, we were able to pick the right instance and reduce our customers' costs while guaranteeing the same service level to the end user. So, at the end of the day,
Starting point is 00:07:56 the instance itself can be a parameter, because you have to pick the amount of CPU, the amount of RAM, the kind of disk. All of them are parameters. You can have an Ansible playbook that calls the AWS API and spawns an instance. So Akamas was able to integrate with all those aspects
Starting point is 00:08:17 and run a study to pick the right instance and the right amount of CPU and RAM, reducing the cost for our customers at the end of the day. That was really nice. It's a new kind of goal in Akamas compared with the traditional throughput- or response-time-based approach. The interesting piece that we are finding,
Starting point is 00:08:36 the more we use Akamas, is that originally we started with the idea of a transactional type of workload, like a number of users connected to the same service, and running performance tests. But the more we use it, the more we realize how broad the scope is. So Luca mentioned, for example, the EC2
Starting point is 00:08:51 instances. He mentioned Spark, but there are also all the batch jobs, for example. We get a huge number of requests like: how can I squeeze the jobs into a shorter amount of time by changing the order in which I execute them? And we run
Starting point is 00:09:07 a scenario and basically keep changing the parameters and the execution to make sure that you are running in the best way possible. So the more we use it, the more interesting things we find that we can do. Yeah, you keep, no pun intended, AI
Starting point is 00:09:23 learning where you can expand, what other things you can start to tweak. Exactly. And who knows, you'll probably get to a thousand settings that you can start playing with. So there are a lot of interesting new features. And it's great that you talked about the AWS side, because
Starting point is 00:09:39 one of the goals of performance work in the cloud isn't just to get a better-performing application, it's to reduce your cost, because most people, when they go to the cloud, just throw everything up there. And developers have unlimited resources, so they just do whatever they want. What do you mean I have to do tuning? It's the cloud. I don't need to do that anymore. We don't pay for it.
Starting point is 00:09:59 But the finance team gets the bill, and it's like, oh my gosh, what are we all doing here? So it's really great that you can look at it, see that we can reduce these and pull them in, and even automate that process. I mean, we've seen in GCP that if you have a Google instance, they'll kind of tell you you're overprovisioned on this VM, but they're not going to do anything about it, right? So you're taking the extra step of saying we're going to tweak it and scale it down. Now, earlier when I came by the booth to say hi to you all, and hopefully I heard this right, I thought you said something about Kubernetes as well. Yeah. It's one of the brand new...
Starting point is 00:10:33 We just had someone from Red Hat over here, and we were talking a lot about Kubernetes. I wish he could be here to hear what you're about to say. So go ahead. No, Kubernetes, I forgot it, but it's one of the new technologies that we are able to address. At one customer, we were even able to increase the throughput of the application,
Starting point is 00:10:52 while reducing the footprint on the data center, working with Kubernetes. For Akamas, it's really easy to work with Kubernetes because it has a CLI and an API, so it's a really nice technology to integrate with. And it was a really good experience, because at the beginning the idea was just to increase the throughput. But working with the customer on their Kubernetes cluster,
Starting point is 00:11:14 we found a really nice configuration that reduced the size of the pod, the memory, so a memory footprint reduction. At the same time, working on the JVM side, we were able to increase the throughput. So the two layers combined brought us really great results. If I remember correctly, it was a 20% decrease in the memory footprint of a single pod. But they were running 100 pods, so at the end of the day, something like 300 gigabytes of memory saved in the data center, which was a really nice outcome from our perspective.
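As a quick sanity check on those figures, taken at face value since they are quoted from memory: saving about 300 GB across 100 pods with a 20% per-pod reduction implies each pod was sized at roughly 15 GB before tuning and about 12 GB after. The helper names below are illustrative only.

```python
def fleet_saving_gb(pods: int, per_pod_gb: float, reduction_pct: float) -> float:
    """Total memory freed when every pod shrinks by reduction_pct percent."""
    return pods * per_pod_gb * reduction_pct / 100


def implied_per_pod_gb(total_saved_gb: float, pods: int, reduction_pct: float) -> float:
    """Per-pod memory before tuning that is consistent with the quoted totals."""
    return total_saved_gb * 100 / (pods * reduction_pct)


before = implied_per_pod_gb(total_saved_gb=300, pods=100, reduction_pct=20)
after = before * (100 - 20) / 100
print(f"~{before:.0f} GB per pod before, ~{after:.0f} GB after")
print(f"check: {fleet_saving_gb(100, before, 20):.0f} GB saved across the fleet")
```

The point of the exercise is the multiplier: a modest per-pod saving becomes a large data-center saving once it is applied to every replica.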
Starting point is 00:11:48 And that works on on-prem Kubernetes as well as on cloud solutions like EKS and all those? Yeah. Akamas just needs to be able to apply the parameters, generate workload, and measure how the test goes. And we can even work in production. This is a new feature that we are working on. In our vision, it's a sort of canary deployment in which the canary is not a new release but a new configuration.
Starting point is 00:12:12 So we can run, say, one microservice with a specific configuration and compare how it behaves against the old configuration running on the rest of the service. It's the same approach, but in production.
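One way to picture the "canary configuration" idea: run the new configuration on a single instance, compare its latency with the instances still on the old configuration, and roll it out only if it is no worse than the baseline within some tolerance. This is a hypothetical sketch, not Akamas's actual promotion logic; the 95th-percentile metric, the 5% tolerance, and the function names are all assumptions.

```python
import math
from typing import Sequence


def p95(latencies_ms: Sequence[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def promote_canary_config(
    baseline_ms: Sequence[float],
    canary_ms: Sequence[float],
    tolerance: float = 0.05,
) -> bool:
    """Roll out the canary configuration only if its p95 latency is within
    `tolerance` (as a fraction) of the old configuration's p95 latency."""
    return p95(canary_ms) <= p95(baseline_ms) * (1 + tolerance)


baseline = [100.0 + i for i in range(20)]  # pods on the old configuration
canary = [95.0 + i for i in range(20)]     # one pod on the new configuration
print("promote" if promote_canary_config(baseline, canary) else "roll back")
```

Because the comparison uses real user traffic rather than a synthetic load test, a production setup would also need enough samples on the canary before trusting the percentile.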
Starting point is 00:12:27 And you can decide to roll it out. Yeah, and without a load test, so you can actually measure what users experience. That's great. You look at the deviation before you fully step onto the next one. Everything... so, one of the things we were just talking about with Justin at Red Hat: one of the changes in CoreOS
Starting point is 00:12:44 is that you can now manage its deployment as if it were code. You're just pushing it. You're not actually going in, logging in, and making updates. You have an OS update that's being pushed as code. So this idea of everything as code, where everything has a push event, treating your
Starting point is 00:13:00 settings as canary releases: it all fits together, and it's trending everywhere, from OS updates to your code updates, obviously. We had someone on a while back, I forget who, talking about network as code, putting out your network configurations as code, which you can then, again, roll out in a canary or blue-green style.
Starting point is 00:13:17 And it's amazing how this idea has taken off, and it's great to hear that you're right on top of it. Anything else? Yeah, any news or things that you see coming or happening at Akamas soon that you're excited about? We were saying earlier, like,
Starting point is 00:13:35 if you can't give spoilers, but... Well, there are some things that we can say and some things that we cannot say. Yeah, understandable. But of the ones you can share, what are you excited to see happening soon for Akamas? So one of the things that I'm most excited about, and then Luca, you can go with yours, is that we are expanding the ecosystem of partners that we work with. At Akamas, we are investing a lot of time working with Dynatrace, both from a monitoring perspective and with Keptn. We're working a
Starting point is 00:14:04 lot with Neotys to have an integration in place, because what we want is for Akamas to be part of your cycle and to make that as easy as possible. One of the things we have is that Akamas is actually on the Dynatrace marketplace now. So there is this continuous involvement in the closed-loop methodology that we keep recommending to our customers from a performance engineering perspective, not just with Akamas, and that's amazing.
Starting point is 00:14:31 So the ecosystem of people we keep working with helps us find new ideas and new directions. And the partnership we have is amazing. Yeah, I do believe it's a great mission you're aiming for, because in the same way as monitoring, which I personally think should be everywhere so that you know what is going on in your system (probably not many share that view), what you're describing, what Akamas does, should also be on every system, so that you can tweak and tune it and keep it in the best shape possible. Everything.
Starting point is 00:15:06 It's a good purpose that you're trying to serve. It's pretty cool. Hopefully you'll get there. Before we get to Luca's answer, I just wanted to mention that last year the trend was API integrations between different tools, where you get tools to do things they weren't initially designed to do. Because that API exists, someone comes along and says, ah, I can use that to do this. And it's like, wow, what a great idea.
Starting point is 00:15:34 And when you talked about the trio of Neotys, Dynatrace, and Akamas, and how there are all these API connections, you have your observability piece, your load piece, and your tweaking and tuning piece all working together. Behind the scenes, once you have it all connected, you don't need a human being in there
Starting point is 00:15:54 tweaking things and doing stuff. It's just like an endless loop of robots talking to each other. Exactly. Suddenly becoming aware and alive and then the Terminator comes and destroys us all. Becoming Skynet. That's not a good point.
Starting point is 00:16:06 The most optimal Skynet. Yes, exactly. But Luca, what were your thoughts? For the future, okay. For sure, next week there will be a big announcement: the official integration with Neotys will be public. Right now it's working mostly in our labs and with, let's say, beta customers, but it will be officially public next week.
Starting point is 00:16:26 So we're quite happy about that. Another really interesting point is the approach we are starting to take with our customers: the idea is to build a community where everyone can contribute and add new technologies to the scope.
Starting point is 00:16:42 So we have this concept of an optimization pack, which is a piece of knowledge where we put everything we know about a given technology, for instance all the parameters that are relevant for that technology and all the metrics. The idea is to make this a sort of public space where anybody can add new technologies, to help Akamas improve and also to help other customers working with the same technology. And another piece of news could be, I'm not sure when it will be available, but it's to have a way to...
Starting point is 00:17:15 Pay attention to what you're saying. Yeah, because the dev team is listening. I know the dev team is listening, so I know that when I come back, they will say, no, no, no, you just revealed our secrets. It's a way to simplify a sort of trial or adoption, because right now we don't have a trial, and customers keep asking us for one.
Starting point is 00:17:37 Like an initial integration or hookup: here's a taste. Yes, exactly. You pay for the next one. It's almost like what a drug dealer does, right? You get a little taste, you get hooked on it. The first one's always free. Yes, exactly.
Starting point is 00:17:53 Pretty cool. Anything else on the general technology side, things happening in the tech world outside of what you specifically do, that you're excited about? And smile for the camera. Which camera? That camera.
Starting point is 00:18:13 There's also a live stream. Just in general, outside of what you're specifically doing, when you look at what else is going on, what makes you think, that's really, really awesome and I want to see where it goes? So one thing I'm most interested in, and I see a lot of customers going in that direction,
Starting point is 00:18:34 is basically what was also mentioned today on the main stage: the whole concept of automation. Akamas fits into it, but this isn't just about promoting Akamas; it's something that every company, every one of our customers... Akamas is part of a broader group that provides performance engineering consulting services. So we work with customers to help them optimize their performance engineering practice, whether APM, load testing, capacity optimization,
Starting point is 00:19:03 and so on. And everywhere we go, automation, and thinking about what's going to happen in the next five years, is becoming the leitmotif: how do I automate as much as possible so that I need fewer people involved in my operations? And this is important. I've seen a shift. A couple of years ago, this type of conversation led nowhere, because people were scared about what they would be supposed to do once everything was automated. Now, instead, everyone is starting to think: it's not that I'm losing my activities, but I will shift to something at a
Starting point is 00:19:43 higher level, right? So I don't spend time doing performance analysis and tests; I educate other teams. You can do more of those activities, or do them better. Or even manage all the automation. Someone's got to do that too, right? And teach people how to embrace it, making sure that the application developers, who are actually the ones pushing new code, are aligned with the practices I'm suggesting they use.
Starting point is 00:20:06 And this shift is becoming more and more prominent, above all here, because applications are being pushed much, much more frequently, and companies cannot keep hiring new people to grow the team. So the answer is automation, and making it as easy as possible for anyone in the company to embrace.
Starting point is 00:20:22 So automation and visibility are the two main changes I've seen. Even Dynatrace went from being a performance engineering tool to a place where everybody in the company has to see everything. Yeah. Anything on your side, Luca? Yeah, let's say that I'm based in Europe, so my perception of the IT market could be a bit different. I am the lucky one. But what I've noticed over the last few months is that finally, even in Europe and even in enterprise companies, the approaches that Andrea just mentioned,
Starting point is 00:20:58 even Kubernetes and microservices, are becoming a reality. Maybe it's just a pilot project, some experiment. They're dipping a toe in the water, right? Yeah. Attending conferences like this one, I've been learning about microservices and Kubernetes for years.
Starting point is 00:21:18 But in my daily job, I never had the chance to work with them at my customers. We have them in our lab, because with Akamas we need to work with this kind of technology, but I never had the chance to work with them in the real world. Now it's happening, and I'm quite happy about that, because
Starting point is 00:21:31 it will enable new kinds of approaches to testing, performance, optimization, and so on that were not possible with the old approaches: monolithic applications, long deployment processes, and so on. So I'm really glad that in Europe, too, we are getting there. With some of the customers who were early adopters,
Starting point is 00:21:52 or when you started to talk about it, did you get reactions from the people you were presenting it to, like it was science fiction? Some of them, yeah. I have to admit that for them, it was like: okay, you can automate all that? Yeah. Yeah, it's something where in some regions, as you mentioned,
Starting point is 00:22:15 everything seems to be flowing, and others are just like, this is way ahead of us, and I don't know if I can adopt it and customize it. Yeah, it's true. It's also true that a lot of companies are now starting to move, with the idea that they will be disrupted if they don't make these types of changes. So what basically happens is that they are more,
Starting point is 00:22:34 they are planning the transformation to the cloud and to a Kubernetes infrastructure, because if they don't, they won't stay in business. So entire organizations are restructuring to move from a monolithic approach, with a database team, an application team, and a network team, to a fully horizontal one, where there is a team that supports the entire business process.
Starting point is 00:22:53 So there is a team that supports the entire business process. I see also those as well. Again, probably in the U.S. it's much easier than in Europe for the type of way of understanding business, of running IT and running business. But I've seen those type of changes and it's happening more and more and more frequently. Great. And how long are you
Starting point is 00:23:13 out here for? Sorry? How long are you in Vegas? I've been here since Friday, and before flying back I will be attending the CMG Impact event next week at the Westin. So it's two weeks here, and I'm not ready for that. Oh, wow.
Starting point is 00:23:30 Is this your first time in Vegas? Yeah, first time. Okay. Are you enjoying it? Yeah. Let's say that I'm starting to get over the jet lag today, so I'm starting to understand where I actually am. Do you have to head back right after the show,
Starting point is 00:23:44 or are you going to spend some time checking some things out? No, I will be back at the end of next week. Oh, you're going to CMG as well. Okay. So you two are together. Well, we're based in Boston, so if anybody wants to visit us there anytime soon, please come on by. We have, yeah, we're in Waltham.
Starting point is 00:23:58 You've got to say Waltham. Waltham. As they say it up there. It's not Waltham, it's Waltham. Two A's. But, yeah, that's awesome. Boston accent. You have to apply to that.
Starting point is 00:24:07 You know, I like to do accents, and I like to be silly and all, but the Boston accent is a very tough one for me. I refuse to attempt any sentence that comes anywhere close to it. Yes. I don't have a Boston accent. All right, excellent.
Starting point is 00:24:22 Well, really, thank you guys for coming by today, and it's awesome to hear all the updates. Knowing that we have all these integrations, I look forward to seeing more of what we're doing together. And also, would you like to give a heads-up on where people can find out more about
Starting point is 00:24:38 Akamas or hear from you? So people here at Perform can come to the Moviri and Akamas booth, where we can show the tool and how those results can be achieved. Otherwise, I invite them to visit akamas.io, pronounced as it's spelled, with a K. Okay. There they can find resources and all the information about how Akamas works and how to get in contact with us. Excellent.
Starting point is 00:25:03 All right. Thank you. Thank you very much, guys. Thank you so much.
