PurePerformance - How to scale Performance Engineering in enterprises with Roman Ferstl
Episode Date: April 12, 2021

Performance Engineering is not about running a performance test twice a year. That is just a poor attempt at validating your non-functional requirements. Roman Ferstl, Managing Director at Triscon, discovered his love for performance engineering while optimizing code for software used in a space program. He then founded Triscon, which is now helping to establish and scale performance engineering at large enterprises. In this episode we get his insights on how he approaches a new project, which bottlenecks to address first, and how to motivate more people within an organization to invest in performance engineering.

If you want to learn more, don't miss Roman's presentation from Perform 2021, titled "Turbocharging your Performance Engineering teams to scale efficiently."

https://www.linkedin.com/in/roman-ferstl/
https://www.triscon-it.com/en/
https://perform.dynatrace.com/2021-americas/breakouts-single-day-3-turbocharging-your-performance-engineering-teams
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always I have my very wonderful, talented, extraordinary, busy co-host Andy Grabner.
Andy, how are you doing today? I have to come up with some more unique descriptors of you, or more interesting ones. How are you, Andy?
I'm good, but it's definitely better than devil, because you used devil earlier when I jumped on the recording session.
I know this is something we shouldn't tell anybody,
but I just want to make sure the world knows
that we have the nice names
and then sometimes we have the interesting names.
Well, it wasn't done in a negative context.
In fact, many people can argue that the devil is a positive.
But it was more about a "speak of the devil" reference when Andy popped in.
So by default, you are a devil.
You're wearing your headset with your thing pointing up so you have a single horn.
So you can either be a unicorn or a devil.
But anyway, we are going...
I don't even know where you brought us, Andy.
It's usually me that goes off on, off the rails here, but you started this with your
devil, with your devil worship.
Yeah.
But let me bring it back, because the reason why I interrupted your conversation when you called me the devil is that you were actually on a call with Roman, Roman Ferstl, who is our guest today.
And I want to say, hi, Roman. How are you doing?
Hi, Andy. Hi, Brian. I'm fine. Thanks. Greetings
from Austria, Vienna. See, and that's the fun thing, Brian, also for you, because I know you
love Falco. And when we, when Roman and I...
Oh yeah, I remember that. That went over Roman's head, I could tell.
You made a Vienna Calling reference, right?
Is that where you're going?
So I was watching the video and Andy made the Vienna Calling reference
and you kind of looked at him a little bit like, huh?
Because obviously, I mean, Falco probably is not as popular as he once was.
What, 40 years ago? 50 years ago at this point?
So Roman, I don't think you picked up on that right away. Was that the case, or was it stage fright?
Of course, anyone in Austria, or everyone, knows Falco. I was just not expecting Andy to start with a Falco reference. He really got me there. But yeah, again, today
I guess Vienna is calling.
Yeah.
I wasn't sure if it was just... Because I'll make musical
references here and there, and then I'm like,
wait a minute, I'm a lot older than a lot
of these people, and they probably have no idea what I'm talking
about. Although I was just talking to
an account rep I work with, and I mentioned
I made a Radiohead reference.
You all know Radiohead, right?
Yes, sure.
One of the biggest bands, U2 level, sort of popular.
He had no idea.
He's like, Radiohead?
I'm like, oh my gosh.
Okay, anyhow.
So anyway.
It is what it is.
So that was at Perform, right?
So Roman was on virtual stage with you at Perform.
Exactly.
And I was so fascinated, not only from the Perform conference presentation, but all the stuff we've been doing over the last year or so.
No, more than a year, because I think we met back in 2019 at a Neotys event and then stayed in touch.
Roman and his team have helped a lot with the innovation around Keptn, the quality gates,
and also has been helping a lot and giving us a lot of great feedback on how to better use Dynatrace,
especially now in performance engineering environments.
And well, no, Roman, first of all, before we go into the talk,
because I really want to talk about some of the findings and the things you said at Perform at the Breakout,
which was titled Turbocharging Your Performance Engineering. I want to first give you a little
chance to introduce yourself, who you are, your background, how you came to performance,
because I think that's always interesting so people can relate. And also the company
you run, Triscon, maybe just a little background on that.
Yes, sure. So my name is Roman. I founded Triscon a couple of years ago, a company fully dedicated to all topics around performance. It all started with a focus on performance engineering and performance testing, so we do a lot of load tests and stuff like that. And more and more, our attention was also drawn to APM and Dynatrace. So this is another thing we have in our portfolio: tasks like setting up APM solutions, integrating APM solutions at customer sites, and caring about the processes that come with it, including DevOps approaches.
And these two things we do there, they actually link together.
So performance testing obviously benefits a lot if you have a proper APM solution.
So this is kind of how we were drawn into the Dynatrace world. I'd say it's also how we met you, Andy, and I just want to say thank you back. I'm also very grateful that we met back in 2019 at the Neotys conference. A lot of good things have happened since then, and I think it was a hell of a journey, also a hell of a journey at some customer sites that I'm going to talk about today.
The way I got to Perform was actually that we started from scratch implementing performance testing at a customer site.
And also we started from scratch there a couple of years later,
implementing Dynatrace as their APM solution.
We combined these approaches and we had huge success there.
But let's keep that as a cliffhanger for now, maybe.
Go into detail about it a little bit later.
But were you always interested in performance?
Or what made you found that company focusing specifically on performance?
Okay, so my personal motivation for this is, I think of, well, I will try to keep it short.
So I was always drawn to complex problems in general.
And IT has really been there through my entire life.
I studied actually astronomy and astrophysics,
and I was developing, together with a science team in Austria, algorithms for a space telescope. What I did there was checking out the performance of centroid algorithms. Those are responsible for the star not losing its position, or actually for telling the attitude and orbit control system of such a space telescope where the star is. Of course, you need to compute it from the images, and it has to operate completely autonomously, so it's a huge challenge. And since I was actually trying to get into science, and then I ended up coding again, sitting there in front of my laptop, writing code, dealing with performance,
I just thought, well, this is it for me. It's the thing I love.
And so I started soon after that, actually, Triscon, where I focus now entirely on performance topics.
And it's not only me. I have awesome colleagues around me who feel the same about these topics, and we care about the entire process around software development.
That's amazing. I was listening to that.
Especially going, like, you know what, I think this rocket science isn't as interesting.
When I watch, anytime there's a SpaceX launch, right, I just imagine how many people are like, I wish I could get into that field, because it's a brand new, exciting field. It's amazing, first of all, that you were doing that stuff, period. But then of course, with Andy and me having performance so close to our hearts, that you chose this to come back to and put your amazing talents there.
And I don't know if it's the same kind, because you said astrophysics, if that's the same as an astrophysicist, but I'm not sure if you're aware of Brian May from Queen, the band Queen.
He got his degree in astrophysics, I think maybe sometime in the last 10 years, finally.
He's been studying that like forever.
Anyhow, total sidetrack.
Awesome.
When you mentioned that, I was like,
oh, you and Brian may have something in common now.
You should call him up.
Yeah, sure thing.
If you give me his number, I will call him.
So Roman, one big challenge that I think we have in the industry,
even though we have occasional people like you coming in
and entering the performance engineering realm,
there's still not enough people, I think, in performance engineering. However,
what you have been doing and the story that you told at Perform was really about how can a small
group of people actually have a major impact in a large organization? And mainly, not just by,
and this is also what I want to make the point here, this is not just by installing an APM tool, whether it's Dynatrace or any other tool.
But you had a very interesting approach and also what you explained that, you know,
you walked through your story on where did they start first in automating performance?
Where did they get the biggest bang for the buck in the beginning before you then actually went into really leveraging APM. And then you also, in the very end of the presentation, you gave some more insights on,
hey, if I would start a project, if I would be you, these are the steps that I would do if I
would be you. So I want to kind of now hear from you when you enter this account or any account
that you're working with that are more, let's say,
traditional enterprises, and they are seeking from experts like you to help them with performance
engineering, what do you do? What can you tell people that are maybe in a similar situation like
you, either external consultants or maybe part of organizations? How do you really, you know,
speed up and turbocharge your performance engineering?
What are your recommendations?
All right.
So there's a couple of things that you should do if you arrive at a fresh new site and check
out the new customers.
So first, what I do is I try to evaluate where they're standing.
So in terms of what tools are they using,
what philosophy are they following,
a lot of people say they do DevOps,
and there is, of course, it's not binary.
It's not like you do DevOps or you do not DevOps.
This is something that you can do
to some extent, and it is directly linked to how
you should do performance tests or how you could do performance tests.
So if your DevOps philosophy is already quite widespread
in your entire corporation, you
may want to go for automating quality gates.
If you still have huge silos there, it is probably not even feasible to go there straight away.
So first of all, you should understand how the software is built.
And you should also check out how performance tests are present and executed, and if they are done at all. If you focus just on the topic of performance testing, the tool chain is interesting. This is something I would look at initially to get clear, because you can actually lower your maintenance time just by switching tools.
If you're using tools that are not that elaborate for load testing, there is script maintenance.
Every performance tester knows this, that most of the time in performance testing is usually spent with script maintenance.
So you could start automating there.
What we did is, actually, we started with performance testing way back, as I mentioned at Perform, in 2016 at Ergo Insurance.
And they did load tests there, but they did only about 15 to 20 tests per year
and this was actually due to a lot of manual work. Recently, I fell in love with a new topic, or it's not really new, but I dived into it a little bit, and it's SRE. So that's why I'm bringing in this reference.
So our approach there to get more out of everything that is present there,
to get more tests done, to become faster in performance testing,
to become more efficient, was actually to get rid of toil.
And toil is just Google's definition of repetitive manual work, more or less. I don't want to deep dive into SRE now, but I was just thinking of what our approach was. What made us successful there is that you pick up those pieces of work that are repetitively done. So if you have a test,
you run your test and you probably need to design your test first, of course.
If you execute your test on each release or on each build, you maybe need to adapt your scripts.
And this is already done manually in most cases.
So you may want to think if you can automate this process.
And some tools offer capabilities there,
which are really good, such as NeoLoad,
others probably are not as elaborate there.
So by simply switching tools, back then it was from just the Microsoft Visual Studio web test framework, with a lot of manual stuff to do there.
We increased our efficiency by 30 to 50 percent, time that we previously lost to script maintenance. And further on: since you are recording your tests, if it's an end-to-end test for a load test, you probably click through each step manually, and this is what most performance testers still do.
But there are functional testers out there who have already automated most of the test cases that are probably interesting for your performance tests. So our idea was just to grab those and reuse them to generate our scripts automatically, and even update them automatically.
So this gave us another 40 to 90 percent. And of course,
the final step, I would actually divide this. This is, so to say, the test design part,
which is huge in terms of effort and maintenance and stuff. This is one pillar for load testing.
First, you have to design it. Then you have to execute it; this is usually done quite easily, and you can automate it quite easily. And the third pillar is the test analysis. If you have a tool such as Dynatrace available there, you can build very smart things with dedicated dashboards for your load tests, and you can automate the entire result analysis, as we did.
And if you put this together, if you automate the entire process and get feedback on whether or not it's good to go to the next stage, then you can go for an automated quality gate, once you have automated all these things.
But to summarize, first of all, you should check if you could save some time for your script maintenance,
maybe switch tools, maybe use automated tests that are somewhere else in Tosca, Selenium,
whatever tools you're using.
I'm 100% sure that at any bigger corporation there is test automation present. It's just a matter of whether your load test tool allows you to import these scripts on the fly as they're executed. And of course, if you have a proper APM solution there for test analysis, make use of it as much as you can. It's so awesome in Dynatrace that straight away after my load test, I can identify each and every single request that I made, instantly find it, and go for the root cause analysis.
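As a sketch of what "automating the entire result analysis" can mean in practice: after the load test, pull the metrics for the test window from the APM's REST API and evaluate them in code instead of eyeballing dashboards. The JSON layout and metric IDs below only mimic a metrics-query API (Dynatrace has one at `/api/v2/metrics/query`); they are illustrative assumptions, not the real schema.

```python
def worst_value(api_response: dict, metric_id: str) -> float:
    """Pick the worst (maximum) datapoint for one metric out of the query result."""
    for result in api_response["result"]:
        if result["metricId"] == metric_id:
            return max(v for series in result["data"] for v in series["values"])
    raise KeyError(metric_id)

def analyze(api_response: dict, thresholds: dict) -> dict:
    """True = the metric stayed within its threshold for the whole test window."""
    return {m: worst_value(api_response, m) <= limit for m, limit in thresholds.items()}

# A fake query result for the load-test time window (shape is an assumption).
response = {
    "result": [
        {"metricId": "service.response.time_p95", "data": [{"values": [180.0, 210.0, 310.0]}]},
        {"metricId": "service.errors.rate", "data": [{"values": [0.0, 0.5]}]},
    ]
}
verdict = analyze(response, {"service.response.time_p95": 300.0, "service.errors.rate": 1.0})
# verdict -> {"service.response.time_p95": False, "service.errors.rate": True}
```

The point is the shape of the automation: once the worst values per metric come out of an API instead of a screenshot, the pass/fail decision is a few lines of code that can run after every test.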
Okay, Andy, I want to dive in.
Before you go, I want to dive in on two things there
because they're a little bit new to me
and I wanted to see maybe if, Roman,
you can share with our listeners how this works.
So you mentioned NeoLoad, and we all know and love NeoLoad.
Part of the automation of once you record the script,
the click and record, which has been a great feature
of load testing for a long time now,
the pain point has always been the correlation
of data points in the end.
I know, again, I haven't really done
any earnest load testing since 2011,
so 10 years of possible improvements
have been in there since.
Have there been improvements to the correlation engine?
I know when I finished,
LoadRunner was trying to do something there
and it would sometimes catch things,
sometimes not.
Part of that speed and automation, or maybe the drive to automate the scripts, would be a better correlation engine. Does that exist yet, or is that still something that's time consuming?
Very good question.
So when it comes to script maintenance, let's stick with NeoLoad. For performance testing,
think of it that you do not automate browsers.
You automate on a protocol level,
and this is giving you a hard time.
Why is this giving you a hard time?
Think of it, you script some,
or you record some login process with a user,
and you proceed through some forms,
maybe a webshop, and we're including a checkout.
Each time this user logs in, he gets another session ID.
And as soon as you replay the scripts
and your session is not valid anymore, it's going to break.
So these are the things that you want to correlate.
This is what we're talking about.
And to automate this process, you always want to get the session ID fresh and use it for all consecutive requests in your performance test script. You can automate this with framework parameters.
So you can define an automation in NeoLoad, telling it: each time you find this pattern in the responses, replace it with the variables needed for the correlation. Neotys offers some out-of-the-box correlation, for .NET and JSESSIONIDs, for instance. So there is out-of-the-box technology, but of course, in 99% of cases, you still have to do this manually at first.
But as soon as you say, I do the correlation manually,
you can say move to framework parameter.
And the next time you record the script,
it's automatically correlated.
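The mechanic behind such a framework parameter can be sketched in a few lines: extract the freshly issued session ID from the login response, then substitute it into every recorded request before replay. This is a hypothetical, minimal illustration (the pattern, helper names, and request strings are made up), not NeoLoad's actual implementation.

```python
import re

# The "framework parameter": a pattern that, whenever it appears in a
# response, yields a value to re-inject into all subsequent requests.
SESSION_PATTERN = re.compile(r"JSESSIONID=([A-F0-9]+)")

def extract_session_id(response_body: str):
    """Pull the server-issued session ID out of a login response."""
    m = SESSION_PATTERN.search(response_body)
    return m.group(1) if m else None

def correlate(request: str, session_id: str) -> str:
    """Replace the stale recorded session ID with the fresh one."""
    return SESSION_PATTERN.sub(f"JSESSIONID={session_id}", request)

# The recorded script still carries the session ID from recording time...
recorded_request = "GET /cart HTTP/1.1\r\nCookie: JSESSIONID=AAA111"
# ...but each replay's login hands out a new one.
login_response = "Set-Cookie: JSESSIONID=BBB222; Path=/"
fresh = extract_session_id(login_response)
replayed = correlate(recorded_request, fresh)
# replayed now carries JSESSIONID=BBB222 instead of the stale AAA111
```

Once a rule like this is defined, every re-recording of the script can apply it automatically, which is exactly why the manual correlation effort is paid only once.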
So I have demos available.
If anyone who is interested in what I'm saying about now,
flick me an email.
I can show you automatically generated load test scripts from scratch with Selenium, where we just have an application. The browser goes up, and NeoLoad is listening; it acts as a proxy for all the traffic that is generated by the Selenium-driven browser. And after the test case is done and the clicks were made in the browser, NeoLoad kicks in with the post-processing once the script is already there. And this is when these framework parameters are executed and the magic happens.
This is one thing.
And there's a second thing, which is cool.
It's called the maintenance mode in NeoLoad.
So what you also have is if you re-record something that is already correlated,
you can say, I don't want a new test case.
I want to maintain this one.
And then NeoLoad tries to keep the correlation that you've already done manually or with framework parameters, and it doesn't throw it away anymore.
So with these two things combined,
you reach a really, really high degree of automation here
for the correlation part.
Great, and you answered my second question.
It was going to be about the NeoLoad Selenium integration,
but it sounds like you're just telling NeoLoad to record
the Selenium browser.
It's not like it's going to ingest the script and generate it from that.
But I think these are important things to identify
because I know myself when I talk to people,
there's always a struggle to improve performance.
A lot of times people are like,
oh yeah, we don't have a performance team yet.
And they always start looking at open source,
say like JMeter.
Now there's nothing wrong with JMeter,
but when you're using something like JMeter, a lot of these
designer features, let's
call them, because they are amazing, but they're
not 100% required just to execute
a load test. They're not in there.
JMeter is a lot more of a manual
process. So I think these are really important things
for people to consider when they're looking
at load testing tools.
I'm not here to make a commercial for NeoLoad.
We all know it's got a lot of great things going on with it.
But at the same time, when you're having those considerations,
what else is it going to get you?
If you're going to pay for your tool instead of going for the free,
that's going to get you to this point that you're talking about
where you have all these automation capabilities
as opposed to we have a checkbox of a load tool.
Now we have no time to do it because it's all manual.
So these are important.
Yeah, it's important for people to understand these contexts.
So thanks for explaining that some more.
Actually, I want to add one more thing here
because what is awesome is
if you do the correlation stuff for one app, you're testing one app. And usually in performance tests, you focus on the happy cases, on the things that happen most in production or are critical, and you do not do all the functional tests. So it's probably a handful of test cases that you have, let's say five to ten. If you do the correlation once for one application, the IDs that you correlate for your first test case are highly likely the same for the second, third, and fourth test case. If you have some order ID or something, and you're just ordering in different areas, the technical parameters are the same.
you even speed up your own test scripting initially, not only when you maintain your script. Even in the script generation, you already get a huge boost in efficiency there.
Especially, I guess, because if you're working with an organization, they most likely share very common, let's say, frameworks on the application side, like you mentioned earlier, JSESSIONID or ASP.NET; they do it themselves in a certain way. And if you record it, if you define these rules once, then you can use it in all the apps that are basically coming out of the same application teams, or application teams that are building similar apps.
Yeah, this is exactly how we're doing it. So for these frameworks, you can define a name. I usually give it a name like the tested app. Let's say, I'm not creative today, let's have Webshop A. So I have a Webshop A framework in NeoLoad, consisting of all the parameters that are necessary.
And I think for me, the way you explained that story,
and because you started on script generation or script maintenance is the big thing.
This was an aha moment for me when I saw your slides
because we are always pushing from the,
you need to automate your quality gates.
We have built all these great things around Keptn
and you can pull in the metrics
from any monitoring tool for every test.
Now you can run your tests 50 times a day.
And then you said, well, I cannot start there
because I cannot run my tests 50 times a day
because it's so much manual effort
because right now we may use the wrong tools, and therefore we have all this toil, and that doesn't give us the benefit of having quality gates in the end. So this was kind of that aha moment for me, and that's why I can really encourage everyone to look at your presentation, and especially the last slide, where you said: start with getting better maintainable scripts, automate your script generation and maintenance, then start with the next steps. Quality gates are important, but you don't get the benefits if, for the quality gate run, you need to spend two hours fixing your scripts.
Exactly, that's the point. And to pick up on that, if you reach the point where you think about automating quality gates,
what is the next step?
If you say, well, I'm fine.
I do not spend actually a lot of time for script maintenance.
So how do you take the challenge?
What's the next step to building an automated quality gate?
So first of all, you do not want to start with a front-end end-to-end test that you want to automate, as it's more difficult in terms of the result and the automation of the result analysis. So best practice would be: pick a microservice. The world is becoming smaller and packaged in microservices anyway. So this is awesome.
If you have a microservice there, probably a developer or DevOps person is sitting behind it, and they will even be very grateful for your automated feedback. Because in this philosophy, for microservice people, the people who build the app care about what happens to it in production.
If they are SREs, they want to keep the error budgets on a certain level.
So they're really, really grateful if you can offer them to do that.
So first of all, go into discussion, communication with these people,
tell them what you want to build. And this is, of course, from the perspective of a performance
engineering or testing expert. If you have a center of excellence there, you probably have it because performance testing in itself has a huge and steep learning curve, with all the things to consider.
So it's good if you have the center of excellence
and if you want to provide your performance tests as a service
to the DevOps people, to the guys building or folks building the microservices.
So you go into the communication,
you pick a service,
you tell them exactly what you're going to do.
You design a test speaking with them
and what they care about to identify the metrics.
And you do not break their builds.
You do not go and somehow hack their pipelines and get an automated quality gate in there.
No, the first step would be to automate the feedback
and provide it to them as a service as often as they want it.
So enable them to kick off your test.
It can be via NeoLoad Web on a GUI.
It doesn't need to be in a pipeline.
And if you've done this a couple of times and you have automated the feedback loop, then you can think about integrating it into the pipeline. That would be the next step. Now, maybe this sounds a little bit too spooky to the performance testers out there, like: what is he talking about? How do you automate the analysis? It doesn't work at all. So if you're thinking that, I want to catch you up here. Andy showed us a way how we could use Keptn. And so we were working together a lot, thinking about how Keptn could help us to build an automated quality gate.
And I want to break it down to some simple statements.
So Keptn can help you to build a final score from your metrics.
And when I say metrics, it's everything
that you define in a load test anyway.
You probably have your load testing dashboard,
and you print somehow visually
what how your test was performing you you build your graphs you check your cpu level you check
your error rates you check your response times you check whatever you want and then there is an
automated way captain offers it to combine all these things into a score. And again, you are able to define
the goals and the parameters on that. But once you have done this step, then you can make the
decision simple again for others. And that is the point. Because you are the expert as a performance engineer, and others, if they have not been doing this for months or decades, don't know what to look at among all the different hundreds of metrics that you have to care about in performance testing. So you take this away from them. You just say: okay, this is predefined, I have thought about this, and I offer it to you as a service, and you can use it as much as you want.
That is the idea.
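The scoring idea Roman describes, many metrics collapsed into one simple decision, can be sketched like this. Metric names, limits, and score boundaries below are invented for illustration; Keptn's real SLO definitions are more expressive.

```python
def score_metric(value: float, pass_limit: float, warn_limit: float) -> float:
    """Full credit within the pass limit, half credit within warn, else zero."""
    if value <= pass_limit:
        return 1.0
    if value <= warn_limit:
        return 0.5
    return 0.0

def evaluate(metrics: dict, slos: dict):
    """Combine all per-metric scores into a 0-100 total and a pass/warn/fail verdict."""
    scores = [score_metric(metrics[name], *limits) for name, limits in slos.items()]
    total = 100 * sum(scores) / len(scores)
    verdict = "pass" if total >= 90 else "warn" if total >= 75 else "fail"
    return total, verdict

# The expert defines this once; everyone else just consumes the verdict.
slos = {
    "response_time_p95_ms": (200, 300),  # (pass limit, warn limit)
    "error_rate_pct": (1, 2),
    "cpu_pct": (70, 85),
}
metrics = {"response_time_p95_ms": 180, "error_rate_pct": 1.5, "cpu_pct": 60}
total, verdict = evaluate(metrics, slos)
# two passes and one warn -> total of about 83 -> "warn"
```

The design point is exactly what Roman says: the thresholds encode the performance engineer's expertise once, and the consumers of the gate only ever see pass, warn, or fail.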
One thing I wanted to touch upon that you just mentioned in there, right towards the
end, you said all the hundreds of metrics that you track, right?
And if I go back, or most of us go back to pre-APM load testing, we're talking about
tens of metrics. Host CPU, process CPU, total response time for the transaction, exceptions or errors maybe. Maybe not even exceptions necessarily, just errors. Very high-level views
into this data. And when you add the APM side, like Dynatrace
for instance, you can get into more microscopic measurements.
We can look at CPU time per transaction, the response time from service to service, the number of database calls, time spent in the database. So many other different kinds of metrics, which then leads someone to think: oh my gosh, if I expand into hundreds of metrics, how the hell am I going to manage all this? I have a hard enough time copying and pasting my metrics into Excel and creating graphs and all that. But that's exactly where this comes in. You can expand into that complexity and automate the analysis, so that you don't have to deal with that number, but you get much richer responses. We always talk about failing early, finding the problems before they get too big. So if you're running your code, even, you know,
one thing I look at a lot when we're in our demo environment is the multidimensional graphs, Andy, and I'll show response time by transaction by lock time, right? And it's always zero. I'm like, great, it's zero. It should be zero. I want to be able to track that at zero with every freaking build, because if it suddenly is not zero, something's wrong. You know, it's something you would never normally think about doing. You now have these kinds of automation pieces opened up, combined with the ability to observe those data points, and that really opens the world to making
performance testing so much more powerful and so much more informative. And I always say, although I'm really, really glad to be on the sales engineering side,
if I was back in it, what a great time it would be back to be on the performance side
because there's so many awesome things you can do.
Anyhow, sidetracked there.
Well, not really sidetracked, but just wanted to focus on that hundreds of metrics
because it's really, really important.
Yes, it's absolutely right.
So these hundreds of metrics,
just for the people who are not now
from the field of performance testing,
you can actually divide them into three areas.
That's what I do.
That's my definition.
So you care about three things in performance testing.
You care about stability.
So you have metrics to measure your
stability error rates crashes and stuff like that this is one pillar and one thing to look at and
there is a tree of hundreds of metrics that tell you how your availability and stability is and
then there is the second thing obviously for performance tests are your performance metrics
like response times etc and the third thing you care about is resource consumption.
Because if you want to transition from one stage to the other
or upgrade or update an app in production, you want to know three things.
You want to know, does it crash?
You want to know, is it getting slower?
And you want to know, is it getting more expensive?
And those are the three things that performance tests can answer, and a lot more, of course, but I want to break it down to some simple things, and below that there are hundreds of metrics that you can care about.
And the awesome thing that you mentioned, I want to give a recent example for this, because I actually want to encourage the performance testers out there. I was talking not to other performance testers, not to tech people; I was talking at C-level of a very, very big company.
And we showed them the automated quality gates and what we can do with them. This specific use case was to build an automated quality gate to track the expenses from the mainframe. You can put this into a metric, the number of transaction calls you or your microservices make, because this translates directly to your license cost. And if this increased by a factor of 10,
you would like to be informed, of course.
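As a rough sketch of the kind of automated quality gate Roman is describing, here is a minimal Python example. The metric names, thresholds, and values are all hypothetical; in a real setup they would come from your load-testing tool and your APM via its API:

```python
# Sketch of an automated quality gate over the three pillars described in
# the episode: stability, performance, and resource cost. All metric names
# and numbers below are made up for illustration.

def evaluate_quality_gate(metrics, thresholds):
    """Return (passed, violations) for one test run's metrics."""
    violations = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: metric missing")
        elif value > limit:
            violations.append(f"{name}: {value} exceeds limit {limit}")
    return (not violations, violations)

# Hypothetical baseline from a previous known-good run.
baseline = {
    "error_rate_percent": 0.5,               # stability pillar
    "p95_response_time_ms": 800,             # performance pillar
    "mainframe_transaction_calls": 120_000,  # cost pillar (license-cost proxy)
}
# Current run: stability and performance are fine, but the mainframe
# transaction count jumped 10x, like in the example from the episode.
current = {
    "error_rate_percent": 0.4,
    "p95_response_time_ms": 750,
    "mainframe_transaction_calls": 1_200_000,
}
# Allow at most 20% growth over the baseline for each metric.
thresholds = {name: value * 1.2 for name, value in baseline.items()}

passed, violations = evaluate_quality_gate(current, thresholds)
print("PASS" if passed else "FAIL", violations)
```

The design point is that the gate covers all three pillars at once, so a cost regression like the 10x mainframe call count fails the run even though response times and error rates look fine.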
And by combining these approaches now,
you're able to do that.
Because what you mentioned is, as soon as you have Dynatrace, where you pull out your metrics, you have this information there and available for everything that happens below the surface.
If you're just taking the information from your load test, without manually gathering performance counters or setting up some automation to collect them from all the systems you care about, then you only have end-to-end response times and error rates. And defining all these performance counters and grabbing that information manually, that's toil, and you don't want to do that.
So if you have an APM solution
that is watching everything,
you want to make use of that
as much as you can.
And I like this idea of the three pillars. The last thing I wanted to say here is about the pillar of cost, the expense. At least for me, when I was in the load-testing world, I always loved bringing news of something breaking, because it was like, I did my job, I found something. Which is bad news for everybody else.
But when it comes to the expense side, like your mainframe example, make sure to highlight where people improve that cost. The mainframe one is very concrete in dollars and cents. Of course, compute spend on AWS and all that is a real thing too, but it's not as easy to show. With that mainframe one especially, if you were to turn around and go to your development team or their bosses and say, hey, the test passed, and we reduced costs by 15% as aimed, that's something everyone will celebrate. And that'll ingratiate you: hey, you know, Brian, the performance engineer, helped prove that we did our job and saved money for the company. So people will like you more, because you're not just the bearer of bad news. And when it comes to money news, people love it even more.
So a great way to make yourself more popular.
And I want to encourage, actually, everyone else who is listening
and doing similar stuff to do this.
What I've learned is that it's highly underrated. Performance testing and what it can give you, or any company, it's such a huge thing, and there are so many aspects to it that I'm almost missing the words, as you can see. But you brought it to the point: what people understand is, if you say we're getting faster, that is nice. But if you say we're not getting more expensive, we're even getting cheaper, that is something a lot of people care about.
So make this transparent as much as you can. If you ever have the possibility to put a price tag on your performance tests, on what you actually saved, do it. Go ahead and even tell me about it, because I love to hear these stories.
So then I have a challenge then for all of us.
You said it's not easy to show the value always from performance engineering to the company.
I think adding cost is great.
But the other thing that you mentioned in the very beginning,
you're really excited now about the topic,
and that's SRE, site reliability engineering.
So maybe is this the chance for us in the performance engineering community
to kind of use this new kind of hype that was thankfully created by Google?
Because in the end, site reliability engineering is also not magically new, right?
And can we use that hype and really latch on to it and say,
hey, you know, we need to do site reliability engineering,
which includes obviously performance engineering,
because in the end, as you just said, you're testing for reliability, you're testing for performance.
And then I think this is the great next point, because I've not seen Google, or at least
not the best practices around SLIs and SLOs talk a whole lot about costs.
And so maybe we can say, hey, we need to do SRE, but with our experience from performance
engineering, we want to elevate it even to the next level.
It's reliable, it is performant, and we're saving you costs.
Absolutely.
So this is something that I'm looking into really deeply now because I see huge synergies.
SRE is such a hot topic
and a lot of people are talking about it.
And what I do is I'm picking out
the cherries of this concept here.
SRE is huge.
There is stuff like you should do postmortems, etc.
And what you mentioned is SLIs and SLOs.
And if you think about it,
what you do as performance engineer is exactly this. So if you are thinking about doing SRE
in your company, then you will think about error budgets and you will think about how to measure
those. And this is exactly the job of a performance tester, who has probably been doing this in test stages for decades.
So it makes so much sense to combine these two.
And even SRE is a way to do DevOps for me.
That's how I define it.
DevOps is a philosophy.
SRE is some specific concepts to make DevOps happen.
And I think it's a really awesome thing to get a performance engineer in to provide this feedback loop. As I'm not at Google, I can't tell if they actually do this, but the job of an SRE is everything that happens around writing the lines of code. So everything that happens after it, up until production. Of course they are also devs, but what matters is everything that contributes to resilience. And this is actually what a performance test does. I've been thinking a lot about this, about framing performance testing in a new way, because it fits so perfectly into the entire idea
of what SRE is about.
Because, of course, you want to measure your SLOs, etc.,
in production.
But first of all, you need to think of what are your SLOs.
And if you want to keep your error budget low
and not always break it,
you need to do tests beforehand.
And if you do performance tests, it's like you need to define your metrics anyway.
And you already thought about this in test stages.
You thought about this in performance testing.
And these are the people who thought about it for decades.
So you can take these metrics and put them in production to
calculate error budgets. Basically the modern performance engineer
then is an SRE that enables an organization to always be able to deploy because the error budget
is always under control. Because that's what the whole concept of error budget
is about.
The error budget tells you, do we
have enough budget left for another deployment?
And are our deployments safe enough
that they don't eat up more error budget
than we have available?
And therefore, the modern performance engineer
is exactly that.
It's defining the right metrics, the right SLIs and SLOs
to measure the error budget and then do whatever it takes
to make the system more resilient
and also make sure that no bad deployments
can then impact your error budget in production.
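The error-budget arithmetic behind this can be sketched in a few lines. The 99.9% SLO target and 30-day window used here are illustrative assumptions, not numbers from the episode:

```python
# Minimal sketch of the error-budget question discussed here:
# "do we have enough budget left for another deployment?"
# SLO target and window are hypothetical examples.

def error_budget_minutes(slo_percent, window_days=30):
    """Total allowed 'bad' minutes for the window under the SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)

def budget_remaining(slo_percent, bad_minutes_so_far, window_days=30):
    """Minutes of budget still left after the failures seen so far."""
    return error_budget_minutes(slo_percent, window_days) - bad_minutes_so_far

budget = error_budget_minutes(99.9)                       # ~43.2 min per 30 days
remaining = budget_remaining(99.9, bad_minutes_so_far=30.0)
# The deployment question: is there budget left to risk a release?
safe_to_deploy = remaining > 0
print(f"budget={budget:.1f} min, remaining={remaining:.1f} min, deploy={safe_to_deploy}")
```

If the remaining budget goes negative, the answer to the deployment question flips to no, which is exactly the feedback loop described here.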
Exactly. That is what I thought about.
And of course, you may want to start there
with the most critical things.
Maybe don't automate quality gates
for things that are running smoothly for decades
and are fine on each change.
But if things already break now and then with every change,
then this is probably the way to implement
the safety mechanism of an automated quality gate
to get better feedback loops for the developers. That is what I'm thinking about. I have even more here to postulate.
And I don't know if you have ever thought about this. If you have Dynatrace available,
I want to stick with APM for a minute. If you have Dynatrace at the customer site, Dynatrace is being sold to monitor everything, with all the cool things you can do with it, and that's probably a lot to take in for new customers. That's what I see.
To get the most out of it,
this is actually something that we are working on
to unleash the full power
of what you can do with
Dynatrace. And this is not from a technical perspective, actually. What we see is that
the processes around Dynatrace, the organizational processes are missing.
So Dynatrace actually, in my opinion, can help you to do better DevOps.
And why is this the case?
Because if you think about SRE, what is SRE, what are its core principles, and what are the things to focus on?
Dynatrace is actually a platform that offers a lot of things to do SRE.
So you can build around processes there.
I want to give you an example.
So you care about availability.
You have synthetic monitoring in Dynatrace.
You care about SLOs.
You just implemented this.
You care about postmortems.
Is there any better way to discuss what happened
than with the Davis root cause analysis?
No, I can't think of anything.
So today I want to postulate Dynatrace as an SRE platform, please.
I like that.
And I'm pretty sure that in case any one of our sales engineers is listening
or maybe even participating in our conversation here,
you can take this and use this in your next sales pitches.
Yeah, absolutely.
This is really cool.
Roman, I know we could probably talk more, because you've done not only your work at Ergo, the company you were featuring before, but also work with other organizations. But maybe this is something for another episode, because I'm sure there's a lot of stuff
that you have done over the last couple of years
that will also help performance engineers
to become better and more efficient.
Kind of concluding for today's talk,
is there anything else you want to kind of tell
the performance engineers, any material,
the way to follow up with you,
anything else we want to make sure people understand?
So if anyone is happy to talk about performance,
just check out our company.
It's Triscon, spelled T-R-I-S-C-O-N, or just contact me on LinkedIn.
I'm always happy to talk about these topics and also to show some demos. Apart from that, a little further into the future, if you ever come to Vienna, I'm happy to invite you for a face-to-face meeting, to have some coffee together. There's a nice view from our office over Vienna, so you will probably remember it if you come by. And yeah, what is it actually that I want to tell people?
So I want to speak to the people who are frustrated out there,
who are listening to your awesome DevOps approaches
and all the cool things that you can do and that Keptn offers.
And what I want to tell you is that it's probably not possible to implement these awesome approaches straight away, and we're not living in a perfect world.
But what I have learned so far
is that you can pick out the pieces of these concepts
to make your own life better.
So this is what we did for our performance tests.
We applied SRE principles.
We automated our manual work away to be more efficient.
So you can always start there by yourself.
And then you can look around and find like-minded people.
The bigger your company is, the more likely it is that you'll find them.
And even if you still have these huge silos, where dev and ops are completely separated and apps are thrown over the wall from dev to ops and people don't care anymore, there are people out there who are like-minded.
I suggest that you connect with them.
And what I'm trying to do now, to apply DevOps approaches at a customer site where there are still these silos, is to gather a team, a virtual team, not an organizational team, of the people who are interested in everything that happens around the code that is being built.
So just in an SRE-like manner to build up these virtual teams
and to talk about how you can implement automation,
how you can automate your performance tests.
So to conclude, what I want to say is this: maybe you've been doing this for tens of years and you think that all this automation stuff is not happening in your company, that it's way too complicated, that no one cares.
It's not true.
I'm sure there's people there who care.
And if you care yourself, you can apply the principles in your daily work. And I suggest you connect with those very same people in your company and have a really deep conversation about what applies in your corporate world and what's possible there to break down the silos.
I had to take a lot of notes now at the end because this is really good.
I think we should take a blurb out of the last two minutes, because you started with, I want to speak to the people who feel the frustration. It's pretty good, really good.
Actually, it's good that you mentioned it. When I started that sentence, what I actually wanted to conclude with is that there are so many awesome things going on,
and the field of performance testing
is just becoming more and more important.
The stuff that you do is so extremely important
and it's going to be seen more and more.
And by combining it with principles like SRE, which is a hot topic, which people know is important, it's now getting framed in the right sense.
So people will see how important this is.
This is actually my personal mission that I'm on.
One last thing I would add to the frustrated users is we talked a lot today
about features in Neotys and in Dynatrace.
And of course, we like these two tools.
A lot of times people don't have these tools available.
Maybe you have other tools that have equal capabilities.
Good chance you have other tools that have lesser capabilities.
And it shouldn't dishearten you to say,
well, I can't do all these things.
I don't have that correlation engine.
I can't do X, Y, or Z.
There's some level of something you can do.
Even like, let's say you're using JMeter,
and I don't know if JMeter has a correlation plugin,
because I know there's a lot of plugins.
Somebody might've written one for it.
But even if it didn't,
you can at the very least try to speed up your automation.
If you have a known list of parameters
that you want to fix,
you could automate it, you know, using the sed command or something, just doing a find and replace on these things in your script to speed it up. Small things here or there that show some benefit
help speed you along. Might even be able to eventually build up a use
case to take to your upper management to say, hey, look, I put all this effort in some of this manual stuff.
Look at the small improvement we got from this. If we were to spend the money on, let's see,
a Neotys or something like that that has all these things built in and more,
we can get a benefit and that can help bring you there. But there's probably at least some places
you can start with your tool set if they don't have all the fancy things in there. So don't be
disheartened if you're hearing all these features that you don't have. Take a look and get creative.
And again, you'll just need the time. And that's going to be the trickiest part is finding that
time to break out of your daily routine to start looking at these improvements.
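As one possible sketch of that scripted find-and-replace, here it is in Python rather than sed; the parameter patterns and the recorded request line are made up for illustration:

```python
# Sketch of automating parameterization of a recorded load-test script:
# swap known dynamic values for variables instead of editing by hand.
# The patterns and the JMeter-style ${...} variables are hypothetical.

import re

# Known dynamic values to parameterize, e.g. a recorded session id.
replacements = {
    r"sessionId=[A-Za-z0-9]+": "sessionId=${SESSION_ID}",
    r"userId=\d+": "userId=${USER_ID}",
}

def parameterize(script_text):
    """Apply every pattern -> variable substitution to the script text."""
    for pattern, variable in replacements.items():
        script_text = re.sub(pattern, variable, script_text)
    return script_text

recorded = "GET /cart?sessionId=a81xZ2&userId=1042"
print(parameterize(recorded))
# -> GET /cart?sessionId=${SESSION_ID}&userId=${USER_ID}
```

The same substitutions could be done with a couple of sed expressions in a shell script; the point is simply that a known list of parameters can be swapped out automatically instead of by hand.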
All right. Andy, any other thoughts on your side? I kind of hogged this one a little bit today, huh?
No, it's all good. It's all good. I'm glad that you talked a little more today than I typically do.
No, other than that, I know we will have a continued partnership here with Roman and drive
and influence the performance engineering community in the future
and encourage more people to join our community.
Just looking forward to the next event.
I think you are probably also part of the Neotys PAC.
Yes.
And I'm talking about one of my, I'd say, favorite topics there in terms of bugs and nasty things that you can track down.
And as fate usually is, I'm talking there about concurrency issues.
And right now, today and the last couple of days, I am still tracking down some really nasty concurrency problem
that is causing huge pain at the customer side.
So yes, it will all be about concurrency testing at the PAC event.
Make sure to join in.
Awesome.
All right.
Thank you, everybody, for listening.
Andy, thanks for doing this with me every week. And Roman, thank you so much for sharing.
This was amazing. I always love when we get to talk about performance.
And thanks everyone for listening. If anybody has any questions, comments,
you can reach us at pure underscore DT,
or you can send us an email at pureperformance@dynatrace.com.
Roman, I forgot, did you already mention how people can follow you or reach out to you? How should they contact you if they want to follow you?
Sure.
If you want to contact me personally,
probably best way
is to search my name
on LinkedIn.
It's spelled R-O-M-A-N, F-E-R-S-T-L.
Great.
Happy to talk to you.
Awesome. Thank you so very much.
And we'll be back soon.
Thanks, everybody. Bye-bye.
Bye-bye.