PurePerformance - How to scale Performance Engineering in enterprises with Roman Ferstl
Episode Date: April 12, 2021

Performance Engineering is not about running a performance test twice a year. That is just a poor attempt at validating your non-functional requirements. Roman Ferstl, Managing Director at Triscon, discovered his love for performance engineering while optimizing code for software used in a space program. He then founded Triscon, which is now helping to establish and scale performance engineering at large enterprises. In this episode we get his insights on how he approaches a new project, which bottlenecks to address first, and how to motivate more people within an organization to invest in performance engineering.

If you want to learn more, don't miss Roman's presentation from Perform 2021, titled "Turbocharging your Performance Engineering teams to scale efficiently."

https://www.linkedin.com/in/roman-ferstl/
https://www.triscon-it.com/en/
https://perform.dynatrace.com/2021-americas/breakouts-single-day-3-turbocharging-your-performance-engineering-teams
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always I have my very wonderful, talented, extraordinary, busy co-host Andy Grabner.
Andy, how are you doing today? I have to come up with some more unique descriptors of you, or more interesting ones. How are you, Andy?
I'm good, but it's definitely better than devil, because you used devil earlier when I jumped on the recording session.
I know this is something we shouldn't tell anybody,
but I just want to make sure the world knows
that we have the nice names
and then sometimes we have the interesting names.
Well, it wasn't done in a negative context.
In fact, many people can argue that the devil is a positive.
But it was more about a "speak of the devil" reference when Andy popped in.
So by default, you are a devil.
You're wearing your headset with your thing pointing up so you have a single horn.
So you can either be a unicorn or a devil.
But anyway, we are going...
I don't even know where you brought us, Andy.
It's usually me that goes off on, off the rails here, but you started this with your
devil, with your devil worship.
Yeah.
But let me bring it back, because the reason why I interrupted your conversation when you called me the devil is that you were actually on a call with Roman, Roman Ferstl, who is our guest today.
And I want to say, hi, Roman. How are you doing?
Hi, Andy. Hi, Brian. I'm fine. Thanks. Greetings
from Austria, Vienna. See, and that's the fun thing, Brian, also for you, because I know you
love Falco. And when we, when Roman and I...
Oh yeah, I remember that. That went over Roman's head, I could tell.
You made a Vienna Calling reference, right?
Is that where you're going?
So I was watching the video and Andy made the Vienna Calling reference
and you kind of looked at him a little bit like, huh?
Because obviously, I mean, Falco probably is not as popular as he once was.
What, 40 years ago? 50 years ago at this point?
So Roman, I don't think you picked up on that right away. Was that the case, or was it stage fright?
Of course, anyone in Austria, or everyone, knows Falco. I was just not expecting Andy to start with a Falco reference. He really got me there. But yeah, again, today
I guess Vienna is calling.
Yeah.
I wasn't sure if it was just... Because I'll make musical
references here and there, and then I'm like,
wait a minute, I'm a lot older than a lot
of these people, and they probably have no idea what I'm talking
about. Although I was just talking to
an account rep I work with, and I mentioned
I made a Radiohead reference.
You all know Radiohead, right?
Yes, sure.
One of the biggest bands, U2 level, sort of popular.
He had no idea.
He's like, Radiohead?
I'm like, oh my gosh.
Okay, anyhow.
So anyway.
It is what it is.
So that was at Perform, right?
So Roman was on virtual stage with you at Perform.
Exactly.
And I was so fascinated, not only from the Perform conference presentation, but all the stuff we've been doing over the last year or so.
No, more than a year, because I think we met back in 2019 at a Neotys event and then stayed in touch.
Roman and his team have helped a lot with the innovation around Keptn, the quality gates,
and also has been helping a lot and giving us a lot of great feedback on how to better use Dynatrace,
especially now in performance engineering environments.
And well, no, Roman, first of all, before we go into the talk,
because I really want to talk about some of the findings and the things you said at Perform at the Breakout,
which was titled Turbocharging Your Performance Engineering. I want to first give you a little
chance to introduce yourself, who you are, your background, how you came to performance,
because I think that's always interesting so people can relate. And also the company
you run, Triscon, maybe just a little background on that.
Yes, sure. So my name is Roman. I founded Triscon a couple of years ago, a company fully dedicated to all topics around performance. It all started with a focus on performance engineering and performance testing, so we do a lot of load tests and stuff like that. And more and more, our attention was also drawn to APM and Dynatrace. So this is another thing we have in our portfolio: tasks like setting up APM solutions, integrating APM solutions at customer sites, and caring about the processes that come with it, including DevOps approaches.
And these two things we do there, they actually link together.
So performance testing obviously benefits a lot if you have a proper APM solution.
So this is kind of how we were drawn into the Dynatrace world. I'd say it's also how we met you, Andy, and I just want to say thank you back. I'm also very grateful that we met back in 2019 at the Neotys conference. A lot of good things have happened since then, and I think it was a hell of a journey, also a hell of a journey at some customer sites that I'm going to talk about today.
The way I got to Perform was actually that we started from scratch implementing performance testing at a customer site.
And also we started from scratch there a couple of years later,
implementing Dynatrace as their APM solution.
We combined these approaches and we had huge success there.
But let's keep that as a cliffhanger for now, maybe.
Go into detail about it a little bit later.
But were you always interested in performance?
Or what made you found that company focusing specifically on performance?
Okay, so my personal motivation for this is, I think of, well, I will try to keep it short.
So I was always drawn to complex problems in general.
And IT has really been there through my entire life.
I studied actually astronomy and astrophysics,
and I was developing, together with a science team in Austria, algorithms for a space telescope. What I did there was checking out the performance of centroid algorithms. Those are responsible for the star not losing its position, or actually for telling the attitude and orbit control system of such a space telescope where the star is. Of course, you need to compute it from the images, and it has to operate completely autonomously, so it's a huge challenge. And since I was actually trying to get into science, and then I ended up coding again, sitting there in front of my laptop, writing code, dealing with performance,
I just thought, well, this is it for me. It's the thing I love.
And so I started soon after that, actually, Triscon, where I focus now entirely on performance topics.
And it's not only me. I have awesome colleagues around me who feel the same about these topics, and we care about the entire process around software development.
That's amazing. I was listening to that.
Especially going, like, you know what, I think this rocket science isn't as interesting.
When I watch, anytime there's a SpaceX launch, right, I just imagine how many people are like, I wish I could get into that field, because it's a brand new, exciting field. It's amazing, first of all, that you were doing that stuff, period. But then of course, with Andy and me having performance so close to our hearts, that you chose this to come back to and put your amazing talents there.
And I don't know if it's the same kind, because you said astrophysics, if that's the same as an astrophysicist, but I'm not sure if you're aware of Brian May from Queen, the band Queen.
He got his degree in astrophysics, I think maybe sometime in the last 10 years, finally.
He's been studying that like forever.
Anyhow, total sidetrack.
Awesome.
When you mentioned that, I was like,
oh, you and Brian may have something in common now.
You should call him up.
Yeah, sure thing.
If you give me his number, I will call him.
So Roman, one big challenge that I think we have in the industry,
even though we have occasional people like you coming in
and entering the performance engineering realm,
there's still not enough people, I think, in performance engineering. However,
what you have been doing and the story that you told at Perform was really about how can a small
group of people actually have a major impact in a large organization? And mainly, not just by,
and this is also what I want to make the point here, this is not just by installing an APM tool, whether it's Dynatrace or any other tool.
But you had a very interesting approach and also what you explained that, you know,
you walked through your story on where did they start first in automating performance?
Where did they get the biggest bang for the buck in the beginning before you then actually went into really leveraging APM. And then you also, in the very end of the presentation, you gave some more insights on,
hey, if I would start a project, if I would be you, these are the steps that I would do if I
would be you. So I want to kind of now hear from you when you enter this account or any account
that you're working with that are more, let's say,
traditional enterprises, and they are seeking from experts like you to help them with performance
engineering, what do you do? What can you tell people that are maybe in a similar situation like
you, either external consultants or maybe part of organizations? How do you really, you know,
speed up and turbocharge your performance engineering?
What are your recommendations?
All right.
So there's a couple of things that you should do if you arrive at a fresh new site and check
out the new customers.
So first, what I do is I try to evaluate where they're standing.
So in terms of what tools are they using,
what philosophy are they following,
a lot of people say they do DevOps,
and there is, of course, it's not binary.
It's not like you do DevOps or you do not DevOps.
This is something that you can do
to some extent, and it is directly linked to how
you should do performance tests or how you could do performance tests.
So if your DevOps philosophy is already quite widespread
in your entire corporation, you
may want to go for automating quality gates.
If you still have huge silos there, it is probably not even feasible to go there straight away.
So first of all, you should understand how the software is built.
And you should also check out how performance tests are present and executed, and if they are done at all. If you focus just on the topic of performance testing, the tool chain is interesting. This is something I would look at initially to get clear, because you can actually lower your maintenance time just by switching tools.
If you're using tools that are not that elaborate for load testing, there is script maintenance.
Every performance tester knows this, that most of the time in performance testing is usually spent with script maintenance.
So you could start automating there.
What we did is, actually, we started with performance testing way back, as I mentioned at Perform, in 2016 at Ergo Insurance.
And they did load tests there, but they did only about 15 to 20 tests per year
and this was actually due to a lot of manual work. Recently, I fell in love with a new topic, or it's not really new, but I dived into it a little bit, and it's SRE. So that's why I'm bringing in this reference.
So our approach there to get more out of everything that is present there,
to get more tests done, to become faster in performance testing,
to become more efficient, was actually to get rid of toil.
And toil is just Google's definition of repetitive manual work, more or less. I don't want to deep dive into SRE now, but I was just thinking of what our approach was. What made us successful there is that you pick up those pieces of work that are repetitively done. So if you have a test,
you run your test and you probably need to design your test first, of course.
If you execute your test on each release or on each build, you maybe need to adapt your scripts.
And this is already done manually in most cases.
So you may want to think if you can automate this process.
And some tools offer capabilities there,
which are really good, such as NeoLoad,
others probably are not as elaborate there.
So by simply switching tools, back then it was from just the Microsoft Visual Studio web test framework, with a lot of manual stuff to do there.
We increased our efficiency by 30 to 50 percent, time that we previously lost to script maintenance. And further on: since you are recording your tests, if it's an end-to-end test for a load test, you probably click through each step manually, and this is what most performance testers still do.
But there are functional testers out there who have already automated most of the test cases that are probably interesting for your performance tests. So our idea was just to grab those and reuse them to generate our scripts automatically, and even update them automatically.
So this gave us another 40 to 90 percent. And of course,
the final step, I would actually divide this. This is, so to say, the test design part,
which is huge in terms of effort and maintenance and stuff. This is one pillar for load testing.
First, you have to design it. Then you have to execute it; this is usually done quite easily, and you can automate it quite easily. And the third pillar is the test analysis. If you have a tool such as Dynatrace available there, you can build very smart things with dedicated dashboards for your load tests, and you can automate the entire result analysis, as we did.
And if you put this together, if you automate the entire process and get feedback on whether or not it's good to go to the next stage, then you can go for an automated quality gate, once you have automated all these things.
But to summarize, first of all, you should check if you could save some time for your script maintenance,
maybe switch tools, maybe use automated tests that are somewhere else in Tosca, Selenium,
whatever tools you're using.
I'm 100% sure that at any bigger corporation there is test automation present. It's just a matter of whether your load test tool allows you to import these scripts on the fly as they're executed. And of course, if you have a proper APM solution there for test analysis, make use of it as much as you can. It's so awesome in Dynatrace that straight away after my load test, I can identify each and every single request that I made, instantly find it, and go for the root cause analysis.
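As a sketch of what "automating the entire result analysis" can mean in practice: after the load test, pull the metrics for the test window from the APM's REST API and evaluate them in code instead of eyeballing dashboards. The JSON layout and metric IDs below only mimic a metrics-query API (Dynatrace has one at `/api/v2/metrics/query`); they are illustrative assumptions, not the real schema.

```python
def worst_value(api_response: dict, metric_id: str) -> float:
    """Pick the worst (maximum) datapoint for one metric out of the query result."""
    for result in api_response["result"]:
        if result["metricId"] == metric_id:
            return max(v for series in result["data"] for v in series["values"])
    raise KeyError(metric_id)

def analyze(api_response: dict, thresholds: dict) -> dict:
    """True = the metric stayed within its threshold for the whole test window."""
    return {m: worst_value(api_response, m) <= limit for m, limit in thresholds.items()}

# A fake query result for the load-test time window (shape is an assumption).
response = {
    "result": [
        {"metricId": "service.response.time_p95", "data": [{"values": [180.0, 210.0, 310.0]}]},
        {"metricId": "service.errors.rate", "data": [{"values": [0.0, 0.5]}]},
    ]
}
verdict = analyze(response, {"service.response.time_p95": 300.0, "service.errors.rate": 1.0})
# verdict -> {"service.response.time_p95": False, "service.errors.rate": True}
```

The point is the shape of the automation: once the worst values per metric come out of an API instead of a screenshot, the pass/fail decision is a few lines of code that can run after every test.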
Okay, Andy, I want to dive in.
Before you go, I want to dive in on two things there
because they're a little bit new to me
and I wanted to see maybe if, Roman,
you can share with our listeners how this works.
So you mentioned NeoLoad, and we all know and love NeoLoad.
Part of the automation of once you record the script,
the click and record, which has been a great feature
of load testing for a long time now,
the pain point has always been the correlation
of data points in the end.
I know, again, I haven't really done
any earnest load testing since 2011,
so 10 years of possible improvements
have been in there since.
Have there been improvements to the correlation engine?
I know when I finished,
LoadRunner was trying to do something there
and it would sometimes catch things,
sometimes not.
Part of that speed and automation, or maybe the drive to automate the scripts, would be a better correlation engine. Does that exist yet, or is that still something that's time consuming?
Very good question.
So when it comes to script maintenance, let's stick with NeoLoad. For performance testing,
think of it that you do not automate browsers.
You automate on a protocol level,
and this is giving you a hard time.
Why is this giving you a hard time?
Think of it, you script some,
or you record some login process with a user,
and you proceed through some forms,
maybe a webshop, and we're including a checkout.
Each time this user logs in, he gets another session ID.
And as soon as you replay the scripts
and your session is not valid anymore, it's going to break.
So these are the things that you want to correlate.
This is what we're talking about.
And to automate this process, you always want to get the session ID fresh and use it for all consecutive requests in your performance test script. You can automate this with framework parameters.
So you can define an automation in NeoLoad, telling it: each time you find this pattern in the responses, replace it with the variables needed for the correlation. Neotys offers some out-of-the-box correlation, for .NET and JSESSIONIDs, for instance. So there is out-of-the-box technology, but of course, in 99% of cases, you still have to do this manually at first.
But as soon as you say, I do the correlation manually,
you can say move to framework parameter.
And the next time you record the script,
it's automatically correlated.
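The mechanic behind such a framework parameter can be sketched in a few lines: extract the freshly issued session ID from the login response, then substitute it into every recorded request before replay. This is a hypothetical, minimal illustration (the pattern, helper names, and request strings are made up), not NeoLoad's actual implementation.

```python
import re

# The "framework parameter": a pattern that, whenever it appears in a
# response, yields a value to re-inject into all subsequent requests.
SESSION_PATTERN = re.compile(r"JSESSIONID=([A-F0-9]+)")

def extract_session_id(response_body: str):
    """Pull the server-issued session ID out of a login response."""
    m = SESSION_PATTERN.search(response_body)
    return m.group(1) if m else None

def correlate(request: str, session_id: str) -> str:
    """Replace the stale recorded session ID with the fresh one."""
    return SESSION_PATTERN.sub(f"JSESSIONID={session_id}", request)

# The recorded script still carries the session ID from recording time...
recorded_request = "GET /cart HTTP/1.1\r\nCookie: JSESSIONID=AAA111"
# ...but each replay's login hands out a new one.
login_response = "Set-Cookie: JSESSIONID=BBB222; Path=/"
fresh = extract_session_id(login_response)
replayed = correlate(recorded_request, fresh)
# replayed now carries JSESSIONID=BBB222 instead of the stale AAA111
```

Once a rule like this is defined, every re-recording of the script can apply it automatically, which is exactly why the manual correlation effort is paid only once.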
So I have demos available.
If anyone who is interested in what I'm saying about now,
flick me an email.
I can show you automatically generated load test scripts from scratch with Selenium, where we just have an application. The browser goes up, and NeoLoad is listening; it acts as a proxy for all the traffic that is generated by the Selenium-driven browser. And after the test case is done and the clicks were made in the browser, NeoLoad kicks in with the post-processing once the script is already there. And this is when these framework parameters are executed and the magic happens.
This is one thing.
And there's a second thing, which is cool.
It's called the maintenance mode in NeoLoad.
So what you also have is if you re-record something that is already correlated,
you can say, I don't want a new test case.
I want to maintain this one.
And then NeoLoad tries to keep the correlation that you've already done manually or with framework parameters, and it doesn't throw it away anymore.
So with these two things combined,
you reach a really, really high degree of automation here
for the correlation part.
Great, and you answered my second question.
It was going to be about the NeoLoad Selenium integration,
but it sounds like you're just telling NeoLoad to record
the Selenium browser.
It's not like it's going to ingest the script and generate it from that.
But I think these are important things to identify
because I know myself when I talk to people,
there's always a struggle to improve performance.
A lot of times people are like,
oh yeah, we don't have a performance team yet.
And they always start looking at open source,
say like JMeter.
Now there's nothing wrong with JMeter,
but when you're using something like JMeter, a lot of these
designer features, let's
call them, because they are amazing, but they're
not 100% required just to execute
a load test. They're not in there.
JMeter is a lot more of a manual
process. So I think these are really important things
for people to consider when they're looking
at load testing tools.
I'm not here to make a commercial for NeoLoad.
We all know it's got a lot of great things going on with it.
But at the same time, when you're having those considerations,
what else is it going to get you?
If you're going to pay for your tool instead of going for the free,
that's going to get you to this point that you're talking about
where you have all these automation capabilities
as opposed to we have a checkbox of a load tool.
Now we have no time to do it because it's all manual.
So these are important.
Yeah, it's important for people to understand these contexts.
So thanks for explaining that some more.
Actually, I want to add one more thing here
because what is awesome is
if you do the correlation stuff for one app, you're testing one app. And usually in performance tests, you focus on the happy cases, on the things that happen most in production or are critical, and you do not do all the functional tests. So it's probably a handful of test cases that you have, let's say five to ten. If you do the correlation once for one application, the IDs that you correlate for your first test case are highly likely the same for the second, third, and fourth test case. If you have some order ID or something, and you're just ordering in different areas, the technical parameters are the same.
you even speed up your own test scripting initially, not only when you maintain your script. Even in the script generation, you already get a huge boost in efficiency there.
Especially, I guess, because if you're working with an organization, they most likely share very common, let's say, frameworks on the application side, like you mentioned earlier, JSESSIONID or ASP.NET; they do it themselves in a certain way. And if you record it, if you define these rules once, then you can use it in all the apps that are basically coming out of the same application teams, or application teams that are building similar apps.
Yeah, this is exactly how we're doing it. So for these frameworks, you can define a name. I usually give it a name like the tested app. Let's say, I'm not creative today, let's have Webshop A. So I have a Webshop A framework in NeoLoad, consisting of all the parameters that are necessary.
And I think for me, the way you explained that story,
and because you started on script generation or script maintenance is the big thing.
This was an aha moment for me when I saw your slides
because we are always pushing from the,
you need to automate your quality gates.
We have built all these great things around Keptn
and you can pull in the metrics
from any monitoring tool for every test.
Now you can run your tests 50 times a day.
And then you said, well, I cannot start there
because I cannot run my tests 50 times a day
because it's so much manual effort
because right now we may use the wrong tools, and therefore we have all this toil, and that doesn't give us the benefit of having quality gates in the end. So this was kind of that aha moment for me, and that's why I can really encourage everyone to look at your presentation, and especially the last slide, where you said: start with getting better maintainable scripts, automate your script generation and maintenance, then start with the next steps. Quality gates are important, but you don't get the benefits if, for the quality gate run, you need to spend two hours fixing your scripts.
Exactly, that's the point. And to pick up on that, if you reach the point where you think about automating quality gates,
what is the next step?
If you say, well, I'm fine.
I do not spend actually a lot of time for script maintenance.
So how do you take the challenge?
What's the next step to building an automated quality gate?
So first of all, you do not want to start with a front-end end-to-end test that you want to automate, as it's more difficult in terms of the result and the automation of the result analysis. So best practice would be: pick a microservice. The world is becoming smaller and packaged in microservices anyway. So this is awesome.
If you have a microservice there, probably a developer or DevOps person is sitting behind it, and they will even be very grateful for your automated feedback. Because in this philosophy, for microservice people, the people who build the app care about what happens to it in production.
If they are SREs, they want to keep the error budgets on a certain level.
So they're really, really grateful if you can offer them to do that.
So first of all, go into discussion, communication with these people,
tell them what you want to build. And this is, of course, from the perspective of a performance
engineering or testing expert. If you have a center of excellence there, you probably have it because performance testing in itself has a huge and steep learning curve, with all the things to consider.
So it's good if you have the center of excellence
and if you want to provide your performance tests as a service
to the DevOps people, to the guys building or folks building the microservices.
So you go into the communication,
you pick a service,
you tell them exactly what you're going to do.
You design a test speaking with them
and what they care about to identify the metrics.
And you do not break their builds.
You do not go and somehow hack their pipelines and get an automated quality gate in there.
No, the first step would be to automate the feedback
and provide it to them as a service as often as they want it.
So enable them to kick off your test.
It can be via NeoLoad Web on a GUI.
It doesn't need to be in a pipeline.
And if you've done this a couple of times and you have automated the feedback loop, then you can think about integrating it into the pipeline. That would be the next step. Now, maybe this sounds a little bit too spooky to the performance testers out there, like: what is he talking about? How do you automate the analysis? It doesn't work at all. So if you're thinking that, I want to catch you up here. Andy showed us a way how we could use Keptn. And so we were working together a lot, thinking about how Keptn could help us to build an automated quality gate.
And I want to break it down to some simple statements.
So Keptn can help you to build a final score from your metrics.
And when I say metrics, it's everything
that you define in a load test anyway.
You probably have your load testing dashboard,
and you print somehow visually
what how your test was performing you you build your graphs you check your cpu level you check
your error rates you check your response times you check whatever you want and then there is an
automated way captain offers it to combine all these things into a score. And again, you are able to define
the goals and the parameters on that. But once you have done this step, then you can make the
decision simple again for others. And that is the point. Because you are the expert as a performance engineer, and others, if they have not been doing this for months or decades, don't know what to look at among all the different hundreds of metrics that you have to care about in performance testing. So you take this away from them. You just say: okay, this is predefined, I have thought about this, and I offer it to you as a service, and you can use it as much as you want.
That is the idea.
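The scoring idea Roman describes, many metrics collapsed into one simple decision, can be sketched like this. Metric names, limits, and score boundaries below are invented for illustration; Keptn's real SLO definitions are more expressive.

```python
def score_metric(value: float, pass_limit: float, warn_limit: float) -> float:
    """Full credit within the pass limit, half credit within warn, else zero."""
    if value <= pass_limit:
        return 1.0
    if value <= warn_limit:
        return 0.5
    return 0.0

def evaluate(metrics: dict, slos: dict):
    """Combine all per-metric scores into a 0-100 total and a pass/warn/fail verdict."""
    scores = [score_metric(metrics[name], *limits) for name, limits in slos.items()]
    total = 100 * sum(scores) / len(scores)
    verdict = "pass" if total >= 90 else "warn" if total >= 75 else "fail"
    return total, verdict

# The expert defines this once; everyone else just consumes the verdict.
slos = {
    "response_time_p95_ms": (200, 300),  # (pass limit, warn limit)
    "error_rate_pct": (1, 2),
    "cpu_pct": (70, 85),
}
metrics = {"response_time_p95_ms": 180, "error_rate_pct": 1.5, "cpu_pct": 60}
total, verdict = evaluate(metrics, slos)
# two passes and one warn -> total of about 83 -> "warn"
```

The design point is exactly what Roman says: the thresholds encode the performance engineer's expertise once, and the consumers of the gate only ever see pass, warn, or fail.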
One thing I wanted to touch upon that you just mentioned in there, right towards the
end, you said all the hundreds of metrics that you track, right?
And if I go back, or most of us go back to pre-APM load testing, we're talking about
tens of metrics. Host CPU, process CPU, total response time for the transaction, exceptions or errors maybe. Maybe not even exceptions necessarily, just errors. Very high-level views
into this data. And when you add the APM side, like Dynatrace
for instance, you can get into more microscopic measurements.
We can look at CPU time per transaction, the response time from service to service, the number of database calls, time spent in the database. So many other different kinds of metrics, which then leads someone to think: oh my gosh, if I expand into hundreds of metrics, how the hell am I going to manage all this? I have a hard enough time copying and pasting my metrics into Excel and creating graphs and all that. But that's exactly where this comes in. You can expand into that complexity and automate the analysis, so that you don't have to deal with that number, but you get much richer responses. We always talk about failing early, finding the problems before they get too big. So if you're running your code, even, you know,
one thing I look at a lot when we're in our demo environment is the multidimensional graphs, Andy, and I'll show response time by transaction by lock time, right? And it's always zero. I'm like, great, it's zero. It should be zero. I want to be able to track that at zero with every freaking build, because if it suddenly is not zero, something's wrong. You know, it's something you would never normally think about doing. You now have these kinds of automation pieces opened up, combined with the ability to observe those data points, and that really opens the world to making
performance testing so much more powerful and so much more informative. And I always say, although I'm really, really glad to be on the sales engineering side,
if I was back in it, what a great time it would be back to be on the performance side
because there's so many awesome things you can do.
Anyhow, sidetracked there.
Well, not really sidetracked, but just wanted to focus on that hundreds of metrics
because it's really, really important.
Yes, it's absolutely right.
So these hundreds of metrics,
just for the people who are not now
from the field of performance testing,
you can actually divide them into three areas.
That's what I do.
That's my definition.
So you care about three things in performance testing.
You care about stability.
So you have metrics to measure your
stability error rates crashes and stuff like that this is one pillar and one thing to look at and
there is a tree of hundreds of metrics that tell you how your availability and stability is and
then there is the second thing obviously for performance tests are your performance metrics
like response times etc and the third thing you care about is resource consumption.
Because if you want to transition from one stage to the other
or upgrade or update an app in production, you want to know three things.
You want to know, does it crash?
You want to know, is it getting slower?
And you want to know, is it getting more expensive?
And those are the three things that performance tests can answer, and a lot more, of course, but I want to break it down to some simple things, and below that there are hundreds of metrics that you can care about.
And the awesome thing that you mentioned, I want to give a recent example for this, because I actually want to encourage the performance testers out there. I was talking not to other performance testers, not to tech people; I was talking at C-level of a very, very big company.
And we showed them the automated quality gates and what we can do with them. This specific use case was to build an automated quality gate to track the expenses from the mainframe. You can put this into a metric, the number of transaction calls you or your microservices make, because this translates directly to your license cost. And if this increased by a factor of 10,
you would like to be informed, of course.
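As a rough sketch of the kind of automated quality gate Roman is describing, here is a minimal Python example. The metric names, thresholds, and values are all hypothetical; in a real setup they would come from your load-testing tool and your APM via its API:

```python
# Sketch of an automated quality gate over the three pillars described in
# the episode: stability, performance, and resource cost. All metric names
# and numbers below are made up for illustration.

def evaluate_quality_gate(metrics, thresholds):
    """Return (passed, violations) for one test run's metrics."""
    violations = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: metric missing")
        elif value > limit:
            violations.append(f"{name}: {value} exceeds limit {limit}")
    return (not violations, violations)

# Hypothetical baseline from a previous known-good run.
baseline = {
    "error_rate_percent": 0.5,               # stability pillar
    "p95_response_time_ms": 800,             # performance pillar
    "mainframe_transaction_calls": 120_000,  # cost pillar (license-cost proxy)
}
# Current run: stability and performance are fine, but the mainframe
# transaction count jumped 10x, like in the example from the episode.
current = {
    "error_rate_percent": 0.4,
    "p95_response_time_ms": 750,
    "mainframe_transaction_calls": 1_200_000,
}
# Allow at most 20% growth over the baseline for each metric.
thresholds = {name: value * 1.2 for name, value in baseline.items()}

passed, violations = evaluate_quality_gate(current, thresholds)
print("PASS" if passed else "FAIL", violations)
```

The design point is that the gate covers all three pillars at once, so a cost regression like the 10x mainframe call count fails the run even though response times and error rates look fine.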
And by combining these approaches now,
you're able to do that.
Because what you mentioned is, as soon as you have Dynatrace, where you pull out your metrics, you have this information there and available for everything that happens below the surface.
If you're just taking the information from your load test, without manually gathering performance counters or setting up some automation to collect them from all the systems you care about, then you only have end-to-end response times and error rates. And defining all these performance counters and grabbing that information manually, that's toil, and you don't want to do that.
So if you have an APM solution
that is watching everything,
you want to make use of that
as much as you can.
And I like this idea of the three pillars. The last thing I wanted to say here is about the pillar of cost, the expense. At least for me, when I was in the load-testing world, I always loved bringing news of something breaking, because it was like, I did my job, I found something. Which is bad news for everybody else.
But when it comes to the expense side, like your mainframe example, make sure to highlight where people improve that cost. The mainframe one is very concrete in dollars and cents. Of course, compute spend on AWS and all that is a real thing too, but it's not as easy to show. With that mainframe one especially, if you were to turn around and go to your development team or their bosses and say, hey, the test passed, and we reduced costs by 15% as aimed, that's something everyone will celebrate. And that'll ingratiate you: hey, you know, Brian, the performance engineer, helped prove that we did our job and saved money for the company. So people will like you more, because you're not just the bearer of bad news. And when it comes to money news, people love it even more.
So a great way to make yourself more popular.
And I want to encourage, actually, everyone else who is listening
and doing similar stuff to do this.
What I've learned is that it's highly underrated. Performance testing and what it can give you, or any company, it's such a huge thing, and there are so many aspects to it that I'm almost missing the words, as you can see. But you brought it to the point: what people understand is, if you say we're getting faster, that is nice. But if you say we're not getting more expensive, we're even getting cheaper, that is something a lot of people care about.
So make this transparent as much as you can. If you ever have the possibility to put a price tag on your performance tests, on what you actually saved, do it. Go ahead and even tell me about it, because I love to hear these stories.
So then I have a challenge then for all of us.
You said it's not easy to show the value always from performance engineering to the company.
I think adding cost is great.
But the other thing that you mentioned in the very beginning,
you're really excited now about the topic,
and that's SRE, site reliability engineering.
So maybe is this the chance for us in the performance engineering community
to kind of use this new kind of hype that was thankfully created by Google?
Because in the end, site reliability engineering is also not magically new, right?
And can we use that hype and really latch on to it and say,
hey, you know, we need to do site reliability engineering,
which includes obviously performance engineering,
because in the end, as you just said, you're testing for reliability, you're testing for performance.
And then I think this is the great next point, because I've not seen Google, or at least
not the best practices around SLIs and SLOs talk a whole lot about costs.
And so maybe we can say, hey, we need to do SRE, but with our experience from performance
engineering, we want to elevate it even to the next level.
It's reliable, it is performant, and we're saving you costs.
Absolutely.
So this is something that I'm looking into really deeply now because I see huge synergies.
SRE is such a hot topic
and a lot of people are talking about it.
And what I do is I'm picking out
the cherries of this concept here.
SRE is huge.
There is stuff like you should do postmortems, etc.
And what you mentioned is SLIs and SLOs.
And if you think about it,
what you do as performance engineer is exactly this. So if you are thinking about doing SRE
in your company, then you will think about error budgets and you will think about how to measure
those. And this is exactly the job of a performance tester, who has probably been doing this in test stages for decades.
So it makes so much sense to combine these two.
And even SRE is a way to do DevOps for me.
That's how I define it.
DevOps is a philosophy.
SRE is some specific concepts to make DevOps happen.
And I think it's a really awesome thing to get a performance engineer in to provide this feedback loop. As I'm not at Google, I can't tell if they actually do this, but the job of an SRE is everything that happens around writing the lines of code. So everything that happens after it, up until production. Of course they are also devs, but what matters is everything that contributes to resilience. And this is actually what a performance test does. I've been thinking a lot about this, about framing performance testing in a new way, because it fits so perfectly into the entire idea
of what SRE is about.
Because, of course, you want to measure your SLOs, etc.,
in production.
But first of all, you need to think of what are your SLOs.
And if you want to keep your error budget low
and not always break it,
you need to do tests beforehand.
And if you do performance tests, it's like you need to define your metrics anyway.
And you already thought about this in test stages.
You thought about this in performance testing.
And these are the people who thought about it for decades.
So you can take these metrics and put them in production to
calculate error budgets. Basically the modern performance engineer
then is an SRE that enables an organization to always be able to deploy because the error budget
is always under control. Because that's what the whole concept of error budget
is about.
The error budget tells you, do we
have enough budget left for another deployment?
And are our deployments safe enough
that they don't eat up more error budget
than we have available?
And therefore, the modern performance engineer
is exactly that.
It's defining the right metrics, the right SLIs and SLOs
to measure the error budget and then do whatever it takes
to make the system more resilient
and also make sure that no bad deployments
can then impact your error budget in production.
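The error-budget arithmetic behind this can be sketched in a few lines. The 99.9% SLO target and 30-day window used here are illustrative assumptions, not numbers from the episode:

```python
# Minimal sketch of the error-budget question discussed here:
# "do we have enough budget left for another deployment?"
# SLO target and window are hypothetical examples.

def error_budget_minutes(slo_percent, window_days=30):
    """Total allowed 'bad' minutes for the window under the SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)

def budget_remaining(slo_percent, bad_minutes_so_far, window_days=30):
    """Minutes of budget still left after the failures seen so far."""
    return error_budget_minutes(slo_percent, window_days) - bad_minutes_so_far

budget = error_budget_minutes(99.9)                       # ~43.2 min per 30 days
remaining = budget_remaining(99.9, bad_minutes_so_far=30.0)
# The deployment question: is there budget left to risk a release?
safe_to_deploy = remaining > 0
print(f"budget={budget:.1f} min, remaining={remaining:.1f} min, deploy={safe_to_deploy}")
```

If the remaining budget goes negative, the answer to the deployment question flips to no, which is exactly the feedback loop described here.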
Exactly. That is what I thought about.
And of course, you may want to start there
with the most critical things.
Maybe don't automate quality gates
for things that are running smoothly for decades
and are fine on each change.
But if things already break now and then with every change,
then this is probably the way to implement
the safety mechanism of an automated quality gate
to get better feedback loops for the developers. That is what I'm thinking about. I have even more here to postulate.
And I don't know if you have ever thought about this. If you have Dynatrace available,
I want to stick with APM for a minute. If you have Dynatrace at the customer site, Dynatrace is being sold to monitor everything, with all the cool things you can do with it, and that's probably a lot to take in for new customers. That's what I see.
To get the most out of it,
this is actually something that we are working on
to unleash the full power
of what you can do with
Dynatrace. And this is not from a technical perspective, actually. What we see is that
the processes around Dynatrace, the organizational processes are missing.
So Dynatrace actually, in my opinion, can help you to do better DevOps.
And why is this the case?
Because if you think about SRE, what is SRE, what are its core principles, and what are the things to focus on?
Dynatrace is actually a platform that offers a lot of things to do SRE.
So you can build around processes there.
I want to give you an example.
So you care about availability.
You have synthetic monitoring in Dynatrace.
You care about SLOs.
You just implemented this.
You care about postmortems.
Is there any better way to discuss what happened
than with the Davis root cause analysis?
No, I can't think of anything.
So today I want to postulate Dynatrace as an SRE platform, please.
I like that.
And I'm pretty sure that in case any one of our sales engineers is listening
or maybe even participating in our conversation here,
you can take this and use this in your next sales pitches.
Yeah, absolutely.
This is really cool.
Roman, I know we could probably talk more, because you've done not only your work at Ergo, the company you were featuring before, but also work with other organizations. But maybe this is something for another episode, because I'm sure there's a lot of stuff
that you have done over the last couple of years
that will also help performance engineers
to become better and more efficient.
Kind of concluding for today's talk,
is there anything else you want to kind of tell
the performance engineers, any material,
the way to follow up with you,
anything else we want to make sure people understand?
So if anyone is happy to talk about performance,
just check out our company.
It's Triscon, spelled T-R-I-S-C-O-N, or just contact me on LinkedIn.
I'm always happy to talk about these topics and also to show some demos. Apart from that, a little further into the future, if you ever come to Vienna, I'm happy to invite you for a face-to-face meeting, to have some coffee together. There's a nice view from our office over Vienna, so you will probably remember it if you come by. And yeah, what is it actually that I want to tell people?
So I want to speak to the people who are frustrated out there,
who are listening to your awesome DevOps approaches
and all the cool things that you can do and that Keptn offers.
And what I want to tell you is that it's probably not possible to implement these awesome approaches straight away, and we're not living in a perfect world.
But what I have learned so far
is that you can pick out the pieces of these concepts
to make your own life better.
So this is what we did for our performance tests.
We applied SRE principles.
We automated our manual work away to be more efficient.
So you can always start there by yourself.
And then you can look around and find like-minded people.
The bigger your company is, the more likely it is that you'll find them.
And even if you still have these huge silos, where dev and ops are completely separated and apps are thrown over the wall from dev to ops and people don't care anymore, there are people out there who are like-minded.
I suggest that you connect with them.
And what I'm trying to do now, to apply DevOps approaches at a customer site where there are still these silos, is to gather a team, a virtual team, not an organizational team, of the people who are interested in everything that happens around the code that is being built.
So just in an SRE-like manner to build up these virtual teams
and to talk about how you can implement automation,
how you can automate your performance tests.
So to conclude, what I want to say is this: maybe you've been doing this for tens of years and you think that all this automation stuff is not happening in your company, that it's way too complicated, that no one cares.
It's not true.
I'm sure there's people there who care.
And if you care yourself, you can apply the principles in your daily work. And I suggest you connect with those very same people in your company and have a really deep conversation about what applies in your corporate world and what's possible there to break down the silos.
I had to take a lot of notes now at the end because this is really good.
I think we should take a blurb out of the last two minutes, because you started with, I want to speak to the people who feel the frustration. It's pretty good, really good.
Actually, it's good that you mentioned it. When I started that sentence, what I actually wanted to conclude with is that there are so many awesome things going on,
and the field of performance testing
is just becoming more and more important.
The stuff that you do is so extremely important
and it's going to be seen more and more.
And by combining it with principles like SRE, which is a hot topic, which people know is important, it's now getting framed in the right sense.
So people will see how important this is.
This is actually my personal mission that I'm on.
One last thing I would add to the frustrated users is we talked a lot today
about features in Neotys and in Dynatrace.
And of course, we like these two tools.
A lot of times people don't have these tools available.
Maybe you have other tools that have equal capabilities.
Good chance you have other tools that have lesser capabilities.
And it shouldn't dishearten you to say,
well, I can't do all these things.
I don't have that correlation engine.
I can't do X, Y, or Z.
There's some level of something you can do.
Even like, let's say you're using JMeter,
and I don't know if JMeter has a correlation plugin,
because I know there's a lot of plugins.
Somebody might've written one for it.
But even if it didn't,
you can at the very least try to speed up your automation.
If you have a known list of parameters
that you want to fix,
you could automate it, you know, using the sed command or something, just doing a find and replace on these things in your script to speed it up. Small things here or there that show some benefit
help speed you along. Might even be able to eventually build up a use
case to take to your upper management to say, hey, look, I put all this effort in some of this manual stuff.
Look at the small improvement we got from this. If we were to spend the money on, let's see,
a Neotys or something like that that has all these things built in and more,
we can get a benefit and that can help bring you there. But there's probably at least some places
you can start with your tool set if they don't have all the fancy things in there. So don't be
disheartened if you're hearing all these features that you don't have. Take a look and get creative.
And again, you'll just need the time. And that's going to be the trickiest part is finding that
time to break out of your daily routine to start looking at these improvements.
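As one possible sketch of that scripted find-and-replace, here it is in Python rather than sed; the parameter patterns and the recorded request line are made up for illustration:

```python
# Sketch of automating parameterization of a recorded load-test script:
# swap known dynamic values for variables instead of editing by hand.
# The patterns and the JMeter-style ${...} variables are hypothetical.

import re

# Known dynamic values to parameterize, e.g. a recorded session id.
replacements = {
    r"sessionId=[A-Za-z0-9]+": "sessionId=${SESSION_ID}",
    r"userId=\d+": "userId=${USER_ID}",
}

def parameterize(script_text):
    """Apply every pattern -> variable substitution to the script text."""
    for pattern, variable in replacements.items():
        script_text = re.sub(pattern, variable, script_text)
    return script_text

recorded = "GET /cart?sessionId=a81xZ2&userId=1042"
print(parameterize(recorded))
# -> GET /cart?sessionId=${SESSION_ID}&userId=${USER_ID}
```

The same substitutions could be done with a couple of sed expressions in a shell script; the point is simply that a known list of parameters can be swapped out automatically instead of by hand.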
All right. Andy, any other thoughts on your side? I kind of hogged this one a little bit today, huh?
No, it's all good. It's all good. I'm glad that you talked a little more today than I typically do.
No, other than that, I know we will have a continued partnership here with Roman and drive
and influence the performance engineering community in the future
and encourage more people to join our community.
Just looking forward to the next event.
I think you are probably also part of the Neotys PAC.
Yes.
And I'm talking about one of my, I'd say, favorite topics there in terms of bugs and nasty things that you can track down.
And as fate usually is, I'm talking there about concurrency issues.
And right now, today and the last couple of days, I am still tracking down some really nasty concurrency problem
that is causing huge pain at the customer side.
So yes, it will all be about concurrency testing at the PAC event.
Make sure to join in.
Awesome.
All right.
Thank you, everybody, for listening.
Andy, thanks for doing this with me every week. And Roman, thank you so much for sharing.
This was amazing. I always love when we get to talk about performance.
And thanks everyone for listening. If anybody has any questions, comments,
you can reach us at pure underscore DT,
or you can send us an email at pureperformance@dynatrace.com.
Roman, I forgot, did you already mention how people can follow you or reach out to you? How should they contact you if they want to follow you?
Sure.
If you want to contact me personally,
probably best way
is to search my name
on LinkedIn.
It's spelled R-O-M-A-N, F-E-R-S-T-L.
Great.
Happy to talk to you.
Awesome. Thank you so very much.
And we'll be back soon.
Thanks, everybody. Bye-bye.
Bye-bye.