PurePerformance - Shift-Left Load Testing is a LIE with Hassy Veldstra
Episode Date: July 5, 2021

In his SLOconf talk "Production load testing as a guardrail for SLOs" and in his blog "Production Load Testing", Hassy Veldstra, founder of artillery.io, makes the case for load testing in production. It helped him in various organizations to establish SLOs (Service Level Objectives) and change the way engineers think about performance. He got inspired by "Building Evolutionary Architectures", which introduces the concept of performance as a fitness function.

Tune in to our conversation, hear our arguments pro and contra load testing in the various environments, and learn why in the end we agreed on the fact that SLOs, while nothing really new, are a great chance to re-define performance engineering.

LinkedIn: https://www.linkedin.com/in/hveldstra/
SLOconf: Production load testing as a guardrail for SLOs, by Hassy Veldstra: https://www.youtube.com/watch?v=Y20K1mJB6tk
Blog: Load testing. In production. http://veldstra.org/production-load-testing/
Artillery website: https://artillery.io/
Book: Building Evolutionary Architectures: https://www.thoughtworks.com/books/building-evolutionary-architectures
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and I'm back.
And of course I'm here with Andy Grabner.
And Andy, thank you so much for filling in for me last recording because I sure missed a good one.
How are you doing, Andy?
Good, good. I gotta say, right. We have a,
I think a challenge now because I know you had a good reason for skipping the
last recording, but you know,
I am skipping the first half of most likely our most important soccer match in
Austrian history,
because we may make it to the next round of the Euro cup.
But I think it's worth sticking here with you guys
and talking about something that is obviously dear to both of our hearts.
The difference is, though, is that I could have lost my job
for skipping my work to do the podcast,
whereas there's only social consequences for you.
Yeah, maybe Austria takes my citizenship away.
But didn't they lose the last game?
Didn't they lose the last game you watched?
So maybe you missing the first half
will give them the edge they need to win.
And if that proves true,
then what I predicted here is 100% true.
And I therefore know how these things work.
If it didn't, then I was wrong.
Otherwise, yeah.
That's true. Good point.
We'll see. We'll talk about this next episode.
I wish you luck.
I'm sure by the time this airs, everything will be known, but I hope the game goes well
for you and hope you have a great time later on.
And thank you.
Thank you for giving ourselves and our guests your time today, your valuable, precious time
away from a football match to come talk about our favorite topic, performance.
Exactly.
But you know what?
I want to let him introduce himself with a line that I got from his recent presentation.
He presented at SLOconf, and one of his arguments was that shift-left load testing
is a lie.
I saw that, and I wanted to talk about that.
I thought about that too, because we've been promoting shift-left load testing and testing
earlier, doing it continuously in the pipeline.
To which, for the first time I think, he said it's bullshit and he crossed it out, live.
That's my first time cursing on the podcast.
There you go.
So, Hasi, welcome to the show.
Thank you.
Very excited to be here.
I love that we're starting with, you know, just going straight to the controversial stuff.
Yeah.
Why not?
So maybe you want to quickly introduce yourself, who you are, also why you spoke and what you spoke about at SLOconf,
and then why you think your controversial statement is not a lie, it's true.
Sure. So my name is Hassy Veldstra.
I'm the founder of Artillery.io.
We're a commercial open source company.
And what we're doing is we're building a modern testing stack
for DevOps and SRE.
A bit more controversy. I'll throw that into the mix.
We believe that the state of performance and reliability testing
in general is kind of stuck in 2015, 2016, and want to change that by building tools
which are cloud-native, have really strong focus on developer experience,
and focus on testing in production, which leads me to SLOconf and my talk there,
which was about using production load testing as a guardrail
for your SLOs.
And yeah, it turns out, so before founding Artillery, I was a consultant for many years,
consulting for companies like the Trainline and DAZN and Condé Nast, helping them
implement DevOps and SRE practices.
And as I discovered, which formed the basis for my SLOconf talk,
SLOs, Service Level Objectives, and production load testing
just happened to go really amazingly well together.
Yeah.
So let me ask you a question.
I really like your talk, and we will link to it.
I also like the format of SLOconf, because it was really short. I think yours was like eight to nine minutes, to the point. You made a very good point about before you had SLOs, with, I think, what was the company, the main story was, Condé Nast?
Yes, so Condé Nast are an international publisher. They run, I don't know, probably close to 100 publications all around the world.
So, you know, magazines like GQ and Vogue and Condé Nast Traveler, huge, huge scale.
They serve about 300 million uniques per month, you know, across all those publications.
So, yeah, that's, you know, the experience at Condé Nast is what formed the basis for my talk at SLOconf.
Yeah.
So what you said, and I have the slides here open, you said before SLOs, people are paged constantly with symptoms, right?
That something was restarted, something is very slow, but there was no clear indication of what was really the problem because there were no SLOs at all. And then you started forcing people to actually use SLOs.
And first of all, think about what are my objectives for my services?
Now, here's my question.
You said SLOs and performance testing in production go very well hand in hand.
That's great.
But people have done load testing before SLOs were a big thing.
Brian and I have been doing load testing before this was a big thing.
And I believe we've also tested against a certain objective
because you ran a test in pre-production
and then you report it back.
Either nothing works at all,
even with only a fraction of the load,
or here are our KPIs that we can now validate
the system can kind of handle.
Or like, this is the load, the throughput.
Aren't these SLOs as well, stuff we had before, or not? What do you think?
So I think there are several things there that would be great to actually unbundle and zoom in on. But yes, I agree with you, SLOs are basically KPIs, right?
The way I think about SLOs is that the idea is really simple. I mean, it's common
sense. If you run something, if you run it in production, you need some kind of a success
metric. You need some kind of a way to evaluate whether things are working or not. And different
teams throughout the years have basically reinvented that concept in their own way.
So I don't think the technical idea of SLOs is that groundbreaking or interesting or exciting.
What's really exciting and what's really powerful
is all of the common language
and all the common concepts
that come packaged with SLOs.
All of the rest of SRE as a discipline.
So that's where I think the real power of SLOs comes from
because it gives different teams, different departments as well, this common way of speaking about performance and reliability.
It makes it so much easier to do that.
And I think it also helps that it originates in Google.
So it comes with a bit of that halo effect, which makes implementing SLOs so much easier, because if Google are doing it, then it's probably a good idea.
Kind of, you know, I imagine most people that have been in the industry
will probably agree.
Most of the challenges when it comes to building and running systems at scale
are actually not technical.
They're personal and sociological issues almost.
And I think that's where SLOs are really powerful because they give a way of implementing change in organizations in a much quicker and more streamlined way than trying to invent something that's unique to a specific organization.
Now, when we talk about SLOs too, based on a lot of the conversations Andy and I have had on this podcast, it sounds like there are sort of two types of SLOs. Google seems to be a little bit more focused on
the SLO as a measure of the end user. It's from the perspective of the end user. And that consumer
could be a human being, it could be another service and all, but it's things like uptime, availability, error rates, something that you're going to feel. From that point of view, in a way, who cares? An SLO of 90% CPU utilization or something like that would be meaningless, because what we're really focusing on is the page being performant and being available. Others, though, have taken the SLO side of the
house and pulled it a little bit away from the end user and started putting some things like, we know that if our systems are operating within this range, things are good.
So they've abstracted it down a layer.
When you talk about using SLOs in production for performance, is it a combination?
Where on that spectrum, if on that spectrum, do you see those SLOs?
Yeah, so I would classify myself as someone who's bought into the Google way of doing things, you know, wholesale.
So very, very firmly in the first camp.
And I think I spoke about that as well in my SLOconf talk. So when we kicked off that SRE team at Condé Nast, one of the first and biggest projects and challenges that we had to solve was pager fatigue. People were getting paged for things that didn't really matter in the end, and we wanted to tie that to user experience.
So all of the SLOs that we defined were tied back to a real user experience. And that's what formed the basis for our new alerting and PagerDuty setup.
And that's also what we then use to drive our production load testing.
So very much in that first camp.
Because Andy, I do think that's definitely an improvement, if you think about it, over our old SLOs, as you were mentioning, from back when we were doing the testing of throughput, capacity and numbers like that. It's shifting it to a user focus.
It's not a complete departure from what we were looking at, but it's extending that to its natural endpoint, which, I've got to say up front, I think is a very good improvement.
And I think focusing on that end user is a fantastic idea for sure.
It probably depends. Yeah. I'd say it probably depends on the kind of system that you're testing as well. So let's say, I can imagine a situation where maybe you're a web server vendor, right? So that's your product. So in that case,
you wouldn't have a user,
you wouldn't have SLOs
that are tied directly to user experience.
You know, the consumers of the service
would be other systems.
So then your SLOs become, you know,
they move along that continuum
into, you know, the second camp
that you mentioned.
And this was just what I wanted to ask.
So when you're doing load testing in production, are you still then,
even though it is for a large publishing company where clearly the end user
SLOs are the most important things,
are you still adding some of your SLOs in terms of, let's say,
how much capacity do you need for a certain user load?
Because in the end, yes, we all have, quote, unquote,
infinite resources available in terms of compute and storage.
But on the other side, everything comes with a cost, right?
That means, do you keep track of some of the SLOs, at least,
where you say, hey, with 1,000 concurrent users,
we used to need, I don't know, two Kubernetes nodes with these specs, but now we need three with these specs. So in your world, do you see that SLOs are also kind of extended towards more, let's say, these efficiency metrics, let's call it efficiency from a resource perspective? Do you add them as well?
Yeah, so we definitely track those metrics, because they are important. You know, one of the motivations for production load testing is to help you plan for capacity that you might need in future, given a certain load profile.
But in that specific case, at least, we didn't define them as SLOs. So our SLOs were only based on real user experience,
and then the way we used those SLOs was, you know, twofold. It's actually quite interesting when you think about it. Once we defined those SLOs, our production load tests were, let's say, bounded or constrained by those SLOs, because we couldn't afford for production load tests to affect those SLOs negatively, because we're testing in production. So the SLOs were a guardrail for production load tests, but then production load tests were a guardrail, in a kind of wider sense, for the SLOs, because we used them to build up that margin of safety that I talked about.
So, you know, the goal was to get to a point
where at any point in time,
we could add, you know, 20% of extra traffic in production
whilst all of our SLOs, you know, stayed green
and nothing was negatively affecting real users.
So it's an interesting, almost like, you know,
yin-yang type of thing where the thing loops into itself.
Yeah.
And so I think this is now a great way of explaining what I think you mean with this and what the benefit is. Because we're talking about: you have SLOs for your production system, where you have real user load, but then you basically say, we have a certain buffer, right? That means, let's figure out, if, let's say, next month we have a super cool new product online, or, like with the publisher, a super cool story, and we have 50% more traffic, can we withstand that traffic and still be within our SLOs? And this is exactly where your production load testing comes in so beautifully, because you can say, yes, we can, right?
Yeah, exactly, exactly. And so, you know, one of the reasons that we wanted to put production load testing in place was exactly that scenario. We would get massive traffic spikes regularly, but not on schedule, because, as a publisher, things go viral all the time, but there's no way to predict which one of the pieces of content will go viral and then exactly how viral it will go.
So we did have a number of incidents where parts of the system would be partially knocked out by something all of a sudden going viral in China, for example.
So that's, you know, that's why that focus on that extra buffer of safety
was really, really important.
Yeah, and I mean, SLOs are perfect for keeping that
within that safety box, essentially.
And what you end up with is almost like a feedback loop, you know, similar to something like an auto-scaling controller, kind of. You can think of it in that way.
Because you don't want to start with production load testing
at scale immediately, right?
If you want to do things safely,
if you really want to do them safely in production,
you start really, really slowly
and you build up that extra load very, very gradually.
And that's where having existing SLOs defined really helps you because if anything goes wrong,
that gets reflected immediately so you can back off and then go and try to find that bottleneck
that caused something to go red, fix it, and then repeat again and try again.
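To make that ramp-up-with-guardrails idea concrete, here is a minimal sketch of such a control loop in TypeScript. The checkSlos and setTestTrafficRate functions are hypothetical stand-ins for your monitoring API and your load generator, not part of Artillery or any specific tool.

```typescript
// Hypothetical sketch of the "SLOs as a guardrail" ramp-up loop described above.
// checkSlos() and setTestTrafficRate() are stand-ins for a real monitoring API
// and load generator; swap in whatever your stack provides.

async function rampUpProductionLoadTest(opts: {
  checkSlos: () => Promise<boolean>;       // true while all SLOs are green
  setTestTrafficRate: (rps: number) => Promise<void>;
  targetExtraRps: number;                  // e.g. ~20% of current production traffic
  stepRps: number;                         // how much extra load to add per step
  stepDurationMs: number;                  // how long to hold each step
}): Promise<void> {
  let currentRps = 0;
  try {
    while (currentRps < opts.targetExtraRps) {
      currentRps = Math.min(currentRps + opts.stepRps, opts.targetExtraRps);
      await opts.setTestTrafficRate(currentRps);
      await new Promise((resolve) => setTimeout(resolve, opts.stepDurationMs));

      if (!(await opts.checkSlos())) {
        // An SLO went red: back off immediately, then go find the bottleneck.
        throw new Error(`SLO breached at ~${currentRps} extra requests/sec`);
      }
    }
    // Target margin of safety reached with every SLO still green.
  } finally {
    await opts.setTestTrafficRate(0);
  }
}
```

The point is just the shape of the loop: small steps up, an SLO check after each step, and an immediate back-off the moment anything goes red.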
In your talk, you also mentioned that once people, developers especially, learned about what you're doing, more and more developers were asking, hey, can we do some load testing in production?
And your first response was, well, do you have SLOs defined?
Yeah.
So that's something we kind of stumbled upon.
So, you know, in my experience, at least,
developers in general love load testing.
It's exciting, right?
So you really get the chance to push something to its limits.
And I think developers get excited about that because things tend to break when they're, you know, stressed.
So that means it's always a learning opportunity and it's always an opportunity to improve something.
Production load testing, you know, takes that to the next level.
It's really, really exciting.
It's also partially exciting because it's dangerous, right?
So we discovered that that was a great way to get people excited about SLOs
because people get excited about production load testing.
And then, as you said, if a team had a service that they wanted to include in that production load testing path or setup, they had to have SLOs, because otherwise we couldn't load test their thing safely. And yeah, that made that sell so much easier.
One other question, because I took some notes when I listened to your presentation, which, by the way, as I mentioned before, we will link to.
So if you listen to this somewhere in the description of the podcast, you'll find hopefully a lot of useful links.
You talked about the importance of fitness functions.
And you explicitly pointed out that term.
And I think we covered it already a little bit.
But can you just, I like the term fitness function, can you explain that just for people that may have never heard about it, and what benefits it brings?
Yeah, so it's actually one of my favorite concepts, and, you know, I have to credit the book, which I think I linked in my presentation, called Building Evolutionary Architectures, which is where I learned that concept.
And, you know, it's exactly what it sounds like.
Just like in evolution, we have things that evolve a certain way because they're exposed to certain, you know, stressors,
let's say, in the environment.
We can apply the same concept to software systems.
And software systems, you know, in the wild, naturally evolve in response to some pre-existing fitness functions, which just happen to be part of the environment that they evolve in.
A classic example might be something like Conway's Law, where software ends up mirroring the communication structure of the organization that builds it. And just like it happens in nature, we can then almost use that as a hack and define our own fitness functions, which we know will lead to a certain evolutionary path being taken in the evolution of our software.
So production load testing is one of those fitness functions. And, you know, in practical hands-on terms,
what that means is that if, you know,
if we're a team SRE and we announced to development teams
that we're planning to run production load tests,
all of a sudden, you know, there is a shift in thinking.
And of course, not every development team might be happy with that.
But any team that buys into the idea,
all of a sudden they start thinking about the code they write slightly
differently.
All of a sudden things like logging and monitoring become so much more
important because things might break in production and, you know,
you'll need to, you'll need to debug that.
And loads, you know, loads of good stuff happens if you define a good fitness function.
And production load testing, I think,
is one of the best fitness functions, maybe I'm biased,
that, you know, an SRE team could try to implement
in an organization because it will help push through a whole,
you know, a lot of good things that might be difficult to kind of convey the value of
otherwise.
I like what you said, that basically with these fitness functions, by putting a certain pressure on a certain part of the organization or the software, you have the ability to shape what's coming out of it.
And I think that's a nice thing.
It's quite similar to what you see with chaos engineering.
You know, there is this school of thought even that production load testing can be seen as a type of chaos engineering.
And chaos engineering fulfills that same function.
If you know, again, as a developer, for example,
that if I deploy my services to Kubernetes, let's say, for example,
and if I know that a number of those pods could be killed at any time with no warning,
I will start writing my code slightly differently to account for that.
Reliability then becomes one of the concerns that I actively think about day to day,
which then helps the overall reliability of the system.
Yeah, Andy and I have had quite a few discussions about chaos engineering,
and it definitely seems to be an outgrowth of performance testing,
a mutation of it, if you will, if that's a good way to talk about it.
But I think we both found it very fascinating because of its roots in theoretical practice.
But that brings me to, well, speaking about your talk, I wanted to give you kudos for
mentioning turtles all the way down because I love it when people mention that one because
I just think that's a fantastic story. But going back to this idea of any load testing that's not in production is a lie, right?
And I know it came with some caveats there.
You have your unit testing, your dev testing, and you also mentioned complex systems.
But I was trying to think about situations in which maybe that wouldn't be true.
Because I'm not being defensive on the idea, I always love exploring these ideas, and I was trying to think, okay, what if we're doing, like, full-on blue-green deployment, or we're one of the rare companies that deploys everything in every environment the same, so you have a full-featured, full-scale...
I think Austria just scored a goal, because Andy's freaking out.
I wish people could see what's going on.
He just got really excited.
I thought I was saying something that really excited Andy.
Of course.
No, your information was amazingly special for me this moment.
Oh, yes, yes.
Let's say you are one of these semi-unicorns or rare breeds that have a full production environment
in every, you know, in test, not in dev necessarily, but test and UAT and all that, or it's a blue-green
type of thing. What I was trying to think of, okay, what would be the difference there? And to
me, the difference there, even if you have the same scale, even if you have the same setup,
the big difference is when you're running your tests in production, your base load is non-synthetic
traffic, meaning any time we write load tests, they're fake. That's the part that I would call
BS, right? Because we cannot write what real customers do. And if you take a look at your
users, they're all doing different things. There's no, oh, well, 90% of our customers
click here, click here. I mean, there's always variation.
So you can never recreate exactly what that is.
So to me then, I think the biggest piece is that in production,
you have the real crazy wild things real people do.
You're adding those, let's say, fake users on top of that to fill in the gap.
But your base is so 100% accurate that that's going to be as close
as you can get to a true picture.
Whereas even if you're doing that
in a green environment,
which is exactly the same,
you don't have that real user chaos
in there to begin with.
Is that part of your angle, or what's the angle that really makes you feel so strongly about this? Or is it multiple things?
It's multiple things,
but you know,
that is definitely the angle. I
don't even really have anything to add to what you just said. That's exactly it.
You need real traffic as part of your test setup. One of the reasons I think that load testing is,
if you compare it to other kinds of testing, it's quite a niche activity almost. And part of it is
because it's really, really difficult to write load tests,
which are representative of real traffic.
It's extremely difficult.
It's very time consuming.
And people still try doing that.
And they try running those tests
in a non-production environment.
So one of the arguments is,
why are you doing that?
Why not just go to the real thing and do it there?
So that's one of the aspects of it. The other one is, you know, I'm really glad you used the word unicorn for a company that might have a staging environment which is exactly like production. Just like a unicorn, I'll believe that when I see it. You know, I've yet to see it myself.
It's just, you know, just even trying to think about a situation or a scenario in which that would be possible,
I can only come up with extremely, extremely kind of limited
and small-scale almost deployments where you could do something like that.
And anything non-trivial, I personally don't think is even possible.
In theory, with a 12-factor app and all, you should be able to, right?
But again, that's the point, is who's actually doing that?
And if you're, yeah.
Yeah, so what I would say here is that I think that load testing
is fundamentally different from other kinds of testing. It's
fundamentally different from unit testing, from end-to-end testing, from integration testing.
And it's different in that something like a unit test, to take an extreme example,
is testing logic and nothing else. It's testing the code and it's testing the things that that code does.
And that's why unit tests are relatively easy to write.
And that's why they're very easy to run in completely different environments.
I can run a unit test on my laptop.
I can run it as part of a CICD pipeline.
Someone else can run it on their laptop.
They're very isolated, so end up being very reproducible.
If you think about a load test,
we're not actually testing the code and the logic
as much as we're testing the environment in which the code runs.
A load test tests the operational characteristics of a system, and not just the code that's in that system, which, you know, makes them fundamentally different. Which means that you cannot divorce a load test of a system from everything that makes up that system, and that would be code, that would be infrastructure, that would be all kinds of configuration that makes up that complex system.
Maybe to make an analogy, let's say you're in a team that is designing a new kind of four-wheel-drive all-terrain vehicle. Your unit test would be something like: my key fits into the ignition, the button that I use to roll down the window can actually be pressed. Your integration test might be something like: if I put the key into the ignition and turn it, the engine starts.
So then, if you extend that analogy to load testing and you say, okay, how would a typical team that doesn't do load testing in production load test this thing? Well, you know, someone will get into the car, they'll start the engine, they'll go into first gear, maybe second gear, and they'll drive it down the road for one minute. And the road will be perfect tarmac, absolutely smooth, nothing weird. If we talk about a real-world scenario, that sounds ridiculous. Nobody would do that. If you want to really load test or stress test this thing, you would take it into the mountains, you would drive across swamps and fields and forests. You test it in the actual environment that it will run in, i.e. something that is close to production, or is production. But for some reason we don't like doing that for software systems, which doesn't really make sense.
I like your analogy with the car. So let me challenge you on this. What if you do this load test where somebody drives it, you know, like five minutes around, every day, and then every time measures how long it took, how much the tires were worn afterwards, how much gasoline was used, maybe the usage of the individual parts. And if you do this every day,
and you see it's constant, that's great.
But all of a sudden, maybe for one day, you say, hey,
we use 20% more fuel, even though we
were driving down the same road with the same gear.
Something is wrong.
So I think this is where the shift left comes in.
Whether you call it load testing or performance unit testing,
that's obviously up to you. But I think this is why we've also been very strong, and we have some very strong opinions on why we believe that load testing can be shifted left, right?
I'll counter you right there, because I'm on your side, you know that. But what I'll say, for the sake of this, is that you're collecting some additional stuff, but you're still not hitting the mountain. Now, all this shift-left stuff is going to be extremely beneficial, because there's a million other things you might catch before you try it out on the mountain. Yeah, maybe it's going to be something like, if you think about rockets going up, right, you want to find every possible little fault before you put it on that launch pad and do that production test, so that only something that happens when it's actually lifting off the earth is going to be detected.
And the shift left is finding all those other things that could possibly go wrong, with as much confidence, so that when you're getting to production...
I personally believe it's just like, do we pick Kubernetes or serverless?
Do we pick Service Fabric or whatever?
It's what are the goals of the test?
What are the goals of the organization?
And finding a best balance of what needs to be done for that.
The ultimate goal, obviously, would be to be collecting the telemetry,
at a very minimum, collecting the telemetry in production.
But then being able to add those additional tests to bring yourself to the edge of those SLOs,
I think is that last mile for absolute completeness that, okay, we've done everything
we can before production. Now that we're in production, let's hammer that last bit and find
everything. Because one of the things, if you think about it, maybe you have a Kubernetes system
running, but you're still hitting a traditional database. Well, any testing before that real load,
especially unit testing, integration testing and everything, is not going to be stressed.
I mean, your database, if you have a traditional database, is a very finite resource. You can't
just scale. We know in production, I can throw money at the problem. I can put more pods,
I can spin up a new node, I can do whatever. But if you have one of those bottleneck databases or
something else like that in the system, unless you're going to pay for another license to have the full-scale database in pre-prod, and the real user scenarios, you're not able to test what happens if we scale up three more nodes and 100 more pods.
Because we can do that, but can the database handle that?
And I think that's where those little bits come in. And to your point, Andy, there might be things you can do to test
80% of that, or whatever percentage of what that might be, in pre-prod. I think it's just a complement. I think it's an "and". And now I'm speaking for you, Hassy, or at least I feel like I am...
I want to hear from you, because I've got my own opinions, but you're our guest. Please, I'll shut up now.
Well, I think, yeah, Andy made a great point.
And I think it does demonstrate that there is value
in shifting some of that performance testing left.
But my counter argument to your counter argument
would be that what it comes down to is confidence
and the fact that testing in production gives you the ultimate confidence, and not testing in production can give you false confidence,
which can be extremely dangerous. So to take it back to our, you know,
car analogy, you're,
you're running this vehicle on your test track and you're measuring,
you know, petrol consumption and how much the tires are wearing and all that
stuff. And you get to the point where, you know,
the car is extremely fuel efficient. And then you take it for its first drive out in the mountains or in a swamp, and you drive into a puddle and a bit of water gets into the exhaust pipe and blows out the engine, and you never tested for that, right?
So that's where production load testing really shines. It lets you identify and see how your system deals with unknown unknowns, almost.
Yeah, no, I agree with you.
I mean, I didn't want to say, I mean, I love your idea.
I just want to make sure I want to make a case that people should not start thinking
about load testing only in production, because I think there's definitely
many, many use cases that at least I can think of
where it makes sense to do some level of performance testing before.
But I have one question to you then,
because it still means you need to write performance scripts.
You still need to figure out what type of load are you simulating.
So now here's my question to you.
Why go through the effort of writing test scripts that simulate 20% on top of your production load, and not just, you know, mirror the traffic or duplicate traffic? Where you say, hey, I'm taking certain user traffic and then I'm duplicating it to another environment, or something like that. Is that something that you've thought of, or is this just very hard, because duplicating traffic is just as hard as, or probably even harder, because then you have duplicated entries in the database, or I don't know?
Yeah, that's a great question. So there are three main ways to do load testing in production, or three types of load testing in production. One is traffic replay, you know, the thing that you're talking about. The other one is dark traffic, or dark launches.
When you launch a system and it's in production, it's being hit by traffic, but it's not actually visible to end users.
So Facebook have, I think, a white paper where they talk about testing their messenger service when it first launched in that way.
And then the third one is creating synthetic traffic.
So traffic replay is
brilliant, and it can work really well, but it only works for a certain type of system. It can only work well for a system where your requests satisfy two properties: they have to be idempotent and they have to be commutative. Idempotent means that the same request can be replayed any number of times and get you the same result. Commutative means that the order of requests doesn't really make a difference.
And if we think about what sort of websites that translates into, it will be very content-heavy websites.
So like newspapers, magazines, which don't offer personalization.
Websites that maybe have classified ads,
again, no personalizations.
And yeah, really not much else
that I can think of at the moment.
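As a rough illustration of those two properties, here is a hedged sketch of how one might filter logged traffic down to requests that are safe to replay. The log entry shape and the path patterns are assumptions made up for the example, not taken from any real system.

```typescript
// Sketch only: decide whether a logged request is a reasonable candidate for
// traffic replay. The entry shape and path patterns are illustrative.

interface AccessLogEntry {
  method: string; // "GET", "POST", ...
  path: string;   // "/magazine/travel/article-123", "/checkout", ...
}

function isSafeToReplay(entry: AccessLogEntry): boolean {
  // Idempotent: replaying it any number of times yields the same result.
  const readOnly = entry.method === "GET" || entry.method === "HEAD";
  // Requests that write state are typically neither idempotent nor commutative.
  const writesState = /\/(checkout|cart|login|signup|comment)/.test(entry.path);
  return readOnly && !writesState;
}

// Example: the content page passes, the checkout POST does not.
const sample: AccessLogEntry[] = [
  { method: "GET", path: "/magazine/travel/article-123" },
  { method: "POST", path: "/checkout" },
];
console.log(sample.filter(isSafeToReplay)); // -> only the GET survives
```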
Should we then advocate, like we advocate for twelve-factor apps, should we advocate for load-testable APIs or load-testable systems? Meaning, you know, in the architecture, like, design for load testing?
Yes, yes, 100%.
That's a great shout.
Yeah, we do need to.
We have to if we want to do production load testing, especially.
And, you know, the idea is not as strange as it might sound at first,
because as developers, we already instrument our code
for other operational properties like monitoring,
like observability, logging, security.
So why not do the same to make load testing easier?
And in practice, what that usually means
is that you put in a way to no-op certain operations in your code.
So a classic example would be something like a checkout flow where you add in a way to not actually charge a card.
So that way you can, you know, load test almost that entire flow without actually triggering a charge in the real world. Just Eat, for example, I saw a talk from one of their engineers a couple of years ago, they do load testing in production all the time, at peak as well, which is really impressive. And that's one of the things that they talked about. There is a way in which their load testing code
can basically go through the full checkout process,
but by toggling certain flags,
they know not to try to charge a card.
Yeah.
It's like feature toggling,
but used for production load testing.
Yeah.
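To show what that kind of toggle might look like in practice, here is a minimal sketch of a checkout handler that skips the real card charge and tags the order as synthetic when the load generator marks its requests. The "x-load-test" header, the Express-style handler and the stubbed payment gateway are all assumptions for illustration, not details from Just Eat's or Artillery's implementation.

```typescript
// Minimal sketch: a checkout endpoint that exercises the full flow during a
// production load test but never charges a real card. All names here
// (header, stubs) are illustrative assumptions.

import express from "express";

const app = express();
app.use(express.json());

// Placeholder implementations so the sketch is self-contained.
const paymentGateway = {
  charge: async (orderId: string) => console.log(`charged ${orderId}`),
};
async function createOrder(payload: unknown, opts: { synthetic: boolean }) {
  // The synthetic flag lets business reporting exclude load-test orders later.
  return { id: `order-${Date.now()}`, synthetic: opts.synthetic, payload };
}

app.post("/checkout", async (req, res) => {
  const isLoadTest = req.header("x-load-test") === "true";
  const order = await createOrder(req.body, { synthetic: isLoadTest });

  if (!isLoadTest) {
    await paymentGateway.charge(order.id); // only real users are ever charged
  }
  res.status(201).json({ orderId: order.id, synthetic: order.synthetic });
});

app.listen(3000);
```

In a real setup you would also want the load generator to authenticate itself, for example with a signed header or token, so that arbitrary clients cannot skip the charge, but the overall shape stays the same.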
I think a similar thing would have to be done for, let's say, users.
If you have a subscription-based service
and you would have to have users in production to use in your test
that you log in and authenticate with,
from a business point of view,
you can't have those counted as your user base
or when you're reporting to your investors on how big we are.
So there would have to be a way designed into the system to be able to easily count those people, and/or identify those accounts and not include them in the business reporting. Which means more people have to be thinking about this ahead of time and plan for it. It's not going to be like, hey Andy, go test this in production, and you're just going to go hog wild, because then that's just going to throw off a lot of different things, regulatory and so on.
So there's got to be an organizational plan, it sounds like. Is that easy to get people into doing? Like, when you talk to people about this idea, who is it that they're going to rally to their side to say, we want to test this and we need you to change the system a bit so that we can do this? Who is that, and how do you get that to happen?
Yeah. So as a rule,
production load testing is a team sport, but, you know, it does end up involving almost every department in the company. In terms of how long that might take, and what the conversations look like, that will depend on, you know, the culture, I suppose, and the organization in question specifically. But in my experience, it takes time, but in general people are very, very receptive, because, I think, especially again in my experience, there tends to be an intuitive understanding for why performance and reliability are important for any system that has real users, paying users.
People in general understand that outages and downtime and slowdowns are bad. So that makes production load testing a fairly straightforward proposition, because of that ultimate confidence factor that it gives you that things will stay up and it will stay working if there's a traffic spike.
So yeah, those conversations are just conversations that need to be had, with, you know, folks in marketing, folks in sales. Customer support often needs to be involved. But there's usually not very much pushback. You might need to, in your production load testing roadmap, the one that you need to come up with as an SRE team that wants to implement that practice, put aside some time to have all those conversations and to get that buy-in. But in my experience, it's just a matter of having those conversations, and it's never really a blocker.
Yeah. Hey, thank you so much for all the insights and for the good conversation, also the kind where we have, well, not opposite views, I think we pursue the same cause, but it's always good to have arguments from either side.
I want to quickly give you the chance to talk a little bit about Artillery, because I think it's a cool project.
We also have a Keptn integration, I think, lined up or already implemented, which is another case for shifting left, because Keptn is triggering a performance test and then evaluating your quality gates.
But do you, for those people
that have never heard about Artillery,
just highlight why to look at Artillery?
What's the, yeah, why should I use Artillery and not JMeter?
Yeah, so Artillery is cloud-native.
It runs in your own AWS account.
You can scale up your tests really easily.
You know, you can run tests at hundreds of thousands of requests per second,
millions of virtual users from many different geographical regions,
and you can do it as easily as running a test on your laptop.
That transition is absolutely seamless.
It's very easy to use.
We have a very strong batteries included philosophy.
So out of the box,
you can test a variety of different systems,
not just HTTP,
but you can do Socket.IO, WebSockets.
There are plugins to basically test
anything else you can think of.
We integrate with monitoring
and observability systems out of the box.
And it's also designed to be very easy to extend.
So if you needed to do something that it doesn't do,
you usually just grab an npm package, write a bit of code, and off you go.
It's designed to be very hackable.
So that's it in a gist.
So that means, it just works?
Yeah, it's designed to just work.
As a developer, if you can run a test from your machine,
you change literally a couple of things,
and boom, all of a sudden it runs at massive scale
from your own AWS account.
And I see, you know, I can encourage everyone: artillery.io, some good documentation. I'm just looking at the test script reference for scripters that want to see how it feels. Some great examples. And yeah, thanks for giving the performance community another great tool to make their life easier. I think that's what it's really all about in the end.
Thank you. I really, really enjoyed our conversation. It was really fun.
Yeah, good luck to Austria in the second half.
Yeah, let me just double-check. It's still 1-0. We are at 45 plus three, so it's the end of the first half. And the key SLO here is obviously the final result, which is 1-0. But I can look at some of the other stats. They had, in the first half, 11 shots and Ukraine only one; three shots on target, Ukraine only one. Overall, possession is almost the same, the Ukrainians are better in pass accuracy, but we'll see. Overall it seems the Austrians are shooting more. They also have eight corner kicks versus one, so I'm confident for the second half.
So basically, this would be as if the Austrians were playing a scrimmage match against their own team and were not using the Ukrainian goalie. That would be the difference between prod and non-prod. But now that they're going against the Ukrainian goalie, that's the big difference there.
And one last thing on this idea. I think where, at least I feel like, we
netted out here was that it's not that shift left is invalid or anything else. It just seems like I would almost modify what you're saying to say: you're not doing complete and full and accurate, 100%, or as accurate as you can get, load testing until you go take that last step into production. Everything else is valid. Everything else is helping to add stability to the system. But if you really want that last mile and that solidity, that's where you take it to production.
And then you're going to be probably as bulletproof as you can.
Obviously, we can never be completely bulletproof.
There's no way once we start simulating traffic
or even unpredictable events, right?
The funny thing is when I used to work,
I used to run the performance team at WebMD.
And I think it was, I forget, it was in the early 2000s, but the idea was like, well, what if someone from the government puts out some, I think there might have been like some bird flu or some such, this is way pre-COVID, right, but what if something like that, some emergency thing, got put out? Can our systems handle that?
And looking back at the last year, I crack up, because granted, now WebMD has a lot more competition, but back then WebMD was basically the CDC at that time. There was maybe one other competitor. But that's the kind of thing where you can't predict what is going to happen, which is why it's just so hard.
Anyway, I'm rambling and I'll stop.
I want to let Andy get back to his game.
Thank you so much, Hassy.
Anything you want to mention?
Anything coming up?
As I said, your favorite coffee
or Irish whiskey or anything else?
I don't know.
I'll just mention that I wrote
a really long article,
Everything I Know About Production Load Testing,
which is available on my website,
which is veldstra.org.
It'll probably be linked in the description of this episode. And other than that, yeah, shift left, shift right, shift both ways, and use SLOs. They're magical. And test, test everything.
Yes. Yeah, thank you so much.
Anything else from you, Andy, or are you just going to run and grab a pint?
Exactly, grab a pint and root for the Austrians.
Let's see how long we can drag this out.
All right. Thank you so much for being on. And to our listeners,
thank you for listening. If you have any questions or comments,
you can tweet us at pure underscore DT,
or send us an email at pureperformance at dynatrace.com. As always,
if you have any ideas or you want to be on the show,
just reach out to us and let us know.
And thanks, everyone.
Bye-bye.
Bye-bye.