PurePerformance - Shift-Left Load Testing is a LIE with Hassy Veldstra
Episode Date: July 5, 2021

In his SLOconf talk "Production load testing as a guardrail for SLOs" and in his blog "Production Load Testing", Hassy Veldstra, founder of artillery.io, makes the case for load testing in production. It helped him in various organizations to establish SLOs (Service Level Objectives) and change the way engineers think about performance. He got inspired by "Building Evolutionary Architectures", which introduces the concept of performance as a fitness function.

Tune in to our conversation, hear our arguments pro and contra load testing in the various environments, and learn why in the end we agreed on the fact that SLOs, while nothing really new, are a great chance to re-define performance engineering.

LinkedIn: https://www.linkedin.com/in/hveldstra/
SLOconf: Production load testing as a guardrail for SLOs, by Hassy Veldstra: https://www.youtube.com/watch?v=Y20K1mJB6tk
Blog: Load testing. In production. http://veldstra.org/production-load-testing/
Artillery website: https://artillery.io/
Book: Building Evolutionary Architectures: https://www.thoughtworks.com/books/building-evolutionary-architectures
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and I'm back.
And of course I'm here with Andy Grabner.
And Andy, thank you so much for filling in for me last recording because I sure missed a good one.
How are you doing, Andy?
Good, good. I gotta say, right. We have a,
I think a challenge now because I know you had a good reason for skipping the
last recording, but you know,
I am skipping the first half of most likely our most important soccer match in
Austrian history,
because we may make it to the next round of the Euro cup.
But I think it's worth sticking here with you guys
and talking about something that is obviously dear to both of our hearts.
The difference is, though, is that I could have lost my job
for skipping my work to do the podcast,
whereas there's only social consequences for you.
Yeah, maybe Austria takes my citizenship away.
But didn't they lose the last game?
Didn't they lose the last game you watched?
So maybe you missing the first half
will give them the edge they need to win.
And if that proves true,
then what I predicted here is 100% true.
And I therefore know how these things work.
If it didn't, then I was wrong.
Otherwise, yeah.
That's true. Good point.
We'll see. We'll talk about this next episode.
I wish you luck.
I'm sure by the time this airs, everything will be known, but I hope the game goes well
for you and hope you have a great time later on.
And thank you.
Thank you for giving ourselves and our guests your time today, your valuable, precious time
away from a football match to come talk about our favorite topic, performance.
Exactly.
But you know what?
I want to let him introduce himself with a line that I got from his recent presentation.
He presented at SLOconf, and one of his arguments was that shift-left load testing
is a lie.
I saw that, and I wanted to talk about that.
I thought about that too, because we've been promoting shift-left load testing and testing
earlier, doing it continuously in the pipeline.
To which, for the first time I think, he said it's bullshit and he crossed it out, live.
That's my first time cursing on the podcast.
There you go.
So, Hasi, welcome to the show.
Thank you.
Very excited to be here.
I love that we're starting with, you know, just going straight to the controversial stuff.
Yeah.
Why not?
So maybe you want to quickly introduce yourself, who you are, also why you spoke and what you spoke about at SLOconf,
and then why you think your controversial statement is not a lie, it's true.
Sure. So my name is Hassy Veldstra.
I'm the founder of Artillery.io.
We're a commercial open source company.
And what we're doing is we're building a modern testing stack
for DevOps and SRE.
A bit more controversy. I'll throw that into the mix.
We believe that the state of performance and reliability testing
in general is kind of stuck in 2015, 2016, and want to change that by building tools
which are cloud-native, have really strong focus on developer experience,
and focus on testing in production, which leads me to SLOconf and my talk there,
which was about using production load testing as a guardrail
for your SLOs.
And yeah, it turns out, so before founding Artillery, I was a consultant for many years,
consulting for companies like the Trainline and DAZN and Condé Nast, helping them
implement DevOps and SRE practices.
And as I discovered, which formed the basis for my SLOconf talk,
SLOs, Service Level Objectives, and production load testing
just happened to go really amazingly well together.
Yeah.
So let me ask you a question.
I really like your talk, and we will link to it.
I also like the format of SLOconf, because it was really short. I think yours was like eight to nine minutes, to the point. You made a very good point about before you had SLOs, with, I think, what was the company, the main story was, Condé Nast?
Yes, so Condé Nast are an international publisher. They run, I don't know, probably close to 100 publications all around the world.
So, you know, magazines like GQ and Vogue and Condé Nast Traveler, huge, huge scale.
They serve about 300 million uniques per month, you know, across all those publications.
So, yeah, that's, you know, the experience at Condé Nast is what formed the basis for my talk at SLOconf.
Yeah.
So what you said, and I have the slides here open, you said before SLOs, people are paged constantly with symptoms, right?
That something was restarted, something is very slow, but there was no clear indication of what was really the problem because there were no SLOs at all. And then you started forcing people to actually use SLOs.
And first of all, think about what are my objectives for my services?
Now, here's my question.
You said SLOs and performance testing in production go very well hand in hand.
That's great.
But people have done load testing before SLOs were a big thing.
Brian and I have been doing load testing before this was a big thing.
And I believe we've also tested against a certain objective
because you ran a test in pre-production
and then you report it back.
Either nothing works at all,
even with only a fraction of the load,
or here are our KPIs that we can now validate
the system can kind of handle.
Or like, this is the load, the throughput.
Aren't these SLOs as well, stuff we had before, or not? What do you think?
So I think there are several things there that would be great to actually unbundle and zoom in on. But yes, I agree with you, SLOs are basically KPIs, right?
The way I think about SLOs is that the idea is really simple. I mean, it's common
sense. If you run something, if you run it in production, you need some kind of a success
metric. You need some kind of a way to evaluate whether things are working or not. And different
teams throughout the years have basically reinvented that concept in their own way.
So I don't think the technical idea of SLOs is that groundbreaking or interesting or exciting.
What's really exciting and what's really powerful
is all of the common language
and all the common concepts
that come packaged with SLOs.
All of the rest of SRE as a discipline.
So that's where I think the real power of SLOs comes from
because it gives different teams, different departments as well, this common way of speaking about performance and reliability.
It makes it so much easier to do that.
And I think it also helps that it originates in Google.
So it comes with a bit of that halo effect, which makes implementing SLOs so much easier, because if Google are doing it, then it's probably a good idea.
Kind of, you know, I imagine most people that have been in the industry
will probably agree.
Most of the challenges when it comes to building and running systems at scale
are actually not technical.
They're personal and sociological issues almost.
And I think that's where SLOs are really powerful because they give a way of implementing change in organizations in a much quicker and more streamlined way than trying to invent something that's unique to a specific organization.
Now, when we talk about SLOs too, based on a lot of the conversations Andy and I have had on this podcast, it sounds like there are sort of two types of SLOs. Google seems to be a little bit more focused on
the SLO as a measure of the end user. It's from the perspective of the end user. And that consumer
could be a human being, it could be another service and all, but it's things like uptime, availability, error rates, something that you're going to feel. From that point of view, in a way, who cares? An SLO of 90% CPU utilization or something like that would be meaningless, because what we're really focusing on is the page being performant and being available. Others, though, have taken the SLO side of the
house and pulled it a little bit away from the end user and started putting some things like, we know that if our systems are operating within this range, things are good.
So they've abstracted it down a layer.
When you talk about using SLOs in production for performance, is it a combination?
Where on that spectrum, if on that spectrum, do you see those SLOs?
Yeah, so I would classify myself as someone who's bought into the Google way of doing things, you know, wholesale.
So very, very firmly in the first camp.
And I think I spoke about that as well in my SLOconf talk. So when we kicked off that SRE team at Condé Nast, one of the first and biggest projects and challenges that we had to solve was pager fatigue. People were getting paged for things that didn't really matter in the end, and we wanted to tie that to user experience.
So all of the SLOs that we defined were tied back to a real user experience. And that's what formed the basis for our new alerting and PagerDuty setup.
And that's also what we then use to drive our production load testing.
So very much in that first camp.
Because Andy, I do think that's definitely an improvement, if you think about it, over our old SLOs, as you were mentioning, from back when we were doing the testing of throughput, capacity and numbers like that. It's shifting it to a user focus.
It's not a complete departure from what we were looking at, but it's extending that to its natural endpoint, which, I've got to say up front, I think is a very good improvement.
And I think focusing on that end user is a fantastic idea for sure.
It probably depends. Yeah. I'd say it probably depends on the kind of system that you're testing as well. So let's say, I can imagine a situation where maybe you're a web server vendor, right? So that's your product. So in that case,
you wouldn't have a user,
you wouldn't have SLOs
that are tied directly to user experience.
You know, the consumers of the service
would be other systems.
So then your SLOs become, you know,
they move along that continuum
into, you know, the second camp
that you mentioned.
And this was just what I wanted to ask.
So when you're doing load testing in production, are you still then,
even though it is for a large publishing company where clearly the end user
SLOs are the most important things,
are you still adding some of your SLOs in terms of, let's say,
how much capacity do you need for a certain user load?
Because in the end, yes, we all have, quote, unquote,
infinite resources available in terms of compute and storage.
But on the other side, everything comes with a cost, right?
That means, do you keep track of some of the SLOs, at least,
where you say, hey, with 1,000 concurrent users,
we used to need, I don't know, two Kubernetes nodes with these specs, but now we need three with these specs. So in your world, do you see that SLOs are also kind of extended towards more, let's say, these efficiency metrics, let's call it efficiency from a resource perspective? Do you add them as well?
Yeah, so we definitely track those metrics, because they are important. You know, one of the motivations for production load testing is to help you plan for capacity that you might need in future, given a certain load profile.
But in that specific case, at least, we didn't define them as SLOs. So our SLOs were only based on real user experience,
and then the way we used those SLOs was, you know, twofold. It's actually quite interesting when you think about it. Once we defined those SLOs, our production load tests were, let's say, bounded or constrained by those SLOs, because we couldn't afford for production load tests to affect those SLOs negatively, because we're testing in production. So the SLOs were a guardrail for production load tests, but then production load tests were a guardrail, in a kind of wider sense, for the SLOs, because we used them to build up that margin of safety that I talked about.
So, you know, the goal was to get to a point
where at any point in time,
we could add, you know, 20% of extra traffic in production
whilst all of our SLOs, you know, stayed green
and nothing was negatively affecting real users.
So it's an interesting, almost like, you know,
yin-yang type of thing where the thing loops into itself.
Yeah.
And so I think this is now a great way of explaining what I think you mean with this and what the benefit is. Because we're talking about: you have SLOs for your production system, where you have real user load, but then you basically say, we have a certain buffer, right? That means, let's figure out, if, let's say, next month we have a super cool new product online, or, like with the publisher, a super cool story, and we have 50% more traffic, can we withstand that traffic and still be within our SLOs? And this is exactly where your production load testing comes in so beautifully, because you can say, yes, we can, right?
Yeah, exactly, exactly. And so, you know, one of the reasons that we wanted to put production load testing in place was exactly that scenario. We would get massive traffic spikes regularly, but not on schedule, because, as a publisher, things go viral all the time, but there's no way to predict which one of the pieces of content will go viral and then exactly how viral it will go.
So we did have a number of incidents where parts of the system would be partially knocked out by something all of a sudden going viral in China, for example.
So that's, you know, that's why that focus on that extra buffer of safety
was really, really important.
Yeah, and I mean, SLOs are perfect for keeping that
within that safety box, essentially.
And what you end up with is almost like a feedback loop, you know, similar to something like an auto-scaling controller, kind of. You can think of it in that way.
Because you don't want to start with production load testing
at scale immediately, right?
If you want to do things safely,
if you really want to do them safely in production,
you start really, really slowly
and you build up that extra load very, very gradually.
And that's where having existing SLOs defined really helps you because if anything goes wrong,
that gets reflected immediately so you can back off and then go and try to find that bottleneck
that caused something to go red, fix it, and then repeat again and try again.
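To make that ramp-up-with-guardrails idea concrete, here is a minimal sketch of such a control loop in TypeScript. The checkSlos and setTestTrafficRate functions are hypothetical stand-ins for your monitoring API and your load generator, not part of Artillery or any specific tool.

```typescript
// Hypothetical sketch of the "SLOs as a guardrail" ramp-up loop described above.
// checkSlos() and setTestTrafficRate() are stand-ins for a real monitoring API
// and load generator; swap in whatever your stack provides.

async function rampUpProductionLoadTest(opts: {
  checkSlos: () => Promise<boolean>;       // true while all SLOs are green
  setTestTrafficRate: (rps: number) => Promise<void>;
  targetExtraRps: number;                  // e.g. ~20% of current production traffic
  stepRps: number;                         // how much extra load to add per step
  stepDurationMs: number;                  // how long to hold each step
}): Promise<void> {
  let currentRps = 0;
  try {
    while (currentRps < opts.targetExtraRps) {
      currentRps = Math.min(currentRps + opts.stepRps, opts.targetExtraRps);
      await opts.setTestTrafficRate(currentRps);
      await new Promise((resolve) => setTimeout(resolve, opts.stepDurationMs));

      if (!(await opts.checkSlos())) {
        // An SLO went red: back off immediately, then go find the bottleneck.
        throw new Error(`SLO breached at ~${currentRps} extra requests/sec`);
      }
    }
    // Target margin of safety reached with every SLO still green.
  } finally {
    await opts.setTestTrafficRate(0);
  }
}
```

The point is just the shape of the loop: small steps up, an SLO check after each step, and an immediate back-off the moment anything goes red.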
In your talk, you also mentioned that once people, developers especially, learned about what you're doing, more and more developers were asking, hey, can we do some load testing in production?
And your first response was, well, do you have SLOs defined?
Yeah.
So that's something we kind of stumbled upon.
So, you know, in my experience, at least,
developers in general love load testing.
It's exciting, right?
So you really get the chance to push something to its limits.
And I think developers get excited about that because things tend to break when they're, you know, stressed.
So that means it's always a learning opportunity and it's always an opportunity to improve something.
Production load testing, you know, takes that to the next level.
It's really, really exciting.
It's also partially exciting because it's dangerous, right?
So we discovered that that was a great way to get people excited about SLOs
because people get excited about production load testing.
And then, as you said, if a team had a service that they wanted to include in that production load testing path or setup, they had to have SLOs, because otherwise we couldn't load test their thing safely. And yeah, that made that sell so much easier.
One other question, because I took some notes when I listened to your presentation, which, by the way, as I mentioned before, we will link to.
So if you listen to this somewhere in the description of the podcast, you'll find hopefully a lot of useful links.
You talked about the importance of fitness functions.
And you explicitly pointed out that term.
And I think we covered it already a little bit.
But can you just, I like the term fitness function, can you explain that just for people that may have never heard about it, and what benefits it brings?
Yeah, so it's actually one of my favorite concepts, and, you know, I have to credit the book, which I think I linked in my presentation, called Building Evolutionary Architectures, which is where I learned that concept.
And, you know, it's exactly what it sounds like.
Just like in evolution, we have things that evolve a certain way because they're exposed to certain, you know, stressors,
let's say, in the environment.
We can apply the same concept to software systems.
And software systems, you know, in the wild, naturally evolve in response to some pre-existing fitness functions, which just happen to be part of the environment that they evolve in.
A classic example might be something like Conway's Law, where software ends up mirroring the communication structure of the organization that builds it. And just like it happens in nature, we can then almost use that as a hack and define our own fitness functions, which we know will lead to a certain evolutionary path being taken in the evolution of our software.
So production load testing is one of those fitness functions. And, you know, in practical hands-on terms,
what that means is that if, you know,
if we're a team SRE and we announced to development teams
that we're planning to run production load tests,
all of a sudden, you know, there is a shift in thinking.
And of course, not every development team might be happy with that.
But any team that buys into the idea,
all of a sudden they start thinking about the code they write slightly
differently.
All of a sudden things like logging and monitoring become so much more
important because things might break in production and, you know,
you'll need to, you'll need to debug that.
And loads, you know, loads of good stuff happens if you define a good fitness function.
And production load testing, I think,
is one of the best fitness functions, maybe I'm biased,
that, you know, an SRE team could try to implement
in an organization because it will help push through a whole,
you know, a lot of good things that might be difficult to kind of convey the value of
otherwise.
I like what you said, that basically with these fitness functions, by putting a certain pressure on a certain part of the organization or the software, you have the ability to shape what's coming out of it.
And I think that's a nice thing.
It's quite similar to what you see with chaos engineering.
You know, there is this school of thought even that production load testing can be seen as a type of chaos engineering.
And chaos engineering fulfills that same function.
If you know, again, as a developer, for example,
that if I deploy my services to Kubernetes, let's say, for example,
and if I know that a number of those pods could be killed at any time with no warning,
I will start writing my code slightly differently to account for that.
Reliability then becomes one of the concerns that I actively think about day to day,
which then helps the overall reliability of the system.
Yeah, Andy and I have had quite a few discussions about chaos engineering,
and it definitely seems to be an outgrowth of performance testing,
a mutation of it, if you will, if that's a good way to talk about it.
But I think we both found it very fascinating because of its roots in theoretical practice.
But that brings me to, well, speaking about your talk, I wanted to give you kudos for
mentioning turtles all the way down because I love it when people mention that one because
I just think that's a fantastic story. But going back to this idea of any load testing that's not in production is a lie, right?
And I know it came with some caveats there.
You have your unit testing, your dev testing, and you also mentioned complex systems.
But I was trying to think about situations in which maybe that wouldn't be true.
Because I'm not being defensive on the idea, I always love exploring these ideas, and I was trying to think, okay, what if we're doing, like, full-on blue-green deployment, or we're one of the rare companies that deploys everything in every environment the same, so you have a full-featured, full-scale...
I think Austria just scored a goal, because Andy's freaking out.
I wish people could see what's going on.
He just got really excited.
I thought I was saying something that really excited Andy.
Of course.
No, your information was amazingly special for me this moment.
Oh, yes, yes.
Let's say you are one of these semi-unicorns or rare breeds that have a full production environment
in every, you know, in test, not in dev necessarily, but test and UAT and all that, or it's a blue-green
type of thing. What I was trying to think of, okay, what would be the difference there? And to
me, the difference there, even if you have the same scale, even if you have the same setup,
the big difference is when you're running your tests in production, your base load is non-synthetic
traffic, meaning any time we write load tests, they're fake. That's the part that I would call
BS, right? Because we cannot write what real customers do. And if you take a look at your
users, they're all doing different things. There's no, oh, well, 90% of our customers
click here, click here. I mean, there's always variation.
So you can never recreate exactly what that is.
So to me then, I think the biggest piece is that in production,
you have the real crazy wild things real people do.
You're adding those, let's say, fake users on top of that to fill in the gap.
But your base is so 100% accurate that that's going to be as close
as you can get to a true picture.
Whereas even if you're doing that
in a green environment,
which is exactly the same,
you don't have that real user chaos
in there to begin with.
Is that part of your angle, or what's the angle that really makes you feel so strongly about this? Or is it multiple things?
It's multiple things,
but you know,
that is definitely the angle. I
don't even really have anything to add to what you just said. That's exactly it.
You need real traffic as part of your test setup. One of the reasons I think that load testing is,
if you compare it to other kinds of testing, it's quite a niche activity almost. And part of it is
because it's really, really difficult to write load tests,
which are representative of real traffic.
It's extremely difficult.
It's very time consuming.
And people still try doing that.
And they try running those tests
in a non-production environment.
So one of the arguments is,
why are you doing that?
Why not just go to the real thing and do it there?
So that's one of the aspects of it. The other one is, you know, I'm really glad you used the word unicorn for a company that might have a staging environment which is exactly like production. Just like a unicorn, I'll believe that when I see it. You know, I've yet to see it myself.
It's just, you know, just even trying to think about a situation or a scenario in which that would be possible,
I can only come up with extremely, extremely kind of limited
and small-scale almost deployments where you could do something like that.
And anything non-trivial, I personally don't think is even possible.
In theory, with a 12-factor app and all, you should be able to, right?
But again, that's the point, is who's actually doing that?
And if you're, yeah.
Yeah, so what I would say here is that I think that load testing
is fundamentally different from other kinds of testing. It's
fundamentally different from unit testing, from end-to-end testing, from integration testing.
And it's different in that something like a unit test, to take an extreme example,
is testing logic and nothing else. It's testing the code and it's testing the things that that code does.
And that's why unit tests are relatively easy to write.
And that's why they're very easy to run in completely different environments.
I can run a unit test on my laptop.
I can run it as part of a CICD pipeline.
Someone else can run it on their laptop.
They're very isolated, so end up being very reproducible.
If you think about a load test,
we're not actually testing the code and the logic
as much as we're testing the environment in which the code runs.
A load test tests the operational characteristics of a system, and not just the code that's in that system, which, you know, makes them fundamentally different. Which means that you cannot divorce a load test of a system from everything that makes up that system, and that would be code, that would be infrastructure, that would be all kinds of configuration that makes up that complex system.
Maybe to make an analogy, let's say you're in a team that is designing a new kind of four-wheel-drive all-terrain vehicle. Your unit test would be something like: my key fits into the ignition, the button that I use to roll down the window can actually be pressed. Your integration test might be something like: if I put the key into the ignition and turn it, the engine starts.
So then, if you extend that analogy to load testing and you say, okay, how would a typical team that doesn't do load testing in production load test this thing? Well, you know, someone will get into the car, they'll start the engine, they'll go into first gear, maybe second gear, and they'll drive it down the road for one minute. And the road will be perfect tarmac, absolutely smooth, nothing weird. If we talk about a real-world scenario, that sounds ridiculous. Nobody would do that. If you want to really load test or stress test this thing, you would take it into the mountains, you would drive across swamps and fields and forests. You test it in the actual environment that it will run in, i.e. something that is close to production, or is production. But for some reason we don't like doing that for software systems, which doesn't really make sense.
I like your analogy with the car. So let me challenge you on this. What if you do this load test where somebody drives it, you know, like five minutes around, every day, and then every time measures how long it took, how much the tires were worn afterwards, how much gasoline was used, maybe the usage of the individual parts. And if you do this every day,
and you see it's constant, that's great.
But all of a sudden, maybe for one day, you say, hey,
we use 20% more fuel, even though we
were driving down the same road with the same gear.
Something is wrong.
So I think this is where the shift left comes in.
Whether you call it load testing or performance unit testing,
that's obviously up to you. But I think this is why we've also been very strong, and we have some very strong opinions on why we believe that load testing can be shifted left, right?
I'll counter you right there, because I'm on your side, you know that. But what I'll say, for the sake of this, is that you're collecting some additional stuff, but you're still not hitting the mountain. Now, all this shift-left stuff is going to be extremely beneficial, because there's a million other things you might catch before you try it out on the mountain. Yeah, maybe it's going to be something like, if you think about rockets going up, right, you want to find every possible little fault before you put it on that launch pad and do that production test, so that only something that happens when it's actually lifting off the earth is going to be detected.
And the shift left is finding all those other things that could possibly go wrong, with as much confidence, so that when you're getting to production...
I personally believe it's just like, do we pick Kubernetes or serverless?
Do we pick Service Fabric or whatever?
It's what are the goals of the test?
What are the goals of the organization?
And finding a best balance of what needs to be done for that.
The ultimate goal, obviously, would be to be collecting the telemetry,
at a very minimum, collecting the telemetry in production.
But then being able to add those additional tests to bring yourself to the edge of those SLOs,
I think is that last mile for absolute completeness that, okay, we've done everything
we can before production. Now that we're in production, let's hammer that last bit and find
everything. Because one of the things, if you think about it, maybe you have a Kubernetes system
running, but you're still hitting a traditional database. Well, any testing before that real load,
especially unit testing, integration testing and everything, is not going to be stressed.
I mean, your database, if you have a traditional database, is a very finite resource. You can't
just scale. We know in production, I can throw money at the problem. I can put more pods,
I can spin up a new node, I can do whatever. But if you have one of those bottleneck databases or
something else like that in the system, unless you're going to pay for another license to have the full-scale database in pre-prod, and the real user scenarios, you're not able to test what happens if we scale up three more nodes and 100 more pods.
Because we can do that, but can the database handle that?
And I think that's where those little bits come in. And to your point, Andy, there might be things you can do to test
80% of that, or whatever percentage of what that might be, in pre-prod. I think it's just a complement. I think it's an "and". And now I'm speaking for you, Hassy, or at least I feel like I am...
I want to hear from you, because I've got my own opinions, but you're our guest. Please, I'll shut up now.
Well, I think, yeah, Andy made a great point.
And I think it does demonstrate that there is value
in shifting some of that performance testing left.
But my counter argument to your counter argument
would be that what it comes down to is confidence
and the fact that testing in production gives you the ultimate confidence, and not testing in production can give you false confidence,
which can be extremely dangerous. So to take it back to our, you know,
car analogy, you're,
you're running this vehicle on your test track and you're measuring,
you know, petrol consumption and how much the tires are wearing and all that
stuff. And you get to the point where, you know,
the car is extremely fuel efficient. And then you take it for its first drive out in the mountains or in a swamp, and you drive into a puddle and a bit of water gets into the exhaust pipe and blows out the engine, and you never tested for that, right?
So that's where production load testing really shines. It lets you identify and see how your system deals with unknown unknowns, almost.
Yeah, no, I agree with you.
I mean, I didn't want to say, I mean, I love your idea.
I just want to make sure I want to make a case that people should not start thinking
about load testing only in production, because I think there's definitely
many, many use cases that at least I can think of
where it makes sense to do some level of performance testing before.
But I have one question to you then,
because it still means you need to write performance scripts.
You still need to figure out what type of load are you simulating.
So now here's my question to you.
Why go through the effort of writing test scripts that simulate 20% on top of your production load, and not just, you know, mirror the traffic or duplicate traffic? Where you say, hey, I'm taking certain user traffic and then I'm duplicating it to another environment, or something like that. Is that something that you've thought of, or is this just very hard, because duplicating traffic is just as hard as, or probably even harder, because then you have duplicated entries in the database, or I don't know?
Yeah, that's a great question. So there are three main ways to do load testing in production, or three types of load testing in production. One is traffic replay, you know, the thing that you're talking about. The other one is dark traffic, or dark launches.
When you launch a system and it's in production, it's being hit by traffic, but it's not actually visible to end users.
So Facebook have, I think, a white paper where they talk about testing their messenger service when it first launched in that way.
And then the third one is creating synthetic traffic.
So traffic replay is
brilliant, and it can work really well, but it only works for a certain type of system. It can only work well for a system where your requests satisfy two properties: they have to be idempotent and they have to be commutative. Idempotent means that the same request can be replayed any number of times and get you the same result. Commutative means that the order of requests doesn't really make a difference.
And if we think about what sort of websites that translates into, it will be very content-heavy websites.
So like newspapers, magazines, which don't offer personalization.
Websites that maybe have classified ads,
again, no personalizations.
And yeah, really not much else
that I can think of at the moment.
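As a rough illustration of those two properties, here is a hedged sketch of how one might filter logged traffic down to requests that are safe to replay. The log entry shape and the path patterns are assumptions made up for the example, not taken from any real system.

```typescript
// Sketch only: decide whether a logged request is a reasonable candidate for
// traffic replay. The entry shape and path patterns are illustrative.

interface AccessLogEntry {
  method: string; // "GET", "POST", ...
  path: string;   // "/magazine/travel/article-123", "/checkout", ...
}

function isSafeToReplay(entry: AccessLogEntry): boolean {
  // Idempotent: replaying it any number of times yields the same result.
  const readOnly = entry.method === "GET" || entry.method === "HEAD";
  // Requests that write state are typically neither idempotent nor commutative.
  const writesState = /\/(checkout|cart|login|signup|comment)/.test(entry.path);
  return readOnly && !writesState;
}

// Example: the content page passes, the checkout POST does not.
const sample: AccessLogEntry[] = [
  { method: "GET", path: "/magazine/travel/article-123" },
  { method: "POST", path: "/checkout" },
];
console.log(sample.filter(isSafeToReplay)); // -> only the GET survives
```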
Should we then advocate, like we advocate for twelve-factor apps, should we advocate for load-testable APIs or load-testable systems? Meaning, you know, in the architecture, like, design for load testing?
Yes, yes, 100%.
That's a great shout.
Yeah, we do need to.
We have to if we want to do production load testing, especially.
And, you know, the idea is not as strange as it might sound at first,
because as developers, we already instrument our code
for other operational properties like monitoring,
like observability, logging, security.
So why not do the same to make load testing easier?
And in practice, what that usually means
is that you put in a way to no-op certain operations in your code.
So a classic example would be something like a checkout flow where you add in a way to not actually charge a card.
So that way you can, you know, load test almost that entire flow without actually triggering a charge in the real world. Just Eat, for example, I saw a talk from one of their engineers a couple of years ago, they do load testing in production all the time, at peak as well, which is really impressive. And that's one of the things that they talked about. There is a way in which their load testing code
can basically go through the full checkout process,
but by toggling certain flags,
they know not to try to charge a card.
Yeah.
It's like feature toggling,
but used for production load testing.
Yeah.
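To show what that kind of toggle might look like in practice, here is a minimal sketch of a checkout handler that skips the real card charge and tags the order as synthetic when the load generator marks its requests. The "x-load-test" header, the Express-style handler and the stubbed payment gateway are all assumptions for illustration, not details from Just Eat's or Artillery's implementation.

```typescript
// Minimal sketch: a checkout endpoint that exercises the full flow during a
// production load test but never charges a real card. All names here
// (header, stubs) are illustrative assumptions.

import express from "express";

const app = express();
app.use(express.json());

// Placeholder implementations so the sketch is self-contained.
const paymentGateway = {
  charge: async (orderId: string) => console.log(`charged ${orderId}`),
};
async function createOrder(payload: unknown, opts: { synthetic: boolean }) {
  // The synthetic flag lets business reporting exclude load-test orders later.
  return { id: `order-${Date.now()}`, synthetic: opts.synthetic, payload };
}

app.post("/checkout", async (req, res) => {
  const isLoadTest = req.header("x-load-test") === "true";
  const order = await createOrder(req.body, { synthetic: isLoadTest });

  if (!isLoadTest) {
    await paymentGateway.charge(order.id); // only real users are ever charged
  }
  res.status(201).json({ orderId: order.id, synthetic: order.synthetic });
});

app.listen(3000);
```

In a real setup you would also want the load generator to authenticate itself, for example with a signed header or token, so that arbitrary clients cannot skip the charge, but the overall shape stays the same.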
I think a similar thing would have to be done for, let's say, users.
If you have a subscription-based service
and you would have to have users in production to use in your test
that you log in and authenticate with,
from a business point of view,
you can't have those counted as your user base
or when you're reporting to your investors on how big we are.
So there would have to be a way designed into the system to be able to easily count those people, and/or identify those accounts and not include them in the business reporting. Which means more people have to be thinking about this ahead of time and plan for it. It's not going to be like, hey Andy, go test this in production, and you're just going to go hog wild, because then that's just going to throw off a lot of different things, regulatory and so on.
So there's got to be an organizational plan, it sounds like. Is that easy to get people into doing? Like, when you talk to people about this idea, who is it that they're going to rally to their side to say, we want to test this and we need you to change the system a bit so that we can do this? Who is that, and how do you get that to happen?
Yeah. So as a rule,
production load testing is a team sport, but, you know, it does end up involving almost every department in the company. In terms of how long that might take, and what the conversations look like, that will depend on, you know, the culture, I suppose, and the organization in question specifically. But in my experience, it takes time, but in general people are very, very receptive, because, I think, especially again in my experience, there tends to be an intuitive understanding for why performance and reliability are important for any system that has real users, paying users.
People in general understand that outages and downtime and slowdowns are bad. So that makes production load testing a fairly straightforward proposition, because of that ultimate confidence factor that it gives you that things will stay up and it will stay working if there's a traffic spike.
So yeah, those conversations are just conversations that need to be had, with, you know, folks in marketing, folks in sales. Customer support often needs to be involved. But there's usually not very much pushback. You might need to, in your production load testing roadmap, the one that you need to come up with as an SRE team that wants to implement that practice, put aside some time to have all those conversations and to get that buy-in. But in my experience, it's just a matter of having those conversations, and it's never really a blocker.
Yeah. Hey, thank you so much for all the insights and for the good conversation, also the kind where we have, well, not opposite views, I think we pursue the same cause, but it's always good to have arguments from either side.
I want to quickly give you the chance to talk a little bit about Artillery, because I think it's a cool project.
We also have a Keptn integration, I think, lined up or already implemented, which is another case for shifting left, because Keptn is triggering a performance test and then evaluating your quality gates.
But do you, for those people
that have never heard about Artillery,
just highlight why to look at Artillery?
What's the, yeah, why should I use Artillery and not JMeter?
Yeah, so Artillery is cloud-native.
It runs in your own AWS account.
You can scale up your tests really easily.
You know, you can run tests at hundreds of thousands of requests per second,
millions of virtual users from many different geographical regions,
and you can do it as easily as running a test on your laptop.
That transition is absolutely seamless.
It's very easy to use.
We have a very strong batteries included philosophy.
So out of the box,
you can test a variety of different systems,
not just HTTP,
but you can do Socket.IO, WebSockets.
There are plugins to basically test
anything else you can think of.
We integrate with monitoring
and observability systems out of the box.
And it's also designed to be very easy to extend.
So if you needed to do something that it doesn't do,
you usually just grab an npm package, write a bit of code, and off you go.
It's designed to be very hackable.
So that's it in a gist.
So that means, it just works?
Yeah, it's designed to just work.
As a developer, if you can run a test from your machine,
you change literally a couple of things,
and boom, all of a sudden it runs at massive scale
from your own AWS account.
And I see, you know, I can encourage everyone: artillery.io, some good documentation. I'm just looking at the test script reference for scripters that want to see how it feels. Some great examples. And yeah, thanks for giving the performance community another great tool to make their life easier. I think that's what it's really all about in the end.
Thank you. I really, really enjoyed our conversation. It was really fun.
Yeah, good luck to Austria in the second half.
Yeah, let me just double-check. It's still 1-0. We are at 45 plus three, so it's the end of the first half. And the key SLO here is obviously the final result, which is 1-0. But I can look at some of the other stats. They had, in the first half, 11 shots and Ukraine only one; three shots on target, Ukraine only one. Overall, possession is almost the same, the Ukrainians are better in pass accuracy, but we'll see. Overall it seems the Austrians are shooting more. They also have eight corner kicks versus one, so I'm confident for the second half.
So basically, this would be as if the Austrians were playing a scrimmage match against their own team and were not using the Ukrainian goalie. That would be the difference between prod and non-prod. But now that they're going against the Ukrainian goalie, that's the big difference there.
And one last thing on this idea. I think where, at least I feel like, we
netted out here was that it's not that shift left is invalid or anything else. It just seems like I would almost modify what you're saying to say: you're not doing complete and full and accurate, 100%, or as accurate as you can get, load testing until you go take that last step into production. Everything else is valid. Everything else is helping to add stability to the system. But if you really want that last mile and that solidity, that's where you take it to production.
And then you're going to be probably as bulletproof as you can.
Obviously, we can never be completely bulletproof.
There's no way once we start simulating traffic
or even unpredictable events, right?
The funny thing is when I used to work,
I used to run the performance team at WebMD.
And I think it was, I forget, it was in the early 2000s, but the idea was like, well, what if someone from the government puts out some, I think there might have been like some bird flu or some such, this is way pre-COVID, right, but what if something like that, some emergency thing, got put out? Can our systems handle that?
And looking back at the last year, I crack up, because granted, now WebMD has a lot more competition, but back then WebMD was basically the CDC at that time. There was maybe one other competitor. But that's the kind of thing where you can't predict what is going to happen, which is why it's just so hard.
Anyway, I'm rambling and I'll stop.
I want to let Andy get back to his game.
Thank you so much, Hassy.
Anything you want to mention?
Anything coming up?
As I said, your favorite coffee
or Irish whiskey or anything else?
I don't know.
I'll just mention that I wrote
a really long article,
Everything I Know About Production Load Testing,
which is available on my website,
which is veldstra.org.
It'll probably be linked in the description of this episode. And other than that, yeah, shift left, shift right, shift both ways, and use SLOs. They're magical. And test, test everything.
Yes. Yeah, thank you so much.
Anything else from you, Andy, or are you just going to run and grab a pint?
Exactly, grab a pint and root for the Austrians.
Let's see how long we can drag this out.
All right. Thank you so much for being on. And to our listeners,
thank you for listening. If you have any questions or comments,
you can tweet us at pure underscore DT,
or send us an email at pureperformance at dynatrace.com. As always,
if you have any ideas or you want to be on the show,
just reach out to us and let us know.
And thanks, everyone.
Bye-bye.
Bye-bye.