PurePerformance - Security for Performance Engineers with Mark Tomlinson

Episode Date: October 25, 2021

If there is one thing you take away from this episode, let it be the answer to "Why should we refrain from Reply All on company-wide emails?" Jokes aside, security and performance are not always funny! In this special anniversary episode we have Mark Tomlinson, System Performance Specialist, talking about the considerations and trade-offs between performance and security. We learn about performance vulnerabilities and why it is important to factor in the additional overhead each layer of security adds to your application stack. It's always a pleasure having Mark on the show, whether it was in the past, present or will be in the future. If you want to learn more from Mark on the topic of performance, make sure to check out PerfBytes, the show that inspired us to launch PurePerformance.

Show Links:
Mark Tomlinson on LinkedIn: https://www.linkedin.com/in/mtomlins/
PerfBytes: https://www.perfbytes.com/

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. My name is Brian Wilson and as always I have with me my fashionable colleague Andy Grabner. Andy Grabner, how are you doing today? I'm pretty good and I'm very glad to hear that on air you don't call me what you call me when we're not on air, which was a different word you used. You mean fantastic and loving and caring and kind and stupendous? Yeah, not annoying Austrian, I know. No, no, not at all.
Starting point is 00:00:56 Not at all. So this is a weird show today because we're celebrating our 150th episode, but this isn't our 150th episode. It almost feels like it. It's our virtual. We're virtualizing our 150th episode just because of timing and schedules and all that. So let's all pretend today that this is our 150th episode,
Starting point is 00:01:20 which is coming up very soon, dear listener. But it's not quite. Should we consider this an experiment? Like, what happens if this were the 150th episode? Well, we're traveling back in time to celebrate it before the 150th. Okay. Yeah, it's however you might want to consider it. Or maybe it was a glitch in the matrix, if you will, although that gets me to too many red-pill dudes, who are really
Starting point is 00:01:45 annoying. But anyhow, we're going way off topic and our guest is itching to get on. Before we introduce him, let's give it a moment, let's do it Dora the Explorer style: do you know who our guest is today? And then we look at the camera and wait. That's right, our guest is Mr. Mark Tomlinson. Yes, hello friends, I have come from the future. I'm here to share with you what you will have talked about already in the past, in the future, on your 150th episode. So: friend of the show, inspiration for the show. Muse of the show. Can I be a muse? Is that okay?
Starting point is 00:02:34 Yes, you are our muse. You are our Bugnish. And if anybody listening gets that. Hopefully I'm attractive enough to be a muse. And Mark, you can confirm that the stuff that you're drinking out of your cup is in fact tea or coffee and not an adult beverage. It's not. Actually, Andy, behind you, you and I have the same espresso maker.
Starting point is 00:02:54 Yeah. Right? It's the Dedica, I think it is. Delonghi. Yeah, yeah. And I got the hopper grinder with it. I'm digging it. So I'm having like anywhere from four to six shots
Starting point is 00:03:05 of pure espresso every day. Oh, that's fantastic. Wow. It's going to be one of those shows, everybody. Thank you. Much more fun. And Mark, before we dive into it, what's going on with PerfBytes? You know, it's funny, because the first time I tried to introduce our show, I almost called it PerfBytes. But what's going on with PerfBytes? I see Leandro's taking off and doing a bunch. What's going on in the PerfBytes world of podcasting? Well, you know, we're kind of on hiatus, but not to, you know, sort of compete with you guys. But all told we hit about 240 episodes of different things, some of them with you, Brian.
Starting point is 00:03:45 You know, we did them at Perform, the simulcast or coinciding episodes. There's a great number of them in the back catalog. So we kind of took a pause to support two different initiatives at first. One was helping to support PerfBytes Press, so getting into the written world. James Pulley, our colleague and friend, wanted to put together a publishing arm.
Starting point is 00:04:11 So Leandro has a new book out, which is like the Hitchhiking Guide to Load Testing Projects. James has one coming out that's about sort of hiring and career professionalism in the performance arts, the dark arts. I have one that I'm cooking on right now; I'm not going to say anything about it, it's a kind of fun book. At the same time, we supported Leandro much more to kind of launch where he is headed, and we added our also mutual friend Henrik Rexed, who came from the Neotys
Starting point is 00:04:45 Tricentis world, is now working with Dynatrace. So Henrik is working on a bunch of new things. And you'll see a lot more YouTubing coming out of us. So you'll have books and videos. And Henrik and Leandro have been doing some retakes or sort of re-updates. What do you call that? Sort of a refresh of some of our old topics. Reloaded.
Starting point is 00:05:06 Reloaded, yeah, yeah, exactly. What's the new Matrix movie? It's like Revisited? Re... Retired? Matrix Retired? Isn't that... Replenished? Isn't that... I don't remember the name. You know, yeah, as a fan of the first movie I can hope the new one will be... Yeah. Yes. Hopefully it's not Refundable. That would be bad. Or non-fungible. Non-fundable. Yeah, Refunded. Yeah, yeah.
Starting point is 00:05:35 Non-fungible. So PerfBytes, we'll see some new stuff happening probably in the new year. If you can't see the picture behind me, it's all, uh, remodeling: my current studio, changing studios so we can set up for more YouTubing and more video stuff, which would be fun. And that's, uh, some short-form tutorials, from very intro-level performance, the good old performance stuff, load testing, performance monitoring, performance management, all the way up to some of the continuous stuff that Andy is still carrying boldly into the future around continuous performance. So PerfBytes is still happening.
Starting point is 00:06:15 And we miss you, Brian. But yeah, we'll see what we do with the Ask PerfBytes. But I was a quitter. No, no, no. You did fantastic work there. Some of those episodes were marvelous, especially the ones that I was on. Of course, of course.
Starting point is 00:06:28 And Leandro. Hey, Mark, for the few listeners, maybe it's the first time that they hear your voice and your name. Really? And I know, I think it's still possible, even though it feels impossible. It's like people who haven't heard of Alfred E. Neuman. Like, how many of those exist?
Starting point is 00:06:49 Exactly. No, no. It's, you know, when you hit like 30 years in your career, you have to stop assuming that the young kids even know who you are. So, yeah, I'm Mark Tomlinson. Andy, you and I have known each other for the better part of the last 20 years. And Brian as well, both of you guys. And we kind of met through my former employer, Microsoft, who's not been in the news recently.
Starting point is 00:07:16 Of course, this airs in the future, so maybe they are in the news. But they're always in the news. And I've been a load tester, a performance engineer, a performance optimizer, working on the Windows kernel, working on a bunch of different products. For a while, I was the LoadRunner product manager with HP. Now, what, Micro Focus? All these companies I worked for are going out of business. But I'm not. I'm still having a good time doing performance optimization. I worked at PayPal a while ago and kind of got hooked on financial processing and the risks associated with being a payment gateway, a payment processor, at least not necessarily a backend processor. But, um, yeah, it's been really fun to overlap security in the financial space, PCI compliance, with all the performance automation that I've come to do. So yeah, still working for a startup now in Philly and still doing some virtual online training. I did one
Starting point is 00:08:17 in-person training this summer. It was really fun; that was at the Mile in Denver, so I escaped in the little gap in the weather, went and did a two-day training, and then came back and put my mask on. So speaking of Philly: I just want to make a plug for this because I enjoy it, and Mark, I'll send you a link to it as well. There's a guy in Philly, a musician in Philly, who does a show. There's a YouTube show called What Makes This Song Great; it goes into all the music theory and junk. So this guy decided to make one called What Makes This Song Suck, and he's just relentless. It just goes down into all sorts of nastiness. It's pretty good.
Starting point is 00:08:56 But anyhow, that's completely just on the Philly tip there. So, yeah. So another lesson, right. Is if you, we, Andy, we should just make sure we never hire Mark because of his track record of sinking companies, right? That's true. That's a good, thank you. No, you know, I left and then like three, four years later, they either, people go to jail or they get bought out by, I mean, getting bought out is a good thing for some companies. Like you can, you like, you can become huge. Look at Dynatrace.
Starting point is 00:09:23 Like, they took over Compuware. Either they fell apart without you, or you made them so successful that they went and got bought out. So, got acquired, yes, that's a good spin. Except for Microsoft, because, you know, they just made more money being a giant awesome tech company. Yeah. Hey Mark, you already kind of segued a little bit into the topic that we want to talk about today: the intersection between performance and security, and a little bit of dev and a little bit of ops, because these are the hot topics these days. Yeah. In your current world, where you are working in the financial industry, what is more important, security or performance? You can be killed by either failing.
Starting point is 00:10:08 If you fail security, you die quicker. If you fail performance, you die, I want to say kind of maybe slower. But yeah, in both of my last two roles over the last eight years, both have been aligned with ops or the DevOps team or the infrastructure team. And I'm currently on an infrastructure team and I have a peer who works, is the head of security. We have obviously compliance officers who are, I don't want to say they're not technical people, but they are more about the letter of the rules and the law
Starting point is 00:10:47 in some cases, or the policies and governance that we have around compliance that are directly influencing. They're kind of like the PMs for infrastructure security, infrastructure configuration. And then I've been the lone-wolf performance person working on the team. And I work day in and day out. I mean, not a day goes by that I'm not tuned into the channel with all the security stuff that's happening at the company. And I would say, based on the risk, the financial stakes are much higher on the security side. Financially, that risk is much higher for security because all it takes is one breach to hit the big number. It's kind of like, you know, the old-timey fairs where you hit the pad with the mallet and ring the bell to win a prize of some kind. With security, it doesn't take much energy to hit that little pad, and bing, you know, you're done,
Starting point is 00:11:51 and then the CSO gets fired and everyone gets fired, and the GRC gets restructured and outsourced to some consulting firm with less experience, and blah blah blah. But performance is more like a bigger number, because if you have a performance problem that's unabated over many, many months or even years, you will sort of quietly justify spending a lot of your cash, a lot of your money, on growth. And every company is focused on growth, and they sort of lose sight of, you know, should we have tuned our code? Should we have tuned the database query? Should we have tuned the infrastructure? Should we have distributed our system?
Starting point is 00:12:36 Should we have gone to more of a scale out architecture? Slowly over time, should we have made those sort of complex decisions from an architecture standpoint? And, or can we just throw hardware at the problem? And can we throw hardware in the cloud? Well, it's just so easy. It's almost painless. So just, well, there's a few more containers, few more instances running. Okay, fine. Well, let's just upgrade a little bit here. Even physical infrastructures now are very easy to hot swap stuff and grow what you do. And I look at my current employer and I look at the business charts in terms of throughput. I can look at transactional throughput for a specific type. We're like a payment gateway. So I can look at transactions for a particular web service and I know whether business is good or business is bad.
Starting point is 00:13:23 And that curve in the three years, three plus years that I've been there has been like on a nonstop rocket ride, increasing growth. There's a pandemic in the middle. Sure, we took a little bit of a dip in the pandemic, but when business started to come back or people figured out how to do business in a pandemic or after a pandemic,
Starting point is 00:13:43 boom, that number goes right back up. So I think performance can be more costly over the long term, uh, because it just sort of becomes this benign thing: you know, well, we'll just keep growing, we'll throw some more hardware, we'll throw some more licenses. And then you get to a point where, oh my gosh, we hit some major bottleneck we can't get past. Um, and there is a very typical Mark Tomlinson very long answer to your question, Andy. So performance is more like termites in your house, where security is a bundle of dynamite. Yes. I don't want to go into, like, illness metaphors, because it's really an illness kind of thing, a slow death, you know, sort of like, yeah.
Starting point is 00:14:30 But I think it's more about cost, Brian, to be honest with you. A termite infestation means you kind of have to rip the house down to its bones and rebuild the bones, because that's where those are. You know, oh, you know what, I should optimize this wall over here, but it takes so much work to tear it down and rebuild it. A lot of people think about performance and architecture that way. And security-wise, when I'm working closely with security folks, the one thing, and I think we even talked about it back on the 100th episode, was the nomenclature of a vulnerability, which I love in the infosec world. And for business people, when we come from the testing world, it's pass or fail. It's a bug or it's not a bug. And things get wonky with those conversations when people are like, should we fix this bug or should we not fix this bug? What's the priority? What's the nuance? What's going on here? X, Y, and Z. If you present things
Starting point is 00:15:38 in the infosec world, it's like everyone's already in the infrastructure team ready to say, oh yeah, vulnerability. That makes sense to me. So we have risk. What are the chances of it being exploited in reality or not? Do we have any secondary or tertiary mechanisms to prevent the exploitation? And that's one of the key things that I've changed in the last 50 episodes of Pure Performance: really adopting this nomenclature of vulnerability and percentage risk. What are the odds? Trying to estimate: well, it's only going to affect 10% of the traffic, and that 10% of the traffic was only, you know... So you learn how to kind of narrow the scope, or the understanding of the context, for what is a performance vulnerability. Which is to say, things in a database like an optimizer: what are the chances that this particular plan is going to
Starting point is 00:16:33 end up in production? Well, I don't know. Let's take a look at how the optimizer works. And the other interesting thing from security, using that vulnerability language, is we start adding more ciphers or more encryption or more layers in the architecture. Oh, we're going to put another firewall in, or we're going to jump through another device that's going to scan for this and scan for that, and packet inspection and stuff. How much of that is appropriately coded or built asynchronously? Does it interrupt the transaction flow? So as people are getting more and more paranoid, they throw more and more security mechanisms or features into the architecture. If they end up being synchronous, you're going to
Starting point is 00:17:17 spend a few milliseconds. Maybe you're going to spend, let's say, 50 milliseconds for every transaction, times a couple million transactions an hour. That adds up, you know; maybe there's another 5% CPU, and boom, now you're out of CPU. So there are some amazing things around really asking what's the vulnerability, then looking at this way of seeing security grow, and then what's the performance impact of that? That's been the biggest thing in the last five, six years for me: really seeing the overlaps.
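[Editor's note: the back-of-the-envelope arithmetic Mark describes can be sketched like this. The 50 ms, two-million-transactions-an-hour and core-count figures echo the numbers from the conversation, and the fraction of the added latency that is actually CPU work is purely an assumption for illustration.]

```python
def added_cpu_utilization(overhead_ms, tx_per_hour, cores, cpu_fraction):
    """Estimate extra CPU utilization from a synchronous security layer.

    overhead_ms:  added latency per transaction, in milliseconds
    tx_per_hour:  transaction throughput
    cores:        CPU cores available on the host
    cpu_fraction: assumed share of the added latency that is CPU work
                  (the rest is time waiting on the wire or the device)
    """
    tx_per_sec = tx_per_hour / 3600.0
    extra_cpu_sec_per_sec = tx_per_sec * (overhead_ms / 1000.0) * cpu_fraction
    return extra_cpu_sec_per_sec / cores

# 50 ms per transaction, 2M transactions/hour, a 64-core host,
# assuming a tenth of the added latency is CPU work:
print(f"{added_cpu_utilization(50, 2_000_000, 64, 0.1):.1%}")  # -> 4.3%
```

With these illustrative inputs the model lands right around the "another 5% CPU" Mark mentions, which is exactly the kind of headroom that quietly disappears when a security layer goes in synchronously.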
Starting point is 00:17:56 So I had a couple of different thoughts, actually, at the beginning of the episode on where I wanted to go with this discussion. But now, as you bring up these additional layers: do you see performance engineers asking for these additional security layers to be added to the performance environment as well? Or is it even possible for some organizations to add all these security layers in the testing environment too? Because otherwise, how would you ever know that under
Starting point is 00:18:18 a certain load, you have these 50 milliseconds of overhead per transaction? Yeah, yeah. The old days are gone where we tried to replicate production. I have a shirt that says: test in prod or live a lie. I love it. And so I think performance engineers, and I wouldn't say other performance engineers, but people I've talked to,
Starting point is 00:18:38 I have office hours and people can contact me and just talk about stuff, whatever they're dealing with in the discipline. And many, many more of them are getting access. They've been sort of in the testing world or now the development world. And I'm like, you really need to live almost purely in ops with all the access.
Starting point is 00:19:00 One of the nice things, we're a small company and I'm kind of the only performance guy. I have a lot of access to all of the dashboards, all of the monitoring and all of the data to pull stuff out. And another colleague of mine has joined the company recently and he's getting a PhD in infosec security with an emphasis in infrastructure. So machine learning based on infrastructure around vulnerabilities. And he's going to be a doctor, doctor of security, doctor security. It's going to be awesome. His name's Cliff. He's really awesome.
Starting point is 00:19:35 So there are some really interesting things about, if you can get your hands on the data, you can do a lot of amazing stuff. So I would encourage every performance engineer. A lot of the answers I'm giving them are: hey, you have to go look in production to see, you know, why is it another 150 milliseconds slower in production, and look at the number of hops. Start a dialogue with the security folks: hey, what's this other device in the path? In my lab or my test experiment environment I don't see these components. And it's tantamount to putting the blinders on if you just
Starting point is 00:20:11 do it the old way: I ran the query, and I kind of do what we used to, back when, I'd say, prod was deceptively simple and we could replicate it in a lab easily. You just can't do it now, especially with cloud services. My performance test environment is in Azure, and a lot of our physical infrastructure is sort of hybrid; it's half in the cloud and half hybrid. So I adjust. If I look and find a bug within database I/O, for instance, I know that my database I/O, even on the ultra SSDs in Azure, is still about 20 or 30
Starting point is 00:20:48 milliseconds per I/O request slower than production. And so if I find a bug in that area, like, hey, you know, we've got a lot of I/O overhead for this particular query, or we've got an incorrect mix of reads and writes, those kinds of things, you have to sort of adjust on the fly. But you kind of have to know as a performance engineer: what's my target environment? What are the differences? And I always have a cheat sheet of, you know, for each machine, here's the lab CPU,
Starting point is 00:21:19 and here's production CPU, disk, memory, network, configuration. And what's the difference? If anything's plus or minus more than 10% different, then I usually downgrade whatever bug I find. This looks like a bug, but what? And then I go back to that vulnerability language, which is like, okay, I found a bug, but it's a bug with CPU, and my CPU is like 50% different,
Starting point is 00:21:47 so maybe we don't hit this bug for a while in production, if ever. And so you learn to communicate differently based on the differences between your lab and production. You know, just briefly, I wanted to say that the CPU thing came up in your Perf Puzzlers we did the one time, and it's becoming sort of a lost art. If you're moving to the cloud, you can no longer really consider that stuff. But I also wonder, for people who are still running machines on-prem, how many are looking at hardware? And I just mention this because it was never a factor back when I was doing load testing. But anyway, I don't want to sidetrack with that, I know Andy had something. I just wanted to interject that because that hardware stuff always comes up in performance when I talk to you, but rarely any other time. So yeah, that's just because other
Starting point is 00:22:38 people don't know what they're doing, Brian. There's got to be a CPU somewhere, right? And you're paying for it even if you can't see it. And that's the real question: if I'm paying for something, I damn well better be able to see it. I need to be able to get the data, get my hands on the data, monitor the data, track the data, trace that data. And that's one of the things, you guys have talked a bit about serverless performance and serverless instrumentation. How do you monitor, hey, serverless, what's the cost? I'll just take the bill. And it's like, great, we ran serverless. Two of the folks recently that I've talked to have got, you know, serverless; they're doing like Lambda or something in the cloud. And I forget the name of the on-prem one. You can have your own sort of serverless server on-prem. And they took their
Starting point is 00:23:31 functions that they were paying for, you know, let's say roundabout, I think it was about 2,500 bucks a month, the bill they were getting from Lambda to run serverless, a lot of transactions. And they just switched it to a two-node cluster of, like, the little pizza boxes, you know, with state-of-the-art processors, and away we go. That investment cost them probably 10 grand for those two little boxes. And of course, the performance blew the doors off of Lambda in the cloud, and it cost them 10 grand. Figure those pizza boxes are going to last them three to four years. Then they have plenty of overhead.
Starting point is 00:24:09 So there are still people that fall back out of the cloud, Brian. And they're like, you know what, am I really getting what I'm paying for here? Why can't you let me see these things? So I'm kind of a cloud skeptic. I work at a startup right now that has stuff in the cloud, stuff on premise; I can see the argument and benefit for both, for sure. Definitely security-wise, in terms of if you really are going to be held liable for stuff, some of those PCI audits go much easier if you're like: we own everything in the cage, it's
Starting point is 00:24:42 done right here in that building over there. Yeah. One quick question. Because you bring up the term vulnerability in the context of performance, which I like: for me, it feels a little bit like what we do with looking at different performance metrics and then grading them and coming up with a score. What we do with Keptn, and I know it's the first time I've said it today, but here we go. Um, and I wonder if we should, I mean,
Starting point is 00:25:16 like the term vulnerability. We currently talk about it as an SLO score; maybe it makes more sense that we say, hey, we come up with a score, but basically in the end it is a measure of high risk, medium risk or low risk. And then if you combine that with your vulnerability scans, maybe the ones that come out of your security tools,
Starting point is 00:25:38 then you have a nice matrix where you can say, I don't know... performance. Now, people cannot see my hands right now, but you guys can. So you have, like... He's got one hand up. Yeah, one hand up.
Starting point is 00:25:48 It's performance vulnerability. And then the second one? Second hand. Second hand. Next to it. Security vulnerability. And then the third one, you could even do like your functional vulnerabilities
Starting point is 00:25:57 because essentially it's functional, performance, security. And then you hopefully come up with three green lights, but if one of them is yellow or orange or red, then you have a challenge. So you're right, Andy. If we think back a good chunk of time, there was an initiative and a bunch of writing around risk-based testing. It was an attempt to attach the testing disciplines, and a lot of the actual idea of testing, to business risk, and/or risk to an end user, risk to security, risk to health, the end user's experience. There were all kinds of extensions where we did that. But to me, it wasn't until the whole InfoSec security testing world came along that it cemented this word vulnerability. And it's even in technical jargon, vulns, right?
Starting point is 00:26:52 V-U-L-N-S. And there's all of the zero-day initiative stuff happened. And these are, you know, do you have this ailment? Do you have this vulnerability? Yes, my system does. It has not been patched or healed, or we haven't done anything with the patching. And what's interesting for performance is you can think of it as a performance patch.
Starting point is 00:27:12 In our DevOps, in our pipeline, I'll actually make recommendations from a code review or something. I'll go try something out, and I'll pull whatever change I make on a branch into the performance lab for some special rounds of testing. Or I work with a developer that's like, hey, here's a bunch of stuff we did around async task-based programming, so let's take a branch and go do a special initiative of testing around a particular thing. And then if it goes well, we go back and say: you don't have to put this fix in, you don't have to
Starting point is 00:27:47 push this patch or update, but it's been tested to be beneficial when the right time comes. And it could be business-risk-wise: hey, we made our numbers, we could take some downtime, X, Y and Z. Who knows when the business sort of makes that decision, but it's been sort of vetted and hardened, if you will, through a performance test: hey, here's a list of performance updates, when you'd like to merge them back in, go ahead and do that. And then you'll see the pull request come through and it'll go. Almost all of our security testing is in development or production; I don't have a lot of security testing performance-wise, unless my security guys call
Starting point is 00:28:30 me and say, hey, we're changing this device, can you go review it? So I'll just do a paper, you know, online review: how many bytes per second, what are the routes we're going to run through it, what do the paths look like, what kind of packet inspection, what options are we going to enable, what's the overhead? A lot of the networking devices, even some in the cloud, their specification is very good around performance in terms of saying, hey, if you enable this type of packet inspection or this type of intrusion detection or intrusion prevention, there's CPU overhead and latency when you enable this configuration. The device itself, if you turn all the features off, is just a network switch. So, you know, boom, it can do a bunch of work and go really fast with throughput, at least the non-blocking
Starting point is 00:29:17 throughput for the backend of the switching on the device. And in the cloud, that's all happening in the backend. But again, they have feature switches. So if you turn it on and things get slow, usually you can read a lot of that proactively and say: we need to buy two of these and run them in parallel to match our future growth projection. And that means, if you do that architecturally, now I have two separate stacks versus one stack. All of a sudden the cost-benefit analysis gets kind of wonky. It's really weird, because, oh, we weren't planning on making those kinds of changes, we weren't planning on having two devices for the throughput overhead. And they go back and rethink this beautiful idea of: shouldn't we do this for security's sake? Yes.
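[Editor's note: that proactive read of the vendor spec sheet boils down to a simple sizing calculation. A minimal sketch, where the rated throughput, growth projection and headroom figures are made-up placeholders, not numbers from any particular vendor:]

```python
import math

def devices_needed(projected_tps, rated_tps_with_features, headroom=0.3):
    """How many inline security devices to run in parallel.

    projected_tps:           future peak transactions per second
    rated_tps_with_features: throughput rating WITH packet inspection /
        intrusion detection enabled (usually far below the plain
        non-blocking switching rate on the spec sheet)
    headroom: fraction of capacity kept free for growth and bursts
    """
    usable_tps = rated_tps_with_features * (1 - headroom)
    return math.ceil(projected_tps / usable_tps)

# Device rated at 10,000 tps with inspection on, projecting 18,000 tps:
print(devices_needed(18_000, 10_000))  # -> 3
```

The point of running it before the purchase order, as Mark describes, is that the answer "three devices, not one" changes the architecture (and the cost-benefit math) while it's still cheap to change.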
Starting point is 00:30:04 Zscaler is a good example. The Zscaler client goes everywhere on a bunch of people's desktops. Well, they pushed the Zscaler client onto my load generators, and I'm like, um, I'm not going to have you sniffing my packets, because I'm an employee generating this load, and it's set up to only go to a certain lab. So security people usually get the green light, because what they'll say is: well, we're more secure. Approved, done, do it. And they'll come back around later like: did you really think about the overhead you're introducing? It's a huge problem, as far as I experience it, for sure. But to, let's say, dumb it down for me: in the end, what you're telling me is
Starting point is 00:30:50 it's one or two additional components or layers that you just have to factor in, and as a performance engineer you need to do your analysis on how much overhead they produce, what's the impact on your resource consumption, on your, let's say, response time or throughput or whatever you're optimizing for. And then just do your capacity planning again and figure out how you can achieve your numbers with those additional components.
Starting point is 00:31:20 Or maybe there are some tweaks that you can do on those components to make them better, because I assume with these security layers, in the end there's some software running somewhere, and you can probably turn knobs and switches to optimize it for your use cases too. So, speaking of not having these security components in the lab, some of the initiatives that I have done use our good friend the Squid proxy, and a couple of others. Toxiproxy is good at doing this; I've used Mountebank to do it. Or you can try some of the commercial products. And that's just because, you know, in the lab I don't really care about the security in and of itself. I don't need to actually do encryption, decryption, et cetera. I just need to put the representation of the latency in the link to do it.
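[Editor's note: before configuring a proxy at all, the what-if can be modeled on paper with Little's law. A minimal sketch, assuming a closed load model with no think time; the baseline and injected latencies below are illustrative, in the spirit of the numbers discussed here, not measurements:]

```python
def throughput_tps(vusers, baseline_ms, injected_ms=0.0):
    """Little's law for a closed system: X = N / R.

    vusers:      concurrent virtual users on the load generator
    baseline_ms: measured response time without the extra layer
    injected_ms: latency a proxy adds to stand in for a security
                 device that isn't present in the lab
    """
    return vusers * 1000.0 / (baseline_ms + injected_ms)

# 100 virtual users against a 40 ms baseline:
print(throughput_tps(100, 40))             # -> 2500.0 tps
# What-if: suppose the docs say inspection adds ~13 ms per transaction:
print(round(throughput_tps(100, 40, 13)))  # -> 1887 tps
```

That is the same "low latency vs. high latency" scenario Mark runs through the proxy: the injected milliseconds stand in for the missing security layer, and the baseline with everything turned off tells you the actual impact.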
Starting point is 00:32:09 So in between my load generator and the web service or the web server or the front end of the application, I'll just put a proxy in there and maybe add that two to three milliseconds of latency. And then you can do some what if scenarios. What if we enabled configuration for packet inspection and blocking of X, Y, and Z inspection? And the docs say that could add anywhere between eight and 10 milliseconds to every transaction and handshake. Okay, so I put that into my proxy and I can do sort of a low, high latency um it's not really wan emulation but you're sort of using that slowness
Starting point is 00:32:47 to represent other layers of security um and then of course take a baseline with everything turned off so you'd be like what's what's the actual impact to doing that and oddly you'll you'll find that you have more capacity potentially on the back end because you have something slowing it down in the front end. But you know, that's, that's, that's the thing is like, we've spent all this money on the back end to make things really, really fast, even if they're running in the cloud, but we introduce these layers in front of our systems. And now that's the bottleneck. You know, this reminds me of, it reminds me of steve stouter's 80 20 rule where he said why are you optimizing anything in the back end if the biggest problem is your large image
Starting point is 00:33:30 that you're downloading and or like the javascript in your browser or yeah i mean that's yeah well it's only a 250k icon map you know that can't be a big deal how many employee how many employees are at your company 150 000 employees and this is And this is the HR website? Yes. And you're putting this link in an all points, all reply, all storm. Oh, man. That's another weird one is a reply all with a rich content on an email. And it's a reply all storm it's a human behavior right but every time that email get oh i know every time the email got opened it's loading all of that content because there was no browser cache wouldn't that be a great feature request now for microsoft saying your outlook has to confirm have has to have you confirm the the reply all is really meant. Are you sure you want to annoy the hell out of everybody? Are you sure?
Starting point is 00:34:31 Because Outlook could do the math, right, for you. It could say, hey, you're replying all to 50,000 people, and that means I don't know how many gigabytes of data that is now being transferred. Are you really sure? I think that would be an interesting... Or it could ask you something like, does everybody on this thread need to see your reply saying congratulations to Joe's promotion? Or does just Joe need this? And do you want to just...
Starting point is 00:34:54 No, we get those all the time. Sorry, that's like a pet peeve of mine. And then you do the reply all saying, please don't reply all. That's always the best one too. That's right. So that's what we've come down to here on the 150th episode is that the best thing you can do to optimize performance is to not reply all but thanks everyone thank you you know andy i wanted to ask you andy you you know you you work a lot more closely with the security aspect of you know of our products now but one thing i had never considered which mark is bringing to the topic here is performance and security you know yes we have security in our tool but it was more about
Starting point is 00:35:31 security and and knowing if things are going on and i think a lot of people think of security either in two ways is is our code secure did we leave any vulnerabilities in it or is someone doing a ddos attack on us and are we but the idea of the performance impact of all these security layers that are being added in between the front end and in every device and everything in between which just like if you think about apm tools people always ask well what's our overhead and it's well yeah you can measure it and you do the trade-off of what you're getting um same thing with security and but it's like and this is what blows my mind about you, Mark, which is why I look up to you as a performacologist,
Starting point is 00:36:11 as you called yourself at one point, is that so many people get stuck in the world of performance of, I run my tests, I look at my response times, and just earlier we were talking about looking at the hardware. Now considering the security features that are being turned on to devices and all these other kind of bits and pieces it's it's that pandora's box that performance is that is just never ending and having the exposure to even know to think about these things is just phenomenal yeah one on the agent thing when you think about sort of bytecode instrumentation or on the fly profiling, when you enable more profiling within an agent, you get more CPU overhead potentially. So you want to be very one. We've done a lot to speed up processors in our industry and bus speeds are really great second we've optimized the agents right the profiling
Starting point is 00:37:05 agents are way more efficient and use a lot of asynchronous programs so that you're not blocking anything but the third thing i'll say is there are security products a couple of them in the commercial space and there's some open source ones too that do like for the what were the um out of heart heartbleed and then there was another one that was like the CPU or processor metadata virus infection, something that's very, very low level in like even the processor itself. And there are products that like you need to put this on every machine. And it made me think of like 20 years ago when we, as an industry, came out with bytecode instrumentation and put our agents and profilers out there. It was like, yeah, you should profile
Starting point is 00:37:51 all these machines. And suddenly everything's slow. And you're like, oh, maybe we need to re-architect our agent. I've actually done some load tests with some of those agents. And I'm like, what did they tell you this agent would do for overhead? Well, they said it might affect it a little bit. Yeah, the salespeople always tell you. But, you know, a real engineer is going to be like, all right, let's see worst case scenario. Turn all the options on and then, you know, turn them off one by one. And I found like one particular option that was absolutely the worst
Starting point is 00:38:23 and like added 40% CPU on everything for a machine that was normally like five, 10% CPU, uh, and you know, had lots of headroom for other spikes. Um, but boom, all of a sudden it's running at like 50% CPU. I'm like, what is going on with this? Of course we call the vendor and tell them, Hey, uh, we're not going to move forward with your product because we like this feature, this security option. We want it. But it's way too slow. And sure enough, they're like, well, we went and talked to the developer and apparently we found some bugs in there that we're kind of blocking on.
Starting point is 00:38:57 Oh, yeah. And I'm like, oh, man, we're right back to performance profiling for performance reasons or profiling and agents for performance reasons and the security world is still in a way they're still catching up yeah you're like oh oh yeah i guess security is not the only thing we should think about we should also think about performance of our agent because they'll be out of business nobody will install them because they're like uh don't go with that product. But that's a great opportunity now, isn't it? A great opportunity for performance enthusiasts like us and like all the
Starting point is 00:39:33 other listeners to really reach out to the security experts in your organization and say, Hey, it's great what you're doing, but be reminded of the performance, potential performance impact and let me tell you some stories on my experience. If you are rolling something like this. Or start with a more open-ended question. Have you guys ever had to roll back something, a security agent or patch or something because it screwed something up? And of course, they're also very, I would say secretive for good reasons, right? They're very discerning about talking about the security patches and everything at the
Starting point is 00:40:09 company just because it's security. They're like, they don't make it really company-wide knowledge. It's kind of a secret little group. So you want to build a relationship with that team in a way that says, hey, if you guys ever want to try these things out before you push them, you know, we have a lab we could set up and I got some, you know, we got some ways we could figure out what the overhead is. And, you know, can you share with me the documentation? We can review what the specs are hardware wise, or even software wise, a lot, even Amazon, some of the Amazon stuff or the Azure stuff,
Starting point is 00:40:39 they're very specific about, you know, Hey, using this in your architecture will have certain limitations and knowing what transactions would be limited by those options. Just kind of help, just say, you guys think about all the security hacker-ness. That's nice. That's great. White hack people. And, uh, and I'll, I'll just do some study of latency and study of overhead, especially Brian. If it's, if you're in a physical world, you do have very limited machinery or resources. So that that's much more urgent. If you're in the cloud, you maybe don't have unlimited funds, but you can be more elastic, which is very, very true. Yeah. Hey, one more question.
Starting point is 00:41:20 Do you think from a performance engineering perspective, should you also think of actually running load? Like load against the security appliances to really figure out how do they behave in case, let's say, really a denial of service attack comes in, or something like a high influx in requests come in that are hard to analyze but then will be blocked or maybe not blocked.
Starting point is 00:41:45 But is it also part, like do you see this as part of the future performance engineer that is secure aware that they also need to generate loads that brings exactly these security devices to let's say the edge? Or is this something that the security team, the heck, whatever they're called are doing or should do yeah yeah um i think the vendors should be doing that and publishing their
Starting point is 00:42:14 specs um they know what kind of traffic and the specifics of what their filters and their scans or their packet inspection does feature-wise. And so they should have very, every appliance manager, if you're in the physical world, should be doing that. And if you're rolling out a software-based feature or a hardware-based feature in the cloud, and you're going to wrap something in front of it to make it accessible in the cloud as a provider, yeah, it should be on the producer of that service or the producer of that product.
Starting point is 00:42:48 I have only rarely done a appliance only or security solution only kind of load test. And I did it with very level, low level stuff. Really, again, really old companies that did low-level packet generation. It wasn't even really application level. So you're much layer in the OSI stack to be able to do that. And again, the company escapes my name. I'm never going to, yeah. I try to remember all these names.
Starting point is 00:43:18 They don't exist anymore, right, Brian? So I don't even worry about the name. So I think for the most part, I mean, you're still just having generating load through an application stack that's using HTTPS and set up with the routes or whatever protocols you're using. Application level traffic through the stack is usually enough to trip over anything egregious. The other thing I'll say is with deep packet inspection, if they're doing filtering on that, your payloads from an application standpoint might become more specific. So you might not just do a recording
Starting point is 00:43:56 and replay it and woohoo, everything looks good. That's a false negative, meaning I didn't have payload in there that would get picked up by the inspection. And then the inspection does something additional to scan it, log it, do something different. So I think there is the limitation that the payloads of your load generation might not actually send in the right patterns or the right actual data for packet inspection. And that's a big difference. But I haven't only done it once or twice, and it was at Microsoft.
Starting point is 00:44:33 Just testing ISA appliance. I think we did some stuff with partnership with F5 back in the day, F5 Networks, some of their security things things mostly around load balancing and actually you know network routing and things like that um but yeah not not i i don't think that's in most way be able to have that conversation with a security person and before they implement it try to catch them before they push something blindly to production um because you know next thing you know you're going to delete all your routes on the internet and disappear from a dns that never happened did it never
Starting point is 00:45:11 happened never happened at the time it happened in october of 2021 to facebook yeah exactly all right hey um mark it's always a pleasure having you on the show. There's always so much we can learn from you and it's, I know. Now we're really happy. We're not just saying this to make you feel good. We say this because we mean it. We pity you. So we try to have you on to make you feel better about your life. I live a small, I live a small life. Well, you're in Philly, right?
Starting point is 00:45:54 We also hope that not in the too distant future or in the past, whenever this airs, we will be able to see each other face to face. Hopefully, we'll see each other perform in Vegas, right? Because that's happening physically. And it seems the Europeans are now also allowed to travel to the States. That's nice. So it's going to be good. That'll be good. Yeah. So just a quick thing what I've learned.
Starting point is 00:46:09 The best thing we can all do to stop bad performance problems is to avoid the reply all and advocate for the reply all. To not be misused. Number one takeaway. Exactly. No, that's really something that I've put in the abstract of the podcast episode. But I think the other thing for me, and this was the revelation, is really, as a performance engineer in the really include, I don't know, scans and whatever security tools spit out. But I think this is definitely something for the security team who we should collaborate with very closely.
Starting point is 00:46:56 But really performance engineers really need to understand the impact of these additional layers on the resiliency, on the performance of the app that they're testing, right? I mean, that's, and then making the right recommendations on how to get to the ultimate goal, which is high performing, high available and secure applications. Yeah. I'd say, Andy, we talked a lot about sort of performance in the infrastructure and ops space. There's a lot of performance anti-patterns in code as well, where you're seeing some
Starting point is 00:47:23 kind of do some backflips, jump through some flaming hoops in code logic well where you're seeing some kind of do some backflips jump through some flaming hoops in in code logic to try to be more secure i've seen some really weird stuff that introduces sort of a mutex or semaphores for thread blocking i've seen limitations where every every request has to check this and check that before it proceeds. And, you know, there's some really sort of harebrained ideas, harebrained H-A-R-E, which is like offensive to rabbits. Damn you, Mark. I know. They're going to, they'll come at me on Twitter, the hare population.
Starting point is 00:47:59 But yeah, so that maybe that's another episode to talk about sort of code anti-patterns. For security, they're great patterns. But for performance, they're not so great. SecPerf. There you go. I was thinking PerfSec sounds better, but security performance makes more sense than performance security. Exactly. Cool.
Starting point is 00:48:19 So, yeah, to your point, Andy, I mean, that's my big thing, too,, is, you know, just, and what, just for, for any, for any performer performance person there. Yes. Mark's pointing as watch for any performance person out there. Uh, one more thing to consider security added to the list of the rabbit hole that you must understand more hair references there. Um, so thank you again, Mark,
Starting point is 00:48:39 for being on our virtual 150th episode. And yes, congratulations on 150 episodes. May you have 150 more. Yes. It's going to be a long time. Thank you everyone for listening and being part of our show for 150 episodes.
Starting point is 00:48:54 And you know what? If anyone's been with us since episode one, tweet us. Mark will send you something. I will. No, you won't. But let us know at pure underscore DT on Twitter. And we thank you all for listening.
Starting point is 00:49:10 Thank you, Mark. Thank you, Andy. Thank you guys. Thank you. Until next time. Bye.
Starting point is 00:49:14 Bye. Ciao. Ciao.
