PurePerformance - Security for Performance Engineers with Mark Tomlinson
Episode Date: October 25, 2021
If there is one thing you take away from this episode, it should be the answer to "Why should we refrain from Reply All on company-wide emails?" Jokes aside, as security and performance are not always... funny! In this special anniversary episode we have Mark Tomlinson, System Performance Specialist, talking about the considerations and trade-offs between performance and security. We learn about performance vulnerabilities and why it is important to factor in the additional overhead each layer of security adds to your application stack. It's always a pleasure having Mark on the show, whether it was in the past, present or will be in the future. If you want to learn more from Mark on the topic of performance, make sure to check out PerfBytes, the show that inspired us to launch PurePerformance.
Show Links:
Mark Tomlinson on LinkedIn: https://www.linkedin.com/in/mtomlins/
PerfBytes: https://www.perfbytes.com/
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance.
My name is Brian Wilson and as always I have with me my fashionable colleague Andy Grabner.
Andy Grabner, how are you doing today?
I'm pretty good. And I'm very glad to hear that on air you don't call me what you call me when we're not on air, which was a different word you used.
You mean fantastic and loving and caring and kind and stupendous?
Yeah, not annoying Austrian, I know.
No, no, not at all.
Not at all.
So this is a weird show today because we're celebrating our 150th episode, but this isn't our 150th episode.
It almost feels like it.
It's our virtual.
We're virtualizing our 150th episode
just because of timing and schedules and all that.
So let's all pretend today
that this is our 150th episode,
which is coming up very soon, dear listener.
But it's not quite.
Should we consider this an experiment?
Like what happens if this will be the 150th episode?
Well, we're traveling back in time to celebrate it before the 150th.
Okay.
Yeah, it's however you might want to consider it. Or maybe it was a glitch in the matrix, if you will. Although that gets me to too many red-pill dudes, who are really annoying. But anyhow, we're going way off topic, and our guest is itching to get on. Before we introduce him, let's do a Dora the Explorer style moment: do you know who our guest is today? And then we look at the camera and wait.
That's right, our guest is Mr. Mark Tomlinson.
Yes! Hello, friends. I have come from the future. I'm here to share with you what you will have talked about already in the past, in the future, on your 150th episode.
So, friend of the show, inspiration for the show.
Muse of the show.
Can I be a muse?
Is that okay?
Yes, you are our muse.
You are our Bugnish.
And if anybody listening gets that.
Hopefully I'm attractive enough to be a muse.
And Mark, you can confirm that the stuff
that you're drinking out of your cup is in fact tea or coffee and not an adult beverage.
It's not.
Actually, Andy, behind you, you and I have the same espresso maker.
Yeah.
Right?
It's the Dedica, I think it is.
Delonghi.
Yeah, yeah.
And I got the hopper grinder with it.
I'm digging it.
So I'm having like anywhere from four to six shots
of pure espresso every day.
Oh, that's fantastic.
Wow. It's going to be one of those shows, everybody.
Thank you.
Much more fun. And Mark, before we dive into it, you know, what's going on with PerfBytes? It's funny, because the first time I tried to introduce our show, I almost called it PerfBytes. But what's going on in the PerfBytes world of podcasting? I see Leandro's taking off and doing a bunch of stuff.
Well, you know, we're kind of on hiatus, but not to, you know, sort of compete with you guys. But all told, we hit about 240 episodes of different things, some of them a lot, Brian. You know, the ones we did at Perform, they're simulcast or coinciding episodes. There's a great number of them in the back catalog.
So we kind of took a pause to support two different initiatives at first.
One was helping to support PerfBytes Press.
So getting into the written world,
James Pulley, our colleague and friend,
wanted to put together a publishing imprint. So Leandro has a new book out, which is like the Hitchhiking Guide to Load Testing Projects.
James has one coming out that's about sort of hiring
and career professionalism in the performance arts,
the dark arts.
I have one that I'm cooking on right now. I'm not going to say anything about it. It's a kind of fun book. At the same time, we supported Leandro much more, to kind of launch where he is headed, and we added our also-mutual friend Henrik Rexed, who came from the Neotys Tricentis world and is now working with Dynatrace.
So Henrik is working on a bunch of new things.
And you'll see a lot more YouTubing coming out of us.
So you'll have books and videos.
And Henrik and Leandro have been doing some retakes
or sort of re-updates.
What do you call that?
Sort of refresh of some of our old topics.
Reloaded.
Reloaded, yeah, yeah, exactly. What's the new Matrix movie? It's like Revisited... Re... Retired? Matrix Retired?
Isn't it Replenished? Isn't it... I don't remember the name.
You know, yeah, as a fan of the first movie, I can hope the new one will be... yeah. Yes. Hopefully it's not Refundable.
That would be bad.
Or non-fungible.
Non-fundable.
Yeah, refunded.
Yeah, yeah.
Non-fungible.
So PerfBytes, we'll see some new stuff happening, probably in the new year. If you don't see the picture behind me, it's all, uh, remodeling my current studio, changing studios so we can set up for more YouTubing and more video stuff, which would be fun. And that's, uh, some short-form tutorials, from very intro-level performance, the good old performance stuff, load testing, performance monitoring, performance management, all the way up to some of the continuous stuff that Andy is still carrying boldly into the future, around continuous performance and continuous everything. So PerfBytes is still happening. And we miss you, Brian. But yeah, we'll see what we do with the Ask PerfBytes.
But I was a quitter.
No, no, no.
You did fantastic work there.
Some of those episodes were marvelous,
especially the ones that I was on.
Of course, of course.
And Leandro.
Hey, Mark, for the few listeners,
maybe it's the first time that they hear your voice and your name.
Really?
And I know I think it's still possible,
even though it feels like impossible.
It's like people who haven't heard of Alfred E. Neuman.
Like, how many of those exist?
Exactly.
No, no.
It's, you know, when you hit like 30 years in your career,
you have to stop assuming that the young kids even know who you are.
So, yeah, I'm Mark Tomlinson.
Andy, you and I have known each other for a better part of the last 20 years.
And Brian as well, both of you guys.
And we kind of met through my former employer, Microsoft, who's not been in the news recently.
Of course, this airs in the future, so maybe they are in the news.
But they're always in the news.
And I've been a load tester, a performance engineer, a performance optimizer,
working on Windows kernel, working on a bunch of different products. For a while, I was the
load runner product manager with HP. Now, what, Micro Focus? All these companies I worked for
are going out of business. But I'm not. I'm still having a good time doing performance optimization. I worked at PayPal a while ago and kind of got hooked on financial processing and the risks associated with being a payment gateway, a payment processor, at least not necessarily a backend processor. But yeah, it's been really fun to overlap security in the financial space, PCI compliance, with all the performance automation that I've come to do. So yeah, I'm still working for a startup now in Philly, and still doing some virtual online training. I did one in-person training this summer. It was really fun. That was in Denver, the Mile High City. So I escaped in the little gap in the weather, went and did a two-day training, and then came back and put my mask on.
So speaking of Philly, I just want to make a plug for this because I enjoy it, and Mark, I'll send you a link to it as well. There's a guy in Philly, a musician in Philly, who does a show. There's a YouTube show called "What Makes This Song Great?" It goes into all music theory and junk. So this guy decided to make one: "What Makes This Song Suck?" And he's just relentless. And it just goes down into all sorts of nastiness. It's pretty good. But anyhow, that's completely just on the Philly tip there. So, yeah.
So another lesson, right? Andy, we should just make sure we never hire Mark, because of his track record of sinking companies, right?
That's true.
That's a good... thank you.
No, you know, I left, and then like three, four years later, either people go to jail or they get bought out. I mean, getting bought out is a good thing for some companies. Like, you can become huge. Look at Dynatrace. Like, they took over Compuware.
Either they fell apart without you, or you made them so successful that they went and got bought out.
So, got acquired, yes. That's a good spin. Except for Microsoft, because, you know, then they just made more money being a giant awesome tech company.
Yeah. Hey, Mark, you already kind of a little bit segued into the topic that we want to talk about today: the kind of intersection between performance and security, and a little bit of dev and a little bit of ops, because these are the hot topics these days.
Yeah.
In your current world where you are working in the financial industry, what is more important, security or performance?
You can be killed by either failing.
If you fail security, you die quicker.
If you fail performance, you die,
I want to say kind of maybe slower.
But yeah, in both of my last two roles over the last eight years,
both have been aligned with ops or the DevOps team or
the infrastructure team. And I'm currently on an infrastructure team and I have a peer who works,
is the head of security. We have obviously compliance officers who are, I don't want to
say they're not technical people, but they are more about the letter of the rules and the law
in some cases, or the policies and governance that we have around compliance that are directly
influencing. They're kind of like the PMs for infrastructure security, infrastructure
configuration. And then I'm kind of the, I've been the lone wolf performance person working on the team.
And I work day in and day out.
I mean, not a day goes by that I'm not tuned into the channel with all the security stuff that's happening at the company.
And I would say security in this space, based on the risk, is financially much higher in the security space.
So financially, that risk is much higher for security because all it takes is one breach to hit the big number.
So it's kind of like, you know, the old-timey fairs where you had to hit the pad with the mallet and ring the bell to win a prize of some kind. Security, it doesn't take much energy to hit that little pad and, bing, you know, you're done. And then the CSO gets fired, and everyone gets fired, and the GRC gets restructured and outsourced to some consulting firm with less experience, and blah, blah, blah. But performance is more like a bigger number, because if you have a performance problem that's unabated over many, many months or even years, you will sort of quietly justify spending a lot of your cash, a lot of your money, on growth. And every company is focused on growth, and they sort of lose sight of, you know, should we have tuned our code?
Should we have tuned the database query?
Should we have tuned the infrastructure?
Should we have distributed our system?
Should we have gone to more of a scale out architecture?
Slowly over time, should we have made those sort of complex decisions from an architecture
standpoint? And, or can we just throw hardware at the problem? And can we throw hardware in the
cloud? Well, it's just so easy. It's almost painless. So just, well, there's a few more
containers, few more instances running. Okay, fine. Well, let's just upgrade a little bit here.
Even physical infrastructures now are very easy to hot swap stuff and grow what you do.
And I look at my current employer and I look at the business charts in terms of throughput. I can look at transactional throughput for a specific type. We're like a payment gateway. So I can look
at transactions for a particular web service and I know whether business is good or business is bad.
And that curve in the three years,
three plus years that I've been there
has been like on a nonstop rocket ride, increasing growth.
There's a pandemic in the middle.
Sure, we took a little bit of a dip in the pandemic,
but when business started to come back
or people figured out how to do business
in a pandemic or after a pandemic,
boom, that number goes right back up. So I think
performance can be more costly over the long term, because it just sort of becomes this benign thing: you know, well, we'll just keep growing, we'll throw some more hardware, we'll throw some more licenses. And then you get to a point where, oh my gosh, we hit some major bottleneck we can't do anything about. And there is a very typical Mark Tomlinson very long answer to your question, Andy.
So performance is more like termites in your house, where security is a bundle of dynamite.
Yes.
I don't want to go into, like, illness, because it's a real illness kind of thing, a slow death, you know, sort of like... yeah.
But I think it's more about cost, Brian, to be honest with you. A termite infestation means you kind of have to rip the house down to its bones and rebuild the bones, because that's where those are. You know, oh, you know what, I should optimize this wall over here, but it takes so much work to tear it down and rebuild it. A lot of people think about performance and architecture that way. And security-wise, when I'm working closely with security folks, the one thing, and I think we even talked about it back on the 100th episode, was the nomenclature of a vulnerability, which I love in the infosec world.
And business people, when we come from the testing world, it's pass or fail.
It's a bug or it's not a bug. And things get wonky with those
conversations when people are like, should we fix this bug or should we not fix this bug?
What's the priority? What's the nuance? What's going on here? X, Y, and Z. If you present things
in the infosec world, it's like everyone's already in the infrastructure team ready to say, oh yeah,
vulnerability. That makes sense to me. So we have risk. What are the chances of it being exploited in reality or not?
Do we have any secondary or tertiary mechanisms to prevent the exploitation?
And that's one of the key things that I've changed in the last 50 episodes of Pure Performance: really adopting this nomenclature of vulnerability and percentage risk. What are the odds? And trying to estimate: well, it's only going to affect 10% of the traffic, and that 10% of the traffic was only, you know... So you learn how to kind of narrow the scope, or understand the context, for what a performance vulnerability is. Which is to say, things in a database like an optimizer: what are the chances that this particular plan is going to end up in production? Well, I don't know. Let's take a look at how the optimizer works.
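To put toy numbers on that percentage-risk framing, here is a minimal sketch; the probability, traffic fraction, and latency penalty are all invented for illustration:

```python
# Hypothetical expected-impact estimate for a performance vulnerability,
# in the spirit of Mark's "percentage risk" framing. All numbers are invented.

def expected_impact_ms(p_triggered: float, traffic_fraction: float, penalty_ms: float) -> float:
    """Expected added latency averaged across all traffic, in milliseconds."""
    return p_triggered * traffic_fraction * penalty_ms

# e.g. a bad query plan: 30% chance the optimizer picks it in production,
# it affects 10% of traffic, and costs 400 ms per affected request.
impact = expected_impact_ms(p_triggered=0.3, traffic_fraction=0.10, penalty_ms=400)
print(f"expected impact: {impact:.1f} ms per request on average")  # 12.0 ms
```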
The other interesting thing from security, using that vulnerability language, is we start adding more ciphers or more encryption or more layers in the architecture. Oh, we're going to put another firewall in, or we're going to jump through another device that's going to scan for this and scan for that, and packet inspection and stuff. How much of that is appropriately coded or built asynchronously? Does it interrupt the transaction flow? So as people are getting more and more paranoid, they throw more and more security mechanisms or features into the architecture. If they end up being synchronous, you're going to spend a few milliseconds. Maybe you're going to spend, let's say, 50 milliseconds for every transaction, times a couple million transactions an hour. That adds up. You know, maybe there's another 5% CPU, and boom, now you're out of CPU. So there's some amazing things around really understanding what the vulnerability is, then watching security grow, and then asking what the performance impact of that is.
So that's been the biggest thing
in the last five, six years for me
is really seeing the overlaps.
So I had a couple of different thoughts, actually,
in the beginning of the episode
on where I want to go with this discussion.
But now as you bring it up,
these additional layers,
do you see performance
engineers asking for these additional security layers to be added to the performance environment
as well? Or is it even possible for some organizations to add all these security layers also in the testing environment? Because otherwise, how would you ever know that under a certain load, you have these 50 milliseconds of overhead per transaction?
Yeah, yeah. The old days are gone where we tried to replicate production.
I have a shirt that says,
test in prod or live a lie.
I love it.
And so I think performance engineers,
and I wouldn't say other performance engineers,
but people I've talked to,
I have office hours
and people can contact me
and just talk about stuff,
whatever they're dealing with
in the discipline.
And many, many more of them are getting access.
They've been sort of in the testing world or now the development world.
And I'm like, you really need to live almost purely in ops with all the access.
One of the nice things, we're a small company and I'm kind of the only performance guy. I have a lot of access to all of the dashboards, all of the monitoring and all of the
data to pull stuff out. And another colleague of mine has joined the company recently and he's
getting a PhD in infosec security with an emphasis in infrastructure. So machine learning based on
infrastructure around vulnerabilities.
And he's going to be a doctor, doctor of security, doctor security.
It's going to be awesome.
His name's Cliff.
He's really awesome.
So there's some really interesting things about if you can get your hands on the data,
you can do a lot of amazing stuff.
So I would encourage every performance engineer. A lot of the answers
I'm giving them are, Hey, you have to go look in production to see what, you know, why is it
another 150 milliseconds slower in production and look at the number of hops, look, you know,
start a dialogue with the security folks. Hey, what's this other device in the path? What's,
you know, in my, in my lab or in my test experiment environment i don't see
these components and that's tantamount like if you're kind of put the blinders on and you just
do it the old i ran the query and i i kind of do it what used to be when when i'd say prod was
deceptively simple we could replicate that in a in a lab easily. You just can't do it now, especially with cloud services.
My performance test environment is in Azure,
and a lot of our physical infrastructure is sort of hybrid.
It's half in the cloud and half hybrid.
So I adjust.
If I look and I find a bug within database I/O, for instance, I know that my database I/O, even on the ultra SSDs in Azure, is still about 20 or 30 milliseconds per I/O request slower than production. So if I find a bug in that area, like, hey, we've got a lot of I/O overhead for this particular query, or we've got an incorrect mix of reads and writes, those kinds of things, you have to sort of adjust on the fly. But you kind of have to know as a performance engineer,
what's my target environment?
What are the differences?
And I always have a cheat sheet of, you know,
for each machine, here's the lab CPU,
and here's production CPU, disk, memory, network, configuration.
And what's the difference?
And if anything's plus or minus more than 10% difference,
then I usually downgrade whatever bug I find.
This looks like a bug, but what? And then I go back to that vulnerability language, which is like, okay, I found a bug, but it's a bug with CPU, and my CPU is like 50% different. So maybe we don't hit this bug for a while in production, or ever. And so you learn to communicate differently based on the differences between your lab and production.
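That cheat sheet and the plus-or-minus-10% rule are easy to mechanize; here is a minimal sketch, with made-up specs standing in for real lab and production machines:

```python
# A minimal sketch of the lab-vs-production "cheat sheet" Mark describes.
# The 10% threshold comes from his description; specs here are invented.

lab  = {"cpu_cores": 8,  "memory_gb": 32, "disk_io_ms": 25, "net_gbps": 10}
prod = {"cpu_cores": 16, "memory_gb": 64, "disk_io_ms": 2,  "net_gbps": 25}

for metric in lab:
    diff = (lab[metric] - prod[metric]) / prod[metric]
    # More than +/-10% apart: downgrade (or re-qualify) any finding there.
    flag = "DOWNGRADE finding" if abs(diff) > 0.10 else "comparable"
    print(f"{metric:>10}: lab={lab[metric]:>5} prod={prod[metric]:>5} "
          f"({diff:+.0%}) -> {flag}")
```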
You know, just briefly, I wanted to say that the CPU thing came up in your Perf Puzzlers we did the one time, and it's becoming sort of a lost art. If you're moving to the cloud, you can no longer really consider that stuff. But I also wonder, for people who are still running machines on-prem, how many are looking at hardware? And I just mention this because it was never a factor back when I was doing load testing. But anyway, I don't want to sidetrack with that. I know Andy had something. I just wanted to interject that, because that hardware stuff always comes up in performance when I talk to you, but rarely any other time.
That's just because other people don't know what they're doing, Brian. There's got to be a CPU somewhere, right? And you're paying for it even if you can't see it. And that's the real question: if I'm paying for something, I damn well better be able to see it. I need to be able to get the data, get my hands on the data, monitor the data, track the data, trace that data. And that's, you know, one of the things you guys have
talked a bit about serverless performance and serverless instrumentation. How do you monitor it? Hey, serverless, what's the cost? I'll just take the bill. And it's like, great, we ran serverless. Two of the folks recently that I've talked to have got, you know, serverless. They're doing Lambda or something in the cloud. And I forget the name of the on-prem one; you can have your own sort of serverless server on-prem. And they took the functions that they were paying for, you know, roundabout, I think it was about 2,500 bucks a month, the bill they were getting from Lambda to run serverless, a lot of transactions. And they just switched it
to a two-node cluster of like the little pizza boxes, you know, with state-of-the-art processors
and away we go. And of course, that investment cost them probably 10 grand for those two little
boxes. And of course, the performance blew the doors off of Lambda in the cloud and it cost them
10 grand. Figure those pizza boxes are going to last them three to four years.
Then they have plenty of overhead.
So there's still people that fall back out of the cloud, Brian.
And they're like, you know what?
Am I really getting what I'm paying for here?
Why can't you let me see these things?
So I'm kind of a cloud skeptic.
I work at a startup right now that has stuff in the cloud, stuff on premise. I can see, you know, the argument and benefit for both, for sure. Definitely security-wise, in terms of if you really are going to be held liable for stuff, some of those PCI audits go much easier if you're like, we own everything in the cage. It's done right here in that building over there.
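The break-even math behind that Lambda-versus-pizza-box story is short; a sketch using the dollar figures from the anecdote (and ignoring power, rack space, and ops time, which would shift the answer):

```python
# Break-even arithmetic: ~$2,500/month serverless bill vs. ~$10,000 of
# hardware expected to last three to four years, per the story.

monthly_serverless = 2_500
hardware_cost = 10_000
hardware_life_months = 3 * 12

break_even_months = hardware_cost / monthly_serverless        # 4 months
three_year_serverless = monthly_serverless * hardware_life_months

print(f"hardware pays for itself in {break_even_months:.0f} months")
print(f"3-year serverless spend: ${three_year_serverless:,} vs ${hardware_cost:,}")
```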
Yeah.
One quick question.
So because you bring up the term vulnerability in the context of performance,
which I like, for me, it feels a little bit like what we do with looking at
different performance metrics and then grading them and coming up with a score.
What we do with Keptn, and I know it's the first time I said it today, but here we go. And I wonder, I mean, the term vulnerability: we currently talk about it as an SLO score. Does it make more sense that we say, hey, we come up with a score,
but basically in the end,
it is a measure of high risk,
medium risk or low risk.
And then if you combine that
with your vulnerability scans,
maybe that come out of your security tools,
then you have a nice matrix
where you can say,
I don't know performance.
Now people cannot see my hands right now,
but you guys can.
So you have like...
He's got one hand up.
Yeah, one hand up.
It's performance vulnerability.
And then the second one?
Second hand.
Second hand.
Next to it.
Security vulnerability.
And then the third one,
you could even do, like, your functional vulnerabilities, because essentially it's functional, performance, security. And then you hopefully come up with three green lights, but if one of them is yellow or orange or red, then you have a challenge.
So you're right, Andy. If we think back a good chunk of time, there was an initiative and a bunch of writing around risk-based testing.
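As a toy illustration of the three-light scorecard Andy is proposing, here is a minimal sketch; the scores, thresholds, and area names are all invented:

```python
# Grade performance, security, and functional results separately,
# then surface the worst one. Thresholds and scores are made up.

def to_light(score: float) -> str:
    """Map a 0-100 evaluation score to a traffic light."""
    if score >= 90: return "green"
    if score >= 75: return "yellow"
    return "red"

results = {"performance": 94.0, "security": 71.5, "functional": 98.0}
lights = {area: to_light(score) for area, score in results.items()}

print(lights)                       # security comes out red here
worst = min(results, key=results.get)
print(f"challenge area: {worst}")   # one yellow/red light means a challenge
```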
And it was an attempt to attach the testing disciplines, and a lot of the actual idea of testing, to business risk, and/or risk to an end user, risk to security, risk to health, the end user's experience. There's all kinds of extensions where we did that. But to me, it wasn't until the whole InfoSec security testing world came along that it cemented this word vulnerability.
And it's even in technical jargon, vulns, right?
V-U-L-N-S.
And then all of the zero-day initiative stuff happened.
And these are, you know, do you have this ailment?
Do you have this vulnerability?
Yes, my system does.
It has not been patched or healed,
or we haven't done anything with the patching.
And what's interesting for performance is you can think of it as a performance patch.
In our DevOps, in our pipeline,
I'll actually make recommendations
from a code review or something.
I'll go try something out
and I'll pull whatever change I make on a branch into the performance
lab for some special rounds of testing. Or I work with a developer that's, hey, here's a bunch of
stuff we did around async task-based programming. So let's take a branch and go do a special
initiative of testing around a particular thing. And then if it goes well, we go back and say: you don't have to put this fix in, you don't have to push this patch or update, but it's been tested to be beneficial when the right time comes. And it could be business-risk-wise: hey, we made our numbers, we could take some downtime, X, Y, and Z. Who knows when the business sort of makes that decision, but it's been sort of vetted and hardened, if you will, through a performance test, as: hey, here is a list of performance updates. When you'd like to merge them back in, go ahead and do that. And then you'll see the pull request come through and it'll go.
Almost all of our security testing is that sort of thing. In development or production, I don't have a lot of security testing on the performance side, unless my security guys call
you know, online review of how many bytes per second, what are the routes we're going to run
through it? What are the paths look like? What kind of packet inspection? What options are we going to enable? What's the overhead? A lot of the
networking devices, even some in the cloud, their specification is very good around performance in
terms of saying, hey, if you enable this type of packet inspection or this type of intrusion
detection or intrusion prevention, there's CPU overhead and latency when you enable this configuration.
The device itself, if you turn all the features off, is just a network switch. So, you know,
boom, it can do a bunch of work and go really fast with throughput, at least the non-blocking
throughput for the backend of the switching on the device. And in the cloud, that's all
happening in the backend. But again, they have feature switches. So if you turn it on and things get slow... usually you can read a lot of that proactively and say, we need to buy two of these and run them in parallel to match our future growth projection. And that means, if you do that, architecturally now I have two separate stacks versus one stack. And all of a sudden the cost-benefit analysis gets kind of wonky. It's really weird because, oh, we weren't planning on making those kinds of changes. We weren't planning on having two devices for throughput overhead. And they go back and they rethink that beautiful idea of, shouldn't we do this for security's sake? Yes.
Zscaler is a good example. The Zscaler client goes everywhere, on a bunch of people's desktops. Well, they pushed the Zscaler client onto my load generators, and I'm like, um, I'm not going to have you sniffing my packets, because I'm an employee generating this load, and it's set up to only go to a certain lab. Security people usually get the green light, because what they'll say is, well, we're more secure. Approved, done, do it. And then they'll come back around later and be like, did you really think about the overhead you're introducing? It's a huge problem, as far as I experience it, for sure.
But to, let's say, dumb it down for me: in the end, what you're telling me is, it's an additional one or two components or layers that you just have to factor in, and that as a performance engineer you need to do your analysis on. You know, how much overhead do they produce? What's the impact on your resource consumption, on your, let's say, response time or throughput, or whatever you're optimizing for? And then just do your capacity planning again and figure out how we can achieve our numbers with those additional components. Or maybe there are some tweaks that you can do on those components to make them better. Because I assume with these security layers, in the end, there's some software running somewhere, and you can probably turn knobs and switches to also optimize it for your use cases.
So, some of the initiatives I have done, speaking of not having these security components in the lab: our good-friend product, the Squid proxy, and a couple of others. Toxiproxy is good at doing this. I've used Mountebank to do it. Or you can try some of the commercial products. And that's just to, you know... in the lab, I don't really care about the security in and of itself. I don't need to actually do encryption, decryption, et cetera. I just need to put the representation of the latency in the link to do it.
So in between my load generator and the web service or the web server or the front end
of the application, I'll just put a proxy in there and maybe add that two to three milliseconds
of latency.
And then you can do some what if scenarios.
What if we enabled configuration
for packet inspection and blocking of X, Y, and Z inspection? And the docs say that could add
anywhere between eight and 10 milliseconds to every transaction and handshake. Okay,
So I put that into my proxy and I can do sort of a low and a high latency run. It's not really WAN emulation, but you're sort of using that slowness to represent other layers of security. And then, of course, take a baseline with everything turned off, so you'd be like, what's the actual impact of doing that? And oddly, you'll find that you potentially have more capacity on the back end, because you have something slowing it down in the front end. But you know, that's the thing: we've spent all this money on the back end to make things really, really fast, even if they're running in the cloud, but we introduce these layers in front of our systems. And now that's the bottleneck.
You know, this reminds me of Steve Souders' 80/20 rule, where he said, why are you optimizing anything in the back end if the biggest problem is the large image that you're downloading, or, like, the JavaScript in your browser?
Yeah, well, it's only a 250K icon map, you know, that can't be a big deal. How many employees are at your company? 150,000 employees. And this is the HR website? Yes. And you're putting this link in an all-points reply-all storm. Oh, man.
That's another weird one: a reply-all with rich content in the email. And it's a reply-all storm; it's a human behavior thing, right? But every time the email got opened, it's loading all of that content, because there was no browser cache.
Wouldn't that be a great feature request now for Microsoft: your Outlook has to have you confirm the reply-all is really meant.
Are you sure you want to annoy the hell out of everybody?
Are you sure?
Because Outlook could do the math, right, for you. It could say, hey, you're replying all to 50,000 people,
and that means I don't know how many gigabytes of data
that is now being transferred.
Are you really sure?
I think that would be an interesting...
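The math Outlook could do is one line; a toy version using the 250K image and 50,000 recipients from the story:

```python
# A 250 KB image inlined in a reply-all to 50,000 people, no browser cache.

payload_kb = 250
recipients = 50_000

total_gb = payload_kb * recipients / 1024 / 1024
print(f"~{total_gb:.1f} GB transferred for one 'congrats!'")   # ~11.9 GB
```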
Or it could ask you something like, does everybody on this thread need to see your reply saying congratulations to Joe's promotion?
Or does just Joe need this?
And do you want to just...
No, we get those all the time. Sorry, that's like a pet peeve of mine.
And then you do the reply all saying, please don't reply all. That's always the best one too.
That's right.
So that's what we've come down to here on the 150th episode: the best thing you can do to optimize performance is to not reply all. But thanks, everyone. Thank you.
You know, Andy, I wanted to ask you. You work a lot more closely with the security aspect of our products now, but one thing I had never considered, which Mark is bringing to the topic here, is performance and security. Yes, we have security in our tool, but it was more about security and knowing if things are going on. And I think a lot of people think of security in two ways: is our code secure, did we leave any vulnerabilities in it, or is someone doing a DDoS attack on us? But the idea of the performance impact of all these security layers that are being added in between the front end and every device and everything in between... It's just like APM tools: people always ask, well, what's our overhead? And it's, well, yeah, you can measure it, and you do the trade-off of what you're getting. Same thing with security. And this is what blows my mind about you, Mark, which is why I look up to you as a performacologist, as you called yourself at one point: so many people get stuck in the world of performance of, I run my tests, I look at my response times. And just earlier we were talking about looking at the hardware. Now considering the security features that are being turned on in devices and all these other kinds of bits and pieces, it's that Pandora's box that performance is, that is just never ending. And having the exposure to even know to think about these things is just phenomenal.
Yeah, one thing on the
agent thing: when you think about sort of bytecode instrumentation or on-the-fly profiling, when you enable more profiling within an agent, you potentially get more CPU overhead. So you want to be very careful. One, we've done a lot to speed up processors in our industry, and bus speeds are really great. Second, we've optimized the agents, right? The profiling agents are way more efficient and use a lot of asynchronous programming, so that you're not blocking anything. But the third thing I'll say is, there are security products, a couple of them in the commercial space, and there's some open-source ones too, that came out of, what was it, Heartbleed, and then there was another one that was like the CPU or processor metadata virus infection, something that's very, very low-level, in like even the processor itself. And there are products where, like, you need to put this on every machine. And it made me think of 20 years ago, when we as an industry came out with bytecode instrumentation and put our agents and profilers out there. It was like, yeah, you should profile all these machines. And suddenly everything's slow. And you're like, oh, maybe we need to re-architect our agent. I've actually done some load tests with some of those agents.
And I'm like, what did they tell you this agent would do for overhead? Well, they said it might affect it a little bit.
Yeah, the salespeople always tell you.
But, you know, a real engineer is going to be like,
all right, let's see worst case scenario.
Turn all the options on and then, you know, turn them off one by one.
And I found like one particular option that was absolutely the worst
and like added 40% CPU
on everything for a machine that was normally like five, 10% CPU, uh, and you know, had lots
of headroom for other spikes. Um, but boom, all of a sudden it's running at like 50% CPU. I'm like,
what is going on with this? Of course we call the vendor and tell them, Hey, uh, we're not going to
move forward with your product because we like this feature, this security option.
We want it.
But it's way too slow.
And sure enough, they're like, well, we went and talked to the developer and apparently we found some bugs in there that we're kind of blocking on.
Oh, yeah.
And I'm like, oh man, we're right back to profiling and agents for performance reasons, and the security world is, in a way, still catching up.
Yeah, you're like, oh yeah, I guess security is not the only thing we should think about. We should also think about the performance of our agent. Because otherwise they'll be out of business; nobody will install them, because people are like, uh, don't go with that product.
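The option-by-option overhead hunt Mark walks through can be scripted; in this hedged sketch, the agentctl CLI and its option names are hypothetical stand-ins (psutil is a real third-party library), and the standard load test is assumed to be running separately:

```python
# "Turn all the options on, then turn them off one by one" while a steady
# load test runs, and attribute the CPU saving to each option.

import subprocess
import psutil  # third-party: pip install psutil

OPTIONS = ["packet_capture", "syscall_trace", "file_scan"]  # hypothetical

def cpu_under_load(seconds: int = 60) -> float:
    """Average machine-wide CPU% sampled once per second."""
    samples = [psutil.cpu_percent(interval=1) for _ in range(seconds)]
    return sum(samples) / len(samples)

baseline = cpu_under_load()           # everything enabled: worst case
for opt in OPTIONS:
    subprocess.run(["agentctl", "disable", opt], check=True)  # hypothetical CLI
    delta = baseline - cpu_under_load()
    print(f"disabling {opt} saved {delta:.1f}% CPU")
    subprocess.run(["agentctl", "enable", opt], check=True)   # restore for next trial
```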
But that's a great opportunity now, isn't it?
A great opportunity for performance enthusiasts like us and like all the
other listeners to really reach out to the security experts in your
organization and say, Hey, it's great what you're doing,
but be reminded of the potential performance impact. And let me tell you some stories from my experience if you are rolling out something like this.
Or start with a more open-ended question. Have you guys ever had to roll back
something, a security agent or patch or something because it screwed something up? And of course,
they're also very, I would say secretive for good reasons, right? They're very
discerning about talking about the security patches and everything at the
company just because it's security.
They're like, they don't make it really company-wide knowledge.
It's kind of a secret little group.
So you want to build a relationship with that team in a way that says, hey, if you guys
ever want to try these things out before you push them, you know, we have a lab we could set up and
I got some, you know, we got some ways we could figure out what the overhead is. And, you know,
can you share with me the documentation? We can review what the specs are hardware wise,
or even software wise, a lot, even Amazon, some of the Amazon stuff or the Azure stuff,
they're very specific about, you know, hey, using this in your architecture will have certain limitations, and knowing what transactions would be limited by those options just kind of helps. Just say: you guys think about all the security hacker-ness. That's nice. That's great. White-hat people. And I'll just do some study of latency and study of overhead, especially, Brian, if you're in a physical world. You do have very limited machinery or resources, so that's much more urgent. If you're in the cloud,
you maybe don't have unlimited funds, but you can be more elastic,
which is very, very true.
Yeah. Hey, one more question.
Do you think from a performance engineering perspective,
should you also think of actually running load?
Like load against the security appliances
to really figure out how do they behave in case,
let's say, really a denial of service attack comes in,
or something like a high influx in requests
come in that are hard to analyze but then will be blocked
or maybe not blocked.
But do you see this as part of the future performance engineer, one that is security-aware, that they also need to generate loads that bring exactly these security devices to, let's say, the edge? Or is this something that the security team, or whatever they're called, are doing or should do?
Yeah, yeah. I think the vendors should be doing that and publishing their specs. They know what kind of traffic, and the specifics of what their filters and their scans or their packet inspection do, feature-wise.
And so every appliance manufacturer, if you're in the physical world, should be doing that.
And if you're rolling out a software-based feature or a hardware-based feature in the cloud,
and you're going to wrap something in front of it
to make it accessible in the cloud as a provider,
yeah, it should be on the producer of that service or the producer of that product.
I have only rarely done an appliance-only or security-solution-only kind of load test. And I did it with very low-level stuff. Really old companies that did low-level packet generation. It wasn't even really application level. So you're much lower in the OSI stack to be able to do that. And again, the company's name escapes me. I'm never going to... yeah.
I try to remember all these names. They don't exist anymore, right, Brian? So I don't even worry about the name.
I try to remember all these names.
They don't exist anymore, right, Brian?
So I don't even worry about the name.
So I think for the most part, I mean, you're still just generating load through an application stack that's using HTTPS and set up with the routes or whatever protocols you're using.
Application level traffic through the stack is usually enough to trip over anything egregious. The other thing I'll say is with deep packet inspection,
if they're doing filtering on that,
your payloads from an application standpoint
might become more specific.
So you might not just do a recording
and replay it and woohoo,
everything looks good.
That's a false negative,
meaning I didn't have payload in there
that would get picked up by the inspection.
And then the inspection does something additional to scan it, log it, do something different. So I think there is the limitation that the payloads of your load generation might not actually send in the right patterns or the right actual data for packet inspection.
And that's a big difference.
But I've only done it once or twice, and it was at Microsoft. Just testing the ISA appliance.
I think we did some stuff in partnership with F5 back in the day, F5 Networks, some of their security things, mostly around load balancing and actually, you know, network routing and things like that. But yeah, I don't think that's the main thing. In most cases, be able to have that conversation with a security person before they implement it. Try to catch them before they push something blindly to production. Because, you know, next thing you know, you're going to delete all your routes on the internet and disappear from DNS.
That never happened, did it?
Never happened.
Never happened. It happened in October of 2021, to Facebook.
Yeah, exactly.
all right hey um mark it's always a pleasure having you on the show.
There's always so much we can learn from you and it's, I know.
Now we're really happy. We're not just saying this to make you feel good.
We say this because we mean it.
We pity you. So we try to have you on to make you feel better about your life.
I live a small, I live a small life.
Well, you're in Philly, right?
We also hope that not in the too distant future or in the past, whenever this airs, we will be able to see each other face to face.
Hopefully, we'll see each other perform in Vegas, right?
Because that's happening physically.
And it seems the Europeans are now also allowed to travel to the States. That's nice.
So it's going to be good.
That'll be good.
Yeah.
So just a quick thing on what I've learned. The best thing we can all do to stop bad performance problems is to avoid the reply-all, and advocate for the reply-all to not be misused.
Number one takeaway.
Exactly.
No, that's really something that I've put in the abstract of the podcast episode.
But I think the other thing for me, and this was the revelation: as a performance engineer, you don't necessarily need to really include, I don't know, scans and whatever security tools spit out. I think this is definitely something for the security team, who we should collaborate with very closely.
But really performance engineers really need to understand the impact of these additional
layers on the resiliency, on the performance of the app that they're testing, right?
I mean, and then making the right recommendations on how to get to the ultimate goal, which is high-performing, highly available, and secure applications.
Yeah.
I'd say, Andy, we talked a lot about sort of performance in the infrastructure and ops
space.
There's a lot of performance anti-patterns in code as well, where you see code do some backflips, jump through some flaming hoops in code logic, to try to be more secure. I've seen some really weird stuff that introduces sort of a mutex or semaphores for thread blocking. I've seen limitations where every request has to check this and check that before it proceeds. And, you know, there's some really sort of harebrained ideas,
harebrained H-A-R-E, which is like offensive to rabbits.
Damn you, Mark.
I know.
They're going to, they'll come at me on Twitter, the hare population.
But yeah, so that maybe that's another episode to talk about sort of code anti-patterns.
For security, they're great patterns.
But for performance, they're not so great.
SecPerf.
There you go.
I was thinking PerfSec sounds better, but security performance makes more sense than performance security.
Exactly.
Cool.
So, yeah, to your point, Andy, I mean, that's my big thing, too, you know, for any performance person out there.
Yes. Mark's pointing at his watch.
For any performance person out there, one more thing to consider: security, added to the list of the rabbit hole that you must understand. More hare references there.
Um,
so thank you again,
Mark,
for being on our virtual 150th episode.
And yes,
congratulations on 150 episodes.
May you have 150 more.
Yes.
It's going to be a long time.
Thank you everyone for listening
and being part of our show for 150 episodes.
And you know what?
If anyone's been with us since episode one,
tweet us.
Mark will send you something.
I will.
No, you won't.
But let us know at pure underscore DT on Twitter.
And we thank you all for listening.
Thank you,
Mark.
Thank you,
Andy.
Thank you guys.
Thank you.
Until next time.
Bye.
Bye.
Ciao.
Ciao.