Screaming in the Cloud - Creating A Resilient Security Strategy Through Chaos Engineering with Kelly Shortridge
Episode Date: May 30, 2023
Kelly Shortridge, Senior Principal Engineer at Fastly, joins Corey on Screaming in the Cloud to discuss their recently released book, Security Chaos Engineering: Sustaining Resilience in Software and Systems. Kelly explains why a resilient strategy is far preferable to a bubble-wrapped approach to cybersecurity, and how developer teams can use evidence to mitigate security threats. Corey and Kelly discuss how the risks of working with complex systems are perfectly illustrated by Jurassic Park, and Kelly also highlights why it's critical to address both system vulnerabilities and human vulnerabilities in your development environment rather than pointing fingers when something goes wrong.
About Kelly
Kelly Shortridge is a senior principal engineer at Fastly in the office of the CTO and lead author of "Security Chaos Engineering: Sustaining Resilience in Software and Systems" (O'Reilly Media). Shortridge is best known for their work on resilience in complex software systems, the application of behavioral economics to cybersecurity, and bringing security out of the dark ages. Shortridge has been a successful enterprise product leader as well as a startup founder (with an exit to CrowdStrike) and investment banker. Shortridge frequently advises Fortune 500s, investors, startups, and federal agencies and has spoken at major technology conferences internationally, including Black Hat USA, O'Reilly Velocity Conference, and SREcon. Shortridge's research has been featured in ACM, IEEE, and USENIX, spanning behavioral science in cybersecurity, deception strategies, and the ROI of software resilience. They also serve on the editorial board of ACM Queue.
Links Referenced:
Fastly: https://www.fastly.com/
Personal website: https://kellyshortridge.com
Book website: https://securitychaoseng.com
LinkedIn: https://www.linkedin.com/in/kellyshortridge/
Twitter: https://twitter.com/swagitda_
Bluesky: https://shortridge.bsky.social
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
Have you listened to the new season of Traceroute yet?
Traceroute's a tech podcast that peels back the layers of the stack
to tell the real human stories about how the inner workings of our digital world
affect our lives in ways you may have never thought of
before. Listen and follow Traceroute on your favorite platform or learn more about Traceroute
at origins.dev. My thanks to them for sponsoring this ridiculous podcast.
Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Kelly Shortridge,
who's a senior principal engineer over at Fastly,
as well as the lead author of the recently released Security Chaos Engineering,
Sustaining Resilience in Software and Systems. Kelly, welcome to the show.
Thank you so much for having me.
So I want to start with the honest truth that in that title, I think I know what some of the words mean.
But when you put them together in that particular order, I want to make sure we're talking about the same thing.
Can you explain it like I'm five as far as what your book is about?
Yes, I'll actually start with an analogy I make in the book, which is imagine you were trying to rollerblade to some
destination. Now, one thing you could do is wrap yourself in a bunch of bubble wrap and become the
bubble person. And you can waddle down the street trying to make it to your destination on the
rollerblades. But if there's a gust of wind or a dog barks or something, you're going to flop over.
You're not going to recover. However, if you instead don what everybody does, which is, you
know, knee pads and other things that keep you flexible and nimble, then when there's a gust of wind,
you can kind of be agile and navigate around it. If a dog barks,
you just roller skate around it. You can reach your destination. The former, the bubble person,
that's a lot of our cybersecurity today. It's just keeping us very rigid, right? And then the
alternative is resilience, which is the ability
to recover from failure and adapt to evolving conditions. I feel like I am about to torture
your analogy to death because back when I was in school in 2000, there was an annual tradition at
the school I was attending before failing out where a bunch of us would paint ourselves green
every year and then bike around the campus naked. It was the green bike ride.
So one year I did this on rollerblades.
So if you wind up looking at it, there's a bubble wrap, there's the safety gear, and then there's wearing absolutely nothing, which feels kind of like the startup approach to InfoSec.
It's like, it'll be fine.
What's the worst that happens?
And you're super nimble, super flexible until suddenly, oops, now I really wish I'd done things differently.
Well, there's a reason why I don't say rollerblade naked. Other than it being
rather visceral, what you described is what I've called YOLOsec before, which is not what you want
to do. Because the problem when you think about it from a resilience perspective, again, is you
want to be able to recover from failure and adapt. Sure, you can oftentimes move quickly,
but you're probably going to erode
software quality over time. So at a certain point, there's going to be some big incident and suddenly
you aren't fast anymore. You're actually pretty slow. So there's this kind of happy medium where
you have enough of what I'd call security by design, and we can talk about that a bit if you want,
baked in. You can think of it as guardrails so that you're able to withstand and recover from any failure. But yeah, going naked, that's a recipe for not
being able to rollerblade like ever again, potentially. I think on some level, the correct
dialing in of security posture is going to come down to context in almost every case.
"I'm building something in my spare time in the off hours" does not need the same security posture, mostly, as "we are a bank." It feels like there's a
very wide gulf between those two extremes. Unfortunately, I find that there's a certain
tone deafness coming from a lot of the security industry around, oh, everyone must have security
as their number one thing ever.
I mean, with my clients,
whose AWS bills I fix,
I have to care about security contractually,
but the secrets that I hold are boring:
how much money certain companies
pay another very large company.
Yes, I'll get sued into oblivion if that leaks,
but nobody dies.
Nobody is having their money stolen as a result. It's slightly
embarrassing in the tech press for a cycle, and then it's over and done with. That's not the same
thing as a brief stint I did running tech ops at Grindr 10 years ago, where leak that database and
people will die. There's a strong difference between those threat models. And on some level, being able to
act accordingly has been one of the more eye-opening approaches to increasing velocity,
in my experience. Does that align with the thesis of your book, since my copy has not yet arrived
for this recording?
Yes. In the book, I am not afraid to say it depends. And you're right,
it depends on context. I actually talk about this resilience potion recipe that you can check out if you want, with the ingredients we need
so we can sustain resilience. A key one is defining your critical functions. Just what is
your system's reason for existence? And that is what you want to make sure can recover and still
operate under adverse conditions. Like you said, another example I give all the time is most SaaS
apps have some sort of reporting functionality. Guess what? That's not mission critical. You don't need the
utmost security on that for the most part. But if it's processing transactions, yeah, probably you
want to invest more security there. So yes, I couldn't agree more that it's context dependent.
And oh my God, does the security industry ignore that so much of the time? And it's been my gripe
for, I feel like, as long as I've been in the industry. I mean, there was a great talk that Netflix gave years ago where they mentioned
in passing that all developers have root in production. And that's awesome. And the person
next to me was super excited. And I looked at their badge and holy hell, they worked at an
actual bank. That seems like a bad plan. But talking to the Netflix speaker after the fact,
Dave Hahn, something that I found that was extraordinarily insightful is that, yeah, but we just isolate off the PCI environment.
So the sensitive data lives in its own compartmentalized area, separate from the rest.
So at that point, yeah, you're not going to be able to break much in that scenario.
It's like that would have been helpful context to put in the talk, which I'm sure he did.
But my attention span had tripped out and I missed that. But that's on some level constraining blast radius, and not having compliance and regulatory issues extending to every corner of your environment really frees you up to do things appropriately. But there are some things where you do need to care about this stuff regardless of how small the surface area is.
Agreed. And I introduced the concept of the effort investment portfolio in the book,
which is basically: where does it matter to invest effort, and where can you kind of, like,
maybe save some resources? I think one thing you touched on, though, is
we're really talking about isolation. And I actually think people don't think about isolation
as detailed or maybe as expansively as they could, because we want temporal,
logical, and spatial isolation. What you talked about is, yeah, there are some cases where you want
to isolate data, you want to isolate certain subsystems, and that could be containers. It
could also be AWS security groups. It could take a bunch of different forms. It could be something
like RLBox in WebAssembly land. But something that I really try to highlight in the book is that there's actually
a huge opportunity for security engineers, starting from the design of a system, to really
think about how we can infuse different forms of isolation to sustain resilience.
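As a rough sketch of the security-group form of isolation Kelly mentions, assuming boto3 and hypothetical VPC and security-group IDs (nothing here comes from the book itself), it can be as simple as a default-deny group for the sensitive subsystem:

```python
import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-0123456789abcdef0"       # hypothetical VPC
API_TIER_SG = "sg-0fedcba9876543210"   # hypothetical security group for the API tier

# Create a security group for the sensitive subsystem. A new group has no
# ingress rules at all, so the subsystem starts out unreachable by default.
resp = ec2.create_security_group(
    GroupName="payments-isolated",
    Description="Isolated group for the transaction-processing subsystem",
    VpcId=VPC_ID,
)
sg_id = resp["GroupId"]

# Allow ingress only from the API tier's security group, and only on the
# single port the service actually listens on. Every allowance is explicit.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 8443,
        "ToPort": 8443,
        "UserIdGroupPairs": [{"GroupId": API_TIER_SG}],
    }],
)
print(f"Created isolated security group {sg_id}")
```

The design point is the default-deny posture: the blast radius stays small because reachability has to be granted deliberately, which is the same idea whether the mechanism is a security group, a container boundary, or something like RLBox.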
It's interesting that you use the word investment.
Fixing AWS bills for a living, I've learned over
the last almost seven years now of doing this, that cost and architecture in cloud are fundamentally
the same thing. And resilience is something that comes with a very real cost, particularly when
you start looking at what the architectural choices are. I mean, one of the big reasons that
I only ever work on a fixed fee basis is because if I'm charging for a percentage of savings or something,
it inspires me to say really uncomfortable things like backups are for cowards. And when's the last
time you saw an entire AWS availability zone go down for so long that it mattered? You don't need
to worry about that. And it does cut off an awful lot of cost issues at the price of making the
environment more fragile. That's where one of
the context things starts to come in. In many cases, if AWS is having a bad day in a given region,
well, does your business need that workload to be functional? For my newsletter, I have a
publication system that's single-homed out of the Oregon region. If that whole thing goes down for
multiple days, I'm writing that week's issue by hand because I'm going to have something different to talk about anyway. For me, there's no value in making that investment. But for companies, there absolutely is. But there also seems to be a lack of awareness around how much is a reasonable investment in that area. When do you start making that investment? And most critically, when do you stop?
I think that's a good point.
And luckily, what's on my side is the fact that there's a lot of just profligate spending
in cybersecurity.
And that's really what I'm focused on is how can we spend those investments better?
And I actually think there's an opportunity in many cases to ditch a ton of cybersecurity
tools and focus more on some of the stuff you talked about.
I agree, by the way. I've seen some threat models where it's like, well, all AWS regions go down. I'm like,
at that point, we have a severe, bigger-than-whatever-you're-thinking-about problem, right?
Right. So does your business continuity plan account for every one of your staff suddenly
quitting on the spot? Because there's a whole bunch of companies with very expensive consulting-like
problems, and I'm going to go work for them for a week and then buy a house in cash. It's one of those areas where, yeah, people are not going to care
about your environment more than they are about their families and other things that are going on.
Plan accordingly. People tend to get so carried away with these things,
with the tabletop planning exercises. And then, of course, they forget little things like,
I overwrote the database by dropping the wrong thing. Turns out that was production.
Remembering for a me there.
Precisely.
And a lot of the chaos experiments that I talk about in the book are a lot of those like let's validate some of those basics.
Right.
That's actually some of the best investments you can make.
Like if you do have backups, I can totally see your argument about backups are for cowards.
But if you do have them, like maybe conduct experiments to make sure that they're available when you need them. And the same thing, even on the social side.
No one cares about backups, but everyone really cares about restores suddenly,
right after they really should have cared about backups.
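As a rough illustration of that kind of experiment, here is a minimal sketch of a restore drill, assuming boto3, hypothetical RDS instance names, and that the scratch instance gets torn down afterwards; it shows the shape of the drill, not tooling from the book:

```python
import boto3

rds = boto3.client("rds")

SOURCE_DB = "prod-orders"            # hypothetical production instance
SCRATCH_DB = "restore-drill-orders"  # throwaway instance for the drill

# 1. Find the most recent automated snapshot for the source database.
snapshots = rds.describe_db_snapshots(
    DBInstanceIdentifier=SOURCE_DB, SnapshotType="automated"
)["DBSnapshots"]
latest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])
print(f"Latest snapshot: {latest['DBSnapshotIdentifier']}")

# 2. Restore it into a scratch instance. The point of the drill is proving
#    the snapshot is actually restorable, not just that it exists.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier=SCRATCH_DB,
    DBSnapshotIdentifier=latest["DBSnapshotIdentifier"],
)

# 3. Wait for it to come up. A real drill would then run sanity queries
#    (row counts, schema checks) against SCRATCH_DB before deleting it.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=SCRATCH_DB)
print("Restore drill passed: scratch instance is available.")
```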
Exactly. So I think it's looking at those experiments where it's like, okay,
you have these basic assumptions in place that you assume to be invariants or assume that they're
going to bail you out if something goes wrong. Let's just verify. That's a great place to start because I can tell you,
I know you've been to the RSA hall floor,
how many cybersecurity teams are actually assessing the efficacy
and actually experimenting to see if those tools really help them during incidents.
It's pretty few.
Oh, vendors do not want to do those analyses.
They don't want you to do those analyses either.
If you do, for God's sake, shut up about it.
They're trying to sell things here, mostly firewalls.
Yeah, cybersecurity vendors aren't necessarily happy about my book and what I talk about
because I have almost this ruthless focus on evidence.
And it turns out cybersecurity vendors kind of thrive on a lack of evidence.
There's so much fear, uncertainty, and doubt in that space.
And I do feel for them.
It's a hard market to sell in without having to talk about, here's the thing that you're defending against.
In my case, it's easy to sell "the AWS bill is high," because if I have to explain why
more or less setting money on fire is a bad thing, I don't really know what to tell you.
I'm going to go look for a slightly different customer profile. That's not really how it works in security. I'm sure there are better go-to-market approaches, but they're
hard to find, at least ones that work holistically. There are. And one of my priorities with the book
was to really enumerate how many opportunities there are to take software engineering practices
that people already know, let's say something like type systems even, and how those can actually
help sustain resilience. Even things like integration testing or infrastructure as code. There are a lot of opportunities just to extend what we already do
for systems reliability to sustain resilience against things that aren't attacks. And just
make sure that we cover a few of those cases as well. A lot of it should be really natural to
software engineering teams. Again, security vendors don't like that because it turns out
software engineering teams don't particularly like security vendors.
I hadn't noticed that.
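As one hedged example of that kind of extension, a security invariant can be treated as an ordinary integration test; the bucket names and pytest setup below are hypothetical, not anything from the book:

```python
import boto3
import pytest

s3 = boto3.client("s3")

# Hypothetical buckets that hold anything sensitive.
SENSITIVE_BUCKETS = ["prod-transaction-exports", "prod-customer-uploads"]


@pytest.mark.parametrize("bucket", SENSITIVE_BUCKETS)
def test_bucket_blocks_public_access(bucket):
    """Fail the build if anyone quietly loosens the public access settings."""
    config = s3.get_public_access_block(Bucket=bucket)[
        "PublicAccessBlockConfiguration"
    ]
    assert all(config.values()), f"{bucket} does not fully block public access"


@pytest.mark.parametrize("bucket", SENSITIVE_BUCKETS)
def test_bucket_has_default_encryption(bucket):
    """The call raises if no default encryption is configured, which also fails the test."""
    rules = s3.get_bucket_encryption(Bucket=bucket)[
        "ServerSideEncryptionConfiguration"
    ]["Rules"]
    assert rules, f"{bucket} has no default encryption rule"
```

Run in CI next to the reliability tests, a check like this catches a quietly loosened setting the same way a failing health check catches a bad deploy.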
I will note, though, for those who are unaware, that chaos engineering started off as breaking things on purpose, which I feel like started because one person had a really good story and came up with it super quickly when they were about to get fired.
Like, no, no, it's called chaos engineering.
Good for them. It's now a well-regarded discipline. But I've always heard of it in the context of
reliability of, oh, you think your site is going to work if the database falls over? Let's push
it over and see what happens. How does that manifest in a security context?
So I will clarify, I think that's a slight misconception. It's really about fixing things
in production. And that's the end goal. I think we should not break things just to break them.
Right.
But I'll give a simple example, which I know is based on what Aaron Rinehart conducted at UnitedHealth Group, which is, OK, let's inject a misconfigured port as an experiment and see what happens end-to-end. In their case, the firewall only detected the misconfigured port
60% of the time. So 60% of the time, it works every time. But it was actually a very common
cloud configuration management tool that caught the change and alerted responders.
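A minimal sketch of what that style of experiment can look like in AWS terms, assuming boto3, a non-production security group, and a detection check you wire up yourself; this is the shape of the experiment, not the tooling Rinehart's team used:

```python
import time
import boto3

ec2 = boto3.client("ec2")

TARGET_SG = "sg-0aaaabbbbccccdddd"  # hypothetical, non-production security group
BAD_PORT = 5985                     # the deliberately "misconfigured" port
RULE = [{
    "IpProtocol": "tcp", "FromPort": BAD_PORT, "ToPort": BAD_PORT,
    "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
}]


def detection_fired() -> bool:
    """Stand-in for polling whatever should catch the change
    (SIEM query, AWS Config rule, pager API). Returns False until wired up."""
    return False


# 1. Inject the failure: open the port to the world on purpose.
ec2.authorize_security_group_ingress(GroupId=TARGET_SG, IpPermissions=RULE)

try:
    # 2. Give the detection pipeline a bounded window to notice.
    deadline = time.time() + 300
    caught = False
    while time.time() < deadline and not caught:
        caught = detection_fired()
        time.sleep(15)
    print("detected" if caught else "NOT detected -- assumption falsified")
finally:
    # 3. Always revert the injected misconfiguration, detected or not.
    ec2.revoke_security_group_ingress(GroupId=TARGET_SG, IpPermissions=RULE)
```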
So it's that kind of thing where we're still trying to verify those assumptions that we have
about our systems and how they behave again end-to-end. In a lot of cases, again, with security
tools, they are not behaving as we
expect. But I still argue security is just a subset of software quality. So if we're experimenting
to verify again our assumptions and observe system behavior, we're benefiting software
quality and security is just a subset of that. Think about C code, right? It's not like there's
a healthy memory corruption. So it's bad for both
quality and security reasons.
One problem that I've had in the security space for a while is,
let's apply this to AWS for a second, because that is the area in which I spend most of
my time, which probably explains a lot about my personality challenges. But the problem that I
keep smacking into is if I go ahead and configure everything the way that I should, according to best practices and the rest, I wind up with a firehose torrent of information in terms of CloudTrail logs, etc.
And it's expensive in its own right.
But then to sort through it or to do a lot of things in security, there are basically two options. I can either buy a vendor's product, which generally tends to start around $12,000 a
year and goes up rapidly from there, versus my current $6,000 a year infrastructure bill. So okay, twice as
much as the infrastructure, just for security monitoring. Okay. Or alternately, find a bunch of different
random scripts and tools on GitHub of wildly diverging quality and sort of hope for the best
on that. It feels like there's
nothing in between. And the reason I care about this is not because I'm cheap, but because when
you have an individual learner who is either a student or a career switcher or someone just
trying to experiment with this, you want them to begin as you want them to go on. And things that
are no money for an enterprise are all the money to them. They're going to learn to work with the tools that they can afford.
That feels like it's a big security swing and a miss.
Do you agree?
Disagree?
What's the nuance I'm missing here?
No, I don't think there's nuance you're missing.
I think security observability, for one, isn't a buzzword that particularly exists.
I've been trying to make it a thing, but I'm solely one individual screaming into the void.
But observability just hasn't been a thing. We haven't really focused on,
okay, so what? We get data and what do we do with it? And I think, again, from a software
engineering perspective, I think there's a lot we can do. One, we can just avoid duplicating
efforts. We can treat observability, again, of any sort of issue as similar, whether that's an
attack or a performance issue. I think this is another place where a security or any sort of chaos experiment shines, though, because if you have an
idea of here's an adverse scenario we care about, you can actually see how does it manifest in the
logs and you can start to figure out like what signals do we actually need to be looking for,
what signals matter to be able to narrow it down, which, again, involves time and effort. But also, I can attest, when you're
buying the security vendor tool and in theory, absolving some of that time and effort, it's
maybe, maybe not because it can be hard to understand what the outcomes are or what the
outputs are from the tool. And it can also be very difficult to tune it and to be able to explain some of the
outputs. It's kind of like trading upfront effort versus long-term overall overhead,
if that makes sense.
It does. On that note, the title of your book includes the magic key phrase,
sustaining resilience. I have found that security effort and investment tends to resemble a fire drill in an awful
lot of places where we care very much about security, says the company, right after they
very clearly failed to care about security.
And I know this because I'm reading it in an email about a breach that they've just
sent me.
And then there's a whole bunch of running around and hair on fire moments.
But then there's a new shiny that always comes up, a new strategic priority, and it falls to the wayside again. What do you see that drives that sustained
effort and focus on resilience in a security context?
I think it's really making sure you have a learning culture, which sounds very draw the
owl, but things again like experiments can help just because when you do simulate those adverse
scenarios and you see how your system behaves, it's almost like running
an incident and you can use that as very fresh kind of like collective memory. And I even strongly
recommend starting off with prior incidents and simulating those just to see like, hey, did the
improvements we make actually help? If they didn't, that can be a kind of another fire under the butt,
so to speak, to continue investing. So definitely in practice, and there's some case studies in the book, it can be really helpful just to kind of like sustain
that memory and sustain that learning and keep things feeling a bit fresh. It's almost like
prodding the nervous system a little, just so it doesn't go back to that complacent,
convenient feeling.
It's one of the hard problems because I'm sure I'm going to get
castigated for this by some of
the listeners, but computers are easy, particularly compared to the people. There are deterministic
ways to solve almost any computer problem, but people are always going to be a little bit
different and getting them to perform the same way today that they did yesterday is an exercise
in frustration. Changing the culture, changing the approach and the attitude that people take toward a lot of these things feels,
from my perspective, like something of an impossible job. Cultural transformations
are things that everyone talks about, but it's rare to see them succeed.
Yes. And that's actually something that I very strongly weave throughout the book is that
if your security solutions rely on human behavior, they're going to fail.
We want to either reduce hazards or eliminate hazards by design as much as possible.
So my view is very much, again, like, can you make processes more repeatable?
That's going to help security.
I definitely do not think that if anyone takes away from my book that they need to have like a thousand hours of training to change hearts and minds, then they have completely misunderstood most of the book.
The idea is very much like what are practices that we want for other outcomes anyway, again, reliability or faster time to market?
And how can we harness those to also be improving resilience or security at the same time?
It's very much trying to think about those opportunities rather than trying to drill into people's heads like thou shalt not or thou shall.
Way back in 2018, you gave a keynote at some conference or another, and you built the entire
thing on the story of Jurassic Park, specifically Ian Malcolm as one of your favorite fictional
heroes. And you tied it into security in a bunch of different ways. You hadn't written this book
then unless the authorship process is way longer than I think it is. So I'm curious to get your
take on what Jurassic Park can teach us about software security. Yes. So I talk about Jurassic
Park as a reference throughout the book frequently. I've loved that book since I was a very young child. Jurassic Park is a great example of a complex system failure: they didn't anticipate there could be more dinosaurs in the count than they expected. There are so many different factors that influenced it.
You can't actually just blame human error or point fingers at one thing. So that's a beautiful
example of how things go wrong in our software systems. Because, like you said, there's this human
element and then there's also how the humans interact and how the software components interact.
But with Jurassic Park, too, I think the great thing is dinosaurs are going to do dinosaur things, like eating people. And there are also equivalents
in software like C code. C code is going to do C code things, right? It's not a memory safe
language. So we shouldn't be surprised when something goes wrong. We need to prepare accordingly.
How could this happen again?
Right. And at a certain point, it's like there's probably no way to sufficiently introduce
isolation for dinosaurs unless you put them in a bunker where no one can see them.
And it's the same thing sometimes with things like C code.
There's just no amount of effort you can invest, and you're just kind of investing for a really
unclear and generally not fortuitous outcome.
So I like it as kind of this analogy to think about, okay, where do our effort investments make sense? And where is it sometimes like we really just do need to refactor
because we're dealing with dinosaurs here. When I was a kid, that was one of my favorite books too.
The problem is, I didn't realize I was getting a glimpse of my future at a number of crappy
startups that I worked at. Because you have John Hammond, who was the owner of the park,
talking constantly about how we spared no expense. But then you look at what actually happened and
he spared every freaking expense. You have one IT person who is so criminally underpaid that
smuggling dinosaur embryos off the island becomes a viable strategy for this. He went, oh, we
couldn't find the right DNA. So we're just going to splice some other random stuff in there. It'll be fine. Then you have the massive overconfidence,
because it sounds very much like he had this almost Muskian desire to fire anyone who disagreed
with him. And yeah, there was a certain lack of investment that could have been made, despite
loud protestations to the contrary. I'd say that he is the root cause. He is the proximate reason for the entire failure of the park.
But I'm willing to entertain disagreement on that point.
I think there are other individuals like Dr. Wu, if you recall,
deciding to use frog DNA and not thinking that maybe something could go wrong.
I think there was a lot of overconfidence, which you're right.
We do see a lot in software.
So I think that's actually another very important lesson is that incentives matter and incentives are very hard
to change, kind of like what you talked about earlier. It doesn't mean that we shouldn't
include incentives in our threat models. Like in the book I talk about, our threat models should
include things like maybe, yeah, people are underpaid or there is a ton of pressure to deliver
things quickly or do things as cheaply as possible. That should be
just as much of our threat models as all of the technical stuff, too.
I think that there's a lot that was in that movie that was flat out wrong. For example,
one of the kids, I forget her name, it's been a long time, was logging in and said,
oh, this is Unix. I know Unix. And having learned Unix as basically my first professional operating
system: no, you don't. No one knows Unix. They get very confused at some point. The question is just
how far down what rabbit hole it is. I feel so sorry for that kid. I hope she wound up seeking
therapy when she was older to realize that she, no, you don't actually know Unix. It's not that
you're bad at computers. It's that Unix is user hostile, actively so. Like the Raptors. That's the better metaphor when everything winds up shaking out.
Yeah. I don't disagree with that. The movie definitely takes many liberties. I think what's
interesting, though, is that Michael Crichton, specifically when he talked about writing the
book, I don't know how many people know this, dinosaurs were just a mechanism. He knew people
would want to read it in an airport. What he cared about was communicating really the danger of complex systems and how if you don't
respect them and respect that interactivity and that it can baffle and surprise us,
like things will go wrong. So I actually find it kind of beautiful in a way that the dinosaurs
were almost like an afterthought. What he really cared about was exactly what we deal with all the
time in software is when things go wrong with complexity.
Like one of his other books, Airframe, talked about an air disaster with a bunch of contributing
factors and the rest. And for some reason, that did not receive the wild acclaim that Jurassic
Park did to become a cultural phenomenon that we're still talking about, what, 30 years later?
Right. Dinosaurs are very compelling.
They really are. I have to ask, though, this is the joy of having a kid who's almost six.
What is your favorite dinosaur?
Not a question most people get asked very often, but I am going to trot that one out.
No. Oh, that is such a good question.
Maybe a Deinonychus.
Oh, because they get so angry they spit and kill people?
That's amazing.
Yeah, and I like the kind of nimble, smarter ones, and also the fact that most of the smaller ones allegedly had feathers, which I just love this idea of featherful murder machines.
I have the classic nerd kid syndrome, though, where I've read all these dinosaur names as a kid and I've never pronounced them out loud, so I'm sure there are others that I would just word salad.
But honestly,
it's hard to go wrong with choosing a favorite dinosaur. Oh yeah. I'm sure some paleontologist
is sitting out there in the field and a dig somewhere listening to this podcast, just getting
very angry at our pronunciation of things. For God's sake, I call the database Postgres-squeal;
get in line. There's a lot of that out there. We're looking at complex system failures and
different contributing factors and the rest.
That kind of stuff, that's what makes things interesting.
I think that the idea of a root cause is almost always incorrect.
Who tripped over the buried landmine
is not the interesting question.
It's who buried the thing.
What were all the things
that wound up contributing to this?
And you can't even frame it that way
in the blaming context, just because you start doing
that and people clam up and good luck figuring out what really happened.
Exactly.
But that's so much of what the cybersecurity industry is focused on is how do we assign
blame?
And it's, you know, the marketing person clicked on a link.
It's like they do that thousands of times, like a month.
And the one time suddenly they were stupid for doing it, that doesn't sound right.
So I'm a big fan of, yes, vanquishing root cause, thinking about contributing factors.
And in particular, in any sort of incident review, you have to think about, was there a
design or process problem? You can't just think about the human behavior. You have to think about
where are the opportunities for us to design things better, to make the secure way more of
the default way. When you talk about resilience and reliability and big notable outages,
most forward-thinking companies are going to go and do a variety of incident reviews and
disclosures around everything that happened to it, depending upon levels of trust and whether
you're NDA'd or not and how much gets public is going to vary from place to place. But from a
security perspective, that feels like the sort of thing that companies will clam up
about and never say a word. Because I can wind up pouring a couple of drinks into people and get the
real story of outages or the AWS bill. But security stuff, they start to wonder if I'm a state actor
on some level. When you were building all of this, how did you wind up getting people to talk candidly and forthrightly about issues that if it became tied to them, that they
were talking about this in public would almost certainly have negative career impact for them?
Yes. So that's almost like a trade secret, I feel like. A lot of it is, yes, over the years,
talking with people over drinks, generally at a conference where, you know, things are tipsy.
I never want to betray confidentiality, to be clear, but certainly pattern matching across people's stories.
We're both in positions where if even a hint of "they can't be trusted" enters the ecosystem,
I think both of our careers explode and never recover.
Exactly.
Yeah. Oh, yeah.
"Plays fast and loose with secrets" is never the reputation you want as a professional.
No, no, definitely not.
So it's much more pattern matching and trying to generalize.
But again, a lot of what can go wrong is not that different when you think about a developer
being really tired and making a bunch of mistakes versus an attacker.
A lot of times they're very much the same.
So luckily there's commonality there.
I do wish the security industry was more forthright and less clandestine because, frankly, all
of the public postmortems that are out there about performance issues are just such a boon
for everyone else to improve what they're doing.
So that's the change I wish would happen.
So I have to ask, given that you talk about security, chaos engineering, and resilience,
and of course, software and systems, all in the title of the O'Reilly book. Who is the target audience for this? Is it folks who have the word security
featured three times in their job title? Is it folks who are new to the space? Where does your
target audience start and stop? Yes. So I have kept it pretty broad and it's anyone who works
with software, but I'll talk about the software engineering audience because, out of anyone, that is honestly
probably who I would love to read the book the most, because I firmly believe
that there's so much that software engineering teams can do to sustain resilience and security,
and they don't have to be security experts. So I've tried to demystify security, make it much
less arcane, even down to like how attackers, you know, they have their own development life cycle.
I try to demystify that too.
So it's very much for any team, especially like platform engineering teams, SREs,
to think about, hey, what are some of the things maybe I'm already doing
that I can extend to cover, you know, the security cases as well.
So I would love for every software engineer to check it out to see like,
hey, what are the opportunities for me to just do things
slightly differently and have these great security outcomes?
I really want to thank you for taking the time to talk with me about how you view these things.
If people want to learn more, where's the best place for them to find you?
Yes, I have all of the social media, which is increasingly fragmented, I feel like.
But I also have my personal site, kellyshortridge.com.
The official book site is securitychaoseng.com as well,
but otherwise find me on LinkedIn, Twitter, Mastodon, Blue Sky. I'm probably blanking on
the others. There's probably already a new one that launched while we've been speaking. Yeah, Blue Ski is how I insist on
pronouncing it as well while we're talking about fun house pronunciation on things. I like it.
Excellent. And we will of course put links to all of those things in the show notes. Thank you so much for being so generous with your time. I really appreciate it.
Thank you for having me and being a fellow dinosaur nerd.
Kelly Shortridge, Senior Principal Engineer at Fastly. I'm cloud economist Corey Quinn,
and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review
on your podcast platform of choice. Whereas
if you've hated this podcast, please leave
a five-star review on your podcast platform
of choice, along with an insulting
comment about how our choice of dinosaurs
is incorrect. Then
put the computer away and struggle
to figure out how to open a door.
If your AWS bill keeps rising
and your blood pressure is doing the same,
then you need the Duckbill Group.
We help companies fix their AWS bill
by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS.
We tailor recommendations to your business and we get to the point.
Visit duckbillgroup.com to get started.