Screaming in the Cloud - The Art of Effective Incident Response with Emily Ruppe
Episode Date: January 31, 2023

About Emily
Emily Ruppe is a Solutions Engineer at Jeli.io whose greatest accomplishment was once being referred to as “the Bob Ross of incident reviews.” Previously, Emily has written hundreds of status posts, incident timelines, and analyses at SendGrid, and was a founding member of the Incident Command team at Twilio. She’s written on human-centered incident management and facilitating incident reviews. Emily believes the most important thing in both life and incidents is having enough snacks.

Links Referenced:
Jeli.io: https://jeli.io
Twitter: https://twitter.com/themortalemily
Howie Guide: https://www.jeli.io/howie/welcome
Transcript
Hello, and welcome to Screaming in the Cloud, with your host, Chief Cloud Economist at the
Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world
of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles
for which Corey refuses to apologize.
This is Screaming in the Cloud.
This episode is sponsored in part by our friends at Logicworks.
Getting to the cloud is challenging enough for many places,
especially maintaining security, resiliency, cost control, agility, etc., etc., etc.
Things break, configurations drift, technology advances,
and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you
have the right team in place to maintain success over time? Day two matters. Work with a partner
who gets it. Logicworks combines the cloud expertise and platform automation to customize solutions
to meet your unique requirements. Get started by chatting with a cloud specialist today at
snark.cloud slash logicworks. That's snark.cloud slash logicworks. And my thanks to them for
sponsoring this ridiculous podcast. Cloud native just means you've got more components or microservices than anyone,
even a mythical 10x engineer, can keep track of.
With OpsLevel, you can build a catalog in minutes and forget needing that mythical 10x engineer.
Now, you'll have a 10x service catalog to accompany your 10x service count.
Visit OpsLevel.com to learn how easy it is to build and manage your service catalog. Connect to your Git provider and you're off
to the races with service import, repo ownership, tech docs, and more.
Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Emily Ruppe, who's a solutions engineer over at Jeli.io,
but her entire career has generally focused around incident management. So I sort of view her as
being my eternal nemesis, just because I like to cause problems by and large, and then I make
incidents for other people to wind up solving. Emily, thank you for joining me and agreeing to suffer my slings and arrows here. Yeah. Hey, I like causing problems too. I'm a solutions engineer,
but sometimes we like to call ourselves problems engineers. I'm a problems architect is generally
how I tend to view it, but doing the work, ah, one wonders. So you are at Jeli, where, as of this
recording, you've been for a year now. And before that, you spent some time over at Twilio slash SendGrid.
Spoiler, it's kind of the same company given the way acquisitions tend to work and all.
Now it is.
Oh, yeah.
You were there during the acquisition.
Yes, they acquired me.
That's why they bought SendGrid.
Indeed.
It's a good reason to acquire a company.
That one person I want to bring in. Absolutely. So you started with email and then effectively continued in that
general direction, given that Twilio now has eaten that business whole. And that's where I started my
career. The one thing I've learned about email systems is that they love to cause problems
because it's either completely invisible and no one knows, or suddenly an email didn't go through
and everyone's screaming at you. And there's no upside, only down. So let me ask the obvious
question I suspect I know the answer to here. What made you decide to get into incident management?
Well, I joined SendGrid. Actually, I love mess. I run towards problems. I'm someone who really enjoys that.
My ADHD, I hyper-focus. Incidents are like that perfect environment of just like all of the
problems are laying themselves out right in front of you. The distraction is the focus. It's kind
of a wonderful place where I really enjoy the flow of that. But I started in customer
support. I've been in technical support and customer service. I used to work at the Apple Store.
I worked at the Genius Bar for a long time, moved into technical support over the phone. And
whenever things broke really bad, I really enjoyed that process and kind of getting involved in
incidents. And I came, I was one of two weekend support people at SendGrid, came in during a time of
change and growth.
And everyone knows that growth, usually exponential growth, usually happens very smoothly and
nothing breaks during that time.
So no, there were a lot of incidents.
And because I was on the weekend, one of the only people on the weekend, I kind of had
to very quickly find my way and learn, when do I escalate this? How do I make the determination that this is something that is an incident? And, you know,
is this worth paging engineers that are on their weekend? And getting involved in incidents and
being kind of that core communication between our customers and the engineers.
For listeners who might not have been involved in sufficiently scaled out environments.
That sounds counterintuitive,
but one of the things that you learn,
very often the hard way,
has been that as you continue down the path
of building a site out and scaling it,
it stops being an issue relatively quickly
of is the site up or down,
and instead becomes a question of how up is it?
So it doesn't sound obvious until you've lived it,
but declaring what is an incident versus what isn't an incident is incredibly nuanced,
and it's not the sort of thing that lends itself to casual solutions. Because every time a customer
gets an error, we should open an incident on that. Well, I've worked at companies that throw
dozens of 500 errors every second at their scale.
You will never hire enough people to solve that if you do an incident process on even 10% of them.
Yeah.
So, I mean, it actually became something that I, when you join Twilio, they have you create a project using Twilio's API to earn your track jacket, essentially.
It's kind of like an onboarding thing.
And as they absorbed SendGrid,
we all did that onboarding process.
And mine was a number for support people to text and it would ask them six questions.
And if they answered yes to more than two of them,
it would text back,
okay, maybe you should escalate this.
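A rough sketch of how a check-in bot like the one Emily describes could be wired up, using Flask and Twilio's Python helper library. The route, the prompts, and the escalation threshold below are all illustrative placeholders rather than SendGrid's actual checklist, which she walks through next.

```python
# Hypothetical SMS escalation helper: support folks text a number, answer a few
# yes/no questions, and if enough answers are "yes" the bot suggests escalating.
from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse

app = Flask(__name__)

# Placeholder prompts -- the real bot asked six SendGrid-specific questions.
QUESTIONS = [
    "Are customers reporting that emails are not sending?",
    "Is the web app failing to load or log people in?",
    "Is the email API returning errors?",
    "Is more than one customer affected?",
    "Has it been going on for more than 15 minutes?",
    "Are you unsure how to work around it?",
]
ESCALATE_AFTER = 2  # "more than two" yes answers triggers the nudge

# Naive in-memory state keyed by the sender's phone number; fine for a demo,
# not for anything real (it resets whenever the process restarts).
sessions = {}


@app.route("/sms", methods=["POST"])
def sms_reply():
    sender = request.form["From"]
    body = request.form.get("Body", "").strip().lower()
    state = sessions.setdefault(sender, {"asked": 0, "yes": 0})
    resp = MessagingResponse()

    # Count the answer to the previously asked question.
    if state["asked"] > 0 and body.startswith("y"):
        state["yes"] += 1

    if state["asked"] < len(QUESTIONS):
        # Still working through the checklist: send the next question.
        resp.message(QUESTIONS[state["asked"]])
        state["asked"] += 1
    else:
        # All questions answered: decide whether to nudge toward escalation.
        if state["yes"] > ESCALATE_AFTER:
            resp.message("Okay, maybe you should escalate this. Page the on-call.")
        else:
            resp.message("Probably not incident-worthy yet. Keep an eye on it.")
        sessions.pop(sender, None)  # reset for the next check-in

    return str(resp)
```

The point isn't the code; it's that a checklist this small can take the "is this an incident?" judgment call off a lone weekend responder's shoulders.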
And the questions were pretty simple of,
can emails be sent? Can customers log into their website? Are you able to view this particular part of the website? Because it is with email in particular, at SendGrid in particular,
the bulk of it is the email API. So like the site being up or down was the easiest type of incident,
the easiest thing to flex on, because that's so much
easier to see. Being able to determine what percentage or what level, how many emails are
not processing? Are they getting stuck? Or is this the correct amount of things that should be
bouncing because of IP reputation? There's a thousand different things. We had this visualization
of this mail pipeline that was just a mess of all of these different pipes kind of connected together. And mail could get stuck
in a lot of different places. So it was a lot of spending time trying to find that and
segued into project management. I was a QA for a little while doing QA work, became a project
manager, and learned a lot about imposing process because you're supposed to, and that
sometimes imposing process on teams that are working well can actually destroy them. So I
learned a lot of interesting things about process the hard way. And during all of that time that I
was doing project management, I kind of accidentally started owning the incident response process
because a lot of people left. I had been a part of the incident analysis group as well. And so I kind of became the sole owner of that when Twilio
purchased SendGrid. I found out they were creating an incident commander team and I just reached out
and said, here's all of SendGrid's incident response stuff. We just created a new Slackbot.
I just retrained the entire team on how to talk to each other and recognize when
something might be an incident. Please don't rewrite all of this to be Twilio's response
process. And Terry, the person who was putting together that team said, excellent, you're going
to be welcome to Twilio incident command. This is your problem. And it's a lot worse than you
thought because here's all the rest of
it. So yeah, it was a really interesting experience coming into technically the same company, but an
entirely different company and finding out, like really trying to learn and understand all of the
differences and, you know, the different problems, the different organizational history, the like
fascia that has been built up between some of these parts
of the organization to understand why things are the way that they are within processes.
It's very interesting, and I kind of get to do it now as my job. I get to learn about the full
organizational subtext of all of these different companies to understand how incident response
works, how incident analysis works, and maybe some of the
whys, like what are the places where there was a very bad incident? So we put in very specific,
very strange process pieces in order to navigate that, or teams that are difficult to work with.
So we've built up interesting process around them. It feels like that can almost become ossified if
you're not careful, because you wind up with a release process that's 2,000 steps long and each one of them is there to wind up avoiding a specific type of
failure that had happened previously. And this gets into a world where in so many cases there
needs to be a level of dynamism to how you wind up going about your work. It feels almost like
companies have this idealized vision of the
future where if they can distill every task that happens within the company down to a series of
inputs and responses of scripts almost, you can either wind up replacing your staff with a bunch
of folks who just work from a runbook and cost way less, or computers in the ultimate sense of things. But that's been teased
for generations now. And I have a very hard time seeing a path where you're ever going to be able
to replace the contextually informed level of human judgment that honestly has fixed every
incident I've ever seen. Yeah. The problem comes down to, in my opinion, the fact that humans wrote this code. People with specific context and a specific understanding of how the thing needs to work in a specific way, and of the shortcomings and limitations of the libraries they're using or the different things they're trying to integrate with. A human being is who's writing the code. Code is not being written by computers.
It's being written by people who have understanding in subtext.
And so when you have that code written, and then maybe that person leaves,
or that person joins a different team, and their focus and priority is on something else,
there is still human subtext that exists within the services that have been written.
We have it call in this specific way and timeout in this specific amount
of time because when we were writing it, there was this ancient service that we had to integrate
with. There's always just these little pieces of we had to do things because we were people trying
to make connections with lines of code. We're trying to connect a bunch of things to do some
sort of task. And we have a human understanding of how to get from A to B. And probably if a computer wrote this code, it would work in an entirely different way. So
in order to debug a problem, the humans usually need some sort of context. Like,
why did we do this the way that we did this? And I think it's a really interesting thing
that we're finding that it is very hard to replace humans around
computers, even though intellectually we think like this is all computers, but it's not. It's
people convincing computers to do things that maybe they shouldn't necessarily be doing.
Sometimes they're things that computers should be doing maybe, but a lot of the times it's kind of
a miracle that any of these things continue to work on a given
day. And I think that it's very interesting when we think that we can take people out of it.
The problem I keep running into, though, the more I think about this and the more I see it out there
is I don't think that it necessarily did incident management any favors when it was originally cast
as the idea of blamelessness
and blameless postmortems.
Just because it seems an awful lot to me,
like the people who were the most ardent champions
of approaching things from a blameless perspective
and having a blameless culture
are the people who would otherwise
have been blamed themselves.
So it really kind of feels on some broader level,
like, oh, is this entire movement
really just about
being self-serving so that people don't themselves get in trouble? Because if you're not going to
blame no one, you're going to blame me instead. I think that that on some level set up a framing
that was not hugely helpful for folks with only a limited understanding of what the incident
life cycle looks like. Yeah, I think we've evolved, right?
I think from the blameless,
I think there was good intentions there,
but I think that we actually missed
the really big part of that boat
that a lot of folks glossed over
because then as it is now,
it's a little bit harder to sell.
When we're talking about being blameless,
we have to talk about circumventing blame
in order to get people to talk candidly about their experiences. And really, it's less about
blaming someone and what they've done. Because we, as humans, blame. There's a great Brené Brown
talk that she gives, I think it's a TED talk, about blame and how we as humans cannot physically
avoid blaming, placing blame on things. It's about
understanding where that's coming from and working through it. That is actually how we grow. And I
think that there's, we're starting to kind of shift into this more blame aware culture, but I think the
hard pill to swallow about blamelessness is that we actually need to talk about
the way that this stuff makes us feel as people,
like feelings, like emotions.
Talking about emotions during a technical incident review is not really an easy thing to get
some tech executives to swallow or even engineers.
There's a lot of engineers who are just kind of like, why do you care about how I felt
about this problem?
But in reality, you can't measure emotions as easily as you can measure mean time to resolution.
But mean time to resolution is impacted really heavily by, like, were we freaking out?
Did we feel like we had absolutely no idea what we were trying to solve?
Or did we understand this problem and we were confident that we could solve it?
We just didn't, we couldn't find the specific place where this bug was happening. All of that is really interesting and important context
about how we work together and how our processes work for us. But it's hard because we talk about
our feelings. I think that you're onto something here because I look back at the key outages
that really defined my perspective on things over the course of my career. And most of the early ones were beset by
a sense of panic of, am I going to get fired for this? Because at the time I was firmly convinced
that, well, the root cause is me. I am the person that did the thing that blew up production.
And while I am certainly not blameless in some of those things, I was never setting out with an intent to wind up tearing things down. So it was not that I was a bad actor subverting internal controls, because in many companies, you don't need that level of rigor
Right.
to wind up tearing things down when I did not mean to. So there were absolutely systemic issues
there. But I still remember that rising tide of panic. Like, should I be focusing on getting the
site back up or updating my resume? Which of these is going to be the better longer-term outcome?
And now that I've been in this industry long enough and have seen enough of these,
you almost don't feel the blood pressure rise anymore when you wind up having an incident and something gets panicky, but it takes time and nuance to get there.
Yeah. Well, and it's also in order to best understand how you got in that situation,
like, were you willing to tell people that you were absolutely panicked? Would you have felt
comfortable? Like, if someone was saying, okay, so what happened, walk me through what you were experiencing, would you have said, like, I was scared out of my
goddamn mind? Were you absolutely panicking? Or did you feel like you had some, like, grasping at
some straws? Like, where were you? Because uncovering that for the person who is experiencing
that in the issue in the incident can help understand what resources did they feel like they
knew where to go to or where did they go to? Like what resource did they decide in the middle of
this panic haze to grasp for? Is that something that we should start using as, hey, if it's your
first time on call, this is a great thing to pull into because that's where instinctively you went.
Like, there's so much that we can learn from the people who are experiencing this massive amount of panic during the incident. But sometimes,
if we're being quote-unquote blameless, we will gloss over your involvement in that
entirely, because we don't want to blame Corey for this thing happening. Instead, we'll say
an engineer made a decision, and that's
fine. We'll move past that. But there's so much wealth of information there.
Well, I wound up in postmortems later when I ran teams. I said, okay, so an engineer made a
mistake. It's like, well, hang on. There's always more to it than that because we don't hire
malicious people and the people we have are competent for their role. So that goes a bit beyond that. We
will never get into a scenario where people do not make mistakes in a variety of different ways.
So that's not a helpful framing. It's a question of what, if they made a mistake, sure, what was
it that brought them to that place? Because that's where it gets really interesting. The problem is
when you're trying to figure out in a business context, why a customer is super upset if they're
a major partner, for example, and there's a sense of, all right, we're looking for a sacrificial lamb or
someone that we can blame for this because we tend to think in relatively straight lines.
And in those scenarios, often a nuanced understanding of the systemic failure modes
within your organization that might wind up being useful in the mid to long term is not helpful for
the crisis there.
So trying to stuff too much into a given incident response might be a symptom there. I'm thinking
of one or two incidents in the course of my later career that really had that stink to them,
for lack of a better term. What's your take on the idea?
I've been in a lot of incidents where the desire to be able to point and say a person made this mistake is
high. It's definitely something that the organization, and I put the organization
in quotes there and say technical leadership or maybe PR or the comms team said, like,
we're going to say like a person made this mistake when in reality, I mean, nine times out of 10,
calling it a mistake is hindsight, right?
Usually people, sometimes we know that we make a mistake and it's the recovery from
that that is response.
But a lot of times we are making an informed decision.
You know, an engineer has the information that they have available to them at the time
and they're making an informed decision.
And oh no, it does not go as we planned. Things in the system that we
didn't fully understand are coexisting; it's a perfect storm of these events in order to lead to
impact to this important customer. For me, I've been customer-facing for a very long time, and I feel
like, from my observation, customers tend to... if you say this person did something wrong, versus we learned more about
how the system works together and we understand now these kind of different pieces and mechanisms
within our system are not necessarily single points of failure but points at which they interact that
we didn't understand could cause impact before, and now we have a better understanding of how our
system works and we're making some changes to some pieces. I feel like personally, as someone who has had to say
that kind of stuff to customers a thousand times, saying it was a person who did this thing
shows so much less understanding of the event and understanding of the system than actually
talking through the different components and different kind of contributing factors,
what went wrong.
So I feel like there's a lot of growth
that we as an industry can make, going from blaming things
on an intern to actually saying,
no, we invested time in understanding
how a single person could perform these actions
that would lead to this impact.
And now we have a deeper understanding of our system.
That, in my opinion,
builds a little bit more confidence
from the customer side.
This episode is sponsored in part by Honeycomb.
I'm not going to dance around the problem.
Your engineers are burned out.
They're tired from pagers waking them up at 2 a.m.
for something that could have waited
until after their morning coffee.
Ring, ring, who's there?
It's Nagios, the original Call of Duty.
They're fed up with relying on two or three different monitoring tools
that still require them to manually trudge through logs
to decipher what might be wrong.
Simply put, there is a better way.
Observability tools like Honeycomb, and very little else
because they do admittedly set the bar,
show you the patterns and outliers of how users experience your code
in complex and unpredictable environments
so you can spend less time firefighting and more time innovating.
It's great for your business, great for your engineers,
and most importantly, great for your customers.
Try free today at honeycomb.io slash screaming in the cloud.
That's honeycomb.io slash screaming in the cloud.
I think so much of this is, I mean, it gets back to your question to me that I sort
of dodged. Was I willing to talk about my emotional state in these moments? And yeah, I was visibly
sweating and very nervous. And I've always been relatively okay with calling out the fact that I'm not in a
great place at the moment and I'm panicking. And it wasn't helped in some cases by, in those early
days, the CEO of the company standing over my shoulder, coming down from the upstairs building,
know what that, what was going on and everything had broken. And in that case, I was only coming
in to do mop-up. I wasn't one of the factors contributing to this, at least not by a primary or secondary degree.
And it still was incredibly stress-inducing.
So from that perspective, it feels odd.
But you also talk about we in the sense of as an industry,
as a culture and the rest.
I'm going to push back on that a little bit
because there are still companies today
in the closing days of 2022 that are extraordinarily
far behind where many of us are at the companies we work for. And they're still stuck in the
relative dark ages, technically, where, well, are VMs okay or should we stay on bare metal is still
the era that they're in, let alone cloud, let alone containerization, let alone infrastructure
as code, et cetera, et cetera.
I'm unconvinced that they have meaningfully progressed on the interpersonal aspects of
incident management when they've been effectively frozen in amber from a technical basis.
I don't think that's fair.
No, excellent.
Let's talk about that.
I think just because an organization is still, like, maybe in DCs and using hardware and maybe hasn't advanced so thoroughly within the technical aspect of things, that doesn't necessarily mean that they haven't adopted new...
A point of clarification then on this, because what I'm talking about here is the fact that there are companies who are that far behind on a technical basis. They are not necessarily one and the same;
it's not that because you're using older technology, your processes are stuck in the past too.
But rather, just as there are companies that are ancient on the technology basis,
there are also companies that will be 20 years behind in learnings compared to how the more
progressive folks have already internalized
some of these things ages ago. Blamelessness is still in the future for them. They haven't
gotten there yet. I mean, yeah, there's still places that are doing root cause analysis that
are doing the five whys. And I think that we're doing our best. I mean, I think it really takes,
that's a cultural change. A lot of the actual change in approach of incident
analysis and incident response is a cultural change. And I can speak from firsthand experience
that that's really hard to do, especially from the inside. It's very hard to do. So luckily with
the role that I'm in now at Jeli.io, I get to kind of support those folks who are trying to champion
a change like that internally. And right now, my perspective is just trying to generate as much
material for those folks to send internally to say like, hey, there's a better way. Hey,
there's a different approach for this that can maybe get us around these things that are difficult. I do think that there's this
tendency, and I've used this analogy before, for us to think that our junk drawers are better than
somebody else's junk drawers. I see an organization as just a junk drawer, a drawer full of weird odds
and ends and spilled glue and like a broken box of tacks. And when you pull out somebody else's junk drawer, you're like,
this is a mess. This is an absolute mess. How can anyone live like this? But when you pull out your
own junk drawer, like I know there are 17 rubber bands in this drawer somehow. I am going to just
completely rifle through this drawer until I find those things that I know are in here. Just the
difference of knowing where our mess is, knowing where the bodies are buried or the skeletons are in each closet, whatever analogy works best.
But I think that some organizations have this thought process (by organizations, I mean executive leadership; organizations are not an entity with an opinion,
they're made up of a bunch of individuals doing the work that they need to do).
But they think that their problems
are harder or more unique than at other organizations. And so it's a lot harder to
kind of help them see that, yes, there is a very unique situation. The way that your people work
together with their technology is unique to every single different organization, but it's not that
those problems cannot be
solved in new different ways. Just because we've always done something in this way does not mean
that is the way that is serving us the best in this moment. So we can experiment and we can make
some changes, especially with process, especially with the human aspect of things of how we talk to
each other during incidents and how we communicate externally during incidents. Those aren't hard
coded. We don't have to do a bunch of code reviews
and make sure it's working with existing integrations
to be able to make those changes.
We can experiment with that kind of stuff.
And I really would like to try to encourage folks to do that,
even though it seems scary, because incidents are...
I think people think they're scary.
They're not. They're kind of fun.
They seem to be. For a lot of folks, they are. Let's not be too dismissive on that. We were both talking about panic and the panic
that we have felt during incidents. And I don't want to dismiss that and say that it's not real.
But I also think that we feel that way because we're worried about how we're going to be judged
for our involvement in them. We're panicking because, oh no, we have contributed to this in some way.
And the fact that I don't know what to do,
or the fact that I did something
is going to reflect poorly on me,
or maybe I'm going to get fired.
And I think that the panic associated with incidents
also very often has to do with the environment
in which you were experiencing that incident
and how that is going to be accepted and discussed.
Are you going to
be blamed regardless of how, quote-unquote, blameless your organization is?
I wish there was a better awareness of a lot of these things, but I don't think that we
are at a point yet where we're there.
No.
How does this map to what you do day to day over at Jeli.io?
It is what I do every single day.
So, I mean, I do a ton of different things. We're a
very small startup, so I'm doing a lot. But the main thing that I'm doing is working with our
customers to tackle these hurdles within each of their organizations. Our customers vary from very
small organizations to very, very large organizations and working with them to find how to make movement, how to sell this internally, sell this idea of
let's talk about our incidents a little bit differently. Let's maybe dial back some of the
hard-coded automation that we're doing around response and change that to speaking to each
other, as opposed to we need 11 emails sent automatically upon the creation of an incident
that will automatically map to these three PagerDuty schedules.
And a lot more of it can be us working through the issue together and then talking about
it afterwards, not just in reference to the root cause, but in how we interfaced, how
did it go?
How did response work, as well as how did we solve the technical problem that occurred?
So I kind of pinch myself. I feel very lucky that I get to work with a lot of different companies
to understand these human aspects and the technical aspects of how to do these experiments
and make some change within organizations to help make incidents easier. That's the whole feeling,
right? We were talking about the panic. It doesn't need to be as hard as it feels sometimes. And I think that it can be
easier than we let ourselves think. That's a good way of framing it. It just feels on so many levels,
this is one of the hardest areas to build a company in because you're not really talking about fixing technical broken systems
out there. You're talking about solving people problems. And I have some software that solves
your people problems. I'm not sure if that's ever been true. Yeah, it's not the software that's
going to solve the people problems. It's building the skills. A lot of what we do is we have software
that helps you immensely in the analysis process and in
building out a story as opposed to just building out a timeline, trying to tell kind of the narrative
of the incident, because that's what works. Like, anthropologically, we've been conveying information
through folklore, through tales. Telling tales of things that happened in order to help teach people lessons is kind of how oral history has worked for thousands of years. And we aren't better than
that just because we have technology. So it's really about helping people uncover those things
by using the technology that we have, pulling in Slack transcripts and PagerDuty alerts and
Zoom transcripts and all of this different information that we have available to us, and helping people tell that story and convey that story to the folks that
were involved in it, as well as other people within your organization who might have similar
things come up in the future. And that's how we learn. That's what we teach, but that's what we
learn. I feel like there's a big difference. I'm understanding there's a big difference between
being taught something and learning something because you usually have to earn that knowledge when you learn it.
You can be taught something a thousand times and then you learn that once.
And so we're trying to use those moments that we actually learn it where we earn that hard
earned information through an incident and tell those stories and convey that.
And our team, the solutions team is in there helping people build these skills, teaching people
how to talk to each other and really find out this information during incidents and then after them.
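As a concrete illustration of the raw material Emily mentions pulling together, here is a minimal sketch that gathers a Slack incident channel's history into a chronological timeline using the public Slack Web API via slack_sdk. The channel ID and token handling are placeholders, and this is an assumption-laden illustration of the idea, not Jeli's actual implementation.

```python
# Pull an incident channel's Slack history into a chronological timeline
# that can be annotated and turned into a narrative afterwards.
import os
from datetime import datetime, timezone

from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])


def incident_timeline(channel_id: str, oldest_ts: str = "0"):
    """Yield (timestamp, user, text) tuples for a channel, oldest first."""
    messages = []
    cursor = None
    while True:
        resp = client.conversations_history(
            channel=channel_id, oldest=oldest_ts, cursor=cursor, limit=200
        )
        messages.extend(resp["messages"])
        cursor = resp.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break
    # Slack returns newest first; sort so the story reads start to finish.
    for msg in sorted(messages, key=lambda m: float(m["ts"])):
        when = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
        yield when.isoformat(), msg.get("user", "bot"), msg.get("text", "")


if __name__ == "__main__":
    # "C0INCIDENT1" is a made-up channel ID standing in for the incident channel.
    for when, user, text in incident_timeline("C0INCIDENT1"):
        print(f"{when}  {user}: {text}")
```

PagerDuty alerts and Zoom transcripts could be merged into the same sorted-by-timestamp structure; the hard part, as Emily says, is turning that timeline into a story people actually learn from.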
I really want to thank you for being as generous with your time as you have been.
If people want to learn more, where's the best place to find you?
Oh, I was going to say Twitter, but...
Yeah, that's a big open question these days, isn't it?
Assuming it's still there at the time this episode airs,
it might be a few days between now and then.
Where should they find you on Twitter with a big asterisk next to it?
It's at TheMortalEmily,
which I started this by saying I like mess,
and I'm someone who loves incidents, so I'll be on Twitter.
We're there to watch it all burn.
Oh, I feel terrible saying that.
Actually, if any Twitter engineers are listening to this,
someone has found that the TLS certificate is going to expire at the end of this year.
Please check Twitter for where that TLS certificate lives so that you all can renew that.
Also, Jeli.io, we have a blog that a lot of us write.
Our solutions team, and honestly, a lot of us, we tend to hire folks who have a lot of
experience in incident response and analysis.
I've never been a solutions engineer before in my life, but I've done a lot of incident
response.
So we put up a lot of stuff and our goal is to build resources
that are available to folks
who are trying to make these changes happen,
who are in those organizations
where they're still doing five whys and RCAs
and are trying to convince people
to experiment and change.
We have our Howie Guide,
which is available for free.
It's How We Got Here,
which is like a full free incident analysis guide
and a lot of cool blogs and stuff there. So even if I'm not on Twitter, we're writing things there.
We will, of course, put links to all of that in the show notes. Thank you so much for your
time today. It's appreciated. Thank you, Corey. This was great.
Emily Ruppe, Solutions Engineer at Jeli.io. I'm cloud economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of
choice. Whereas if you've hated this episode, please leave a five-star review on your podcast
platform of choice, along with an angry comment talking about how we've gotten it wrong,
and it is always someone's fault. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point.
Visit duckbillgroup.com to get started.