PurePerformance - Is The Practice of Practice the better Gameday with Matt Davis
Episode Date: February 27, 2023
How do you prepare yourself for the next incident? Not at all? Are you running game days where you simulate incidents? Or are you following the steps of good musicians who are constantly practicing with their band members to always be best prepared for the next big gig!
Tune in and hear from Matt Davis, Specialist in Learning from Incidents, how he runs weekly continuous practice and learning sessions with DevOps, SREs, Developers, Marketers or Technical Writers and what the outcomes are.
Matt is a regular presenter at conferences. You can meet him at SRECon Americas 2023 where he talks about “Human Observability of Incident Response”.
Here are the other links we discussed during the podcast:
Practice of Practice
Rivers of Opposites
Varieties of Work
Follow Matt on Twitter
Connect on LinkedIn
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello everybody and welcome to another episode of Pure Performance. I didn't think of anything stupid to say, although I might be making the audience happy because I didn't think of anything stupid to say.
And this is already stupid enough.
And if I keep going with this, it's going to become more and more stupid.
And therefore, I will have met the audience's expectations.
I will have performed as they expect me to. So Andy, how are you doing today?
Really good. So you're telling me that in all these episodes, all these five years, these 175 episodes we've done so far, you always thought about the opening before you did the opening?
Not always, but often, believe it or not. Just so people understand what I think you're alluding to, Andy: a lot of times the most banal, asinine things that I'm saying were planned out.
That's how not funny I am.
Yeah, cool.
Interesting.
So you are, it's okay.
That's why I'm not on stage being a comedian.
Yeah, yeah, of course.
Although my dream is to go on stage and be the most unfunny comedian in the world.
But the joke's on the audience for thinking they're going to get a comedy show. I would just do that show after show and annoy people and basically get all my produce for free from people throwing it at me.
Nice ripe tomatoes. Some eggs and celery and whatever they throw at you.
Hey, should we try to figure out
if our guest today is funny
or not funny and tries to be funny or not?
Yeah, I guess we'll find out.
Sure. Yeah, let's do a different type of introduction.
No pressure.
Let's do a different type of introduction.
Matt.
I don't know.
I don't know any jokes.
Okay.
I really don't.
Well, I think that's the end of the podcast then.
Thank you so much.
It was great having you and see you next time.
Andy, you were funny.
So we could also make one up right now.
Why did Matt cross the road?
To go learn some jokes.
Told you I'm not funny.
Anyone who thinks I'm funny.
I got a whole series of cow jokes that aren't funny.
Anyway, we're way off topic today.
Today, completely.
Hey, Matt, before I let you introduce yourself,
I just want to say something that I just read on your LinkedIn profile.
And it says, Matt Davis, he is a Learning from Incidents specialist.
That means you're specialized in learning from incidents.
And I think that alone, I think it's worth applauding.
Because if you're a specialist in learning from incidents, then you should teach a lot of people what we can learn from incidents.
Because I think that's a big, big topic.
Matt, before we get deep into the topic, can you just introduce yourself a little bit to the audience? We know you're not a comedian, but who are you?
I'm not. I'm more of a musician than a comedian, and I've spent a lot of time doing music, but my career is in site reliability engineering.
And let's see, I've kind of entered this career through a history of being in data centers.
So I was a rack monkey.
So I racked and stacked servers in the 90s.
And I worked on hardware systems. In fact, all the way up until like 2013, I was in hardware companies that had bare metal data centers, and then started getting into the cloud around the same time, because the company I was at wanted to do a migration from bare metal to cloud. And ever since, I've just been really interested in, yeah, how do we operate things in the cloud when software has become so complex?
Things aren't as simple as go into the data center and reboot servers.
Those days are gone.
And complexity really interests me.
And this is where my music comes in a lot too.
Mm-hmm. Yeah, I should just clarify for a moment there, because Matt and I both have, you know, music in common, but the reason why I'm funny, or I try to be funny, is because my original instrument was drums, so I had to make up for it with bad jokes.
Do you always do that, all the time?
Yeah.
That's the first thing you're taught
as a drummer is to play that.
Anyway, I just needed to clarify
and make a self-deprecating drummer joke.
So anyway.
So Matt, before we go into making and keeping systems in the cloud reliable and resilient, thinking back to the way you started with physical hardware: has the way you treat and approach an incident changed from hardware incidents to software incidents?
Wow, yes, it has.
We're removed from the hardware these days.
So, you know, back when hardware was part of what you thought about as a sysadmin, or back then, the equivalent to an SRE,
we were called technical account managers.
And you always think about the hardware.
You're always thinking, you know, you helped install the hardware.
So you have some of your hands on the hardware and you know that, oh, well, this broke.
Well, let's go.
Maybe we need to replace memory.
It's a much different way of thinking about resilience.
Hardware is hardware and hardware fails. A concept like mean time to
recovery for a piece of hardware looks different than what it looks like today, which is
kind of completely blown apart in the cloud. Things have become so complex, you've got
abstraction layers in front of you instead of
knowing that you just touched this hardware last week. You don't even know what the hardware is.
You can't see anything happening. Even if you go into a server and you run code on a server,
you can't tell what that code's doing. You're not actually seeing the code
run. And so when we're in the cloud, just take that concept. I can't actually see code run on
a computer and then put many layers and networks, other computers, other processes in between.
And it's almost like we're blind to everything now. So how do we deal with resilience and
reliability when it feels like we're blind?
Well, I guess this is where observability comes in and plays a big
role. I got another question, though, before we go into the current world.
We had a couple of episodes talking about chaos
engineering, where on purpose you're killing pods, you are simulating network latency.
When you were in the data centers, did you do something similar to chaos engineering?
Did you actually unplug a cable or shut down a machine and try to figure out if the whole system, the services that you're providing, is still resilient? Or were you never that bold?
We did, but not in a scientific way.
I think that's the big difference.
Chaos engineering, it has developed over the past five, six, seven years
to be a very scientific thing.
There's a very specific way to go about it. There's a very specific
way to measure it. There's a very specific way to
think about it. Now, back in the data center days,
yeah, we were doing that, but it was more like
we just plugged in a dual power supply for this RAID array.
Let's power up the RAID array and let's pull out the plug and make sure that the dual power supply is working.
You know, a lot of equipment back then, and today it still does, has dual power supplies that are both plugged in, but they may not necessarily both be operating.
Things like we used to have dual switches at the top of racks,
and so we would unplug a switch,
make sure that the other switch took over the traffic.
It was very informal, but we definitely did it.
Nowadays, chaos engineering has become a lot more formal.
Yeah. I like the definition: it has now become almost a science-based practice.
Yep.
Yeah, I think it's interesting too, because back in the days you're talking about, you knew that you had that second switch, you knew that you had that second power supply. To your point, now that you're in the cloud, you have no idea what redundancy exists.
So part of finding out is,
let's take something down and see what happens
because we don't know what's there maybe.
You can maybe explain it.
It's all part of the...
The interesting thing is that
hardware manufacturers are actually taking that as an advantage.
So a hardware manufacturer may actually decide, or I should say maybe a vendor
buying a piece of hardware may decide, no, we don't need the dual power supply because
if this blade goes down, we've got 40 others in the rack that can take over.
We don't really care to spend money
on dual power. And I think spending the money, I think, is part of the key too.
Hey, Matt, switching gears a little bit, when I reached out to you, I actually came across a
couple of your presentations that you did at
ObservabilityCon.
You're talking about learning from incidents.
And obviously that's what you also have in your subtitle on LinkedIn,
Learning from Incidents specialist.
But then just a couple of hours ago, before we actually started recording,
you said, hey, happy to talk about the talk that I gave.
But there's a subtopic that is much more interesting for you, at least,
and you think probably also for the audience,
which is around the practice of practice.
And you also gave us a couple of links.
Folks, by the way, if you're listening to this,
all the links we talk about,
like there's two blog posts that Matt wrote and published
and we will have them
in the description of the
podcast. But for me, I started skimming through the blog posts and I thought it's really cool stuff that you talk about: the practice of practice. Can you enlighten us, for those people who have not yet read your blog posts and are now as intrigued as I am to learn more?
Yeah, sure. The idea of the practice of practice is a term that's taken from music. There's a guitarist who is pretty famous, a British guitarist.
His name is Derek Bailey.
Pretty famous for improvisation in kind of the 20th century mostly.
And he studied improvisation, and he was a free improviser. In other words, even though he was in the jazz area of music, he didn't play straight-ahead jazz. He played free improvisation. One idea that came out of interviews that he did with other improvisers was that there are two ways to think about what we're doing. There's the theory of practice, which is like going into the practice room, practicing your scales, learning new notes on the instrument, becoming one with your instrument, however you want to call that. That's kind of the theory of practice, and it's not together with anyone else. You're on your own doing that.
But then there's the practice of practice. And the practice of practice is where you take what you've taught yourself, or what you're learning in your lessons, or what you're doing in the practice room, and you bring it to the band.
And I always use the example of a jazz band. It doesn't necessarily have to be jazz, but everyone is fairly familiar with jazz as a musical genre, and especially with jazz improvisation.
These players, they come in, they're part of this band,
and they don't go to the gig for the first time and have a performance,
and that's the first time they've played together.
I mean, I'm not saying that doesn't happen,
but for the most part, they practice together.
And they're not practicing scales.
They're not practicing how to play jazz.
They're practicing playing together as a group. And the whole idea here is that when an improvisation
ensemble practices together as a group iteratively over and over, when they get to the performance,
it's just another time that they've gotten
together to practice. They're completely familiar with each other. They understand some of the
signals that are being used. They have some reciprocity with the other members of the band.
They kind of know how the flute player likes to end phrases, or they know that the sax player will give signals by winking their eye or something, or shifting around in their chair. As we get together to
practice the practice of practice, we learn these things about each other so that when we get into
the performance, we get to the gig, it's not a big deal. We're not as nervous. We don't go into
the show thinking, oh, I don't know what I'm going to play, because we've practiced together.
Even though it's improvisation, there's going to be ambiguity. There's going to be questions
unanswered. There's going to be things that we have to discover. But we've learned to do that together by practicing improvising together over and over again.
So take that whole concept from a jazz band
and apply that to a team that is responsible
for the reliability of your system.
Right.
You don't want that team to have the first time
they get together to work
be under the production pressure
of an incident.
That's probably the worst time
to introduce people to each other
when there's ambiguity
like that. So the idea
behind the practice of practice in
technology is that
we do what the jazz musicians do.
We get together every week and we have fun.
We practice practicing.
So we do things like playing games.
There's a fairly famous sort of game that came out of a large company that starts with the G.
And it's called The Wheel of Misfortune. And this is something that I've used in other
companies. I took the concept of the wheel of misfortune. And if you don't know what I'm
talking about with a wheel, it's like a carnival wheel. You know, you spin the wheel and you land on a space.
So now I would have a wheel here with the team that came in to the session. And on the wheel,
we would write like the services, or maybe we would write an integration partner, or maybe we
would write some kind of keyword in the spaces on the wheel,
and then we'd spin the wheel. And then whoever's turn it was would then have to
tell us what they know about this thing. And if they don't know enough about it,
we ask, well, how would you find out? Show us. Share your screen. Show us what you would search for.
And the cool thing is that you get people who aren't experts that will land on a topic,
and they don't know what, they don't have any idea.
They're like, where would I look?
Okay, well, maybe I'll go first search in the wiki.
So, okay, go ahead, share your wiki page.
Show us what you're searching for.
And then we also will have an expert do the same thing, and the expert is like, okay, first I go here, share my screen, I'm going to show you this, I'm going to show you this, here's all the repos, oh, and here's a diagram I built.
It's the kind of stuff that we would want to know in an incident, but we don't have that production pressure on top of us, so we can spend time to dig in. You know, this has happened: oh, hey, Craig, you mentioned just now, when we were looking at this code, I noticed something in the code, what is that? And, you know, we would do that. We would dig through the code, or we would think, oh, that's part of this in the application, let's go look at our software and I'll show you exactly what this code maps to in the software.
It's this other dimension of learning about the system, and it showed up in incidents all the time. I'm thinking of one instance where we did that wheel of expertise. That's what I called it. We landed on this one system and we spent 45 minutes or an hour digging into this one system. And, you know, we had an expert, so they were like, oh yeah, this is not very well known. This is where this goes. It's encrypted here. And this is the entry point for the API that does that. So we get to learn all of that. Well, guess what? Not more than two weeks later,
we had a major incident about that very system. And the responders got the incident solved quicker,
meaning they mitigated it faster because of the practice
of practice session we had. It was a direct one-to-one thing. Now, that doesn't always happen
because we're also learning reciprocity and empathy in these sessions. It's not like we're
learning specific things, but in this case, we had learned a specific thing that came to light and was usable during an incident. It was marvelous.
I mean, for me, the amazing thing about this is, you know, it sounds so obvious to do something like this if I listen to you, right? But I don't know how many organizations actually take the time and do it that way. I mean, if I look at development, the practice of pair programming, I think, does something like this, because you have, for the lack of better terms, an apprentice, like a junior developer maybe, and next to them a senior developer. The senior developer basically gives them feedback on how they would do something, while still watching the junior developer. But then also maybe switching around so that the junior directly sees how the senior is doing it.
But the way you explain it, I think, is even more interesting, because at the time of an incident, right, it is possible that you don't have the experts available at all, because it might be in the middle of the night, or somebody's on vacation.
Exactly.
You practice this a lot, where you are constantly sharing knowledge about a part of the system, but also your thought process for finding the right information to actually solve the problem. I think that alone is also really amazing, because all of us maybe have different ways of dealing with a situation, of finding something new. I may go to Slack first, and I always look in our internal Slack for keywords, because maybe somebody else already had a discussion about it. I never look into other systems, but maybe somebody else is going to a different system that has more information, and I don't think about it.
So eye-opening.
Pretty cool.
Yeah, I wondered... Go on.
Go ahead, Matt. Sorry.
Go ahead, Brian.
No, no. You had a response to Andy. I have a similar topic, but I wanted to... Yeah.
Well, I was just going to echo that sentiment of it feels so obvious.
Why wouldn't we do this? You know, it's kind of like the same question about on-call training.
It seems obvious that you should train people to be on-call,
but that also doesn't happen very much.
Yeah.
I wonder if this also exposes another weakness in the team.
We've been exploring these concepts of competency levels.
So if you have four competency levels: unconsciously incompetent, consciously incompetent, unconsciously... no, I got them wrong. Unconsciously incompetent... I'm gonna get them wrong. Either way, they run the gamut from "I don't know what I'm doing and I don't know that I don't know what I'm doing" all the way up to "I know everything that I'm doing without knowing how I'm doing it." Someone who's an expert is often unconsciously competent. They know it so well that they couldn't train someone else to do it, because they're like, what do you mean? You just do it, right?
If you think maybe of your jazz thing, who knows what it'd be like to suddenly drop in and start practicing with Miles Davis, but is he even going to be giving you any of the cues? Because he just assumes everyone knows, right? He's someone who is just so in it, they don't think about it. Another good example is, say, sports: you don't very often see the home run king being a coach of a baseball team in the future, because they just do what they do. It's oftentimes the third baseman, the shortstop, someone who is watching the whole game the entire time, who really has an idea what's going on.
So in the situation that we're talking about here, the novice versus the expert, do you ever find situations where the expert comes on and says, let me show you how I do it, and then they almost pause or have to figure out what they're doing, because they maybe don't know how they're doing it and have a hard time communicating? Some of this deep embedded knowledge that they wouldn't even think was deep embedded knowledge is like: well, how do you know? Because that's the way it is. Well, how do you know that's the way it is? I don't know.
Right, right, right.
Have you ever seen that happen in these situations?
Oh yeah, yeah, for sure.
This is the really cool thing about focusing on the work.
So that's one of the things about this session that I try to underline,
is that we don't go in there and talk about theoretical things.
I mean, we do.
We don't leave that out.
And we talk about philosophies of resilience and all that kind of stuff.
It's just kind of part of the subject matter.
But we try to focus on non-hypothetical stuff.
We try to focus on the work as done.
If you're not familiar with the concept of work as done,
Steven Shorrock works in aviation human factors, and he has a great blog out there; I can share the link with you later. I think it's called the varieties of work. Work as done is one of these varieties of work. To contrast that, work as imagined is another variety of work. So if you think about work as imagined,
that might be a run book.
That might be a set of prescriptions.
That might be a process or a procedure.
Those are all things that are work as imagined or work as defined,
or there's a lot of different versions of that,
but they're not work as done.
When we perform work as done, it doesn't look like the work as prescribed. It doesn't look
like the work as imagined. It's different. It's different because we make local adaptations.
We do things like exactly like what you're talking about. Our intuition comes into play.
Work as imagined can't account for intuition. That's where work as done becomes really important.
So, when we have that in mind, that is what helps those experts start to dig into that stuff.
I'm a cuber, and the Rubik's Cube is a perfect example
of the kind of muscle memory that happens with this.
Same thing when I'm sitting here.
If I were to try to show you a move on the cube, I won't do it right
now because I'll mess it up. But if I tried to show you a move, I would have to slow myself down.
And then as I slowed myself down, I would have to think, wait, what did I do? Do I turn right here?
Or do I do two? I don't quite remember. But if I back out and I just let my muscle memory take over, it just happens.
And that's what happens to experts.
They have this intuition that gets built through their becoming an expert.
So when they get to this point of, well, show me this move, they'll do exactly what you're saying.
They'll be like, well, I can't tear this apart.
I don't know how.
And that's where when we try to dig into those,
the entry point into helping these experts
dig into how to help others figure out what they know
is to look at the actual work they're doing.
That's the entry point.
And that's why it's important for us to share what we're doing. That's the entry point. And that's why it's important for us to share what we're doing.
Because then we get to see, even if the expert doesn't know that we're seeing,
we get to see some of their thought process. And we get to see what we're really doing is we're
getting a piece of their mental model, and we're getting that mental model shared, and that just enhances everyone's mental model together.
So it's not easy necessarily, but if you really look at the work as done, and I really do mean specifics.
Like I was saying, where in the code is this thing that you're talking about? Like, show us that thing in the code. Or actually step through your thought process of where you go first when you are paged. Like, we know that you do something automatically, but let's sit here as a group and let's break down exactly the regular work that you do at each step of getting paged.
And that's what helps illuminate that intuition.
It sort of helps the expert kind of declaratively bring their intuition out into the open, I guess.
Good question.
I'm still processing all this.
It's really cool.
But I have a question for you.
In the past, we talked with people on the podcast that talked about game days, running game days, where you basically bring the system into a certain state.
Like you're actually simulating an incident and then figure out how you can solve it.
It feels like you gamify the whole thing a little bit too, but what you're explaining here is still something different, right? Because you're not necessarily talking about, let's simulate an incident and see how we respond to it. It feels like you're talking more about, let's learn, let's all of us just get better overall in the environment that we are working in and with the components. So if an incident happens, we have more knowledge about it and we're more comfortable doing the right things.
Did I get this right, or are they still the same?
I think that chaos engineering, especially game days... because the practice of practice session does sound a lot like a game day. And in fact, you could run a game day as part of this thing. But the goal, I think, isn't running the experiment itself. It's almost more beneficial to go through all the procedures and steps and discovery and learning that you have to do in order to create the experiment in the first place. And that's kind of where it crosses paths with the practice of practice.
Mm-hmm.
That's the area that we're entirely focused on in practice of practice.
And sometimes we may take that. We had done some chaos engineering types of exercises in that session, and we talked about doing more. But the goal isn't necessarily to bend the system. I mean, that's what chaos engineering does. It pushes pressure around the edges of the system. Kind of, let's poke the system and see where it's fragile. That's a real benefit to it. But in light of the practice of practice, the more beneficial part of it is how we work together and how we extract understanding from the system.
Yeah.
Yeah.
Does that,
does that make sense?
Yeah, it makes a lot of sense. If I can kind of repeat what I just learned: you would say a classical game day would be, let's say, I slow down a database, and then we're just focusing on this particular incident, and that's great. And if this exact problem happens a week later, then we are lucky and we can fix it faster. But we haven't learned a lot about the whole system. So if we now learn a lot about the system in general, we can not only deal better with that particular problem, but we can deal with many other problems in a much better way, because we have a better understanding of how the system works and where it may fail. We also know how different people in the team work. We know who to go to, right, in case we still have a question, even under pressure. I think that's the big difference.
Yeah, you're getting it.
Yeah, and that's the thing that gets people more excited, in my mind. In my experience, people want to get together. They want this. It doesn't feel as... let me put it this way. We may go into a chaos engineering type of investigation to try to eke out some expertise. We may not finish. We may get partway through and then we may decide, oh, well, we're learning too much about this other area, let's keep going.
The thing about your example that struck me was the difference between, okay, SREs know how to operate the configuration, but, you know, maybe this hypothetical incident happens and the expert isn't available. Maybe the expert was there in the chaos test, but they're not available for the regular thing. So it's like, well, how did you extract the knowledge from that expert when you did the testing in the first place? That's a step that can't be missed, because you're exactly right. It doesn't have as much to do with what happens as it has to do with how we respond to it. That's the key.
Yeah.
And to bring another analogy, I know you like the analogies with music, but I want to bring a sports analogy. If you're a football team, and I'm talking about soccer now, so European football, and you only practice, let's say, one situation where you have a free kick, you know exactly: if I'm standing at this position, I know exactly where I need to hit the ball, because then the best player in the front can strike the ball. But that doesn't teach you how all of your players are actually reacting. Who is fast? Who is slow? Where can I anticipate that person to be in a certain situation? I think that's the other big piece, and it's why understanding the strengths and the weaknesses of your team members is so important. And also how they act and how they react to certain situations. Because somebody needs to orchestrate everything. In the football game, it's typically one person who orchestrates the game. It's like in American football, where the quarterback obviously knows exactly who is running where on which particular move.
I like this a lot. Your explanation helped me a lot to understand the difference between chaos engineering game days, where you are typically simulating a particular problem and then you try to fix it as fast as possible and obviously learn from it, versus the practice of practice, where we all elevate and get better. And therefore, even statistically, we will be better in fixing any type of incident that comes our way.
Yeah. And one of the things that's really
important about doing this kind of session,
in fact, at my last gig, we did this session and it was called Practice of Practice Gamelan. Practice of practice itself, that's the concept, but we actually called the session at the company Practice of Practice Gamelan. Now, if you're not familiar with what a gamelan is, that's an Indonesian percussion orchestra. It's a traditional orchestra.
It's made up of all these
gongs and percussion instruments
and xylophone-looking things.
It has some woodwinds in it. It may
even have a singer or two.
But the thing about the gamelan
is that when they get together to practice,
they actually
use improvisation
every time. So they'll come up with a song,
just the melody. And then they'll get together as a group, and they'll practice the song,
and they improvise a rhythm or a harmony to go with it. So next time they get together,
they don't rehash what they already did.
They take what they did, and then they improvise some more, and they develop it.
And then after several different practices and meetings, the gamelan will have then written a piece.
And then that's the piece they take into the concert.
But even in the concert, they will improvise and revise.
And the idea here is that it's iterative.
So this is another difference from chaos engineering.
Chaos engineering tends to be one thing, like you were kind of describing.
And really, that's how you want to do it.
You want to keep your blast radius small.
You want to keep your changes to a minimum, you know, all that kind of stuff.
With practice of practice, as we're iterating, you know, we were there every week and we get together as a team.
We learn about each other's personalities.
We're building on the last time that we met so that every time we meet,
the team gets stronger and stronger. And that's what I've experienced with this
session. People start to like it, and then they'll start to come over and over again,
the same people. And by the way, I didn't mention this, the whole company is invited to this. This isn't
just for engineers. It's not just for SREs.
It's not just for the people who respond to incidents.
All the time, I would
have people from the marketing team come in. People
from the customer support team come in.
I had someone from the technical writer team come in
and participate in these things.
And so it's a growth function.
It's not something that you just do once and you're trained
and then you don't do it again for a couple months.
The way that it works is that it's iterative
and it grows.
Fantastic. I'm speechless.
Yeah, I need to process all of this. No, I mean, I love all
the music analogies. And so I work on the sales, you know, solution sales engineering side of the house.
Oh, cool.
And I always put it in the same type of
terms for people when we're onboarding them, especially when I have someone else who's
a musician, right? When you're learning, let's say, just how to do a demo, it's like, all right, you
need to know the demo inside and out. But when you get in front of people, you're going to improvise, but
you have to have those core fundamentals. You need to know, you know, the chord changes, you need to know the structure, this and that.
But once you know that, you can go in. And it also reminds me of, you know, the
difference between a band and a singer-songwriter, right? I tend to like
bands much better than, you know, individual artists, because when you have
an actual band making music, and it's my own
experience being in bands too, someone brings in an idea and the rest of the people that
you're working with that you now know and you choose to work with enhance that idea
to become something much greater than it would have been on its own.
And a similar thing comes into these aspects.
You talked about bringing marketing in. If you bring marketing into
this, they're going to come in with a completely different viewpoint
of all this that maybe no one has ever even considered.
Whether that's the chaos engineering side or just even the wheel of misfortune.
Because people might not be considering, oh, I should look up how this might impact
sales.
Right? But now with the marketing person there, a new aspect comes into it.
That, yeah, we're not just doing this for the sake of making sure our systems run.
We're making sure our systems run so we can sell our product.
Right? And now that we have this other person there, we're getting more and more perspective and everything's becoming more and more rich.
So yeah, at least for people like me, for people who are, you know, musicians or anything where there's some
sort of teamwork, I think this kind of, you know, Andy was doing a soccer analogy,
sorry, football analogy. Same concepts apply, right? The quarterback isn't going to know, oh,
the person i was supposed to throw to isn't open right now. But I know Tommy over there is going to look back in three, two, one,
because I just work with them and we've done this so much
that they know to turn around and look to make sure everything's going right.
My backup plan is to go to that one.
And then after that is this one,
because you've done these things so many times together.
That practice is really, really important.
But as Andy said earlier,
it seems so obvious, but who talks about it?
And I think there's a lot of really obvious things in our industry,
and not even just our industry,
but things that can be applied everywhere,
that are not obvious.
Well, they seem obvious,
but they're obviously not
because it's not stuff we're doing.
But when you hear it, you're like, well, duh, of course.
But pause for a moment and ask yourself, are you doing that?
When someone asks you a question, even when it comes down to someone asking you a question,
are you thinking about answering what you heard?
Or I mean, are you thinking about answering that question?
Or are you first thinking about the question and what the question really is behind it?
And then are you thinking about your answer first? Right? It all goes into these different aspects of things there. But it's, yeah, it's a whole skill. It's a real, it almost goes into
soft skills, what you're talking about, too, because the ability to work with others, to work out all that stuff, goes into that. It's not just the tech side.
Yep. And I think that this is where
the industry has a problem.
By making the distinction that there
is a soft skill that's different from their other job.
You see
this a lot. Yeah, it's really important for you to develop your soft skills.
And your soft skills are different from your
hard skills and your technical chops.
I don't like to think of it that way.
There's this term socio-technical.
And it's become more widely used.
And in a very simple sense, it means exactly what you think it means.
It's the combination of society and technology.
And it's not two separate things, though.
So it's not society, sociological, quote, soft skills, and then technical, hard skills.
You can't separate them or you lose the system.
The system is not the system without those
things together. So to make distinctions like
work on your soft skills, that
doesn't make a lot of sense.
And then you also get companies that think, well, we can't
afford to let people work on their soft skills.
You hear people that use the Mythical Man-Month, and you'll hear people say things like, this meeting is really expensive because they're counting people's hours.
And if you're counting people's hours and then you're going, we need you to make sure that
you're coding for all of these hours, you know, maybe you're doing, you know, agile and
you're dividing your team's capacity up or something, you know, well, part of that needs to
be the socio part of the socio-technical formula that always needs to be there. And that's,
I think that's something that the practice of practice brings. And I was really lucky in my
last gig that the company completely supported this thing. So it was not like skunkworks. It
wasn't like this secret meeting that, you know, managers and execs didn't know about. Not at all. It
was highly publicized at the company. We even did, you know, public blog posts, and I've done talks
about it. So it's something that companies are embracing, but it's a little bit like chaos engineering in that it becomes a cost center.
You've heard this thing like operations is a cost center.
There's no revenue from this kind of work.
You can't look at an ROI in terms of a quantitative dollar. But when you look at the qualitative benefits that you get,
a more reliable system,
teams that are responding in more resilient ways,
that's more valuable almost than the cash is.
I just got to say, I think this is maybe our first podcast that got into deep
philosophical territory.
It's funny Matt, when you were talking about the soft skills bit, I was sitting there thinking
like, well, that's the same concept as you don't end at your skin, right?
You are your environment.
You can't be your environment without being you and all that.
And I was like, wow, this could go really, really, really deep.
But I think it's a really good point, right? Nothing exists in a vacuum. Everything is part of something.
And yeah, man.
Hey, obviously we could talk about this forever and ever. And there's also other blog posts
that you gave us. But I want to highlight again, folks, if you want to
read more about this, then the blog post that is linked is called Practice of Practice. Gamelan, right?
Do I pronounce this correctly, gamelan?
Yeah, it's not "game land," even though it's...
Well, it's written like this if you read it.
Yeah, it's spelled like "game land," exactly.
Yeah. And, well, Matt, I want to say, and I know there was a second topic that you proposed, but I'm just pointing people to the blog post.
It was about, you know, repeating incidents.
The blog post is called Rivers of Opposites, and people should check it out. And Matt, you obviously have a lot of experience in this field, coming from the old days when you were in data centers, making sure the hardware in these data centers works reliably.
And now, for the last 10 years or so, you've kind of switched to the software and the cloud side.
I think we should do a follow up session at some point with whatever else comes your way.
Obviously, you're creating a lot of great content
based on your day-to-day experience.
Yeah, we should.
We totally should.
Yeah.
I should let people know that I am talking about this topic
at SRECon in March.
So that's in Santa Clara, California.
If you are going to SRECon, I will be speaking about the human observability of incident response there at the conference.
I think I'm speaking on the morning of the last day, which is Thursday.
So SRECon.
I'm also speaking at Southern California Linux Expo.
It's not the same talk.
I'm going to be talking about the same area, incident response.
But I'm going to be talking about actually building an on-call program at scale.
But both of those conferences I'll be at in March.
Awesome.
And is that second one, I think that's part of SCaLE 20, right?
Yeah, this is the 20th year of Southern California Linux Expo.
Really exciting.
And we're back in Pasadena this year.
So that's going to be cool.
Yeah, I enjoy meeting people face-to-face again.
Yeah, I'm looking forward to it.
For sure.
I'm just getting over the fact that it's 20 years of a Linux expo.
Isn't that crazy?
Yeah.
Wasn't it crazy that yesterday we talked about the mainframe, at the time of the recording?
It's like almost 60 years of the mainframe.
Yeah.
More than 60. '64 was the big day, but it started before that.
Quick side note, Matt. Did you know that IBM spent about, what was it,
$4 or $5 billion on the first mainframe and development and all that in 1964?
That whole project was like $4 or $5 billion in 1964 money.
Crazy.
That blew me away.
Yeah.
Even today, that would be huge.
But back then, gosh, anyhow.
Yeah.
Yeah. You know, what
I love hearing about those early computing things is, there were so many composers that were
helping develop those systems. Yeah, like so many composers. People actually don't realize how many
music people had a hand in the early days of
computing.
I think it's fascinating.
Interesting.
Yeah.
All right.
Well, we are out of time.
Okay.
Anybody have any closing thoughts?
I mean, the one thing I learned today, the only thing I learned today, and I joke about that, but the most non sequitur thing I learned today is
that somebody who does Rubik's Cube stuff is called a cuber. I will take that with me. I was
like, when you said cuber, I was like, oh my gosh, I didn't know there was a name. TIL. Yeah. Yeah.
What I learned from you, but yeah. Brian, because you opened up with the joke
that wasn't a joke, or was a joke, but you didn't
really practice it, maybe you should
apply practice of practice
to your joking skills.
Maybe you and I
have to have more sessions on you.
Jokes.
We'll just have abuse sessions
with Andy where I abuse him over and over again.
When it comes to the real thing, it'll be like natural.
I can't abuse Andy.
That's the thing though, Andy.
You're too nice.
I've never seen your dark side.
Sure, it's the worst.
You're probably the kind of guy who goes from the nicest guy in the world to the virgin murder.
I'll just think that no matter what.
There's another thing I'm learning today.
I'm making that up about Andy,
so everybody look out.
Anyway, we're good.
It's a weird day today, isn't it?
All right, everybody, I hope you enjoy.
Thank you all for listening.
Matt, look forward to having you back on.
And enjoy your modules back there.
It's one thing I can't wrap my head around too much. But it's a whole different practice.
I try my best.
Yeah.
Anyway, thanks, everybody.
And we'll talk to you next time.
Bye-bye.
Yeah, thanks for having me.