PurePerformance - Is The Practice of Practice the better Gameday with Matt Davis

Episode Date: February 27, 2023

How do you prepare yourself for the next incident? Not at all? Are you running game days where you simulate incidents? Or are you following the steps of good musicians who are constantly practicing with their band members to always be best prepared for the next big gig! Tune in and hear from Matt Davis, Specialist in Learning from Incidents, how he runs weekly continuous practice and learning sessions with DevOps, SREs, Developers, Marketers or Technical Writers and what the outcomes are. Matt is a regular presenter at conferences. You can meet him at SRECon Americas 2023 where he talks about “Human Observability of Incident Response”. Here are the other links we discussed during the podcast: Practice of Practice, Rivers of Opposites, Varieties of Work, Follow Matt on Twitter, Connect on LinkedIn.

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. Because I didn't think of anything stupid to say, although I might be making them happy because I didn't think of anything stupid to say. And this is already stupid enough. And if I keep going with this, it's going to become more and more stupid. And therefore, I will have met the audience's expectations. I will have performed as they expect me to. So Andy, how are you doing today? Really good. So you're telling me that all these episodes, all these five years, these
Starting point is 00:01:05 175 episodes we've done so far, you always thought about the opening before you did the opening? Not always, but often, if you can believe it or not. So just so people understand what I think you're alluding to, Andy, is a lot of times the most banal, asinine things that I'm saying were planned out. That's how not funny I am. Yeah, cool. Interesting. So you are, it's okay. That's why I'm not on stage being a comedian.
Starting point is 00:01:36 Yeah, yeah, of course. Although my dream is to go on stage and be the most unfunny comedian in the world. But the joke's on the audience for thinking they're going to get a comedy show. I would just do that show after show and annoy people and basically get all my produce for free from people throwing it at me. Nice ripe tomatoes. Some eggs and celery and whatever they throw at you.
Starting point is 00:01:56 Hey, should we try to figure out if our guest today is funny or not funny and tries to be funny or not? Yeah, I guess we'll find out. Sure. Yeah, let's do a different type of introduction. No pressure. Matt.
Starting point is 00:02:09 I don't know. I don't know any jokes. Okay. I really don't. Well, I think that's the end of the podcast then. Thank you so much. It was great having you and see you next time. Andy, you were funny.
Starting point is 00:02:20 So we could also make one up right now. Why did Matt cross the road? To go learn some jokes. Told you I'm not funny. Told you I'm not funny. Anyone who thinks I'm funny. I got a whole series of cow jokes that aren't funny. Anyway, we're way off topic today.
Starting point is 00:02:33 Today, completely. Hey, Matt, before I let you introduce yourself, I just want to say something that I just read on your LinkedIn profile. And it says, Matt Davis, he is learning. He's learning from incidents specialist. That means you're specialized in learning from incidents. And I think that alone, I think it's worth applauding. Because if you're a specialist in learning from incidents, then you should teach a lot of people what we can learn from incidents.
Starting point is 00:03:01 Because I think that's a big, big topic. Matt, before we get deep into the topic, can you just introduce yourself a little bit to the audience? We know you're not a comedian, but who are you? I'm not. I'm more of a musician than a comedian. And I've spent a lot of time doing music, but my career is in site reliability engineering. And let's see, I've kind of entered this career through a history of being in data centers. So I was a rack monkey. So I racked and stacked servers in the 90s.
Starting point is 00:03:46 And I worked on hardware systems. In fact, all the way up until like 2013, I was in hardware companies that had hardware, bare metal data centers, and then started getting into the cloud around the same time, because the company I was at, we wanted to do a migration from bare metal to cloud. And then ever since I've just been really interested in, yeah, how do we operate things in the cloud when software has become so complex?
Starting point is 00:04:17 Things aren't as simple as go into the data center and reboot servers. Those days are gone. And complexity really interests me. And this is where my music comes in a lot too. Mm-hmm. Yeah, I should just clarify for a moment there, because Matt and I both have, you know, music in common, but the reason why I'm funny, or I try to be funny, is because my original instrument was drums. So I had to make up for it with bad jokes. Do you always do the, all the time? Yeah.
Starting point is 00:04:47 That's the first thing you're taught as a drummer is to play that. Anyway, I just needed to clarify and make a self-deprecating drummer joke. So anyway. So Matt, before we go into making and keeping systems in the cloud reliable
Starting point is 00:05:04 and resilient, thinking back about the way you started with physical hardware, is the way you are treating an incident and how you approach an incident, has that changed from hardware incidents to software incidents? Wow, yes, it has. We're removed from the hardware these days. So, you know, back when hardware was part of what you thought about as a sysadmin, or back then, the equivalent to an SRE,
Starting point is 00:05:39 we were called technical account managers. And you always think about the hardware. You're always thinking, you know, you helped install the hardware. So you have some of your hands on the hardware and you know that, oh, well, this broke. Well, let's go. Maybe we need to replace memory. It's a much different way of thinking about resilience. Hardware is hardware and hardware fails. A concept like mean time to
Starting point is 00:06:09 recovery for a piece of hardware looks different than what it looks like today, which is kind of completely blown apart in the cloud. Things have become so complex, you've got abstraction layers in front of you instead of knowing that you just touched this hardware last week. You don't even know what the hardware is. You can't see anything happening. Even if you go into a server and you run code on a server, you can't tell what that code's doing. You're not actually seeing the code run. And so when we're in the cloud, just take that concept. I can't actually see code run on a computer and then put many layers and networks, other computers, other processes in between.
Starting point is 00:07:01 And it's almost like we're blind to everything now. So how do we deal with resilience and reliability when it feels like we're blind? Well, I guess this is where observability comes in and plays a big role. I got another question, though, before we go into the current world. We had a couple of episodes talking about chaos engineering, where on purpose you're killing pods, you are simulating network latency. When you were in the data centers, did you do something similar to chaos engineering? Did you actually plug out a cable or shut down a machine and try to figure out if the whole system that you are,
Starting point is 00:07:42 the services that you're providing is still resilient or were you never that bold? We did, but not in a scientific way. I think that's the big difference. Chaos engineering, it has developed over the past five, six, seven years to be a very scientific thing. There's a very specific way to go about it. There's a very specific way to measure it. There's a very specific way to
Starting point is 00:08:16 think about it. Now, back in the data center days, yeah, we were doing that, but it was more like we just plugged in a dual power supply for this RAID array. Let's power up the RAID array and let's pull out the plug and make sure that the dual power supply is working. You know, a lot of equipment back then, and today it still does, has dual power supplies that are both plugged in, but they may not necessarily both be operating. Things like we used to have dual switches at the top of racks, and so we would unplug a switch, make sure that the other switch took over the traffic.
Starting point is 00:08:57 It was very informal, but we definitely did it. Nowadays, chaos engineering has become a lot more formal. Yeah. I like the definition. It became, now this is like an almost scientific-based practice. Yep. Yeah, I think it's interesting too, because back in the days you're talking about, you knew that you had that second switch. You knew that you had that second power. To your point, now that you're in the cloud, you have no idea what redundancy exists. So part of finding out is, let's take something down and see what happens because we don't know what's there maybe.
Starting point is 00:09:33 You can maybe explain it. It's all part of the... The interesting thing is that hardware manufacturers are actually taking that as an advantage. So a hardware manufacturer may actually decide, or I should say maybe a vendor buying a piece of hardware may decide, no, we don't need the dual power supply because if this blade goes down, we've got 40 others in the rack that can take over. We don't really care to spend money
Starting point is 00:10:07 on dual power. And I think spending the money, I think, is part of the key too. Hey, Matt, switching gears a little bit, when I reached out to you, I actually came across a couple of your presentations that you did at ObservabilityCon. You're talking about learning from incidents. And obviously that's what you also have in your subtitle on LinkedIn, learning from incidents specialist. But then just a couple of hours ago, before we actually started recording,
Starting point is 00:10:40 you said, hey, happy to talk about the talk that I gave. But there's a subtopic that is much more interesting for you, at least, and you think probably also for the audience, which is around the practice of practice. And you also gave us a couple of links. Folks, by the way, if you're listening to this, all the links we talk about, like there's two blog posts that Matt wrote and published
Starting point is 00:11:05 and we will have them in the description of the podcast. But for me, I started skimming through the blog post and I thought it's really cool stuff that you talk about. Practicing of practice. Can you enlighten us
Starting point is 00:11:22 for those people that have not yet read your blog post and are now as intrigued as I am to learn more? Yeah, yeah, sure. The idea of the practice of practice, it's a term that's taken from music. There's a guitarist who is pretty famous, a British guitarist. His name is Derek Bailey. Pretty famous for improvisation in kind of the 20th century mostly. And he studied improvisation, and he was a free improviser. So, in other words, even though he was in the jazz area of music, he didn't play straight ahead jazz. He played free improvisation interviews that he did with other improvisers, was that there's two ways to think about what we're doing. There's the theory of practice,
Starting point is 00:12:34 which is like going into the practice room, practicing your scales, learning new notes on the instrument, becoming one with your instrument, however you want to call that. That's kind of the theory of practice that's not together with anyone else. You're on your own doing that. But then there's the practice of practice. And the practice of practice is where you take what you've taught yourself or what you're learning in your lessons or what you're doing in the practice room, and you bring it to the band. And I always use the example of a jazz band just because it doesn't necessarily have to be jazz, but everyone is fairly familiar with jazz as a musical genre, but especially jazz improvisation.
Starting point is 00:13:22 These players, they come in, they're part of this band, and they don't go to the gig for the first time and have a performance, and that's the first time they've played together. I mean, I'm not saying that doesn't happen, but for the most part, they practice together. And they're not practicing scales. They're not practicing how to play jazz. They're practicing playing together as a group. And the whole idea here is that when an improvisation
Starting point is 00:13:55 ensemble practices together as a group iteratively over and over, when they get to the performance, it's just another time that they've gotten together to practice. They're completely familiar with each other. They understand some of the signals that are being used. They have some reciprocity with the other members of the band. They kind of know how the flute player likes to end phrases, or they know that the sax player will give signals by winking their eye or something, or shifting around in their chair. As we get together to practice the practice of practice, we learn these things about each other so that when we get into the performance, we get to the gig, it's not a big deal. We're not as nervous. We don't go into
Starting point is 00:14:46 the show thinking, oh, I don't know what I'm going to play, because we've practiced together. Even though it's improvisation, there's going to be ambiguity. There's going to be questions unanswered. There's going to be things that we have to discover. But we've learned to do that together by practicing improvising together over and over again. So take that whole concept from a jazz band and apply that to a team that is responsible for the reliability of your system. Right. You don't want that team to have the first time
Starting point is 00:15:26 they get together to work be under the production pressure of an incident. That's probably the worst time to introduce people to each other when there's ambiguity like that. So the idea behind the practice of practice in
Starting point is 00:15:42 technology is that we do what the jazz musicians do. We get together every week and we have fun. We practice practicing. So we do things like playing games. There's a fairly famous sort of game that came out of a large company that starts with the G. And it's called The Wheel of Misfortune. And this is something that I've used in other companies. I took the concept of the wheel of misfortune. And if you don't know what I'm
Starting point is 00:16:19 talking about with a wheel, it's like a carnival wheel. You know, you spin the wheel and you land on a space. So now I would have a wheel here with the team that came in to the session. And on the wheel, we would write like the services, or maybe we would write an integration partner, or maybe we would write some kind of keyword in the spaces on the wheel, and then we'd spin the wheel. And then whoever's turn it was would then have to tell us what they know about this thing. And if they don't know enough about it, we ask, well, how would you find out? Show us. Share your screen. Show us what you would search for. And the cool thing is that you get people who aren't experts that will land on a topic,
Starting point is 00:17:11 and they don't know what, they don't have any idea. They're like, where would I look? Okay, well, maybe I'll go first search in the wiki. So, okay, go ahead, share your wiki page. Show us what you're searching for. And then we also will have an expert do the same thing. And then the expert is like, okay, first I go here, share my screen. I'm going to show you this. I'm going to show you this. Here's all the repos. Oh, and here's a diagram I built. And it's this way to kind
Starting point is 00:17:34 of like, without, it's the kind of stuff that we would want to know in an incident, but we don't have that production pressure on top of us. So we can spend time to dig into, you know, this has happened. Oh, hey, Craig, you mentioned just now we were looking at this code. I noticed in the code, what is that? Oh, and, you know, we would do that. We would dig through the code or we would think, oh, that's part of this in the application. Let's go look at our software and I'll show you exactly what this code matches to the software. And, you know, we just, it's this
Starting point is 00:18:10 other dimension of learning about the system that showed up in incidents all the time. People, I'm thinking of one instance where we did that wheel of expertise. That's what I called it. We landed on this one system and we spent 45 minutes or an hour digging into this one system. And it was just like, you know,
Starting point is 00:18:36 we had an expert. So they were like, oh yeah, this is not very well known. This is where this goes. It's encrypted here. And this is the entry point for the API
Starting point is 00:18:45 that does that. So we get to learn all of that. Well, guess what? Not more than two weeks later, we had a major incident about that very system. And the responders got the incident solved quicker, meaning they mitigated it faster because of the practice of practice session we had. It was a direct one-to-one thing. Now, that doesn't always happen because we're also learning reciprocity and empathy in these sessions. It's not like we're learning specific things, but in this case, we had learned a specific thing that came to light and was usable during an incident. It was marvelous. I mean, for me, the amazing thing about this is, you know, it sounds so obvious to do something like this if I listen to you, right? But I don't know how many organizations
Starting point is 00:19:40 actually take the time and do it that way. I mean, if I look at development, right, the practice of pair programming, I think, does something like this, because you have kind of, I don't want to, just for the lack of better terms, like an apprentice, more like a junior developer maybe, and then next to you, you have a senior developer. And so the one basically gives them feedback on how the senior developer would do something, but still watching the junior developer. But then also maybe switching around so that the junior directly sees how the senior is doing it. But the way you explain it, I think, is even more interesting because at the time of an incident, right, it is possible that you don't have the experts at all available anymore because it might be in the middle of the night. Somebody's on vacation.
Starting point is 00:20:22 Exactly. You practice this a lot, where you are constantly sharing knowledge about a part of the system, but also what is your thought process on finding the right information to actually solve the problem. I think that alone is also really amazing, because all of us maybe have different ways how we deal with the situation, how we find something new. I may go to Slack first and I always look into our internal Slack for keywords because maybe somebody else had already a discussion about it. I never
Starting point is 00:20:54 look into other systems, but maybe somebody else is going to a different system that has more information and I don't think about it. So eye-opening. Pretty cool. Yeah, I wondered... Go on. Go ahead, Matt. Sorry. Go ahead, Brian.
Starting point is 00:21:10 No, no. You had a response to Andy. I have a similar topic, but I wanted to... Yeah. Well, I was just going to echo that sentiment of it feels so obvious. Why wouldn't we do this? You know, it's kind of like the same question about on-call training. It seems obvious that you should train people to be on-call, but that also doesn't happen very much. Yeah. I wonder if this also exposes another weakness in the team. We've been exploring these concepts of competency levels.
Starting point is 00:21:46 So if you have four competency levels, unconsciously incompetent, consciously incompetent, unconsciously, no, I got them wrong. Unconsciously incompetent, I'm going to get them wrong. Either way, they run the gamut of I don't know what I'm doing and I don't know that I know what I'm doing, all the way up to I know everything that I'm doing without knowing how I'm doing it. Someone who's an expert is often unconsciously competent because they know it so well they couldn't train someone else to do it, because they're like, what do you mean? You just do it, right? If you think maybe like in your jazz thing, who knows what it'd be like to suddenly drop in and start practicing with Miles Davis, but is he even going to be giving you any of the cues
Starting point is 00:22:42 because he just assumes everyone knows, right? He's someone who is just so in it, they don't think about it. Another good example is, say, in sports, you don't very often see the home run king being a coach of a baseball team in the future, because they just do what they do. It's oftentimes the third baseman, shortstop, someone who is watching the whole game the entire time, who really has an idea what's going on. So in the situation that we're talking about here, when you're talking about the novice versus the expert, do you ever find situations where the expert comes on and, let me show you how I do it, and then they almost pause or have to figure out what they're doing because they maybe don't know how they're doing it and have a hard time communicating, like, some of
Starting point is 00:23:31 this deep embedded knowledge that they wouldn't even think was deep embedded knowledge. It's like, well, how do you know? Because that's the way it is. Well, how do you know that's the way it is? I don't know. Right, right, right, right. Yeah. Does that ever get exposed in those, have you ever seen that happen in these situations? Oh yeah, yeah, for sure. This is the really cool thing about focusing on the work. So that's one of the things about this session that I try to underline, is that we don't go in there and talk about theoretical things. I mean, we do.
Starting point is 00:24:08 We don't leave that out. And we talk about philosophies of resilience and all that kind of stuff. It's just kind of part of the subject matter. But we try to focus on non-hypothetical stuff. We try to focus on the work as done. If you're not familiar with the concept of work as done, Steven Shorrock works in aviation human factors. And he has a great blog out there that I can share the link with
Starting point is 00:24:45 you later. And it's called, I think it's called, the varieties of work. And work as done is one of these varieties of work. To contrast that, work as imagined is another variety of work. So if you think about work as imagined, that might be a run book. That might be a set of prescriptions. That might be a process or a procedure. Those are all things that are work as imagined or work as defined, or there's a lot of different versions of that, but they're not work as done.
Starting point is 00:25:27 When we perform work as done, it doesn't look like the work as prescribed. It doesn't look like the work as imagined. It's different. It's different because we make local adaptations. We do things like exactly like what you're talking about. Our intuition comes into play. Work as imagined can't account for intuition. That's where work as done becomes really important. So, when we have that in mind, that is what helps those experts start to dig into that stuff. I'm a cuber, and the Rubik's Cube is a perfect example of the kind of muscle memory that happens with this. Same thing when I'm sitting here.
Starting point is 00:26:21 If I were to try to show you a move on the cube, I won't do it right now because I'll mess it up. But if I tried to show you a move, I would have to slow myself down. And then as I slowed myself down, I would have to think, wait, what did I do? Do I turn right here? Or do I do two? I don't quite remember. But if I back out and I just let my muscle memory take over, it just happens. And that's what happens to experts. They have this intuition that gets built through their becoming an expert. So when they get to this point of, well, show me this move, they'll do exactly what you're saying. They'll be like, well, I can't tear this apart.
Starting point is 00:27:05 I don't know how. And that's where when we try to dig into those, the entry point into helping these experts dig into how to help others figure out what they know is to look at the actual work they're doing. That's the entry point. And that's why it's important for us to share what we're doing. Because then we get to see, even if the expert doesn't know that we're seeing,
Starting point is 00:27:34 we get to see some of their thought process. And we get to see what we're really doing is we're getting a piece of their mental model, and we're getting that mental model shared, and that just enhances everyone's mental model together. So it's not easy necessarily, but if you really look at the work as done, and I really do mean specifics. Like I was saying, where in the code is this thing that you're talking about? Like, show us that thing in the code. Or actually step through your thought process of where you go first when you are paged. Like, we know that you do something automatically, but let's sit here as a group and let's break down exactly the regular work that you do at each step of yourself getting paged. And that's what helps illuminate that intuition.
Starting point is 00:28:29 It sort of helps the expert kind of declaratively bring their intuition out into the open, I guess. Good question. I'm still processing all this. It's really cool. But I have a question for you. In the past, we talked with people on the podcast that talked about game days, running game days, where you basically bring the system into a certain state. Like you're actually simulating an incident and then figure out how you can solve it. It feels though, while you have a game, you gamify the whole thing a little bit,
Starting point is 00:29:05 but it's still something different what you're explaining here, right? Because you're not necessarily talking about, let's simulate an incident and how we will respond to it. It feels like you're more talking about, let's learn, let's just all of us get better overall in the environment that we are working in and with the components. So if an incident happens, we have more knowledge about it and we're more comfortable doing the right things. Do we get this right or are they still the same? I think that chaos engineering, especially game days, because the practice of practice session does sound a lot like a game day. And in fact, you could run a game day as part of this thing.
Starting point is 00:29:54 But the goal, I running the experiment itself. It's almost more beneficial to go through all the procedures and steps and discovery and learning that you have to do in order to create the experiment in the first place. And that's kind of where it crosses paths with the practice of practice. Mm-hmm. That's the area that we're entirely focused on in practice of practice. And sometimes we may take that. We had done some chaos engineering types of exercises in that session. And we talked about doing more. But the goal isn't necessarily to bend the system.
Starting point is 00:31:04 I mean, that's what chaos engineering does. It pushes pressures around the edges of the system. Kind of let's poke the system and see where it's fragile. That's a real benefit to it. But in light of the practice of practice,
Starting point is 00:31:21 the more beneficial part of it is how we work together and how we extract understanding from the system. Yeah. Yeah. Does that make sense? Yeah, it makes a lot of sense.
Starting point is 00:31:36 And because, if I can kind of repeat what I just learned, you would say a classical game day would be, let's say I slow down a database and then we're just focusing on this particular incident and that's great. And if we're lucky and then this exact problem happens a week later, then we are lucky and we can fix it
Starting point is 00:32:01 faster. But we haven't learned a lot about the whole system. So if we now learn a lot about the system in general, we can not only deal better with that particular problem, but we can deal with many other problems in a much better way because we have a better understanding how the system works, where it may fail. We also know how different people in the team work. We know who to go to, right, in case we still have a
Starting point is 00:32:25 question, even under pressure. I think that's the big difference. Yeah, you're getting, yeah, I think the, and that's the thing that gets people more excited in my mind. In my experience, people want to get together. They want this. It doesn't feel as... Let me put it this way. We may go into a chaos engineering type of investigation to try to eke out some expertise. We may not finish. We may get partway through and then we may decide, oh, well, we're learning too much about this other area. Let's keep going. The thing about your example that struck me was the difference between, okay SREs know how to operate the configuration that, you know,
Starting point is 00:33:52 maybe if this incident, this hypothetical incident, happened and the expert isn't available, maybe the expert was there in the chaos test, but they're not available for the regular thing. So it's like, well, how did you extract the knowledge from that expert when you did the testing in the first place? So that's a step that can't be missed because you're exactly right. It doesn't have as much to do with what happens as it has to do with how we respond to it. That's the key. Yeah. And to bring another analogy, I know you like the analogies with music, but I want to bring a sports analogy. If you're playing, if you're a football team, and I'm talking about soccer now, so European football, right, and if I only practice, let's say, one
Starting point is 00:34:47 situation where you have a free kick, and you know exactly, if I'm standing at this position, I know exactly where I need to hit the ball, because then I know the best player in the front, he can strike the ball. But that doesn't teach you how all of your players are actually reacting. Who is fast? Who is slow? Where can I anticipate that person to be in a certain situation? I think that's also the other big piece, and where team understanding, the strengths and the weaknesses of your team members, is so important. And also how they act and how they react to certain situations.
Starting point is 00:35:21 Because then if somebody needs to orchestrate everything, in the football game it's just typically one person that is kind of like, he orchestrates the game. It's like also in American football, right? The quarterback obviously knows exactly who is running where at which particular move. But this is, I like this a lot. I think your explanation helped me a lot to understand the difference between chaos engineering game days, where you are simulating typically a particular problem and then you try to fix it as fast as possible, obviously learn from it, versus the practice of practice, where we all elevate and get better.
Starting point is 00:36:02 And therefore, even statistically, we will be better in fixing any type of incident that comes our way. Yeah. And one of the things that's really important about doing this kind of session, in fact, at my last gig, we did this session and it was called Practice of Practice Gamelan. This was just, you know, practice of practice itself, that's the concept. But we actually called the session at the company Practice of Practice Gamelan. Now, if you're not familiar with what a gamelan is, that's
Starting point is 00:36:41 an indonesian percussion orchestra It's a traditional orchestra. It's made up of all these gongs and percussion instruments and xylophone-looking things. It has some woodwinds in it. It may even have a singer or two. But the thing about the gamelan is that when they get together to practice,
Starting point is 00:37:01 they actually use improvisation every time. So they'll come up with a song, just the melody. And then they'll get together as a group, and they'll practice the song, and they improvise a rhythm or a harmony to go with it. So next time they get together, they don't rehash what they already did. They take what they did, and then they improvise some more, and they develop it. And then after several different practices and meetings, the gamelan will have then written a piece.
Starting point is 00:37:39 And then that's the piece they take into the concert. But even in the concert, they will improvise and revise. And the idea here is that it's iterative. So this is another difference from chaos engineering. Chaos engineering tends to be one thing, like you were kind of describing. And really, that's how you want to do it. You want to keep your blast radius small. You want to keep your changes to a minimum,
Starting point is 00:38:09 you know, all that kind of stuff. With practice of practice, as we're iterating, you know, we were there every week and we get together as a team.
Starting point is 00:38:19 We learn about each other's personalities. We're building on the last time that we met so that every time we meet, the team gets stronger and stronger. And that's what I've experienced with this session. People start to like it, and then they'll start to come over and over again, the same people. And by the way, I didn't mention this, the whole company is invited to this. This isn't just for engineers. It's not just for SREs. It's not just for the people who respond to incidents. All the time, I would
Starting point is 00:39:00 have people from the marketing team come in. People from the customer support team come in. I had someone from the technical writer team come in and participate in these things. And so it's a growth function. It's not something that you just do once and you're trained and then you don't do it again for a couple months. The way that it works is that it's iterative
Starting point is 00:39:26 and it grows. Fantastic. I'm speechless. Yeah, I need to process all of this. No, I mean, I love all the music analogies. And so I work on the sales, you know, solution sales engineering side of the house. And it's... Oh, cool. I always put it in the same type of terms for people when we're onboarding them, especially when I have someone else who's a musician, right? When you're learning, let's say, just how to do a demo, it's like, all right, you need to know the demo inside and out. But when you get in front of people, you're going to improvise. But you have to have those core fundamentals. You need to know, you know, right, the chord changes, you need to know the structure, this and that. But once you know that, you can go in. And it also reminds me of, you know, the
Starting point is 00:40:11 difference between like a band and a singer-songwriter, right? I tend to like bands much better than, you know, individual artists, because when you have an actual band making music, and it's my own experience being in bands too, someone brings in an idea and the rest of the people that you're working with, that you now know and you choose to work with, enhance that idea to become something much greater than it would have been on its own. And a similar thing comes into these aspects. You talked about bringing marketing in. If you bring marketing into
Starting point is 00:40:48 this, they're going to come in with a completely different viewpoint of all this that maybe no one has ever even considered. Whether that's the chaos engineering side or just even the wheel of misfortune. Because people might not be considering, oh, I should look up how this might impact sales. Right? But now with the marketing person there, a new aspect comes into it. That, yeah, we're not just doing this for the sake of making sure our systems run. We're making sure our systems run so we can sell our product.
Starting point is 00:41:17 Right? And now that we have this other person there, we're getting more and more perspective and everything's becoming more and more rich. So yeah, at least for people like me, like you, for people who are, you know, other, you know, musicians or anything where there's some sort of teamwork, I think this kind of, you know, Andy was doing, you know, a soccer analogy, sorry, football analogy. Same concept applies, right? The quarterback isn't going to know, oh, the person I was supposed to throw to isn't open right now. But I know Tommy over there is going to look back in three, two, one, because I just work with them and we've done this so much that they know to turn around and look to make sure everything's going right. My backup plan is to go to that one. And then after that is this one,
Starting point is 00:41:56 because you've done these things so many times together. That practice is really, really important. But as Andy said earlier, it seems so obvious, but who talks about it? And I think there's a lot of really obvious things in our industry, and not even just our industry, but things that can be applied everywhere, that are not obvious.
Starting point is 00:42:19 Well, they seem obvious, but they're obviously not because it's not stuff we're doing. But when you hear it, you're like, well, duh, of course. But pause for a moment and ask yourself, are you doing that? When someone asks you a question, even when it comes down to someone asking you a question, are you thinking about answering what you heard? Or I mean, are you thinking about answering that question?
Starting point is 00:42:40 Or are you first thinking about the question and what the question really is behind it? And then are you thinking about your answer first? Right? It all goes into these different aspects of things there. But it's, yeah, it's a whole skill. It's a real, it almost goes into soft skills, what you're talking about, too, because the ability to work with others, to work out all that stuff, goes into that. It's not just the tech side. Yep. And I think that this is where the industry has a problem. By making the distinction that there is a soft skill that's different from their other job. You see
Starting point is 00:43:28 this a lot. Yeah, it's really important for you to develop your soft skills. And your soft skills are different from your hard skills and your technical chops. I don't like to think of it that way. There's this term socio-technical. And it's become more widely used. And in a very simple sense, it means exactly what you think it means. It's the combination of society and technology.
Starting point is 00:44:05 And it's not two separate things, though. So it's not society, sociological, quote, soft skills, and then technical, hard skills. You can't separate them or you lose the system. The system is not the system without those things together. So to make distinctions like work on your soft skills, that doesn't make a lot of sense. And then you also get companies that think, well, we can't
Starting point is 00:44:43 afford to let people work on their soft skills. You hear people that use The Mythical Man-Month, and you'll hear people say things like, this meeting is really expensive because they're counting people's hours. And if you're counting people's hours, and then you're going, we need you to make sure that you're coding for all of these hours, you know, maybe you're doing, you know, agile and you're dividing your team's capacity up or something, you know, well, part of that needs to be the socio part of the socio-technical formula that always needs to be there. And that's, I think that's something that the practice of practice brings. And I was really lucky in my last gig that the company completely supported this thing. So it was not like skunk works. It
Starting point is 00:45:39 wasn't like this secret meeting that, you know, managers and execs didn't know about. Not at all. It was highly publicized at the company. We even did, you know, public blog posts, and I've done talks about it. So it's something that companies are embracing, but it's a little bit like chaos engineering in that it becomes a cost center. You've heard this thing like operations is a cost center. There's no revenue from this kind of work. You can't look at an ROI in terms of a quantitative dollar. But when you look at the qualitative benefits that you get, a more reliable system, teams that are responding in more resilient ways,
Starting point is 00:46:35 that's more valuable almost than the cash is. I just got to say how I think this is maybe our first podcast that got into deep philosophical territory. It's funny Matt, when you were talking about the soft skills bit, I was sitting there thinking like, well, that's the same concept as you don't end at your skin, right? You are your environment. You can't be your environment without being you and all that. And I was like, wow, this could go really, really, really deep.
Starting point is 00:47:03 But I think it's a really good point, right? Nothing exists in a vacuum. Everything is part of something. And yeah, man. Hey, obviously we could talk about this forever and ever. And there's also other blog posts that you gave us. But I want to highlight again, folks, if you want to read more about this, then the blog post that is linked is called Practice of Practice Gamelan, right? Do I pronounce this correctly? Gamelan? Yeah. It's not game land, even though, well, it's written like this if you read it. Yeah, it's spelled like game land. Exactly. Yeah. And, well, Matt, I want to say, and I know there was a second topic that you proposed, but I'm just pointing people to the blog post.
Starting point is 00:47:49 It was about, you know, repeating incidents. The blog post is called Rivers of Opposites that people should check out. And Matt, as you have obviously a lot of experience in this field, coming from the old days where you were in data centers and making sure these data centers, the hardware works reliably. And now for the last 10 years or so, kind of switched to the software and the cloud side. I think we should do a follow-up session at some point with whatever else comes your way. Obviously, you're creating a lot of great content based on your day-to-day experience. Yeah, we should. We totally should.
Starting point is 00:48:30 Yeah. I should let people know that I am talking about this topic at SRECon in March. So that's in Santa Clara, California. If you are going to SRECon, I will be speaking about the human observability of incident response there at the conference. I think I'm speaking on the morning of the last day, which is Thursday. So SRECon. I'm also speaking at Southern California Linux Expo.
Starting point is 00:49:03 It's not the same talk. I'm going to be talking about the same area, incident response. But I'm going to be talking about actually building an on-call program at scale. But both of those conferences I'll be at in March. Awesome. And is that second one, I think that's part of SCaLE 20, right? Yeah, this is the 20th year of Southern California Linux Expo. Really exciting.
Starting point is 00:49:30 And we're back in Pasadena this year. So that's going to be cool. Yeah, I enjoy meeting people face-to-face again. Yeah, I'm looking forward to it. For sure. I'm just getting over the fact that it's 20 years of a Linux expo. Isn't that crazy?
Starting point is 00:49:48 Yeah. Wasn't it crazy that yesterday we talked about the mainframe at the time of the recording? It's like almost
Starting point is 00:49:57 60 years of the mainframe. Yeah. More than 60. 64 was the big day but it started before that. Quick side note, Matt. Did you know that IBM spent about, what was it,
Starting point is 00:50:08 $4 or $5 billion on the first mainframe and development and all that in 1964? That whole project was like $4 or $5 billion in 1964 money. Crazy. That blew me away. Yeah. Even today, that would be huge. But back then, gosh, anyhow. Yeah.
Starting point is 00:50:24 Yeah. You know what I love hearing about those early computing things is there were so many composers that were helping develop those systems. Yeah. Like so many composers. People actually don't realize how many music people had a hand in the early days of computing. I think it's fascinating. Interesting. Yeah.
Starting point is 00:50:51 All right. Well, we are out of time. Okay. Anybody have any closing thoughts? I mean, the one thing, the one thing I learned today,
Starting point is 00:50:59 the only thing I learned today, and I joke about that, but the most non sequitur thing I learned today is that somebody who does Rubik's Cube stuff is called a cuber. I will take that with me. When you said cuber, I was like, oh my gosh, I didn't know there was a name. TIL. Yeah. Yeah. What I learned from you, but yeah. What I, Brian, because you opened up with the joke
Starting point is 00:51:25 that wasn't a joke or was a joke, but you didn't really practice it, maybe you should apply practice of practice to your joking skills. Maybe you and I have to have more sessions on your jokes. We'll just have abuse sessions
Starting point is 00:51:41 with Andy where I abuse him over and over again. When it comes to the real thing, it'll be like natural. I can't abuse Andy. That's the thing though, Andy. You're too nice. I've never seen your dark side. Sure, it's the worst. You're probably the kind of guy who goes from the nicest guy in the world to the virgin murder.
Starting point is 00:52:01 I'll just think that no matter what. There's another thing I'm learning today. I'm making that up about Andy, so everybody look out. Anyway, we're good. It's a weird day today, isn't it? All right, everybody, I hope you enjoy. Thank you all for listening.
Starting point is 00:52:12 Matt, look forward to having you back on, and enjoy your modules back there. It's one thing I can't wrap my head around too much. But it's a whole different practice. I try my best. Yeah. Anyway, thanks, everybody. And we'll talk to you next time.
Starting point is 00:52:31 Bye-bye. Yeah, thanks for having me.
