Screaming in the Cloud - Evolving, Adapting, and Staying Prepared with Brian Weber
Episode Date: January 28, 2025

Ever wondered how Corey got to where he is today? You have Brian Weber to partially thank for that. On this episode of Screaming in the Cloud, Corey catches up with his old friend and mentor to talk about the ever-evolving world of tech. Brian's been around the block a time or two, having done significant stints at Pinterest, Facebook, and Twitter (during the Elon acquisition, no less)! As Corey and Brian catch up, you'll hear them chat about the importance of empathy, coaching the next generation of tech workers, and their conspiracies surrounding Google and Kubernetes. So grab your tinfoil hats, it's time to go Screaming!

Show Highlights
(0:00) Intro
(0:53) The Duckbill Group sponsor read
(1:27) When Brian took Corey under his wing
(3:21) Brian's experience coming to the cloud as an engineer
(7:24) Why it's important to reinvent yourself in tech
(8:54) How Brian reacted to the industry adopting Kubernetes over Mesos Marathon
(10:31) Kubernetes conspiracy theories
(12:30) The importance of empathy in tech
(15:46) Trying to advise younger generations entering tech
(19:19) The Duckbill Group sponsor read
(20:02) Working at Twitter when jobs started getting cut and the site frequently went down
(22:41) The best way to navigate certificate expiration
(26:08) Talking about "The Golden Path"
(28:52) Why you should always plan ahead in tech (and life)
(34:21) Where you can find more from Brian

About Brian Weber
Brian is a former FedRAMP DevOps Engineer for Coralogix. He's also been a Site Reliability Engineer at Twitter, Pinterest, and Facebook, where he maintained large installations on-premises, building reliability, security, and developer efficiency. In his spare time, Brian skis, knits, cycles, bakes, and tries to spend as much time outdoors as possible.

Links
Brian's LinkedIn: https://www.linkedin.com/in/brian-weber-2423b55/

Sponsor
The Duckbill Group: duckbillgroup.com
Transcript
And that's exactly how SRE generally works in my mind as well.
You're not building something for the normal day-to-day.
Actually, no, that's not true.
You're building stuff for the normal day-to-day,
but you are also building stuff for the day when everything catches fire.
Welcome to Screaming in the Cloud. I'm Corey Quinn, and I've been trying to get a particular
person on this show since its very inception. Brian Weber, currently between jobs, was a
formative influence on my early career that started to look a little bit vaguely like
software engineering. Brian, thank you for your ongoing patience and willingness to subject yourself to my tomfoolery yet again.
Oh, your tomfoolery is always amazing.
Did you just call me a mentor?
This episode is sponsored in part by my day job, the Duckbill Group.
Do you have a horrifying AWS bill?
That can mean a lot of things.
Predicting what it's going to be.
Determining what it should
be, negotiating your next long-term contract with AWS, or just figuring out why it increasingly
resembles a phone number, but nobody seems to quite know why that is. To learn more,
visit duckbillgroup.com. Remember, you can't duck the duck bill bill. And my CEO informs me that is absolutely not our slogan.
I had no idea what I was doing many years ago when I was working for a large consulting
firm and you were working at Pinterest at the time.
And they parachuted me into this environment because I was personable for lack of a better
term.
And they had, at the time Pinterest had a very weird technical vetting
process for consultants, so they needed someone who could do the work, ostensibly, but also be
gregarious and talk their way through the process. This was many years ago; the consulting company no
longer exists after being bought by IBM, so I don't think I'm spilling any tea here. But at the end
of it, I was brought in to write a bunch of tests for Puppet code as part of a long-stalled Puppet 3 migration, if memory serves.
I had no idea what I was doing.
Ruby was a precious stone to me, not so much a programming language.
And you took me under your wing for about a month and a half, and it resonated.
Thank you for doing that.
Thank you so kindly. I do remember you had, I believe it was an Ansible
sticker on your laptop, and you told me that you made a very clear point of not adhering a laptop
sticker until you'd actually contributed to the source repo. It would have been SaltStack then,
not Ansible, because I still haven't dared to touch Ansible with my hands. Oh, the other Python.
Exactly, the one that basically was frozen in amber forever and then achieved its final form
of all software projects that have run their course getting acquired by VMware.
You know, it's funny. I have a very good friend who basically soft retired when VMware got bought
out by Broadcom. A lot of folks have that story. Oh yeah. It's kind of funny how everybody takes
layoffs just a little bit differently, you know? Just like me and all my various layoffs, like, you end up, like,
staying in touch with some friends, and maybe not, I don't know, and some people get angry and bitter,
and others are just like, woohoo, I can do what I want now. I have severance.
So, I do want to talk about your technical evolution because you are
something of a rarity in that you were for years over at Facebook, which they've since renamed to
something dumb, but they'll always be Facebook to me. You were briefly at Pinterest that coincided
with my time there. And then you decided to spend the next seven and a half years over at Twitter.
Yes, we're still calling it Twitter. Now, what makes that interesting is that Pinterest was sort of the departure from the other two, because neither Twitter nor Facebook,
at least at the time, were large cloud shops. They weren't running Kubernetes. You, in fact,
called yourself Mr. Mesos at one point, or Mr. Marathon. I forget what it was, but you were
effectively responsible for the care and feeding of that particular orchestration system while you were at Twitter.
So you have found yourself in this interesting scenario where, despite the fact that this is where the zeitgeist has gone,
you hadn't done a whole lot of cloud work until your most recent gig over at Coralogix, where you're focusing on FedRAMP.
So one could argue that, well, is GovCloud really cloud at all?
The jury is still out.
But what's it like coming to cloud
as someone who's very competent as an engineer,
but who just has found themselves in a situation
where until recently, you never had to touch it?
Well, you know, you could consider that, Mike.
I don't know.
What happens when you put a cloud in a bottle or a room?
Is it, it's kind of like when you go to the bar
and they have the smoked Manhattan,
you know, and it's there and it's pretty. And then you open the bottle and all the tendrils.
Anyway, I'm bad at metaphors today. It's early yet. By the way, happy new year. We're recording
this shortly into the new year. It is the second of the year. Yes.
Anyway, in some ways it was just like being dropped into literally any other environment
where, you know, you don't know anything.
You don't know what's going on.
You don't know how all the pieces glue together.
But it was a lot more challenging because a lot of the facets that I do know, like I
know, you know, how a kernel works, how all of the modules work, how systemd works, how
to strap things together.
You know, when do you need to disable SELinux permissions to make things talk to one another?
Oh, you're funny.
setenforce 0.
It's a way to live.
There you go.
But anyway, so it was a very different environment.
You know, when you spin up, say, a web server on a siloed out host, you know, you spin it up, you access it,
you see, oh, this is cool. And then you start putting up walls to protect it. When you spin
up an instance for the very first time in a kube cluster, in an AWS cluster, you can see that it's
running, but it is very much behind the phalanx. You know, all of those protections are saying, yes, your service
is running, come and get it, which is often a challenge when you don't know how to do things,
simple things like properly open up a port and make sure it stays open and reopens the next time
the service runs, how to hack and slash your way through all of the VPC rules and whatever other rules randomly appear in the way when you
don't know. Now, I spent, you know, a good 10 months, you know, trying to figure all that out.
And luckily, I was there in an environment where there were parallel running environments. And,
you know, once you learn that the various differences basically come down to the ARN names being different.
You know, when you look at an ARN, it's aws right in the middle of everything.
You have to then change it to aws-us-gov.
Yeah.
Yeah.
It's a different partition is to use their nomenclature.
Right.
Because it is a common assumption in all of your templating.
And so I had to go and hack and slash through
so many YAML configs and Terraform configs, you know, and we can sit here and talk about,
you know, how fascinating and interesting it is that all the stuff glues together.
But at the end of the day, we are all just monkeys scratching our heads, looking at code saying,
where the hell is this config and why doesn't it do what I want it to do?
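The partition swap Brian describes can be sketched as a tiny helper. This is a minimal sketch, assuming the standard ARN layout (arn:partition:service:region:account-id:resource); the function name is illustrative, not anything from the conversation:

```python
# Sketch of the GovCloud partition swap described above.
# An ARN looks like: arn:partition:service:region:account-id:resource
# Commercial AWS uses the "aws" partition; GovCloud (US) uses "aws-us-gov".

def to_gov_partition(arn: str) -> str:
    """Rewrite a commercial ARN into the aws-us-gov partition (illustrative helper)."""
    parts = arn.split(":", 5)  # split on the first five colons; the resource part may contain colons
    if len(parts) < 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    if parts[1] == "aws":
        parts[1] = "aws-us-gov"
    return ":".join(parts)

print(to_gov_partition("arn:aws:iam::123456789012:role/deploy"))
# arn:aws-us-gov:iam::123456789012:role/deploy
```

In practice the region names differ as well (e.g. us-gov-west-1), which is why templating usually parameterizes the whole partition and region rather than string-swapping hardcoded ARNs.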
It's the same thing that
brought me to consulting, in that I was always parachuted into environments where I didn't know
what the hell was going on. To succeed in those environments, to my mind at least, you've got
to have a strong grasp of fundamentals. Okay, I don't know how this particular system works; however,
I know the Linux system internals well enough to know what it should be doing. Okay, if it's
doing this, that means it's making this other call. It's not doing what I would expect. What do I not
understand fully? And diving deeper and dismantling it into bite-sized problems, which is why when
people ask, oh, what technology should I learn? It almost doesn't matter. If you're entering the
field now as a new graduate in your early twenties, the technology you're going to be running by the time that you're my age, in my mid-to-late 40s, is no longer going to be the same thing. You have to reinvent
yourself. You have to understand how this stuff all ties together. So I like the foundational
things that are likely to remain constant for, well, at least the rest of my life.
Well, I remember when there were cries and moans when the environment I was in at the time,
which I'll leave nameless,
was migrating from CentOS 7 to CentOS 8
because of the whole stream model.
What are you doing to my RPM delivery system?
How does this work?
And you look under the hood
and it's really just the same.
It's just packaged slightly differently
and branded differently
and it works the same.
It's just they figured out ways to smooth off some of the rough edges.
So if you're sitting there saying, oh, my goodness, I can't handle change, then what
the hell are you doing here?
Well, that's one of the areas I wanted to dive into with you, because I wasn't kidding
when I said you used to be the Mesos Marathon guy for a period of time.
The industry collectively took a vote and
Mesos Marathon did not win. Kubernetes did. How did you react to that?
Well, at the time when I was still at the company formerly known as Twitter,
we talked a lot about whether we should spin up Kubernetes. When the decision came through that
we should, we did it in a very slow and piecemeal manner. And in my opinion, I felt it was a little bit too slow. We spun up sample environments in GCP. We even had acquisitions that were in AWS that we just kept operating in AWS, because the migration just didn't make sense. It was well entrenched. It worked properly. Why the heck not? Leave it there. And so we actually had a reasonable brain trust around this stuff for a while.
Where we ran into a lot of trouble was spinning up Kubernetes internally on our own bare metal
infrastructure. You know, not the least of that, as I'm learning now, as I set up my own home lab,
setting up Kubernetes on your own bare metal infrastructure is a pain in the ass.
Oh, yeah.
I did it a year ago, almost exactly, where I spun up a Kubernetes cluster in my spare room running
on top of K3s and some Raspberry Pis.
And sure enough, it was, oh, okay, this makes sense.
Kubernetes lets you cosplay as your own cloud provider.
I sort of get it now.
But yeah, I'd forgotten all the obnoxious hardware bits
that the cloud has gently abstracted away
in the intervening years.
Oh, I don't even think it's the hardware bits.
Kubernetes makes a point of not making it easy.
I wonder if they're just in collusion
with the cloud providers to say,
here, we're going to escort you on the way
so that way you can earn all this money
and then pay the CNCF a bunch of money
so that way we all get
rich. My tinfoil hat conspiracy theory remains that Kubernetes is how Google decided to get the
rest of the world to write software more like Google does, because without that, Google Cloud
was never going to work as a cloud provider for a lot of these workloads. So it works super well.
They sort of lost control of it and they don't get to drive it anymore the way that they
once did.
But I'm not entirely convinced I'm wrong.
Well, you know, that same model worked for Google in search.
You know, they got everybody in the world to change how they wrote web pages, how they
structured web pages, buying into the AMP project.
All that stuff is all because Google said we want it this way and everybody wanted some of that sweet, sweet search results and figured out how to do it.
And now, as a result, when you go to a web page to look for a recipe for peanut butter brownies, you have to read a 10-page diatribe written just to come up in the search rankings and potentially get affiliate links, which makes the experience of a human reading a web page suck.
And there's always some of the better sites now have the jump to recipe button at the top because they know what's up.
But at the same time, it's why do we go through this ridiculous theater piece?
Because it's what our Google overlords built for us, you know? And now we
experience that in cloud factories because we get to play with Kubernetes. How lovely of them for
doing these games. It's always appreciated. Well, what can I say? It's, you know, it makes our lives
relatively easier as opposed to when we had to thumb through recipe cards and when I could just,
you know, bootstrap, install, you know, whatever OS I felt like at the time and get something running at home.
It's a reasonable approach to take. But I guess what I'm curious about is how you
perceive that shift, though, because I've met an awful lot of technologists over the course
of my career who start to identify themselves by the technology upon which they're working.
And I'm not immune from this.
I think of myself these days as an AWS guy to some extent. And before that, I was an email systems
guy. And reinventing the way that you perceive yourself is never easy. You know, I still perceive
myself as somebody who just, like you say, and like you do, parachuted into a site, tried to
figure out what was wrong, and mostly just try to make things better for the other people running it. Because I've said this before a thousand times, and I'll
say it again, software is made of people. We are all here together and we do what we do as a
collective. You know, open source projects, yes, there's occasionally the one lone guy in Nebraska, a la XKCD, who's maintaining a very important core project.
But a lot of projects out there and a lot of companies, well, all companies out there, are building it as a group, as people, as many people. And if we can make that experience for our peers, for our colleagues, for whoever you're working with
better, then we all get better at writing the software, at building the systems,
at making things better. So that's what I pride myself in. And that one thing has never changed
for me. I've picked up multiple languages. I've dived into multiple different environments.
I'm comfortable in multiple operating systems.
But the reality is that we're all people.
We all do what people do.
And if I can at least just be empathetic and be as human as I can and try and understand
that you're human too, you just want to read a simple doc that tells you how to start and
stop the service. You just want to read a simple dashboard that can tell you what's wrong. And
you don't want to get paged in the middle of the night at something stupid and pointless that had
no reason to page you. Every human wants that. Every human engineer wants that. I mean, granted,
there may be exceptions to that case. I have known masochists who just want to alert on everything because they don't know what's
going on and they'd rather be woken up and find out.
And 90% of the time, they wake up, they look at the alert, they say, oh, this is nothing,
they crush the alert, they go back to sleep.
And then the next person comes on call and goes, what the holy hell?
And I care about both of those people similarly. I think empathy is one of those
core attributes to being a competent technologist. And I have no idea how you teach it. I feel like
it's something you either have or you don't. I feel like the significant bulk of us have it.
We just don't often know what to do with it. You know, sometimes we learn how not to be empathetic.
Sometimes we're psychopaths and we just innately don't have it.
But I believe those are the exceptions.
You know, in reality, we're all empathetic people.
And if we can tap into that empathy and help make other people's lives better as a result,
then that's what we should be doing.
This is in part why up here in my small town, I tried to help start a tech meetup out here
because there's so many people around here.
There's a local university, a local community college, and a whole lot of other people who
are just career changers, who are just interested in trying to learn about the technology, not only because they find it fascinating, but they see it as a career path forward,
hopefully as long as AI doesn't destroy everything.
I used to be fairly active in the, I guess, helping the next generation figure out how
to navigate the world of tech. And I've gotten away from it just because it's been so long since
I was new to the space that I worry I would give boomer-tier advice of,
oh, just have a strong handshake
and walk in with a resume printed on nice paper,
ask to speak to the owner,
and you'll have a job by dark, which does not work.
I don't know how to get started in technology
in the current system.
I know a lot about how to get started in technology
in the early 2000s,
but that apparently is not a highly useful skill.
No, absolutely not. Although traces of it still are like, you know, yes, you can't just, you know,
walk in and be bold, but having a level of confidence shows through to the other people
you're talking to. When you're talking to a recruiter, when you're talking to a hiring
manager, if you can say, hey, I may not know everything, but I know how to do these things well, and I know how to figure
out what I don't know. And it's funny because one other person in our little group here of my local
meetup has finally achieved something that I had been hoping for. And of course, I'm leaving
location out. I'm leaving people nameless and all that to protect the innocent. You know, this young man had been
doing hack jobs on Fiverr to try and boost his skills on top of working a simple retail job
and got enough chops together after a while that he cleared an interview for a local company. Now, it's
not that huge. It's writing some JavaScript tests, but it's a start. And if that's what
gives him the foot in the door that he needs to build a career, then I feel 100% vindicated in
everything that I've ever done to try and build a community out here.
What worries me is the future of that story. When I first played with ChatGPT and it spat out
a quick hacked together script to query NAT gateway prices across different AWS regions,
the response that I got instantly from a couple of senior devs was, oh, well, this is fantastic,
but it's only for junior dev work. It'll never take the place of a senior engineer. And it's
great. Where is it that you believe senior engineers come from? You didn't just show up one day knowing all the
stuff that you know now, it was incremental. What does this mean for the next generation?
And people don't really have a good answer for that yet.
No, nobody has the crystal ball right now, unfortunately. And I wish we did because I'd love to be able to say, here's what's coming. Now, I have high hopes that we're still going to need humans in order to actually build
large systems because large systems are not easily intuited.
You know, as much as other talking heads out there would like you to believe, oh, Twitter
is just small globs of characters ordered in a timeline, right?
Twitter sounds like the easiest problem in the world.
Oh, I could build that in a weekend until you actually think about it for 30 seconds.
Well, you could build it in a weekend to serve like 10 users.
Here at the Duckbill Group, one of the things we do with, you know, my day job is we help negotiate AWS contracts.
We just recently crossed $5 billion of contract value negotiated. It solves for fun problems,
such as how do you know that your contract that you have with AWS is the best deal you can get?
How do you know you're not leaving money on the table? How do you know that you're not doing what I do on this
podcast and on Twitter constantly and sticking your foot in your mouth? To learn more, come chat
at duckbillgroup.com. Optionally, I will also do podcast voice when we talk about it. Again,
that's duckbillgroup.com. I have a question about Twitter. Since you were there during the acquisition for a bit before the fall, everyone that I know in this space, and we didn't talk to you folks because we didn't want to compromise any of the folks who were working there and trying to hold on to a job.
But a lot of us predicted that Twitter itself would basically fall over one day and have a lot of trouble getting back up.
And that never happened. Do you have any insight into why that might've been like,
well, how did we all get it wrong? I almost want to do a post-mortem on how the SRE community
got it wrong. I'm in a little chat group with a bunch of other former SREs from Twitter. And we
have talked about this a time or two, and we attributed that a lot to the work that
we had done. Because those of us who are SREs, we don't just think about, you know, what's going on
right now. We often think about what's going on in the future. How do I make sure my service doesn't
completely tip over? You know, the first hammer fell and cut off half the company and then another half of the remaining
company all right in November before the holidays. And I believe it was the week between Christmas
and New Year's that Elon said, oh, we don't need Sacramento data center.
And you would expect that to end as hilariously as it sounds, but somehow they pulled it off.
Somehow they pulled it off.
Now, granted, Twitter had for that whole year of 2023 stumbled a lot.
The down detector had been going bonkers on Twitter.
Things had been falling over.
The site just didn't always want to work. So I attribute partly
the work that we had done to shore up the service for the long term. Now, the other thing that I can
think of as maybe just the reduced user count, because I know people had been leaving the site in droves. But I don't know.
I honestly haven't looked at, you know, any whatever stat count to see what the daily
active users are, what the, of course, none of that stuff is public anymore because they
don't have to report to the SEC anymore because they're a privately held company.
Thanks, dudes.
A lot of it does make sense in that when I was building systems, I always wanted to make
sure they were well documented and the interfaces were easily understood. The idea that Twitter learned pretty early on in the course of its life was one
of graceful degradation. Instead of showing the fail whale when things started breaking,
okay, maybe you just don't reload the timeline as rapidly, or you put the eventual in eventual
consistency. That tends to be a failure mode that is less noticeable, and it stops treating the
service as a binary, is it up or is it down, and instead views it, how down is
it? Once you unlock those graceful degradation modes, that's kind of awesome. I'm still surprised
there weren't a whole bunch of issues that coincided with certificate expiries and whatnot,
but apparently there's still enough talent left there to keep the lights on.
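The graceful-degradation idea Corey describes, serving something stale instead of the fail whale, can be sketched as a fallback wrapper. This is a minimal sketch; the class, the freshness window, and the response shape are all illustrative, not Twitter's actual design:

```python
# Sketch of graceful degradation: when the backend fails, fall back to the
# last known-good response instead of going binary up/down. All names here
# are illustrative.

import time

class TimelineService:
    def __init__(self, fetch_fn, max_stale_seconds=300):
        self.fetch_fn = fetch_fn          # talks to the real backend
        self.max_stale = max_stale_seconds
        self._cache = None                # (timestamp, payload)

    def get_timeline(self, user):
        try:
            payload = self.fetch_fn(user)
            self._cache = (time.time(), payload)
            return {"fresh": True, "items": payload}
        except Exception:
            # Degrade instead of failing hard: serve the stale copy if it's
            # recent enough, otherwise admit defeat.
            if self._cache and time.time() - self._cache[0] < self.max_stale:
                return {"fresh": False, "items": self._cache[1]}
            raise
```

The point is the shape of the failure: "how down is it?" becomes a spectrum (fresh, stale, unavailable) rather than a binary.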
I'm glad you mentioned certificate expiries because that's what I worked on. That's what my, you know, I was on that team for, I want to say like four years, I think,
where we managed the distribution of internal certificates and public PKIs and all that stuff.
And we automated the shit out of that.
It's the dumbest outage in the world, because it's highly visible that there's a certificate
that just expired when someone can get to it with their browser. It's one of those things of,
you should have known this was coming. We have this fancy technology called calendar reminders.
So the idea of automated certificate renewal is huge. I think it was a poor
decision in the 90s to have a certificate that expired 15 minutes ago have the exact same failure mode
as a man-in-the-middle attack,
but that's a battle long since lost.
Well, it was also relatively simple to just say,
you know, a Java application loads the file on disk
at start time.
So at that point,
you can do whatever you want to the file.
So we had automated systems that just went in and said,
that cert is due to expire in X amount of time.
Let's just swap it out.
All right.
So you'd have X number of days before it expired.
And service owners should theoretically know,
restart your service within X number of days and life's good.
Now, what you can do is have a failure state that says,
oh, I've never restarted my service, but this cert's expired.
Maybe I should die.
And then it dies.
And then whatever container system you're using restarts the service for you because a service died.
You do that and then voila, automation happens.
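The die-before-expiry failure state Brian describes can be sketched as a check the service runs against its own loaded certificate. This is a minimal sketch of the pattern, not Twitter's actual tooling; the margin and function names are assumptions:

```python
# Sketch of the pattern described above: a service that loaded its cert at
# start time checks how close that cert is to expiry, and exits if it's too
# close -- letting the container orchestrator restart it, which re-reads the
# freshly rotated cert from disk. Names and thresholds are illustrative.

import sys
from datetime import datetime, timedelta, timezone

RESTART_MARGIN = timedelta(days=7)  # die this long before the cert expires

def should_self_terminate(cert_not_after: datetime, now=None) -> bool:
    """True when the loaded cert is within the restart margin of expiring."""
    now = now or datetime.now(timezone.utc)
    return cert_not_after - now < RESTART_MARGIN

def liveness_check(cert_not_after: datetime):
    if should_self_terminate(cert_not_after):
        # Exiting non-zero makes the orchestrator (Mesos, Kubernetes, etc.)
        # restart the task, which picks up the already-rotated cert file.
        sys.exit("certificate near expiry; restarting to pick up new cert")
```

The design choice is that the renewal automation only has to rotate the file on disk; the restart is driven by the service noticing its own stale in-memory copy.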
These are the kinds of things that we thought of collectively at Twitter for years
in order to keep things up and running smoothly. So that way, as much as possible, all of the pain
in the butt things that everybody had to deal with could just be on autopilot. Oh, I just restart my
service. Cool. It's the right approach. It's why I love what Let's Encrypt has done, where the maximum
cert validity is 90 days. Because people go through an outage like that, and it's like, oh crap, let's build
a cert that has a 10-year expiry. Great. Which I understand from a human perspective: this was
painful, let's make sure we're not going to deal with this again anytime soon.
But when you have, like, a wildcard cert, God help you, that is good for the next 10 years, you'll never be able to trace all the places that it winds up in the next decade.
So when that does hit expiry, everything is going to break and it becomes a massive issue.
Whereas if you do the painful things and scary things more frequently and it makes them routine, yeah, I have a bunch of systems now that auto-roll certificates programmatically and I never have to think about it until and unless I'm doing something clever.
Yeah. Well, I know a lot of people have talked about the golden path. The golden path being
where you want everybody to go in order to get to that destination. That destination being
a running service that makes us all money so that we can all pay our rent and eat food.
So if you make that golden path as easy to walk as possible, then people will naturally go there.
You know, and I say that knowing full well, that's one of those Pareto principle things you run into. Because multiple times in my career, I have run through mass migrations where I chase down large numbers of people at a large company in order to get them to do a thing.
You know, here, this is going to take you two hours to do.
This is going to take you 10 minutes to do.
We just need you to do it.
I will show you how to do it.
I will do it for you if you're willing to let me.
So on and so on.
You know, the bulk of people are just like, oh, cool. We love it. I will do it for you if you're willing to let me. So on and so on. You know, the bulk of
people are just like, oh, cool. We love it. Sure. And then you get to that last 20%. And even worse,
you get to that last 3%. And those last people are like, you want me to restart? I'm not sure
we know how. That's one of the things I learned from my Kubernetes cluster, because it's, okay, great, I have everything on a bunch of Raspberry Pis plugged into the same power
supply, and when that thing gets jostled and loses power, okay, how do you safely bring up an entire
cluster? We didn't think about that, because why would you ever turn off cloud instances all at
once? Oh no. Oh dear.
Because, again, this comes back to the ancient sysadmin wisdom. Once I had my cluster
built out, one of the first things I did was yank the power cord out of the back of one of the nodes
like I was rip-starting a lawnmower, just so I could see what the recovery looked like. And it
turns out, even with a lot of extra work, it just never comes back, which, okay, that's a little
disturbing. It all comes down to Longhorn, the disk system I'm using, because EBS is a marvel that
people do not give enough credence to, because managing disk volumes in a distributed fashion is super hard.
And this is why people pay AWS, GCP, and Azure tons of money.
Tons of money.
Because managing Kubernetes sucks on its own.
Managing an EBS equivalent yourself, I 100% agree, sucks even worse.
At least for home labbing stuff, you could do a TrueNAS, which has all the right APIs
for doing that, which makes that a lot easier.
Oh, yeah.
There are a lot of options you have, but it's also stuff that I run that is only production
adjacent.
Like my RSS reader lives on top of this thing.
My change detection bot that winds up validating at different websites have these things changed
and showing me what happens.
I have a bunch of container stuff that I've thrown together in here,
but if the entire thing blows up and falls into the sea, I still have a bunch of options that do not preclude me from getting my work done. Yes, yes, yes. I get that. You know, it's funny. I
think about this in the real world too. I have a pantry full of home canned soup. No, I'm not a super prepper.
I just like doing it. But it's great because, you know, where I live, it can get inclement weather.
So if the roads shut down, I have four days worth of food in the house just in case.
And this was just because of how I learned how to live growing up. You know, I grew up in another
mountain town
and the roads would routinely close. So we would have routinely a couple of weeks of food in the
house. And if the power went out, we could pull out a camp stove and warm up a can of soup.
I just like homemade soup better than Campbell's. It's the right answer. I wish more people thought
about these things and did a little bit of planning ahead. Like, oh, they start forecasting inclement weather. You don't need to do a run to the store with everyone else
necessarily. And that's exactly how SRE generally works in my mind as well. You're not building
something for the normal day to day. Actually, no, that's not true. You're building stuff for
the normal day to day, but you are also building stuff for the day when everything catches fire.
A lot of work that I did on a lot of my different teams and products that I had worked on was not just to say, OK, everything is burning to the ground.
How are we surviving? A lot of what I have done is saying, let's make deploys easier so that we don't have to think about it.
So one thing that's kind of on my brag sheet is I worked with a couple of different teams, both my own team and the core services team at Ye Olde Twitter to help build out a process for continuously deploying an RPM. Now, this is often
not something you want to do in production environments.
Not without some gating or some really great automated testing.
Oh, yeah. And that's what we did. We made sure that we had a good process for gating and versioning, for easy push-button rollback, for hard versioning,
because originally my first version of this was just saying, yada-da, latest, whatever,
which is never a good scenario. So why the hell are you doing it with your RPMs?
So we came up with this process. We pinned the version into a Hiera file for Puppet. We read
that out of a config file from elsewhere so that way another automation surface could stamp it in
and tied it all together in a Jenkins script that would then pull all the right stuff together,
auto-stamp a version, and then ratchet up a FQDN hash percentage number. So that way you could say,
let's roll this new version to 1% of the fleet and see how it does. Let's roll it to 10% of the
fleet and see how it does. And once we got that machine well-oiled and well-lubricated, and mind
you, this was a process that took like maybe three to six months to build on top of doing other things.
And then another three to six months to gain enough confidence in it that we could just pull the brakes off and say, let's let it go.
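The FQDN-hash percentage trick is worth pausing on: hash each hostname into a stable bucket, then compare that bucket against the current rollout percentage. Here is a rough sketch in Python; the hash choice and hostnames are purely illustrative, not anything Twitter actually ran:

```python
import hashlib

def in_rollout(fqdn: str, percent: int) -> bool:
    """Deterministically decide whether a host is in the rollout cohort.

    Hashing the FQDN gives every host a stable bucket from 0-99,
    so ratcheting `percent` from 1 to 10 to 100 only ever adds hosts:
    a machine that got the new version at 1% keeps it at 10%.
    """
    bucket = int(hashlib.md5(fqdn.encode()).hexdigest(), 16) % 100
    return bucket < percent

# Ratcheting the percentage up widens the cohort monotonically.
fleet = [f"web{i:03d}.example.com" for i in range(1000)]
cohort_1 = {h for h in fleet if in_rollout(h, 1)}
cohort_10 = {h for h in fleet if in_rollout(h, 10)}
assert cohort_1 <= cohort_10  # 1% cohort is a subset of the 10% cohort
```

The payoff is that the rollout needs no shared state: every Puppet run can compute its own membership independently and they all agree.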
And the biggest noise that we ran into was that we would ratchet the version forward faster than all of the RPM
masters could sync. So occasionally, a puppet run would go through, would talk to a yum repo
that didn't have the newest version because we literally just shoved it out there. And we actually
got some feedback from the team that managed that saying,
oh, yeah, we are having some problems with a couple of these.
And I said, what can I do to help?
And he said, well, maybe don't roll out so fast.
So I added extra steps to then say, let's not roll through
and just look through and see, did all the yum repos sync?
Because you could just probe it all in a loop
and then just come back and wait a minute
and probe it all, blah, blah, blah, blah, minutia, minutia, minutia. We got it working.
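That probe-and-wait step can be sketched as a small gate in the deploy pipeline. Everything below is a hypothetical reconstruction: the `has_version` callback stands in for whatever actually checked the mirrors' repodata (a `repoquery` call, an HTTP fetch of the metadata, and so on):

```python
import time

def wait_for_mirror_sync(mirrors, has_version,
                         poll_seconds=60, timeout_seconds=1800):
    """Block a rollout step until every yum mirror serves the new version.

    `has_version(mirror)` is a caller-supplied probe returning True once
    that mirror has synced the freshly pushed RPM.
    """
    deadline = time.monotonic() + timeout_seconds
    pending = set(mirrors)
    while True:
        # Re-probe only the mirrors that have not caught up yet.
        pending = {m for m in pending if not has_version(m)}
        if not pending:
            return  # every mirror has the new RPM; safe to ratchet forward
        if time.monotonic() > deadline:
            raise TimeoutError(f"mirrors never synced: {sorted(pending)}")
        time.sleep(poll_seconds)
```

Gating the version ratchet on this loop is exactly the fix described above: the deploy simply refuses to advance until every repo a Puppet run might talk to has the new package.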
And again, software is made of people. I was able to do that because I had good relationships with
the people on my team and the people on those other teams so that we could talk about these
things like humans. Which is a reasonable and grown-up way to approach it.
Yeah, because it's one thing to walk up and say,
I don't give two shits about what your job is.
I have to get this done, which is not the way.
That's not how you win friends and influence people.
No, it's not.
Instead, you walk up and say, well, I'd like to get this done.
How do you think we can do this?
You know, I'm here playing in your pool.
I don't want to pee in your pool.
I want to do this right.
Exactly.
With the unspoken thing being, look, at some point this has to get done. And so you, at some point, either have to lead, follow, or get out of the way. I would love to collaborate with you on this for a better outcome for everyone.
Right.
And at the end of the day,
this can be copy pasted out to make everybody else's life easier.
You know,
lots of carrots, lots of hugs and lots of golden
stars and all that. The stick may be back there somewhere else, but don't even think about it.
Be people, be human. We're all here to just take care of each other. So let's do that.
I want to thank you for taking the time to chat with me about all this. If people want to learn
more about what you're up to, where's the best place for them to find you these days?
I feel like I should re-step up my social media game because I was a lot more active on ye olde Twitter before it became something not Twitter.
I have migrated entirely to Bluesky, and it's like Twitter of old in a lot of ways.
It's great.
That's what it looks like.
All right, well, in the meanwhile,
you can find me on the LinkedIn.
And we will, of course, put a link to that
in the show notes.
Thank you so much for taking the time to speak with me.
I appreciate it.
More than happy to, Corey.
Thank you.
Brian Weber, longtime friend and mentor.
I'm cloud economist, Corey Quinn,
and this is Screaming in the Cloud.
If you've enjoyed this podcast,
please leave a five-star review
on your podcast platform of choice.
Whereas if you've hated this podcast,
please leave a five-star review
on your podcast platform of choice,
along with an angry, insulting comment
telling us that we must be idiots
because clearly setting up storage for Kubernetes
in a home environment couldn't possibly be that hard.