Command Line Heroes - DevOps: Tear Down That Wall

Episode Date: February 13, 2018

As the race to deliver applications ramps up, the wall between development and operations comes crashing down. When it does, those on both sides learn to work together like never before. But what is DevOps, really? Developer guests, including Microsoft’s Scott Hanselman and Cindy Sridharan (better known as @copyconstruct), think about DevOps as a practice from their side of the wall, while members from various operations teams explain what they’ve been working to defend. Differences remain, but with DevOps, teams are working better than ever. And this episode explores why that matters for the command line heroes of tomorrow. Read Cindy Sridharan's attempt to demystify DevOps. And check out Gordon Haff's take on how to improve DevOps.

Transcript
Starting point is 00:00:00 I want you to imagine a wall. The wall stretches as far as you can see to the right and all the way off to the left. It's taller than you. You can't see over it. And you know there are people on the other side. Lots of people. But you just don't know if they're anything like you. Are they enemies or friends?
Starting point is 00:00:30 Developers created their code and threw it over the wall to operations, and then it was operations' problem. Just doing whatever they feel like, not really caring about the quality of the service. These two sides have almost opposing jobs. One to make changes and one to resist those changes as much as possible. But they're not on the same page about what it actually is they're trying to achieve. I'm Saron Yitbarek and this is Command Line Heroes, an original podcast from Red Hat. Episode 4, DevOps, tear down that wall. So yeah, for decades, the IT world was defined by that division of roles.
Starting point is 00:01:09 You had developers on one side. They were incentivized to create as much change as quickly as possible. And then you had the operations team on the other side. They were incentivized to prevent too much change from happening. In the meantime, code was getting tossed blindly over that wall with no real empathy or communication between these two worlds. What would it take to tear down a wall like that? It would take a seismic shift.
Starting point is 00:01:41 Last episode, we heard how new Agile methodologies were making it possible to produce constant, iterative improvements. And that was great. But with change comes unintended consequences. Agile increased the rate of changes we were making. But then suddenly, all that throwing code over the wall and hoping for the best, it just wasn't fast enough anymore. In our little silos, we were comfy with the way things were. But siloed people can't get things done as fast as they should. We'd put a speed limit on ourselves because we weren't working together.
Starting point is 00:02:20 And that speed limit was getting to be more and more of a problem because... It's all about faster time to market, increased agility, doing more iterative rather than longer term big pieces of work. Richard Henshaw is an Ansible product manager. You know, I remember the days when you put in an order for a server and it turned up four months later. Everything was converged together. So the entire stack was one thing and it took years for those to be designed and built. That doesn't fly anymore. And it's just
Starting point is 00:02:49 disappeared to the point that it's just throw something up, try it, bring it back down again for a lot of organizations. These days, a company like Amazon will deploy new code several times every minute. Imagine trying to get that done using some step-by-step waterfall workflow. It's just impossible. Soon enough, those ops concerns about stability, security, reliability would get pushed to the side in favor of moving fast. Developers, meanwhile, didn't see it as their responsibility to produce code that worked in the real world. Developers had little interest in stability and security issues, but those are very real issues that we need to address.
Starting point is 00:03:29 So we end up with a lot of needless revisions down the pipe, back and forth across the divide. Think how much that division can slow a company down. Think how inefficient that could get. But developers were rarely encouraged to look beyond their own command line. The size of their directories would just grow and grow and they would never clean up. They wouldn't be able to get any work done without cleaning up. Sandra Henry-Stocker is a retired sysadmin who writes for the IDG magazines. So I was kind of often having to be a nag saying,
Starting point is 00:04:03 hey, look, you know, you're using this much disk space, isn't there something you can get rid of, you know, so that we have more space to work because we're running out of space on this server? And yeah, we'd go through that a lot. Ultimately, this is a mindset problem. This divisive attitude between developers and operations where one didn't have to understand the concerns of the other, well, in the past, that had been just fine. But as speed came at a premium, that culture became more and more unstable.
Starting point is 00:04:33 Being siloed in your own work bubble was just way too inefficient. Jonah Horowitz works for the reliability engineering team at Stripe. He describes how, even if developers and operations had wanted to work together, they couldn't have because, in a sense, they'd been placed on opposite teams. The operations team is often measured by uptime and reliability. And one of the biggest ways to increase uptime is to decrease the amount of change in the system. But of course, releasing new features is changing the system. And the software engineers who are doing product work are incentivized to ship as many features as quickly as possible. So you set up this conflict between dev and ops when
Starting point is 00:05:21 you've got these separate roles. Developers committed to building features. Operations committed to keeping the site working. Two goals at odds with each other. But, like I said, because of the increasing need for speed, for iterative rapid-fire releases, this disconnect between dev and ops was reaching a crisis point. And something had to give. Around 2009, the wall dividing dev and ops
Starting point is 00:05:56 was looking a lot more like a prison wall than anything else. What we needed was a new methodology that would smooth the transition from development to operations, allowing both sides to work in a faster, more holistic way. Patrick Debois, CTO of the video platform Small Town Heroes, launched a conference for people who wanted to tear down that wall. He called his brainchild DevOps Days. He shortened it to DevOps for the hashtag.
Starting point is 00:06:24 And thus the movement was given a name. But a name is not a process. It was clear why DevOps was needed, but how would it work? How are we supposed to bring dev and ops together without starting a war? Thankfully, I have Scott Hanselman to walk me through this. Scott's the Principal Program Manager for .NET and ASP.NET at Microsoft. So, Scott, I've known you for, I feel like I've known you for forever. Definitely a few years. Forever. And I want to talk to you about the relationship between being a developer and what DevOps has looked like over the years.
Starting point is 00:07:02 How does that sound? Yeah, that sounds like a plan. Okay. So I think a good place to start is just defining what DevOps is. How would you describe it? The Wikipedia from 2008 that defines DevOps is actually very good. So it's a set of practices that is intended to reduce the time between committing a change and that change going into production while ensuring quality. So if you think about, hey, I checked in some code, it's Tuesday, and that'll be going out in the June release. Right? That sucks.
Starting point is 00:07:40 That would be not continuous integration. That would be a couple times a year integration. If you have a good, healthy DevOps system, if you've done a set of practices, then you are going to be continuously integrating into production. So it's what can you do, what best practices can you define, can you create, that will allow you to get it? So I
Starting point is 00:08:06 checked in some code on Tuesday and it's in production on Thursday. Now here's the important part. Pause for effect while ensuring high quality. So what's really interesting about that definition is it's a set of practices, but I feel like when I hear people talk about DevOps, it's a little bit more crystallized, I guess. They talk about it like it's a role, a job, a position, a title. Does that conflict with the idea that it's a set of practices? I think that when a new set of practices or a new buzzword comes out, people like to put it on a business card. No disrespect to people who are like listening to this podcast and now are offended and looking at their business cards. This sucks. And now they're going to like, I don't know, slam their laptop shut and rage quit this podcast. There was a really great thread by
Starting point is 00:08:59 Brian Guthrie, who is a ThoughtWorker and worked at SoundCloud, and he talked about DevOps and he said that DevOps is a set of practices, period. It's not a job title, it's not a software tool, it's not a thing you install, it's not a team name, and the way he phrased it was, it's not magic enterprise fairy dust. If you don't have best practices, if you don't have good practices, you have no DevOps. So it's more a mindset than it is putting out a job title and like, we're going to hire DevOps engineers. And then we're going to sprinkle these magical DevOps engineers into the organization without the organization having organizational willpower and buying into the mindset of DevOps. So if you think it's a toolkit or a thing you install,
Starting point is 00:09:45 then you've missed the point. Okay, so let's go back in time. Before DevOps was a term, before we had DevOps on our business cards or talked about it as a set of practices 10 years ago, how would you describe the relationship between developers and those people who were on the ops side of things? It was rather combative. Like the people in ops controlled production and developers never got near production. We were on different sides of a wall that was an opaque
Starting point is 00:10:20 wall. And we over in development tried as much as we could to make something that looked like production, but you never actually, it never looks like production. So we had a couple of issues. We had development environments that didn't look or feel or smell like production. So inevitably you'd have those, hey, it works different in production than it does in development kind of environments. And then the distance between the check-in and when it got into production was weeks and weeks and weeks. So your brain wasn't even in the right headspace because I worked on that feature in January and it's just now rolling out in April.
Starting point is 00:10:59 So then when the bug inevitably comes down, it's not going to be fixed until June. And I don't even remember what we were talking about, you know? So people in ops, it was almost like their job was to consciously slow us down. They existed to make developers slower. And then of course, they felt that we wanted to break production at all times. So why was it like that? Was it just a fundamental misunderstanding of what developers wanted and were trying to do? Was it a trust issue? Why was it so combative? I think that you nailed that. You answered it all correctly. There was a trust issue.
Starting point is 00:11:38 There was a sense, I think, that developers thought they were special or somehow better than IT people. And IT people thought that developers had no respect for production. So I think that that culture came kind of from the top, the idea that we were different orgs and that somehow our goals were different. I think that there's some maturity that's happened in software where we all realize that we write software in order to move the business forward, whatever that business is. So that sense of we're all pushing in the right direction, you know, but it was definitely trust because, you know, DevOps engineers don't trust product engineers to deploy, right? And no one understood the deployment process and people trusted only themselves. And they also, um, like, I only trust myself to go into production. I can't trust Saron to go into production. She doesn't know what she's talking
Starting point is 00:12:31 about. I'll do it. Um, so if no one truly understood the system, like, the idea of a full stack engineer was a mythic thing. But now we're starting to think about the whole stack as an organization. We've had terms like full product ownership, and the Agile methodology has come along saying that everyone should own the product, and that sense of community ownership and community around the code all slowly changes things to bring an environment of trust. I'm Saron Yitbarek, and you're listening to Command Line Heroes, an original podcast from Red Hat.
Starting point is 00:13:23 So for DevOps to hit its potential, we were going to need a lot of trust on both sides. And that means a lot more communication. Back to Richard Henshaw. He sees empathy for both sides as the cornerstone of DevOps. Some of the DevOps practitioners, some of the really good ones, have done both roles. And I think that is where the real power comes, is when people actually get to do both roles rather than just seeing the other side. So you don't keep the separation.
Starting point is 00:13:53 You actually, you know, you are going to live in their shoes for a period of time. And I think that gives, that's what brings the empathy back. Now, this isn't just communication for the sake of warm fuzzies. What Richard is describing is the industry swerving toward that focus Scott mentioned, a focus on continuous integration. Software was going to be not just written and released in small rapid-fire batches, but also tested in small rapid-fire batches. And that meant developers needed instant feedback on the code they were writing and how it would perform in the real world. As time to market shrank from months to days to hours, we cast around for a new set of tools that could automate any element that could be automated. You really need a whole new ecosystem of tooling to do DevOps most effectively. Gordon Haff is a senior manager at Red Hat. What we see is this huge collection of new types of tooling and platforms that DevOps
Starting point is 00:14:53 can make use of, and they're really all coming out of open source. Gordon's right. The collection of new tools is huge. And he's right about the open source angle, too. The growth of automation tools never would have been possible in a strictly proprietary system. A lot of monitoring tools out there. Prometheus is a common one. Istio for service orchestration is starting to interest a lot of people, so that's out there.
Starting point is 00:15:25 GitHub lets you track changes. PagerDuty manages digital operations. NFS mounts file systems across a network. Jenkins lets you automate testing on your build. So many tools, so much automation. The end result? Developers can push their changes live, the build is automatically created,
Starting point is 00:15:43 compilation is managed, and automated tests are run against it. Sandra Henry-Stocker describes what a change this made. So I could take something that I was working on and rapidly deploy it, and I could control many systems just from the command line on one, rather than having to work at a lot of different places or wonder how I was going to get something that I was working on sent across a network and deploy it on a lot of different machines. It became easier to basically sit in one spot and yet make my changes across a wide range of computer systems. Automation tools had solved the speed problem.
Starting point is 00:16:28 But I don't want us to just praise tools at the expense of the actual methodology. Scott Hanselman and I talked about that fine line. You started this conversation by saying DevOps is a set of practices. It's a mindset. It's a way of thinking. And it sounds like the tools that we created are the manifestation, the code version of the way we should be thinking and we should be operating. I love that. You're a genius. Exactly. We used to have the product owners write in these Word documents about how the code should work. They write the spec, right?
Starting point is 00:17:06 When was the last time a Word document broke the build? Right. Okay, partly I just wanted you to hear Scott calling me a genius. But I do think those tools are almost like symbols of our cultural shift. They encourage us to broaden our roles. We developers have been forced to look up, at least occasionally, from the command line. That way, the priorities of dev and ops partly come into alignment. In fact, what the rise of
Starting point is 00:17:32 DevOps has made clear is that in a world of ever-increasing speed, nobody can afford to remain siloed. Jonah Horowitz has worked for a number of Bay Area companies, including Quantcast and Netflix. He explains how even some of the largest companies in the world have reimagined their culture in this light. We had sort of this cultural buy-in from the entire company that was like, this is how we're going to deploy software. We're going to do it in these small batches. We're going to do it using these deployment procedures. I don't think DevOps can be, I don't think it can be successful if it's just being driven by the ops team. It has to be something that the management and leadership of the company buy into. And it's very much a cultural shift.
Starting point is 00:18:18 When McKinsey surveyed 800 CIOs and IT executives, 80% said they were implementing DevOps in some part of their organization, and more than half planned to implement it company-wide by 2020. Executives are realizing that automation tools ramp up the speed of delivery. These are the same people who used to be okay with having a pallet arrive in a data center and then have it sit there for a whole month before a new machine was brought online. Today, if you're waiting longer than 10 minutes to have something provisioned, you're doing something wrong.
Starting point is 00:18:51 With competitors hitting speeds like that, nobody can afford to be left behind. I can imagine that ops teams must have been nervous, handing all those tools over to developers. Ops was used to being the grown-up, and now they were supposed to hand over the keys to the car? Yikes. I think we developers are learning to move fast without breaking things. But as the dust settles on the DevOps revolution, the biggest changes may be for the ops team. Does DevOps actually threaten the role of operations? Is Dev using its shiny new tools
Starting point is 00:19:30 to eat ops? Cindy Sridharan is a developer who wrote a long investigative piece about all this. In your article, in your blog post, you mentioned that operations people were not necessarily happy with the way things were going. What was going on? What were you saying? Let's put it this way, right? The DevOps ideal was that, you know, responsibilities will be shared, right? Where, you know, developers and operations will have like, you know, more of a 50-50 split, you know, for really ensuring the holistic delivery of software, right? And I think a lot of the unhappiness from engineers, from operations engineers, stems from the fact that that is not really
Starting point is 00:20:10 the reality on the ground, right? And that they're still sort of like, they're still the ones who are always picking the short straw. They're still the ones who are sort of like, you know, always doing the grunt work. They're still the ones who are primarily shouldering responsibility for like actually running the applications, and the developers aren't necessarily doing enough.
Starting point is 00:20:25 The question will be a crucial one over the next few years. How Opsy is DevOps going to be? As we automate, does the role of ops get diminished or does it transform? Maybe the responsibilities of older ops
Starting point is 00:20:44 will get automated so their teams can focus on creating new services instead of just maintaining old ones. However the ops role evolves, this much is clear. The DevOps methodology is actually shaping the tech. And in turn, the tech is shaping the methodology. There's this amazing feedback loop. Culture makes the tools, and the tools reinforce the culture.
Starting point is 00:21:11 And in the end, that wall we described at the top of the episode, the one dividing dev from ops, I don't even know if the whole throw-your-code-over-the-wall analogy is going to make sense to a developer in five years. And that's sort of a great thing. Already, when I talk to folks today, I'm hearing a new story. Cloud architect Richard Henshaw. I think it is starting to make people realize what the other side of the equation was concerned about more. I've seen a lot more understanding. SysAdmin Jonah Horowitz. I think there is a craft to writing really good software.
Starting point is 00:21:48 And one thing that I see in the best developers that I work with is that they really, they push the craft of software engineering or software development forward. SysAdmin Sandra Henry-Stocker. I think that developers are becoming much more astute and much more careful. So they're constantly having to up their skills. And I know that takes a lot of work. It's a love-in. Turns out there were some friends on the other side of that wall.
Starting point is 00:22:24 Nice to meet you. So a confession. I always used to think DevOps was boring. Just a bunch of hardcore automation scripts and scaling issues. My resistance was partly just practical. As developers, every week there's some new tool coming out, some new framework. DevOps has been part of those scary, fast changes. But now, especially after hearing these stories, I get it. DevOps is more than its tools. It's how we can work together to build better products faster. And here's the good news.
Starting point is 00:22:57 As we develop new platforms for developers like you and me, my work is becoming better, faster, and more adaptive to different environments. The circle of interest can keep expanding too. You see people widening DevOps to include security, so we get SecDevOps, or they include business, so we get BizDevOps. The debate we're going to have now is, how important is it for a developer to understand not just how to use these tools, but how all that DevOps stuff even works? And how realistic is it to expect developers to understand that new world? The way we settle that debate is going to define the work of tomorrow's command line heroes. You might have noticed that in all that talk about tools and automation,
Starting point is 00:23:44 I left out some big ones. Well, I'm saving those for next time when all this DevOps automation hits light speed and we track the rise of containers. It's all in episode five. Command Line Heroes is an original podcast from Red Hat. For more information about this and past episodes, go to redhat.com slash command line heroes.
Starting point is 00:24:06 Once you're there, you can also sign up for our newsletter. And to get new episodes delivered automatically for free, make sure to subscribe to the show. Just search for Command Line Heroes in Apple Podcasts, Spotify, Google Play, CastBox, or however you get your podcasts. Then hit subscribe. So you'll be the first to know when new episodes are available. I'm Saron Yitbarek. Thanks for listening, and keep on coding.
