PurePerformance - 020 DevOps Stories, Practices and Outlooks with Gene Kim: Part 3
Episode Date: November 7, 2016
Gene Kim has been promoting a lot of the great DevOps Transformation stories from Unicorns (Innovators), but even more so from "The Horses" (Early Adopters). The next DOES (DevOps Enterprise Summit) is just... on its way, helping him with his mission to increase DevOps adoption across the IT world. In our three podcast sessions, we discussed the success factors of DevOps adoption, the reasons that lead to resistance, as well as how to best measure success and enforce feedback loops. Thanks, Gene, for allowing us to be part of transforming our IT world.
Related Link: Get a free digital 160-page DevOps Handbook excerpt:
http://itrevolution.com/handbook-excerpt?utm_source=PurePerformance&utm_medium=organic&utm_campaign=handbookexcerpt&utm_content=podcast
Transcript
It's time for Pure Performance.
Get your stopwatches ready.
It's time for Pure Performance with Andy Grabner and Brian Wilson.
Hello, everybody, and welcome back to Pure Performance.
This is part three of our interview with Gene Kim,
and I just learned he has a stopwatch that he's using, I think, to collect some timing metrics. So once again, hello, Andy.
Hello, Gene.
Let's get right back into it.
Hi, everybody.
Hello, Brian.
Hello, Andy.
Great to be back.
And Gene, thanks for sticking with us for three episodes here.
And I know you are getting ready for your DevOps Enterprise Summit next week.
That's why I promise we'll keep it short.
You have your stopwatch. You make sure that we are not running over time.
After all these great discussions in the first two sessions,
I want to touch on one topic that is part of The Phoenix Project,
obviously, and that I think is the big enabler of DevOps, which is feedback loops. The way I see
it, if you look at the First Way of The Phoenix Project, it's about getting rid of
obstacles, optimizing stuff that is not good enough, optimizing flow, getting
stuff out faster from dev to ops.
But then the feedback loop really tells us, the initial feedback loop,
did we build the right thing?
Or what's the impact of the stuff that we just pushed out there?
And so the question that I always get is, so what is the best metric?
What is the best feedback loop to get started?
We're obviously a performance monitoring company, so do I look
at response time? Do I
look at user experience?
What do I look at?
What is the basic feedback loop
that I need from
operations back
to development? Or who should actually
look at this feedback loop? Is it really
operations back to development? What other metrics do I need to consider
in order for everybody to be happy? So how do I get started with the feedback loop, and
what are the right metrics? That's the question that I always get, so I want to pass it on to
you. How would you answer it?
Yeah, wow, what an amazing surface
area to cover there.
But before we even talk about production metrics, I think there's this one question that we found predicts not only performance but also the presence of technical practices, of architecture, and of cultural norms.
And that one question is: on a scale of one to seven, how much do we fear doing deployments?
One is: we have no fear at all;
we do it all the time.
Seven is: we have existential fear of doing deployments; we're so afraid of doing them that we never do them.
And how organizations answer that question
predicts the entirety of IT and organizational performance,
levels of burnout, as well as presence of monitoring,
version control, continuous testing, and so forth.
So let's assume that we don't have that much fear of doing deployments.
So if we don't have fear about deploying,
then we can actually start optimizing around outcome measures.
One of the things I learned is just the importance, just the true importance of monitoring.
It's certainly one of the top predictors of performance.
And there's a great quote from Tom Limoncelli about alerting.
He said the mental exercise he runs is: what would happen if we just deleted all the alerts in our fault management and performance management systems,
and instead focused on how to prevent bad things from happening again, just from
a performance perspective?
So instead of asking, is the web server up or down,
what are the conditions that lead to the service falling over, like ever-increasing response time and so forth?
That was such an eye-opener for me, because it refocuses us on
what metrics we need to get earlier indication of problems, ideally long before the problem actually happens.
And if we can't do that, at least how do we enable quicker detection and recovery?
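To make that idea concrete, here's a minimal sketch in Python of watching a leading indicator instead of a binary up/down check: it fits a simple least-squares slope to recent response times and warns when they keep climbing. The class name, window size, and threshold are illustrative assumptions, not from any particular monitoring product.

```python
from collections import deque

# A minimal sketch: alert on a leading indicator (steadily rising
# response times) rather than on a binary "server down" event.
# Window size and slope threshold are illustrative assumptions.
class ResponseTimeTrend:
    def __init__(self, window=20, slope_threshold_ms=5.0):
        self.samples = deque(maxlen=window)
        self.slope_threshold_ms = slope_threshold_ms

    def add_sample(self, response_time_ms):
        self.samples.append(response_time_ms)

    def trending_up(self):
        """Least-squares slope over the window, in ms per sample."""
        n = len(self.samples)
        if n < 2:
            return False
        xs = list(range(n))
        mean_x = sum(xs) / n
        mean_y = sum(self.samples) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.samples))
        var = sum((x - mean_x) ** 2 for x in xs)
        return (cov / var) > self.slope_threshold_ms

trend = ResponseTimeTrend()
for ms in [100, 105, 112, 120, 131, 145, 160, 178]:
    trend.add_sample(ms)
if trend.trending_up():
    print("response times are climbing -- investigate before the service falls over")
```

In practice you would feed this from your monitoring system and tune the window and threshold per service.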
The other thing that I learned was just the construct of the customer acquisition funnel.
One of the examples I love, and that frames this, comes from the founder of Intuit.
He said for their TurboTax property, they did 160 production experiments during the peak three months of their tax filing season.
The first time I read this, my reaction was: that's the dumbest idea I've ever heard.
Why would you do production changes during peak seasons?
The way I was trained in retailing is that we were so afraid of the holiday outage that we had a change freeze from October 1 to January 30th.
And so the reason they do this is in the next paragraph.
He said the business outcome of doing these experiments is that we were able to increase the conversion of our customer acquisition funnel by 50%. So the aha moment for me was, had they waited until April 16th after the U.S. tax filing
season ends, they could have lost their prospects and maybe even some of their customers to
the competition, never again to return.
This notion of a customer acquisition funnel shows up at Microsoft, too. Ed Blankenship is the product manager for a good chunk of the Visual Studio properties.
He said every development team is measured on their feature usage.
And so in the ideal, there's a funnel that each team tracks:
how many people are actually using their feature,
how many are using it periodically,
and what percentage of the customers are using it daily.
And by doing that, they can help optimize how they design their features, so that ideally
they're all being integrated into daily use and making their customers more productive.
So I think that gives a lot of guidance in terms of what things we want to monitor to help achieve our business goals.
So if we were, say, an e-commerce company,
we might be measuring how many times people click on a product page,
how many times they check out,
how many times they actually successfully complete the transaction.
And we want to make sure that those numbers have the highest yield possible.
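A funnel like the ones Gene describes boils down to a handful of counters and the conversion rates between them. Here's a minimal sketch in Python; the event names and counts are invented for illustration, not real Intuit or Microsoft telemetry.

```python
# A minimal sketch of a customer acquisition funnel. Event names and
# counts are invented for illustration.
funnel_steps = ["viewed_product_page", "added_to_cart",
                "started_checkout", "completed_purchase"]
counts = {
    "viewed_product_page": 50_000,
    "added_to_cart": 9_000,
    "started_checkout": 4_500,
    "completed_purchase": 3_600,
}

# Step-to-step conversion rates tell each team where prospects drop out.
for prev, step in zip(funnel_steps, funnel_steps[1:]):
    print(f"{prev} -> {step}: {counts[step] / counts[prev]:.1%}")

print(f"overall conversion: {counts[funnel_steps[-1]] / counts[funnel_steps[0]]:.1%}")
```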
I like that story from Microsoft
about the feature teams:
who can build the best feature for our users, so that users want to use my feature more than yours?
I think it's also a fun competition, and it spurs teams to become even more creative about how to make the feature even better.
No, totally.
In fact, Ed Blankenship, he'll be presenting at the DevOps Enterprise Summit next week.
One of the things that he shares is that it not only creates competition between teams;
I think it's like once a year they basically allow people to switch teams.
So each product manager has to basically go in front of all the engineers and pitch
the mission of their team.
It's like drafting the next team.
Yeah.
I got stuck on Joe's team.
And sometimes they can't switch. In The Phoenix Project, we have that character
named Brent, right?
I can't switch teams, because there's something in my head that only I know.
So they have to make sure that for all the Brents in the organization, they give them the help they need, so that the teams are not reliant upon them, and so that they can also work on more interesting things that they want to work on.
You know, speaking about the feedback loops, there was something in the book that you mentioned that just kind of blew my mind.
And I wanted to see if you could expand a little bit more on it.
The quote was the QA team being defined as the team responsible for ensuring that feedback loops exist to ensure the service functions as desired.
So this obviously sounds like a major shift in what the QA team is.
Can you explain that a little more?
Oh, yeah.
I think that quote comes from Elisabeth Hendrickson, one of my heroes. She's VP of Engineering at Pivotal for their big data products.
And just to tell the story behind the story, one of the things that blew me away, and this is so easily misinterpreted, but when I heard the story, I found it to be mind-blowing.
She described how she was a part of a group called the Los Altos Workshop on Software Testing.
In the early 2000s, this was a group of the best testers in
the game. And no consultants allowed; it was only people who were leading large software initiatives.
And they did 60-plus workshops. And one of the exercises they did was they asked everybody:
what was the dev-to-test ratio, or the dev-to-QA ratio, for your best project and your worst project?
And the surprising outcome was that people's worst projects
were highly correlated with very high tester-to-developer ratios.
In other words, sometimes a one-to-one ratio between dev and QA.
The even bigger surprise was that for people's best projects, the answer that kept on coming up over and over again was no testers.
They're not saying that testing is bad;
these are the best testers in the game.
Instead, it was suggesting that you get the best software quality when everybody knows that there is no one out there who will test your code for
you. Everybody is responsible for their own quality, because there's no other department
who will save them, in the same way as with security.
And so I think what the next generation of QA professionals
are saying is that our job is not to write
developers' tests, and it's not to execute those tests. Instead, it is: how do we
coach and consult with our development teams to help them integrate writing tests as
part of their daily work?
part of their daily work. And I think Adam Auerbach from Capital One, who you had on earlier, I think he is an enormous proponent and champion of this cause at Capital One.
And it means that increasingly our job is not to be the bottleneck.
Instead, it's really about creating these incredible testing capabilities within development teams. And maybe my last thought on this: in many
organizations there's this diaspora where the testing professionals are moving out
of a functional orientation and into the development teams, into the feature teams.
But ultimately, I think there's going to be something like the Spotify guild model, where the QA professionals will all be in one group and maybe be matrixed into the delivery teams.
Because, as someone once said, a QA professional will never learn what they need to learn hanging out with just developers.
They need to be hanging out with the best QA professionals in the organization.
And so that would lean more towards functional orientation
than being dispersed throughout the organization.
Interesting.
Let me add one more metric, one more feedback loop.
I know we are getting close on time.
I obviously agree with all the stuff you said earlier:
how often a feature is used, the conversion funnels, conversion rates, user experience.
More and more of the companies we talk to have their sole business based on software, especially software running in the cloud, meaning it's all running on hardware owned by Amazon, Google, Microsoft, whoever, with frameworks making it very easy to develop software and taking away a lot of the underlying pain of accessing a data store or scaling up and down.
We see a lot of developers now focusing on what they should do, which is creating value.
But we also see a lot of them fall into the trap of writing code on top of frameworks, running in a highly complex environment, that is actually not very efficient.
Thanks to the scalability options of these frameworks, or of platform-as-a-service, whatever it is that makes it scale globally, it doesn't seem like you have a problem.
But at the end of the month, you have to pay the bill for the CPU cycles and the storage.
So what I try to tell people nowadays: if you track your feature usage, your feature teams should track not only how many people are using a feature, how fast it is, and what the user experience is, but also how many CPU cycles it consumes.
How many database statements do you execute?
How many log statements do you create?
How many bytes do you send over the wire?
How big is your page?
It's basically combining the usage feedback loop with a resource budget.
I think the term resource budget was coined years ago in the testing community around web performance optimization:
how big can a page be,
and how many resources do we consume?
So I believe this is a big thing. Because if you make a feature change
and you push it out, and everybody's happy and super-duper
and usage goes up, but at the end of the month
you figure out that the feature used by 90% of your users
is now 50% more expensive to run on Amazon,
because you just made a bad change in Hibernate and now Hibernate is bypassing the cache, then this is obviously not good.
Because ultimately, the cost of the software will define the business success.
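One way to picture Andy's point: track a feature's resource counters alongside its usage, and flag a release when cost per use jumps past an agreed budget. This is a hypothetical sketch; the counter names, the numbers, and the 20% allowance are all invented for illustration.

```python
# Hypothetical sketch of a per-feature resource budget: compare per-use
# cost counters before and after a release, and flag the change when it
# blows the budget. All names and numbers are invented.
BUDGET_INCREASE = 0.20  # allow at most +20% cost per use (assumption)

def per_use(metrics, counter):
    return metrics[counter] / metrics["uses"]

before = {"uses": 90_000, "db_statements": 180_000, "cpu_ms": 4_500_000}
after  = {"uses": 95_000, "db_statements": 760_000, "cpu_ms": 5_000_000}

for counter in ("db_statements", "cpu_ms"):
    growth = per_use(after, counter) / per_use(before, counter) - 1
    if growth > BUDGET_INCREASE:
        print(f"{counter} per use grew {growth:.0%} -- "
              "check for things like a bypassed Hibernate cache")
```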
Right.
And I think this is something that I want to get out there to people.
Yeah, one of the many smart, brilliant things that John Allspaw said to a group of us, he put it like this:
if you have two development teams competing with each other, where the first team is just developers,
and on the second team you have developers and just one ops person,
he said he would bet on the second team every time.
When it comes to challenges like this, you need someone with some ops experience who can ask questions like: given this performance objective, are our existing frameworks good enough? Do we need to re-architect something different? Is this something where we need more I/O capability or CPU capability?
And I think that dimension of knowledge to bring to bear on the problem will create dramatic differences in performance.
If we were to do an experiment where you take team one and team two and toss that problem to them, I think
the team that wins will definitely be the one
that has more operational experience
to help with the achievement
of goals. So I agree with you 100%.
And I've also been working and talking a lot with Adam
from Capital One about his approach
of shifting performance left,
and we see this now all over: actually looking at these metrics as soon as a developer makes a code change and you start running your unit tests and your functional tests.
You not only look at functionality, but also at these resource metrics: hey, how many round trips do you make to the data store?
Even if the data store is mocked away in a CI environment, that's okay. I can still count
how often this code now
accesses this particular service.
And then knowing that this service
in production will actually be
on a server on the other side of the world,
where I need to pay for every single byte,
is something that allows you
to actually stop bad ideas early in the pipeline.
I always say: stop bad ideas and code changes early.
And I think we need to look at some of these metrics.
And they are obviously feedback loops.
So within minutes of a code change, we should know what the impact of that code change potentially is.
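Here's a minimal sketch of the kind of check Andy describes: even with the data store mocked away in CI, you can still count how often the code under test hits it, and fail the build when a change adds round trips. The function under test, the test itself, and the budget of three queries are invented for illustration.

```python
import unittest
from unittest.mock import MagicMock

# Hypothetical code under test: it loads each item with its own query
# (an N+1 pattern) instead of a single batched call.
def load_order_items(db, item_ids):
    return [db.query("SELECT * FROM items WHERE id = ?", item_id)
            for item_id in item_ids]

class DataStoreRoundTripBudget(unittest.TestCase):
    MAX_QUERIES = 3  # illustrative budget per operation

    def test_round_trips_within_budget(self):
        db = MagicMock()  # the data store is mocked away, but calls still count
        load_order_items(db, item_ids=list(range(10)))
        self.assertLessEqual(
            db.query.call_count, self.MAX_QUERIES,
            "too many data-store round trips -- stop this bad idea early")

if __name__ == "__main__":
    unittest.main()
```

Run as-is, the test fails, which is the point: the N+1 query pattern gets caught minutes after the code change, not at the end of the month on the cloud bill.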
All righty, we're right about at the end there.
Gene, we wanted to see: do you have any final thoughts?
We usually do a little final-thoughts section here.
Any final thoughts for our listeners?
Oh, my goodness.
Yeah, I think my final thought is it's just never been a better time to be a developer or an ops person.
It's never been so easy to do great things, and there's never been a better community to learn from.
Man, I've never had as much fun, and I've never learned as much as I have in the last couple of years. So what a great time to be in the game.
Gene, I just want to say thanks for being somebody who inspires a lot of people, and for bringing the companies out from behind the curtain,
basically letting them talk about what they do, and for changing,
especially, the culture of these companies: being open to talk about what they're doing,
sharing things instead of keeping their stuff within their own boundaries.
I think that's great.
We can all become better.
Thank you so much.
No, my pleasure.
And Gene, I'd like to thank you for being on.
And thanks to you, Jez, Patrick, and John for writing the book.
I want to really encourage everybody to go out and get it, because this will hopefully inspire you.
That, and also The Phoenix Project, which is another great book.
Hopefully you'll start talking to your colleagues, and maybe you can become part of the next percentage that changes over.
You know, the one last thought I had about the book, too, Gene, I don't know if you thought about it
when you were writing it,
but it really struck me not only as an
IT transformation book and, you know,
how to do DevOps,
but also as almost a self-help book.
Have you ever thought about how much
you could apply this to, besides
the technology transformation?
It's just such an amazing concept in this book.
Like when I kept on thinking,
wow, if we could just apply this to politics,
how amazing that would be.
Anyway, it's an awesome book.
Really, really thank you.
Everyone should, you know, definitely just go out and read it.
We're going to put up that link,
we're going to get a link from you, I believe,
for the, what is it, 140 pages, was it?
Yes, exactly.
The first 140 pages of the DevOps Handbook.
All right, so look for that on both the Spreaker page
and also on our Pure Performance page on dynatrace.com.
We'll have that up there.
Thank you so much, Gene, for taking the time.
I hope you have a wonderful time next week at the DevOps Enterprise.
What's the name of it again?
DevOps Enterprise Summit in San Francisco.
Excellent.
Well, thank you.
Goodbye from me.
Bye. Thanks, Andy. Thank you, Brian.
Thanks. Bye.