PurePerformance - 036 Baking Functional, Performance and Security into your DevOps Best Practices
Episode Date: May 22, 2017

Todd DeCapua has been a performance evangelist for many years. In his recent work and publications, which include Effective Performance Engineering (http://www.effectiveperformanceengineering.com/) as well as several pieces on outlets such as TechBeacon (https://techbeacon.com/contributors/todd-decapua), he introduces DevOps best practices to improve the five S-dimensions: Speed, Stability, Scalability, Security and Savings.

In our discussion with Todd we focused a lot on Security, as it has been a more prominent topic in our industry recently: how to bake Security into the delivery pipeline, and why it is such an important aspect. Automation seems to be the key, which also includes automating functional checks, performance checks and, as we said, Security!

Related Links:
Follow Todd on Twitter (https://twitter.com/AppPerfEng)
Follow Todd on LinkedIn (http://www.linkedin.com/in/todddecapua)
Blog: How to build performance into your user stories (https://techbeacon.com/how-build-performance-your-user-stories)
Transcript
It's time for Pure Performance!
Get your stopwatches ready, it's time for the next episode of Pure Performance.
As always, I'm host number one, Brian Wilson, and host number two is my good pal, Andy Grabner.
Andy, how are you doing today?
Hey, I'm doing pretty well, Brian.
I know that you're full of songs today.
Sure.
It seems we're all in a good mood. At least you're definitely in a good mood. I'm in a good mood too,
for multiple reasons. First of all, the thermometer is above freezing, which is a great thing for
early May when we are recording. Also, we had a phenomenal first visual podcast yesterday, the two of us.
Oh, yes, we did.
And we recorded a performance clinic on a topic that is dear to your heart, which is performance monitoring of Salesforce Commerce Cloud, formerly Demandware.
Yes, that was a good one.
I enjoyed that very much.
Thank you for having me on that.
You're welcome. Sharing our knowledge not only in an audio way but also visual,
proving that our faces are better for audio only, but still we showed it to the world.
And today we actually have a guest on the line.
We do? We never have guests.
Before you go there, I just got to say it did snow a little today in Denver.
It's not normally a cold place, but our temperatures have dipped the last week anyhow.
So you have your nice warm weather.
I've got rain and snow, but it's not getting me down.
Anyway, we have a great guest today.
Go on, Andy.
We have a great guest today.
So Todd and I, we've known each other, at least virtually, for quite a bit,
and probably Todd can tell us if he remembers when we first virtually met, but we physically met last year at Impact, I believe, in sunny California. I think we're all
in the same industry, we're all performance engineers, and what I was really intrigued with
was a presentation Todd gave. He's doing a lot of presentations out there, and he also wrote a
book, Effective Performance Engineering, which I'm sure he can probably talk even more about. So, Todd,
without further ado, I hope you are with us on the line, and maybe you can tell the audience
a little bit about yourself. And if you remember when we first met, I don't really remember anymore,
but it's been a while.
Yeah, thank you, Andy and Brian, for the introduction and opportunity.
Andy, I cannot honestly remember. I think it's got to be the better part of 10 or 15 years that at least we've been working within the same area in the same community.
And again, we kind of virtually say hello to each other through all of our social media and blogging and supporting everything that we're all doing. So, again, I think one of the fun things about this community is how small it really is.
But having the opportunity now to speak with Brian and Andy and, of course,
everybody who's listening in on the podcast is great.
And, yes, Andy, you and I, we were out at the CMG imPACt conference.
I believe it was in La Jolla, California this past summer.
And, again, as we're talking about weather, just absolutely gorgeous there.
I think every time that I've been there.
So looking forward to the next one; I think it might be coming up in New Orleans, which I've got to get ready for.
But back to the topic.
So, Andy, I think you kind of said, hey, maybe there would be some interesting topics. I was fortunate enough to be able to co-author the book of Effective Performance Engineering with O'Reilly.
So, again, a lot of information that many of us in the industry have said, hey, wouldn't it be great if somebody would throw a couple of these ideas down into a book so that we could get it out there for education and a little bit of knowledge sharing and enabling other people to be able to spread
the word around. So Shane Evans and myself finally were the two guys that kind of sat
down and put pen to paper. That's out there. It's available. It's available free as an electronic
download in three different formats. You can either get it off of
effectiveperformanceengineering.com and then just click on get the book or you go directly to
o'reilly.com and you can find the electronic download there as well. So again, just getting
knowledge out, sharing ideas, always looking to get feedback as well of other tips and tricks
that you guys are using. Perfect.
Hey, and I know when we discussed what the kind of main theme today should be, we bounced around some ideas.
Obviously, some of the hot topics came up like DevOps integrating functional performance and security.
I believe that's what you brought up into the DevOps best practices.
And I think this is some of the parts we actually want to talk about.
Obviously, we've all been in the performance industry.
We actually on the podcast have talked a lot about how we can integrate especially performance into the pipeline through automated tests, testing earlier, shifting left.
You additionally brought up two terms, functional and security, which I think are very
interesting aspects that obviously are as equally important. And maybe I want to actually just hand
it over to you and kind of give us a little overview of how you start the discussion around
how can we actually bake all of these important things into DevOps and what does this actually
mean? And what are some of the best practices that you write about in your book, but also, you know, teach in your classes, discuss with your clients?
Yeah. So it's been interesting because, you know, we all have a journey.
And through my journey over the last 11 or 12 years, you know, starting at ING Direct,
where we were transforming from a waterfall organization into an agile organization and ultimately drove to DevOps before it was even being called DevOps.
Then to do that three or four more times in my career.
Then, again, a number of fun things happening to where I had the opportunity to land at Hewlett Packard Enterprise through an acquisition of Shunra Software into this chief technology evangelist role where I had the
opportunity to travel the globe, meet with CIOs on down and helping them to really understand how
is it they could leverage certain things. So this topic of being able to deliver amazing results to
your business with proven DevOps practices was kind of at the forefront of many of those conversations in many large and mid-sized
companies around the world. So as I was trying to shape this into how do we cover some of these
important topics in a podcast today, I often went back to, you know, why is this important to
the customers that I was speaking with, some of the prospects. And again, how do they look at
these different types of things? So again, I don't really care if you're waterfall, you're agile,
you know, you're practicing some DevOps practices or throw in whatever other buzzwords. But today,
what I'm finding, Andy, is, you know, many organizations are looking at kind of five different areas, or what's also been called the five S's.
So things like speed, stability, scalability, savings, and of course, security.
So that's why I think this topic has become so important.
Now that we've got proven DevOps practices, the challenge becomes, well, how do you integrate that into your automated
workflow or your automated CI/CD? And when we talk about it, yes, you know, the three of us on the
podcast here have been focusing a lot on performance, but, you know, security is something
that has continued to become more and more important over the years. And of course, functional
has always played an important role.
So again, really thinking about that piece of functional performance and security,
how do you automate that into your proven DevOps practices?
But again, as I've looked at the landscape, as I've had the conversations,
those five S's of speed, stability, scalability, savings, and security
are the real focus areas at the business level and that C-level conversation.
Todd, before we go further, I wanted to – you bring up security, right?
And it's really catching – really gaining a lot of momentum these days, right?
But I think security in general is kind of a vague term for a lot of people because it's new to a lot of people. So before going on and without getting into how you bake it in, could you just provide maybe a very high level view of what is security in terms of, you know, in context of all this?
And what do people have to look out for in terms of what that security piece means?
Brian, it's a great question.
I'm glad you stopped me here because security, again, I'm chatting with one of my buddies the other day.
He's a cybersecurity specialist.
So, again, as he thinks about security, it's more of, you're managing the ways that people are trying to penetrate your organization
or your applications. The third area, which is probably where most of us have spent the majority of our time, is in application security. So how is it, whether it's through dynamic or static code analysis, or some of the older things that have now come back, things like SQL injection, cross-site scripting, you know, all these different elements of application security.
So I think for this conversation, we could probably just look at application security and those aspects. I think there's another element here that is playing more and more of a
role in the world and, you know, things like DNS. So, you know, where is your DNS? Who is your DNS?
Some of those risk areas. We could also kind of talk about that a little bit, perhaps, Brian.
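Todd mentions SQL injection as one of those older application-security issues that keeps coming back. For readers who haven't seen it, here is a minimal, self-contained illustration, using Python's built-in sqlite3 purely as a stand-in for any database, of why an automated security check would flag string-built queries and insist on parameterized ones:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "alice' OR '1'='1"  # a classic injection attempt

# Vulnerable: string concatenation lets the input rewrite the query,
# so the injected OR clause matches every row.
vulnerable = conn.execute(
    "SELECT role FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Safe: a parameterized query treats the input purely as data,
# so the injection string matches no user name at all.
safe = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()
```

Here `vulnerable` leaks the admin row while `safe` returns nothing, which is exactly the kind of regression a per-build security script can assert against.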
Okay. Thank you.
Andy, you were about to ask a question before I really cut you off.
Yeah. So, no, no, you didn't cut me off.
You brought up a very good point.
I wanted to ask one more thing about one of the other five S's.
You talked about savings.
Could you elaborate more what you mean with savings?
Is this time savings, or is this probably more speed then?
What's the savings aspect?
How can we save, or what do you save in the end?
So there are many different facets of savings.
In organizations that I've been working with, and where I now have the opportunity to work myself, it's really focusing on that customer value around making it cheaper.
So this might be things like eliminating some of the legacy middleware costs.
It may be reducing infrastructure or consolidating infrastructure components.
So I know that some of the opportunities I've had to work with very large companies.
Again, if you think about the
ability to optimize an application, you actually will reduce the hardware footprint required to
support that application and what it functions for your customers. So there's some stories that
we could go in and tell, but it's been pretty substantial savings when you look at some very
large companies. And again, if you can optimize
it by 20%, what does that mean as far as being able to realize a savings in eliminating, again,
a couple of factors people typically look at is legacy middleware costs and also some of the
infrastructure consolidation pieces. Yeah. And also, I guess, savings, if you think about
modern virtualization and cloud technologies,
I assume most of the clients that you have also been working with,
they have traditionally provisioned a lot of physical hardware just to have enough hardware for the peak load.
But then most of the time, this hardware sits idle because who is using the software on the weekend or middle of the night?
And that's at least where we see a lot of people moving towards
a more flexible infrastructure as a service offering
because you just buy or pay for the infrastructure when you really need it.
And if you don't need it 90% of the time, then you don't want to pay for it.
And I think that's also a great way of obviously in the end saving.
It's a good point, a good argument for moving towards some type of cloud offering, whether it's public cloud or something more private. I think that's great, but I just wanted to talk about savings. I was intrigued by what you mean with savings, but I fully understand now.
You also said security. You brought DNS up.
What's up with that?
Any particular example that you want to talk about?
You know, there's a pretty specific example that I could bring up, Andy.
And I don't know if it's too close to home, but I guess what I've always found in working with many organizations is the more specific and close to home or real you can make it, probably the better. And again, I was on a flight back from London the day that this happened, but in the
latter part of October in 2016, so not but a few months ago, there was a massive attack that came
via IoT devices. So these are things like microwaves and refrigerators that
were leveraged to attack the Dyn DNS provider. And again, seemingly, you know, a big
deal, but maybe not a huge deal. One of the articles that I had read was on Wired. And so
again, with Wired, they referenced Reddit, the New York Times, all of these media outlets being engaged, because this specific DDoS or distributed denial of service attack took down a pretty significant chunk of the Internet for most of the eastern United States.
And this was the majority of the day, so a fairly substantial impact there.
I believe we all remember that, because Dyn, they are located, as you said, on the East Coast, I think New Hampshire, that's where their home base is. And many, many websites, including Dynatrace, we also use them, and I remember we felt the impact as well, or actually our customers felt the impact.
I felt it too, because I was trying to get on our community to look stuff up, and I was like, why can't I get on our communities?
Yeah, and I think the reason obviously was because of our login service. I remember that because we analyzed it. We actually, I think, wrote a blog about it, how we could use pure web technology to actually prove or see where our own system actually failed when people tried to log in, like you, Brian, trying to log in.
Or customers trying to log into our synthetic portal or to the Dynatrace SaaS portal then failed because we could no longer resolve the backend authentication system.
And then we failed. And again, it actually had a very interesting ripple effect, because of the way it was implemented. I need to look up the blog post again, but our strategy then was to just keep retrying, keep retrying, keep retrying, and then we were basically blocking a lot of threads, and then basically the whole system just came to a crawl, because those requests that couldn't get through to the backend system just kept trying and kept spawning up more threads. And eventually, you know, that crashed or slowed down other parts of our system too that were not directly impacted by it, but indirectly.
And that's actually kind of interesting.
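The failure mode Andy describes, unbounded retries spawning more and more threads until the whole system crawls, is commonly mitigated by capping the retry budget and backing off between attempts. Here is a minimal sketch of that idea; the function and its parameters are illustrative only, not what Dynatrace actually implemented:

```python
import random
import time

def call_with_backoff(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry a failing call a bounded number of times with exponential
    backoff and jitter, instead of retrying forever and piling up
    blocked threads while a backend (e.g. a DNS-dependent login
    service) is unreachable."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # budget spent: fail fast rather than spawn more work
            # Exponential backoff (0.5s, 1s, 2s, ...) plus jitter so many
            # clients don't all retry in lockstep against the same backend.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

The key design choice is that the caller gets a definitive failure after a bounded amount of work, so the outage stays contained instead of rippling into unrelated parts of the system.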
Yeah, so again, it's an interesting story.
Again, it helps you to start thinking a little bit more about, you know,
the resiliency of your systems, dependency on third parties.
Again, who is your DNS provider?
You know, are there designs that you might want to change then
to think about local caching at a web server level?
Again, a lot of these questions that as we start, again,
revisiting this topic of how do we deliver amazing results to your business with proven DevOps
practices and specifically being able to automate and integrate in functional performance and
security. Again, how would you be able to recreate that scenario in your environment today and
automate it with every single build? Again, it's just something to think about. But again, it was a very real example that
happened fairly recently. And I know it was something that Dynatrace felt the impact of as
well. Now, just on a side note on that, with that DNS specific example, and we don't need to go
deep into this, but at a high level, what would be a solution to that?
Would it be something like having a backup DNS service that you can switch to in another location?
Is it something like kind of that simple or what does somebody do if this is the situation?
What are some of those options?
So there's definitely other DNS providers that are out there.
You could think about it.
You could just switch over, right?
You could just switch over, yeah.
And again, some service providers today already offer that for you, so you have that higher HA or resiliency if one of your DNS providers were to be attacked.
Because again, the way that we're seeing things evolving today, it's not a matter of if you're going to have a challenge like that.
It's just when it's going to happen.
And given the fact of IoT and the way that these attacks continue to happen, it's going to happen to you.
But thinking about that from a resiliency is something you can do today.
And Brian, to your point, there are options that are out there for you to consider.
I'm sure most people were like, well, that's not good. What's the likelihood of that happening, right?
So they went for the other option.
That brings up another question.
So now, Todd, do you have,
as part of your best practices,
a catalog of things that people,
you know, like a high priority, medium priority,
like this is what you have to have in your pipeline
in terms of security checks,
in terms of resiliency checks,
in terms of functional checks,
in terms of how does the system correctly fail over?
Do you have a catalog or is there something out there that is, I would assume, also kind
of dynamically growing?
Because as we are, you know, I mean, the world is maturing and unfortunately also the bad
guys are getting more sophisticated and come up with new ideas.
So is there somehow a list, available best practice list of things like this is what we need to have in our pipeline, in our DevOps best practices, and kind of revisiting that
list about new things that we learn along the way?
So it's interesting.
To date, there's been nothing published around, you know, you should absolutely check these five things.
That might be an interesting follow-on podcast, Andy, because I think there are – I mean, not think.
I know there are practices that many organizations have adopted after, you know, learning in these ways. But I'm not aware of anything, one thing that's
published today that, you know, here's your checklist of items that for all your major
feature functions should be checked. One of the practices that I see adopted often is that's
handled within these scripts. So again, as an organization is defining what are those very quick
functional performance and security scripts that they would want to have run against every single build,
they would be building those conditions in.
But again, that's often very much tailored to a specific product or a specific business unit,
often not kind of that standard library that you're suggesting,
which I think would
be a pretty valuable piece of information to at least share with the broader community.
And I find it interesting, too, because you mentioned having this playbook.
And earlier, you mentioned when this DNS attack happened or the DDoS happened, you were flying.
And it sounds almost parallel to what the TSA is trying to cope with.
Right. There's the idea of different kinds of attacks coming in and now we can adjust our protocol or in this case, build in different security checks for what we know of is happening.
It sounds like, though, that the bigger challenge is predicting what might be next and trying to figure out some security testing models for things you haven't yet anticipated. You know, it's very much like national security in very much the same way, except all just in bits and bytes, right?
Yes, absolutely.
All right. So I assume, I mean, that was a very specific example on the DNS side of baking in security.
And you mentioned a lot of functional checks as well.
Are you talking here about the classical functional testing that we need to bake this into the pipeline in an automated way, running as many functional tests as possible per build? Or is there anything else outside of what I would normally, what people normally know
about functional when they hear functional testing or functional checks?
Is there anything new out there from a functional perspective that people need to know about?
So specifically what I've seen, yeah, I mean, this is the classic functional automation
piece.
And, you know, you go ahead and
ask people, how long does it take for your functional test library to run? You know, typically
it's, you know, hours, not something that could be run in minutes. So part of, again, as you think
about, you know, this proven DevOps practice of building in and automating functional performance
and security to each build is you want those to run
in a very short period of time.
Typically less than 15 minutes is kind of a tolerance
that many organizations have considered and adopted.
So again, when you think about it from a functional perspective,
you know, and the security and the performance bits,
it's what are those most critical elements that if these pieces do not work,
we are not going to be able to drive revenue.
We are not going to, you know, be within our regulatory requirements.
So really skinning down, what are those pieces that we absolutely need to make sure that they're working at every single build?
Or at least if they're not working, we would want to know about it as a measurement of quality.
Yeah.
So what we – and this is interesting.
I want to get your thoughts on this.
We, as Dynatrace, we obviously also have an engineering organization that is actually building our software.
And we did the same thing. So what we learned when we moved from a monolithic code base to a more service-oriented cloud-native environment,
we came to the same conclusion that our build times are too long.
It takes too long to get feedback.
If a developer checks in code and it takes hours for him to get feedback, then he may check in code, walk home,
and while he's home, we realize he just broke the build, and he's impacting all the others that are either still in the office or that are working in different time zones.
So we did the same thing.
We tried to bring down build times by obviously different means, breaking up the large code base into smaller components that could be independently tested. What's interesting is we came up, I think, with a 10-minute rule saying that every team, every
feature team, application team needs to figure out which tests need to be run within the
first 10 minutes so that they get good feedback fast about the quality status of the code
change they just made.
Because anything that takes longer than 10 minutes, you know, just takes too long.
It's out of your mind, and then, you know, you just need immediate feedback, but it still needs to be quality feedback. And so what we learned, and maybe you have some additional thoughts on this: we optimized our builds, we invested a lot in parallel test execution so that we can execute even more tests in shorter time, but eventually gave the decision to our application teams, to our development teams, to say which tests need to be executed in the first 10 minutes, because they should know best.
Because they're also in the end responsible for then saying this is good enough and we can ship it to the next stage in the pipeline.
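Andy's 10-minute rule can be sketched as a simple time-budgeted runner: each team tags its most critical tests, and anything untagged, or anything that would blow the budget, is deferred to a later pipeline stage. The registry, tag names, and budget below are hypothetical, purely for illustration:

```python
import time

# Hypothetical test registry: in a real pipeline these would be discovered
# by the test framework; the names and tags are made up for illustration.
TESTS = [
    {"name": "test_login",           "tags": {"smoke"},   "run": lambda: True},
    {"name": "test_checkout",        "tags": {"smoke"},   "run": lambda: True},
    {"name": "test_full_regression", "tags": {"nightly"}, "run": lambda: True},
]

def run_fast_feedback(tests, budget_seconds=600):
    """Run only the smoke-tagged tests, deferring everything else, and
    stop early if the 10-minute budget is exhausted.

    Returns (passed, failed, skipped) lists of test names."""
    passed, failed, skipped = [], [], []
    deadline = time.monotonic() + budget_seconds
    for test in tests:
        if "smoke" not in test["tags"] or time.monotonic() > deadline:
            skipped.append(test["name"])  # deferred to a later pipeline stage
            continue
        (passed if test["run"]() else failed).append(test["name"])
    return passed, failed, skipped
```

The point is that the developer gets a quality signal on the critical paths within minutes, while the slower suites still run, just later and off the critical path.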
Now, do you think – do you have any other suggestions, anything else that you teach out there or that you've seen?
Absolutely, and there's a number of different areas, Andy, that we could go.
But I'm encouraged that that same 15 minutes, 10 minutes, again, for all the reasons that you just suggested, that is absolutely why you skinny it down to that small piece. I guess there is something else that I've highlighted quite a bit through the last 12 or so years of my career. One of these is, as you think about
everything that you're driving through, and I actually covered this on page 40 in the book
as an illustration, is talking about quality gates. And I think this speaks to a bit of what
you were just suggesting. I know that, you know, Dynatrace often highlights the UFO model as well.
But again, it's just another way of having an information radiator to be able to visually, you know, show for the teams, for the leaders, for other key stakeholders.
You know, this is the status of these different pieces as we're going through the different stages or quality gates. And it's funny, I remember back to, you know, many years ago when we used to
have the entry and exit criteria or acceptance criteria in every one of the stages through a
waterfall model. But, you know, now that we're automating everything through, you know, from a
developer's desktop into a BVT or build validation test environment, and then however many, you know,
dev environments, QA environments, pre-production environments, you probably have a lot more than that as well.
But each one of those automated tests, each one of those feedback cycles that you get, you know,
how is it that you're measuring those quality gates? Are you able to then automate the pass
or fail across each one of those components within the quality gates,
again, whether it's functional security or performance.
But that's kind of the next piece that, you know, as you're sharing with your Dynatrace engineering team
and some of the practices they've adopted, again, taking it down to the next bit of,
can you actually automate it so that all of this is passing or failing?
And once you get to that, here is a good quality build with, again, very, very short cycle times that by the time somebody is finally looking at it or seeing it, you know that you at least have a level of quality that's acceptable based on what the team has assigned to as far as quality.
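One way to picture the automated pass-or-fail gate Todd describes is a small evaluator that compares a build's measured metrics against per-stage thresholds. The metric names and thresholds below are made up for illustration, since, as he says, real gates are tailored per team and per product:

```python
# Hypothetical quality-gate thresholds for one pipeline stage.
GATE = {
    "functional_pass_rate": ("min", 1.0),    # all critical tests must pass
    "p95_response_ms":      ("max", 250.0),  # performance budget
    "critical_cves":        ("max", 0),      # security scan findings
}

def evaluate_gate(metrics, gate=GATE):
    """Return (passed, violations) for a build's measured metrics.

    A missing metric counts as a violation, so a build cannot slip
    through the gate just because a check failed to run."""
    violations = []
    for name, (kind, threshold) in gate.items():
        value = metrics.get(name)
        if value is None:
            violations.append(f"{name}: metric missing")
        elif kind == "min" and value < threshold:
            violations.append(f"{name}: {value} < {threshold}")
        elif kind == "max" and value > threshold:
            violations.append(f"{name}: {value} > {threshold}")
    return (not violations), violations
```

A pipeline stage would call this after each build and promote the artifact only on a clean result, which is exactly the automated entry/exit criteria Todd contrasts with the old manual waterfall sign-offs.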
Now, coming back quickly to security here,
what do you see?
Do you see our larger organizations
or also small and medium organizations
having dedicated security teams
and then they work with the engineering teams,
with the people that are responsible
for test automation of the pipeline,
baking their security checks into the pipeline? Or is it more that the security teams start providing security as a service, kind of like security checks as a service, providing APIs so that whoever builds the pipeline and runs tests in the pipeline can just reach out to these services? Can you give any insights on that, how we bake security in?
Yeah, so most of the security piece has been taken on by the development teams.
So again, cross-functional, you know, you have BAs, QA, dev, the scrum masters, you know,
depending on, again, what flavor of what you're running.
But typically that's taken on within the development team. Many of the security tools today already offer open APIs, so being able to set that up within your automated build environment, and having access to be able to call that API to then execute these tests and deliver you your results. Again, the other piece that'll come, and I'm sure, Andy, you're already thinking here, is being able to store those results in some other file or folder system. Because again, as you're automatically bringing up these various development or test environments, you're also going to deprecate them. So you would not want to have that located on those machines, whether they be virtualized machines, whether they be containers, however you are architecting your dev and test environments. Same exact example: just store all the results over on a folder or file share so that, again, whatever you're doing to bring that up and illustrate it within your information radiator can go and pull those results from that file system to then display it. But again, most of this is already being adopted by the development teams in that manner; many of the tools today already have APIs openly available.
Cool. And then I assume also, when you talk about taking these results and then displaying them, I mean, a perfect example or a perfect location for that would be a build server, like Jenkins, where you are collecting all of your results, where you actually execute your pipeline. And then you have your quality gates, and then your Jenkins, your Bamboo, whatever build server you have, also keeps track of the build artifacts, which include the quality metrics of every stage in the pipeline.
Yes.
And again, Jenkins is one way you could do it.
There's a number of other tools.
Again, I find every organization maybe has a little bit different way of doing it.
But you're absolutely right.
Something like Jenkins would be easy to illustrate it in.
There's plenty of dashboards out of the box with them that you could leverage.
There's many other tools out there today that many teams already have in their environment.
It's just thinking about, oh, right, we could do this.
How would we adopt that and then be able to deliver these types of results?
The other important part, though, that I've learned, Andy, through this is making sure that's visualized outside of just your development team.
How is it that, and this is a whole other topic as well, you build in that culture of looking at the build at that frequency, and helping people to understand, you know, why is that build important?
Why is the quality of that important? And I think the other piece is as we opened up with those five S's, you know, who's not interested in speed or stability or scalability or savings and security?
So there's always that piece of the culture of the organization.
And how is it that now we can execute these tests?
Now we can get this level of frequency.
We can see quality.
But why is that important to some of your senior stakeholders?
Well, because I guess in the underlying, as you said, in the end, it impacts the bottom line, which is how happy are our users?
Do we make money?
Right. Is our system up and running after – if we miss a critical security hole and then somebody brings down our system and then nobody can spend money with us, obviously business cares.
So better get the test coverage up.
Better make sure you're catching more things earlier. Even worse with security: not just bringing the system down, but if you get a breach, then you have to spend all that money on, you know, getting people, what do they call it?
The background or the credit watches and all that.
It could be a very, very costly mistake.
Yeah.
Also legal fees, I guess.
I mean, I'm not that familiar with all that, but I'm sure there's a lot of costs.
I'm not sure if you guys watch Silicon Valley, the show.
I'm still catching up on it, but I've gotten through, I think, the first one and a half seasons or something.
Yeah.
Season three.
I feel like they've lived part of my life just watching this show.
Did you watch the latest episode last Sunday by any chance?
I have not.
I think I'm caught up to – was it season five, six, something like that.
I did not see it this past Sunday. Okay, then I'm not spoiling it, but it's basically
in the same direction, legal fees and stuff like that.
Anyway, so Todd, we talked a lot about
one topic that I believe, at least we use the term shifting
left, shifting left quality, shifting left performance, shifting left security,
baking it automatically into the pipeline, and that's all great.
But obviously we unfortunately cannot find everything in pre-prod.
What suggestions do you have and best practices, if you can name maybe
one or two top ones in terms of closing the
feedback loop in production, because some certain things can only be seen in production.
Because this is also the real environment with the real users out there, and then having real
impact of evil people that are out there and trying to poke holes. Any things here to kind of close the cycle, close the feedback loop?
I think there's two pieces that I would highlight here.
And maybe some of this, you know, is kind of ringing Brian's bell a little bit.
But it's – and again, if we're going to call shifting left what we just spoke about, it's also, you know, how do you shift right?
So how is it that you can bring the same monitoring that you have in production into all of your pre-production environments?
And that way, being able to see those results of, okay, I've just moved this release to
production. I want to be able to see how that's performing, how it's functioning,
how it's from a security perspective, you know, what are my end users doing and how is the system as a result being impacted?
But also, is there a way that I can see that or recreate that in my pre-production environments?
So I think that might be, you know, one area just from a monitoring perspective that I've often seen as not a common practice, but as soon as you start doing it, you start seeing some
interesting insights. You know, for some very, very large production systems, organizations are not always able to have like-for-like or same-scale pre-production environments as they have in production. And some people will more or less assume a fairly linear extrapolation percentage from their pre-production environments. So again, another benefit or value coming out of this shift-right monitoring, being able to do both production and pre-production environment monitoring, is you can start to see some of those bits. There's a second one that I'd also want to jump in on,
which is making sure you're just measuring some very basic metrics.
One of those is the percentage of commits done by each of those teams.
And again, that's measured as a percentage based on the stories they commit to for that specific sprint.
Another one is number of production incidents. So,
you know, as far as the old measurements on defect density and things like that,
I don't think that's as much value for many organizations as just focusing on the production incidents that have been the result of a release or sprint from a specific team. So being able to
tie that accountability and ownership
and again, enabling the teams to then figure out,
you know, hey, why is it that this happened in production?
Why did we not see it pre-production?
And what are some things that we could do to change now
so that we don't have that in the future?
So again, those two pieces are probably other items
that I would highlight given your question there.
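The two metrics Todd highlights, the percentage of committed stories a team completed in a sprint and the production incidents attributable to each release, reduce to very simple calculations. The data shapes below are hypothetical stand-ins for what a story tracker and an incident system would export:

```python
def percent_commit_done(committed, completed):
    """Stories completed as a percentage of stories committed for the sprint."""
    if committed == 0:
        return 0.0
    return 100.0 * completed / committed

def incidents_per_release(incidents):
    """Count production incidents grouped by the release that caused them."""
    counts = {}
    for incident in incidents:
        release = incident["caused_by_release"]
        counts[release] = counts.get(release, 0) + 1
    return counts

# A team that committed to 10 stories and finished 8 of them:
print(percent_commit_done(10, 8))  # 80.0
```

Trending these two numbers per team, per sprint, is what makes the ownership conversation ("why did this incident come from our release, and why didn't we see it pre-production?") concrete.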
That's really amazing, because you're talking about now not just monitoring the applications, but, maybe for lack of a better word, monitoring the pipeline or monitoring the release process, and collecting metrics on the releases themselves. Not just application and hardware and process metrics, but tracking those actual release metrics. That's a new idea to me, so thank you for bringing that up.
Yeah, and I like the two metrics. Just to recite: you said the first metric was the percentage of stories, or commits based on the stories they committed to, right?
Correct.
Yeah.
And the second one was the number of incidents in production.
Not necessarily bugs, but incidents, right?
So you might have a bug where, okay, the text field didn't have a limit, something silly and insignificant like that, and who cares about that? But when you actually have a production incident, that's a much better metric, I think. And that kind of ties into a lot of what we talk about, right? Because a lot of it is: are you monitoring the proper metrics? So that concept of the production incidents, I think, is a much more mature way of looking at it, so that you don't get clouded by all these little minor things that might not really matter so much.
And it's actually a great quality metric that I also think is easily understandable to business. Because if we have a certain number of incidents, I can probably also correlate that to how many man-hours we spent working on these incidents,
whether it's in first-level support, second-level support, or in engineering.
And also what was the impact to the customer.
And I think this is a great way to correlate business metrics
or business answers with the quality that comes in from the pipeline.
And then we know, hey, in the last three months with every release,
we saw an increase in production incidents and an increase in customer impact.
And we know exactly which features it came from. Then I can make an investment and say, well, now it's about time to fix this, because the trend is alarming and we need to work on that.
Yeah, and on a related note,
if I can switch a teeny bit here,
Todd, I was reading
one of your entries on TechBeacon,
by the way. Is TechBeacon yours, or do you write a lot for that?
So I was one of the co-founders of TechBeacon, but TechBeacon is an HPE-sponsored website. What's extraordinarily unique about it is, you probably saw, it's content-rich, and all of the contributors are practitioners like you and I. So again, what you're getting is pure content. There's no sales involved, as you can tell on the site, but it's just wonderful content. And again, I'm a high contributor, but also one of the original co-founders.
You're not in Denver, right?
You said you're a high contributor.
I'm kidding.
Okay.
So the reason I brought it up, right, it's on techbeacon.com.
Go check it out if you're listening.
It's got a lot of great stuff on it because we were talking about shifting right.
And one of the articles I was reading, somewhat related, is how to build performance into your user stories, right? And while I was reading this, I saw a lot of how to put in those performance checkpoints on the card
so that you can get it out to production properly.
And now we're talking about shifting right into production, which brought my mind to Andy.
If you recall, one of our last shows, we were talking with Goranka Bjedov from Facebook,
and she had mentioned the idea of creating success measurements before deployment. So kind of, so
Todd, the idea here is you're going to put out a new feature or new code, right? And you're going to introduce it into the production realm.
Now, before it goes into production,
you as the development team
or whoever's requesting this new component in there
has to define the success of that component
once it gets into production
because it's not good enough to push it out there
and say, hey, it ran.
It's not falling down.
Building in some more metrics like,
is it running? How much power is it taking to run, whether it's compute power, memory, whatever? Is anybody even using it? Even if 1% of the people are using it and it's running really well, maybe it's just a feature people aren't interested in.
And that comes back into some of that savings of do we pull that code out?
Is this something that, I don't know, fits into sort of the areas you're in or you're looking at from that point of view?
It is, Brian. And it's funny that you referred to techbeacon.com
and that specific article of how to build performance into your user stories.
This was actually a quite fun story for me to write.
And I don't know, maybe we can include a link to it.
But, you know, right to where you're talking about right now was a piece of it. My editor was funny. He goes, you know, Todd, as much as you're telling me about it and I'm reading about it, he said, you know, I need you to actually fill out the cards as somebody would do, potentially, to go ahead and stick up on the wall with their team. And I said, well, absolutely. So I had to go to the drugstore to go ahead and buy these three-by-five cards, and then actually write them, and then took pictures of them to include in
the article. But what you probably remember seeing in that article are those acceptance criteria, as I've heard many people call them, that you would write on the back. So, for example, in that article, the front of one of them (I think I used three different cards) was: as an airline passenger, I want to be able to check flight status on my mobile app so that I can make my flight.
And I was actually recalling as I was writing this story, I think I was flying from Philadelphia to L.A.
And, you know, they had to shut down the one plane because of mechanical issues.
But then I had to go to a whole nother terminal to be able to try to make this flight.
But nobody had the gate number.
So I'm sitting here on that mobile app saying, OK, well, where am I going to be?
And that's why I said, well, flip the card over.
And those acceptance criteria, Brian, that you're referring to, what I wrote on the back of that card was, you know, must be able to handle 10,000 concurrent users.
Users are going to be connecting over a variety of networks, 60% of them on a 4G, 20% on a 2.5G, you know, et cetera, et cetera.
Using mobile data encryption to ensure that we have secure data transmissions, also application launch time of two seconds or less, and then a screen-to-screen response time of three seconds or less.
So, again, thinking about, right, how is it that you're building the, you know,
this was an example of building performance into your user stories.
But as you can hear, I've also think, you know, about functional pieces, security pieces.
So, again, starting early and building that into your process is absolutely critical.
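The acceptance criteria from the back of Todd's story card can be captured as data that a test harness asserts against, so the card's non-functional requirements become an automated check. The field names and the measured results below are hypothetical:

```python
# Front of the card: "As an airline passenger, I want to check flight
# status on my mobile app so that I can make my flight."
# Back of the card, as machine-checkable acceptance criteria:
ACCEPTANCE = {
    "concurrent_users": 10_000,   # must handle 10,000 concurrent users
    "launch_time_s": 2.0,         # app launch in two seconds or less
    "screen_to_screen_s": 3.0,    # screen-to-screen in three seconds or less
    "requires_encryption": True,  # mobile data encryption in transit
    # 60% of users on 4G, 20% on 2.5G, rest on other networks;
    # this shapes the load test rather than being asserted afterwards.
    "network_mix": {"4G": 0.60, "2.5G": 0.20, "other": 0.20},
}

def check_results(results):
    """True only if the measured load-test results satisfy every criterion."""
    return (
        results["max_concurrent_users"] >= ACCEPTANCE["concurrent_users"]
        and results["launch_time_s"] <= ACCEPTANCE["launch_time_s"]
        and results["screen_to_screen_s"] <= ACCEPTANCE["screen_to_screen_s"]
        and results["encrypted"] == ACCEPTANCE["requires_encryption"]
    )
```

Wired into the pipeline, this is the same quality-gate pattern from earlier in the conversation, just fed by the story card instead of by a tool's defaults.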
Right, right. And then I guess the last piece would be, let's say you deliver all of that, right? Let's say it all gets out there. Defining a measure of, all right, now that we delivered it all, are people using it and are people happy? I think, you know, looking at the concept of the commits done and the production incidents, taking a look at also adding in the measures of, you know, what's the usage of this? Now that we delivered exactly what we said we wanted to deliver, and we met our goal, what's the real-world usage and response to this? And pulling that back in as a metric, I think, could work really well too.
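The "define success before you deploy" idea Brian describes can be sketched the same way: the success criteria are written down with the story, and after release the observed production numbers are checked against them. The field names and thresholds here are hypothetical:

```python
def feature_succeeded(criteria, observed):
    """A feature succeeds only if real-world adoption meets the bar set
    before release and its resource cost stays within the agreed budget."""
    return (
        observed["percent_users_adopting"] >= criteria["min_adoption_percent"]
        and observed["error_rate"] <= criteria["max_error_rate"]
        and observed["cpu_cost_per_request_ms"] <= criteria["max_cpu_cost_ms"]
    )

# Criteria agreed with the team before the feature shipped (hypothetical):
flight_status_criteria = {
    "min_adoption_percent": 5.0,   # below this, nobody wants the feature
    "max_error_rate": 0.01,
    "max_cpu_cost_ms": 50.0,
}
```

If only 1% of users ever touch the feature, the check fails even though the code runs fine, which is exactly the "do we pull that code out?" savings conversation Brian raises.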
Well, I'm fascinated by the discussion, and actually, Todd, one thing: I think we could probably talk for a much longer time, but I know we don't have more time today. I would love to come back, maybe with another episode, and talk more particularly about security, because even though we touched base on it, I think this is a topic, and also, Brian, as you said, it's a hot topic.
We see it more and more.
And it's also our listeners are more interested in that.
So I would love to invite you back and talk more about DevSecOps, as they call it, how to bake security better into
the pipeline.
I really want to understand a little bit more about what the best practices are.
I mean, you told us a couple of those, but this is an open invitation now for you for
a future episode on that.
That would be really great.
Excellent.
Well, Andy and Brian, I really appreciate the opportunity to participate in this podcast and absolutely look forward to the opportunity to do some more in the future. Thank you.
Excellent. And actually, we're going to be doing some 101 episodes on a lot of different, uh, concepts, where we're just going to dive in with some experts on, you know, things like OpenStack, OpenShift, all these other kinds of concepts.
And I think the security thing could fit really well into that series.
So Todd, if you're saying yes, we're going to hold you to it. We have it on tape. Well, there's no tape, but you do have it on digital bits and bytes.
Andy, is there any – or Andy or Todd, I don't know if either one of you want to go first.
It looks like we're about wrapping up here.
Any kind of last thoughts or concepts you wanted to throw out, or is it time for the Summarator?
I want to let Todd go first, if you have any summaries.
Well, I guess, I mean, as, as we've talked through a number of these different ideas and
we think about, you know, why is this important and, you know, specifically what is the impact
of not doing some of these things and coming back to that business conversation, I think,
you know, increasing revenue is something that comes to mind for many business leaders. How do we attract new customers?
How do we retain existing customers? What is the impact of not doing this and having an impact on our brand? So thinking about brand value also: is functionality, performance, or security a competitive advantage that your business is looking to take out into the market today? Hopefully, the answer is yes to all four of those. But that's one of the reasons why, or four of the reasons why, as I've bumped up against different business leaders, they say, yeah, this is absolutely important. The second thing and last thing I would want to
mention again is just this book, Effective Performance Engineering. It's available as a free digital download from effectiveperformanceengineering.com. When you go there, just click to get the book and you'll see a very quick and easy download there.
Just want to get this information out and share it.
I don't make a dime on it.
So, again, it's out there for the taking and sharing.
Hopefully we can all learn a little bit and let's give some feedback back as to what we learned and what we could do to improve next time.
We'll have a link to that on the site there, Todd, as well. So hopefully people will download
that. Andy, do you want to do the Summarator? Yeah, the Summarator. So I love the fact that you're echoing what we've been talking about for a while, which is shifting left quality and performance, and now also security, which we have not really talked about in the past.
Shifting right, meaning, I mean, the question is always shifting right, or pulling stuff in from the right to the left, meaning looking at production data but also getting production monitoring information into pre-prod environments. These are things we've been talking about because it's very valuable: it allows you to get more meaningful data about the quality, performance, scalability, and speed, I think four of the five S's that you had. These are all metrics that we can measure early on to make better decisions on whether we want to promote changes further into the pipeline, and what we do with these changes in case there's a problem when they hit production.
So I like that, shifting left, shifting right.
Security, as I said earlier, thanks for the example. Thanks for reminding us about the Dyn outage that we had, which impacted us all, showing us how vulnerable we all are, not only our own systems, but also the systems that we are depending on.
And thanks for giving back to the community through your effective performance engineering book and through all the blogging that you also do.
And we'll definitely call you again on doing another episode.
Excellent.
Sounds great.
Thank you, guys.
And Andy, I just wanted to add one more thing too.
And we're talking about, you know, and I want to say these slowly because I want people
to really think about them for a minute.
Speed and agility, stability, scalability, savings, and security, right?
Where we've been talking about those in context of the release
cycle or, you know, your different sprints, but also to think about that in whatever role you
might be fulfilling, whether you're doing functional automation, or you're a developer, architect, or performance engineer, those can apply to your process specifically as well, not just the whole release
cycle. There are ways you can bake all those in to increase
the speed and agility of your process. If you recall, Andy, when we were talking to Rick Boyd,
and he was talking about delivering kind of performance as a service, they had to figure
out how to scale that service to make it more robust and be able to serve all their clients
within the company. And you get the savings and stability of all these things. So there's even beyond that pipeline of code,
it's in the processes that you're running yourself
that you can apply these things to on another layer,
which is a whole different kind of way of looking at it.
But if you think about them,
I'm sure anybody can come up with a lot of ways
they can imagine how that takes shape.
Todd, before we leave, I want to toss it back to you. Do you have any, uh, appearances or anything? Besides, you know, we have your book, and we've got the webpage, which we'll put links to. Are you doing any upcoming appearances anytime soon? This will probably be airing this month, so maybe anything end of May or June, July timeframe?
Most of my stuff you'll either find on LinkedIn or on Twitter.
So LinkedIn, of course, you can just look for my name on Twitter.
I am appperfeng.
So it's A-P-P-P-E-R-F-E-N-G.
But many of the speaking bits that I've been doing lately have been with a variety of organizations, including some of the STAR events, some of the CMP events, some of the O'Reilly events I've been doing as well. So a lot of papers are submitted. But I'm trying to think if there's anything. I've got
two things in June that are probably coming up. I've got a security event that I'm doing here
locally in Wilmington on the 17th of May. But I'll be out in the Bay Area in June. I'm also going to
be in London, I believe, in June. So again, I would just, if you go out to my LinkedIn profile,
my Twitter handle, check out those bits, you'll go ahead and see many of the events that I've got
coming up. Excellent. Andy, do you have anything to promote before we take off here?
Yeah, just my personal highlights. Dev1 in Linz, Austria, early June. I think there's still
tickets out there, dev1.at. Then DevOps Enterprise Summit in London early June.
And I believe Velocity is coming up end of June.
Then also ATARC Federal DevOps, where I'm speaking, in Washington.
And maybe some more, but these are the highlights that are coming up.
Okay.
And I'll be probably making some popcorn in my house this weekend.
So that's about all I got.
Although I did just do that podcast, but that's in the past now.
But yeah, that was awesome.
All right.
Well, unless anyone's got anything else to say, I think we're at a wrap here.
Thank you all for listening.
We'd love any kind of feedback.
Please check out TechBeacon.
It's got a lot of great articles on it.
I was starting to look through a bunch of them
yesterday. Great stuff there, Todd.
And if you have
any feedback for us, you can
message at pure underscore
DT, or you can always reach
Andy at GrabnerAndy
on Twitter, and myself is at EmperorWilson.
Love to hear any feedback or ideas for future shows. Um, and without further ado,
then I'd say we can all say goodbye. Ready? One, two, three. Goodbye.
Bye. Todd, you missed it. Goodbye.
Ciao. Thanks guys.