PurePerformance - 012 Automating Performance into the Capital One Delivery Pipeline

Episode Date: September 12, 2016

Adam Auerbach (@Bugman31) has helped Capital One transform their development and testing practices for the Digital Delivery Age. Practicing ATDD and DevOps allows them to deploy high-quality software... continuously. One of their challenges has been the rather slow performance testing stage in their pipeline. Breaking performance tests up into smaller units, using Docker to let development run concurrency and scalability tests early on, and automating these tests into the pipeline are some of the actions they have taken to level up their performance engineering practices. Listen to this podcast to learn how Capital One pushes code through the pipeline, what they have already achieved in their transformation, and where the road is heading.

Related Links:
* Hygieia Delivery Pipeline Dashboard https://github.com/capitalone/Hygieia
* Capital One Labs http://www.capitalonelabs.com/#welcome
* Capital One DevExchange https://developer.capitalone.com/

Transcript
Starting point is 00:00:00 It's time for Pure Performance. Get your stopwatches ready. It's time for Pure Performance. My name is Brian Wilson, and as always we have Andy Grabner. Hello Andy. Hey Brian. It's been a while since we've last recorded, because I've been gone for a while. I know it doesn't matter to the audience, because they keep hearing the episodes; they keep coming on, I think, a bi-weekly schedule now. Is that what we have? Yeah, a couple of others inserted in between, just because we had to squeeze them in, but yeah, pretty much bi-weekly with some special shows. Yeah, well,
Starting point is 00:01:00 I'm happy to be back. The summer is getting towards an end, but not the end yet. And I had a fantastic, fabulous time in Europe the last three weeks. I got some tan. I got the chance to dance with my girlfriend in Slovenia at the Salsa Congress. Wow. Yeah, it was really cool, actually. But now, back to reality, back to performance.
Starting point is 00:01:25 Well, I got to spend some time in lovely, humid, hot New Jersey, visiting friends and family. So I'm nice and refreshed as well. But before we continue, I do need to
Starting point is 00:01:35 give a shout-out to one of our listeners, Ted. He's my wife's cousin's fiancé.
Starting point is 00:01:44 Due to the sad circumstance of her cousin's father, her uncle, dying, they were together, and Ted listens to us a lot. And one night at the bar after the repast, Megan sent me a video of Ted doing an impersonation of our intro, right off the cuff. I was pretty impressed, but I was also embarrassed for myself. So hi, Ted, and thanks for being a loyal listener. So, we have a special guest today. You have a history with him, so why don't you go ahead and introduce him? Yeah, the thing is, I remember the last time: actually, we had him on air already on
Starting point is 00:02:26 one of my Performance Cafe episodes. It's Adam Auerbach. And, Adam, I know you're on the line already, but when we asked you earlier how to pronounce your name, you said "Auer-back," and I think me, with my German background, I would say "Auer-bach," because at least that's how I would say it. So I call you Adam Auerbach; I just call you Adam. But Adam and I have a history, because I think, Adam, we met last year at Velocity. And since then, I've just happened to bump into you at different conferences. You've been promoting a topic that is dear to my heart, which we will be talking about a lot: continuous testing, continuous performance, and speeding up the pipeline by pushing performance early into the lifecycle.
Starting point is 00:03:08 And without further ado, I don't want to say more, because I think you can introduce yourself better. Adam, welcome to Pure Performance. And maybe you want to give the listeners a little background about yourself, what you do and what drives you, and then we go right into the topic. Awesome. Thanks, Andy. And thanks, Brian, for having me. Yeah, so, Adam Auerbach. I work at Capital One. I am a senior director of technology, and I lead the enterprise group for advanced testing and release services at Capital One.
Starting point is 00:03:37 And so our big mission is enabling continuous testing and continuous delivery for our feature teams. My background has mostly been in the testing profession, for over 15, 16 years now. I've been at different banks and have gone through different transformations. But over the last four years, I've been at Capital One, and we've been on just a magical journey, which started with our agile transformation and then moved into DevOps and continuous testing, continuous delivery. And now we have teams that are actually living that dream, which is pretty exciting. So I'm happy to be here and happy to talk about this with you guys. Awesome. And so now, this is actually interesting: you mentioned something very interesting.
Starting point is 00:04:25 It's the transformational process. You said you have a background in testing, and so does Brian; we're all testers in our initial profession. And you said you transformed, and you mentioned DevOps, you mentioned continuous delivery. And I remember the first presentation you gave at Velocity, which for me is something I always reference in my presentations. Correct me if I'm wrong, but what you said back then is that three years ago, and now probably four years ago, you used to have eight testing teams and you did a lot of manual testing, and now, three years later, there are no testing teams anymore. And I think you're the
Starting point is 00:05:05 only one left now, doing a lot of mentoring and basically providing the framework, the tooling, the automation support. And I think that, for me, was phenomenal. And I think I'm seeing a lot of this transformation also going on with other companies. But did I get that right? Is that kind of the major change, the agile movement first that you had? Yeah, that's been a big key. You know, we still have testers. The testers are on teams, and they're part of the development organization. And we really have structured around delivering value to our customers. And so, yeah, the testers are on the teams, and my team is really focused around enablers.
Starting point is 00:05:48 So originally, enterprise groups were doing performance testing, or doing test automation, or even testing work on behalf of a line of business. But as we've transitioned to agile and put everyone on teams, and then as we start embracing DevOps, you start to focus on flow and really having true feature teams being able to deliver quality software early and often. And in order to do that, you really have to enable them to do everything. And so from an enterprise perspective, it's really been around getting out of being a service-based organization and being product-based: helping people have something they can use to get the data they need, or something they can use to do performance testing, or something they can use to help with their test automation, and giving them those tools and resources so that they can do that themselves.
Starting point is 00:06:48 I'm at a testing conference today, the Practical Software Quality and Testing Conference in San Diego. And actually, most of my morning has been spent talking to people about the future of testing, and how testers are still important, but they just have to embrace these technologies, embrace the ability to automate the validations in real time, giving developers fast feedback. And that's how you continue to have a career in testing. And so that's been part of the transformation that myself and my team have been on,
Starting point is 00:07:28 and now we're trying to spread that word to others, so that they don't get left behind, in essence, by what's happening. Sorry, I wanted to ask: we often talk about that in terms of leveling up. I'm sure we didn't make up that term, but it's the general concept that if you're a tester, no matter what it is, you have to level up if you're going to keep your job and keep running with these concepts. How difficult, either for you, or do you think in general, is that process? And do you see a lot of people not making it through it, in the environments you've been around? Or have you seen more that, as long as people are willing to put some skin in the game, they can level up and get into these positions and maintain them as well? Yeah, so it definitely takes some work.
Starting point is 00:08:18 And I will tell you, when we first started this journey, we had some good debates about this. Part of the conversation was, you know, before, you used to build frameworks to dumb it down, to obfuscate that layer, so a manual tester could do all these things. But now what we're trying to do is give them the technical skills, so that they know how to create jobs in Jenkins, or they know how to create a Docker container to do their automation, or they know how to use the open source tools. And that was a big leap that we took. And not everyone's going to be able to make that transition. It is not easy. It does take work. Capital One made a large investment. We have a software engineering college where we've built courses to help people get some programming
Starting point is 00:09:14 skills, or skills with the different cloud technologies, or whatever the case may be. We have technical coaches that are there. We've done pair programming to partner people together. But then we've also given people alternatives. So if you don't want to go down the technical route, or it's just not your thing, the subject matter expertise that you have is still valuable. And so we have had people move into product owner roles, or scrum master roles, or other non-technical roles. I would say the majority of people are able to do it. It just does take some work, and there is an investment that the company needs to make to help people go through it. And that's not just an investment in the resources, but also in the time,
Starting point is 00:10:03 right? Because people need time to take their classes, or spend time just putting the technology into play and learning on the job. And you've got to give them some leeway from a velocity perspective. That's one thing that I've heard people say: well, we have this deadline, so how do I have time to learn this if I have this deliverable? And that is something that management has to be able to accept. And Capital One has done a really good job of giving people the resources, but also giving them the space to pick it up. And we have seen the majority of our people are able to do it.
Starting point is 00:10:42 So that's interesting. So it's a top-down commitment. And also, I think it makes Capital One as an employer much more interesting for employees, because we all know it is hard to find people with the right skills. But you basically said, well, we are giving people the chance to actually move into that field. We invest the time and the money because eventually it will pay off, because we all know how hard it is to get the right people. And so you took at least one of the routes, saying: we are
Starting point is 00:11:10 going to invest in the people within our company and level them up to where they need to be, and that's paying off. Yeah. I mean, remember, the people who have been with the company for some time have some really great subject matter expertise. They have the battle scars of what we went through when we were a smaller company, and through the different acquisitions, and where we are today. And those battle scars have a lot of value and great perspective, so you definitely don't want to throw all that away. It's definitely good to bring on new talent and new perspective, but at the same time, we have a very large technology group, and you don't want to lose good people. So yeah, Capital One's definitely investing in new technologies, new skills, and keeping people relevant as we continue to
Starting point is 00:12:02 push forward. So you might be able to say that Capital One asked their employees what's in their wallet? Definitely. Hey, so you talked about feature teams, and that basically you moved from service-oriented testing teams to feature teams. That means the feature teams themselves have the quote-unquote testing, quality role within the team. So does this mean you have fixed, assigned quality engineers in every feature team, or are you still moving people around a little bit, or are they just part of these feature teams? So teams are fixed. When we put a team together, that team is usually together for at least three months, or six sprints.
Starting point is 00:12:49 But yeah, typically, when we go through planning sessions, those teams are pretty much set. I mean, that's a core agile construct, right? The team doesn't have too much variance. And then, who does the testing on the team? I think that's where it becomes a little bit more nebulous, right? When we originally started the agile transformation, sure, you had one or two testers on the team and they did all the testing. But now, as you get into continuous testing, continuous delivery, everybody plays a role. A story is not done unless you have automated tests that are running and passing. And that's not just one person's job; that belongs to the team, or the people that own that story.
Starting point is 00:13:36 And so that's who you would say owns quality: the whole team owns quality. Yeah, the whole team owns quality. And I remember one of the quotes; well, I got a lot of good and strange comments on it. But one of your quotes last year at Velocity was, and I have it on my slides: we don't log bugs, we fix them. And basically, that's just the mentality of an agile team. And the benefit of an agile team is what you just said: the story is not done until the whole team says this is done done. And therefore there's no need at that later stage
Starting point is 00:14:10 to find and log a bug, because of course, in the theoretical world, everything that exits the feature team at the end of the sprint, if the story is complete, has the right quality. Right? Exactly. You know, part of everything that we're doing is around being able to deliver something into production. And so there is nothing after the fact, right? It's being coded and tested so that it can ship today. That's the goal. And if there's a defect, you need to fix it. Otherwise, it doesn't ship. Or if it's not something that's worth fixing, then it's not a defect.
Starting point is 00:14:57 It becomes a story in the backlog for a later point, and it's an enhancement. And that might be the first thing that we tackled when we started our transformation, but it's a big deal in the sense that you do things now, in the moment. Anything: performance issues, security issues, regression issues. Everything that we're doing today is about shifting all of that left and bringing it right now, so that you get those insights. Because we know that a developer that checks in their code and then moves on to something else, to go back and have to fix something a day, a week, two weeks later, most likely they've already forgotten what they did and how they did it, and they're going to break something else. And that's why the shift left is really so overarching and applies to so many things. But we really want to get them focused on doing things right now.
Starting point is 00:15:49 And defects are just a sign you're not communicating. There are so many things wrong with defects. I could go on a whole show just about why logging defects is bad. But, yeah. I wanted to ask about, and this is a question on my side that I still haven't satisfactorily wrapped my head around, or maybe not gotten a good answer to: performance testing versus load testing. As Andy and I have discussed in previous episodes, performance testing can be done without load, right?
Starting point is 00:16:30 That's where you're taking a look at the performance metrics of a single call or whatever, right? There's a lot that can be done on the left side there. There's a lot that can be done during all the CI components. But when it comes time for a load test, and this is where I always get confused in these more agile, DevOps-y, continuous, you know, CI/CD areas: my specialty in the past was load testing,
Starting point is 00:16:52 and it was always a behemoth. We had all these scripts, and we had to make sure they all worked. Sure, we can test an individual component, but we really would always want the test to be complete. We would always want to test the system as a whole, which, you know, brought back the whole problem of: do all the scripts still work, and all that. How is load testing being handled, or how do you see that it should be handled, in this kind of scenario? How do you keep it from being a bottleneck, basically, is where I'm getting at. Yeah. So I'll answer, and I know Andy probably has a perspective as well. From Capital One's perspective, you have to be able to do several different things.
Starting point is 00:17:35 And so, to your point, when a developer checks in code, there should be some level of performance tests that they can run just at a component level, just to make sure that the response rates are what they should be and there's been no degradation. And those types of tests can run pretty quickly. And then, using monitoring tools in your non-prod environments has been a great way for us to see. Now, granted, again, it's not load testing; it's still just performance testing and response rates. But it's a great way to get insights. While you're exercising the system for your acceptance tests, you're able to get insights into degradation right there. And then lastly, you do need to do some level of load testing. But what we've started doing is taking advantage of the cloud tools. So something
Starting point is 00:18:25 like Docker, for example: you can take your Selenium tests, you can spin up many, many Docker containers, and in essence just ramp up your functional testing to create load. And you can do that multi-threaded, so you don't need that many servers to have a thousand or so browsers generating that load. And so you're able to do it in a parallel execution and get that information in real time. For us, that's been the way we've been doing it. We've basically just broken things down: getting component-level information, doing it using the monitoring tools, and then being able to use things like containers to take advantage of the tests we already have, to then generate additional load.
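
To make the approach Adam describes concrete, here is a minimal sketch of reusing an existing Selenium script as a load generator by running many sessions in parallel against a Selenium Grid (the kind of hub-plus-browser-node setup those Docker containers would provide). The grid URL, session count, and page flow are illustrative assumptions, not Capital One's actual setup:

    import time
    from concurrent.futures import ThreadPoolExecutor
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    GRID_URL = "http://localhost:4444/wd/hub"  # assumption: a Grid hub fronting Docker browser nodes
    SESSIONS = 50                              # each parallel session acts as one "virtual user"

    def one_user(user_id: int) -> float:
        """Run one existing functional flow and return its wall-clock duration."""
        driver = webdriver.Remote(command_executor=GRID_URL, options=Options())
        start = time.time()
        try:
            driver.get("https://example.com/login")  # the reused functional script goes here
            # ... existing page interactions and assertions ...
        finally:
            driver.quit()
        return time.time() - start

    with ThreadPoolExecutor(max_workers=SESSIONS) as pool:
        timings = list(pool.map(one_user, range(SESSIONS)))

    print(f"{len(timings)} sessions, avg {sum(timings)/len(timings):.2f}s, max {max(timings):.2f}s")

The point of the sketch is that the test logic itself is unchanged; only the execution fan-out differs, which is why the team's existing scripts can double as load generators.
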
Starting point is 00:19:12 All right, so that's cool. So basically you're just repurposing the scripts, or not even repurposing, just reusing the scripts that you already have for all or most of your other testing, and pushing them out at scale to generate load. Exactly. I mean, there'll be times when maybe we'll do something one-off and use a different tool set and dedicated performance testers for something. But at the end of the day, the people on the team know the scripts that they've created.
Starting point is 00:19:55 And so why make them learn something else, or create something additional that then takes more work to support, when you can take the tests that they've already written and just scale them up, and get a nice, good end-to-end view of performance under load? So, one thing that I love about this, and I think one thing that you said, is that it is basically all about identifying regressions. And regressions are not only red and green from a functional perspective; performance issues can also, some of them, be identified even in the earlier stage, where you just run a functional test, by looking at the right metrics. It's a theme that I've been promoting, and we've been promoting, for a while. I call it metrics-driven CI/CD, where I basically look at key architectural, performance,
Starting point is 00:20:52 and scalability metrics while executing something like a unit test, a component test, a functional test. Because if I'm a developer and I make a code change, and that code change now means my local JMeter test that is testing five API calls is now sending 20% more bytes over the wire, or it is making 5% more database calls, I know this is going to become a problem later on, because it just needs more resources, or I'm just inefficient in the way I am letting my components talk with each other. And that will become a scalability and performance issue. And it seems that you are doing the same thing, right? You're allowing your developers, or you're asking, you're demanding your feature teams, that they do a lot of these tests early on, identify
Starting point is 00:21:45 regressions early on. So you can find a lot of problems much earlier, without pushing these code changes into later stages of the pipeline, where executing your longer-running tests would find the same problems, but would just take longer. And so you're optimizing your pipeline flow. Yeah, I think that's so critical. And I think that's the realization that many people don't have. You think of some of these tools and you think just production. And by using them in non-prod, you get these insights, and then you can create baselines, fail the build,
Starting point is 00:22:20 fail the pipeline, stop your pipeline, and really react. And it gets the team more comfortable with the tools and how to bake some of this stuff in. And then when you ask them to do something in production, they already have that familiarity; they can already start to write alerts in production. To us, it's been really eye-opening and has had a lot of value. And even when you do find something late: it used to take us a while, days if not weeks, to find the root cause of a performance issue. And now that time is cut down drastically because of those types of tools.
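
As a rough illustration of the baseline gate Andy and Adam are describing, here is a minimal sketch that compares a build's test-run metrics against a stored baseline and stops the pipeline on regression. The metric names, file locations, and thresholds are hypothetical, not Capital One's implementation:

    import json
    import sys

    THRESHOLDS = {                 # allowed relative growth per metric
        "bytes_over_wire": 0.20,   # fail if >20% more bytes are sent
        "db_calls": 0.05,          # fail if >5% more database calls
        "response_time_ms": 0.10,
    }

    baseline = json.load(open("baseline_metrics.json"))  # e.g. from the last good build
    current = json.load(open("current_metrics.json"))    # exported by the test harness

    failures = []
    for metric, allowed in THRESHOLDS.items():
        old, new = baseline[metric], current[metric]
        growth = (new - old) / old
        if growth > allowed:
            failures.append(f"{metric}: {old} -> {new} (+{growth:.0%}, limit +{allowed:.0%})")

    if failures:
        print("Performance regression detected; stopping the pipeline:")
        print("\n".join(failures))
        sys.exit(1)  # a non-zero exit fails the CI stage running this script
    print("All architectural metrics within baseline thresholds.")

Run as a pipeline step right after the functional tests, a check like this turns "the functional test passed" into "the functional test passed and didn't get architecturally more expensive."
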
Starting point is 00:23:08 And do your development teams, your feature teams, I think you just said it, but just to confirm: do your feature teams also define what the monitoring strategy should be for production, like how they can monitor the technical performance and scalability behavior of their features? I'm sorry, say that again. I said, are the feature teams also responsible for not only the testing, but also for defining the monitoring strategy? That means, what needs to be monitored later on in production.
Starting point is 00:23:36 So that they can close the feedback loop, to see how their feature is actually behaving in production, because their feature might behave a little differently in production than what they've tested. So are the developers and the feature teams also building in the monitoring, or defining the monitoring, the dashboards, the metrics, and all that stuff?
Starting point is 00:23:56 Absolutely. Absolutely. Yeah, I mean, especially since those are the same developers that have the pagers. So when something's not working, they're the ones being called. And so they absolutely have that accountability. How do developers like it if they have to be on call? They don't like it at all. And we've had some teams that want to hire somebody to be a dedicated on-call person. You know, so there is a lot of effort spent on making sure it's right up front. I mean,
Starting point is 00:24:27 I know Gene Kim has talked about it many times, right? But the first time a developer is woken up at two in the morning because there's a performance issue in production, that's a great reminder, when they're building something the next time, to make sure that it's working all around. So, I mean, it definitely is really powerful, because no one wants that call. Yeah, hiring somebody sounds like a cop-out. Exactly. I mean, it's just going backwards.
Starting point is 00:24:52 Yeah. Yeah. Wow, that's pretty cool. And so, can you tell me how many feature teams you have that work in parallel? And how often do these guys deploy into production? How long does it take for a code change to actually go all the way through? I know that's kind of three questions now.
Starting point is 00:25:11 But we'd be interested in seeing what your development organization actually looks like, how often you deploy, and what the throughput of your pipeline is. It would be interesting to know. Yeah, so we have multiple lines of business. Just in general, we follow the Scaled Agile Framework, or SAFe, and our teams are aligned to trains. Typically in SAFe, they talk about 10 to 12 feature teams working on a train, and that's very common to what you would see at Capital One. I'm sure it changes where you go, but that's the general rule of thumb,
Starting point is 00:25:50 and they're all working in parallel. We try to practice everyone committing back to main, or master, and not having prolonged feature branches. And in order to do that, you have to have some of these best practices that we're talking about. But yeah, right now we have teams that are delivering multiple times a day through this process.
Starting point is 00:26:16 And it's these best practices. It's also taking advantage of blue-green environments and feature toggles. There are other enablers that make that a reality, but I think you definitely could find instances of someone making a commit, and then an hour or two later, that being in production. Now, whether or not an end user can actually see it and touch it, that might depend on the other pieces of that feature, and how we're rolling it out, you know, if we're doing canary builds or whatnot. But yeah, it's pretty frequent.
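
A quick sketch of the feature-toggle idea Adam mentions: the commit ships to production, but the new code path stays dark until the flag is flipped. The flag names and code paths below are made up for illustration:

    FLAGS = {"new_checkout": False, "rewards_widget": True}  # in practice, likely a config service

    def legacy_checkout_flow(cart):
        return f"legacy checkout for {len(cart)} items"

    def new_checkout_flow(cart):
        return f"new checkout for {len(cart)} items"  # deployed, but dark until toggled on

    def checkout(cart):
        # The toggle decides at runtime which path a user sees, so deploying
        # code and releasing a feature are decoupled; a canary rollout works
        # the same way, flipping the flag for a small slice of users first.
        if FLAGS.get("new_checkout", False):
            return new_checkout_flow(cart)
        return legacy_checkout_flow(cart)

    print(checkout(["card", "toaster"]))  # -> legacy checkout, until the flag flips
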
Starting point is 00:26:54 And who defines your performance criteria? So who defines what is acceptable performance, under which load? And is this the feature team itself, or is there a quote-unquote business person that says, well, we're working on this feature and therefore we're expecting X amount of users and it has to be that fast? How does this work? How do you find the performance criteria and the SLAs? Yeah, so we practice acceptance test-driven development. And one of the big things with ATDD is that "three amigos" idea, where a business person is sitting down with a developer and a tester, going through user scenarios, which we then develop and test against. And the same thing applies from a non-functional perspective.
Starting point is 00:27:45 So the business is there to provide that input around what the expected user load is going to be. And then we're able to work with the developers to figure out what the TPS, the transactions per second, is going to be, and then build it from there. Sometimes we'll do maybe some exploratory stuff, looking at response times, looking at where degradation is, but the business is involved in helping us define those NFRs.
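
A back-of-the-envelope example of turning the business input Adam describes into a TPS target; all numbers here are invented for illustration:

    concurrent_users = 10_000    # business input: expected peak concurrent users
    requests_per_session = 20    # from the user scenarios in the stories
    session_length_s = 5 * 60    # an average session lasts about five minutes

    avg_tps = concurrent_users * requests_per_session / session_length_s
    peak_tps = avg_tps * 2       # a common rule of thumb: size for roughly twice the average

    print(f"~{avg_tps:.0f} TPS average, plan load tests around {peak_tps:.0f} TPS")

With these made-up inputs that works out to roughly 667 TPS average, so the load test and the NFR would be sized around 1,300 TPS.
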
Starting point is 00:28:20 Now, I also just opened up my browser here, and I found the Hygieia framework that you guys put up on GitHub, which I think is awesome stuff in terms of visualizing your pipelines. And I know you've been promoting this at the different conferences, and you're encouraging people to integrate with it. And I still want to get in touch with you and with Topo Pal about getting a Dynatrace integration as well, so that we can feed into your dashboard whether a particular build phase, or a particular code change in a particular phase, is good or not. Do you want to tell us a little bit about Hygieia, how you use it, why you built it, and also what your plans are? And I think, Brian, we should probably put up a link on the podcast recording, in case people don't know how to spell it based on my pronunciation.
Starting point is 00:29:15 I'll put up a link to the GitHub repo and also a direct link to the video on YouTube. I watched that today; it was a great overview. Was that you recording the audio on the YouTube video? No. Okay, I don't think so. So, Hygieia came out of a decision a couple of years ago: we made this executive decision that, in order to compete going forward, our competitors weren't going to be big banks; our competitors were going to be these other fintech companies, and Google and Apple and whomever. And so in order to be
Starting point is 00:29:52 able to play with them, we need to be known as an engineering technology company. And open source is a key driver in that: being able to move fast and being able to adopt quickly. As well as, when you bring in top talent, there's an expectation that they want to be able to use and contribute to open source. So as a way to kind of eat our own dog food on this journey, we said, well, hey, as we're building out our pipelines, the teams need some type of dashboard to be able to see what's happening: where they are from an intent perspective, where they're at with their builds and commits,
Starting point is 00:30:29 what's the health of their pipeline, what's the quality overall. And so that's how we came up with the idea to build Hygieia, and then use that as a way to cut our teeth: figuring out what is needed from our perspective and a compliance perspective to be able to contribute back to open source, get our name out there, and have other people help us. It was named Open Source Rookie of the Year last year. We have a lot of big companies that are using it and contributing to it. And for us, it's been a big tool to help provide teams visibility into their pipelines. And now the latest thing that we just rolled out is this product view.
Starting point is 00:31:22 So, many products are a combination of multiple pipelines. You might have a pipeline from a UI perspective, from different APIs, and from other backend systems that you're using. And so in order to get a good picture of flow, we have this product view. And basically what that tells you is, for each one of your pipelines, where the commits are and what stage they're at. So how many commits are waiting for a build? How many commits are waiting for a deploy?
Starting point is 00:31:49 How many commits are waiting to be tested? How many are waiting for performance? And then, thus, what is your overall time to market, from a commit all the way through to production? And so that's been able to help us get really crisp on where we have bottlenecks. One of the slides that we showed at Velocity this year was just that, when we first started rolling this dashboard out, a commit might sit in all
Starting point is 00:32:16 these other stages for a couple of minutes, or maybe an hour at most, but then when it gets to performance, it might sit there for days. And so it really gives you a good indicator that we have a problem: performance right now is still happening way too late, and it's taking too long. And so that was a real good reinforcement that we needed to lean in more to that space. But you wouldn't have known that otherwise; we knew that was somewhat the case, but this visualization just gives you a good sense of where things are at. And frankly, if something is stopped, what impact does it have on other things?
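
The product view Adam describes boils down to per-stage wait times per commit. Here is a minimal sketch of that computation, with a hypothetical data shape and invented timestamps; note how the perf stage dominates, just as in the Velocity slide:

    from datetime import datetime

    STAGES = ["commit", "build", "deploy_dev", "functional_test", "perf_test", "prod"]

    def stage_waits(ts: dict) -> dict:
        """ts maps stage name -> ISO time the commit entered that stage."""
        parse = datetime.fromisoformat
        return {stage: parse(ts[nxt]) - parse(ts[stage])
                for stage, nxt in zip(STAGES, STAGES[1:])}

    commit = {  # invented timestamps for one commit
        "commit": "2016-09-01T09:00:00", "build": "2016-09-01T09:04:00",
        "deploy_dev": "2016-09-01T09:10:00", "functional_test": "2016-09-01T09:40:00",
        "perf_test": "2016-09-01T10:00:00", "prod": "2016-09-03T10:00:00",
    }

    waits = stage_waits(commit)
    bottleneck = max(waits, key=waits.get)
    lead_time = datetime.fromisoformat(commit["prod"]) - datetime.fromisoformat(commit["commit"])
    print(f"lead time {lead_time}, bottleneck: {bottleneck} ({waits[bottleneck]})")

Aggregated over all in-flight commits, exactly this kind of calculation is what lets a dashboard paint the heat map of where the pipeline is clogged.
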
Starting point is 00:33:01 Yeah, I'm just looking, I think, at the screenshot that you showed at Velocity. It's also on the GitHub page, where it says it's about 10 days from commit to production for the first project, or the first product, that is on there. And it's just phenomenal, because it shows you a heat map of where on your production line you have your bottlenecks. And obviously, there are multiple options for how you deal with the bottlenecks. One thing you said earlier about performance testing: with new cloud technologies, you can parallelize a lot of work. But on the other side, if you shift left some of this
Starting point is 00:33:36 and break it down into smaller units that can be run earlier and faster, and basically stop that build earlier, before it clogs the pipeline at the later, heavier-weight stages, then this is much more efficient, right? Because I'm looking at this dashboard now, and it says, I'm not sure how many builds are in here, but on average a test runs 65 minutes in perf, and you have like 10, 15 builds in there. So if you can achieve the same level of performance testing with a test that only runs a minute or two, I mean, not the same level, but you can identify a lot of these problems earlier, and it only runs like five minutes, then you are shaving off 60 minutes
Starting point is 00:34:17 all of a sudden, and you get faster feedback to engineers. And, as you said earlier, it's the biggest pain if, as an engineer, you're in the middle of implementing a cool new feature, and then you get a notification that the stuff you committed 10 days ago is now a big blocker. This is just the thing: the feedback loops, we need to tighten them up, and I think what you guys are doing is just phenomenal. Yeah, you know, Gene Kim, Jez Humble, and Nicole Forsgren help run and execute the Puppet Labs survey every year. And people talk about metrics all the time, but that lead time, or time to market, is really one of
Starting point is 00:35:01 the biggest indicators of how well you're doing. But then, how can you get a metric like that that is actionable from a leadership perspective? And this is how we got to this dashboard, where we can actually dive into what the bottlenecks are that we're facing. It's not just one pipeline; it's across them, being able to see that visualized, and then being able to actually click into it to see, specifically, what that commit is for and what might be getting held up. It definitely was very eye-opening for us. And it did shake out other issues, not just in the performance testing space. We had some other things where people had poor branching strategies, where they had feature branches that lasted a long time before they got back into main.
Starting point is 00:35:52 And again, their lead time was very long. And so it's been very valuable for us. So, I know where you are, and I kind of have an understanding of where you are right now. And I believe Capital One, with what you've done, is seen by many as somebody to follow, because you did some great work and you're far ahead of many. But I know you never stop, and you never stop innovating and getting better. So what are the next steps? What are the next big milestones that you want to implement, that you want to change, in order to stay ahead of the competition, in order to stay flexible and innovative, in order to keep the talent? Any other big projects that you're currently working on that are coming up? You know, I think right now we're focused on continuing to roll this out at scale. It's easier to do some of this stuff on greenfield work than on some of the
Starting point is 00:36:54 legacy stuff. And so that's the big mission right now: some of our core banking systems that are older, how do we make sure that these same great concepts can apply to them? So that's some of the things we have going on now. I think the other thing, too, is just around open source, and how we continue to make more contributions
Starting point is 00:37:19 outside of Hygieia. We have another product called Cloud Custodian, which helps you better manage your AWS instances. Right now we're looking at service virtualization, and we're looking at some of the Chaos Monkey-type tools. So I think that's probably the next foray: just a greater presence within the open source communities, and contributions from Capital One.
Starting point is 00:37:53 And so I think what you just said is a problem that many companies face. Obviously, you started all of this that we just talked about with some new projects, where you had a greenfield experience and you could test new things. And now you try to apply that to your, let's say, older legacy enterprise systems. I think this is a big challenge that a lot of people have: how to get started. And it seems in your case, you also started with some projects where you could actually try something new, and now you try to apply that back. Maybe not all of it, because maybe not every system can be changed in the way you're doing development here.
Starting point is 00:38:33 But at least you apply a lot of the lessons learned to the other systems, to make them faster and more agile, and I guess also to motivate the people and keep those who are still working on these older projects, right? Yes, absolutely. I mean, the overarching theme of shifting left, removing constraints, being able to get fast feedback early: that applies to everybody. And so maybe you might not be able to do every single thing that we talked about, but you're able to implement some of those things and still have a big impact for your customer. One of the other things that we've done as part of this transformation is use things like value
Starting point is 00:39:16 stream analysis, where you sit down with a legacy group and you go through what it takes for business intent to get developed and tested and out the door, and where they are spending time manually doing things and where they have waste. And that's been a really great exercise for us, to then be able to put together an action plan of: okay, you're spending 25% of your time on your builds and your deployments, let's focus on that first. Then: okay, you have high defect leakage rates, you're spending time after the fact, and you're losing productivity,
Starting point is 00:39:57 and this is all done manually, so let's get some stuff automated. And basically, you're able to then build out a plan for them, and you can show them, in their world, how this will have an impact for them. And that's been a really powerful tool for us to get business commitment, because these things do take money; they do take resources and time. Just generalizing at a high level is one thing, but being able to give someone very specific direction, based on their experience, has been very powerful for us in making the argument for why they need to do these things. One question I had, and I think Andy and I have discussed this in the past; it's something that I think people are starting to think about more, but I haven't seen it deployed too much:
Starting point is 00:40:52 another operational metric could be cost of operations, especially if you're going into public clouds, or areas where you can actually track how much you're spending on hardware, network, disk, and all these other components, so that you could see, build to build, feature to feature, how much your costs are going up and down. Is that something you've thought about? Is that something you're doing at Capital One? Is that something that's ever been on your radar? It definitely is on our radar. We use it in the sense that, when we first went to the cloud, we weren't as efficient as we should have been.
Starting point is 00:41:33 Meaning, we didn't have immutable servers. The infrastructure might not have been all scripted out, and so when there was a patch, there was downtime. Servers were always up; the environments weren't very elastic. And so yes, we did look at those infrastructure costs, in detail, by each one of our platforms and applications. And we used that as a mechanism to then go back and drive best practices. You could get very transparent about people who did things through brute force versus the right way: looking at the costs from a cloud perspective, and then translating that back into how well people have engineered their pipelines and their utilization of the cloud to stay efficient.
Starting point is 00:42:30 Because the whole play about going to the cloud was that it was going to be more economical for us than having these big data centers. But we definitely saw early on that that was not the case, because of how we implemented it. And so it really is an important metric you have to look at. Do you run in the public cloud as well, or do you build your own private clouds? So our CIO, Rob Alexander, has been very public about us moving into AWS. So right now, we're still one or two years into that journey. But I think at some point, the goal is that the majority of things are in the cloud, in the public cloud.
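
Picking up Brian's build-to-build cost question from a moment ago, here is a tiny sketch of what such a check could look like. The numbers, build tags, and the idea of attributing spend per build are illustrative assumptions; in practice the data might come from tagged cloud billing exports:

    costs_per_build = {      # illustrative: daily infra spend (USD) attributed to each build
        "build-341": 412.50,
        "build-342": 505.75,
    }

    old, new = costs_per_build["build-341"], costs_per_build["build-342"]
    delta = (new - old) / old
    print(f"infra cost changed {delta:+.1%} between builds")
    if delta > 0.10:         # hypothetical budget: flag more than 10% cost growth
        print("WARNING: review what this build changed (instances, storage, traffic)")
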
Starting point is 00:43:23 Hey, actually, I know the questions keep coming, but it's just interesting to have somebody like you here that we can ask. So besides the cost aspect of, let's say, the features, the software that runs: are the feature teams also monitoring which features are used and how, and which features may not be used, to make a decision later on which features to either kick out, if they're not used much, or which features to optimize? Because you see, wow, it's amazing, people keep using this feature versus that other feature, and in order to make it more efficient, it makes more sense now to optimize it, to keep down the operational costs. So do you have something built in where you keep monitoring a feature set, and then use that as a prioritization vehicle for the upcoming sprints? So definitely on our customer-facing sites and apps, the business absolutely has that information,
Starting point is 00:44:23 and they're using it from a product owner, product management perspective, for prioritization. I'll tell you that we're experimenting with it in some other aspects around team performance, and trying to get teams more familiar with those metrics. One of the things that we've been talking about is, how do you demonstrate a high-performing team? What does a high-performing team look like? And so what we talk about is: a high-performing team is delivering value to their customer early and often. And then you get into, what is value? And so having some of those types of metrics has really given us some insight. And again, these are experiments.
Starting point is 00:45:07 It's not necessarily widespread just yet, but we've used it internally on some of our test data management tools. So we built a homegrown application called OneSource, which gives feature teams the ability to get their own data in real time, on demand, as part of their pipelines. And for this tool, we've implemented monitoring tools so that we can see how people are using it, what features they're using, where they might have fallout rates or abandonment, and then focus the next features that we're building on those areas. And we're seeing some really positive results with that, so hopefully that will grow more.
Starting point is 00:45:48 Cool. Yeah. Nice. Well, I guess we could probably go on for a while because I have a list of questions that are still up, but I know we also have to make sure we stay within our time constraints, Brian, because otherwise we keep going. I mean, Adam, I want to say thank you so much
Starting point is 00:46:10 for trying and doing the same thing that we try to do: spreading the word that we just need to build better software and release it more often, and that in order to do that, we need to shift left quality checks. And I think that's just awesome. Hopefully, you also find our podcast useful to spread the word. And we do keep educating people at the speaking engagements that we have.
Starting point is 00:46:39 I'm really looking forward to contributing to Hygieia. I think it's just an amazing project that you do, actually giving back to the community. But obviously, there are also benefits for you, for everybody that opens up to open source, that contributes and puts projects out there. So I think it's just a win-win situation. It's awesome to see that a company like Capital One says our main competition is not the big banks, but Google and all the other companies, because we are a technology company, even though our main business is dealing with money. I think that's just a wake-up call for a lot of companies out there that need to transform and think differently about how they go about their business.
Starting point is 00:47:22 Because we know we are in a software-driven world, and that's true for every business out there. Yeah, I also think it's really cool that Capital One gave everybody a chance to level up. You know, that's kind of the scariest thing in the testing field: as all this stuff changes, what's going to happen to me, right? And I think it's really awesome the way they tackled it. There are always going to be, unfortunately, some people who aren't up for
Starting point is 00:47:50 the challenge. But anyone listening, who likes to follow this stuff, is most likely up for the challenge. And I think it's really great that they gave them that opportunity to do it, because I'm sure it's a lot more fun than running through tests from top to bottom and putting checks in boxes, or whatever the old model was. Yeah. Well, I appreciate you guys having me. Hopefully people find this interesting and educational. And I think, to your point, there are a lot of great podcasts, there are books, there is a lot of information out there.
Starting point is 00:48:25 And so it's not impossible to transform. It's just a matter of participating and doing it. Yeah. So, any appearances coming up for you folks? So I'm speaking at STARWEST in Anaheim in October, and then in November I'm speaking at DevOps East in Orlando. Andy? Well, I think when we did that little rehearsal, actually, I was smiling,
Starting point is 00:49:00 because we will see each other at these two conferences. So I think the stalking continues, but please don't see it that way. And Adam, as I said earlier when we were off the mic, I will definitely make sure we get a couple glasses of beer or wine, or whatever your preferred beverage is, to show the appreciation that you are on this podcast. Additionally to the stuff that I mentioned, I'm going to be at JavaOne, which is going to be just a couple of days after this airs, in San Francisco. And I also have QCon in San Francisco, and CMG; both of them are in November. So, looking forward to that.
Starting point is 00:49:41 Hopefully, if some listeners are out there at these conferences, ping me. You can also ping me on Twitter, @grabnerandi. And I think, Adam, you also have a pretty cool Twitter handle, don't you? Yes, I am @Bugman31. So please follow me, tweet me, ask me questions. I am @emperorwilson on Twitter. I actually have a performance coming up, an appearance for the first time, Andy. My old band is getting back together for a reunion show.
Starting point is 00:50:11 So, November 5th in Jersey City, at Monty Hall, the performance space at WFMU, we'll be opening for one of our influences from the 60s, the Silver Apples. So if you're in Jersey City, American Watercolor Movement is the name of the band. I just found out from somebody else that it was officially on, not directly from one of my bandmates,
Starting point is 00:50:32 which was kind of funny. They're like, oh, I heard you guys are playing. I'm like, oh, I didn't know that was official. So yeah, it's not a performance, well, it is performance-related, but a different kind
Starting point is 00:50:41 of performance. Anyhow, yeah, if you want to tweet anything about the podcast, use the hashtag #PurePerformance, or you can also email us at pureperformance@dynatrace.com. We'd love to hear from you. Anybody who wants to come on and be a guest,
Starting point is 00:51:02 please don't hesitate. We love having guests, and we love hearing stories from everybody. So participate, be part of us, be one of us. Join us. Adam, thank you so much for taking the time today. Enjoy the rest of your time out in warm California. Awesome. Thanks, guys. I appreciate it.
Starting point is 00:51:25 All right. Goodbye, everybody. Thank you. Bye.
