PurePerformance - What is Data-Driven Continuous Delivery aka CDv2 with Tracy Ragan

Episode Date: January 11, 2021

When moving to microservice architectures, it's time to re-think continuous delivery. Just as many software services rely on a core data analytics engine to make better automated decisions, we need to apply the same for continuous delivery. We can assess the risk of every microservice deployment based on data from production and the desired change of configuration. We can assess the potential blast radius and mitigate it through modern delivery options such as blue/green, canaries or feature flags.

Tracy Ragan, Creator & CEO of DeployHub, CDF board member and DevOps Institute Ambassador, shares her thoughts on why we need to move to smarter data-driven delivery pipelines. Tracy (@TracyRagan) gives us insights into why not every microservice is created equal and what approaches we can take to better control updates that contain multiple microservice updates.

Also make sure to check out their latest project Ortelius and take Tracy up on a virtual coffee chat as discussed in our podcast!

https://www.linkedin.com/in/tracy-ragan-oms/
https://twitter.com/TracyRagan
https://github.com/ortelius
https://go.oncehub.com/15-30MinuteVirtualCoffeeWithTracy

Transcript
Starting point is 00:00:00 It's time for Pure Performance! Get your stopwatches ready, it's time for Pure Performance with Andy Grabner and Brian Wilson. Hello everybody and welcome to another episode of Pure Performance. As always, my name is Brian Wilson. And as always, Andy Grabner's name is Andy Grabner. Sometimes Andreas Grabner, but only for his mother. Happy New Year, Andy. Happy New Year.
Starting point is 00:00:46 I just wanted to say it's great that we're still doing the show even in 2021, hoping, well, knowing that 2020 is behind us and 2021 can only get better. I think so, right? There's a lot of potential, a lot of promise on the horizon, so let's keep our fingers crossed for everybody. And I think this is a good point that came up a few episodes ago when we were talking about chaos, where, when talking about chaos testing, if someone says, oh, that'll never happen again, it's just kind of a reference back to 2020. Exactly. Whatever won't happen, COVID, murder wasps, everything, you name it. I think there were some vampires somewhere. Who knows?
Starting point is 00:01:22 Anyway, we are in a new year, a new show. Yeah, as you mentioned, we've been doing this, I think, since, I was going to say 2015,
Starting point is 00:01:29 but it can't be. I think 2016. 2016, yeah. And thanks to everyone who's been listening. If anyone is on here who's listened to the first episode,
Starting point is 00:01:38 and everyone who has perfect attendance, thank you so very much. We love you all, and we hope we keep entertaining you with our short banter. You know, really quickly, Andy, I got to mention this. I know we try to keep it short. We used to have it a lot longer in the beginning, right? My wife, we were driving home
Starting point is 00:01:54 from my mother-in-law's and she threw on this podcast, and we're listening, and talk about banter: 15 minutes in, they hadn't gotten to the topic yet. All they were talking about was things like their audio quality, funnily enough, and just all this other stuff. I'm like, when does this show start? We're 15 minutes in. So, to our guest, who we're about to introduce, the reason I bring this up, and it's relevant, is that way back in the early days we used to go quite a bit longer before we got into it, but not 15 minutes. Now we're pretty good at getting right to the chase. So speaking of getting to the chase, Andy. Perfect. Yeah. I'm very honored to have for the inaugural episode in 2021,
Starting point is 00:02:31 for the New Year's episode, Tracy Ragan. And hopefully I pronounced the name correctly, because I noticed there are multiple ways you could probably pronounce Ragan. It is Ragan. And it is quite an honor to be here for the first 2021 podcast. It couldn't be a better way to start the year.
Starting point is 00:02:52 Awesome. Hey, Tracy, it's amazing. I have your LinkedIn profile open. And let me just read this out. Creator and CEO of DeployHub, helping DevOps teams simplify microservices at scale. CDF board and DevOps Institute ambassador. That sounds like you are really busy. I am really busy. I am really busy,
Starting point is 00:03:14 but I've been in a small company for the last 20 plus years. So busy is what you learned. Yeah, now very cool. And we got to know each other through the CDF, the Continuous Delivery Foundation. And I actually think, and I remember when we had a chat two weeks ago, a couple of weeks ago, and the invite popped up in my calendar. And then I thought, how did I end up getting a 15-minute coffee call with her? And then I actually remembered and I looked back into my emails and you were basically sending out, was it on Twitter or email or on Slack?
Starting point is 00:03:50 If you want to chat with me, just book a time for a coffee chat and then let's have a conversation, which I thought was really great because this started the whole communication or conversation we now have. And so thank you so much for being so open and available for the
Starting point is 00:04:06 community. Absolutely. It was my goal in 2020 when I realized we were all going to be seeing the world through the Zoom window to really reach out to folks and start talking about what we're facing, not just in terms of being in quarantine or pandemic, but what we're facing in terms of new technology, everybody is starting to talk about cloud native and K8s. And there is certainly a tsunami on the way. And there was no better time to do exactly what I did in 2020, to really start talking to people. And I have probably spoken to,
Starting point is 00:04:41 I'd probably say about 500 people over the last year, all in different levels of their journey in Kubernetes. And what better way to really define requirements for an open source project, right? Yeah, yeah. So that was my goal. That is really cool. So just let us help me understand, because maybe other people want to follow that same model. You pick a certain time range in a week or like a certain time slot or time slots, and you put it on a calendar and people can book the time or.
Starting point is 00:05:14 Yep. I block out my calendar from nine in the morning until one in the afternoon, my time, mountain time. And that way I have the afternoons to get work done. And I open up the calendar and I, you know, let everybody know. I, you know, reach out to people I see in LinkedIn. If somebody wants to, you know, wants to, you know, follow me on LinkedIn, I immediately send them an invite to say, let's not just follow each other. Let's talk.
Starting point is 00:05:39 I've never heard of someone doing that. That's really, really awesome. I think that's the first I've heard of it. Yeah, that's really amazing. Cool. Hey, now, Tracy, I know we both, and I'm sure Brian included, have one of our favorite topics: kind of thinking and contemplating about what's the future of this discipline that has been around for a while, but it seems it's been stuck a little bit in the old way we did things. And I'm talking about continuous delivery. I think this is actually
Starting point is 00:06:12 the first time I'd even heard the term: you said it's time for a continuous delivery V2. Yes. Absolutely. Can you enlighten us what that means for you?
Starting point is 00:06:22 What do you want to achieve? Well, you know, you guys started it when you mentioned chaos, right? Earlier in your bantering. And we are entering a phase of chaos engineering.
Starting point is 00:06:35 Like it or not, we are. The real benefit of Kubernetes is the auto-scaling and the fault tolerance. And how do you really get that? You get that by decomposing your monolithic applications into functions, microservices. The minute you do that, you create chaos engineering, because instead of one big monolith where we sorted out all of the link issues at the earliest stage in the development lifecycle, which is at the compile and link step,
Starting point is 00:07:07 we're leaving that to runtime, whether it be in development or QA or production. That link step is being done at runtime. That, in essence, is chaos. And while we have solved the problem of sort of encapsulating ourselves from the operating system, once we've broken apart an application and we don't link it, we are exposing ourselves to problems with different versions of different microservices that make up different versions of the applications across many different clusters. It is a huge chaos problem and one I find fascinating. And that's what we have to think about. We have to really understand what we did when we decided to decompose an application version into independently deployable microservices. What did that mean to the continuous delivery pipeline? And how do we now need to morph from the CD perspective to be able to still have a North Star, still understand what an application version is,
Starting point is 00:08:08 and still be able to say, we want to put these new features in version 5.1 of our new application. How do we do that now? Let me ask you a question because I want to test my knowledge. I know Andy's probably got a million questions. I can see it. Now that we can see you, Andy, I can see your gears running. But I do a lot of theoretical.
Starting point is 00:08:27 I work in pre-sales. I'm not doing implementations. I read about stuff. We talk about stuff. We discuss things with customers. So the one thing that's always tripped me up a little bit has been the idea of service meshes, Istio, for example. And the way I described it recently to somebody,
Starting point is 00:08:44 thinking I understand it, was similar to what you're talking about. How in traditional applications, you would define your endpoint, you would have a monolith talking to a monolith, and you had a point and it was done. And if you're using something like Kubernetes, you have to define those endpoints in every single pod, every single container running in there. The idea of a service mesh is you just say, you know, connect to login service and Istio or your service mesh knows what those endpoints of login service and manages all that. Only because you're bringing this up. I just wanted to check. Am I getting my understanding of service mesh? Is that the general concept for that?
Starting point is 00:09:15 It is. Just to summarize what service mesh is, it's request routing. Yeah. Okay. It's pure. But it, I mean, don't even get me started on this conversation. That could be an entire different podcast. Exactly, no, no.
Starting point is 00:09:27 Maybe we'll take you up on that one, yeah. But in terms of the continuous delivery pipeline, in the future, my prediction is, and when I first started saying this, back when we were able to see people in person, they would look at me like I'm crazy, or they thought I was green or something: we have the ability to get rid of dev, test, and prod, because service mesh can do the routing. Now, say we have one big, massive cluster, right? And all of our microservices are immutable, so they're all running in there. What's the point of having a different cluster? Why can't we just get service mesh to route to the correct persona, the right version of the application? The only caveat to that is, how do you manage multiple, you know, dev, test, and prod databases?
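To make the routing idea concrete, here is a minimal sketch of the kind of decision a service mesh effectively makes when it routes a request to a specific version based on who is calling. The personas, services, and versions below are hypothetical, invented for illustration; this is not from the episode or from any particular mesh's configuration:

```python
# Minimal sketch of persona/version-based request routing, the kind of
# decision a service mesh makes from its routing rules.
# Personas, services, and versions here are made up for illustration.

ROUTING_RULES = {
    # service -> persona -> version to route to
    "login-service": {"developer": "v2-canary", "tester": "v2-canary", "end-user": "v1"},
    "catalog-service": {"developer": "v5.1", "tester": "v5.1", "end-user": "v5.0"},
}

def route(service: str, persona: str) -> str:
    """Return the workload version a request should be sent to."""
    versions = ROUTING_RULES.get(service, {})
    # Unknown personas fall back to whatever end users get (the stable version).
    return versions.get(persona, versions.get("end-user", "v1"))

if __name__ == "__main__":
    print(route("login-service", "developer"))  # -> v2-canary
    print(route("login-service", "end-user"))   # -> v1
```

With rules like that in a single cluster, "dev" and "test" become a question of who gets routed to which version, rather than which cluster the code lives in.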
Starting point is 00:10:15 Which brings me to another really fun conversation, which is mono versus poly databases. So, yeah, service mesh is really going to, you know, most companies haven't started looking at service mesh, but they will. I was at a Spinnaker presentation and they did this beautiful presentation. Then they said, now I'm going to tell you we did something, and don't think we're crazy, because it really solved a lot of problems. He said, we combined our dev and test into one cluster. And I was like, yes, I knew that would happen. So it's beginning. It is beginning. And finally, we will get rid of waterfall. We really have always, you know, we talk about waterfall like we've done it forever. But I mean, we talk about
Starting point is 00:11:00 Agile like we've gotten rid of waterfall, but we still do waterfall. In Agile, you do a small change to code. You check it out. You compile the whole beast, and you release the whole beast. Microservices are the last mile of Agile.
Starting point is 00:11:18 Now, we can really start thinking about how to get rid of DevTest and Prod. That's how it ties into CD. I think it might be interesting, actually, Andy, to have some follow-up conversations as well, because I think there's a lot of really cool topics we can get into.
Starting point is 00:11:33 I just wanted to check that service mesh thing, if that sounded like a little bit of what you were talking about, which it sounds like it somewhat is. But yeah, let's go back to the idea of CD2. Yeah, the CD2. There are two comments that I have to your
Starting point is 00:11:47 statement. The first is that Anita Ingle, the head of our DevOps,
Starting point is 00:11:53 or what we now call the ACE team, made a statement about two years ago
Starting point is 00:11:57 or three years ago where she said that, for her, the maturity of an organization is inversely
Starting point is 00:12:02 proportional to the number of environments that you have. So that means the more environments you have, the less mature you are. If in the end you only have one environment, and that's prod, then you obviously reach the highest level of maturity. On the other side, I gotta say, and we talked with Kelsey Hightower and others, right, if you think
Starting point is 00:12:21 about Kubernetes, there's a lot of things changing in these platforms. And the question is, how do we deal with that, how do we kind of test the new platform versions, the new versions of the things we're depending on? Because if everything runs in prod and you are updating your prod cluster to the latest version of Kubernetes without having the ability to test this somewhere,
Starting point is 00:12:43 then you may run into the problem that you're upgrading your Kubernetes cluster and all of a sudden everything falls apart. I mean, maybe I get this wrong. Maybe there's a better option for doing this, but at least this is one of the few reasons I can think of for not trying to achieve prod-only. Well, we'll see how the industry goes.
Starting point is 00:13:03 Yeah, yeah. We will. And, you know, there is a part of that statement that you just described, the maturity level based on environments, that has to do with your ability, your team's ability to do true configuration management. There should never be a guess about it. Now, certainly if you're making some kind of big update to a cluster, something at the low level, I would say maybe you want to test that in a different cluster, right? But for the majority of, you know, it's the old 80-20%, right? 80% of the code that we have running in a cluster really doesn't change that often. It's 20%. So how do we make the 20% as efficient as possible? And how do we support business agility by allowing code to get out to end users as quickly as possible? How do we deliver innovation
Starting point is 00:13:58 all the time? And that is the essence of a microservice, is the ability to do that. So while that 20% is critical and 20% may have to have its own cluster to be tested, I do predict that there will be a time in the future that for the majority of the changes, they will bounce right to production. Yeah. No, your word in God's ears or whatever,
Starting point is 00:14:23 that's at least a saying in German. I'm not sure if that translates well into English. Yeah, no, I just actually came across that. I forget what it was, on some show.
Starting point is 00:14:34 Oh no, no, it wasn't, sorry. It was some political thing on Twitter.
Starting point is 00:14:42 Let's not go there. Yeah, exactly. I'm not even going to mention the names. But yeah. All right. So let's go back to CD version two. Now, Tracy, in our coffee call we talked about event-driven continuous delivery. I explained to you what we are doing with Keptn, kind of the same story as what you were saying: we were breaking up monolithic applications into services
Starting point is 00:15:10 and then connecting them through events, but we haven't done that, let's say, evolutionary step in continuous delivery yet. Is it about time? And actually, does it solve the problem? So I would like to get your thoughts on what CD version 2 really looks like, how event-driven plays into it, and what the event-driven concept is. I mean, what is CD version
Starting point is 00:15:33 2 for you? So if we think about microservices, let's just keep it in the context of a microservice, because that is where it really requires the biggest shift in thinking. Not all microservices are equal. We will have microservices that impact lots of applications. We will have microservices that are front-end and impact only one application. We'll have microservices that are security-related, login routines, database access routines. And not all of them are equal. So why should we continue with a very authoritative workflow process that forces every microservice to go through the same workflow? Now, when I first looked at Keptn, and, you know,
Starting point is 00:16:23 this has been some time ago and I've read through it, there is a concept of strategy that came up. And I kept thinking about that. And really what we need is not a CD workflow, but we need CD strategies based on the microservice. And the best way to do that is through something that's more templated, something that's more event-driven. And we really should be able to create a workflow on the fly or a strategy on the fly. So if we stop thinking about workflows, because workflows really puts us into this very strict kind of, you know, you do something at dev, you do something at test, you do something at prod. But if you think about a proper strategy for a particular microservice that has a particular risk level, then you start thinking in terms of, well, what do I need to really do to get this out?
Starting point is 00:17:14 What is the proper strategy for this particular microservice? We should be able to create a strategy on the fly that pushes it through the pipeline that's appropriate for that microservice. Now, whether we do it based on events or some kind of a templating engine, I'm a big fan of Stephen Tirana and his Jenkins Templating Engine. I think that will save a lot of work for a lot of people who are using Jenkins. I'm a big fan of Keptn and how you have kind of a control plane listener that says these are the events that could be executed. And even Tekton. This is the shift in where we're going.
Starting point is 00:18:03 Jenkins X just announced their beta 3.0, completely based on Tekton and the events catalog. So now that we have events, we have this idea: how do we best put them to work? And I feel like shifting from this concept of workflows, and starting to think about the proper strategy for the item that we're managing, is where we should be.
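As a rough picture of the event-driven shape being described here, a control plane that reacts to delivery events rather than running one fixed workflow, consider this small sketch. The event names and handlers are hypothetical and are not Keptn's or Tekton's actual interfaces:

```python
# Sketch of an event-driven delivery control plane: tools subscribe to
# events, and the next step is chosen by whoever listens, not by a single
# hard-coded workflow. Event names and handlers are made up for illustration.

from collections import defaultdict
from typing import Callable

subscribers: defaultdict[str, list] = defaultdict(list)

def on(event_type: str):
    """Register a handler for an event type."""
    def register(handler: Callable[[dict], None]):
        subscribers[event_type].append(handler)
        return handler
    return register

def emit(event_type: str, payload: dict) -> None:
    for handler in subscribers[event_type]:
        handler(payload)

@on("deployment.finished")
def run_tests(event: dict) -> None:
    print(f"running tests for {event['service']}")
    emit("tests.finished", event)

@on("tests.finished")
def evaluate(event: dict) -> None:
    print(f"evaluating {event['service']} against its objectives")

emit("deployment.finished", {"service": "login-service", "version": "1.3"})
```

Swapping a stage in or out is then just subscribing a different handler, which is what makes a per-microservice strategy cheap to assemble.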
Starting point is 00:18:28 That is, in my mind, the essence of CD version two. Fascinating. And thanks for that, I took a lot of notes. So it starts then with the assessment of the risk of a microservice. You need to put them into different buckets and say, hey, this is a, I don't know, very low-risk microservice, you can go to production easily, you can do a canary deployment, and then we have a certain model of how we turn on the canary
Starting point is 00:18:58 load. But then there might be, hey, this is the login service, and if this one fails, then obviously we have a big problem, so we need to go through a different process, or, yeah, a different strategy. So how do we assess the risk, and how do we automate that? Well, there is, you know, something now that we can really start leveraging, and it's called machine learning. We have all of this information that we should be pulling back from the production environments to start defining risk level. And it has to do with configuration too. If we go back to that discussion around maturity and configuration, that's what we are focused on at DeployHub. And that's why we're excited. This is sort of an auspicious day for us. This is our first full day of having Ortelius as part of the CD Foundation. And Ortelius is a microservice management tool that tracks microservices, it catalogs them, it tracks their deployment metadata, it can track how,
Starting point is 00:19:59 if it failed when it went out. And based on those criteria, we're going to start understanding the risk level of a microservice. And that is the essence of chaos engineering, right? Because we're going to let the data tell us that, not a human. We need the data to return that information, and we need to act upon it. And that acting upon it should start with assessing a risk level, or the blast radius. What's the blast radius of this microservice? Maybe it should go to a test environment before it goes right to prod. Because I can promise you, in some of those cases, if we think about a strategy for a front end where it's just a
Starting point is 00:20:38 drop-down list that's being changed, that strategy might be: let the developers test it and push it out to production right away. But if it's a security routine, we probably would want a strategy that might take it through several different steps of testing before it goes out the door. So we have to start allowing the data to drive the CD pipeline. We have to have smart CD pipelines. It shouldn't be a human deciding, in a very imperative way, that it has to be pushed through this kind of a workflow, because we're not monolithic anymore. I think the idea of the data driving these
Starting point is 00:21:17 decisions is really, really important, because as you were describing this, and it's the first time I'm hearing some of these concepts you're bringing up here, my brain was already throwing up barriers, you know, and trying to think through them. Because I'm like, obviously I just can't say no to that, let me think about what's making me react this way. And it really came back to, you know, I haven't worked, I've been a sales engineer since 2011, so the last time I did performance testing was 2009, 2010-ish. Waterfall, no automated deployments, very immature models, right? And I think that's the key here.
Starting point is 00:21:52 Very, very immature models. And I remember one time having an argument with the product management team because I wanted to do a performance test on a release. And the developer was like, this is just such a minor thing, it doesn't need testing, we're pushing it,
Starting point is 00:22:03 it's got to go out, you're not testing it. And I was always like, we should test everything, as a good performance tester would, you know, trying to make a fight for it. Of course, predictably, it went to prod and crashed everything. Right? This minor little thing. It was some stupid mistake they made, right?
Starting point is 00:22:19 And that's what got me thinking, like, oh, how do you say this is a non-important or a low-risk microservice? And I think your answer specifically was the data point: let the data drive that, not the humans. But also, I think this all relies on the common theme we've been discussing so far, about there having to be a maturity model in place before you're doing these things. The reason why I was resisting the idea initially is because we were doing old-fashioned deployments. There was very low maturity. There were, you know, countless times you would deploy from dev to QA to prod with, you know, logging turned on full or debug, you know, stupid things like that, because that wasn't being treated as code, it was all these manual switches. So the long point I'm making is, if you look at this, and if you drop your guard to look at this and
Starting point is 00:23:08 not resist, like I started to, and think, okay: if you have a proper maturity model in place, and you have your guardrails, and you have as many things automated as possible, like your deployments, probably, assuming that's a lot of what DeployHub helps you with, right, to automate all these pieces and make sure all the configs are properly set, you remove the, let's say, the stupid risk, the stupid human risk, from it, and you're left to just using the data you collect to capture the real risk from the technical point of view, which can help you do this. So yeah, no, I like the idea. In short, I like the idea. I had some reservations as we were going, but I wanted to think it through for a minute and share it, because I figure a lot of people hearing this might be like, oh,
Starting point is 00:23:51 come on, come on. But again, you have to be at a certain level. This is not like, I know how to drive, so I'm going to get in a 747 and try to fly it. And remember, in your example, and by the way, there is such a thing as prod to test to dev, that's called an emergency release. Exactly. And there's a lot of those done on a very regular basis. But in your example, you're thinking in terms of monolithic too. Exactly. And monolithic could potentially have a bigger impact because it's monolithic.
Starting point is 00:24:26 When you're moving smaller functions out, your risk level actually comes down for that particular deployment. And that is the whole idea of Agile. Exactly. So that's why I keep saying we have really achieved Agile's last mile when we think about microservices. And microservices will be deployed all day long. This is not a, we're going to get into a room and have a meeting about a deployment of a single function and discuss it and then to schedule it and have people stamp it.
Starting point is 00:24:53 They have no idea what it does anyway, which used to make me crazy in those kind of deployment kind of approval meetings. It's like, you don't know anything about this anyways. Why are you approving it? Trust your developers. And if your developers break it, they need to fix it. And that's the importance of configuration management, having a difference report, understanding what you just did and being
Starting point is 00:25:14 able to back it out really quickly or shift from blue to green. We have the skills. We have the tools to be able to make this shift. And it's required. We don't have a way to go back. Microservices has pushed us to a place that we have to rethink and reimagine everything about our CD pipeline and start making it smart and start making it fast so businesses can really achieve the agility that they've always driven themselves to achieve. They want to be the first one on the market with their new feature. Banking, insurance, all of these heavily, even the securities area. They want those features out today.
Starting point is 00:25:58 They don't want to wait. They want to get that stuff out now. I want the vaccine yesterday. That's who we are now as consumers. We want it now. Tracy, I got a quick question for you then on this. I understand the happy-world scenario where we all have microservices and we can all deploy them independently. And that's where we want to get to, obviously, maybe with different processes depending on the risk. But what I've seen
Starting point is 00:26:25 also with organizations that are now moving to microservices: when they want to push something new out, they always say, well, if we want to have this feature, we need to push these five microservices out in this version, because in the end they all encapsulate a value stream, or like a value increment. For me the challenge here is, do you have any thoughts on how you actually organize this, and how you control the rollout of services that should be independent, but really they are not, because they depend on individual versions? How do you do this? Is it through feature flags, where you just deploy them and then turn them on at some point, or how does this work? Andy, you're very kind to ask that question,
Starting point is 00:27:05 to be quite honest. That is the essence of what we're doing with Ortelius. Think about Ortelius as not a deployment solution; Ortelius is a configuration management solution. So in the Ortelius world, let's just break it down to something really basic. A microservice is a component. Applications are a collection of components. And why do I use the word component? Because it may be something other than
Starting point is 00:27:31 a microservice. It could be a Lambda function. I don't really consider that a microservice. So it's a collection of components that can be independently deployed. What we have to be able to do is every time a microservice is updated, you know, it's registered to Quay, it's in Docker, new versions in Docker Hub. We have to be able to grab the details about that and version it. Once that's done, we know that anything that consumes it also has a new version. Now, you brought up another interesting idea, which we're talking about for 2021, and that's what I like to call component sets. So while microservices are supposed to be loosely coupled, they're not always loosely coupled. In fact, we know now that people are writing microservices, they're not an application.
Starting point is 00:28:20 They're not the teller application for the bank. They're just a set of microservices that have to be deployed together. That's what we're calling a component set. So what we do is we take that information and we pass that on to tools like Spinnaker or Argo or Helm to actually go off and do the deployment. And we pull back that information and we check that deployment file back into our logs so that it's hermetic and can be redeployed at any point in time. But what you end up with is a central database that shows the differences between two releases at a component level or at an application level
Starting point is 00:28:58 or at a cluster level. It can show the blast radius of a microservice. Even before you deploy it, you can say, I'm a microservice developer. I'm going to update this. How many people are actually consuming it? Oh, wow, 15 applications are using this. Maybe I should be a little more careful and notify everybody that is coming across. Or maybe our CD pipeline should be smart enough to say, this microservice has been updated. Go look at the configuration data and then re-execute the workflows for all of the testing for all of those applications before it goes out
Starting point is 00:29:31 the door. So everybody's had an opportunity to look at it. So it goes back to that statement that you read about the maturity level. It has to do with being able to understand how applications are put together, what their differences are as they get pushed across, and the versions that consume them, and what their blast radius is. So it's back to being able to see the puzzle, the top of the box of the puzzle. What are you building? What does this puzzle really look like? Even though it's logical, we are still building applications. And we still have to be able to see it that way. And that's what the Ortelius open source community, that's the problem set that we're solving.
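To picture the kind of configuration data being described, components, the applications that consume them, and the blast radius you can query before a deployment, here is a toy model. The schema, names, and versions are made up for illustration and are not the Ortelius data model:

```python
# Toy configuration-management catalog: components (microservices, Lambda
# functions, ...), applications as collections of component versions, and a
# blast-radius query. Schema and data are illustrative only.

from collections import defaultdict

class Catalog:
    def __init__(self) -> None:
        self.applications = {}                  # app version -> {component: version}
        self.consumers = defaultdict(set)       # component -> app versions using it

    def register(self, app: str, components: dict) -> None:
        """Record which component versions make up one application version."""
        self.applications[app] = components
        for component in components:
            self.consumers[component].add(app)

    def blast_radius(self, component: str) -> set:
        """Which application versions would a new version of this component touch?"""
        return self.consumers[component]

    def diff(self, app_a: str, app_b: str) -> dict:
        """Component-level difference report between two application versions."""
        a, b = self.applications[app_a], self.applications[app_b]
        return {c: (a.get(c), b.get(c)) for c in set(a) | set(b) if a.get(c) != b.get(c)}

catalog = Catalog()
catalog.register("teller-app 5.0", {"login-service": "1.2", "ledger-service": "3.0"})
catalog.register("teller-app 5.1", {"login-service": "1.3", "ledger-service": "3.0"})
catalog.register("mobile-app 2.0", {"login-service": "1.2"})

print(catalog.blast_radius("login-service"))             # every app consuming it
print(catalog.diff("teller-app 5.0", "teller-app 5.1"))  # what changed between releases
```

A "component set" in this picture would simply be a group of components that always get registered and rolled out together.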
Starting point is 00:30:16 So by the way, when I asked the question, I had no idea that you were actually releasing Ortelius today. So this is not me doing you a favor here. No, it's not a setup, really. But basically, to the listeners, now we're telling them that we recorded this not in the new year but maybe in the old year, if you look up the release date. So, damn it. Oh, they probably already forgot what we were, yeah, so, you know, go on. No, this is interesting. So we have, you know, as you know,
Starting point is 00:30:47 with what we do with Keptn, we obviously have a very tight integration with monitoring tools, whether it's Prometheus or obviously also Dynatrace. That's where most of us work. And we have a lot of this data, right? Dependency data. We know we have version information,
Starting point is 00:31:02 and that was also our thinking: what can we do with this data? Or which other tools can leverage the data that we have by doing distributed tracing across your microservices, and knowing exactly how many users are currently using a particular service that is, like, three levels down, and what's the blast radius if this falls, and also how often this component has failed in the last month when it was talking to another service in a certain version range, right? We have all this data. So I think we should also, besides this podcast, try to figure out how we can get our data to your tooling. And it's that kind of combined data that's going to start giving us those risk assessments. Exactly.
Starting point is 00:31:45 Yeah. And that's what we then can push back into that CD version 2, right? To define the strategy for any particular microservice based on the data that says this is the risk of it. Yeah. That would be cool. Yeah, it would be cool. And there's another component that I just learned about last week because I was invited to a hackathon that we had internally. And one of the guys, he was creating a tool. He analyzed our – so in Dynatrace, we detect problems and also root causes when we detect a problem.
Starting point is 00:32:18 And he was basically looking at the problem history of the last month. And he figured out, are there any particular points during the day where more problems occurred than other times during the day? And to what are they related to? Is it infrastructure problems? Because let's say every day at two o'clock in the afternoon, some team is doing infrastructure updates.
Starting point is 00:32:37 I don't know, right? And then he was looking at it on a daily basis, on an hourly basis, on a weekly basis. And it's very interesting to also then put this into consideration because if you know that there's an 80% chance, if you want to deploy now, that it fails based on historical data, not because of that service, but maybe because something else you don't have under control, then you can say, you know what, let's move this deployment window a little further out. Exactly. So if this is the point in time that it's
Starting point is 00:33:03 auto-scaling, maybe you don't want to deploy it at that point in time. Exactly. Yeah. So there's a lot of cool data that we can then use to influence our automated deployment decisions. Yes. I think this is the big area of, I don't want to say needs improvement.
Starting point is 00:33:21 I think the ideas are there, but it's the area that needs implementation. And we talk about this with a lot of our guests who have awesome tools, right, about data sharing and getting data from one tool to another, because there's a lot of tools out there now that can leverage each other's data. And I think there really needs to be
Starting point is 00:33:40 a focus on integrating these tools. All of the APIs, all of the ingests, they can all process and use them. And there's a ton of potential floating out there to get to those even deeper maturity models. And I think the biggest challenge facing everyone is actually getting the time and the ability to get those set up. Because just think about when we can get all these things hooked up. It's really, really cool conceptually. Yeah, I mean, it's just a struggle of time, right? Exactly. I mean, I think in the end every tool vendor wants to get as much data as possible,
Starting point is 00:34:15 because the more data you have, obviously, the more magic you can do with it. But still, I think every tool vendor has their specialty field where their AI, their ML, whatever it is, their algorithms, just based on their history, they can do certain things. Like, Tracy, you can probably do great things in your tools with the information about deployments and metadata on these deployments. We can do a lot of great things on the Dynatrace side with distributed traces and root cause analysis of problems. But you're right. I mean, in the end, we need to figure out a better way to integrate these data streams, to give the right data to the right tools so
Starting point is 00:34:53 that these tools can then make the right decision at the right moment in time. Yeah. There needs to be a data stream framework. Right. There's your next open source project. You know, the other comment I wanted to make on this is that I love all these ideas, but when I interact with customers in the real world, at least in the areas that I'm focusing on, we know that there are no quote-unquote unicorns, right? What used to be the unicorn is just someone getting there first and people following. But I do find that there are a lot of companies
Starting point is 00:35:32 who, let's say the unicorn turns into a horse and a lot of people start getting horses. I think there are a lot of companies and organizations out there who buy a large dog and put a saddle on it and call it a horse, right? And I think that's the biggest challenge, because what I run into quite often is a company where their heart's in the right spot, but they half-ass it. Maybe because that's all the resources they have. Maybe they have a hard time getting enough talent in there so they can actually execute, whatever the reasons. They get some of it done,
Starting point is 00:36:05 and then when you try to get it to that next level, everything starts to fall apart, because they have a really shaky foundation, or not a good foundation at all. And to me, that's, I guess going more on a philosophical level, how do we overcome that? Because there are a lot of great ideas, and there's a lot of really awesome things people can do when they have that maturity model. But I think a lot of people start their journey with a really shaky foundation. And then from there, everything gets exponentially harder to build up. So how do you go back and... Yeah, it's cultural.
Starting point is 00:36:38 These are cultural problems. You know, I'm part of the DevOps Institute, and Jayne Groll always talks about the people of DevOps. And there is a cultural shift that we're facing. And one of the bigger cultural shifts, I think, is upper management allowing teams to fail. Failure has always had negative connotations to it. But really, if you fail, you've learned how not to do something. And failing, and failing fast, is the best way to move from a dog to a horse. Because I think a lot of times,
Starting point is 00:37:19 we're timid and we don't want to completely buy into a process. And so we only try certain aspects of it. And that's what keeps that saddle on a dog. So having an upper management who says, yes, we're going to have your back when you fail, I'm a director and I'm going to make sure that you're protected because you tried something new, you tried something innovative, and the next time we're going to get it right and it's going to make our lives easier, maybe two months from now, not today,
Starting point is 00:37:49 that is the cultural shift that has to happen. Upper management has to have the backs of people who are trying new technologies. Because you're right, you can't have, you know, you don't want to put a saddle on a dog. I just made that up. I don't know if that's a real thing. I don't know either, but it works. It works really well.
Starting point is 00:38:08 But how do you get that upper management? I'm sure you've run into this all the time as well with places you're going to. How do you get that upper management to buy in? And I don't mean dollars, with products. I just mean to say, yes, we are going to finally commit to this. Because that, I think, is always the hardest part. A lot of times we talk to the people on the ground doing this stuff, and they get it, right? But it's just the limitations. So how do you break through that barrier? How do you get people to take a vaccination? True. I think
Starting point is 00:38:38 it's not ignorance, it's being ill-informed. Yeah. And I think people are scared to take the risk because they're ill-informed; it's different if you have the information. Which is why we need to start leveraging all of the data that we have, because there's nothing that upper management loves more than reports, right? If we can show them, with reports, that we can achieve greater things with newer technology, we inform them. You know, somebody saying no is just a request for more information. Interesting. So how do we constantly provide upper management the information that they need, so they can
Starting point is 00:39:16 make the right decision? Now, they may be super risk averse, and they're never going to want to move to a microservice environment. But guess what? The developers will do it anyway. Yeah. They might need another cluster. Yes.
Starting point is 00:39:31 They'll have their own cluster and then we have running all this really cool stuff. And then one of the directors will say, we need to get this to production and then it's born. So we have to inform them. We have to understand that they're busy with their day-to-day work and they're not down in the weeds. So how do to inform them. We have to understand that they're busy with their day-to-day work, and they're not down in the weeds. So how do we inform them?
Starting point is 00:39:50 Great. Yeah. It's a challenge. Hey, so kind of trying to wrap this thing up here, and I actually wonder, right, initially I thought the title of this episode is going to be CDV2. Now, looking back at my notes, yes, we talked about, obviously, it continues to live, but we talked about a much broader topic with a problem we really want to solve. It's basically smart.
Starting point is 00:40:15 I mean, I think you actually called it earlier somehow, like smarter delivery strategies for modern microservice architectures or something like that. Tracy, I want to actually give it back to you. What would you call this episode? What was the main topic? Well, we have covered many topics, but I do think that we are talking about how to make this continuous delivery smarter.
Starting point is 00:40:42 How do we leverage data? How do we bring all of this information together to stop the human factor of deciding a workflow, and instead use the data to create a strategy? You could also, I mean, because we had some political things earlier, you could say, how to make continuous delivery great again. No, don't do that. But it's funny, yeah, it is, because you said how to make continuous delivery smarter, and I said, okay, now, I don't think we got into it as deep as we wanted to, but in summary though,
Starting point is 00:41:21 that's the crux of CD 2.0 that you're saying, right? It's the idea of it being a strategy per microservice as opposed to a workflow. As opposed, yes, to a predefined, imperative kind of workflow, where everything goes through this flow. We can't do that anymore. Yeah. And in the end, it is smart, automated decisions based on data. So it's data-driven decisions. And what we've been trying to do with Keptn is to put SLIs and SLOs at the center of everything we do. So every time we execute an action, we validate it against the data. And I like this idea of also using data upfront to really put a marker, a tag, on a service and say, you are risk level two, you go here.
Starting point is 00:42:09 You're risk level five, you go here. Exactly. Before it ever goes out the door, gathering that information so that we could do some smart processing on it. We need to be able to apply that ML to that data. And between the monitoring data and the configuration data, we have a majority of it. We really do.
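Pulling the two ideas in this exchange together, tagging a service with a data-derived risk level upfront, and validating every action against SLIs and SLOs afterwards, a minimal sketch might look like the following. The metrics, weights, and thresholds are invented for illustration; this is not Keptn's, DeployHub's, or Ortelius's actual model or API:

```python
# Sketch: assign a risk level from configuration and production data, pick a
# strategy from it, then gate the rollout with an SLO check. All numbers and
# metric names are hypothetical.

def risk_level(consumers: int, recent_failure_rate: float, security_related: bool) -> int:
    """Risk level 1 (low) to 5 (high)."""
    score = min(consumers, 20) / 20 * 2                 # blast radius: consuming applications
    score += min(recent_failure_rate, 0.5) / 0.5 * 2    # how often recent deployments failed
    score += 1 if security_related else 0               # login/security routines never score low
    return max(1, min(5, round(score) + 1))

SLOS = {"p95_response_ms": 500.0, "error_rate_pct": 1.0}  # upper bounds per SLI

def slo_gate(slis: dict) -> bool:
    """True if every measured SLI stays within its objective."""
    return all(slis.get(name, float("inf")) <= limit for name, limit in SLOS.items())

if __name__ == "__main__":
    level = risk_level(consumers=15, recent_failure_rate=0.2, security_related=True)
    strategy = "straight to prod" if level <= 2 else "canary plus automated evaluation"
    print(level, strategy)
    print("promote" if slo_gate({"p95_response_ms": 420, "error_rate_pct": 0.4}) else "roll back")
```

The point of the sketch is only that both inputs are data: the risk tag before the deployment, and the SLO evaluation after it, with no human picking the workflow by hand.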
Starting point is 00:42:28 We have quite a bit of it. And to borrow on Andy's political thing, to maybe go back to before Andy's awareness of U.S. politics, we'll go back maybe 15 to 20 years ago and say we'll do some data-driven strategery. Data-driven strategery. For anybody who remembers the strategery one. I do, but I honestly don't recognize US politics today. Cool. Hey, Tracy, you mentioned a couple of projects today, Ortelius and others. Ortelius, what was the name of it? Abraham Ortelius was the first mapmaker.
Starting point is 00:43:13 And I often remind people that not only was he the first mapmaker, he created the first Atlas, World Atlas. And how did he do that? He went around to all these cartographers and said, please give me your material. And he assembled one big map. He was the first open source community. Wow. Literally. I would have thought his last name would have been map. It's Abraham Ortelius. And so we figured it was a really befitting name because we're basically mapping a Death Star. If you think about a cluster and all the points of light, we are creating,
Starting point is 00:43:40 you know, we're mapping that and we're mapping that before it ever goes out to that cluster. We're saying, if you do this, this is what your cluster is going to look like today. Sounds like our smartscape. We also map everything in our smartscape, but that's for another discussion. Cool. We will definitely make sure, Tracy,
Starting point is 00:43:56 to get the links out there to the folks. Is there any, knowing that this airs in 2021, early 2021, are there any big events that are coming up that people should be aware of in, let's say, the first quarter of 2021? Well, in April, I am leading a track for the DevOps Online Summit, which is really cool. He does it through Slack. So certainly if anybody out there is listening and they would like to submit a talk on any of these topics, I would love to have their feedback. Tracy at deployhub.com is where you can reach me. So if you want to speak on a Slack-driven DevOps show, it's quite fun. I did it last year. There's a lot of discussion because what he does is he just runs the episode in Slack,
Starting point is 00:44:53 and then everybody's talking about it afterwards. I think it's a really great platform for doing that. Please reach out. And also make sure that if you have a chance, sign up for one of your coffee chats. Yes, yes. That's really good. Just send me an email and I'll send you my calendar link and we can, you know, chat away. Because I learned a lot.
Starting point is 00:45:14 I have learned so much from everybody. And I really have to thank everybody who's taken me up on those coffee chats in 2020. Because I have been able to really pull together a pretty clean roadmap for the Ortelius project. Awesome. Well, hopefully we can get you back on. I know there were a couple of topics we touched upon early on. Maybe we can get you back to just dive into those more. And if there's anything more, if we want to go in deeper on CD 2.0 or anything there, I think it'd be great to have you back on. This was great. Service mesh.
Starting point is 00:45:47 Yeah, service mesh, all that. Yeah, I think there's, I can see you becoming a very recurring guest, but we'll try to reach that. I know you have other work to do as well. Still a little bit, right? Just a little. Well, thank you. Thank you so much, both of you.
Starting point is 00:46:03 This has been a pleasure. Awesome. Andy, any last final words or should I wrap it up? Just, uh, let's make sure that 2021 is going to be an awesome year. We have it all in our hands. And wear a mask. Yeah, amen to that. All right, thanks everyone for listening. If you have any questions or comments for Andy or I, you can reach us at pure underscore DT on Twitter, or you can send us an old-fashioned email at pure performance at dynatrace.com. And we will have all of Tracy's links in the show notes. So please make sure to check those out.
Starting point is 00:46:36 Thanks everyone for listening and happy new year. Bye-bye. Bye-bye.
